Methods for Exploration
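An agent must balance exploration (trying actions whose value is still uncertain) against exploitation (choosing the action with the highest current value estimate).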
Epsilon-Greedy
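With probability epsilon, take a uniformly random action; otherwise take the greedy action (the one with the highest estimated Q-value).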
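A minimal sketch of the rule above, assuming Q-values are held in a plain list indexed by action. The function name `epsilon_greedy` and the `rng` parameter are illustrative choices, not part of the original notes.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given a list of Q-value estimates.

    q_values: list of floats, one estimate per action (assumed layout).
    epsilon:  probability of exploring, in [0, 1].
    rng:      object with random() and randrange(); defaults to the random module.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 the rule is purely greedy; with epsilon = 1 it is purely random. In practice epsilon is often decayed over time, though the notes do not specify a schedule.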
UCB
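Upper confidence bound: add to each action's value estimate a bonus that shrinks the more often the action has been tried, then pick the action with the highest total. This follows the principle of optimism in the face of uncertainty.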
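One common form of the bonus is c * sqrt(ln t / N(a)), where t is the total number of steps so far and N(a) is how often action a has been tried. The sketch below assumes that form; the name `ucb_select`, the constant `c`, and the list-based layout are illustrative, and other UCB variants use different bonuses.

```python
import math

def ucb_select(q_values, counts, t, c=2.0):
    """Return the action index with the highest upper confidence bound.

    q_values: list of value estimates, one per action.
    counts:   list of how many times each action has been tried.
    t:        total number of steps taken so far (t >= 1).
    c:        exploration weight (assumed parameter; larger means more exploration).
    """
    # An untried action has unbounded uncertainty, so try it first.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Estimate plus a bonus that shrinks as the action's count grows.
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

Unlike epsilon-greedy, this rule is deterministic given the estimates and counts: exploration comes from the bonus rather than from random draws.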
Boltzmann
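Sample actions from a softmax distribution over Q-values, so higher-valued actions are chosen more often. A temperature parameter controls the randomness: a high temperature approaches uniform random choice, a low temperature approaches greedy choice.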
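A minimal sketch of softmax action selection, assuming a positive `temperature` that divides the Q-values before exponentiation. The function names and the list-based layout are illustrative, not taken from the original notes.

```python
import math
import random

def boltzmann_probs(q_values, temperature=1.0):
    """Return softmax probabilities over actions. temperature must be > 0."""
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Compared with epsilon-greedy, exploration here is graded: an action that looks only slightly worse than the best is tried far more often than one that looks much worse.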
Key Takeaways
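- Epsilon-greedy is a simple baseline: mostly greedy, occasionally random
- UCB steers exploration toward actions whose estimates are still uncertain
- Boltzmann exploration samples actions in proportion to a softmax of their Q-values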