Direct Policy Optimization
Optimize policy directly.
REINFORCE
Monte Carlo policy gradient.
Actor-Critic
Value baseline reduces variance.
Variance Reduction
Baseline. Control variates.
Key Takeaways
- REINFORCE algorithm
- Baseline reduces variance
- Actor-critic methods