Combine Value and Policy
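Actor-critic architecture. The actor is a policy that selects actions; the critic is a value function that scores them and gives the actor a lower-variance learning signal.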
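A minimal sketch of how the two parts interact on a single transition, assuming a one-step TD error is used as the advantage estimate. The function names, the 0.5 factor on the critic loss, and the default gamma are illustrative choices, not taken from these notes.

```python
def td_advantage(reward, value, next_value, done, gamma=0.99):
    """One-step TD error, used here as an advantage estimate."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def actor_critic_losses(log_prob, reward, value, next_value, done, gamma=0.99):
    """Return (actor_loss, critic_loss) for one transition.

    log_prob is log pi(a|s) for the action that was taken. The advantage
    is treated as a constant in the actor loss (hypothetical scalar
    version; a real implementation would use an autodiff framework).
    """
    advantage = td_advantage(reward, value, next_value, done, gamma)
    actor_loss = -log_prob * advantage   # policy-gradient term
    critic_loss = 0.5 * advantage ** 2   # squared TD error
    return actor_loss, critic_loss
```

The critic is trained to shrink the TD error, while the actor raises the log-probability of actions whose TD error is positive.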
A2C/A3C
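A3C: asynchronous advantage actor-critic. Parallel workers update shared parameters without waiting for each other.
A2C: the synchronous variant. All workers finish their rollouts, then one batched update is applied.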
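Both variants typically estimate advantages from short rollouts with bootstrapped n-step returns. The sketch below shows that computation for one worker's rollout; the helper names and the plain-list representation are assumptions made for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout, bootstrapped from the critic.

    bootstrap_value is the critic's estimate for the state after the last
    reward (use 0.0 if the rollout ended in a terminal state).
    """
    returns = []
    running = bootstrap_value
    # Work backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def rollout_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage = n-step return minus the critic's value for that state."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return [g - v for g, v in zip(returns, values)]
```

In a synchronous setup the advantages from all workers would be gathered into one batch before the update; in an asynchronous setup each worker would apply its own gradients as soon as its rollout is done.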
PPO
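Proximal policy optimization. The clipped surrogate objective limits how far the probability ratio between the new and old policy can move in one update.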
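A per-sample sketch of the clipped objective, assuming log-probabilities of the taken action under the new and old policies are available. The function name and the default clip_eps of 0.2 are illustrative; the value is a common choice, not one stated in these notes.

```python
import math


def ppo_clip_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one sample (to be maximized)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum makes the objective a pessimistic bound, so there
    # is no gain from pushing the ratio beyond the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds 1 + clip_eps; with a negative advantage it stops growing once the ratio falls below 1 - clip_eps. In training, this value would be averaged over a batch and negated to form a loss.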
SAC
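Soft actor-critic. Off-policy. Entropy regularization: an entropy bonus is added to the return so the policy stays stochastic while it improves.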
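The entropy term shows up most directly in the critic's bootstrap target. The sketch below assumes the twin-critic form of SAC with a fixed temperature alpha; the function name, argument names, and default values are illustrative.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob,
                  alpha=0.2, gamma=0.99):
    """Entropy-regularized target for the Q-function on one transition.

    next_q1 / next_q2 are two critics' estimates for the next state and an
    action sampled from the current policy; next_log_prob is that action's
    log-probability. alpha weights the entropy bonus.
    """
    # Subtracting alpha * log pi rewards the policy for staying uncertain.
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    # No bootstrapping past a terminal state.
    return reward + (0.0 if done else gamma * soft_value)
```

A low-probability next action (very negative log-probability) raises the target, which is how the entropy preference feeds back into the value estimates.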
Key Takeaways
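- Actor-critic combines a learned value function (critic) with a learned policy (actor)
- A3C updates asynchronously; A2C is the synchronous variant
- PPO uses a clipped objective to keep policy updates small
- SAC maximizes expected return plus an entropy bonus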