← Back to Data Science

All Topics

Advertisement

Learn/Data Science/Machine Learning

Policy Gradient Methods

Topic: RL

Advertisement

Direct Policy Optimization

Optimize policy directly.

REINFORCE

Monte Carlo policy gradient.

Actor-Critic

Value baseline reduces variance.

Variance Reduction

Baseline. Control variates.

Key Takeaways

  1. REINFORCE algorithm
  2. Baseline reduces variance
  3. Actor-critic methods

Advertisement

Advertisement

Need More Practice?

Get personalized data science help from ChatWhole's AI-powered platform.

Get Expert Help →