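Monte Carlo (MC) Methods for Reinforcement Learning (RL)
Monte Carlo methods are sample-based: they estimate value functions by averaging the returns observed over complete sampled episodes, without needing a model of the environment's dynamics.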
On-Policy
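On-policy Monte Carlo control evaluates and improves the same policy that generates the episodes. An epsilon-greedy policy keeps exploring: it takes a greedy action with probability 1 - epsilon and a uniformly random action with probability epsilon.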
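A minimal sketch of on-policy MC control with an epsilon-greedy policy may make the loop concrete. The names epsilon_greedy, mc_control, and toy_step are hypothetical, as is the two-state toy task; the sketch also assumes first-visit updates, incremental averaging, and random tie-breaking among greedy actions, none of which the notes specify.

```python
import random
from collections import defaultdict

def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Assumption: ties between equally good actions are broken at random.
    return rng.choice([a for a, v in enumerate(values) if v == best])

def toy_step(state, action):
    """Hypothetical task: only action 1 in state 0, then action 1 in state 1, pays 1."""
    if action == 0:
        return state, 0.0, True      # action 0 ends the episode with no reward
    if state == 0:
        return 1, 0.0, False         # action 1 in state 0 moves on to state 1
    return state, 1.0, True          # action 1 in state 1 pays 1 and terminates

def mc_control(step, start_state, n_actions, episodes=2000,
               gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)           # action-value estimates Q(s, a)
    counts = defaultdict(int)        # number of returns averaged per (s, a)
    for _ in range(episodes):
        # Generate one episode with the current epsilon-greedy policy.
        state, done, episode = start_state, False, []
        while not done:
            action = epsilon_greedy(q, state, n_actions, epsilon, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
        # Compute the return following each time step, working backwards.
        g, returns = 0.0, []
        for _, _, reward in reversed(episode):
            g = reward + gamma * g
            returns.append(g)
        returns.reverse()
        # First-visit update: one return per (state, action) per episode.
        seen = set()
        for (s, a, _), g_t in zip(episode, returns):
            if (s, a) in seen:
                continue
            seen.add((s, a))
            counts[(s, a)] += 1
            q[(s, a)] += (g_t - q[(s, a)]) / counts[(s, a)]
    return q

q = mc_control(toy_step, start_state=0, n_actions=2)
```

Because the policy that acts is read from q at every step, improving q improves the behavior directly, which is what makes this on-policy. The policy never becomes fully greedy, so the learned values are those of the epsilon-greedy policy rather than of the optimal deterministic one.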
Off-Policy
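Off-policy Monte Carlo learns about a target policy from episodes generated by a different behavior policy. Importance sampling corrects for the mismatch by weighting each return by the ratio of the trajectory's probability under the target policy to its probability under the behavior policy.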
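The weighting can be sketched as follows, assuming episodes are lists of (state, action, reward) tuples and that both policies are available as probability functions. The function name importance_sampling_estimates and the one-step example data are hypothetical. The sketch returns both the ordinary (unbiased, higher variance) and the weighted (biased, lower variance) estimate of the start-state value.

```python
def importance_sampling_estimates(episodes, target_prob, behavior_prob, gamma=1.0):
    """Estimate the target policy's start-state value from behavior-policy episodes."""
    weighted_sum = 0.0   # sum of rho * G over episodes
    weight_total = 0.0   # sum of rho over episodes
    for episode in episodes:
        rho, g, discount = 1.0, 0.0, 1.0
        for state, action, reward in episode:
            # Importance ratio: product of per-step probability ratios.
            rho *= target_prob(state, action) / behavior_prob(state, action)
            g += discount * reward
            discount *= gamma
        weighted_sum += rho * g
        weight_total += rho
    ordinary = weighted_sum / len(episodes)
    weighted = weighted_sum / weight_total if weight_total > 0 else 0.0
    return ordinary, weighted

# Hypothetical one-step task: action 1 pays 1, action 0 pays 0.
def target_prob(state, action):
    return 0.9 if action == 1 else 0.1   # target policy prefers action 1

def behavior_prob(state, action):
    return 0.5                           # behavior policy is uniform

episodes = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
ordinary, weighted = importance_sampling_estimates(episodes, target_prob, behavior_prob)
```

The behavior policy must give nonzero probability to every action the target policy might take; otherwise the ratio is undefined and those trajectories are never observed.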
Every-Visit/First-Visit
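The two variants differ in which returns they average for a state. First-visit MC uses only the return following the first occurrence of the state in each episode; every-visit MC uses the return following every occurrence.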
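The difference shows up as soon as a state repeats within an episode. The sketch below assumes episodes are lists of (state, reward) pairs, where the reward is the one received on leaving that state; the name mc_prediction and the example episode are hypothetical.

```python
from collections import defaultdict

def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Average sampled returns per state, first-visit or every-visit."""
    totals = defaultdict(float)   # sum of returns recorded for each state
    counts = defaultdict(int)     # number of returns recorded for each state
    for episode in episodes:
        # Return following each time step, computed backwards.
        g, returns = 0.0, []
        for _, reward in reversed(episode):
            g = reward + gamma * g
            returns.append(g)
        returns.reverse()
        seen = set()
        for (state, _), g_t in zip(episode, returns):
            if first_visit and state in seen:
                continue          # first-visit: skip repeat occurrences
            seen.add(state)
            totals[state] += g_t
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# State "A" is visited twice; the returns from its two visits are 3 and 2.
episode = [("A", 1.0), ("A", 1.0), ("B", 1.0)]
v_first = mc_prediction([episode], first_visit=True)    # V(A) = 3.0
v_every = mc_prediction([episode], first_visit=False)   # V(A) = 2.5
```

Both variants agree on "B", which occurs once, and differ on "A" only because the second visit contributes an extra, shorter return under every-visit averaging.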
Key Takeaways
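- Monte Carlo RL estimates values from the returns of complete sampled episodes.
- On-policy methods learn about the policy that generates the data; off-policy methods learn about a different target policy, using importance sampling to correct for the mismatch.
- First-visit MC averages one return per state per episode; every-visit MC averages a return for every occurrence of the state.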