Foundations of RL
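The Bellman optimality equations define the optimal value functions recursively: the value of a state (or state-action pair) is the best expected immediate reward plus the discounted value of the successor state. Here P is the transition probability, R the reward, and γ the discount factor.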
V Function
V(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV(s')]
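
To make the right-hand side of the V equation concrete, here is a minimal sketch of a single Bellman backup in plain Python. The two-state, two-action MDP, the discount of 0.5, and the names `P`, `GAMMA`, and `bellman_backup_v` are all hypothetical, chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP (illustration only).
# P[s][a] is a list of (next_state, probability, reward) triples,
# i.e. it bundles P(s'|s,a) and R(s,a,s') together.
P = {
    0: {0: [(0, 1.0, 0.0)], 1: [(1, 1.0, 1.0)]},
    1: {0: [(0, 1.0, 0.0)], 1: [(1, 1.0, 2.0)]},
}
GAMMA = 0.5  # assumed discount factor


def bellman_backup_v(V, s, P=P, gamma=GAMMA):
    """Return max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))."""
    return max(
        sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
        for outcomes in P[s].values()
    )


# Apply one backup to every state, starting from V = 0 everywhere.
V0 = {0: 0.0, 1: 0.0}
V1 = {s: bellman_backup_v(V0, s) for s in P}
```

With V initialised to zero, one backup simply picks the largest immediate reward available in each state. The optimal V is the function that this backup leaves unchanged.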
Q Function
Q(s,a) = Σ_{s'} P(s'|s,a)[R(s,a,s') + γ max_a' Q(s',a')]
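
The Q equation is a fixed-point condition, which can be checked numerically. The sketch below assumes a hypothetical two-state, two-action MDP with discount 0.5; the table `Q_star` was worked out by hand for that toy problem, and all names are illustrative.

```python
# Hypothetical 2-state, 2-action MDP (illustration only).
# P[s][a] is a list of (next_state, probability, reward) triples.
P = {
    0: {0: [(0, 1.0, 0.0)], 1: [(1, 1.0, 1.0)]},
    1: {0: [(0, 1.0, 0.0)], 1: [(1, 1.0, 2.0)]},
}
GAMMA = 0.5  # assumed discount factor


def bellman_backup_q(Q, s, a, P=P, gamma=GAMMA):
    """Return sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * max_a' Q(s',a'))."""
    return sum(
        p * (r + gamma * max(Q[s2].values()))
        for s2, p, r in P[s][a]
    )


# Hand-derived optimal Q for this toy MDP, stored as Q_star[s][a].
Q_star = {0: {0: 1.5, 1: 3.0}, 1: {0: 1.5, 1: 4.0}}

# Fixed-point check: backing up Q_star reproduces Q_star exactly.
is_fixed_point = all(
    bellman_backup_q(Q_star, s, a) == Q_star[s][a]
    for s in P
    for a in P[s]
)

# V and Q are linked by V(s) = max_a Q(s,a).
V_from_q = {s: max(Q_star[s].values()) for s in Q_star}
```

The max sits inside the sum for Q and outside it for V. Taking the max over actions of Q recovers V.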
Equations
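These equations are the foundation of dynamic programming methods such as value iteration and policy iteration, which solve them directly when P and R are known.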
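
As one example of dynamic programming built on these equations, value iteration applies the Bellman optimality backup repeatedly until V stops changing. The following is a minimal sketch, not a tuned implementation; the toy MDP, the tolerance, and the function name `value_iteration` are assumptions made for illustration.

```python
# Hypothetical 2-state, 2-action MDP (illustration only).
# P[s][a] is a list of (next_state, probability, reward) triples.
P = {
    0: {0: [(0, 1.0, 0.0)], 1: [(1, 1.0, 1.0)]},
    1: {0: [(0, 1.0, 0.0)], 1: [(1, 1.0, 2.0)]},
}


def value_iteration(P, gamma, tol=1e-10, max_iters=10_000):
    """Iterate the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        # Synchronous sweep: every new value is computed from the old V.
        new_V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                for outcomes in P[s].values()
            )
            for s in P
        }
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    return V


V = value_iteration(P, gamma=0.5)
```

For this toy MDP the iterates approach V(0) = 3 and V(1) = 4. Convergence relies on γ < 1, which makes the backup a contraction.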
Key Takeaways
- Bellman equations for V and Q
- Optimality conditions
- DP foundations