MDP Fundamentals
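A Markov decision process (MDP) is the standard formal foundation for reinforcement learning (RL) problems: it describes an agent that acts in an environment and receives rewards over time.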
Components
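- States (S): the situations the agent can be in.
- Actions (A): the choices available to the agent.
- Transitions P(s' | s, a): the probability of reaching state s' after taking action a in state s.
- Rewards (R): the immediate scalar feedback the agent receives after acting.
- Discount factor γ, with 0 ≤ γ ≤ 1: how much future rewards count relative to immediate ones.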
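The components above can be sketched as a small data structure. This is only an illustrative sketch: the `MDP` class, its field names, the `validate` helper, and the two-state "cool"/"hot" example are all assumptions made for this example, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Hypothetical container for the five MDP components."""
    states: tuple
    actions: tuple
    # transitions[(s, a)] maps each next state to its probability
    transitions: dict
    # rewards[(s, a)] is the expected immediate reward (an assumed convention)
    rewards: dict
    gamma: float

    def validate(self) -> bool:
        """Check that each (state, action) pair has a proper distribution
        and that the discount factor lies in [0, 1]."""
        for s in self.states:
            for a in self.actions:
                probs = self.transitions[(s, a)]
                if abs(sum(probs.values()) - 1.0) > 1e-9:
                    return False
        return 0.0 <= self.gamma <= 1.0


# Made-up toy problem: an engine that can run "slow" or "fast".
toy_mdp = MDP(
    states=("cool", "hot"),
    actions=("slow", "fast"),
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow"): 1.0,
        ("cool", "fast"): 2.0,
        ("hot", "slow"): 1.0,
        ("hot", "fast"): -10.0,
    },
    gamma=0.9,
)
```

Storing transitions and rewards as dictionaries keyed by (state, action) keeps the sketch readable for small problems; larger problems would more likely use arrays.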
Markov Property
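The future depends only on the current state (and the action taken in it), not on the history of earlier states and actions: P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_0, a_0, ..., s_t, a_t).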
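One way to see the Markov property in code is that a simulator's step function needs nothing but the current state and action. The sketch below assumes a made-up transition table `TRANSITIONS` and hypothetical helpers `step` and `rollout`; none of these names come from the original notes.

```python
import random

# Hypothetical transition table: TRANSITIONS[(state, action)] -> {next_state: prob}
TRANSITIONS = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
    ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
    ("hot", "fast"): {"hot": 1.0},
}


def step(state, action, rng):
    """Sample the next state from the current state and action only.

    No history is passed in: that is the Markov property.
    """
    dist = TRANSITIONS[(state, action)]
    next_states = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(next_states, weights=weights, k=1)[0]


def rollout(start, policy, horizon, rng):
    """Generate a trajectory of states by repeatedly calling step."""
    state = start
    trajectory = [state]
    for _ in range(horizon):
        state = step(state, policy[state], rng)
        trajectory.append(state)
    return trajectory


rng = random.Random(0)  # seeded so runs are repeatable
policy = {"cool": "fast", "hot": "slow"}  # assumed fixed policy for the demo
trajectory = rollout("cool", policy, horizon=5, rng=rng)
```

The trajectory is kept only for inspection; `step` never reads it, so the next state is sampled the same way regardless of how the current state was reached.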
Goal
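Find a policy (a rule for choosing an action in each state) that maximizes the expected cumulative reward, typically the discounted return G = r_0 + γ r_1 + γ^2 r_2 + ...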
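As a rough illustration of cumulative reward with the discount factor γ, the sketch below computes the discounted return of one fixed reward sequence. The function name `discounted_return` and the sample rewards are assumptions for this example; the expectation in the goal would be an average of such returns over many sampled trajectories.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite reward list.

    Iterating backwards applies the recursion G_t = r_t + gamma * G_{t+1}.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
print(example)  # 1.75
```

With γ close to 0 the agent is effectively short-sighted, while γ close to 1 weights distant rewards almost as heavily as immediate ones.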
Key Takeaways
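- An MDP is defined by states, actions, transitions, rewards, and a discount factor γ
- Markov property: the next state depends only on the current state and action, not on earlier history
- The goal is a policy that maximizes expected cumulative (discounted) reward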