RL Fundamentals
In reinforcement learning (RL), an agent learns to maximize cumulative reward by interacting with an environment.
Components
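Agent: the learner and decision-maker. Environment: everything the agent interacts with. Action: a choice the agent makes. State: the current situation of the environment. Reward: scalar feedback received after an action.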
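The components above meet in the agent-environment loop. The following is a minimal sketch, not a reference implementation: the corridor environment, its reward of 1.0 at the goal, and the names CorridorEnv, run_episode, random_policy and always_right are all hypothetical and chosen only for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..4, start at 0, goal at 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        # Walls at both ends: the state is clamped to the corridor.
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """Ignore the state and pick left or right uniformly at random."""
    return random.choice([0, 1])


def always_right(state):
    """Always move toward the goal."""
    return 1
```

Calling run_episode(CorridorEnv(), always_right) reaches the goal in four steps. A policy is just a function from state to action here, which keeps the loop independent of how the agent decides.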
Value Functions
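V(s): expected return (cumulative discounted reward) starting from state s. Q(s,a): expected return after taking action a in state s. Bellman optimality equation, for the optimal value function with deterministic transitions: V(s) = max_a [R(s,a) + γV(s')], where s' is the state reached from s by action a and γ is the discount factor (between 0 and 1). With stochastic transitions, V(s') becomes an expectation over s'.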
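Applying the Bellman equation repeatedly as an update rule gives value iteration. The sketch below assumes a hypothetical five-state corridor with deterministic transitions, a reward of 1.0 for entering the last state, and γ = 0.9. None of these specifics come from the notes above.

```python
GAMMA = 0.9        # discount factor (assumed value)
N_STATES = 5       # corridor states 0..4; state 4 is terminal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def transition(s, a):
    """Deterministic model (hypothetical): return (next_state, reward)."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def value_iteration(tol=1e-8):
    """Sweep V(s) = max_a [R(s,a) + gamma * V(s')] until values stop changing."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps V = 0
            best = max(
                r + GAMMA * V[s_next]
                for s_next, r in (transition(s, a) for a in ACTIONS)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

In this setup the values decay by a factor of γ per step away from the goal: V(3) = 1, V(2) = 0.9, V(1) = 0.81, V(0) = 0.729.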
Algorithms
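Q-learning: off-policy, learns the optimal Q-function regardless of the policy used to collect experience. Policy gradient: optimizes the policy parameters directly by gradient ascent on expected return. DQN (deep Q-network): Q-learning with a neural network approximating Q.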
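A tabular Q-learning sketch makes the off-policy property concrete: the agent below acts uniformly at random, yet the greedy policy read from its Q-table is the optimal one. The corridor task, the hyperparameters (α = 0.5, γ = 0.9, 500 episodes) and all function names are assumptions made for illustration.

```python
import random

GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)
N_STATES = 5       # corridor states 0..4; state 4 is terminal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(s, a):
    """Hypothetical corridor dynamics: return (next_state, reward, done)."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def q_learning(episodes=500, seed=0):
    """Learn a Q-table from experience gathered by a random behavior policy."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # behavior policy: uniformly random
            s_next, r, done = step(s, a)
            # The target uses the max over next actions, not the action
            # the behavior policy will take: this is what makes it off-policy.
            target = r if done else r + GAMMA * max(Q[s_next])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s_next
    return Q


def greedy_policy(Q):
    """Best action in each non-terminal state according to Q."""
    return [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

A pure random behavior policy is used here only because the task is tiny. On larger problems an epsilon-greedy policy is the more common choice, since it concentrates experience on promising states.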
Key Takeaways
- RL learns from interaction
- Value functions estimate expected future return
- DQN combines Q-learning with deep learning