Reinforcement Learning Basics

Topic: RL

RL Fundamentals

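In reinforcement learning (RL), an agent learns by trial-and-error interaction with an environment, choosing actions so as to maximize the cumulative reward it receives over time.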

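Agent: the learner and decision-maker.
Environment: everything the agent interacts with; it responds to each action with a new state and a reward.
Action: a choice the agent makes at each step.
State: a description of the current situation.
Reward: a scalar feedback signal indicating how good the last action was.
Policy (π): the agent's rule for choosing an action in each state.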
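The interaction loop these terms describe can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical toy environment (`Corridor`: states 0 to 4, actions -1/+1, reward 1 on reaching the goal state 4); the names `Corridor`, `run_episode`, `always_right` and `random_policy` are made up for this sketch and are not from any library.

```python
import random

class Corridor:
    """Hypothetical toy environment: states 0..n-1, goal at state n-1."""

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); walls clamp the position
        self.state = min(max(self.state + action, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe the state, act, receive a reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward

random_policy = lambda s: random.choice([-1, 1])
always_right = lambda s: 1

print(run_episode(Corridor(), always_right))  # 1.0
```

Here a policy is simply a function from state to action; swapping `always_right` for `random_policy` shows how the same loop behaves under a different policy.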

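Return: the discounted sum of future rewards, G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ..., with discount factor γ between 0 and 1.
V(s): the expected return when starting in state s and following a policy π. Q(s,a): the expected return when taking action a in state s and following π thereafter.
The optimal value function satisfies the Bellman optimality equation:
V*(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V*(s')]
where P(s'|s,a) is the probability of moving to state s' after taking action a in state s. When transitions are deterministic, the sum collapses to the single successor state s', giving V*(s) = max_a [R(s,a) + γ V*(s')].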
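Applying the Bellman optimality equation repeatedly as an update rule gives value iteration. The sketch below assumes a hypothetical deterministic corridor MDP (states 0 to 4, state 4 terminal with value 0, actions -1/+1, reward 1 for the move that enters state 4, γ = 0.9), so the expectation over s' reduces to the single successor state. `value_iteration` is an illustrative name, not a library function.

```python
def value_iteration(n=5, gamma=0.9, tol=1e-10):
    """Value iteration on a hypothetical deterministic corridor MDP.

    States 0..n-1; state n-1 is terminal with V = 0.
    Actions -1/+1; reward 1 for the move that enters the terminal state.
    """
    actions = (-1, 1)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n - 1):  # skip the terminal state
            best = float("-inf")
            for a in actions:
                s_next = min(max(s + a, 0), n - 1)  # deterministic transition
                r = 1.0 if s_next == n - 1 else 0.0
                # Bellman optimality backup: R(s,a) + gamma * V(s')
                best = max(best, r + gamma * V[s_next])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, reused later in the same sweep
        if delta < tol:
            return V

print([round(v, 3) for v in value_iteration()])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```

The values shrink by a factor of γ for each extra step between a state and the goal, which is exactly what the recursive form of the equation predicts.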

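Q-learning is an off-policy method: it learns the optimal action-value function Q* from experience gathered by any sufficiently exploratory policy, using the update Q(s,a) ← Q(s,a) + α [r + γ max_{a'} Q(s',a') − Q(s,a)] with learning rate α. Policy-gradient methods instead parameterize the policy itself and optimize it directly by gradient ascent on the expected return. DQN (Deep Q-Network) combines Q-learning with a deep neural network that approximates Q(s,a), which makes it applicable to state spaces too large for a table.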
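A tabular version of the Q-learning update can be sketched as follows. It assumes the same kind of hypothetical corridor task (states 0 to 4, goal at 4, reward 1 on reaching it) and illustrative hyperparameters (α = 0.5, γ = 0.9, ε = 0.1, 500 episodes); `q_learning` is a made-up name for this sketch. Experience is collected with an ε-greedy behaviour policy, while the update target uses the greedy max over next actions.

```python
import random

def q_learning(n=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor MDP (goal at state n-1)."""
    rng = random.Random(seed)
    actions = (-1, 1)
    Q = {(s, a): 0.0 for s in range(n) for a in actions}

    def greedy(s):
        # Break ties randomly so the untrained agent does not always pick the same action
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    for _ in range(episodes):
        s = 0
        while s != n - 1:
            # Behaviour policy: epsilon-greedy (exploratory)
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s_next = min(max(s + a, 0), n - 1)
            done = s_next == n - 1
            r = 1.0 if done else 0.0
            # Target uses the max over next actions (greedy target policy): off-policy
            target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q

Q = q_learning()
policy = {s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(4)}
print(policy)
```

The greedy policy read off the learned table should move right (+1) in every non-terminal state even though the data came from an ε-greedy policy; that gap between the behaviour policy and the greedy target policy is what "off-policy" refers to. A DQN would replace the dictionary `Q` with a neural network, which this sketch does not attempt.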
