03ReinforcementLearning2.4, Monte-Carlo Methods for Reinforcement Learning

Whereas TD methods exploit the Bellman equation to estimate Q-values or V-values, Monte-Carlo methods estimate values directly by averaging the empirically measured returns. The two approaches are compared.
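
As a rough illustration of this difference, the following Python sketch estimates state values both ways from the same recorded episodes. It is not taken from the lecture: the function names, the toy episodes, the step size `alpha`, and the number of sweeps are all illustrative assumptions, and every-visit averaging is just one possible Monte-Carlo variant.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Return G_t for every step t, computed backwards: G_t = r_t + gamma * G_{t+1}."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def mc_value_estimate(episodes, gamma=1.0):
    """Every-visit Monte-Carlo: V(s) is the mean of the returns observed from s."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        states = [s for s, _ in episode]
        rewards = [r for _, r in episode]
        for s, g in zip(states, discounted_returns(rewards, gamma)):
            totals[s] += g
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


def td0_value_estimate(episodes, gamma=1.0, alpha=0.1, sweeps=1):
    """TD(0): move V(s) a step of size alpha towards the target r + gamma * V(s')."""
    values = defaultdict(float)
    for _ in range(sweeps):
        for episode in episodes:
            for t, (s, r) in enumerate(episode):
                # After the last recorded step the episode terminates, so V(s') = 0.
                if t + 1 < len(episode):
                    next_value = values[episode[t + 1][0]]
                else:
                    next_value = 0.0
                values[s] += alpha * (r + gamma * next_value - values[s])
    return dict(values)


# Hypothetical toy data: each episode is a list of (state, reward) pairs,
# where the reward is the one received on leaving that state.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
    [("B", 1.0)],
]

mc_values = mc_value_estimate(episodes, gamma=1.0)
td_values = td0_value_estimate(episodes, gamma=1.0, alpha=0.1, sweeps=200)
print("Monte-Carlo:", mc_values)
print("TD(0):      ", td_values)
```

With these episodes, the Monte-Carlo estimate of V(A) is simply the mean of the two returns observed from A (1 and 0), whereas the TD(0) estimate of V(A) is pulled towards the current estimate of V(B), because its update target is built from the Bellman equation rather than from the full return.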