04ReinforcementLearning3.1, First steps toward deep reinforcement learning

After a rapid review of reinforcement learning with temporal-difference (TD) methods, which rely on Q-values and V-values estimated by deep networks, we pose the central question of this lecture series: can we learn the policy directly, without a detour via Q-values and V-values?