DeepRL1.3, Actor-Critic Architecture and Advantage-Actor-Critic

The standard actor-critic network (in the narrow sense) combines TD learning of the value function (the critic) with a policy-gradient update for the actor. Because the TD error serves as an estimate of the advantage of the action taken, this combination of TD learning with the actor-critic architecture is also called advantage actor-critic.
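To make the combination concrete, here is a minimal sketch of a single one-step update, assuming a tabular setting with a softmax policy over action preferences. The names `theta`, `V`, `actor_critic_step`, and the step sizes are illustrative assumptions, not taken from the text; a deep RL version would replace the tables with neural networks.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, V, s, a, r, s_next, done,
                      gamma=0.99, alpha_actor=0.1, alpha_critic=0.1):
    """One-step tabular update (hypothetical helper).

    theta[s][b] holds the actor's preference for action b in state s,
    V[s] holds the critic's value estimate. Both are updated in place.
    Returns the TD error.
    """
    # TD error: bootstrapped target minus the current value estimate.
    # It doubles as the advantage estimate for the action taken.
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]

    # Critic: TD(0) update of the state value.
    V[s] += alpha_critic * delta

    # Actor: policy-gradient step. For a softmax policy,
    # d log pi(a|s) / d theta[s][b] = 1[b == a] - pi(b|s).
    pi = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha_actor * delta * grad_log

    return delta
```

A positive TD error raises the preference for the action taken and lowers the others; a negative one does the opposite. The same scalar drives both the critic and the actor updates.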