DeepRL1.2, From Policy Gradient to Deep Reinforcement Learning
From Wulfram Gerstner
views
comments
From Wulfram Gerstner
The REINFORCE algorithm with baseline can be implemented by a neural network with an actor-critic architecture where one set of outputs (actor) decides upon the actions and a further output models the baseline to be subtracted in policy gradient. The second output is sometimes called a critic (in the broad sense of actor-critic).