04ReinforcementLearning3.6, Subtracting the mean reward via the value function

views comments

Policy gradient methods are much more efficient if the mean reward is subtracted. Here it is explained why you do it and how you do it.