04ReinforcementLearning3.6, Subtracting the mean reward via the value function
From Wulfram Gerstner
views
comments
From Wulfram Gerstner
Policy gradient methods are much more efficient if the mean reward is subtracted. Here it is explained why you do it and how you do it.
EPFL video portal by SWITCH | Terms of service | Disclaimer | EPFL Privacy policy |