04ReinforcementLearning3.6, Subtracting the mean reward via the value function

From Wulfram Gerstner  
