04ReinforcementLearning3.5, Multiple time steps
From Wulfram Gerstner
This lecture derives policy gradient methods for problems that extend over multiple time steps.