04ReinforcementLearning3.4B, Example (1-step horizon) revisited
From Wulfram Gerstner
With the log-likelihood trick, the online rule for a binary actor with a 1-step horizon can be derived quickly. We discuss several interpretations of the resulting online rule.
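
As a rough illustration of the kind of rule this refers to, here is a minimal sketch. It is not taken from the lecture: it assumes a sigmoidal policy pi(a=1 | x) = sigmoid(w . x), and the names `online_update` and `train`, the learning rate `eta`, and the toy reward (action 1 pays 1, action 0 pays 0) are all hypothetical choices made for this example. Under that assumption the log-likelihood trick gives d/dw log pi(a | x) = (a - p) x with p = pi(a=1 | x), so one online step is w <- w + eta * r * (a - p) * x.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as the assumed policy nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))


def online_update(w, x, a, r, eta=0.1):
    """One online step for a binary actor with a 1-step horizon.

    Assumption (not from the source): pi(a=1 | x) = sigmoid(w . x).
    Then d/dw log pi(a | x) = (a - p) * x, and the update is
    w <- w + eta * r * (a - p) * x.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + eta * r * (a - p) * xi for wi, xi in zip(w, x)]


def train(n_steps=5000, eta=0.1, seed=0):
    """Run the online rule on a hypothetical toy task with one constant input."""
    rng = random.Random(seed)
    w = [0.0]
    x = [1.0]  # single constant input (toy setup, assumed)
    for _ in range(n_steps):
        p = sigmoid(w[0] * x[0])
        a = 1 if rng.random() < p else 0   # sample the binary action
        r = 1.0 if a == 1 else 0.0         # toy reward: only action 1 pays
        w = online_update(w, x, a, r, eta)
    return w
```

In this sketch the weight change is the product of three factors: the reward r, the difference (a - p) between the action taken and its probability, and the input x. A rewarded action therefore becomes more likely on the next trial, and an unrewarded one (r = 0) leaves the weights unchanged.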