04ReinforcementLearning3.4B, Example (1-step horizon) revisited


With the log-likelihood trick, the online rule for the binary actor with a 1-step horizon can be derived quickly. We discuss several interpretations of the resulting online rule.
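As a rough illustration of the kind of rule meant here, the following is a minimal sketch rather than the lecture's own derivation. It assumes a binary actor that picks action 1 with probability p = sigmoid(w) for a single scalar parameter w, and a toy reward of 1 for action 1 and 0 for action 0. The names `sigmoid`, `online_update`, `run`, and the learning rate `eta` are hypothetical and chosen only for this example. The log-likelihood trick writes the gradient of the expected reward as E[R · d/dw log π(a)], and for this parameterization d/dw log π(a) = a − p, which gives the online update Δw = eta · R · (a − p).

```python
import math
import random


def sigmoid(z):
    """Logistic function; used here as the assumed actor parameterization."""
    return 1.0 / (1.0 + math.exp(-z))


def online_update(w, action, reward, eta):
    """One online step from the log-likelihood trick.

    For pi(a=1) = sigmoid(w), d/dw log pi(a) = action - p,
    so the sampled gradient of the expected reward is reward * (action - p).
    """
    p = sigmoid(w)
    return w + eta * reward * (action - p)


def run(steps=2000, eta=0.1, seed=0):
    """Train on a toy 1-step task (assumed): action 1 pays 1, action 0 pays 0."""
    rng = random.Random(seed)
    w = 0.0  # start with p = 0.5
    for _ in range(steps):
        p = sigmoid(w)
        action = 1 if rng.random() < p else 0      # sample the binary action
        reward = 1.0 if action == 1 else 0.0       # immediate reward, horizon 1
        w = online_update(w, action, reward, eta)  # update after each trial
    return sigmoid(w)  # final probability of choosing action 1
```

Calling `run()` returns the final probability of choosing action 1, which in this toy setting can only move upward from 0.5, since unrewarded trials leave w unchanged. The update is a product of the reward and the term (a − p), so under these assumptions it changes the parameter only on rewarded trials and in the direction that makes the action just taken more likely.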