04ReinforcementLearning3.2, Example: Binary actor with 1-step horizon

views comments

Policy gradient methods are best introduced with a simple example: a single neuron (=actor) with binary output: either action 1 or action 2.