04ReinforcementLearning3.2, Example: Binary actor with 1-step horizon
From Wulfram Gerstner
Policy gradient methods are best introduced with a simple example: a single neuron (the actor) with a binary output, choosing either action 1 or action 2.
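To make the setting concrete, here is a minimal sketch of such a binary actor trained over a 1-step horizon. It is an illustration under stated assumptions, not the formulation used in the lecture: the sigmoid parameterization with a single weight `w`, the Bernoulli reward probabilities in `mean_rewards`, the learning rate `eta`, and the function name `train_binary_actor` are all hypothetical choices made for this example. The update used is the standard log-likelihood (REINFORCE-style) rule without a baseline.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_binary_actor(mean_rewards=(0.2, 0.8), eta=0.1, trials=5000, seed=0):
    """Train a one-parameter binary actor; return its final P(action 1).

    Assumptions (not from the source): P(action 1) = sigmoid(w), and each
    action pays a reward of 1 with the probability given in mean_rewards.
    """
    rng = random.Random(seed)
    w = 0.0  # start with both actions equally likely
    for _ in range(trials):
        p1 = sigmoid(w)
        # Sample the binary output of the actor.
        action = 1 if rng.random() < p1 else 2
        # 1-step horizon: the episode ends after a single reward.
        reward = 1.0 if rng.random() < mean_rewards[action - 1] else 0.0
        # Gradient of log pi(action) with respect to w:
        # (1 - p1) if action 1 was taken, -p1 if action 2 was taken.
        grad_log_pi = (1.0 - p1) if action == 1 else -p1
        # Policy gradient step: reward times gradient of the log-policy.
        w += eta * reward * grad_log_pi
    return sigmoid(w)


if __name__ == "__main__":
    print("P(action 1) after training:", train_binary_actor())
```

With these assumed reward probabilities, action 2 pays off more often, so the expected update pushes `w` downward and the probability of choosing action 1 shrinks over the course of training.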