02ReinforcementLearning1.3, One-step horizon (bandit problems)
From Wulfram Gerstner
In the simplest reinforcement learning scenario, you choose among several actions, always starting from the same state, and receive your reward immediately. Such situations are called bandit problems.
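To make the one-step setting concrete, here is a minimal Python sketch of a bandit problem: a single state, several actions, and an immediate noisy reward after each choice. The function name `run_bandit`, the Gaussian reward model, the parameters `reward_means`, `noise_std`, and `epsilon`, and the epsilon-greedy choice rule are all assumptions made for illustration; the description above does not prescribe any particular learning rule.

```python
import random


def run_bandit(reward_means, n_trials=1000, epsilon=0.1, noise_std=1.0, seed=0):
    """Simulate a one-step-horizon (bandit) problem.

    Hypothetical setup: each action a pays a Gaussian reward with mean
    reward_means[a] and standard deviation noise_std. Actions are chosen
    with an epsilon-greedy rule (an assumed strategy, not from the source).
    """
    rng = random.Random(seed)
    n_actions = len(reward_means)
    q = [0.0] * n_actions      # running estimate of the mean reward per action
    counts = [0] * n_actions   # how often each action has been chosen

    for _ in range(n_trials):
        # Every trial starts from the same single state, so only the action matters.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                      # explore
        else:
            a = max(range(n_actions), key=lambda i: q[i])     # exploit (first max on ties)

        # The reward arrives immediately; there is no next state to consider.
        r = rng.gauss(reward_means[a], noise_std)

        # Incremental sample average of the rewards observed for action a.
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]

    return q, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.0, 0.5, 2.0], n_trials=5000)
    print("estimated mean rewards:", estimates)
    print("times each action was chosen:", pulls)
```

Because the horizon is a single step, the estimate for each action is just the average of the rewards it produced; no discounting or bootstrapping from later states is involved.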