Reinforcement Learning 1.3: One-step horizon (bandit problems)


In the simplest Reinforcement Learning scenario, you choose among several actions, always starting from the same state, and receive your reward immediately, so each decision has no effect on later ones. These situations are called Bandit Problems.
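To make the setting concrete, here is a minimal sketch of a bandit problem in Python. Everything in it is an illustrative assumption rather than something taken from the text: the function name `run_bandit`, the Gaussian reward noise, the example mean rewards, and the epsilon-greedy rule used to pick actions (one common strategy among many). The only part that reflects the definition above is the loop structure: a single fixed state, one action per step, and an immediate reward.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate a one-state bandit and estimate each action's mean reward.

    true_means: hypothetical mean reward of each action (unknown to the learner).
    epsilon:    probability of picking a random action (assumed exploration rule).
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions        # how often each action was chosen
    estimates = [0.0] * n_actions   # running average reward per action

    for _ in range(steps):
        # Same state every step: the only decision is which action to take.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit

        # The reward arrives immediately (assumed Gaussian noise, std 1).
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental update of the sample average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.2, 0.5, 1.0])
    print("estimated mean rewards:", [round(e, 2) for e in estimates])
    print("times each action was chosen:", counts)
```

Because there is only one state and the reward is immediate, the learner needs nothing more than a running average per action. With the assumed means above, the third action should end up being chosen most often.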