DeepRL2.3C, MuZero

MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function.
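To make "applied iteratively" concrete, here is a minimal sketch of unrolling such a model along a sequence of actions. It is a toy under stated assumptions, not MuZero's actual networks: the class `ToyMuZeroModel`, its random linear weights, and the helper `unroll` are all hypothetical names chosen for illustration. The only structure taken from MuZero is the split into a representation step (observation to hidden state), a dynamics step (hidden state and action to next hidden state and reward), and a prediction step (hidden state to policy and value).

```python
import math
import random


def make_linear(rng, n_in, n_out):
    """Random weight matrix with n_out rows and n_in columns (toy stand-in for a network)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMuZeroModel:
    """Hypothetical, untrained model with MuZero's three-function structure."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.num_actions = num_actions
        self.w_repr = make_linear(rng, obs_dim, hidden_dim)
        self.w_dyn = make_linear(rng, hidden_dim + num_actions, hidden_dim)
        self.w_reward = make_linear(rng, hidden_dim + num_actions, 1)
        self.w_policy = make_linear(rng, hidden_dim, num_actions)
        self.w_value = make_linear(rng, hidden_dim, 1)

    def represent(self, observation):
        """Representation step: observation -> hidden state."""
        return [math.tanh(v) for v in apply_linear(self.w_repr, observation)]

    def dynamics(self, state, action):
        """Dynamics step: (hidden state, action) -> (next hidden state, reward)."""
        one_hot = [1.0 if i == action else 0.0 for i in range(self.num_actions)]
        x = state + one_hot
        next_state = [math.tanh(v) for v in apply_linear(self.w_dyn, x)]
        reward = apply_linear(self.w_reward, x)[0]
        return next_state, reward

    def predict(self, state):
        """Prediction step: hidden state -> (policy, value)."""
        policy = softmax(apply_linear(self.w_policy, state))
        value = apply_linear(self.w_value, state)[0]
        return policy, value


def unroll(model, observation, actions):
    """Apply the model iteratively along a fixed action sequence.

    Returns one (reward, policy, value) tuple per step; the reward at the
    root (step 0) is None because no action has been taken yet.
    """
    state = model.represent(observation)
    policy, value = model.predict(state)
    steps = [(None, policy, value)]
    for action in actions:
        state, reward = model.dynamics(state, action)
        policy, value = model.predict(state)
        steps.append((reward, policy, value))
    return steps


model = ToyMuZeroModel(obs_dim=4, hidden_dim=8, num_actions=3)
trajectory = unroll(model, observation=[0.1, -0.2, 0.3, 0.0], actions=[0, 2, 1])
```

Note that the observation is used only once, at the root. Every later prediction comes from the hidden state alone, which is why the model never needs to reconstruct observations. In a planner, the action sequence would come from a tree search rather than being fixed in advance as it is here.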