DeepRL2.3C, MuZero
From Wulfram Gerstner
views
comments
From Wulfram Gerstner
MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning, in particular the reward and the next action.
EPFL video portal by SWITCH | Terms of service | Disclaimer | EPFL Privacy policy |