DeepRL1.4B, How do eligibility traces arise in policy gradient algorithms?
From Wulfram Gerstner
Optimizing the return by policy gradient in a multi-step environment naturally leads to eligibility traces. A few of the important mathematical steps are sketched here.
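As a rough numerical illustration of that claim (not taken from the video, and possibly differing from its notation), the sketch below assumes a discounted return with factor gamma and a score-function (REINFORCE-style) gradient estimate. The array `grads` is a random placeholder for the per-step gradients of log pi(a_t | s_t); the function names are hypothetical. Under these assumptions, weighting each step's gradient by the discounted return that follows it gives the same result as accumulating a decaying eligibility trace and multiplying it by each incoming reward.

```python
import numpy as np

def forward_view(grads, rewards, gamma):
    """Sum over steps k of grad_k times the discounted return from step k onward."""
    T = len(rewards)
    total = np.zeros_like(grads[0])
    for k in range(T):
        G_k = sum(gamma ** (t - k) * rewards[t] for t in range(k, T))
        total += grads[k] * G_k
    return total

def backward_view(grads, rewards, gamma):
    """Same quantity, accumulated online with a decaying eligibility trace."""
    trace = np.zeros_like(grads[0])
    total = np.zeros_like(grads[0])
    for g, r in zip(grads, rewards):
        trace = gamma * trace + g  # memory of past actions, decayed by gamma
        total += r * trace         # each reward credits all earlier actions
    return total

# Placeholder data (assumption): 5 time steps, 3 policy parameters.
rng = np.random.default_rng(0)
grads = rng.normal(size=(5, 3))    # stand-in for grad log pi(a_t | s_t)
rewards = rng.normal(size=5)

same = np.allclose(forward_view(grads, rewards, 0.9),
                   backward_view(grads, rewards, 0.9))
print("forward and backward views agree:", same)
```

The two functions only reorder the same double sum over action time and reward time, which is why the trace appears without any additional approximation in this setting.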