Students: Federico Rocca, Florian Tanguy
Supervision: Tingting Ni, Kai Ren
Learned a control policy for a wheeled robot to navigate a maze while avoiding obstacles. Formulated the task as a constrained Markov decision process and trained the policy with a Lagrangian-PPO algorithm in a customized simulator (shown in the video) calibrated through real-robot system identification. Transferred the policy to JetBots using state observations from an OptiTrack motion-capture system.
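
For readers unfamiliar with the Lagrangian approach to constrained MDPs, the sketch below illustrates the general idea and is not the project's actual code. A constraint of the form "expected episode cost <= budget" is relaxed with a non-negative multiplier lambda; lambda grows when the measured cost exceeds the budget and shrinks otherwise, and the PPO surrogate is fed a combined advantage that trades reward against cost. The class name, the function name, the step size, the cost budget, and the `1 + lambda` rescaling are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LagrangeMultiplier:
    """Dual variable for a single constraint E[episode cost] <= cost_limit.

    All fields are hypothetical values chosen for illustration.
    """
    cost_limit: float   # assumed safety budget (e.g. allowed obstacle contacts)
    lr: float = 0.05    # assumed dual step size
    value: float = 0.0  # lambda, kept non-negative

    def update(self, mean_episode_cost: float) -> float:
        # Projected gradient ascent on lambda: increase it when the
        # constraint is violated, decrease it (down to zero) when satisfied.
        violation = mean_episode_cost - self.cost_limit
        self.value = max(0.0, self.value + self.lr * violation)
        return self.value


def penalized_advantages(
    reward_adv: Sequence[float],
    cost_adv: Sequence[float],
    lam: float,
) -> List[float]:
    """Combine reward and cost advantages into one signal for the PPO surrogate.

    Dividing by (1 + lam) is a common rescaling that keeps the magnitude
    bounded as lambda grows; it is an assumption, not taken from the project.
    """
    return [(a_r - lam * a_c) / (1.0 + lam) for a_r, a_c in zip(reward_adv, cost_adv)]
```

In a training loop, `update` would typically be called once per batch of rollouts with the average episode cost, and the resulting lambda passed to `penalized_advantages` before the usual clipped PPO policy step.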