Fig. S1. The Actor-Critic architecture used to learn optimal control policies for the microrobot.