The local neighborhood sensory input is obtained by first constructing a cube of width W = 11 centered on the microrobot and aligned with its orientation, and then extracting a W × W × W binary 3D image with a pixel resolution of U = 2a (1 when a voxel overlaps with an RBC and 0 otherwise). Target positions are represented in the local coordinate system of the microrobot. Each RBC is modeled by a biconcave shape [12] with random position and orientation; the RBCs generated in the simulation have diameters randomly sampled between 6 μm and 8 μm. We employed an approximate RBC shape to enable fast computation of the sensory input and the collision dynamics.
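A minimal sketch of how such a local binary image could be extracted is shown below. It assumes that voxel centers are spaced by U along the microrobot's local axes, that the orientation is given as a rotation matrix whose columns are those axes expressed in world coordinates, and that a voxel is set to 1 when its center lies inside an obstacle (a simplification of the overlap test). The function names `local_occupancy` and `spheres_inside` are hypothetical, and spheres stand in for the biconcave RBC model [12] only to keep the example short.

```python
import numpy as np

def local_occupancy(pos, R, inside_fn, W=11, U=2.0):
    """Sketch: W x W x W binary occupancy image centered on the microrobot.

    pos: (3,) microrobot position in world coordinates (units of a).
    R: (3, 3) rotation matrix whose columns are the microrobot's local axes
       expressed in world coordinates.
    inside_fn: callable mapping an (N, 3) array of world points to an (N,) bool array.
    """
    # Voxel-center offsets along one local axis, symmetric about the robot.
    offs = (np.arange(W) - (W - 1) / 2.0) * U
    gx, gy, gz = np.meshgrid(offs, offs, offs, indexing="ij")
    local = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # (W^3, 3) local coords
    world = pos + local @ R.T                               # rotate to world frame, then translate
    # Approximation: a voxel is 1 if its center lies inside an obstacle.
    return inside_fn(world).reshape(W, W, W).astype(np.uint8)

def spheres_inside(centers, radii):
    """Hypothetical stand-in obstacle test: spheres instead of biconcave RBCs."""
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)

    def inside(points):
        # Pairwise distances between query points (N) and sphere centers (M).
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        return (d <= radii[None, :]).any(axis=1)

    return inside
```

For example, `local_occupancy(np.zeros(3), np.eye(3), spheres_inside([[6.0, 0.0, 0.0]], [3.0]))` returns an 11 × 11 × 11 array of 0s and 1s of the kind fed to the 3D convolutional layers; replacing the sphere test with a biconcave inside-test would only change `inside_fn`.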
Both the Actor and Critic neural networks employ 3D convolutional layers to process the local sensory information, represented by a W × W × W binary 3D image (W = 11) with a pixel resolution of U = 2a.
The sensor for the local neighborhood was designed with the following considerations. A large field of view allows the microrobot to detect obstacles early and take paths that avoid collisions with them. A large field of view also captures rich configurations that enable the learning of better and more robust navigation strategies. However, an excessively large field of view contains information that is not essential for local path planning, and it increases both the computational cost of learning and the difficulty of sensor hardware design. As for sensor resolution, a high resolution increases the computational cost, whereas a low resolution may prevent the microrobot from detecting small trapping features of an RBC.
The central axis of the curved vessel is characterized by a 3D parametric curve (xc, yc, zc), where L controls the length of the vessel (e.g., L = 500a). The cross-sectional radius Rc of the vessel varies along the axis line, with Ravg controlling its average radius. We considered two cases, shown in Fig. 4 (H) and (I) in the main text, both with k1 = 0.05 a⁻¹, k2 = 0.02 a⁻¹, R0 = 10a, Ravg = 25a, and L = 500a, but with R1 = 5a for (H) and R1 = 15a for (I).
By exploiting the symmetry of the system, we can reuse the control policy π obtained at one set of hyperparameters (v*SP, ω*max) for another hyperparameter setting (vSP, ωmax) and thus avoid the retraining cost. We write the control policy π(rt, r, p, f(s); vSP, ωmax) as a function of the observational variables rt, r, p, and f(s), which characterize the system state, and of the hyperparameters vSP and ωmax, which specify the physical parameters of the microrobot.
We now discuss two scenarios in which we can reuse a control policy π obtained at one set of hyperparameters (v*SP, ω*max). Since the microrobot is constantly propelling and the rotation decision ω ultimately affects the trajectory's minimum radius of curvature Rvw = vSP/ωmax, without loss of generality we consider the policy mapping when Rvw ≠ R*vw, where R*vw = v*SP/ω*max. Because the rotational decision ω aims to proactively adjust the direction of motion, a natural mimicking strategy is for a microrobot with hyperparameter Rvw to follow the trajectory of the baseline microrobot with hyperparameter R*vw as closely as feasible, until its rotation reaches the limit ωmax. The policy mapping for the magnitudes of (ω1, ω2) under hyperparameter Rvw can be expressed as
where ωi(R*vw) is the magnitude of the in-plane rotation (i = 1) or the out-of-plane rotation (i = 2) from the control policy learned under hyperparameter R*vw.
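The mimicking strategy can be illustrated with a short sketch. It is one possible reading of the strategy, not necessarily the exact mapping formula: we assume that "following the baseline trajectory" means keeping the curvature ωi/vSP of the baseline path and then clipping each rotation component at the new limit ωmax. The function name `map_rotation` and its arguments are hypothetical.

```python
import numpy as np

def map_rotation(omega_base, v_base, v_new, omega_max_new):
    """Hypothetical sketch of a trajectory-mimicking rotation mapping.

    omega_base: (omega_1, omega_2) rotation magnitudes from the baseline policy,
                which was trained with propulsion speed v_base.
    Assumption: mimicking the baseline trajectory means matching its curvature
    omega / v, then saturating each component at the new rotation limit.
    """
    omega_base = np.asarray(omega_base, dtype=float)
    curvature = omega_base / v_base          # curvature of the baseline path
    omega_new = curvature * v_new            # rotation needed to trace the same path
    return np.minimum(omega_new, omega_max_new)  # cannot exceed omega_max
```

With equal propulsion speeds, this sketch leaves the rotational decisions unchanged whenever Rvw < R*vw and only saturates them when Rvw > R*vw. Clipping ω1 and ω2 component-wise, rather than rescaling them jointly, is a further assumption of the sketch.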
Now consider an ambient flow field (or any other external force causing a constant drift of the microrobot) with velocity vf, which modifies the velocity of the microrobot. The deterministic velocity of the microrobot is now the sum of the original propulsion velocity vSP p and the drift velocity vf. Equivalently, we can define a modified propulsion direction pf (a unit vector) and the corresponding modified self-propulsion speed vSP,f via

$$ v_{SP,f}\,\mathbf{p}_f = v_{SP}\,\mathbf{p} + \mathbf{v}_f, \qquad v_{SP,f} = \lVert v_{SP}\,\mathbf{p} + \mathbf{v}_f \rVert. $$
We can therefore treat a microrobot in an external flow field as a microrobot without an external flow field but with the modified hyperparameters (vSP,f, ωmax). Accordingly, the control policy in the presence of the flow field is given by π(rt, r, p, f(s); vSP,f, ωmax), and the policy mapping formula can be employed to reuse the policy.
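The change of hyperparameters under a constant drift follows directly from the definition of pf and vSP,f. The sketch below, with a hypothetical function name, returns the modified speed and direction together with the resulting hyperparameter Rvw = vSP,f/ωmax that a reused policy would see. It assumes that p and vf are given in a common frame and that the drift does not exactly cancel the propulsion.

```python
import numpy as np

def flow_modified_params(v_sp, p, v_f, omega_max):
    """Effective self-propulsion speed and direction under a constant drift v_f.

    Returns (v_sp_f, p_f, R_vw_f), where v_sp_f * p_f = v_sp * p + v_f,
    |p_f| = 1, and R_vw_f = v_sp_f / omega_max.
    """
    p = np.asarray(p, dtype=float)
    p = p / np.linalg.norm(p)                  # ensure a unit propulsion direction
    v_total = v_sp * p + np.asarray(v_f, dtype=float)
    v_sp_f = np.linalg.norm(v_total)
    if v_sp_f == 0.0:
        raise ValueError("drift exactly cancels propulsion; p_f is undefined")
    return v_sp_f, v_total / v_sp_f, v_sp_f / omega_max
```

Because vSP,f depends on the current heading p, in this sketch the effective hyperparameter would be recomputed at every decision step rather than fixed once per episode.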
We estimate the shortest-path distance between two arbitrary points in the blood environment [Fig. 6 in the main text] using an approximate graph algorithm. We first create a set of 3D lattice points, with a step size of a in the x, y, and z directions, spanning the space of the test environments. We then remove lattice points that are outside the vessel or overlap with RBCs (we assume each lattice point has a radius of a, the same size as the microrobot). Next, we construct a weighted K-nearest-neighbor graph (K = 26), in which each lattice point is a graph node, two nodes are connected by an edge if one is among the 26 nearest neighbors of the other, and the edge weight is the distance between the connected nodes. Given a start point and a target, we associate each with its nearest lattice point in the graph and then use Dijkstra's algorithm to compute the shortest distance between them in the graph. The computed graph distance serves as an approximation of the shortest-path distance between the query points.
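A compact sketch of this lattice-graph estimate, using Python's `heapq` for Dijkstra's algorithm, is given below. Assumptions: the mask of kept lattice points (`free`) is precomputed, lattice point (i, j, k) sits at (ia, ja, ka), and edges come from the 3 × 3 × 3 lattice neighborhood. For an interior node this neighborhood is exactly its 26 nearest neighbors (6 at distance a, 12 at √2 a, 8 at √3 a); near removed points, a true K-nearest-neighbor graph over the remaining nodes could link farther points, which this simplification ignores. The function name is hypothetical.

```python
import heapq
import itertools
import math
import numpy as np

def lattice_shortest_distance(free, start, target, a=1.0):
    """Approximate shortest-path distance on a 3D lattice graph (Dijkstra).

    free: bool array (nx, ny, nz); True where a lattice point is kept
          (inside the vessel and not overlapping an RBC).
    start, target: world coordinates, each snapped to the nearest free lattice point.
    Returns math.inf if the target is unreachable.
    """
    free_idx = np.argwhere(free)  # (N, 3) indices of kept lattice points

    def snap(pt):
        d = np.linalg.norm(free_idx * a - np.asarray(pt, dtype=float), axis=1)
        return tuple(int(v) for v in free_idx[np.argmin(d)])

    src, dst = snap(start), snap(target)
    # 26-neighborhood: all offsets in {-1, 0, 1}^3 except (0, 0, 0).
    steps = [s for s in itertools.product((-1, 0, 1), repeat=3) if s != (0, 0, 0)]
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry (lazy deletion)
        for s in steps:
            nb = (node[0] + s[0], node[1] + s[1], node[2] + s[2])
            if not all(0 <= nb[i] < free.shape[i] for i in range(3)) or not free[nb]:
                continue
            # Edge weight: Euclidean distance between the two lattice points.
            nd = d + a * math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2)
            if nd < dist.get(nb, math.inf):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return math.inf
```

The brute-force snapping step costs O(N) per query; for large lattices a spatial index such as a k-d tree would be a natural replacement.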
In Fig. S4, we tested the policy mapping formula [Eq. (S3)] under different hyperparameter settings. The neural network is trained under one hyperparameter setting (R*vw = 1), and the mapped policies remain effective under the other hyperparameters Rvw = 2 and Rvw = 4. Note that here we only need to consider the case Rvw > R*vw, since Eq. (S3) implies that the rotational decisions remain unchanged when Rvw < R*vw.
In Fig. S5, we tested the policy mapping formula [Eqs. (S3) and (S4)] under different hyperparameter settings. The neural network is trained under one hyperparameter setting (Rvw = 1) without an external flow field, and the mapped policies remain effective under external flow fields vf = 0.5 vSP and vf = 0.8 vSP.
In Fig. S6, we tested the policy mapping formula [Eqs. (S3) and (S4)] under an external flow field while a microrobot navigates inside a blood vessel with RBCs. When the external flow field is small (vf ≤ 0.5 vSP) and the RBCs are dilute (e.g., 5%), the mapping formula enables the microrobot to reach targets without frequently getting trapped. When the external flow field is large, microrobots can easily get trapped. This is because the presence of RBCs breaks the spatial symmetry, so the correct policy in a crowded RBC environment lies beyond the simple formulas in Eqs. (S3) and (S4).