Initialize replay memory M to capacity N_M
Initialize actor network μ with random weights θ^μ and critic network Q with random weights θ^Q
Initialize target actor network μ′ and target critic network Q′ with weights θ^μ′ and θ^Q′
For episode = 1, N_E do
    Initialize particle state s_0 and target position
    Obtain initial observation φ(s_1)
    For n = 1, maxStep do
        Select an action a_n from the actor network plus an exploration perturbation sampled from an OU process
        Execute action a_n in the simulation and observe the new state s_{n+1} and reward r(s_{n+1})
        Generate observation φ(s_{n+1}) at state s_{n+1}
        Store transition (φ(s_n), a_n, r(s_{n+1}), φ(s_{n+1})) in M
        Store extra hindsight experiences in M every H steps
        Sample a random mini-batch of B transitions (φ(s_j), a_j, r(s_{j+1}), φ(s_{j+1})) from M
        Set target value |
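The loop above can be sketched in code. This is a minimal, self-contained illustration, not the paper's implementation: the `ParticleEnv` environment, the linear stand-ins for the actor/critic networks, and all hyperparameter values are placeholder assumptions, and the hindsight relabeling simply reuses the achieved position as the goal. Network updates are omitted; the sketch stops where the excerpt does, at computing the target values for a sampled mini-batch.

```python
import random
from collections import deque
import numpy as np

# Hypothetical 1-D particle environment (assumption, not from the paper):
# state = [particle position, goal position], reward = -distance to goal.
class ParticleEnv:
    def reset(self):
        self.pos = 0.0
        self.goal = np.random.uniform(-1.0, 1.0)
        return np.array([self.pos, self.goal])

    def step(self, a):
        self.pos += float(np.clip(a, -0.1, 0.1))
        obs = np.array([self.pos, self.goal])
        return obs, -abs(self.pos - self.goal)

# Ornstein-Uhlenbeck process for exploration perturbation.
class OUNoise:
    def __init__(self, theta=0.15, sigma=0.2):
        self.theta, self.sigma, self.x = theta, sigma, 0.0

    def sample(self):
        self.x += -self.theta * self.x + self.sigma * np.random.randn()
        return self.x

# Linear stand-ins for μ, Q and their targets μ', Q' (real work uses deep nets).
actor_w = np.random.randn(2) * 0.1
critic_w = np.random.randn(3) * 0.1
target_actor_w = actor_w.copy()    # θ^μ' initialized from θ^μ
target_critic_w = critic_w.copy()  # θ^Q' initialized from θ^Q

def actor(obs, w):
    return float(np.tanh(obs @ w))

def critic(obs, a, w):
    return float(np.concatenate([obs, [a]]) @ w)

M = deque(maxlen=10_000)           # replay memory, capacity N_M
gamma, B, H, max_step = 0.99, 32, 4, 20

env, noise = ParticleEnv(), OUNoise()
for episode in range(5):           # N_E episodes (placeholder value)
    obs = env.reset()
    traj = []
    for n in range(max_step):
        a = actor(obs, actor_w) + noise.sample()  # action + OU perturbation
        next_obs, r = env.step(a)
        M.append((obs, a, r, next_obs))
        traj.append((obs, a, next_obs))
        # Every H steps, store hindsight transitions: relabel the goal
        # as the position actually reached (assumed HER variant).
        if (n + 1) % H == 0:
            for o, act, no in traj[-H:]:
                achieved = no[0]
                h_obs = np.array([o[0], achieved])
                h_next = np.array([no[0], achieved])
                M.append((h_obs, act, -abs(no[0] - achieved), h_next))
        if len(M) >= B:
            batch = random.sample(list(M), B)
            # Target value: y_j = r(s_{j+1}) + γ Q'(φ(s_{j+1}), μ'(φ(s_{j+1})))
            ys = [r_j + gamma * critic(no_j, actor(no_j, target_actor_w),
                                       target_critic_w)
                  for (_, _, r_j, no_j) in batch]
        obs = next_obs
```

The target values `ys` would then drive a critic regression step and an actor policy-gradient step, followed by a soft update of the target weights, none of which appear in this excerpt.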