Initialize replay memory M to capacity N_M
Initialize actor network μ with random weights θ^μ and critic network Q with random weights θ^Q
Initialize target actor network μ′ and target critic network Q′ with weights θ^μ′ and θ^Q′
For episode = 1, N_E do
    Initialize particle state s_0 and target position
    Obtain initial observation f(s_1)
    For n = 1, maxStep do
        Select an action a_n from the actor network plus a perturbation sampled from an OU process
        Execute action a_n in the simulation and observe the new state s_{n+1} and reward r(s_{n+1})
        Generate observation f(s_{n+1}) at state s_{n+1}
        Store transition (f(s_n), a_n, r(s_{n+1}), f(s_{n+1})) in M
        Store extra hindsight experiences in M every H steps
        Sample a random mini-batch of B transitions (f(s_j), a_j, r(s_{j+1}), f(s_{j+1})) from M
        Set target value
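Three of the building blocks in the listing above can be sketched in plain Python: the Ornstein-Uhlenbeck (OU) process that perturbs the actor's action, the replay memory M with uniform mini-batch sampling, and the periodic hindsight storage. This is a minimal sketch, not the paper's implementation: the OU parameters (theta, sigma, dt) are common defaults, and the `observe`/`reward` helpers are hypothetical stand-ins for the goal-conditioned forms of f(s) and r(s) used in hindsight experience replay.

```python
import math
import random
from collections import deque

class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration noise.
    theta/sigma/dt are illustrative defaults, not values from the paper."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.dim, self.mu = dim, mu
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = random.Random(seed)
        self.x = [mu] * dim

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1), per dimension
        self.x = [x + self.theta * (self.mu - x) * self.dt
                    + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
                  for x in self.x]
        return list(self.x)

class ReplayMemory:
    """Fixed-capacity transition buffer M with uniform mini-batch sampling."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def store(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

def store_hindsight(memory, recent, observe, reward):
    """Hindsight relabeling: replay the recent transitions as if the last
    state reached had been the intended goal. `observe(state, goal)` and
    `reward(state, goal)` are hypothetical goal-conditioned helpers."""
    goal = recent[-1][3]  # final next_state of the recent window
    for (s, a, _, s2) in recent:
        memory.store((observe(s, goal), a, reward(s2, goal), observe(s2, goal)))

# Example: perturb a 2-D action with OU noise before executing it
noise = OUNoise(dim=2, seed=1)
perturbed = [a + n for a, n in zip([0.5, -0.3], noise.sample())]
```

In the loop above, the actor's output plus `noise.sample()` gives a_n, each step's transition goes through `memory.store`, `store_hindsight` runs every H steps, and `memory.sample(B)` draws the mini-batch for the target-value update.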