Imperfect models are often used for forecasting and state estimation of complex dynamical systems, typically by mapping a reference initial state into model phase space, making a forecast, and then mapping back to the reference space. In many cases these mappings are implicit, and forecast errors therefore reflect a combination of model errors and mapping errors. Techniques for inferring parameterizations and parameters that reduce model bias have received intense scrutiny; however, a general framework for discovering optimal mappings between system and model attractors is lacking. Here we propose a novel machine learning paradigm for inferring cross-attractor transformations (CATs) that minimize forecast error. CATs are pairs of transformations between the phase space of a reference system and that of a model, serving as a bridge between the attractors of the true system and an imperfect model. A computationally efficient analog approximation to tangent linear and adjoint models is developed to enable stochastic gradient descent training of CAT parameters. Neural networks constructed with a custom analog-adjoint layer permit the specification of affine transformations as well as more general nonlinear transformations.
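To make the CAT idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: the "true" dynamics, the "model" dynamics, the coordinate distortion, and the affine CAT parameters (`F`, `M`, `T`, `A`, `b`, `C`, `d`) are all invented here, the model is taken to be the true dynamics expressed in distorted coordinates, and plain full-batch gradient descent stands in for the analog-adjoint machinery and neural-network layers described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "true" dynamics: a rotation of the reference phase space.
theta = 0.3
F = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Toy imperfect "model": the same rotation expressed in distorted
# coordinates y = T x, so its dynamics are M = T F T^{-1}.
T = np.array([[1.5, 0.2],
              [0.0, 0.7]])
M = T @ F @ np.linalg.inv(T)

# Affine CAT pair: g(x) = A x + b (reference -> model) and
# h(y) = C y + d (model -> reference), initialized to the identity.
A, b = np.eye(2), np.zeros(2)
C, d = np.eye(2), np.zeros(2)

X = rng.normal(size=(256, 2))   # sampled reference states (rows)
Xn = X @ F.T                    # their true one-step forecasts
lr, n = 2e-3, len(X)

for step in range(20000):
    Y = X @ A.T + b             # g: map into model space
    Yn = Y @ M.T                # one model forecast step
    P = Yn @ C.T + d            # h: map back to reference space
    R = P - Xn                  # forecast residuals
    G = R @ C @ M               # residuals pulled back through h and M
    # Gradient descent on the mean squared forecast error:
    C -= lr * 2 * R.T @ Yn / n
    d -= lr * 2 * R.mean(axis=0)
    A -= lr * 2 * G.T @ X / n
    b -= lr * 2 * G.mean(axis=0)

# After training, the round trip h(M(g(x))) should reproduce the true
# one-step map, i.e. C M A should be close to F.
mismatch = np.linalg.norm(C @ M @ A - F)
print(mismatch)  # near zero, versus ||M - F|| of about 0.39
```

In this fully linear toy the optimal CAT is known in closed form (`A = T`, `C = T^{-1}` up to a shared invertible factor), so the experiment only checks that forecast-error minimization recovers an equivalent pair; the paper's setting replaces the known linear model step with an analog approximation and its adjoint.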