Despite advances in computational science, nonlinear geophysical processes still present important modeling challenges. Physical sensors (such as satellites, AUVs, or buoys) can collect data at specific points or regions, but such data are often scarce or inaccurate. Here, we present a method to build improved spatio-temporal models that combine dynamics inferred from high-fidelity numerical models (reanalysis data) with data from sensors. We are motivated by a data set of ocean temperature where sensor measurements are only available at the surface of the ocean. We first employ reanalysis data in the form of a 3D temperature field, and apply standard principal component analysis (PCA) at every ocean surface coordinate. For each coordinate, the vertical structure of the field can be represented with just two PCA modes and their corresponding time coefficients, significantly reducing the dimensionality of the data. Next, a conditionally Gaussian model, implemented through a temporal convolutional neural network, is built to predict the time coefficients of the PCA modes (i.e., the vertical structure), as well as their variance, as a function of the surface temperature. These probabilistic predictions are made with the satellite data as input, and they are used with the PCA modes to stochastically reconstruct the full temperature field. The estimated temperature field is then combined with data from buoys through a multi-fidelity Gaussian process regression scheme, where the buoys have the highest fidelity and the satellite-based predictions have lower fidelity. The techniques described provide a framework for building less expensive and more accurate conditionally Gaussian models of full 3D fields, and they can be applied to geophysical systems where data from both sensors and numerical simulations are available.
We apply these techniques to estimate the full 3D temperature field of Massachusetts and Cape Cod Bays, where temperature can serve as a useful indicator for ocean acidification. Finally, we discuss how the developed ideas can be leveraged to make more informed decisions about optimal in-situ sampling and path planning.
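The vertical-structure reduction and stochastic reconstruction described above can be illustrated with a minimal sketch for a single surface coordinate. Everything here is an assumption made for illustration: the profiles are synthetic rather than reanalysis data, the function names `vertical_pca` and `reconstruct` are hypothetical, the predicted coefficient means and variances are placeholders standing in for the output of the temporal convolutional network, and the coefficients are treated as independent Gaussians when propagating variance.

```python
import numpy as np

def vertical_pca(profiles, n_modes=2):
    """PCA of vertical profiles at one surface coordinate.

    profiles: array of shape (n_times, n_depths).
    Returns the time-mean profile, the leading vertical modes,
    their time coefficients, and the fraction of variance retained.
    """
    mean = profiles.mean(axis=0)
    centered = profiles - mean
    # Rows of vt are orthonormal vertical modes, ordered by variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]                 # (n_modes, n_depths)
    coeffs = centered @ modes.T          # (n_times, n_modes)
    explained = (s[:n_modes] ** 2).sum() / (s ** 2).sum()
    return mean, modes, coeffs, explained

def reconstruct(mean, modes, coeff_mean, coeff_var):
    """Mean and variance of a profile from Gaussian mode coefficients.

    Assumes the coefficients are independent, so the variance at each
    depth is the sum of coefficient variances weighted by squared modes.
    """
    prof_mean = mean + coeff_mean @ modes
    prof_var = coeff_var @ modes ** 2
    return prof_mean, prof_var

# Synthetic stand-in for reanalysis profiles (not real ocean data).
rng = np.random.default_rng(0)
n_times, n_depths = 200, 30
z = np.linspace(0.0, 1.0, n_depths)
a = rng.normal(0.0, 2.0, n_times)
b = rng.normal(0.0, 1.0, n_times)
profiles = (10.0
            + a[:, None] * np.exp(-3.0 * z)
            + b[:, None] * np.sin(np.pi * z)
            + 0.01 * rng.normal(size=(n_times, n_depths)))

mean, modes, coeffs, explained = vertical_pca(profiles, n_modes=2)

# Placeholder "predictions": in the method above these would come from
# the network driven by surface temperature; here we reuse the first
# time step's coefficients and an arbitrary variance.
coeff_mean = coeffs[0]
coeff_var = np.array([0.25, 0.04])
prof_mean, prof_var = reconstruct(mean, modes, coeff_mean, coeff_var)

# One stochastic realization of the profile.
sample = mean + rng.normal(coeff_mean, np.sqrt(coeff_var)) @ modes
```

Repeating this per surface coordinate gives a reduced representation of the 3D field with two coefficients per coordinate and time step; the depth-wise variance `prof_var` is what a downstream multi-fidelity regression could treat as the uncertainty of the lower-fidelity estimate.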