Christopher O'Dell

The Orbiting Carbon Observatory-2 (OCO-2), launched in 2014, uses reflected solar spectra and other retrieved geophysical variables to estimate (“retrieve”) the column-averaged dry-air mole fraction of CO2, termed XCO2. A critical issue in satellite estimates of trace greenhouse gases, and in remote sensing generally, is the error distribution of the estimated target variable, which arises from instrument artifacts as well as from the under-determined nature of the retrieval. A large portion of the error is often incurred during the inference of physical variables from the measurements. These residual errors are typically corrected using ground-truth observations of the target variable or some other truth proxy. Previous studies used multilinear regression to model the error distribution with a few covariates from the retrieved state vector, sometimes termed “features.” This presentation will cover the bias correction of XCO2 error attributed to retrieved covariates with a novel approach that applies explainable machine learning (XAI) methods to simulated sounding retrievals from GeoCarb. Non-linear models (Zhou and Grassotti 2020) and models that capture non-linearity implicitly (Lorente et al. 2021) have been shown to improve on the linear methods in operational use. Our approach uses XGBoost, a gradient-boosted decision tree ensemble method that captures non-linear relations between the input features and the target variable. XGBoost also incorporates regularization to limit overfitting while remaining resilient to noise and large outliers, a feature missing from other ensemble decision-tree methods. Decision-tree-based models provide inherent feature importance measures, which makes them highly interpretable. We complement these with model-agnostic XAI methods in the post-training analysis; such methods allow rigorous insight into the causes of a model’s decisions (Gilpin et al. 2018).
By applying these techniques, we will demonstrate that our approach reduces residual errors relative to the operational method and yields an uncertainty estimate for bias-corrected XCO2, which is currently not treated separately from the posterior uncertainty estimate derived from the retrieval algorithm.
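
The idea of learning a non-linear bias correction from retrieved covariates can be illustrated with a minimal sketch. It is not the authors' implementation and does not use XGBoost: it is a toy gradient-boosting loop over depth-1 decision trees (stumps), written in plain Python, with no regularization. The data are synthetic, and the two features, the form of the bias, the noise level, and all function names are assumptions made only for illustration.

```python
import math
import random

random.seed(0)

def true_bias(x):
    # Hypothetical non-linear bias in ppm: a step in x[0] plus a parabola in x[1].
    return 1.5 * (x[0] > 0.5) + 4.0 * (x[1] - 0.5) ** 2

def make_soundings(n):
    # Synthetic covariates and XCO2 errors (retrieved minus truth), in ppm.
    xs = [[random.random(), random.random()] for _ in range(n)]
    errs = [true_bias(x) + random.gauss(0.0, 0.3) for x in xs]
    return xs, errs

def fit_stump(xs, residuals):
    # Search every feature and split point for the lowest squared error.
    n = len(xs)
    total = sum(residuals)
    best = None
    for j in range(len(xs[0])):
        order = sorted(range(n), key=lambda i: xs[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = xs[order[k - 1]][j], xs[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Maximizing this score is equivalent to minimizing the squared error.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, 0.5 * (lo + hi), left_sum / k, right_sum / (n - k))
    return best[1:]

def predict_stump(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] < threshold else right_value

def fit_boosted(xs, ys, n_rounds=100, learning_rate=0.1):
    # Each round fits a stump to the current residuals and adds a damped copy of it.
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def rmse(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

train_x, train_err = make_soundings(300)
test_x, test_err = make_soundings(300)
model = fit_boosted(train_x, train_err)

# Bias correction: subtract the predicted error from each held-out sounding.
rmse_before = rmse(test_err)
rmse_after = rmse([e - predict(model, x) for x, e in zip(test_x, test_err)])
print(f"RMSE before correction: {rmse_before:.3f} ppm")
print(f"RMSE after correction:  {rmse_after:.3f} ppm")
```

On this synthetic example the corrected residuals are evaluated on held-out soundings, so the comparison reflects generalization rather than fit to the training set. A production library such as XGBoost adds deeper trees, regularization terms, and second-order gradient information on top of this basic loop.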