Licheng LIU

and 11 more

Nitrous oxide (N2O) is an important greenhouse gas (GHG), with a global warming potential 265 times that of carbon dioxide (CO2). About 60% of anthropogenic N2O emissions come from agricultural production. Estimating N2O emissions from cropland remains challenging because the underlying microbial processes (e.g., incomplete nitrification and denitrification) are controlled by diverse factors of climate, soil, plants, and human activities. In this study, we developed a machine learning (ML) model that incorporates physical and biogeochemical domain knowledge, termed knowledge-guided machine learning (KGML), to simulate daily N2O fluxes from agricultural ecosystems. The model structure is built on the Gated Recurrent Unit (GRU). Several ideas were implemented to optimize model performance, including 1) a hierarchical structure based on causal relations among variables, 2) intermediate variable (IMV) prediction and transfer, 3) input of initial IMV values as constraints, 4) model pre-training and re-training, and 5) multitask learning. The KGML model was pre-trained on millions of synthetic data points generated by an advanced process-based (PB) model, ecosys, and then re-trained on observations from six mesocosm chambers over three growing seasons. Six pure ML models were developed using the same mesocosm chamber data to serve as benchmarks for the KGML model. The results show that KGML consistently outperforms the PB model in efficiency and the ML models in the accuracy of capturing N2O flux magnitude and dynamics. In addition, the reasonable predictions of IMVs increase the interpretability of KGML. We believe the KGML development in this study will stimulate a new body of research on interpretable machine learning for biogeochemistry and other related geoscience processes.
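To make the hierarchical idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-stage layout in which a first GRU predicts the IMVs from daily drivers plus the initial IMV values, and a second GRU predicts the N2O flux from the same drivers plus the predicted IMVs. The class and function names (`GRUCell`, `HierarchicalKGMLSketch`, `multitask_loss`), the layer sizes, the bias-free gates, and the loss weight `imv_weight` are all hypothetical. The weights are random and untrained, so the sketch shows only the data flow and the multitask loss, not the pre-training or re-training steps.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Bias-free GRU cell; each gate acts on the concatenation [x, h]."""

    def __init__(self, n_in, n_hidden, rng):
        self.Wz = rand_matrix(n_hidden, n_in + n_hidden, rng)  # update gate
        self.Wr = rand_matrix(n_hidden, n_in + n_hidden, rng)  # reset gate
        self.Wh = rand_matrix(n_hidden, n_in + n_hidden, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.Wz, x + h)]
        r = [sigmoid(a) for a in matvec(self.Wr, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, x + gated)]
        # Blend the previous state and the candidate with the update gate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


class HierarchicalKGMLSketch:
    """Hypothetical two-stage layout: drivers -> IMVs -> N2O flux."""

    def __init__(self, n_drivers, n_imv, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Stage 1 sees the drivers plus the initial IMV values.
        self.imv_cell = GRUCell(n_drivers + n_imv, n_hidden, rng)
        self.imv_head = rand_matrix(n_imv, n_hidden, rng)
        # Stage 2 sees the drivers plus the IMVs predicted by stage 1.
        self.n2o_cell = GRUCell(n_drivers + n_imv, n_hidden, rng)
        self.n2o_head = rand_matrix(1, n_hidden, rng)

    def forward(self, drivers_seq, imv_init):
        h1 = [0.0] * self.n_hidden
        h2 = [0.0] * self.n_hidden
        imv_seq, n2o_seq = [], []
        for drivers in drivers_seq:
            h1 = self.imv_cell.step(drivers + imv_init, h1)
            imv = matvec(self.imv_head, h1)
            h2 = self.n2o_cell.step(drivers + imv, h2)  # IMV transfer
            imv_seq.append(imv)
            n2o_seq.append(matvec(self.n2o_head, h2)[0])
        return imv_seq, n2o_seq


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(n2o_pred, n2o_obs, imv_pred, imv_obs, imv_weight=0.5):
    """Weighted sum of the N2O error and the IMV error (weight is assumed)."""
    flat_pred = [v for day in imv_pred for v in day]
    flat_obs = [v for day in imv_obs for v in day]
    return mse(n2o_pred, n2o_obs) + imv_weight * mse(flat_pred, flat_obs)


# Five days, three made-up drivers per day, two IMVs with assumed initial values.
model = HierarchicalKGMLSketch(n_drivers=3, n_imv=2)
drivers_seq = [[0.1 * day, 0.5, -0.2] for day in range(5)]
imv_seq, n2o_seq = model.forward(drivers_seq, imv_init=[0.3, 0.7])
```

Under this layout, a pre-training pass would minimize `multitask_loss` against synthetic IMV and N2O targets from the PB model, and a re-training pass would minimize it against chamber observations. Both steps need gradient-based optimization, which this sketch leaves out.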