Large earthquakes, especially those occurring in cities or other population centers, create devastation and havoc, often causing numerous deaths and injuries and significant infrastructure damage that lead to billions of dollars in losses. Submarine earthquakes are the leading cause of large tsunamis, which bring deaths, destruction, displacement of populations, and possibly nuclear meltdowns. Thus, prediction of an earthquake or its aftershocks, or an earthquake early warning system, has great potential to mitigate the loss of life as well as many kinds of damage. Earthquake prediction means forecasting the occurrence of an earthquake by providing both a magnitude estimate and an accurate location. Earthquake prediction has been an important area of seismology research for quite a while and will likely remain so. Recently, with the implementation of deep learning in seismology, scientists have been able to detect, predict, and model seismic waves and earthquake aftershocks. Aftershocks are generally triggered by changes in stress produced by large earthquakes within or surrounding a given fault network. The main goal of this study is to investigate how aftershock pattern predictions improve when deep learning parameters are tuned and optimized. To achieve this goal, we have developed an algorithm that first gathers mainshock-aftershock sequence data. Among the criteria used to identify earthquakes that belong to an aftershock sequence are occurrence within a certain radius of the mainshock (we tested radii of about 0.5 degrees) and within a certain period, from a few seconds to several weeks, after the mainshock. For sequence identification, we use seismic data from the United States Geological Survey (USGS) National Earthquake Information Center (NEIC). We are also examining open-source data gathered by researchers for similar studies. The deep neural networks we are implementing use the Keras Python toolkit with the Theano and TensorFlow libraries, and we plan to replace Theano with the PyTorch library in the future because of Theano's maintenance issues. To this point, our attempts have shown good progress.
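As a sketch of the sequence-gathering step described above, the following Python snippet queries the USGS FDSN event web service for events within a chosen radius and time window of a mainshock. The mainshock coordinates, time, and window below are placeholders, and this illustrates the selection criteria rather than the study's actual algorithm.

```python
# Minimal sketch (not the authors' exact algorithm) of gathering a
# mainshock-aftershock sequence from the USGS FDSN event service using
# the criteria described above: events within ~0.5 degrees of the
# mainshock epicenter and within a chosen time window after it.
from datetime import datetime, timedelta
import requests

USGS_API = "https://earthquake.usgs.gov/fdsnws/event/1/query"

def fetch_aftershocks(lat, lon, origin_time, days=30, max_radius_deg=0.5):
    """Return candidate aftershocks near a mainshock epicenter."""
    params = {
        "format": "geojson",
        "latitude": lat,
        "longitude": lon,
        "maxradius": max_radius_deg,          # search radius in degrees
        "starttime": origin_time.isoformat(),
        "endtime": (origin_time + timedelta(days=days)).isoformat(),
        "orderby": "time-asc",
    }
    resp = requests.get(USGS_API, params=params, timeout=60)
    resp.raise_for_status()
    return resp.json()["features"]

# Hypothetical example: aftershocks within 30 days of a placeholder mainshock.
events = fetch_aftershocks(34.0, -118.0, datetime(2019, 7, 6))
for ev in events[:5]:
    print(ev["properties"]["mag"], ev["properties"]["place"])
```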
The Mount Padang archeological site has been known since the late nineteenth century as a megalithic complex sitting on the hilltop. Our studies show that the structure does not cover just the top but also wraps around the slopes, covering an area of at least about 15 ha. Comprehensive geophysical surveys combining ground-penetrating radar (GPR) and multi-channel resistivity methods, seismic tomography augmented by borehole coring data, and archeological excavations show further that the structures are not merely superficial but are rooted to greater depth. The structures were not built at once but consist of several layers from consecutive periods. The uppermost layer on the surface consists of horizontal piles of basaltic columnar rocks forming stepped terraces, decorated with exotic arrangements of upright rock columns forming walls, paths, and spaces. The second layer, previously misinterpreted as a natural rock formation and buried 1-3 meters beneath the ground surface, is a several-meter-thick fill consisting of a more compact and advanced arrangement of similar columnar rocks in a fine-grained matrix. The third layer is also an artificial arrangement of rock fragments of various kinds, extending down to about 15 meters depth, and sits on a fractured, massive basaltic lava tongue. The surveys also reveal evidence of large underground cavities or chambers. Preliminary radiocarbon dating indicates that the first layer was built around 3,000 cal BP and the second layer around 7,000 cal BP; the third layer was built prior to 9,500 cal BP and could be as old as 13,000 to 28,000 cal BP.
The true bottleneck of artificial intelligence (AI) is often not access to data but labeling it. We have large volumes of raw agricultural image data coming from various sources, and manual labeling remains a crucial step in keeping the data well organized, requiring a considerable amount of time, money, and labor. This process can be made more efficient if we can automatically label the raw data. We propose the contrastive learning representations for agriculture images (AgCLR) model, which applies self-supervised representation learning to unlabeled real-world agricultural field data to learn useful image feature representations. Contrastive learning is a self-supervised approach that enables a model to learn attributes by contrasting samples against each other without the use of labels. AgCLR leverages the state-of-the-art SimCLRv2 framework to learn representations by maximizing the agreement between differently augmented views of the same sample. We incorporated critical enablers such as mixed precision, multi-GPU distributed parallel computing, and Google Cloud Tensor Processing Units (TPUs) to optimize the training process. We achieved 80.2% accuracy when classifying the test data. We further applied AgCLR to an unrelated task, determining the alleys and rows in corn field videos for corn phenotyping, and observed two cluster formations for alleys and rows when the embeddings were plotted in three-dimensional space. We also developed a content-based image retrieval tool (pixel affinity) to identify similar images in our database, and the results were visually very promising.
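For readers unfamiliar with the contrastive objective underlying SimCLR-style training, the following is a minimal, illustrative PyTorch implementation of the NT-Xent loss that maximizes agreement between two augmented views of the same sample; it is not the AgCLR or SimCLRv2 source code.

```python
# Minimal sketch of the SimCLR-style NT-Xent contrastive loss;
# illustrative only, not the authors' code or the SimCLRv2 source.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: embeddings of two augmented views, shape (batch, dim).
    Positive pairs are (z1[i], z2[i])."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, dim)
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    # For row i, the positive example sits n rows away.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Hypothetical usage with a batch of 256 embedding pairs:
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
print(nt_xent_loss(z1, z2).item())
```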
Seismology is a data-driven science, with a huge amount of data gathered for over a century. Though seismic data recording started around 1900, the growth of seismic data has been exponential in the last three decades. This growth is easily seen at just one of the largest seismological data centers in the US, the Data Management Center (DMC) of the Incorporated Research Institutions for Seismology (IRIS): data at the DMC grew from less than 10 tebibytes in 1992 to about 800 tebibytes in 2022. With the availability of such a large amount of seismic data, it is paramount to develop new seismic data processing and management tools to help analyze the data and find new and better seismic models, and such tools will help make the best use of these growing data sets. The main goal of this investigation is the development of efficient tools for the retrieval, processing, merging, aggregation, and management of big seismic data from disparate data sources. In this study, such tools are being developed with the Python programming language and open-source Python libraries, and they will be helpful for extracting, splitting, converting, merging, and processing big seismic data. Python is well suited to data science and has powerful libraries for processing and managing data and applications. Significant contributions have been made in recent Python-based libraries for seismic data processing, though there is still room for improvement in seismic applications that merge, convert, manage, and process big seismic data from disparate sources and convert between file formats. Seismic data from networks surrounding the Rio Grande Valley have been collected from different data sources, and we are using them to test the developed tools and evaluate their performance. This study has made important progress in this regard, and the results are promising.
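As an illustration of the kinds of operations these tools target, the sketch below uses the open-source ObsPy library to retrieve waveforms, merge fragmented traces, and convert formats. The network and station codes and times are placeholders, and this is not the toolset developed in this study.

```python
# Illustrative sketch of common big-seismic-data operations with ObsPy:
# fetching waveforms, merging fragmented traces, and converting formats.
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("IRIS")
t0 = UTCDateTime("2022-01-01T00:00:00")

# Retrieve one hour of vertical-component data from a placeholder network.
st = client.get_waveforms(network="TX", station="*", location="*",
                          channel="BHZ", starttime=t0, endtime=t0 + 3600)

# Merge fragmented traces from the same channel, filling small gaps.
st.merge(method=1, fill_value="interpolate")

# Convert: write each trace out in SAC format (one file per trace).
for tr in st:
    tr.write(f"{tr.id}.sac", format="SAC")
```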
Automated monitoring and evaluation systems for plant phenotyping are one of the keys to advancing and strengthening crop breeding programs. In this study, we outline improvements to a camera-based sensor system and weather station from a previous study, assembled mainly from Raspberry Pi products, with a board carrying dual cameras (RGB and NoIR) that provide high spatial and temporal resolution data. Hardware for the internet connection and the power supply system of the sensor were upgraded. Previously, the sensor could automatically capture plant images at user-defined time points; building on this, an image processing algorithm (edge computing) was developed and installed to extract digital phenotypic traits from the images after capture. With this development, the new sensor system could be integrated with the internet, and a cloud server was configured to store data online (digital traits and raw images). A real-time monitoring system was created to visualize the time series of trait development and plant images throughout the season. With such a system, plant breeders will be able to monitor multiple trials for timely crop management and decision-making, which is also resource efficient.
Imaging of plants using multi-camera arrays in high-density growth environments is a strategy for affordable high-throughput phenotyping. In multi-camera systems, simultaneous imaging of hundreds to thousands of plants eliminates the time delay in measurements between plants seen in plant-to-camera or camera-to-plant systems, which allows for the analysis of plant growth, development, and environmental responses at high temporal resolution. On the other hand, high plant density, camera-to-camera variation, and other trade-offs increase the complexity of data analysis. Here we present two recent updates to the PlantCV image analysis package that improve usability when working with multi-plant datasets. First, we introduce a method to automate detection of plants organized in a grid layout, reducing the need to make separate workflows for each camera in a multi-camera system. Second, we reduced the number of input and output parameters for functions handling the shape and location of plants and introduced automatic iteration over multiple objects of interest (e.g., plants), reducing the level of programming needed to build workflows.
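One simple way to automate grid detection, sketched below for illustration (this is not the PlantCV implementation), is to project foreground pixels of a binary plant mask onto each image axis and find peaks corresponding to grid rows and columns.

```python
# Conceptual sketch of automatic grid detection for multi-plant images;
# illustrates one way to locate plants arranged in a grid from a binary
# plant/background mask. Not the PlantCV code.
import numpy as np
from scipy.signal import find_peaks

def grid_centers(mask, nrows, ncols):
    """Estimate (row, col) centers of plants laid out in a grid.
    mask: 2D 0/1 array where foreground pixels are plants."""
    # Project foreground counts onto each axis; plants appear as peaks.
    row_profile = mask.sum(axis=1).astype(float)
    col_profile = mask.sum(axis=0).astype(float)
    # Require peaks to be roughly one grid cell apart.
    row_peaks, _ = find_peaks(row_profile, distance=mask.shape[0] // (nrows + 1))
    col_peaks, _ = find_peaks(col_profile, distance=mask.shape[1] // (ncols + 1))
    return [(r, c) for r in row_peaks[:nrows] for c in col_peaks[:ncols]]

# Hypothetical usage: a 6x4 tray of plants in a segmented image mask.
mask = np.zeros((600, 400), dtype=np.uint8)  # placeholder mask
centers = grid_centers(mask, nrows=6, ncols=4)
```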
The Amazon rainforest has a great influence on the global energy balance and carbon fluxes, being responsible for the net removal of approximately 4 million tons of carbon per year via photosynthetic activity. Climate change and deforestation affect the carbon budget in Amazonia, transforming CO2 sink areas into sources. Given the complexity of the factors that govern carbon exchange in the Amazon and its influence on biological processes, data science strategies can promote a better understanding of the main environmental factors under different scenarios and assist public policies to mitigate global warming effects. This study aims to identify the environmental factors that determine the temporal variability of carbon exchanges between the biosphere and the atmosphere in the Tapajós National Forest, in the Amazon, by applying data science strategies to an integrated set of environmental data from energy and carbon fluxes and remote sensing. The specific objective is to assess the influence of a selected set of environmental variables on the variability of carbon exchanges, using an artificial neural network (ANN) classification model to identify the variables with the greatest impact on source, sink, and neutrality scenarios. Data science strategies were applied to an integrated dataset of ground-based carbon flux measurements and remote sensing data covering the period between 2002 and 2006, and an ANN classification model was developed to identify the environmental variables with the greatest impact on carbon source, sink, and neutrality conditions. The average global score of the ANN model was 65%. It was possible to identify the predictor variables with the greatest impact on the carbon sink condition: radiation at the top of the atmosphere, sensible and latent energy fluxes, and leaf area index. Thus, the ANN model combined with an ensemble of data science strategies can improve understanding of the variability of CO2 fluxes and be a powerful tool to promote new knowledge.
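A minimal sketch of this kind of ANN classification workflow is shown below, using scikit-learn with synthetic placeholder data and permutation importance as one way to rank predictors; the actual model, data, and importance method of the study may differ.

```python
# Illustrative sketch (not the study's actual model) of an ANN classifier
# for carbon source/sink/neutral conditions, with permutation importance
# used to rank environmental predictors. Feature names are assumptions
# based on the variables mentioned in the abstract.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["toa_radiation", "sensible_heat", "latent_heat", "lai"]
X = rng.normal(size=(1000, len(features)))       # placeholder predictors
y = rng.integers(0, 3, size=1000)                # 0=source, 1=sink, 2=neutral

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print("score:", ann.score(X_te, y_te))

# Rank predictors by how much shuffling each one degrades the score.
imp = permutation_importance(ann, X_te, y_te, n_repeats=10, random_state=0)
for name, val in sorted(zip(features, imp.importances_mean),
                        key=lambda t: -t[1]):
    print(f"{name}: {val:.3f}")
```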
The slope mass rating (SMR) method is universally used for the characterization and classification of rock slopes. SMR is calculated from the basic rock mass rating (RMRb) by adding the product of three adjustment factors F1, F2, and F3, which are based on the geometrical relationship between the slope and the discontinuity and together reduce RMRb (F3 is negative), plus one adjustment factor F4 that depends on the excavation method used. The factors F1, F2, and F3 are mathematical functions (continuous or discrete) that require post-processing of field data on a computer for their derivation. Little work has been done to develop charts for the direct calculation of SMR in the field. In this paper, SMR charts are developed for the onsite classification of rock slopes. With the aid of these charts, an engineering geologist can easily assess the onsite SMR class of rock slopes by plotting the discontinuity dip amount (plunge amount in the case of wedge failure) and the strike parallelism between the slope dip direction and the discontinuity dip direction (slope dip direction and trend of the line of intersection in the case of wedge failure). Using SMR charts, onsite suggestions of proper remedial and preventive measures can be given for the rock slopes of any project (open-pit mines, road cut slopes, natural slopes, etc.), which accelerates the overall preliminary slope mass classification process. The proposed SMR charts are straightforward to use and can be adopted as useful tools for preliminary rock slope stability assessment.
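For reference, a minimal sketch of the underlying SMR computation for the planar failure case is given below, following Romana's formulation SMR = RMRb + (F1 × F2 × F3) + F4. The continuous forms of F1 and F2 are the commonly cited ones, while F3 and F4 are abbreviated table lookups, so the values here are illustrative only.

```python
# Minimal sketch of the SMR computation, SMR = RMRb + (F1 * F2 * F3) + F4,
# for a planar failure case. F3 below is an abbreviated version of
# Romana's discrete table; F4 is passed in directly.
import math

def smr_planar(rmr_b, slope_dip, slope_dip_dir, joint_dip, joint_dip_dir, f4=0):
    """Slope mass rating for planar failure (all angles in degrees)."""
    A = abs(joint_dip_dir - slope_dip_dir)        # strike parallelism
    f1 = (1 - math.sin(math.radians(A))) ** 2     # ~1 when parallel
    f2 = min(1.0, max(0.15, math.tan(math.radians(joint_dip)) ** 2))
    c = joint_dip - slope_dip                     # daylighting relationship
    # Abbreviated F3 lookup for planar failure (always <= 0).
    if c > 10:
        f3 = 0
    elif c > 0:
        f3 = -6
    elif c == 0:
        f3 = -25
    elif c >= -10:
        f3 = -50
    else:
        f3 = -60
    return rmr_b + f1 * f2 * f3 + f4

# Hypothetical example: RMRb of 60, natural slope (F4 = +15).
print(smr_planar(60, slope_dip=50, slope_dip_dir=180,
                 joint_dip=35, joint_dip_dir=175, f4=15))
```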
Although adequately detailed kerosene chemical-combustion Arrhenius reaction-rate suites were not readily available for combustion modeling until ca. the 1990s (e.g., Marinov), it was already known from mass-spectrometer measurements during the early Apollo era that fuel-rich liquid oxygen + kerosene (RP-1) gas generators yield large quantities (e.g., several percent of total fuel flows) of complex hydrocarbons such as benzene, butadiene, toluene, anthracene, fluoranthene, etc. (Thompson), which are formed concomitantly with soot (Pugmire). By the 1960s, virtually every fuel-oxidizer combination for liquid-fueled rocket engines had been tested, and the impact of gas-phase combustion efficiency on the rocket-nozzle efficiency factor had been empirically well determined (Clark). Until relatively recently, space-launch and orbital-transfer engines were increasingly designed for high efficiency, to maximize orbital parameters while minimizing fuel and structural masses: preburners and high-energy atomization have been used to pre-gasify fuels to increase (gas-phase) combustion efficiency, decreasing the yield of complex/aromatic hydrocarbons (which limit rocket-nozzle efficiency and overall engine efficiency) in hydrocarbon-fueled engine exhausts, thereby maximizing system launch and orbital-maneuver capability (Clark; Sutton; Sutton/Yang). The rocket combustion community has been aware that the choice of Arrhenius reaction-rate suite is critical to computer engine-model outputs, and specific combustion suites are required to estimate the yield of high-molecular-weight/reactive/toxic hydrocarbons in the rocket engine combustion chamber; nonetheless, such garbage-in/garbage-out (GIGO) errors can be seen in recent documents. Low-efficiency launch vehicles (SpaceX, Hanwha) therefore also need larger fuel loads to achieve the same launched/transferred mass, further increasing the yield of complex hydrocarbons and radicals deposited by low-efficiency rocket engines along launch trajectories and into the stratospheric ozone layer, the mesosphere, and above. With increasing launch rates from low-efficiency systems, these persistent (Ross/Sheaffer; Sheaffer), reactive chemical species must have a growing impact on critical, poorly understood upper-atmosphere chemistry systems.
Natural methane gas release from the seafloor is a widespread phenomenon that occurs at cold seeps along most continental margins. Since their discovery in the early 1980s, seeps have been the focus of intensive research, partly aimed at refining the global carbon budget. However, deep-sea research is challenging and expensive and, to date, few studies have successfully monitored the variability of methane gas release over long time periods (> 1 yr). Long-term monitoring is necessary to study the mechanisms that control seabed gas release. The M³ project, funded by the German Ministry of Education and Research, aims to study the temporal and spatial variability of gas emissions at the Southern Hydrate Ridge (SHR) by acoustically monitoring and quantifying gas effluxes over several years. Located 850 m deep on the Cascadia accretionary prism offshore Oregon, the SHR is one of the most studied seep sites, and persistent but variable gas release has been observed there for more than 20 years. Since 2015, the Ocean Observatories Initiative’s (OOI) Cabled Array observatory has provided power supply and two-way communication to the SHR, making it an ideal site for continuous long-term monitoring work. In this work, we present how we will take advantage of the OOI infrastructure and deploy several instruments on the seabed for at least 1.5 years. A multi-beam “overview” sonar mounted on a rotor will identify every gas bubble stream located within 200 m of the sonar. A scanning “quantification” sonar will be used to estimate the amount of gas released from discrete gas streams. A camera system and a CTD probe will help process and analyze the hydro-acoustic data. All instruments will be powered and controlled from land through the OOI infrastructure. We present the instrument design, the operation protocol, and the data processing steps and expected results.
An organism's phenome results from the expression of its genome (nature) under certain environmental and management effects (nurture), interactions between these factors, and measurement error. Over more than 30 years, DNA sequencing and genomics tools have advanced to the point where it is now feasible to saturate the genomes of segregating individuals, such that polymorphisms at nearly any position can be determined from other known positions. This is due to population structure, linkage disequilibrium (LD), or linkage, and it is a powerful tool for genomic prediction and for investigating biological phenomena. In contrast, most phenomics to date focuses on automating previously known "traits" as measurable and interpretable phenotypes, akin to measuring a single DNA marker rather than an entire saturated genome. Viewing phenomics as a platform for discovery, similar to genomics, opens new methods for capturing phenomena in nature and nurture. Saturating a phenome would mean that an individual's fitness, performance, responses to environment, and/or specific phenotypes could be accurately predicted in untested environments. To date, our experience with phenomic prediction for cumulative, complex phenotypes such as grain yield suggests it is possible to predict organismal performance in untested environments, possibly better than with genomic methods, despite less advanced tools and data. The factors limiting saturation of a phenome are evaluating enough individuals and environments but, more importantly, the tools and methods to extract or "sequence" more phenomic features. Successfully saturating phenomes will impact every aspect of science and society: biological disciplines from germplasm curation and physiology to breeding, as well as education, the courtroom, and policy.
From preventing scurvy to being part of religious rituals, citrus are intrinsically connected to human health and perception. From tiny mandarins to head-sized pummelos, the capability of citrus to hybridize provides a vastly diverse array of fruit sizes and shapes, which in turn corresponds to a diversity of flavors and aromas. These sensory qualities are tightly linked to the oil glands in the citrus skin. The oil glands are also key to understanding fruit development, and the essential oils they contain are fundamental in the food and perfume industries. We study the shape of citrus based on 3D X-ray CT scan reconstructions of 163 citrus samples comprising 58 different species and cultivars, including samples of all fundamental citrus species. First, using the power of X-rays and image processing, we compare and contrast size ratios between different tissues, such as the size of the skin compared to the rind or the flesh. Second, we model the fruit shape as an ellipsoidal surface, and we then study and infer possible oil gland distributions on this surface using principles of directional statistics. Finally, we compare and contrast these overall fruit shape models and their gland distributions across different citrus species. This morphological modeling will later allow us to link genotype with phenotype, furthering our insight into how physical shape is genetically specified in DNA.
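A conceptual sketch of these two modeling steps, under stated assumptions and not reflecting the authors' actual pipeline, is given below: principal component analysis to estimate ellipsoid semi-axes from surface points, and a von Mises-Fisher concentration estimate as a basic directional-statistics summary of gland positions projected to the unit sphere.

```python
# Conceptual sketch (assumptions, not the authors' pipeline): estimate
# ellipsoid semi-axes for a fruit from CT-derived surface points via PCA,
# then normalize oil-gland positions onto the unit sphere and estimate a
# von Mises-Fisher concentration (Banerjee et al. approximation).
import numpy as np

def ellipsoid_axes(points):
    """points: (n, 3) surface samples, roughly centered on the fruit."""
    centered = points - points.mean(axis=0)
    # Principal axes of the point cloud approximate the ellipsoid axes.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    semi_axes = np.sqrt(eigvals) * np.sqrt(3)    # crude scale estimate
    return semi_axes, eigvecs

def vmf_concentration(directions):
    """Approximate vMF kappa for unit vectors of shape (n, 3)."""
    r_bar = np.linalg.norm(directions.mean(axis=0))
    return r_bar * (3 - r_bar**2) / (1 - r_bar**2)

# Hypothetical usage with random stand-ins for CT data:
rng = np.random.default_rng(1)
surface = rng.normal(size=(5000, 3)) * np.array([3.0, 2.5, 2.0])
glands = rng.normal(size=(300, 3))
glands /= np.linalg.norm(glands, axis=1, keepdims=True)
print(ellipsoid_axes(surface)[0], vmf_concentration(glands))
```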
Lying in an arid zone frequently subjected to high winds, south-central Arizona is impacted by several blowing-dust events or dust storms every year. Major consequences of these events are visibility impairment and ensuing road traffic accidents, and a variety of health issues induced by inhaling polluted air loaded with the fine particulate matter produced by wind erosion. Despite such problems, and thus a need for guidance on mitigation efforts, studies dealing with dust source attribution for the region are largely missing. Furthermore, existing dust models exhibit large uncertainties and deficiencies in simulating dust events, rendering them of limited use in attribution studies or early warning systems. Therefore, to address some of these model issues, we have developed a high-resolution (1 km) dust modeling system by building upon an existing modeling framework consisting of the Weather Research and Forecasting (WRF), FENGSHA (a dust emission model), and Community Multiscale Air Quality (CMAQ) models. In addition to incorporating new representations in the dust emission scheme, including a roughness correction factor, sandblasting efficiency, and a dust source mask, we implemented up-to-date, very high-resolution data on land use, soil texture, and vegetation index in the dust model. We used the revised dust modeling system to simulate a springtime dust storm (08–09 April 2013) of relatively long duration that caused a regional traffic incident involving minor injuries. The model simulations compared reasonably well against observations of the concentration of particulate matter with a diameter of 10 μm and smaller (PM₁₀), satellite-derived dust optical depth, and the vertical profile of aerosol subtypes. Interestingly, the simulation results revealed that anthropogenic (cropland) dust sources contributed more than half (~53 %, or 260 µg/m³) of total PM₁₀ during the dust storm over the region including Phoenix and western Pinal County. Our findings challenge the conventional wisdom that the desert is the main dust source for this region and suggest that regional air quality modeling over dryland regions should also emphasize an improved representation of dust from agricultural lands, especially during high-wind episodes. Such representations have the potential to inform decision-making to reduce windblown-dust-related hazards to public health and safety.
Automatic and accurate plant phenotyping plays an important role in improving crop yield by enabling efficient plant analysis and plant breeding studies. 3D deep learning allows automatic segmentation of plant parts from point cloud data. However, network architectures are typically designed manually, and their performance is limited by prior experience. The aim of this study is to search for optimal 3D deep networks for plant part segmentation. We perform 3D neural architecture search by training a super network composed of candidate networks; using the trained super network, evolutionary search is then used to find the top-performing architecture. The results demonstrate that the searched architecture outperforms manually designed architectures, attaining a mean IoU and accuracy of more than 90% and 96%, respectively. The searched architecture achieves more than 83% class-wise IoU for each of the main stem, branch, and boll classes. This plant part segmentation method shows promising results and holds potential to be utilized by plant breeders for enhancing production quality.
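The following schematic sketch illustrates the general evolutionary search technique over a trained super network (it is not the study's code): candidate sub-networks are encoded as per-layer choice vectors, scored with a placeholder fitness standing in for validation mIoU under inherited supernet weights, and evolved by mutation.

```python
# Schematic sketch of evolutionary architecture search over a supernet;
# the search-space dimensions and the fitness function are placeholders.
import random

N_LAYERS, N_CHOICES = 8, 4          # assumed search-space dimensions

def evaluate(arch):
    """Placeholder for validation mIoU of `arch` using supernet weights."""
    return random.random()          # stand-in fitness

def mutate(arch, p=0.2):
    # Each layer choice is resampled with probability p.
    return [random.randrange(N_CHOICES) if random.random() < p else c
            for c in arch]

population = [[random.randrange(N_CHOICES) for _ in range(N_LAYERS)]
              for _ in range(20)]
for generation in range(10):
    scored = sorted(population, key=evaluate, reverse=True)
    parents = scored[:5]            # keep the top performers
    population = parents + [mutate(random.choice(parents))
                            for _ in range(15)]
best = max(population, key=evaluate)
print("best architecture encoding:", best)
```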
Unmanned aerial vehicle (UAV)-based imagery has become widely used for collecting agronomic traits, enabling a much greater volume of data to be generated in a time-series manner. As one of the cutting-edge imagery analysis tools, machine learning-based object detection provides automated techniques to analyze these imagery data. In our previous study, UAVs were used to collect aerial photography for field trials of 233 diverse inbred lines grown under different nitrogen treatments. Images were collected at different plant developmental stages throughout the growing season. This dataset of images has been used here to develop machine learning techniques that obtain automated tassel counts at the plot level through the season. To improve detection accuracy, we developed an image segmentation method to remove non-tassel pixels and then feed the filtered images into machine learning algorithms. As a result, our method showed a significant improvement in the accuracy of maize tassel detection. This method can be used in future research to produce time-series counts of tassels at the plot level and will allow for accurate estimates of flowering-related traits, such as the earliest detected flowering date and the duration of each plot's flowering period. These phenotypic data and the trait-associated genes provide new opportunities for crop improvement and for facilitating future plant breeding.
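As an illustration of the segmentation-before-detection idea, the sketch below suppresses likely non-tassel pixels with an HSV color threshold before detection; the threshold values are assumptions for demonstration, not the tuned parameters of this study.

```python
# Illustrative sketch of filtering non-tassel pixels before detection;
# threshold values are assumptions, not the study's tuned parameters.
import cv2
import numpy as np

def filter_non_tassel(bgr_img):
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
    # Keep bright, low-saturation pixels (tassels tend to be pale/yellowish
    # against the darker green canopy in nadir UAV images).
    mask = cv2.inRange(hsv, np.array([15, 0, 150]), np.array([45, 120, 255]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return cv2.bitwise_and(bgr_img, bgr_img, mask=mask)

# Hypothetical usage: filtered plot images are then fed to the detector.
img = cv2.imread("plot_0001.png")          # placeholder path
if img is not None:
    filtered = filter_non_tassel(img)
    cv2.imwrite("plot_0001_filtered.png", filtered)
```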
High-oil tobacco varieties have recently been engineered to produce increased leaf oil content for future food and fuel needs. An engineered variety of Nicotiana tabacum produces ~30 percent of leaf dry weight in lipids in the form of triacylglycerol (TAG), a significant increase relative to the less than 1 percent storage oil normally found in wild-type leaves. This high-oil tobacco also accumulates oil bodies in stomatal guard cells. To understand the impact of oil on guard cell shape, aperture, and dynamics, we have co-opted computer vision tools in PlantCV to create an accurate, flexible, and high-throughput method for microscopy image analysis of stomata. To this end, leaf impressions are made with silicone putty, and clear nail polish peels of the putty impressions are imaged using light microscopy. Binary thresholding followed by point-and-click regions of interest and morphology calculations provides stomatal counts, aperture, and other shape characteristics. Applying this method to high-oil tobacco demonstrated reduced stomatal aperture but the same number of stomata per unit leaf area, providing a mechanistic explanation of high-oil tobacco responses to high temperature and water deficit stresses.
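A conceptual sketch of such a pipeline is given below, written with OpenCV rather than PlantCV's own functions to avoid misquoting its API: thresholding, region-of-interest cropping, and per-contour shape measurements yielding counts and an aperture proxy. Paths, ROI bounds, and the area cutoff are assumptions.

```python
# Conceptual sketch of a stomata measurement pipeline; written with
# OpenCV for illustration, not with PlantCV's own function calls.
import cv2

img = cv2.imread("peel_image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
if img is not None:
    # Otsu threshold separates dark stomatal pores from the background.
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    roi = binary[100:900, 100:1200].copy()    # assumed region of interest
    contours, _ = cv2.findContours(roi, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    stomata = [c for c in contours if cv2.contourArea(c) > 50]
    print("stomatal count:", len(stomata))
    for c in stomata:
        # Minor axis of the fitted ellipse approximates pore aperture.
        if len(c) >= 5:
            (_, _), (ax1, ax2), _ = cv2.fitEllipse(c)
            print("aperture ~", min(ax1, ax2), "px")
```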
We introduce a computer program for time series tuning and analysis. The AnalySeries program of Paillard et al., well known in the paleoclimate community, is restricted to 32-bit Mac OS, and given Apple's plans to move entirely to a 64-bit system, it will not be supported in future upgrades of the Macintosh OS. QAnalySeries is an attempt to re-implement the major functionality of AnalySeries, thus providing the community with a useful tool. QAnalySeries is written using the Qt SDK as free software and can be run on Macintosh, Windows, and Linux systems. Paillard, D., L. Labeyrie, and P. Yiou (1996), Macintosh program performs time-series analysis, Eos Trans. AGU, 77: 379.
We investigate the robustness of 3D point-based deep learning for organ segmentation of 3D plant models against varying reconstruction quality of the surface. The reconstruction quality is quantified in two ways: 1) the number of acquisitions for partial 3D scans and 2) the amount of noise. High-quality models of real rosebush plants are used to collect point clouds in a controlled simulation environment as a way to degrade surface quality systematically. We show that the well-known 3D point-based neural network PointNet++ is capable of operating effectively on low-quality and corrupted data for the task of plant organ segmentation. The results indicate that investing in developing deep learning methods has the potential to advance applications of automated phenotyping, especially for low-quality 3D point clouds of plants.
Keywords: plant phenotyping, organ segmentation, robustness analysis, point-based deep learning
Figure 1: A 3D rosebush model from the ROSE-X data set: (a) point cloud; (b) triangular mesh model.
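A minimal sketch of the two degradation modes, illustrative only and not the paper's simulation environment, is given below: random subsampling to mimic fewer acquisitions and Gaussian perturbation to mimic sensor noise.

```python
# Minimal sketch of degrading a point cloud before feeding it to a
# segmentation network such as PointNet++; illustrative only.
import numpy as np

def degrade(points, keep_fraction=0.5, noise_sigma=0.005, seed=0):
    """points: (n, 3) array. Simulates fewer acquisitions by random
    subsampling and sensor noise by Gaussian perturbation."""
    rng = np.random.default_rng(seed)
    n_keep = int(len(points) * keep_fraction)
    idx = rng.choice(len(points), size=n_keep, replace=False)
    kept = points[idx]
    return kept + rng.normal(scale=noise_sigma, size=kept.shape)

# Hypothetical usage on a stand-in cloud of 10,000 points:
cloud = np.random.default_rng(1).uniform(size=(10000, 3))
for sigma in (0.0, 0.005, 0.02):      # increasing corruption levels
    noisy = degrade(cloud, keep_fraction=0.5, noise_sigma=sigma)
    print(sigma, noisy.shape)
```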