The true bottleneck of artificial intelligence (AI) is not access to data, but labeling it. Large volumes of raw agricultural image data arrive from various sources, and manual labeling remains a crucial step for keeping the data well organized, requiring a considerable amount of time, money, and labor. This process can be made more efficient if the raw data can be labeled automatically. We propose AgCLR, a contrastive learning representations model for agricultural images that uses a self-supervised representation learning approach on unlabeled real-world agricultural field data to learn useful image feature representations. Contrastive learning is a self-supervised approach that enables a model to learn attributes by contrasting samples against each other without the use of labels. AgCLR leverages the state-of-the-art SimCLRv2 framework to learn representations by maximizing the agreement between differently augmented views of the same sample. We incorporated critical enablers such as mixed precision, multi-GPU distributed parallel computing, and Google Cloud Tensor Processing Units (TPUs) to optimize the training process. We achieved 80.2% accuracy when classifying the test data. We further applied AgCLR to an unrelated task, distinguishing alleys and rows in corn field videos for corn phenotyping, and observed two clusters corresponding to alleys and rows when the embeddings were plotted in a three-dimensional space. We also developed a content-based image retrieval tool (pixel affinity) to identify similar images in our database, and the results were visually very promising.
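To make the contrastive objective concrete, below is a minimal NumPy sketch of an NT-Xent-style loss of the kind SimCLR-family frameworks use to maximize agreement between two augmented views of the same image; the embedding size, temperature, and random inputs are illustrative assumptions, not the AgCLR training configuration.

```python
# Minimal sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss
# used by SimCLR-style contrastive learning. Shapes and temperature are illustrative.
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z = np.concatenate([z1, z2], axis=0)                 # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # L2-normalize embeddings
    sim = z @ z.T / temperature                          # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                       # mask self-similarity
    n = z1.shape[0]
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # positive pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()

# Toy usage with random "embeddings" of 8 images under two augmentations.
rng = np.random.default_rng(0)
print(nt_xent_loss(rng.normal(size=(8, 128)), rng.normal(size=(8, 128))))
```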
ORCiD: https://orcid.org/0000-0003-4699-3733 Rootstocks are gaining importance in viticulture as a strategy to combat abiotic challenges as well as to enhance scion physiology and attributes. Therefore, understanding how the rootstock affects photosynthesis is informative for genetic improvement of either genotype in grafted grapevines. Photosynthetic parameters such as the maximum rate of carboxylation of RuBP (Vcmax) and the maximum rate of electron transport driving RuBP regeneration (Jmax) have been identified as ideal targets for breeding and genetic studies. However, the techniques used to measure these photosynthetic parameters are time-consuming and limited to the leaf level, which is complex to implement at field scale. Hyperspectral remote sensing uses the optical properties of the entire vine to predict photosynthetic capacity at the canopy level. In this study, estimates of Vcmax and Jmax were assessed using different machine learning models: PLS (Partial Least Squares), LR (Least Angle Regression), LASSO (Least Absolute Shrinkage and Selection Operator), and PCR (Principal Component Regression), based on leaf reflectance metrics obtained with hyperspectral wavelengths ranging from 400 to 1000 nm. Prediction models were developed for six different rootstock genotypes with the common scion Marquette, considering three sampling dates, in Brookings, South Dakota, in 2021. Preliminary results indicate that each rootstock has a distinctly different Vcmax and Jmax profile across the season. From the model assessment, PLS was found to give robust predictions, with an R2 of 0.53 for Vcmax and 0.63 for Jmax. Multi-year trials will be used to validate precise and rapid quantification of photosynthesis using hyperspectral remote sensing.
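For illustration, the following is a minimal scikit-learn sketch of fitting a PLS regression to leaf reflectance spectra in the spirit of the Vcmax/Jmax models above; the synthetic spectra, band count, and number of components are assumptions and not the study's data or settings.

```python
# Illustrative PLS regression from leaf reflectance spectra to a photosynthetic
# parameter (e.g., Vcmax). All data here are synthetic stand-ins.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_leaves, n_bands = 120, 300            # e.g., 400-1000 nm at ~2 nm spacing
reflectance = rng.uniform(0.05, 0.6, size=(n_leaves, n_bands))
vcmax = reflectance[:, 50:60].mean(axis=1) * 200 + rng.normal(0, 2, n_leaves)

pls = PLSRegression(n_components=10)    # component count is an assumption
r2_scores = cross_val_score(pls, reflectance, vcmax, cv=5, scoring="r2")
print("cross-validated R2:", r2_scores.mean())
```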
Root exudation refers to the processes by which plants release compounds called root exudates into the soil. These exudates are primarily carbon-containing compounds that interact with microbial communities in the rhizosphere. Microbial consumption of exudates reduces the concentration of the exuded compounds in the soil, causing the plant to exude more of those compounds. Currently, there is limited understanding of the interaction between plant root exudation mechanisms and the surrounding microbial communities. Among the Sorghum Association Panel (SAP), an established and genetically characterized sorghum diversity panel, we observed a spectrum of root colors (tan, yellow, red, purple-brown, black) identical to the range of observed sorghum seed colors. Previous studies examining differentially abundant metabolites among seeds of different colors showed that flavonoids and anthocyanins were higher in dark seeds than in white seeds. Root color is genotype-dependent and consistent over time. We hypothesized that the observed color diversity of sorghum roots was due to differential metabolite profiles in the root exudates across genotypes. We designed an experiment to collect exudates from 15 genotypes (n=60). After three weeks of growth, sorghum roots were washed and submerged in ultrapure water for 24 hours. The hydroponic solution was filtered and incubated with methanol. The whole root system was also ground after exudation. The root exudate solutions and the ground roots underwent HILIC and RPLC analysis to separate and detect polar and hydrophobic metabolites. Through metabolite profiling of root exudates, we aim to identify sorghum genotypes that more efficiently allocate carbon below ground via their root systems.
Hemp (Cannabis sativa L., <0.3% THC) is a versatile crop cultivated for grain, fiber, and cannabinoids used for health and wellness purposes. Following the passage of the 2018 Farm Bill, which removed hemp from the controlled substances list, there is potential for expanded hemp acreage and a concomitant need to breed cultivars with suitable agronomic performance in US growing regions. Understanding the relationships of morphological, biochemical, and spectral traits with yield will allow for high-throughput phenotyping and advanced breeding efforts. For five years, the Cornell Hemp research team has evaluated high-cannabinoid hemp cultivars in replicated field trials, with populations originating from both commercial sources and the Cornell Hemp Breeding Program. These field trials evaluated plants for phenotypic traits, including plant height, morphology, flowering time, cannabinoid concentration, and total biomass yield. We have also determined key morphometric measurements that are correlated with end-of-season biomass yield. In addition, both floral and foliar cannabinoid samples were correlated with end-of-season whole-plant biomass cannabinoid concentration. To improve phenotyping efficiency, an unmanned aerial vehicle (UAV)-based multispectral system was deployed to characterize morphological and biochemical traits over time. These datasets are being used to develop high-throughput phenotyping methods to predict biomass yield and, in the future, cannabinoid concentration and flowering time.
A dry bean (Phaseolus vulgaris L.) cultivar must fit the environment in which it will be grown. Therefore, days to maturity (DM) is the most important physiological component affecting yield and grain quality outcomes. Additionally, estimating dry bean stand count (SC) at early growth stages provides useful information for agronomic decision-making and can measure root rot loss due to damping-off. Visual inspection to determine the accurate maturity date and the final number of emerged plants is labor-intensive, time-demanding, and tedious. Therefore, there is an increasing demand for alternative approaches to estimating DM and SC in a high-throughput phenotyping (HTP) mode. In this study, we developed a Deep Learning (DL) HTP pipeline to capture the sequential behavior of time series data for estimating DM and to identify target plants at the early growth stage for SC estimation, using field dry bean data obtained from plot-level aerial RGB images. A state-of-the-art hybrid model combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) was used to extract DM features and capture the sequential behavior of the time series data. The Faster R-CNN object detection method was deployed for SC estimation. The DL model for estimating DM was tested in five different environments across years, and SC estimation was compared across different ground sample resolutions in two trials. Results suggest the effectiveness of the CNN-LSTM and Faster R-CNN models employed compared to traditional methods. Furthermore, this study highlighted the technical parameters that can influence DL model results in breeding program decision-making.
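A minimal Keras sketch of a CNN-LSTM that consumes a time series of plot-level image tiles for DM regression is shown below; the tile size, sequence length, and layer widths are illustrative assumptions, not the pipeline's actual architecture.

```python
# Sketch of a CNN-LSTM: per-date convolutional features feed an LSTM that models
# the sequential behavior across flight dates, ending in a regression head.
import tensorflow as tf
from tensorflow.keras import layers, models

seq_len, h, w, c = 10, 64, 64, 3        # e.g., 10 flight dates of 64x64 RGB plot tiles
model = models.Sequential([
    layers.Input(shape=(seq_len, h, w, c)),
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(32),                    # captures temporal dynamics across dates
    layers.Dense(1),                    # regress days to maturity
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```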
Using Unoccupied Aerial Systems (UASs) to collect image data for field phenotyping enables the coverage of large spatial scales, but the extraction of phenotypes at the level of an individual plant remains a challenge. Phenotypes are generally extracted at the plot level after manual delineation of the plots in an orthomosaic image. However, this approach is impractical when individual plant phenotypes are of interest, such as in mutant screenings, heterozygous population trials, and other cases where individual plants in a plot or a landscape may be very diverse. For mutant studies in particular, UASs have had limited utility since one typically wants to identify individual plants with outlier phenotypes. Current mutant population screenings require scientists to walk through the field on a regular basis to manually identify and record plants of interest. If UAS software could be designed to identify these plants, much larger studies could be conducted with lower labor costs. Here, we use deep Convolutional Neural Network (CNN) models to detect individual maize plants in raw UAS images followed by the extraction of individual phenotypes. The predictions from the CNN models are used to identify plants in the orthomosaic image to derive information on plot location. For the extraction of phenotypes, plant pixels are segmented from the background and traits including color-based indices and canopy coverage are calculated. Individual plant height is derived from the corresponding Digital Surface Model (DSM). This project will contribute to a more efficient approach for the phenotyping of individual plants using UAS images.
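As a schematic of the trait extraction step, the NumPy sketch below segments plant pixels with an excess-green index and derives canopy coverage and a height estimate from a DSM crop; the threshold, index choice, and toy inputs are assumptions for illustration only.

```python
# Per-plant trait extraction on an image crop around one detected plant:
# color-index segmentation, canopy coverage, and DSM-derived height.
import numpy as np

def plant_traits(rgb, dsm, ground_elev, exg_thresh=0.05):
    """rgb: (H, W, 3) float image in [0, 1]; dsm: (H, W) surface elevations in meters."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    exg = 2 * g - r - b                          # excess green index
    plant_mask = exg > exg_thresh                # plant vs. background pixels
    coverage = plant_mask.mean()                 # fraction of the crop covered by canopy
    height = np.percentile(dsm[plant_mask] - ground_elev, 95) if plant_mask.any() else 0.0
    return {"canopy_coverage": coverage, "plant_height_m": height}

# Toy example on a random crop standing in for a detected plant.
rng = np.random.default_rng(2)
rgb = rng.uniform(size=(100, 100, 3))
dsm = 100.0 + rng.uniform(0, 2, size=(100, 100))
print(plant_traits(rgb, dsm, ground_elev=100.0))
```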
ORCiD: Jinyoung Y. Barnaby (0000-0001-6507-9985), Scott E. Warnke (0000-0001-9082-6583). Precise assessment of large mapping populations comprising a few thousand plants including replications (a prerequisite step for breeding) is time-consuming and labor-intensive. Furthermore, phenotyping results tend to be variable and subjective, depending on who is doing the scoring. One way to overcome these limitations is by collecting more data in the form of digital images and precisely evaluating phenotypic variation in stress severity, as well as the temporal progression of stress symptoms within the population, through machine learning methods. A total of 230,400 images representing the temporal progression of drought stress symptoms in an interspecific turfgrass hybrid mapping population were processed using the Python OpenCV and NumPy packages for noise removal, edge-preserving smoothing, color space conversion, contrast enhancement, and identification mapping. Machine learning-based algorithms and models were then developed not only to quantify stress severity but also to monitor the temporal progression rate of stress symptoms. Hierarchical clustering was then performed to assess genotypic variation in stress progression. Such machine learning-based high-throughput digital phenotyping platforms can significantly increase the success of quantitative trait locus mapping and candidate gene identification to develop potential molecular markers that will assist in a faster characterization of germplasm to ultimately breed stress-resilient cultivars.
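The following OpenCV/NumPy sketch shows the kinds of preprocessing steps named above (noise removal, edge-preserving smoothing, color space conversion, contrast enhancement) on a stand-in image; the parameter values and the example stress score are illustrative assumptions, not the study's settings.

```python
# Preprocessing sketch: denoise, smooth while preserving edges, convert color
# spaces, enhance contrast, then compute a toy stress severity score.
import cv2
import numpy as np

rng = np.random.default_rng(3)
img = rng.integers(0, 255, size=(240, 320, 3), dtype=np.uint8)       # stand-in plot image
img = cv2.medianBlur(img, 5)                                          # noise removal
img = cv2.edgePreservingFilter(img, flags=cv2.RECURS_FILTER)          # edge-preserving smoothing
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)                            # color space conversion
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
lab[..., 0] = clahe.apply(lab[..., 0])                                # contrast enhancement on L channel
enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

# Toy stress score: fraction of pixels falling outside a green hue range.
green = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))
stress_severity = 1.0 - np.count_nonzero(green) / green.size
print(f"stress severity: {stress_severity:.2f}")
```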
A high-throughput image analysis pipeline was developed to facilitate root phenotyping by reducing time-consuming labeling while maintaining phenotyping accuracy. This pipeline leverages a deep learning-based tool named SLEAP (SLEAP Estimates Animal Poses) which is designed to automate the detection of distinct morphological landmarks. By training SLEAP to detect the root branch points, tips, and midline of each root imaged in a gel cylinder, we were able to robustly and efficiently recover the root system geometry. We trained models to identify these landmarks on primary, lateral, and seminal roots across a range of crop plants, including soybean, rice, canola, and pennycress. We find that our SLEAP models are robust across genotypes and experiments, enabling automated root system quantification at the rate of hundreds of plants per hour. Using predictions of root landmark locations, we developed Python-based pipelines to extract phenotypic traits, including tip depths, root lengths, convex hulls, root angles, measures of curviness, and lateral root distribution (available at https://github.com/talmolab/sleap-roots). In order to extract meaningful patterns from this high-dimensional description of plant phenotypes, we use machine learning-based methods for dimensionality reduction and manifold embedding, allowing us to capture the statistical structure of root phenotypes present in our screens. In future work, we will use these quantitative phenotypic traits as predictors of root system traits that enhance carbon sequestration capabilities in genome-wide association studies.
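As a simplified illustration of computing traits from predicted landmarks (the actual pipeline is the sleap-roots package linked above), the sketch below derives root length, tip depth, curviness, and convex hull area from made-up landmark coordinates.

```python
# Toy trait computation from root landmark coordinates predicted by a pose model.
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical (x, y) midline points for one primary root, top to bottom (pixels).
primary = np.array([[50, 0], [52, 40], [55, 90], [53, 150], [56, 210]], float)
lateral_tips = np.array([[30, 60], [75, 80], [25, 130], [80, 160]], float)

root_length = np.sum(np.linalg.norm(np.diff(primary, axis=0), axis=1))   # polyline length
tip_depth = primary[-1, 1]                                               # y of deepest point
curviness = root_length / np.linalg.norm(primary[-1] - primary[0])       # length / chord
hull_area = ConvexHull(np.vstack([primary, lateral_tips])).volume        # 2-D hull area

print(f"length={root_length:.1f}px depth={tip_depth:.0f}px "
      f"curviness={curviness:.3f} hull_area={hull_area:.0f}px^2")
```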
Recent advancements in proximal remote sensing have increased the spatial and temporal resolution of data collection, as well as the availability of these technologies for applications to precision agriculture. These sensors have allowed the collection of new and large quantities of data, which have been used to successfully determine phenotypes and parametrize crop growth models. So far, these data streams have been mostly used separately, though they contain unique structural, spatial, and spectral information. Thus, this research aims to integrate these disparate data sources to improve estimations of agronomically important crop traits. In this study, we examine two high-throughput and relatively inexpensive remote platforms: unoccupied ground vehicles (UGVs) and unoccupied aerial vehicles (UAVs). Data were collected on maize hybrids from the Genomes to Fields initiative over 5 years, from 2018 to 2022, in Aurora, NY. We used ground rovers to collect lidar scans, which were converted to point clouds, to construct the three-dimensional sub-canopy architecture of maize plants. Multispectral sensors, covering red, green, blue, red-edge, and near-infrared (NIR) bands, were deployed on a UAV platform to characterize maize canopies. Machine learning methods, including autoencoders, will be used to extract latent phenotypes from the lidar point clouds and multispectral images. Ultimately, these will be used to predict manually measured traits, such as yield, in order to compare the prediction accuracies of models using these measurements separately and jointly.
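Below is a minimal Keras autoencoder sketch for extracting latent phenotypes from flattened per-plot sensor features (e.g., voxelized lidar or multispectral statistics); the input dimensionality, layer sizes, and random data are assumptions, not the project's models.

```python
# Autoencoder sketch: compress per-plot sensor features into a small latent vector
# that can serve as a "latent phenotype" for downstream prediction.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_plots, n_features, latent_dim = 200, 512, 8
X = np.random.default_rng(4).normal(size=(n_plots, n_features)).astype("float32")

encoder = models.Sequential([layers.Input(shape=(n_features,)),
                             layers.Dense(64, activation="relu"),
                             layers.Dense(latent_dim)])
decoder = models.Sequential([layers.Input(shape=(latent_dim,)),
                             layers.Dense(64, activation="relu"),
                             layers.Dense(n_features)])
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)   # reconstruct the inputs

latent_phenotypes = encoder.predict(X, verbose=0)            # (n_plots, latent_dim)
print(latent_phenotypes.shape)
```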
Resource allocation drives the above-ground distribution of mass in grass plants across discrete developmental units called phytomers. Although the number of phytomers varies in genetically identical grasses, there is frequently no associated variation in some summary phenotypes. To understand what may be driving this, we tracked the growth of 30 S. italica plants from genotypes B100 and A10.1. We experimentally observed that plants from the genotype B100 had between 20 and 22 phytomers, while plants from the genotype A10.1 had between 7 and 9 phytomers. B100 plants with more phytomers (e.g., 22) did not grow taller or have more total leaf length, despite having more leaves than plants with fewer phytomers (e.g., 20). A10.1 plants with more phytomers (e.g., 9) did grow taller and had more total leaf length than those with fewer phytomers (e.g., 7). We developed a dynamical model to determine if these patterns are emergent from the underlying growth structure. The model is parameterized using the number of phytomers and related developmental time parameters: leaf emergence, stem and leaf elongation time, panicle emergence, and flowering time. The model uses the semi-sequential nature of phytomer growth as its structure. The model predicts that differences in the timing of the shift to reproductive growth could explain the patterns observed. Experimental measurements suggest this shift acts primarily by tuning the developmental time parameter controlling the units that contribute to the stem, rather than to leaves.
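As a toy illustration of the semi-sequential structure described above, the sketch below simulates phytomers that initiate one after another and stop elongating at a reproductive shift; it is not the authors' model, and all parameter values are invented.

```python
# Toy semi-sequential phytomer growth: each phytomer starts later than the previous
# one and elongates until either its elongation window or the reproductive shift ends.
def simulate(n_phytomers, leaf_interval=2.0, elongation_time=6.0,
             leaf_rate=3.0, stem_rate=0.8, flowering_time=40.0):
    """Return (total leaf length, plant height) at end of growth (arbitrary units)."""
    total_leaf, height = 0.0, 0.0
    for k in range(n_phytomers):
        start = k * leaf_interval                       # semi-sequential emergence
        grow = max(0.0, min(elongation_time, flowering_time - start))
        total_leaf += leaf_rate * grow                  # every phytomer adds leaf length
        if start > flowering_time * 0.5:                # only later units elongate the stem
            height += stem_rate * grow
    return total_leaf, height

for n in (20, 22):                                      # compare plants differing by 2 phytomers
    print(n, "phytomers ->", simulate(n))
```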
Hyperspectral imaging (HSI) is being widely applied in plant phenotyping platforms. New HSI devices such as LeafSpec have been introduced recently that provide a high signal-to-noise ratio along with higher spectral and spatial resolutions. However, most previous image processing algorithms only calculated the averaged spectrum of the leaf and rarely included the spatially distributed information at the leaf level. Meanwhile, different nutrient stresses can result in different color patterns on the leaf, which can be used to further improve the quality of plant phenotyping. This study focused on the development of a new methodology that applies spatial distribution analysis to HSI soybean leaf images. First, a novel way of encoding all the leaf pixels into a new coordinate system, called the Natural Leaf Coordinate System (NLCS), was developed. NLCS defines the coordinates of every pixel relative to the leaf venation so that the subsequent spatial distribution analysis can be conducted more intuitively. Second, a new NLCS-based nitrogen index, NLCS-N, was developed; it outperformed the whole-leaf averaged NDVI by showing a better correlation with the plants' nitrogen contents and a more significant differentiation between nitrogen-sufficient and nitrogen-deficient plants.
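For reference, the whole-leaf averaged NDVI baseline that NLCS-N is compared against can be computed as in the sketch below; the cube shape, band indices, and leaf-mask rule are assumptions rather than LeafSpec specifics.

```python
# Whole-leaf averaged NDVI from a hyperspectral cube: mask leaf pixels, average the
# red and NIR bands over the leaf, then combine them into a single index value.
import numpy as np

rng = np.random.default_rng(5)
cube = rng.uniform(0.0, 1.0, size=(200, 300, 120))    # (rows, cols, bands) stand-in HSI image
red_band, nir_band = 40, 100                           # hypothetical ~670 nm and ~800 nm bands

leaf_mask = cube[..., nir_band] > 0.5                  # crude leaf/background separation
red = cube[..., red_band][leaf_mask].mean()
nir = cube[..., nir_band][leaf_mask].mean()
whole_leaf_ndvi = (nir - red) / (nir + red)
print(f"whole-leaf averaged NDVI: {whole_leaf_ndvi:.3f}")
```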
Unmanned aerial vehicles (UAVs) provide growers and researchers with an efficient way to evaluate fields at high resolution. Flying UAVs and collecting imagery are made easily approachable by high-performance sensors that measure a wide spectrum of light and by free flight software available on smartphones and tablets. In contrast to the efficiency of collecting imagery, extracting data from this imagery presents a major hurdle for researchers and growers. Current data analysis options require either an expensive subscription service or complex coding packages, effectively preventing many from utilizing remote sensing data. These solutions are also designed as a "black box", where imagery goes in and data comes out, making customization and adaptability to the user's needs a challenge. To address these shortcomings, I developed an open-source analysis pipeline that is both approachable and robust. Starting with an orthomosaic and combining stock tools in the QGIS graphical user interface, this pipeline follows a simple step-by-step process to mask out soil and apply any user-defined index. From there, the user can segment plots using a fast yet highly customizable gridding system, allowing for plot segmentation in unusual field layouts or planting regimes, a feature previously unsupported in many subscription and open-source programs alike. Plot-level data can then be exported for statistical analyses. Ultimately, this pipeline aims to attract more researchers and growers toward using remote sensing data in their research.
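The mask-then-index logic that the pipeline applies with stock QGIS raster tools can be expressed in NumPy as follows; the band arrays, NDVI as the user-defined index, the soil threshold, and the uniform grid size are illustrative choices, not the pipeline's defaults.

```python
# Mask soil, apply a user-defined index, and aggregate per grid cell (plot).
import numpy as np

rng = np.random.default_rng(6)
red = rng.uniform(0.02, 0.4, size=(500, 500))          # stand-in orthomosaic bands
nir = rng.uniform(0.1, 0.7, size=(500, 500))

ndvi = (nir - red) / (nir + red)                       # user-defined index
soil_mask = ndvi < 0.3                                 # mask out soil pixels
ndvi_masked = np.where(soil_mask, np.nan, ndvi)

# Plot segmentation with a simple uniform grid: mean index per 50x50-pixel cell.
cell = 50
plot_means = np.nanmean(
    ndvi_masked.reshape(500 // cell, cell, 500 // cell, cell), axis=(1, 3))
print(plot_means.shape)   # one value per plot cell, ready for export and statistics
```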
Bioluminescence is used as a marker, e.g., in genetic or plant pathology studies. We developed a method to monitor bioluminescence at the whole-plant level combined with phenotypic analysis of the plant. Using a CCD camera mounted in a cabinet shielding all external light, we can image weak luminescence emissions from samples and map these onto RGB images. Image processing delivers temporal and spatial data on the distribution of the luminescence together with phenotypic features of the plants. With this technology, microbial colonization of plants can be monitored. Arabidopsis plants were inoculated with Pseudomonas and Xanthomonas plant-pathogenic bacteria labelled with a gene cassette for autonomous luminescence, and disease progression was monitored over time. Luminescence imaging revealed accumulation of the bacteria in different plant tissues, while the RGB images served to monitor plant growth and the occurrence of disease symptoms. Applying this method, resistant plants could be selected from a mutant population. Disease responses of susceptible plants were compared to the responses of resistant plants. In the case of Pseudomonas, bacterial abundance reached its maximum two to four days after inoculation, at a time when water soaking of the leaves could also be observed with the RGB camera. At later stages, five to seven days after inoculation, disease symptoms such as leaf yellowing and tissue collapse occurred while bacterial populations appeared to decrease. With this method it was possible to monitor pathogen development and disease progression non-invasively at the whole-plant level over time.
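As a schematic of mapping the luminescence signal onto the RGB image, the sketch below false-colors a low-light frame and overlays it on an RGB frame of the same plant; the stand-in arrays and the normalization/overlay choices are assumptions, not the actual camera pipeline.

```python
# Map a low-light luminescence frame onto the RGB image of the same plant:
# normalize the counts, false-color them, and blend with the RGB frame.
import numpy as np
import cv2

rng = np.random.default_rng(7)
rgb = rng.integers(0, 255, size=(480, 640, 3), dtype=np.uint8)     # stand-in plant RGB frame
lum = rng.poisson(2.0, size=(480, 640)).astype(np.float32)         # stand-in long-exposure counts

lum_norm = cv2.normalize(lum, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
lum_color = cv2.applyColorMap(lum_norm, cv2.COLORMAP_JET)          # false-color the signal
overlay = cv2.addWeighted(rgb, 0.6, lum_color, 0.4, 0)             # co-registered overlay view

total_signal = float(lum.sum())                                    # per-plant luminescence measure
print(overlay.shape, f"total counts: {total_signal:.0f}")
```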