PhytoOracle: Scalable, modular phenomic data processing pipelines
  • Emmanuel Gonzalez,
  • Ariyan Zarei,
  • Nathanial Hendler,
  • Michele Cosi,
  • Jeffrey Demieville,
  • Sebastian Calleja,
  • Travis Simmons,
  • Holly Ellingson,
  • Nirav Merchant,
  • Eric Lyons,
  • Duke Pauli
Affiliation (all authors): University of Arizona

Corresponding Author: Emmanuel Gonzalez, [email protected]

Abstract

Past crop yield improvements have largely resulted from new management strategies, mechanization, and the application of emerging technologies. While these approaches have led to stable, linear improvements, crop yield gains are now plateauing. The use and improvement of rapid, automated, and accurate phenomic selection methods that leverage high-resolution data collected throughout a growing season could help identify the stress-adaptive traits needed to meet growing global food demand. As the capacity of phenomics to generate larger, higher-dimensional data sets improves, there is an urgent need to develop and implement robust, scalable data processing pipelines that deliver processed results rapidly. Current phenomics processing pipelines lack modularity and the ability to exploit the distributed computational infrastructure required for machine learning (ML)-based workloads. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines that aim to improve data processing efficiency for plant science research. PO integrates open-source frameworks for distributed task management on local, cloud, or high-performance computing (HPC) systems. Each pipeline component is available as a standalone container that can be deployed independently or linked into a pipeline. Additionally, researchers can swap between available containers or integrate new ones suited to their specific research. PO extracts phenotypic trait values such as volume, height, canopy temperature, and maximum quantum efficiency of photosystem II (Fv/Fm) from data captured in field settings, enabling the study of phenotypic variation to elucidate the genetic components of quantitative traits.
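
For readers unfamiliar with the Fv/Fm trait named above, the following minimal sketch shows the standard definition of maximum quantum efficiency of photosystem II, Fv/Fm = (Fm - F0) / Fm, where F0 is the minimum and Fm the maximum fluorescence of a dark-adapted sample. It is an illustration only and is not taken from the PO code base; the function name `fv_fm` and the sample readings are hypothetical.

```python
def fv_fm(f0: float, fm: float) -> float:
    """Return Fv/Fm = (Fm - F0) / Fm for one dark-adapted measurement.

    f0: minimum fluorescence; fm: maximum fluorescence (same arbitrary units).
    The name and signature are illustrative assumptions, not a PhytoOracle API.
    """
    if fm <= 0:
        raise ValueError("Fm must be positive")
    if f0 < 0 or f0 > fm:
        raise ValueError("F0 must satisfy 0 <= F0 <= Fm")
    return (fm - f0) / fm


# Hypothetical readings in arbitrary fluorescence units.
ratio = fv_fm(f0=200.0, fm=1000.0)
print(f"Fv/Fm = {ratio:.2f}")  # prints "Fv/Fm = 0.80"
```

In an image-based workflow the same ratio would presumably be computed per pixel or per plant from F0 and Fm images, but that step depends on sensor details not described in the abstract.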