Denise Hills and 17 more

This article is composed of three independent commentaries about the state of ICON principles (Goldman et al., 2021a) in Earth and Space Science Informatics (ESSI) and discusses the opportunities and challenges of adopting them. Each commentary focuses on a different topic: (Section 2) global collaboration, cyberinfrastructure, and data sharing; (Section 3) machine learning for multiscale modeling; (Section 4) aerial and satellite remote sensing for advancing Earth system model development by integrating field and ancillary data. ESSI addresses data management practices, computation and analysis, and hardware and software infrastructure. Our role in ICON science therefore involves collaborative work to assess, design, implement, and promote practices and tools that enable effective data management, discovery, integration, and reuse for interdisciplinary work in the Earth and space science disciplines. Networks of diverse people with expertise across Earth, space, and data science disciplines are essential for efficient and ethical exchanges of FAIR research products and practices. Our challenge is then to coordinate the development of standards, curation practices, and tools that enable integrating and reusing multiple data types, software, multi-scale models, and machine learning approaches across disciplines in a way that is as open and/or FAIR as ethically possible. This is a major endeavor that could greatly increase the pace and potential of interdisciplinary scientific discovery.

Valerie C Hendrix and 13 more

Diverse, complex data are a significant component of Earth science's "big data" challenge. Some Earth science data, like remote sensing observations, are well understood, are uniformly structured, and have well-developed standards that are broadly adopted within the scientific community. Unfortunately, for other types of Earth science data, like ecological, geochemical, and hydrological observations, few standards exist and their adoption is limited. The synthesis challenge is compounded in interdisciplinary projects in which many disciplines, each with its own culture, must synthesize data to solve cutting-edge research questions. Data synthesis for research analysis is a common, resource-intensive bottleneck in data management workflows. We have faced this challenge in several U.S. Department of Energy (DOE) research projects in which data synthesis is essential to addressing the science. These projects include AmeriFlux, Next Generation Ecosystem Experiment (NGEE) - Tropics, the Watershed Function Science Focus Area, Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), and a DOE Early Career project using data-driven approaches to predict water quality. In these projects, we have taken a range of approaches to support (meta)data synthesis. At one end of the spectrum, data providers apply well-defined standards or reporting formats before sharing their data; at the other, data users apply standards after data acquisition. As these projects continue to evolve, we have gained insights from these experiences, including the advantages and disadvantages of each approach, how project history and resources led to the choice of approach, and how each approach enabled data harmonization. In this talk, we discuss the pros and cons of the various approaches and present flexible applications of standards to support diverse needs when dealing with complex data.
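The "data users apply standards after data acquisition" end of the spectrum can be illustrated with a minimal harmonization sketch: provider-specific column names and units are mapped onto a common target schema after the data arrive. The field names, provider labels, and unit conversions below are hypothetical illustrations, not the actual schemas used by these projects.

```python
# Sketch of post-acquisition harmonization: mapping provider-specific
# column names and units onto a common target schema.
# All field names and providers here are hypothetical examples.

# Target schema the data user wants after acquisition.
TARGET_FIELDS = ("site_id", "timestamp", "water_temp_c")

# Per-provider mappings: target field -> (source column, converter).
PROVIDER_MAPPINGS = {
    "provider_a": {
        "site_id": ("Station", str),
        "timestamp": ("Date", str),
        # Provider A reports temperature in Fahrenheit; convert to Celsius.
        "water_temp_c": ("Temp_F", lambda f: round((float(f) - 32) * 5 / 9, 2)),
    },
    "provider_b": {
        "site_id": ("site", str),
        "timestamp": ("datetime", str),
        # Provider B already reports Celsius; just parse the number.
        "water_temp_c": ("temperature_c", float),
    },
}

def harmonize(record, provider):
    """Translate one raw record into the common schema."""
    mapping = PROVIDER_MAPPINGS[provider]
    out = {}
    for target in TARGET_FIELDS:
        source_col, convert = mapping[target]
        out[target] = convert(record[source_col])
    return out

row_a = {"Station": "ER-01", "Date": "2021-06-01", "Temp_F": "59"}
print(harmonize(row_a, "provider_a"))
# {'site_id': 'ER-01', 'timestamp': '2021-06-01', 'water_temp_c': 15.0}
```

The same mapping-table idea supports the other end of the spectrum: data providers can run it over their own files before sharing, so downstream users receive already-standardized records.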

Dylan O'Ryan and 8 more

Data standardization can enable data reuse by streamlining the way data are collected, providing descriptive metadata, and enabling machine readability. Standardized open-source data can be more readily reused in interdisciplinary research that requires large amounts of data, such as climate modeling. Despite the importance given to both FAIR (Findable, Accessible, Interoperable, Reusable) data practices and the need for open-source data, a remaining question is how community data standards and open-source data can be adopted by research data providers to ultimately achieve FAIR data practices. In an attempt to answer this question, we applied newly created water quality community data reporting formats to open-source water quality data. This water quality data format was developed alongside several other related reporting formats (e.g., CSV and sample metadata reporting formats) and targets a research community that has historically published water quality data in a variety of formats. The water quality community data format aims to standardize how these types of data are stored in the ESS-DIVE (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem) data repository. Adoption of these formats will also follow FAIR practices, increase machine readability, and increase the reuse of these data. We applied this community format to open-source water quality data produced by the Watershed Function Scientific Focus Area (WFSFA), a large watershed study in the East River, Colorado, which involves many national laboratories, institutions, scientists, and disciplines. In this presentation, we provide a demonstration of a relatively efficient process for converting open-source water quality data into a format that adheres to a community data standard.
We created examples of water quality data translated into the reporting formats that demonstrate the functionality of these data standards: descriptive metadata and sample names, streamlined data entries, and increased machine readability were all products of this translation. As the community data standards are integrated into the WFSFA data collection processes, and ultimately those of all ESS-DIVE data providers, these steps may enable interdisciplinary data discovery, increase reuse, and advance FAIR data practices.
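The kind of translation described above can be sketched as a small conversion step: raw rows are rewritten under standardized, machine-readable column names, with descriptive metadata carried in a comment-style header. The header keys, column names, and sample naming below are illustrative assumptions, not the actual ESS-DIVE water quality reporting format.

```python
# Sketch of translating raw water quality records into a standardized
# CSV layout with a descriptive metadata header. Header keys, column
# names, and sample names are hypothetical, not the real ESS-DIVE format.
import csv
import io

# Standardized, machine-readable column names (illustrative).
STANDARD_COLUMNS = ["Sample_Name", "DateTime", "pH"]

def to_reporting_format(raw_rows, metadata):
    """Render rows under a '# key: value' metadata header using the
    standardized column names, returning the file contents as a string."""
    buf = io.StringIO()
    for key, value in metadata.items():
        buf.write(f"# {key}: {value}\n")
    writer = csv.DictWriter(buf, fieldnames=STANDARD_COLUMNS)
    writer.writeheader()
    for row in raw_rows:
        # Map the provider's ad hoc keys onto the standard columns.
        writer.writerow({
            "Sample_Name": row["sample"],
            "DateTime": row["date"],
            "pH": row["ph"],
        })
    return buf.getvalue()

raw = [{"sample": "ER_SW_001", "date": "2021-06-01T09:30", "ph": "7.8"}]
meta = {"Project": "WFSFA (example)", "Variable_Units": "pH: standard units"}
print(to_reporting_format(raw, meta))
```

Keeping the descriptive metadata in the same file as the data is one way such formats stay self-describing: a downstream script can skip the `#` lines to read the table, while a human reader still sees the units and provenance.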