Diverse, complex data are a significant component of Earth science’s “big data” challenge. Some Earth science data, like remote sensing observations, are well understood, are uniformly structured, and have well-developed standards that are adopted broadly within the scientific community. Unfortunately, for other types of Earth science data, like ecological, geochemical, and hydrological observations, few standards exist and their adoption is limited. The synthesis challenge is compounded in interdisciplinary projects in which many disciplines, each with its own culture, must synthesize data to solve cutting-edge research questions. Data synthesis for research analysis is a common, resource-intensive bottleneck in data management workflows. We have faced this challenge in several U.S. Department of Energy (DOE) research projects in which data synthesis is essential to addressing the science. These projects include AmeriFlux, Next Generation Ecosystem Experiment (NGEE) - Tropics, Watershed Function Science Focus Area, Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), and a DOE Early Career project using data-driven approaches to predict water quality. In these projects, we have taken a range of approaches to support (meta)data synthesis. At one end of the spectrum, data providers apply well-defined standards or reporting formats before sharing their data; at the other, data users apply standards after data acquisition. As these projects have evolved, we have gained insights into the advantages and disadvantages of each approach, how project history and resources shaped the choice of approach, and how each approach enabled data harmonization. In this talk, we discuss the pros and cons of the various approaches and present flexible applications of standards to support diverse needs when dealing with complex data.
The U.S. Department of Energy’s (DOE) East River community observatory (ER) in the Upper Colorado River Basin was established in 2015 as a representative mountainous, snow-dominated watershed to study hydrobiogeochemical responses to hydrological perturbations in headwater systems. Led by the Watershed Function Science Focus Area (SFA), the ER pairs long-term, spatially extensive observations with experimental campaigns. The Watershed Function SFA, led by Lawrence Berkeley National Laboratory, includes researchers from over 30 organizations who conduct cross-disciplinary, process-based investigations and mechanistic modeling of watershed behavior in the ER. The data generated at the ER are extremely heterogeneous and include hydrological, biogeochemical, climate, vegetation, geological, remote sensing, and model data. Together they comprise an unprecedented collection of data and value-added products within a mountainous watershed, across multiple spatiotemporal scales, compartments, and life zones. Within five years of data collection, these datasets have already revealed insights into numerous aspects of watershed function, such as factors influencing snow accumulation and melt timing, water balance partitioning, and impacts of floodplain biogeochemistry and hillslope ecohydrology on riverine geochemical exports. Data generated by the SFA are managed and curated through its Data Management Framework. The SFA has an open data policy, and over sixty ER datasets are publicly available through relevant data repositories. A public interactive map of data collection sites run by the SFA is available to inform the broader community about SFA field activities. Here, we describe the ER and the SFA measurement network, present the public data collection generated by the SFA and partner institutions, and highlight the value of collecting multidisciplinary, multiscale measurements in representative catchment observatories.