The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open source, cloud-based data science platforms, accessed through a web browser, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making these techniques more accessible to the average scientist. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud computing knowledge. This paradigm shift has the potential to lower the threshold for entry, expand the science community, and increase opportunities for collaboration, while promoting scientific innovation, transparency, and reproducibility. Yet we have all witnessed promising new tools that seemed harmless and beneficial at the outset become damaging or limiting. What do we need to consider as this new way of doing science evolves?
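To make "bringing the analysis to the data" concrete, the following is a minimal sketch of listing satellite files hosted in commercial cloud storage without downloading an archive first. It assumes the s3fs Python library and the publicly accessible noaa-goes16 bucket on AWS; the bucket name and path layout are illustrative assumptions, not details drawn from this chapter.

```python
# A minimal sketch of working with NOAA data that lives in commercial cloud
# storage, rather than transferring a whole holding to a local machine.
# Assumes the s3fs library and the public "noaa-goes16" bucket on AWS.
import s3fs

# Anonymous (no-credential) access to a public bucket.
fs = s3fs.S3FileSystem(anon=True)

# GOES-16 imagery is organized by product / year / day-of-year / hour.
hour_of_files = fs.ls("noaa-goes16/ABI-L2-CMIPC/2022/001/00")

# A scientist can inspect or open these objects directly from the cloud.
print(len(hour_of_files), "files found")
print(hour_of_files[:3])
```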

Jenny Dissen and 5 more

The National Oceanic and Atmospheric Administration (NOAA) research to operations (R2O) experiment called the Big Data Project (BDP) was envisioned as a scalable approach for disseminating exponentially increasing NOAA observation, model, and research datasets to the public using commercial cloud services. At the start of the project, during the concept development phase, it was unclear how the specifics might work, so a spiral development approach was adopted. It was expected that the number of datasets would increase, that the data extent would grow to cover complete records of some holdings, and that experimentation with formats would be needed to determine optimal cloud offerings. This dissemination model would require a new way for the BDP and NOAA to engage with end users, who could range from large enterprises to small businesses and individuals. The BDP was expected to change the game, not just by reaching a broad and diverse set of users but by encouraging new ones. As Dr. Kathy Sullivan, the former NOAA Administrator under whom the BDP began, noted, “The agency’s aim is to ‘spur innovation’ and to explore how to create a ‘global economic return on investment’” (Konkel, 2015). This chapter describes the journey of the BDP as it developed, transitioned, and evolved from an experiment to an operational enterprise function for NOAA, now known as NOAA Open Data Dissemination (NODD).

Obstacles to the Public’s Use of NOAA Environmental Data

NOAA’s mission is to understand and predict changes in climate, weather, oceans, and coasts; to share that knowledge and information with others; and to conserve and manage coastal and marine ecosystems and resources. The agency takes seriously the need to communicate NOAA’s research, data, and information for use by the Nation’s businesses and communities, allowing them to prepare for, respond to, and build resilience against sudden or prolonged changes in our natural systems. This includes climate predictions and projections; weather and water reports, forecasts, and warnings; nautical charts and navigational information; and the continuous delivery of a range of Earth observations and scientific datasets for use by the public, private, and academic sectors (NOAA About our agency, 2021).