Raza Ahmad

and 5 more

The conduct of reproducible science improves when computations are portable and verifiable. A container provides an isolated environment for running computations and thus is useful for porting applications on new machines. Current container engines, such as Linux Containers (LXC) and Docker, however, have a high learning curve, are resource-intensive, and do not address the entire reproducibility spectrum consisting of portability, repeatability, and replicability. As part of EarthCube, we have developed Sciunit (https://sciunit.run) which encapsulates application dependencies i.e, system binaries, code, data, environment, along with application provenance. The resulting research object can be easily shared and reused amongst collaborators. Sciunit can be used with HydroShare’s JupyterHub CUAHSI notebook environment, and available to the entire community for use. In this poster, we will present three new features in Sciunit which have emerged based on community-provided use cases and discussion. Sciunit is available as a command-line utility. We will: (1) showcase the new Sciunit API. This will allow data facilities to integrate Sciunit as a reproducible environment on portals, (2) show how a Sciunit container can transition to a Docker container and vice versa, and finally, (3) demonstrate the ability to contrast two containers in terms of content and metadata. We will show these capabilities with the Hydrology use case of pySUMMA, a Python API for the Structure for Unifying Multiple Modeling Alternative (SUMMA) hydrologic model.

David Tarboton

and 11 more

HydroShare is a domain specific data and model repository operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) to advance hydrologic science by enabling individual researchers to more easily share products resulting from their research. The community platform supports, not just the scientific publication summarizing a study, but also the data, models and workflow scripts used to create the scientific publication and reproduce the results therein. HydroShare accepts data from anybody, and supports Findable, Accessible, Interoperable and Reusable (FAIR) principles. HydroShare is comprised of two sets of functionality: (1) a repository for users to share and publish data and models, collectively referred to as resources, in a variety of formats, and (2) tools (web apps) that can act on content in HydroShare and support web based access to compute capability. Together these serve as a platform for collaboration and computation that integrates data storage, organization, discovery, and analysis through web applications (web apps) and that allows researchers to employ services beyond the desktop to make data storage and manipulation more reliable and scalable, while improving their ability to collaborate and reproduce results. This presentation will describe the capabilities developed for HydroShare to support the full research data management life cycle. Data can be entered into HydroShare as soon as it is collected, and initially shared only with the team directly working on the data. As analysis proceeds, tools, scripts and models that act on the data to produce research results may be stored in HydroShare resources alongside the data. At the time of publication these resources may be permanently published and receive digital object identifiers and cited in research papers. Resources may themselves include citations to the research papers, thereby linking the publications to the supporting data, scripts and models. HydroShare design choices and capabilities for establishing relationships and versioning, based on simplicity, and ease of use, and some of the challenges encountered, will be discussed.