Mike Smit - Authorea

Journals, funding agencies, and researchers increasingly expect manuscripts to include links to shared research data. Effective data sharing requires that data be findable, accessible, interoperable, and reusable (FAIR), and is thus predicated on establishing a common understanding of how to communicate: data exchange standards, common data formats, controlled vocabularies, and a communal data repository. When conducting research, however, we still communicate in shorthand that works for everyone on the team who understands our context, but whose meaning is lost when data is shared without that context. “Water temperature” means only one thing to my research team, yet can mean dozens of things outside of that context. Data sharing is thus an exercise in sharing not just the data, which is typically readily available, but also the context of that data, which requires additional effort. This effort is one of the barriers to sharing data. We’ll describe an alternative model for accepting data into a repository: the immediate ingestion of data regardless of its metadata quality, followed by behavioural nudges and crowd-sourcing features that ensure this data meets appropriate standards prior to publication. We’ll show a work-in-progress prototype software tool that supports this alternative model, capable of accepting a research data set and standardizing it to use CF conventions and ISO 8601 dates.
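
To make the "ingest first, standardize later" idea concrete, here is a minimal Python sketch of what one standardization pass might look like. It is not the prototype's actual code. The function names, the shorthand-to-CF mapping, and the list of accepted date formats are all assumptions made for illustration. In particular, mapping “water temperature” to the CF standard name `sea_water_temperature` is only one of the many possible readings mentioned above. Nothing is rejected: values that cannot be standardized are kept as submitted and reported as issues, which is the kind of gap a nudge or crowd-sourced fix could later close.

```python
from datetime import datetime

# Hypothetical mapping from team shorthand to CF standard names.
# A real tool would consult the full CF standard name table.
CF_NAME_MAP = {
    "water temperature": "sea_water_temperature",
    "salinity": "sea_water_salinity",
}

# Assumed input date formats, tried in order. Day-first is an assumption;
# ambiguous dates such as 05/03/2019 would need confirmation from the submitter.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso8601(text):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or None if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_record(record, date_field="date"):
    """Standardize what we can; keep everything else and report it as an issue."""
    standardized = {}
    issues = []
    for key, value in record.items():
        if key == date_field:
            iso = to_iso8601(str(value))
            if iso is None:
                issues.append(f"unrecognized date: {value!r}")
                standardized[key] = value  # accept as-is, fix later
            else:
                standardized[key] = iso
            continue
        cf_name = CF_NAME_MAP.get(key.strip().lower())
        if cf_name is None:
            issues.append(f"no CF standard name for: {key!r}")
            standardized[key] = value  # accept as-is, fix later
        else:
            standardized[cf_name] = value
    return standardized, issues
```

For example, calling `standardize_record({"Water Temperature": 11.2, "date": "05/03/2019", "depth??": 3})` would rename the temperature field, rewrite the date as `2019-03-05` under the day-first assumption, and flag `depth??` as lacking a CF standard name while still keeping its value.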