Justin Buck - Authorea

A gap in community practice on data citation emerged during the AGU Fall Meeting 2020 Data FAIR Town Hall, “Why Is Citing Data Still Hard?”, which aimed to address the use case of citing a large number of datasets such that credit for individual datasets is assigned properly. The discussion covered the concept of a “Data Collection” and the infrastructure and guidance still needed to fully implement that capability, so that it is easier for researchers to use and to receive credit when their data are cited in this manner. Such collections may contain thousands to millions of elements, and a citation may need to include subsets of elements, potentially drawn from multiple collections. Such citations will be crucial for enabling reproducible research and for giving credit to the creators of data and digital objects. To address this gap, the data citation community of practice was formed, including members from data centres, research journals, informatics research communities, and data citation infrastructure. The community aims to recommend an approach that leverages existing infrastructure and is realistic for researchers to use and for each stakeholder to implement. To enable citation of these subsets of large data collections, the concept of a “reliquary” is introduced. In this context, a reliquary is a container of persistent identifiers (PIDs) or references that define the objects used in a research study. It can include any number of elements and can be cited as a single entity in academic publications. The reliquary concept will enable data citation use cases such as the citation of elements within a data collection that are formed from numerous underlying datasets with their own PIDs, the unambiguous citation of data used in IPCC Assessment Reports, and the citation of subsets of research data collections that contain millions of elements.
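As an illustration only, the reliquary described above can be sketched as a small data structure: a container that has its own PID, holds the PIDs of the elements used in a study, and yields one citable string. The class name `Reliquary`, its fields, the citation format, and all identifiers below are hypothetical placeholders, not part of any recommendation from the community.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Reliquary:
    """Hypothetical container of PIDs or references used in one study."""
    pid: str    # PID of the reliquary itself (placeholder value below)
    title: str
    # A dict is used as an insertion-ordered set, so duplicate checks stay
    # cheap even when a reliquary holds a very large number of elements.
    members: Dict[str, None] = field(default_factory=dict)

    def add(self, member_pid: str) -> None:
        # Record each element once, whichever collection it comes from.
        self.members.setdefault(member_pid, None)

    def cite(self) -> str:
        # One citable string standing in for every member element.
        # The format is an assumption made for this sketch.
        return f"{self.title} ({len(self.members)} elements). {self.pid}"


# Placeholder identifiers; elements may come from several collections.
reliquary = Reliquary(pid="doi:10.0000/example-reliquary",
                      title="Data used in Study X")
for element in ["doi:10.0000/coll-a/0001",
                "doi:10.0000/coll-b/0042",
                "doi:10.0000/coll-a/0001"]:  # repeated on purpose
    reliquary.add(element)

print(reliquary.cite())
```

In this sketch a publication would cite only the reliquary's own PID, while the individual element PIDs remain recoverable from `members` so that credit can still be attributed to each underlying dataset.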
Discussions over the course of 2021 have developed a theoretical concept; at the time of writing, formal use cases and initial applications are being defined. The recommendation developed by this effort will be made available for review and comment by communities such as ESIP and RDA. All are welcome.