Justin Buck

and 6 more

The British Oceanographic Data Centre (BODC) celebrated its 50th anniversary in 2019. It holds data collected from 1773 to the present day. Holdings are multidisciplinary, heterogeneous data reflecting the full range of disciplines, platforms, and temporal and spatial fieldwork scales typically encountered in oceanographic research and monitoring. These collections vary in granularity and contain data at different stages of curation, ranging from raw data to standardised data products. BODC need to improve data services to meet the developing expectations of the research community. These include the FAIR data principles, TRUSTed repository guidelines and CoreTrustSeal accreditation. This is a significant challenge within the constraints of the resources available (both financial and human). The initial focus for BODC is making holdings citable, with the following aspirations:

- Application of DOIs to data at the point of receipt by BODC.
- Publication of data papers and publication of DOIs for data products.
- Application of persistent identifiers to low-level data granules where DOIs are not feasible.
- Application of persistent identifiers to datasets included in BODC API services, and versioning of these data.
- Work with organisations or groups who include data curated by BODC in their products, to make the provenance of data unambiguous.
- Work with communities on joint data papers where BODC are a partner organisation.

This will enable each type of data served by BODC to be unambiguously citable. The initial effort is being directed towards the application of DOIs to data submissions and the publication of data papers for BODC-curated data products.

Justin Buck

and 11 more

A gap in community practice on data citation emerged during the AGU Fall Meeting 2020 Data FAIR Town Hall, “Why Is Citing Data Still Hard?”, which aimed to address the use case of citing a large number of datasets such that credit for individual datasets is assigned properly. The discussion included the concept of a “Data Collection” and the infrastructure and guidance still needed to fully implement the capability, so that it is easier for researchers to use and to receive credit when their data are cited in this manner. Such collections of data may contain thousands to millions of elements, and a citation may need to include subsets of elements, potentially from multiple collections. Such citations will be crucial to enable reproducible research and to give credit to data and digital object creators. To address this gap, the data citation community of practice was formed, including members from data centres, research journals, informatics research communities, and data citation infrastructure. The community has the goal of recommending an approach that leverages existing infrastructure and is realistic both for researchers to use and for each stakeholder to implement. To achieve data citation of these subsets of large data collections, the concept of a “reliquary” is introduced. In this context the reliquary is a container of persistent identifiers (PIDs) or references defining the objects used in a research study. It can include any number of elements. The reliquary can then be cited as a single entity in academic publications. The reliquary concept will enable data citation use cases such as the citation of elements within a data collection that are formed from numerous underlying datasets that have their own PIDs, the unambiguous citation of data used in IPCC Assessment Reports, and the citation of subsets of collections of research data that contain millions of elements.
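The reliquary described above can be pictured as a small data structure. The following is only an illustrative sketch under stated assumptions, not a schema proposed by the community: the class names `Member` and `Reliquary`, their fields, the citation string format and the `pid:example/...` identifiers are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Member:
    """One object used in a study, referenced by its persistent identifier."""
    pid: str               # a DOI or other PID (placeholder values are used below)
    collection: str = ""   # optional: the collection the element belongs to


@dataclass
class Reliquary:
    """Hypothetical container of PIDs that is itself citable through one PID."""
    pid: str
    title: str
    members: list[Member] = field(default_factory=list)

    def add(self, member: Member) -> None:
        # Skip duplicates so that each object is listed (and credited) once.
        if member not in self.members:
            self.members.append(member)

    def collections(self) -> set[str]:
        # A single reliquary may draw elements from several collections.
        return {m.collection for m in self.members if m.collection}

    def cite(self) -> str:
        # Assumed citation format, for illustration only.
        return f"{self.title} ({len(self.members)} elements). {self.pid}"


def build_example() -> Reliquary:
    """Build a reliquary citing subsets of two made-up collections."""
    r = Reliquary(pid="pid:example/reliquary-1",
                  title="Data used in an example study")
    r.add(Member("pid:example/a/0001", collection="collection-a"))
    r.add(Member("pid:example/a/0002", collection="collection-a"))
    r.add(Member("pid:example/b/0042", collection="collection-b"))
    r.add(Member("pid:example/a/0001", collection="collection-a"))  # duplicate, ignored
    return r


if __name__ == "__main__":
    print(build_example().cite())
```

The point of the sketch is that a publication cites only the reliquary's own identifier, while the member list preserves which individual elements, from which collections, should receive credit.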
The discussions over the course of 2021 have developed a theoretical concept; at the time of writing, formal use cases and initial applications are being defined. The recommendation developed by this effort will be available for review and comment by communities such as ESIP and RDA. All are welcome.