Karsten Peters

The full-featured and CoreTrustSeal-certified long-term archiving service LTA WDCC (World Data Centre for Climate) at DKRZ (German Climate Computing Center, Hamburg) offers long-term preservation for datasets relevant to climate and Earth System research. The WDCC collects, stores, and disseminates Earth System data, with a focus on climate simulation data and climate-related data products, and has established itself as a staple infrastructure for the global climate modelling research community. Data preservation in the LTA WDCC is preceded by thorough technical quality control and is accompanied by intensive data curation for storage periods longer than 10 years. Throughout the preservation period, long-term findability, searchability and reusability of the data are ensured. Long-term findability of the curated data is enabled by the possibility of assigning DataCite DOIs to archived datasets. Before becoming eligible for DOI assignment, the data undergo additional quality checks, performed in close collaboration with the data providers. These checks aim to ensure the unambiguous (inter-)disciplinary reusability of the preserved datasets and include checking for proper documentation, adherence to domain-specific (meta)data standards, uncertainty analysis and cross-referencing. Only then can a high level of data reusability be achieved, justifying the effort involved. The perceived need for research data repositories to comply with the FAIR Guiding Principles, published in 2016, motivated us to perform an even-handed and systematic self-assessment of the FAIRness of the LTA WDCC. In the absence of a standardised evaluation framework, this assessment reflects our specific, albeit objective, interpretation of the principles.
Our assessment, published on the DKRZ webpages, shows that the native philosophy behind DKRZ’s LTA WDCC service – especially the focus on reusability – reflects the FAIR Guiding Principles by design and even goes beyond them by ensuring very long-term (>10 years) preservation and therefore reusability of archived data.
From a research data repository's perspective, offering data management services in line with the FAIR principles is increasingly a selling point in a competitive market. For this, the services offered must be evaluated and credited following transparent and credible procedures. Several FAIRness evaluation methods are openly available for application to archived (meta)data; however, no standardised and globally accepted FAIRness testing procedure exists to date. Here, we apply an ensemble of five FAIRness evaluation approaches to selected datasets archived in the WDCC. The selection represents the majority of WDCC-archived datasets (by volume) and covers the entire spectrum of data curation levels. Two of the tests are purely automatic, two are purely manual, and one applies a hybrid method combining manual and automatic evaluation. Our evaluation yields a mean FAIR score of 0.67 out of 1. Manual approaches score higher than automated ones, and the hybrid approach scores highest. Computed statistics show agreement between the tests at the data-collection level. None of the five evaluation approaches is fully fit for purpose to evaluate (discipline-specific) FAIRness, but each has its merit. Manual testing captures domain- and repository-specific aspects of FAIR, with the machine-actionability of archived (meta)data judged by the evaluator. Automatic approaches evaluate the machine-actionable features of archived (meta)data, which must be accessible to an automated agent and comply with globally established standards; an evaluation of contextual metadata, essential for reusability, is not possible. Correspondingly, the hybrid method combines the advantages and eliminates the deficiencies of manual and automatic evaluation. We therefore recommend that future operational FAIRness evaluation be based on a mature hybrid approach.
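To make the ensemble idea concrete, the following minimal sketch averages per-test FAIR scores (each normalised to the range 0 to 1) into a single ensemble score. The test names and score values are purely illustrative assumptions, not the actual results reported here:

```python
from statistics import mean

# Hypothetical per-test FAIR scores (0..1) for one archived dataset.
# Names and values are illustrative only, not the paper's results.
scores = {
    "manual_test_a": 0.78,
    "manual_test_b": 0.74,
    "automatic_test_a": 0.55,
    "automatic_test_b": 0.52,
    "hybrid_test": 0.80,
}

# The ensemble score is simply the arithmetic mean over all tests.
ensemble_score = mean(scores.values())
print(round(ensemble_score, 2))  # prints 0.68
```

In practice each tool reports scores on its own scale and granularity (per principle, per facet, or overall), so a real comparison first requires mapping every tool's output onto a common normalised scale.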
The automatic part of the evaluation would retrieve and evaluate as much machine-actionable, discipline-specific (meta)data content as possible and would then be complemented by a manual evaluation focusing on the contextual aspects of FAIR. The design and adoption of the discipline-specific aspects will have to be carried out in concerted community efforts. We illustrate a possible structure of this process with an example from climate research.
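The division of labour in such a hybrid test can be sketched as follows: automated checks score whatever is machine-actionable in the metadata record, and manually assigned judgements cover the contextual criteria that no automated agent can assess. All field names, checks and scores below are hypothetical assumptions for illustration, not a standardised test suite:

```python
# Sketch of a hybrid FAIRness check. The automatic pass inspects
# machine-actionable metadata fields; manual scores (0..1) cover
# contextual criteria such as documentation quality. Everything
# here (fields, checks, weights) is an illustrative assumption.

AUTOMATIC_CHECKS = {
    "has_pid": lambda md: "doi" in md,
    "has_license": lambda md: "license" in md,
    "uses_standard_format": lambda md: md.get("format") in {"netCDF", "GRIB"},
}

def hybrid_fair_score(metadata, manual_scores):
    """Combine binary automatic checks with manual 0..1 judgements
    into an unweighted mean over all criteria."""
    auto = [1.0 if check(metadata) else 0.0
            for check in AUTOMATIC_CHECKS.values()]
    combined = auto + list(manual_scores.values())
    return sum(combined) / len(combined)

# Hypothetical metadata record and manual judgements.
record = {"doi": "10.xxxx/example", "format": "netCDF"}
manual = {"documentation_quality": 0.8, "cross_referencing": 0.5}
print(round(hybrid_fair_score(record, manual), 2))  # prints 0.66
```

A mature hybrid procedure would of course replace the binary field checks with discipline-specific validation (e.g. against community metadata standards) and define the manual criteria in a community-agreed rubric.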