For DOI links, we used
requests and the doi.org API to resolve each DOI, and then attempted to follow the resolved link to the original source to retrieve the page domain and source. Where we were unable to resolve a link (for example, because an automatically extracted link was badly formed, or because the connection to the source page timed out), we labelled the domain as "doi.org (unresolved)".
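
The resolution step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the 10-second timeout, and the choice of the doi.org handle endpoint (\texttt{https://doi.org/api/handles/<doi>}) are ours, and the HTTP step is passed in as a parameter so that the fallback labelling can be exercised without network access.

```python
from urllib.parse import urlparse

UNRESOLVED = "doi.org (unresolved)"


def fetch_landing_url(doi, timeout=10):
    """Resolve a DOI via the doi.org handle API, then follow the target link.

    Assumption: the handle API is the 'doi.org API' meant in the text.
    """
    import requests  # imported lazily so domain_for_doi works without it

    # The handle API returns JSON whose "values" list includes a URL record.
    api = requests.get(f"https://doi.org/api/handles/{doi}", timeout=timeout)
    api.raise_for_status()
    target = next(
        v["data"]["value"]
        for v in api.json().get("values", [])
        if v.get("type") == "URL"
    )
    # Follow redirects to the original source and report the final URL.
    page = requests.get(target, timeout=timeout, allow_redirects=True)
    page.raise_for_status()
    return page.url


def domain_for_doi(doi, fetch=fetch_landing_url):
    """Return the landing-page domain, or the 'unresolved' label on any failure."""
    try:
        host = urlparse(fetch(doi)).hostname
    except Exception:
        # Badly formed DOIs, HTTP errors, and timeouts all end up here.
        return UNRESOLVED
    return host or UNRESOLVED
```

Treating every failure mode identically keeps the labelling consistent with the single "doi.org (unresolved)" category described above, at the cost of not distinguishing malformed links from transient network errors.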
It is worth noting that prominent repositories, for example those provided by ebi.ac.uk and nih.gov, use accession numbers rather than DOIs, so any accession numbers present in DASs were not included. Finally, we grouped and filtered domains by journal using the
Wiley Online Library hierarchy of subject classifications (Table
\ref{133180} and the data in the associated CSV file), and plotted the results using
Plotly. Table
\ref{133180} shows the WOL Level 1 and Level 2 subject areas that reported the largest number of identified domains for each WOL Level 1 category. "WOL Level 1" is the top level of that classification hierarchy. "WOL Level 2" is the second level, which we use throughout our analysis to describe disciplines and subject areas more specifically. Each WOL Level 1 group contains multiple WOL Level 2 subject groups (for example, the Life Sciences WOL Level 1 category encompasses subject areas such as Cell Biology, Microbiology, and Genetics). Figures
\ref{140748} and
\ref{372269} provide benchmarks, showing the numbers of DASs from submitted articles and repositories for WOL Level 1 and 2 categories.
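
The grouping by subject classification can be illustrated with a short sketch. It assumes a hypothetical mapping from journal name to a (WOL Level 1, WOL Level 2) pair and a list of (journal, domain) records; both structures and the function names are ours. It also assumes that "number of identified domains" means the number of domain occurrences rather than distinct domains.

```python
from collections import Counter


def count_domains_by_subject(records, journal_subjects):
    """Count identified domains per (Level 1, Level 2) subject pair.

    records: iterable of (journal, domain) tuples.
    journal_subjects: dict mapping journal -> (level1, level2); hypothetical.
    """
    counts = Counter()
    for journal, _domain in records:
        subject = journal_subjects.get(journal)
        if subject is None:
            continue  # filter out journals with no classification
        counts[subject] += 1
    return counts


def top_level2_per_level1(counts):
    """For each Level 1 category, keep the Level 2 area with the most domains."""
    best = {}
    for (level1, level2), n in counts.items():
        if level1 not in best or n > best[level1][1]:
            best[level1] = (level2, n)
    return best
```

The resulting counts could then be passed to a plotting library such as Plotly.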