Methods
All analyses were conducted in R v.4.0.2 (R Core Team, 2022). We used
tidyverse v.1.3.2 (Wickham et al., 2019) to clean and handle the data,
sf v.1.0-9 (Pebesma, 2018) and raster v.3.6 (Hijmans, 2023) for spatial
data manipulation, and ggplot2 v.3.4.0 (Wickham et al., 2019) for
visualization.
Data
Co-occurrence dataset
We used the same non-standardized co-occurrence dataset as
Pacheco-Riaño et al. (2023), which consisted of 605,637 records of
terrestrial vascular plants (3,617 taxa) collected in Norway between
1900 and 2007 at particular locations without marked spatial boundaries.
These records had their origins in a series of field campaigns,
primarily involving relevés, conducted by various collectors across
Norway since the early 1900s. The records were subsequently documented
in the field notes of the Agder naturmuseum og botaniske hage, the
University of Oslo, the University of Trondheim, and the Norwegian
University of Life Sciences, and later integrated into GBIF as
occurrence data. To reconstruct the co-occurrence plots, we
grouped the data by coordinates, year, elevation, and collector as
described in Pacheco-Riaño et al. (2023). We kept only taxa identified
at species level, merging lower taxonomic units to species level with
GBIF’s backbone taxonomy tool (The Global Biodiversity Information
Facility, 2020). This resulted in 41,993 co-occurrence plots with 2,888
species, spanning 1905 to 2007 (Fig. 1).
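The grouping step can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R workflow used in this study; the field names and example records are hypothetical, and taxonomic harmonization is assumed to have been done beforehand.

```python
from collections import defaultdict

# Hypothetical example records; field names are assumptions, not the GBIF schema.
records = [
    {"species": "Betula nana", "lon": 8.10, "lat": 60.50,
     "year": 1932, "elevation": 1100, "collector": "A"},
    {"species": "Salix herbacea", "lon": 8.10, "lat": 60.50,
     "year": 1932, "elevation": 1100, "collector": "A"},
    # Repeated record of the same species in the same plot.
    {"species": "Betula nana", "lon": 8.10, "lat": 60.50,
     "year": 1932, "elevation": 1100, "collector": "A"},
    # Same location, but different year and collector: a separate plot.
    {"species": "Betula nana", "lon": 8.10, "lat": 60.50,
     "year": 1950, "elevation": 1100, "collector": "B"},
]


def build_plots(records):
    """Group records that share coordinates, year, elevation and collector."""
    plots = defaultdict(set)
    for r in records:
        key = (r["lon"], r["lat"], r["year"], r["elevation"], r["collector"])
        plots[key].add(r["species"])  # a set keeps each species once per plot
    return dict(plots)


plots = build_plots(records)
```

Each key of `plots` corresponds to one reconstructed co-occurrence plot, and its value is the set of species recorded there.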
Presence-only dataset
We downloaded from GBIF all global georeferenced records for the plant
species included in the non-standardized co-occurrence dataset described
above (3,617 taxa, Table S1). The download comprised 212,286,166 records
from 6,831 datasets (accessed December 8th, 2022;
https://doi.org/10.15468/DL.VZVGK7; GBIF.org, 2022).
We applied six automated cleaning procedures to eliminate known issues
with presence-only data using the package “CoordinateCleaner” (Zizka
et al., 2019): 1) equal coordinates (records with identical longitude
and latitude), 2) zero coordinates (plain zeros in the coordinates and a
radius around the point 0°, 0°), 3) capitals (radius around capital
cities), 4) centroids
(radius around country and province centroids), 5) sea coordinates
(non-terrestrial records), and 6) biodiversity institutions (radius
around biodiversity institutions). All flagged records, as well as
records with missing coordinates or missing record years, were removed.
As for the co-occurrence dataset, we harmonized the taxonomy using
GBIF’s backbone taxonomy tool (The Global Biodiversity Information
Facility, 2020). We kept records identified at the species or
infraspecific level (e.g., subspecies) and merged all records to
species level. Subsequently,
duplicated records with the same species names, coordinates and year
were removed, and only occurrences within Europe (longitude: -12° to
35°, latitude: 45° to 72°) were kept. Lastly, to prevent any duplication
with the co-occurrence dataset, we excluded records with the same
species names and geographical coordinates as records in the
co-occurrence dataset. This process resulted in a cleaned, presence-only
dataset of 68,112,130 occurrence records of 2,742 species.
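The order of the filters described above can be sketched as follows. This is an illustrative Python simplification, not the CoordinateCleaner procedure used in this study: only the missing-value, equal-coordinate and zero-coordinate tests are mimicked (without any radius), while the capital, centroid, sea and institution tests are omitted because they require reference data. The function name, field names and example records are hypothetical.

```python
def clean_presence_only(records, cooccurrence_keys):
    """Apply simplified versions of the filters, in the order given in the text."""
    seen = set()
    kept = []
    for r in records:
        lon, lat, year = r.get("lon"), r.get("lat"), r.get("year")
        if lon is None or lat is None or year is None:
            continue  # missing coordinates or missing record year
        if lon == lat:
            continue  # equal longitude and latitude
        if lon == 0 or lat == 0:
            continue  # plain zeros (simplified: no radius around 0°, 0°)
        key = (r["species"], lon, lat, year)
        if key in seen:
            continue  # duplicate of species, coordinates and year
        seen.add(key)
        if not (-12 <= lon <= 35 and 45 <= lat <= 72):
            continue  # outside the European bounding box
        if (r["species"], lon, lat) in cooccurrence_keys:
            continue  # already present in the co-occurrence dataset
        kept.append(r)
    return kept


# Hypothetical example data.
raw = [
    {"species": "Betula nana", "lon": 10.5, "lat": 60.0, "year": 1990},
    {"species": "Betula nana", "lon": 10.5, "lat": 60.0, "year": 1990},     # duplicate
    {"species": "Betula nana", "lon": 11.0, "lat": 61.0, "year": None},     # no year
    {"species": "Betula nana", "lon": 50.0, "lat": 50.0, "year": 1990},     # equal coords
    {"species": "Betula nana", "lon": 0.0, "lat": 51.5, "year": 1990},      # zero coord
    {"species": "Betula nana", "lon": -70.0, "lat": 45.0, "year": 1990},    # not Europe
    {"species": "Salix herbacea", "lon": 8.1, "lat": 60.5, "year": 2001},   # overlap
]
cooccurrence_keys = {("Salix herbacea", 8.1, 60.5)}

cleaned = clean_presence_only(raw, cooccurrence_keys)
```

In this toy example only the first record survives all filters.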
To build “communities” with spatio-temporal information from the
presence-only data, we created “pseudo-plots” by aggregating species
occurrences temporally and spatially, using the same grid cells as the
temperature dataset described below (i.e., 30 arc seconds resolution).
Only pseudo-plots from 1905 to 2016 were kept for further analyses
(Fig. 1).
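A minimal sketch of the pseudo-plot aggregation is given below, in Python for illustration only. It rests on two assumptions that are not stated in the text: that the 30 arc second grid is aligned to a global origin at longitude -180° and latitude 90°, and that the temporal aggregation unit is the calendar year. The function and field names are hypothetical.

```python
import math
from collections import defaultdict

CELLS_PER_DEGREE = 120  # 30 arc seconds = 1/120 of a degree


def cell_index(lon, lat):
    """Return (row, col) of the grid cell containing a point.

    Assumes a global grid with its origin at (-180°, 90°); the real
    temperature grid may be aligned differently.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    return row, col


def build_pseudo_plots(records, first_year=1905, last_year=2016):
    """Aggregate occurrences by grid cell and year (assumed temporal unit)."""
    plots = defaultdict(set)
    for r in records:
        if first_year <= r["year"] <= last_year:
            key = (cell_index(r["lon"], r["lat"]), r["year"])
            plots[key].add(r["species"])
    return dict(plots)


# Hypothetical example occurrences.
occurrences = [
    {"species": "Betula nana", "lon": 10.501, "lat": 60.001, "year": 1990},
    {"species": "Salix herbacea", "lon": 10.503, "lat": 60.003, "year": 1990},
    {"species": "Betula nana", "lon": 10.501, "lat": 60.001, "year": 1991},
    {"species": "Betula nana", "lon": 10.501, "lat": 60.001, "year": 1890},
]

pseudo_plots = build_pseudo_plots(occurrences)
```

The first two occurrences fall in the same cell and year and therefore form one pseudo-plot; the third forms a second pseudo-plot, and the fourth is dropped because it predates 1905.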