Introduction

Ongoing climate warming induces range shifts of species which track their optimal temperature conditions (Lenoir et al., 2020; Mamantov et al., 2021; O’Sullivan et al., 2021; Steinbauer et al., 2018). These shifts subsequently lead to local reorganizations of species communities and assemblages (Menéndez et al., 2006; Walther, 2010), often resulting in a relative increase of warm-demanding species and/or a decreasing number of cold-demanding species, a pattern referred to as thermophilization (Gottfried et al., 2012). Thermophilization has been reported globally from a wide variety of habitats and areas, from mountain tops (Gottfried et al., 2012) to forests (Zellweger et al., 2020) and from temperate (Pacheco‐Riaño et al., 2023) to tropical regions (Fadrique et al., 2018).
Studies on multi-decadal vegetation responses to climate change such as thermophilization commonly rely on permanent or semi-permanent plot-based datasets as the main source of information (Bertrand et al., 2011; Chytrý et al., 2014; Fadrique et al., 2018; Freeman et al., 2021; Götzenberger et al., 2012; Kapfer et al., 2017; Richard et al., 2021). Permanent plots are best suited to track vegetation dynamics, and multiple initiatives were set up at the beginning of the 21st century for continuous long-term monitoring (Chytrý et al., 2014; Gottfried et al., 2012; Haider et al., 2022). Although the number of permanent plots is continuously increasing, they are still geographically scattered, and most cover relatively short time spans. Historical co-occurrence plots (herein broadly defined as species records co-occurring in a specific site) were initially surveyed to describe the structure and diversity of vegetation types (e.g., phytosociological plots, also called relevés). Resurveys of these semi-permanent plots have also proven to be a valuable source of information to describe vegetation dynamics over decades (Kapfer et al., 2017), study range shifts (Felde et al., 2012; Lenoir et al., 2008; Rumpf et al., 2018) and thermophilization responses (Pacheco‐Riaño et al., 2023).
To study vegetation dynamics over time, an alternative to co-occurrence datasets is the aggregation of species occurrence records (i.e., presence-only data from individual species observations) such as museum and herbaria collections, or, more recently, observations from various structured or unstructured citizen science projects. Presence-only data have been collected extensively over the last century and their number has increased enormously over the past 20 years (Heberling et al., 2021). This type of data generally has more extensive temporal and spatial coverage than co-occurrence data due to the vast network of data collectors, but comes at the cost of missing information about absences. The world’s largest biodiversity data network, mediated by the Global Biodiversity Information Facility (GBIF; http://gbif.org), stands as the leading open-access data portal for geo-referenced species occurrence data collected from a myriad of different sources (König et al., 2019; Wüest et al., 2020). GBIF provides access to more than 1.5 billion species records from across the globe and the tree of life. In addition to lacking absence data, many records are prone to identification errors and to spatially biased sampling arising from the diversity of collectors and data sources, among other issues (Beck et al., 2014; Meyer et al., 2016). Therefore, these data are commonly considered unreliable for many community analyses and have so far only been exploited to a limited extent to assess community responses to global warming (Bottin et al., 2020; Duchenne et al., 2021; Feeley, 2012; Lajeunesse & Fourcade, 2023). Nevertheless, the availability of presence-only data offers the opportunity to investigate community responses in various ways. Presence-only data can be used for regions or species that lack historical co-occurrence data, or for regions where contemporary data are limited. A combination of both approaches is also worth considering, as integrating presence-only data with co-occurrence data can help mitigate some of the biases inherent to each data type.
Assessing thermophilization relies on species co-occurrences in a specific area combined with species-specific thermal indicator values to calculate community temperature indices (CTI), i.e., the (weighted) average of the thermal indicator values for species assemblages. The CTI approach is an effective way to summarize thermophilization trends by comparing changes in CTI over time (Duque et al., 2015; Feeley et al., 2020; Freeman et al., 2021; Richard et al., 2021). It can provide an unbiased estimation of thermophilization regardless of sampling differences, as long as neither warm-demanding nor cold-adapted species are collected disproportionately relative to their actual occurrences in the area. In other words, if sampling effort does not favour species with a particular thermal preference, the CTI can be assumed to accurately reflect the degree of thermophilization in each community. Presence-only data could thus hold great potential to fill spatial and temporal gaps in studies of species dynamics and, therefore, allow for a more comprehensive understanding of climate-driven responses of species across their geographical ranges (König et al., 2019).
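As an illustration of the CTI definition above, the following minimal sketch computes the index as an abundance-weighted mean of species thermal indicator values. The function name, the three-species assemblage and the indicator values (in °C) are hypothetical and serve only to show the arithmetic; with presence-only data all weights are 1 and the index reduces to a simple mean.

```python
import numpy as np


def community_temperature_index(abundances, thermal_values):
    """Return the CTI of one assemblage.

    abundances     : per-species abundance or cover (use 0/1 for presence-absence)
    thermal_values : per-species thermal indicator values (hypothetical, in deg C)
    """
    abundances = np.asarray(abundances, dtype=float)
    thermal_values = np.asarray(thermal_values, dtype=float)
    if abundances.shape != thermal_values.shape:
        raise ValueError("abundances and thermal_values must have the same shape")
    if abundances.sum() <= 0:
        raise ValueError("assemblage contains no species")
    # Weighted mean: sum(w_i * t_i) / sum(w_i)
    return float(np.average(thermal_values, weights=abundances))


# Hypothetical indicator values for a cold-, an intermediate- and a warm-demanding species
thermal_values = [4.0, 8.0, 12.0]

cti_historical = community_temperature_index([1, 1, 0], thermal_values)  # cold + intermediate
cti_recent = community_temperature_index([0, 1, 1], thermal_values)      # intermediate + warm

# A positive change in CTI over time indicates thermophilization
print(cti_historical, cti_recent, cti_recent - cti_historical)
```

In this toy case the cold-demanding species is lost and a warm-demanding species is gained, so the CTI rises between the two surveys.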
One approach to quantify the CTI is a technique from palaeoecology, the transfer function approach (Bertrand et al., 2011; Pacheco‐Riaño et al., 2023). Transfer functions are mathematical models that represent the relationship between species occurrences and environmental variables from a certain period, assuming that species have symmetrical, unimodal response curves with an ecological optimum for climate variables (Hutchinson, 1957). If the relationship between species and climate remains constant through time (ecological uniformitarianism) (Rull, 2010), the inferred species-climate relationship can be used to reconstruct past or present climates from community composition (Salonen et al., 2011). Transfer functions have been used extensively by palaeoecologists, who first calibrate them on current relationships between species co-occurrences and climatic conditions (Guiot & de Vernal, 2007; Juggins & Birks, 2012; Schäbitz et al., 2013) and then apply them to community compositions from sediment cores to reconstruct palaeoclimates (Birks & Simpson, 2013). This approach has been used to reconstruct various environmental conditions, from water chemistry (e.g., pH) using diatoms (ter Braak & Juggins, 1993), to temperature (Chevalier et al., 2020) and precipitation (Lu et al., 2019) using fossil pollen. A corresponding approach has recently been utilized in modern ecology to estimate thermophilization by inferring temperature from co-occurrence vegetation data based on a CTI approach (Bertrand et al., 2011; Pacheco‐Riaño et al., 2023). In this case, historical co-occurrence plots sampled prior to major climatic changes were used to calibrate a transfer function, which was subsequently used to project the CTI based on more recent vegetation plot data (Bertrand et al., 2011; Bhatta et al., 2018; Pacheco‐Riaño et al., 2023). Based on this approach, thermophilization can be estimated as the difference between the floristically inferred temperature (i.e., CTI) and the observed temperature from the calibration period (Pacheco‐Riaño et al., 2023).
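The calibrate-then-project logic described above can be sketched with a simple weighted-averaging transfer function. This is an assumption made for illustration only: the cited studies may use other model families (e.g., weighted averaging with deshrinking or partial least squares), and all function names, plot matrices and temperatures below are hypothetical. Species optima are estimated from calibration plots as the abundance-weighted mean of observed temperature, the CTI of recent plots is inferred from those optima, and thermophilization is the inferred temperature minus the temperature observed during the calibration period.

```python
import numpy as np


def calibrate_optima(Y_cal, temp_cal):
    """Estimate species thermal optima from calibration plots.

    Y_cal    : plots x species matrix of abundances (or 0/1 occurrences)
    temp_cal : observed temperature of each calibration plot
    """
    Y_cal = np.asarray(Y_cal, dtype=float)
    temp_cal = np.asarray(temp_cal, dtype=float)
    totals = Y_cal.sum(axis=0)
    if np.any(totals == 0):
        raise ValueError("every species must occur in at least one calibration plot")
    # Optimum of species k: sum_i(y_ik * T_i) / sum_i(y_ik)
    return (temp_cal @ Y_cal) / totals


def infer_temperature(Y, optima):
    """Floristically inferred temperature (CTI) for each plot in Y."""
    Y = np.asarray(Y, dtype=float)
    return (Y @ optima) / Y.sum(axis=1)


def thermophilization(Y_recent, optima, temp_reference):
    """Inferred temperature of recent plots minus calibration-period temperature."""
    return infer_temperature(Y_recent, optima) - np.asarray(temp_reference, dtype=float)


# Hypothetical calibration data: 3 historical plots x 3 species
Y_cal = [[1, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]
temp_cal = [4.0, 8.0, 12.0]
optima = calibrate_optima(Y_cal, temp_cal)

# Hypothetical recent surveys of two sites with calibration-period temperatures 4 and 8
Y_recent = [[0, 1, 1],
            [0, 0, 1]]
print(optima)
print(thermophilization(Y_recent, optima, [4.0, 8.0]))
```

Plain weighted averaging is known to compress inferred values towards the mean of the calibration gradient, which is why a deshrinking step is often added in practice; that step is omitted here for brevity.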
Exploiting the vast amount of presence-only data to analyse the responses of communities to climate warming requires, however, a rigorous evaluation of robustness and reliability (Bayraktarov et al., 2019). Therefore, our aim was to answer two key questions. First, we wanted to determine whether changes in community dynamics due to climate warming, as deduced from presence-only data from GBIF, corresponded with those deduced from co-occurrence plot data. Second, we aimed to assess whether these two datasets could be used interchangeably, either individually or in combination, to yield similar community responses during the model calibration or prediction phases. In our study, we combined co-occurrence plot data from Norway with spatially and temporally aggregated presence-only data (referred to as pseudo-plots) from across Europe. We intentionally employed a broader geographical scope for the presence-only data to avoid niche truncation, a benefit provided by the spatial coverage of GBIF. Within this context, we assessed the differences in CTI and in the thermophilization index between the two data types. We hypothesized that both types of data would exhibit a consistent pattern and could be employed interchangeably for our analyses.