Discussion

Our assessment demonstrates the potential of presence-only data for evaluating community responses to climate warming, specifically for quantifying the Community Temperature Index (CTI) and thermophilization. Aggregating presence-only data into pseudo-plots yields CTI values that are remarkably consistent with those obtained from co-occurrence plots, which are conventionally perceived as more reliable (Bertrand et al., 2011; Pacheco‐Riaño et al., 2023). Presence-only data typically originate from museum collections and citizen science projects. As such, they often suffer from spatial and taxonomic biases, e.g., variable sampling density per area and poor quality control of species identification (Beck et al., 2014). Despite these inherent biases and errors, our study reveals a consistent estimate of the CTI for the paired plots and a consistent temporal trend in thermophilization across the two datasets. This consistency can be attributed to the absence of bias in species observations with respect to their temperature indicator values. In simpler terms, there is no tendency to record more cold-adapted or warm-adapted species in presence-only data than in co-occurrence data. Even if the species in an area are under-sampled in the presence-only pseudo-plots, the likelihood of observing a species remains independent of its temperature indicator value. Likewise, misidentifications are not systematically associated with higher or lower temperature indicator values. In addition, even if a proportion of the species was misidentified, we expect this proportion to be small relative to the substantial number of correct identifications and thus to have only a minor impact on our analyses.
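The aggregation step described above can be illustrated with a minimal sketch. It assumes that a pseudo-plot is a fixed-size latitude/longitude grid cell and that the CTI is the unweighted mean of the temperature indicator values of the species recorded in that cell. The function name, the 0.5° cell size, the minimum species count, and the example values are all hypothetical and are not taken from our analysis.

```python
from collections import defaultdict
from statistics import mean


def pseudo_plot_cti(records, indicator, cell_size=0.5, min_species=2):
    """Aggregate presence-only records into grid-cell pseudo-plots.

    records: iterable of (species, latitude, longitude) tuples.
    indicator: dict mapping species to a temperature indicator value (°C).
    cell_size: grid resolution in degrees (hypothetical default).
    Returns {cell: CTI}, with CTI as the unweighted mean indicator value
    of the distinct species recorded in each cell.
    """
    cells = defaultdict(set)
    for species, lat, lon in records:
        if species not in indicator:
            continue  # species without an indicator value cannot contribute
        cell = (int(lat // cell_size), int(lon // cell_size))
        cells[cell].add(species)  # a set, so repeated records count once
    return {
        cell: mean(indicator[s] for s in spp)
        for cell, spp in cells.items()
        if len(spp) >= min_species
    }


# Illustrative values only
indicator = {"sp_a": 2.0, "sp_b": 4.0, "sp_c": 6.0}
records = [
    ("sp_a", 60.1, 10.2),
    ("sp_b", 60.2, 10.3),
    ("sp_b", 60.2, 10.3),  # duplicate observation, ignored by the set
    ("sp_c", 60.3, 10.1),
]
cti = pseudo_plot_cti(records, indicator)
```

Because each cell stores a set of species, heavily sampled species do not dominate the index, which matches the presence/absence character of the pseudo-plots.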
By including presence-only data in CTI analyses, we are able to cover larger geographic areas than with traditional co-occurrence plots alone (König et al., 2019). This can improve the global completeness and representativeness of species data and help produce more realistic projections of community responses. Larger temporal and spatial coverage also provides better opportunities to unravel the impacts of various climatic drivers and to assess how climate interacts with other variables. This becomes particularly valuable where co-occurrence surveys are restricted in spatial or temporal coverage. The inclusion of presence-only data also provides a cost-effective way of monitoring ongoing dynamics in species communities. It is an alternative to more intensive co-occurrence surveys, which are often considered to provide information of the highest quality but can be time-consuming and expensive (Dengler et al., 2011). This can be particularly advantageous in areas where field data are difficult to obtain or where intensive fieldwork is not feasible. However, some areas remain underrepresented in both datasets, such as the tropics and continents like Africa, and particularly areas with colder climates and more remote locations, such as higher latitudes and elevations.
Although the CTI values were very similar, we observed that the differences in predicted CTI values between the two datasets (Δ CTI) were more pronounced towards the colder end of the thermal gradient. This deviation could potentially be attributed to the distinct methodologies employed in data collection. Co-occurrence data are often gathered through more or less organized expeditions and may therefore cover a more representative distribution of the topographic relief in an area, including higher and more remote sites. Presence-only data, however, are often compiled from more opportunistic observations made in unplanned citizen science projects. In colder areas, which in our study area are in most cases mountain regions, the more accessible parts are the valley bottoms (where roads are placed), resulting in a bias towards lower elevations in topographically heterogeneous areas. Our ad hoc analysis substantiated this expectation (Fig. 3), revealing that co-occurrence plots are predominantly situated at higher elevations in areas where the CTI from pseudo-plots is overestimated relative to the co-occurrence plots. Being aware of this, it would be possible to account for this potential bias by also incorporating elevation when aggregating species into pseudo-plots, or by adjusting the CTI using the observed elevations of the presence-only records. This would be especially important when comparing pseudo-plots, or when combining co-occurrence plots and presence-only data, in areas of different topographical relief.
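One way such an elevation adjustment could look is sketched below. This is not a procedure we applied: it assumes a constant lapse rate of 6.5 °C per 1000 m (the standard environmental lapse rate, used here only as a placeholder) and shifts a pseudo-plot CTI from the mean elevation of its records to a target elevation. The function name and example values are hypothetical.

```python
LAPSE_RATE = 0.0065  # °C per metre; assumed constant, placeholder value


def adjust_cti_for_elevation(cti, record_elevations, target_elevation,
                             lapse_rate=LAPSE_RATE):
    """Shift a pseudo-plot CTI to a target elevation.

    cti: CTI of the pseudo-plot (°C).
    record_elevations: elevations (m) of the presence-only records in it.
    target_elevation: elevation (m) the CTI should represent, e.g. that
        of a paired co-occurrence plot.
    """
    if not record_elevations:
        raise ValueError("at least one record elevation is required")
    mean_elevation = sum(record_elevations) / len(record_elevations)
    # Records lower than the target imply an overestimated CTI, so the
    # correction is negative when the target lies above the records.
    return cti - lapse_rate * (target_elevation - mean_elevation)


# Valley-bottom records (mean 300 m) compared with a plot at 800 m
adjusted = adjust_cti_for_elevation(8.0, [200, 300, 400], 800)
```

In practice a locally estimated lapse rate would likely be preferable to a global constant, since lapse rates vary with season and region.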
Overall, we noticed that the variation among the CTIs is comparable when the transfer functions are calibrated using each of the three types of datasets. The difference observed in the model relying only on the co-occurrence dataset can be attributed to the extent of the dataset used for calibration. We found that including plant communities from a larger area improved the transfer function, notably when including communities with thermophilic species from lower latitudes (e.g., from central Europe). This inclusion effectively addresses the issue of niche truncation at the warmer end of the thermal range. However, including presence-only data from outside our focus area did not improve the accuracy of the CTI values at the colder end, likely because the dataset lacks species from higher latitudes or elevations. This consideration is particularly relevant given ongoing global warming: correcting the overestimation at the cold end will become less necessary as thermophilic species become better represented in the communities. Paying special attention to species adapted to warmer conditions is therefore highly relevant in Norway, given its predominantly cold climate. In contrast, in locations with milder temperatures, especially those not situated at high latitudes or elevations, relying on presence-only data would help mitigate the truncation of species niches at both the cold and warm extremes. Moreover, the differences were larger for older assemblages than for more recent ones, which could be attributed to the improved accuracy of the newer information.
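The effect of niche truncation on a transfer function can be illustrated with a toy example. The sketch assumes a simple weighted-averaging transfer function on presence/absence data, without deshrinking, which may differ from the model calibrated in this study. Function names and all numbers are hypothetical.

```python
from collections import defaultdict


def calibrate_optima(plots):
    """Estimate species temperature optima from calibration plots.

    plots: iterable of (temperature, set_of_species) pairs.
    Each optimum is the mean temperature of the plots where the
    species occurs (weighted averaging with presence/absence weights).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for temperature, species in plots:
        for s in species:
            sums[s] += temperature
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


def infer_cti(species, optima):
    """Infer the CTI of a community as the mean optimum of its species."""
    known = [optima[s] for s in species if s in optima]
    if not known:
        raise ValueError("no species with a calibrated optimum")
    return sum(known) / len(known)


# Full thermal gradient versus a calibration set truncated at 6 °C
full = [(2.0, {"a"}), (4.0, {"a", "b"}), (6.0, {"b", "c"}),
        (8.0, {"c"}), (10.0, {"c"})]
truncated = [plot for plot in full if plot[0] <= 6.0]

cti_full = infer_cti({"b", "c"}, calibrate_optima(full))            # 6.5
cti_truncated = infer_cti({"b", "c"}, calibrate_optima(truncated))  # 5.5
```

In this toy case the thermophilic species `c` is only sampled at the cold edge of its range in the truncated set, so its optimum drops from 8 °C to 6 °C and the inferred CTI is pulled down by 1 °C.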
It is important to note that the accuracy of the CTI values produced by pseudo-plots will depend on the quality of the presence-only data being used. The exponential growth of presence-only data records in the last two decades has resulted in greater public access to these records (Jin & Yang, 2020). However, before using these data for ecological analyses, cleaning and standardization procedures must be undertaken. This step is especially important due to the varying sources and properties of the data, as temporal, spatial, and taxonomic criteria all need to be considered (Meyer et al., 2016).
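As a rough illustration of what such cleaning might involve, the sketch below filters occurrence records on taxonomic, spatial, and temporal criteria and removes duplicates. The field names follow GBIF/Darwin Core conventions, but the function, the thresholds (1000 m coordinate uncertainty, years 1950 to 2023), and the rounding used for deduplication are hypothetical choices, not the criteria applied in this study.

```python
def clean_records(records, max_uncertainty_m=1000.0,
                  min_year=1950, max_year=2023):
    """Return records that pass basic taxonomic, spatial and temporal checks.

    records: iterable of dicts keyed by Darwin Core style field names.
    All thresholds are illustrative defaults.
    """
    seen = set()
    cleaned = []
    for rec in records:
        species = (rec.get("species") or "").strip()
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        year = rec.get("year")
        uncertainty = rec.get("coordinateUncertaintyInMeters")

        # Taxonomic and completeness checks
        if not species or lat is None or lon is None or year is None:
            continue
        # Spatial checks: valid ranges, and not the (0, 0) placeholder
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        if lat == 0 and lon == 0:
            continue
        if uncertainty is not None and uncertainty > max_uncertainty_m:
            continue
        # Temporal check
        if not (min_year <= year <= max_year):
            continue
        # Drop duplicates of the same species, place and year
        key = (species, round(lat, 4), round(lon, 4), year)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```

Records with no reported coordinate uncertainty are kept here; a stricter workflow might discard them instead.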
Presence-only data have been widely integrated into species distribution modelling (Beck et al., 2014; Pacifici et al., 2017; Smith et al., 2023). However, very few studies have explored their use in understanding community responses to environmental change, such as thermophilization (Feeley, 2012). Our study demonstrates that including presence-only data allows us to better understand their advantages and drawbacks in biogeographical analyses. Although the outcomes produced by this method may have some flaws, they could serve as a first approximation for many regions and taxonomic groups and provide a good starting point for further research. As demonstrated here, open data portals such as GBIF can be used to consolidate datasets for analysing communities’ responses to environmental change. To bridge existing data gaps, the digitization and mobilization of scientific biological collections and of researchers' personal archives must continue. This will help to monitor changes in species composition, abundance, and diversity and to identify potential threats to ecosystem functioning, leading to a better understanding of the environmental impact of global climate change.
Our study indicates that presence-only data can be used to estimate community indicators (e.g., CTI) accurately. Such data serve as an additional source of information that complements more traditional co-occurrence plot-based datasets. Our findings suggest that integrating presence-only data can improve the calibration of transfer function models and our understanding of vegetation responses to climate change. At the same time, the overall patterns and trends of thermophilization remain largely consistent across the two datasets, suggesting that thermophilization values are not significantly different when using pseudo-plots. Even more important, both datasets show a consistent trend in thermophilization, independent of the calibration dataset. We additionally presented an outline that can be used to study community responses in global change research. Our main findings therefore demonstrate that presence-only data can be used to quantify thermophilization. Although careful attention is needed when integrating traditional co-occurrence plots with presence-only data, there is substantial potential to unlock new opportunities for rapid and cost-effective monitoring of community responses to climate change.