Using data to enable conservation planning
Most forms of biodiversity data can be used for environmental management
and conservation planning, and tailoring analysis to the available data
is critical. Target development can use two major approaches, either
focusing on species distribution mapping, or, if data of sufficient
resolution, quality and quantity are not available, attempting to map
diversity patterns and reconcile richness patterns in the face of bias.
If sufficient data are available, either models or filtered convex
polygons can be used to map species ranges. When data are insufficient
and richness is mapped instead (Liu et al., 2022; Orr et al., 2021;
Potapov et al., 2023), inventories of species richness are used and
richness itself is reprojected using models. These approaches can
reproject richness patterns to a reasonable degree, if sufficient
inventories have been carried out across all major environmental
conditions, and assuming that biogeographic differences will not
influence overall richness patterns. These approaches are useful in
groups where insufficient data are available for higher resolution
analysis, and can also be used to identify areas for further research if
there is a potential for high hidden diversity (Orr et al. 2021; Kass et
al. 2022).
For large spatial or taxonomic scopes, if species-level analysis is
impossible (the majority of global analyses) then interpolation-based
methods are likely to be the most appropriate. In these types of
studies, one should employ either a subsampling approach (Qiao et al.,
2023) or interpolation based on community level inventories (to model
richness overall rather than individual species ranges: Orr et al., 2021;
Liu et al., 2022; Potapov et al., 2023). For subsampling, there is still
a minimum data requirement, as most areas lack data. Thus, for well
sampled taxa such as birds, it can be fairly widely applied (e.g. almost
all urban areas), but there is so little sampling for most taxa that an
index-based approach may not be possible without then interpolating.
Approaches based on biodiversity indices (Hill numbers, Shannon,
Simpson, etc.) all require both a minimum number of samples and a
minimum coverage (Qiao et al., 2023). Species-area curves are a common
way to estimate completeness for any given region, yet these assume
representative coverage throughout that region. A localised inventory
covering a small proportion of the region may reach an asymptote even
when it is not representative of the whole area to which the assessment
is applied. Thus, for such curves to be useful, the percentage of the
area with data must be assessed first.
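The index calculation and the area-coverage check described above can be
sketched as follows. This is a minimal illustration rather than the
method of any cited study: the function names, the representation of a
region as a set of grid-cell identifiers, and the example values are all
assumptions made for demonstration.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of species) of order q.

    q = 0 gives species richness, q = 1 the exponential of Shannon
    entropy, and q = 2 the inverse Simpson concentration.
    """
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    props = [n / total for n in counts]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def sampled_fraction(record_cells, region_cells):
    """Share of a region's grid cells that hold at least one record."""
    region = set(region_cells)
    if not region:
        return 0.0
    return len(region & set(record_cells)) / len(region)


# Hypothetical example: one dominant species and two rare ones.
counts = [90, 5, 5]
richness = hill_number(counts, 0)
shannon_diversity = hill_number(counts, 1)
simpson_diversity = hill_number(counts, 2)

# Hypothetical region of four cells, two of which contain records.
coverage = sampled_fraction([(0, 0), (0, 1), (5, 5)],
                            [(0, 0), (0, 1), (1, 0), (1, 1)])
```

A low value of `sampled_fraction` would signal that an apparently
saturated curve describes only the sampled part of the region.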
For many taxa, including most invertebrates, interpolation based on
modelling is needed. Such methods rely on interpolating richness based
on community-level samples and using species modelling techniques to
relate richness data to conditions present. Inevitably, this method also
involves assumptions about the representativeness of the data. For a
community projection approach, a minimum sample-size and species number
should be used (to remove the possibility of selective sampling or
overrepresentation of generalists to the neglect of specialists), and
all biome types should be represented so that the richness (or richness
index) of these varying biomes and conditions can be assayed. However,
it should be noted that such an approach will assume that there are no
biogeographic variations in drivers between regions, and consequently
cannot be applied to oceanic islands, as such models cannot inherently
incorporate biogeographic processes or dispersal. For interpolation
approaches, the number of records per species, and to a degree even the
accuracy of identification within sites, is less important (provided
identification is consistent within a site). Provided there is coverage
across environmental conditions, these approaches offer a powerful
mechanism for global analysis, even in poorly known regions.
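The screening steps for a community projection approach (minimum sample
size, minimum species number, and representation of all biome types)
could be checked before any modelling along the following lines. The
inventory structure, the threshold values and the biome labels are
hypothetical placeholders, not values recommended by the studies cited
above.

```python
def screen_inventories(inventories, min_individuals=50, min_species=5):
    """Keep inventories meeting both thresholds (illustrative defaults)."""
    kept = []
    for inv in inventories:
        counts = inv["counts"]  # species name -> number of individuals
        n_individuals = sum(counts.values())
        n_species = sum(1 for n in counts.values() if n > 0)
        if n_individuals >= min_individuals and n_species >= min_species:
            kept.append(inv)
    return kept


def unrepresented_biomes(inventories, all_biomes):
    """Biomes in the study area with no retained inventory."""
    sampled = {inv["biome"] for inv in inventories}
    return sorted(set(all_biomes) - sampled)


# Hypothetical inventories: the second is too small to retain.
inventories = [
    {"site": "A", "biome": "tropical forest",
     "counts": {"sp1": 30, "sp2": 10, "sp3": 5, "sp4": 4, "sp5": 3}},
    {"site": "B", "biome": "grassland",
     "counts": {"sp1": 6, "sp6": 2}},
]
biomes = ["tropical forest", "grassland", "desert"]

retained = screen_inventories(inventories)
gaps = unrepresented_biomes(retained, biomes)
```

Any biome listed in `gaps` would need further inventories before
richness could be projected into it with confidence.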
For species-specific approaches, both the volume and accuracy of the
data must be substantially higher, as they are much more vulnerable to
spatial bias and sensitive to data errors, with even greater
consequences for poorly known species. Data must be clean and accurate
for any species-level assessment, so cleaning checks and filtering of
bad records are a critical first step (see Box 1). The next question is
whether the data are sufficiently representative for species-level
analysis, in terms of both the taxa and the region under analysis;
any form of species-level assessment also requires sufficient data to
assess range. When examined critically, public data sources alone are
insufficient for modelling most species (Garcia-Rosello et al., 2023),
even across vertebrates, so richness in some of the most diverse regions
might be underestimated.
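A few of the most basic cleaning checks can be sketched as below. This
is an assumed, simplified illustration and does not reproduce the
contents of Box 1 or any published cleaning tool; the record fields
(`species`, `lat`, `lon`) and the minimum record count are hypothetical.

```python
from collections import Counter


def clean_records(records, decimals=4):
    """Drop records with missing, impossible or zero coordinates,
    then remove duplicates of the same species at the same location."""
    seen = set()
    cleaned = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # no coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # outside valid coordinate bounds
        if lat == 0 and lon == 0:
            continue  # (0, 0) is a common placeholder for missing data
        key = (rec["species"], round(lat, decimals), round(lon, decimals))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def species_with_enough_records(records, min_records=10):
    """Species whose cleaned record count reaches an assumed minimum."""
    tally = Counter(rec["species"] for rec in records)
    return sorted(sp for sp, n in tally.items() if n >= min_records)


# Hypothetical records illustrating each rejection rule.
raw = [
    {"species": "Tapirus terrestris", "lat": -3.10, "lon": -60.02},
    {"species": "Tapirus terrestris", "lat": -3.10, "lon": -60.02},
    {"species": "Tapirus terrestris", "lat": 0, "lon": 0},
    {"species": "Tapirus terrestris", "lat": 95.0, "lon": 10.0},
    {"species": "Tapirus terrestris", "lat": None, "lon": -60.0},
    {"species": "Panthera onca", "lat": -15.5, "lon": -56.1},
]
cleaned = clean_records(raw)
```

Counting records per species only after cleaning avoids judging a
species as well sampled on the strength of duplicate or invalid points.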
Sophisticated models can be developed for well-sampled individual
species using approaches such as Maxent or other species niche
modelling methods (though many of these will map all relevant habitats,
lacking any geospatial reference point to differentiate fundamental and
realised niches). However, such models have very high data requirements,
as sufficient and evenly distributed data must exist from across a
species’ range to effectively model its distribution and pair it with
environmental characteristics. This means that, unless
considerable effort is devoted to collating representative global data
with many partners, or taxa are already well sampled, sophisticated
models may not be representative or appropriate. Assessing these models
using statistical approaches (AUC, Boyce index, AIC) alone is also not
sufficient; work with experts is likely needed to assess whether
modelled ranges capture actual species ranges and recognise
biogeographic boundaries (which may be missed in models, especially in
complex areas or where there are major differences between fundamental
and realised niche). Minimum convex polygons (MCPs) may also be used
when data are scarce or analysis is regional, but understanding how to
curate data is an essential first step before mapping species ranges
(Zizka et al. 2019; Ribeiro et al. 2022; Dorey et al. In review).
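For readers unfamiliar with the construction, a minimum convex polygon
is simply the convex hull of a species' occurrence points. The sketch
below uses the standard monotone chain algorithm and the shoelace
formula for area; it assumes coordinates have already been projected to
a planar, equal-area system (areas computed directly from latitude and
longitude in degrees would not be meaningful), and the example points
are invented.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def minimum_convex_polygon(points):
    """Convex hull of (x, y) points, counter-clockwise (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Discard points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Planar polygon area by the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical projected occurrence points (units: km).
occurrences = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
mcp = minimum_convex_polygon(occurrences)
mcp_area = polygon_area(mcp)
```

Because the hull is driven entirely by the outermost points, a single
misplaced record can inflate the polygon substantially, which is why
data curation has to come first.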
Filtering for success
For basic analysis of large numbers of species, automated and repeatable
pipelines are critical. Creating an MCP is one method to delimit the
majority of a species’ known range. However, for vertebrates it has been
known for centuries that species have finer-grain habitat requirements,
and even in IUCN maps the need to refine habitat within the range
polygon is becoming a basic standard (Lack, 1953; Brooks et al., 2019);
points may completely surround cities or other unsuitable regions, yet
the species may no longer be present there. Failure to remove clearly
unsuitable habitat would dramatically increase range size and could
also reduce the proportion of range protected (as much of a city is
developed). Coastal filters are also needed, as a failure to
realistically trim MCPs may render oceans suitable for land animals.
Sensible filters can transform species ranges and entirely rearrange
diversity patterns. To demonstrate how decisions on data refinement and
cleaning affect range sizes and degree of protection, we selected a
range of species and imposed different levels of filtering on the data,
all of which can be applied to small datasets or when only small
volumes of data are available for some species. The filters include
spatial filters, a habitat filter and trimming by coastline, and the
resulting ranges are compared with known IUCN ranges for each species.
It should be noted that most IUCN ranges are also inaccurate and
overinflate species ranges (Li et al., 2019; Hughes et al., 2021c), yet
uncritical MCPs are far larger (whilst still missing parts of the range,
as they will not capture species range limits, where abundance is
typically lower). For example, IUCN ranges are only 7-8% of the size of
those recovered using basic MCPs for the species shown here (as in
Chowdhary et al., 2023a). If these ranges are being mapped to assess
hotspots for protection, or the degree of protection, then the area
covered and the location will entirely determine the outcomes of
assessment. If data are not filtered appropriately, analysis may have
little relationship with the real patterns of distribution or degree of
protection of species.
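The effect of coastline and habitat filters on the estimated degree of
protection can be illustrated with a small gridded example. This is a
hedged sketch, not the workflow used to produce Table 1: it assumes the
range, land mask, habitat classes and protected areas have all been
rasterised to the same equal-area grid, and every cell value below is
invented.

```python
def filter_range(range_cells, land_cells, habitat_by_cell, unsuitable):
    """Trim a gridded range to land cells with non-excluded habitat.

    Cells with no habitat class recorded are kept (an assumption).
    """
    return {
        cell for cell in range_cells
        if cell in land_cells
        and habitat_by_cell.get(cell) not in unsuitable
    }


def percent_protected(range_cells, protected_cells):
    """Percentage of range cells that fall inside protected areas."""
    if not range_cells:
        return 0.0
    return 100.0 * len(range_cells & protected_cells) / len(range_cells)


# Hypothetical unfiltered MCP covering ten cells in a single row.
mcp_cells = {(x, 0) for x in range(10)}
land_cells = {(x, 0) for x in range(8)}       # last two cells are ocean
habitat_by_cell = {(x, 0): "forest" for x in range(8)}
habitat_by_cell[(0, 0)] = "urban"
habitat_by_cell[(1, 0)] = "urban"
protected_cells = {(2, 0), (3, 0), (4, 0)}

filtered_cells = filter_range(mcp_cells, land_cells,
                              habitat_by_cell, unsuitable={"urban"})

raw_protection = percent_protected(mcp_cells, protected_cells)
filtered_protection = percent_protected(filtered_cells, protected_cells)
```

In this toy case the filters shrink the range from ten cells to six and
raise the estimated protection from 30% to 50%, because the ocean and
urban cells removed were all unprotected.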
Even though more carefully delineated ranges (IUCN, BirdLife, GARD:
http://www.gardinitiative.org/) are likely to overestimate the degree of
protection, their area is still smaller than that of an MCP, especially
if a habitat filter is not applied (Table 1). We used a general habitat
filter, so more specialist filters and other steps outlined throughout
could greatly improve range estimates and make them more similar to
those in expert range maps (de Barros et al., 2021; Huang et al., 2020;
Xu et al., 2022). In all cases, the lack of filtering means ranges are
projected as many times larger than they are likely to be. Thus, as we
show here, the cleaning of data can transform where species are mapped,
richness patterns, and the efficacy of protection. We selected a range
of species for which sufficient data exist to map ranges, and where the
IUCN has mapped ranges for comparison (thus most of our examples are
mammals, though one bee, B. dahlbomii, is also present). Our previous
work has also examined the prevalence of biases in these types of data,
and how they persist across taxa (Hughes et al., 2021b, 2021c; Li et
al., 2019).
Table 1. Percentage of species range protected with different filters
applied for species minimum convex polygons (MCP), as well as for
International Union for Conservation of Nature (IUCN) ranges. The
filters that were applied are noted in column headers: Hem, hemisphere
filter; Coast, removal of ocean areas within the polygon; habitat, a
simple habitat filter based on basic classifications of land-use types.
We used the species Ailuropoda melanoleuca (Carnivora: Ursidae), Bombus
dahlbomii (Hymenoptera: Apidae), Panthera onca and Panthera tigris
(Carnivora: Felidae), Priodontes maximus (Cingulata: Chlamyphoridae),
Tapirus pinchaque and Tapirus terrestris (Perissodactyla: Tapiridae).