We used random forests to model seabird foraging niches, parameterising
each model with the 10 oceanographic variables shown in Table 2. Random
forests are robust to challenges present in both spatial data (e.g.
autocorrelation) and hierarchically structured data (e.g. global trends
with local variation) (Evans et al. 2011, Doherty et al. 2016). We used
two approaches for each
modelled species: 1) a multi-colony model, where datasets from multiple
colonies were combined in a single model; and 2) colony-specific models,
where each colony was modelled separately. The multi-colony model was
designed to learn from the foraging niches of all colonies and generalise
them into a global foraging niche for predicting foraging habitat
suitability. Colony-specific models were designed to test how well a
local foraging niche predicts local foraging habitat suitability, to
investigate local adaptation across species' ranges, and to build
accurate models from GBR tracking data where available (Fig. 2).
To assess model predictive performance, we used a threshold-independent
measure, the area under the receiver operating characteristic curve
(AUC; Fielding & Bell 1997). An AUC of 0.5 is equivalent to random
prediction, 0.6–0.7 indicates poor performance, 0.7–0.8 moderate
performance, and >0.8 good performance. We validated models by assessing
their performance (AUC) when predicting to different colonies, which we
used as our measure of model transferability. For multi-colony models, we
used leave-group-out cross validation: iterating through the n colonies,
we trained the model on n − 1 colonies and predicted to the remaining
colony. For colony-specific models, each colony's model was used to
predict to all other colonies. For both model types, we calculated global
transferability as the mean predictive performance across all other
colonies. To assess internal model performance (colony-specific models
predicting to their training colony), we performed internal spatial cross
validation using the 4-fold clock method: each dataset was split
spatially into quarters (by k-means clustering of data coordinates), and
models were iteratively trained on three quarters of the data, with
predictive performance assessed on the remaining quarter (Guillaumot et
al. 2019).
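The two transfer designs and the global transferability summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: model fitting is left out, AUC is computed with the rank-based (Mann–Whitney) formulation, and all function names are hypothetical.

```python
from statistics import mean


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen presence
    (label 1) scores higher than a randomly chosen absence (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_colony_out(colonies):
    """Multi-colony design: yield (training colonies, held-out colony),
    training on n - 1 colonies and testing on the remaining one."""
    for held_out in colonies:
        yield [c for c in colonies if c != held_out], held_out


def colony_specific_transfers(colonies):
    """Colony-specific design: yield (source, target) for every ordered pair,
    i.e. each colony's model predicts to every other colony."""
    for source in colonies:
        for target in colonies:
            if target != source:
                yield source, target


def global_transferability(aucs_to_other_colonies):
    """Global transferability: mean AUC across all other colonies."""
    return mean(aucs_to_other_colonies)


# Usage with three hypothetical colonies
print(list(leave_one_colony_out(["A", "B", "C"])))
# [(['B', 'C'], 'A'), (['A', 'C'], 'B'), (['A', 'B'], 'C')]
```

With n colonies, the multi-colony design yields n train/test splits, while the colony-specific design yields n(n − 1) directed transfers.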
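The spatial split used for internal cross validation can be sketched as k-means clustering (k = 4) of point coordinates, followed by a hold-one-quarter-out loop. This is a sketch under assumptions, not a reproduction of Guillaumot et al. (2019) or of the authors' code: it runs plain Lloyd iterations from a deterministic farthest-point start, treats coordinates as planar (x, y) values, and the function names are hypothetical.

```python
import math


def spatial_folds(coords, k=4, max_iter=100):
    """Assign each (x, y) point to one of k spatial folds by k-means.

    Returns one fold label (0..k-1) per point.
    """
    # Farthest-point start: begin at the first point, then repeatedly add the
    # point farthest from its nearest existing centroid (deterministic).
    centroids = [coords[0]]
    while len(centroids) < k:
        centroids.append(
            max(coords, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in coords
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


def cv_splits(labels, k=4):
    """Yield (train indices, test indices): train on three folds, test on the fourth."""
    for fold in range(k):
        test = [i for i, lab in enumerate(labels) if lab == fold]
        train = [i for i, lab in enumerate(labels) if lab != fold]
        yield train, test
```

Because folds are spatially contiguous, each held-out quarter is geographically separated from most of the training data, which is the point of using spatial rather than random folds.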
For both multi-colony and colony-specific random forest models,
hyperparameters were tuned to optimise predictive performance between
colonies. Two parameters were tuned: mtry (the number of environmental
covariates randomly chosen as split candidates at each node, which
controls the strength of individual trees and the correlation between
them), across values of 2, 3, 4, 5 and 6; and minimum node size (the
minimum number of data points per tree node, which limits tree depth and
thus model complexity), across values of 5, 10, 20 and 50. Models were
tuned first for optimal predictive performance on GBR test data (for
species with GBR tracking data) and second for generalisation across all
other colonies. When assessing predictive performance locally, the same
hyperparameters were instead tuned to optimise predictive performance on
the training colony during internal spatial cross validation. Random
forest models were constructed using the ranger R package (v0.11.2;
Wright & Ziegler 2017) and tuned using the caret R package (v6.0-82;
Kuhn 2008).
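The tuning grid described here has 5 × 4 = 20 combinations. The Python sketch that follows builds that grid and applies one possible reading of the two-stage selection rule: rank by AUC on GBR test data where available, then break ties by mean AUC across the other colonies. The lexicographic ordering, the result format and the function names are assumptions, and fitting each candidate model is left out. In ranger, the two parameters correspond to the `mtry` and `min.node.size` arguments.

```python
from itertools import product

# Candidate values from the text
MTRY_VALUES = [2, 3, 4, 5, 6]       # covariates sampled per split (ranger: mtry)
MIN_NODE_SIZES = [5, 10, 20, 50]    # minimum node size (ranger: min.node.size)


def tuning_grid():
    """All 20 (mtry, minimum node size) combinations."""
    return [
        {"mtry": m, "min_node_size": n}
        for m, n in product(MTRY_VALUES, MIN_NODE_SIZES)
    ]


def select_hyperparameters(results):
    """Pick the best combination from (params, gbr_auc, mean_other_auc) tuples.

    Assumed rule: rank first by AUC on GBR test data (None when the species
    has no GBR tracking data), then by mean AUC across the other colonies.
    """
    def rank(entry):
        _, gbr_auc, mean_other_auc = entry
        gbr = gbr_auc if gbr_auc is not None else float("-inf")
        return (gbr, mean_other_auc)

    return max(results, key=rank)[0]
```

In caret, a grid passed to `train()` through `tuneGrid` with `method = "ranger"` also needs a `splitrule` column alongside `mtry` and `min.node.size`.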
To investigate local adaptation in each modelled species, we first
described inter-colony transferability by summing the two
colony-specific model AUC values of each colony pair
($\mathrm{AUC}_{A \rightarrow B} + \mathrm{AUC}_{B \rightarrow A}$, where
$\mathrm{AUC}_{A \rightarrow B}$ is the AUC of colony A's model predicting
to colony B), before entering the pairwise sums into a distance matrix.
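The pairwise summation can be sketched as below, assuming the colony-specific AUCs are stored in a dictionary keyed by (source colony, target colony); that data structure and the function name are hypothetical. Each off-diagonal entry lies between 0 and 2, and larger values mean the two colonies' models transfer better to each other.

```python
def transferability_matrix(colonies, directed_auc):
    """Symmetric matrix of AUC(A -> B) + AUC(B -> A) for every colony pair.

    directed_auc maps (source, target) -> AUC of the source colony's model
    predicting to the target colony. The diagonal is left at 0.
    """
    n = len(colonies)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(colonies):
        for j, b in enumerate(colonies):
            if i != j:
                matrix[i][j] = directed_auc[(a, b)] + directed_auc[(b, a)]
    return matrix
```

Summing both directions makes the matrix symmetric, which is what the Mantel and clustering steps expect.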
We then tested whether geographically closer colonies were more
transferable by correlating the inter-colony transferability matrix with
a matrix of pairwise geographical distances between colonies, and whether
colonies with more similar oceanographic habitat were more transferable
by correlating the transferability matrix with a matrix of pairwise
oceanographic distances (Euclidean distances between colonies, calculated
from the mean value of each oceanographic variable within each colony's
accessible habitat). We tested the significance of these correlations
with Mantel tests. To help visualise groups of colonies that were
transferable to each other, we performed hierarchical clustering with
average linkage (UPGMA) on the inter-colony transferability matrix.
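A permutation Mantel test can be sketched in Python as below: the Pearson correlation between the upper triangles of two colony-by-colony matrices, with significance assessed by jointly permuting the rows and columns of one matrix. The two-sided p-value, the 999 permutations, the fixed seed and the function names are assumptions for illustration; the text does not state these settings or the software used (in R, `vegan::mantel` is a common choice). The oceanographic distance helper uses the mean values as given, since the text does not say whether variables were standardised first.

```python
import math
import random


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between colonies, one row of mean
    oceanographic values per colony."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def _upper_triangle(matrix, order):
    """Off-diagonal upper-triangle values, with rows/columns reordered by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(matrix_a, matrix_b, permutations=999, seed=1):
    """Return (r, two-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(matrix_a)))
    x = _upper_triangle(matrix_a, identity)
    r_obs = _pearson(x, _upper_triangle(matrix_b, identity))
    at_least_as_extreme = 1  # the observed arrangement counts as one
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel colonies in matrix_b only
        r_perm = _pearson(x, _upper_triangle(matrix_b, perm))
        if abs(r_perm) >= abs(r_obs):
            at_least_as_extreme += 1
    return r_obs, at_least_as_extreme / (permutations + 1)
```

Permuting whole rows and columns together, rather than individual cells, preserves the dependence among distances that share a colony, which is why a Mantel test is used instead of an ordinary correlation test. Since transferability is a similarity, "closer colonies are more transferable" would appear as a negative r against geographical distance.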
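Average-linkage (UPGMA) clustering can be sketched as follows. Because larger transferability sums mean more mutually transferable colonies, the sketch first converts them to dissimilarities as 2 minus the sum (2 being the largest possible sum of two AUCs). This conversion, the function names and the example values are assumptions, since the text does not say how the matrix was prepared for clustering. In R, `hclust(..., method = "average")` on a `dist` object applies the same linkage.

```python
from itertools import combinations


def transferability_to_dissimilarity(matrix, max_sum=2.0):
    """Hypothetical conversion: high mutual transferability -> small dissimilarity."""
    return [
        [0.0 if i == j else max_sum - v for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage (UPGMA).

    Returns merges as (members_a, members_b, height) in merge order.
    """
    clusters = [(i,) for i in range(len(labels))]

    def cluster_distance(a, b):
        # Mean of all pairwise dissimilarities between the two clusters' members
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append(([labels[i] for i in a], [labels[i] for i in b], cluster_distance(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges
```

This brute-force version recomputes every cluster distance at each step, which is fine for a handful of colonies; library implementations update distances incrementally.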