We used random forests to model seabird foraging niches, parameterising each model with the 10 oceanographic variables shown in Table 2. Random forests are robust to challenges present in both spatial data (e.g. autocorrelation) and hierarchically structured data (e.g. global trends with local variation) (Evans et al. 2011, Doherty et al. 2016). We used two approaches for each modelled species: 1) a multi-colony model, in which datasets from multiple colonies were combined in a single model; and 2) colony-specific models, in which each colony was modelled separately. The multi-colony model was designed to learn from the foraging niche of each colony and generalise a global foraging niche to predict foraging habitat suitability. Colony-specific models were designed to test the ability of a local foraging niche to predict local foraging habitat suitability, to investigate local adaptation across species ranges, and to build accurate models from GBR tracking data where available (Fig. 2).
To assess model predictive performance, we used a threshold-independent measure, the area under the receiver operating characteristic curve (AUC; Fielding & Bell 1997). AUC values of 0.5 are equivalent to random prediction, values of 0.6–0.7 indicate poor performance, 0.7–0.8 moderate performance, and >0.8 good performance. We validated models by assessing performance (AUC) when predicting to different colonies, which we used as our measure of model transferability. For multi-colony models, we used leave-group-out cross-validation: iterating through n colonies, we trained the model on n − 1 colonies and predicted to the remaining colony. For colony-specific models, we predicted to all other colonies. For both model types, we calculated global transferability (mean predictive performance across all other colonies). To assess internal model performance (colony-specific models predicting to their training colony), we performed internal spatial cross-validation. We used the 4-fold clock method, in which each dataset was split spatially into quarters (by k-means clustering of data coordinates) and models were iteratively trained on three quarters of the data, with predictive performance assessed on the remaining quarter (Guillaumot et al. 2019).
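The validation scheme described above can be illustrated with a minimal Python sketch. It is not the code used in this study (models were fitted in R); the function names `auc`, `leave_one_colony_out` and `global_transferability` are hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation, which is assumed here to be equivalent to the measure reported.

```python
from statistics import mean
from typing import Dict, Iterator, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a presence (label 1) receives a
    higher score than a pseudo-absence (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_colony_out(colonies: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training colonies, held-out colony): train on n - 1, predict to 1."""
    for held_out in colonies:
        yield [c for c in colonies if c != held_out], held_out


def global_transferability(auc_by_colony: Dict[str, float]) -> float:
    """Mean AUC across all colonies the model was transferred to."""
    return mean(auc_by_colony.values())
```

For three hypothetical colonies A, B and C, `leave_one_colony_out(["A", "B", "C"])` yields three train/test splits, each holding out one colony in turn.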
For both multi-colony and colony-specific random forest models, hyperparameters were tuned to optimise predictive performance between colonies. The hyperparameters mtry (number of environmental covariates randomly chosen per node, which limits tree strength and correlation) and minimum node size (minimum number of data points per tree node, which limits tree depth and thus model complexity) were tuned across the values 2, 3, 4, 5, 6 and 5, 10, 20, 50, respectively. Models were tuned first for optimal predictive performance on GBR test data (for species where these were available), and second to generalise across all other colonies. When assessing model predictive performance locally, the same hyperparameters were tuned to optimise predictive performance on the training colony during internal spatial cross-validation. Random forest models were constructed using the ranger (0.11.2) R package (Wright & Ziegler 2017) and tuned using the caret (6.0-82) R package (Kuhn 2008).
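The tuning grid above comprises 5 × 4 = 20 hyperparameter combinations. The following Python sketch shows one way such a two-stage selection could be expressed. It is illustrative only: the actual tuning was done with caret in R, the `evaluate` callback is hypothetical, and treating the second criterion as a tie-breaker for the first is an assumption rather than a documented detail of the procedure.

```python
from itertools import product
from typing import Callable, Tuple

MTRY_VALUES = (2, 3, 4, 5, 6)        # covariates sampled per node
MIN_NODE_SIZES = (5, 10, 20, 50)     # minimum data points per node

# Hypothetical callback: fits a model with the given hyperparameters and
# returns (AUC on GBR test data, mean AUC across all other colonies).
Evaluate = Callable[[int, int], Tuple[float, float]]


def tune(evaluate: Evaluate) -> Tuple[int, int]:
    """Return the (mtry, min_node_size) pair with the best primary AUC,
    using the secondary AUC to break ties (assumed lexicographic order)."""
    grid = list(product(MTRY_VALUES, MIN_NODE_SIZES))
    # Tuples compare element by element, so max() ranks by the primary
    # score first and the secondary score second.
    return max(grid, key=lambda params: evaluate(*params))
```

In practice `evaluate` would wrap model fitting and prediction; here it can be any function returning a pair of scores.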
To investigate local adaptation in each modelled species, we first described inter-colony transferability by summing the two colony-specific model AUC values of each colony pair (AUC_{colonyA→colonyB} + AUC_{colonyB→colonyA}), before entering the pairwise colony sums into a distance matrix. We then tested whether geographically closer colonies were more transferable by correlating the inter-colony transferability matrix with a matrix of pairwise geographical distances between colonies. We also tested whether colonies with more similar oceanographic habitat were more transferable by correlating the inter-colony transferability matrix with a matrix of pairwise oceanographic distances (Euclidean distances between colonies, calculated from mean oceanographic variable values within their accessible habitat). We tested the significance of correlations with Mantel tests. To help visualise groups of colonies that were transferable to each other, we performed hierarchical clustering (using the average method) on the inter-colony transferability matrix.
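The pairwise summation and matrix correlation described above can be sketched as follows. This is a minimal standard-library illustration, not the analysis code: the function names are hypothetical, the Mantel test is a generic permutation version using Pearson correlation and a two-sided p-value (both assumptions), and any rescaling of the AUC sums into distances is omitted because it is not specified in the text.

```python
import random
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pairwise_transferability(auc: Matrix) -> Matrix:
    """auc[a][b] is the AUC of colony a's model predicting to colony b.
    Returns a symmetric matrix of AUC(a -> b) + AUC(b -> a)."""
    n = len(auc)
    out = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        out[i][j] = out[j][i] = auc[i][j] + auc[j][i]
    return out


def _upper(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle values after reordering rows and columns together."""
    return [m[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a: Matrix, b: Matrix, n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Permutation Mantel test: returns (observed r, two-sided p-value)."""
    identity = list(range(len(a)))
    fixed = _upper(b, identity)
    r_obs = pearson(_upper(a, identity), fixed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute colonies in matrix a only
        if abs(pearson(_upper(a, order), fixed)) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting rows and columns of one matrix together, rather than shuffling individual cells, preserves the dependence among distances that share a colony, which is why a Mantel test is used instead of an ordinary correlation test.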