2. METHODS

2.1 Study Area

The Chagos Archipelago is located in the central Indian Ocean (IO) at approximately 6° S, 72° E, at the southern limit of the Chagos-Laccadive Ridge, more than 1,500 km from the nearest continental land mass (Carr, 2012). Fifty-five islands are clustered on the atolls of Diego Garcia, Peros Banhos, Salomon and Egmont and on the Great Chagos Bank (Figure 1a), with a combined land area of approximately 60 km². The territory encompasses approximately 60,000 km² of shallow photic reefs and 580,000 km² of primarily oceanic habitat, with a maximum depth exceeding 6,000 m (Carr, 2011; Dumbraveanu & Sheppard, 1999). The climate is tropical, characterised by oceanic conditions and a seasonally reversing monsoon (Sheppard, 1999). Situated within the inter-tropical convergence zone (ITCZ), the archipelago experiences moderate winds, generally from the north-west from October to April and from the south-east from May to September. Sea surface temperature follows an approximately bimodal annual cycle, with maxima in December–January and March–April and a yearly mean of 28°C (Pfeiffer, Dullo, Zinke & Garbe-Schönberg, 2009).

2.2 Seabird observations

To identify the influence of oceanographic conditions and island rat infestation on seabird distribution, we conducted a multiyear at-sea survey of seabirds across the archipelago. The survey ran from 2012 to 2017, between November and April, to overlap with the moderate phase of the monsoon. This period generally coincides with peak breeding activity in the Chagos Archipelago (Carr et al., 2019; Carr, 2011; Carr, 2015). During the sampling months, the BIOT marine reserve and the wider IO experienced two seasons of modestly positive Indian Ocean Dipole (IOD) conditions (2012 and 2013), followed by three neutral seasons (2014–2016) and one strongly positive season (2017; NOAA Earth System Research Laboratory [NOAA ESRL], 2017). Seabird counts (n = 425) were made from a vessel during six expeditions and comprised three sample types. Transect counts (n = 329) were made while the vessel was in transit, adapting the method of Tasker, Jones, Dixon & Blake (1984). Each transect count lasted 30 minutes, during which the vessel typically steamed at 12 knots and travelled c. 11 km. Aggregation counts (n = 87) were made opportunistically whenever a seabird feeding aggregation was encountered; birds were counted until the whole aggregation had been enumerated (median duration 60 min; Letessier et al., 2016). Finally, point counts (n = 9) were made while the vessel was stationary (nominal duration 30 min). All counts were restricted to a 180° arc ahead of the vessel, out to approximately 300 m (Table 1, Figure 1, Appendix S1). All seabird observations were made by Pete Carr, a co-author of this manuscript and an expert on the seabirds of the archipelago (e.g. Carr, 2011; Carr, 2012; Carr, 2015). Using a single observer throughout removes inter-observer variability as a potential source of bias. Observations were made predominantly close to the islands and shallow reefs (Figure 1b–1g).
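
The nominal transect length and the reported sample total can be checked with a few lines of Python. This is a minimal sketch assuming the nominal speed of 12 knots was held for the full 30-minute count; real vessel speed varied, which is why the text gives c. 11 km.

```python
# Hedged arithmetic check of the nominal survey figures (not field data).
KM_PER_NAUTICAL_MILE = 1.852  # 1 knot = 1 nautical mile per hour (exact definition)


def transect_length_km(speed_knots: float, duration_min: float) -> float:
    """Distance covered during one transect count at constant speed."""
    return speed_knots * KM_PER_NAUTICAL_MILE * duration_min / 60.0


# Sample sizes reported for the three count types.
sample_counts = {"transect": 329, "aggregation": 87, "point": 9}
total_samples = sum(sample_counts.values())
length_km = transect_length_km(speed_knots=12, duration_min=30)

print(f"total samples: {total_samples}")       # total samples: 425
print(f"transect length: {length_km:.1f} km")  # transect length: 11.1 km
```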

2.3 Oceanic habitat modelling

2.3.1 Response variables

To model the oceanic distribution of seabirds, we selected the most frequent and abundant seabird families in the BIOT marine protected area as our response variables. Pooling species at this comparatively high taxonomic level increased the counts per sample and hence statistical power. This grouping approach assumes that taxonomically similar species have similar ecological requirements in terms of habitat use or energetic needs (Mannocci, Catalogna, et al., 2014; Mannocci, Laran, et al., 2014). Oceanic seabird distributions were modelled as a function of geomorphic and oceanographic variables using Generalised Additive Models (GAMs; Wood, 2006), accounting for the different sample types (Appendix S2). GAMs were fitted with the count of each family per sample (a proxy for abundance) as the response variable, against all possible combinations of four of six explanatory variables: depth, slope, year, sea level anomaly (SLA), sea surface temperature (SST) and chlorophyll-a concentration (CHL). Following Mannocci, Laran, et al. (2014), we did not include highly correlated variables (|Spearman's r| > 0.60) in the same model, and for each family we retained the model with the lowest generalised cross-validation (GCV) score. We used the explained deviance to evaluate the explanatory power of the models. GAMs were fitted using the mgcv package in R (version 3.3.3; R Development Core Team, 2017), which estimates the degrees of freedom of each smoother during model fitting (Wood, 2006). Splines were limited to three knots to keep the fitted relationships ecologically interpretable and to avoid overfitting (Mannocci, Laran, et al., 2014).
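
The candidate-model step can be sketched as follows: enumerate every four-of-six covariate subset, discard any subset containing a pair with |Spearman's r| > 0.60, then rank the fitted survivors by GCV. This is an illustrative Python sketch, not the authors' R code; the GAM fitting itself (mgcv) is not reproduced, the covariate values are hypothetical, and `gcv_score` assumes mgcv's definition V = n*D / (n - gamma*tau)^2, with D the model deviance, tau the effective degrees of freedom and gamma = 1 by default.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def candidate_sets(covariates, size=4, max_abs_r=0.60):
    """Every `size`-variable subset with no pair whose |Spearman r| exceeds max_abs_r."""
    names = sorted(covariates)
    return [
        combo
        for combo in combinations(names, size)
        if all(abs(spearman(covariates[a], covariates[b])) <= max_abs_r
               for a, b in combinations(combo, 2))
    ]


def gcv_score(n, deviance, edf, gamma=1.0):
    """GCV score n*D / (n - gamma*edf)^2 (assumed mgcv form); lower is better."""
    return n * deviance / (n - gamma * edf) ** 2


def deviance_explained(null_deviance, residual_deviance):
    """Proportion of the null deviance explained by the model."""
    return (null_deviance - residual_deviance) / null_deviance


# Hypothetical covariate values at five samples (illustration only);
# depth/slope and sst/chl are deliberately constructed to be strongly correlated.
covariates = {
    "depth": [-4500, -3000, -1200, -800, -50],
    "slope": [0.5, 1.2, 2.0, 4.1, 3.3],
    "year": [2013, 2016, 2014, 2012, 2015],
    "sla": [0.12, 0.03, -0.08, -0.02, 0.07],
    "sst": [28.1, 28.9, 27.6, 29.3, 28.4],
    "chl": [0.18, 0.09, 0.22, 0.06, 0.12],
}
for combo in candidate_sets(covariates):
    print(combo)
```

With these made-up values only four of the fifteen subsets survive: each pairs one of depth or slope with one of SST or CHL, plus SLA and year. In practice each surviving subset would be fitted as a GAM and the one with the lowest GCV score kept.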

2.3.2 GAM Predictions

Spatial predictions in unsampled areas were limited to the convex hull defined by the BIOT marine reserve and restricted to the range of values of each variable used to build the corresponding model. This ensured that predictions were only made in areas with environmental conditions similar to those sampled. This approach avoided extrapolating beyond the range of the model while still generating meaningful predictions beyond the sampled area (Yates et al., 2018). Whenever year was retained as a variable, predictions were made for the final sampling year (2017). Uncertainty for each model was derived from the Bayesian covariance matrix of the model coefficients (Wood, 2006). Predictions and their uncertainty were mapped on a 0.4° × 0.4° grid. We considered this resolution a reasonable trade-off for capturing the distributions of species with uncertain range sizes (Seo, Thorne, Hannah, & Thuiller, 2008).
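
The prediction domain described above can be sketched as a 0.4° grid clipped to a convex polygon and to the covariate ranges seen in the training data, with year fixed at 2017. This is a hypothetical Python sketch: the hull vertices, the `env_at` covariate lookup and all values are invented for illustration, and the posterior-based uncertainty step is not shown.

```python
def grid_centres(lon_min, lon_max, lat_min, lat_max, res=0.4):
    """Centres of res x res degree cells tiling a lon/lat bounding box."""
    n_lon = round((lon_max - lon_min) / res)
    n_lat = round((lat_max - lat_min) / res)
    return [(lon_min + (i + 0.5) * res, lat_min + (j + 0.5) * res)
            for j in range(n_lat) for i in range(n_lon)]


def inside_convex(point, hull):
    """True if point is inside or on a convex polygon listed counter-clockwise."""
    x, y = point
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        # A negative cross product puts the point to the right of this edge: outside.
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True


def training_ranges(samples, names):
    """(min, max) of each covariate over the samples used to fit the model."""
    return {k: (min(s[k] for s in samples), max(s[k] for s in samples)) for k in names}


def prediction_rows(cells, hull, env_at, ranges, fixed=None):
    """Covariate rows for cells inside the hull whose values stay within training ranges."""
    rows = []
    for cell in cells:
        if not inside_convex(cell, hull):
            continue
        row = dict(env_at(cell))
        row.update(fixed or {})  # e.g. {"year": 2017} when year is in the model
        if all(lo <= row[k] <= hi for k, (lo, hi) in ranges.items()):
            rows.append((cell, row))
    return rows


# Hypothetical usage: a square "hull" and an invented SST field rising eastwards.
hull = [(70.0, -8.0), (74.0, -8.0), (74.0, -4.0), (70.0, -4.0)]
cells = grid_centres(70.0, 74.0, -8.0, -4.0)
samples = [{"sst": 28.3}, {"sst": 28.5}, {"sst": 28.7}]
rows = prediction_rows(
    cells, hull,
    env_at=lambda c: {"sst": 28.0 + 0.25 * (c[0] - 70.0)},
    ranges=training_ranges(samples, ["sst"]),
    fixed={"year": 2017},
)
print(len(cells), len(rows))
```

Only cells whose covariates fall inside the sampled envelope are passed to the model, which is the sense in which predictions avoid extrapolation.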

2.4 Modelling the effect of rat infestation

We hypothesise that seabird distribution is sensitive to rat infestation on islands, and that this sensitivity restricts seabird distribution in the waters adjacent to infested islands. We modelled the effect of rat infestation on seabird distribution at sea using Boosted Regression Trees (BRT). BRT are considered an advanced form of regression (Friedman, Hastie & Tibshirani, 2000) that uses boosting to combine and adapt large numbers of relatively simple tree models, optimising predictive performance (Elith, Leathwick & Hastie, 2008). A pair of BRT models was fitted for each of the Laridae, Sulidae and Procellariidae families (Appendix S3). Both models in each pair used the set of variables selected by the GAMs. The first model of each pair additionally included the variable ‘distance to the closest rat-free island (km)’; this model was considered to represent bird distribution at its theorised maximum abundance, in the absence of any rat effect. The second model additionally included the variable ‘distance to the closest rat-infested island (km)’. Each BRT also included the area of the nearest island (m²) to account for the potential effect of landmass availability. To reduce variability, only transect counts were used. BRTs were fitted following the methodology of, and adapting code from, Elith et al. (2008), using the gbm package in R (version 3.3.3; R Development Core Team, 2017). Each model was fitted by balancing the learning rate against the number of trees (Elith et al., 2008; D’agata et al., 2014).
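
The learning-rate versus number-of-trees trade-off is easiest to see in a bare-bones boosting loop. The sketch below is least-squares gradient boosting with single-split trees (stumps) in plain Python; it is not gbm. A gbm model of counts would typically use a Poisson loss, deeper trees and a bag fraction, none of which are included here, and the data are hypothetical.

```python
def fit_stump(X, r):
    """Single-split regression tree (stump) minimising squared error on residuals r."""
    n, total = len(r), sum(r)
    best = None  # (score, feature, threshold, left_value, right_value)
    for f in range(len(X[0])):
        pairs = sorted(zip((row[f] for row in X), r))
        left_sum = 0.0
        for i in range(1, n):
            left_sum += pairs[i - 1][1]
            if pairs[i][0] == pairs[i - 1][0]:
                continue  # cannot split between tied feature values
            right_sum = total - left_sum
            # Maximising this score is equivalent to minimising within-node SSE.
            score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
            if best is None or score > best[0]:
                thr = (pairs[i - 1][0] + pairs[i][0]) / 2
                best = (score, f, thr, left_sum / i, right_sum / (n - i))
    if best is None:  # no valid split: predict the mean residual everywhere
        return (0, float("inf"), total / n, total / n)
    return best[1:]


def boost(X, y, n_trees, learning_rate):
    """Least-squares gradient boosting with stumps; returns a prediction function."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared error
        f, thr, lv, rv = fit_stump(X, resid)
        stumps.append((f, thr, lv, rv))
        pred = [p + learning_rate * (lv if row[f] <= thr else rv)
                for p, row in zip(pred, X)]

    def predict(row):
        return base + learning_rate * sum(
            lv if row[f] <= thr else rv for f, thr, lv, rv in stumps)
    return predict


# Hypothetical data: distance to a rat-infested island (km) and a seabird count.
X = [[float(d)] for d in range(10)]
y = [0.0] * 5 + [10.0] * 5
fast = boost(X, y, n_trees=100, learning_rate=0.1)
slow = boost(X, y, n_trees=100, learning_rate=0.01)
print(round(fast([8.0]), 2), round(slow([8.0]), 2))
```

With the same 100 trees the slower learning rate has not yet converged (its prediction at 8 km is still well short of 10), which is why smaller learning rates require more trees.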
We used Davies’ test to identify thresholds (break-points) in the response of seabird distribution to distance to rat-free or rat-infested islands. Break-points (BP) were estimated with 95% confidence intervals (CI) to test for significant differences between families and between rat-free and rat-infested islands. To estimate the net gain in seabird abundance under a scenario of archipelago-wide rat eradication, we subtracted the predictions of the rat-infested models from those of the rat-free models. Because we assumed that no additional islands will become infested, these differences were mapped only where the nearest island was rat-infested, showing both net gains and net losses in seabird distribution.
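
Davies’ test itself is not reproduced here (in R it is commonly run through the segmented package). As a hedged illustration of what a break-point is, the sketch below swaps in a simpler technique, a grid search over a continuous broken-stick least-squares fit, and then computes net gain as rat-free minus rat-infested predictions, masked to cells whose nearest island is rat-infested. All values are hypothetical.

```python
def hinge_fit_rss(x, y, psi):
    """RSS of the continuous broken-stick fit y = a + b*x + c*max(x - psi, 0)."""
    h = [max(xi - psi, 0.0) for xi in x]
    n = len(x)
    xm, hm, ym = sum(x) / n, sum(h) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    shh = sum((hi - hm) ** 2 for hi in h)
    sxh = sum((xi - xm) * (hi - hm) for xi, hi in zip(x, h))
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    shy = sum((hi - hm) * (yi - ym) for hi, yi in zip(h, y))
    det = sxx * shh - sxh ** 2
    if shh == 0 or det <= 1e-10 * sxx * shh:
        return float("inf")  # psi outside the data: hinge term constant or collinear
    b = (shh * sxy - sxh * shy) / det  # slope before the break-point
    c = (sxx * shy - sxh * sxy) / det  # change in slope after the break-point
    a = ym - b * xm - c * hm
    return sum((yi - (a + b * xi + c * hi)) ** 2 for xi, hi, yi in zip(x, h, y))


def best_breakpoint(x, y, candidates):
    """Candidate break-point with the smallest residual sum of squares."""
    return min(candidates, key=lambda psi: hinge_fit_rss(x, y, psi))


def net_gain(pred_rat_free, pred_rat_infested, nearest_is_infested):
    """Rat-free minus rat-infested prediction, kept only where the nearest island is infested."""
    return [free - infested if infested_here else None
            for free, infested, infested_here
            in zip(pred_rat_free, pred_rat_infested, nearest_is_infested)]


# Hypothetical response: declines with distance up to 5 km, then levels off.
x = [float(i) for i in range(11)]
y = [10.0 - 2.0 * xi + 2.0 * max(xi - 5.0, 0.0) for xi in x]
psi = best_breakpoint(x, y, [i * 0.5 for i in range(2, 19)])
gain = net_gain([5.0, 3.0, 2.0], [1.0, 4.0, 2.0], [True, True, False])
print(psi, gain)
```

Negative values in the net-gain output correspond to the net losses mentioned above; cells nearest a rat-free island are left unmapped (None).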