Copepod data
Copepods are identified to species whenever possible, and shared training and staff exchanges amongst surveys ensure comparable data. Names of all copepod taxa were updated using the World Register of Marine Species (WoRMS) (http://www.marinespecies.org/). Data were available for 388 taxa and from 100,326 samples.
To test Bergmann’s Rule, we calculated the mean length of copepods in each sample and related this to environmental drivers. Maximum and minimum lengths of different copepod taxa were obtained from the Marine Planktonic Copepod Database from the Observatoire Océanologique de Banyuls-sur-Mer (https://copepodes.obs-banyuls.fr/en/index.php, Razouls et al. (2020)) and from Richardson et al. (2006). Because juveniles are more difficult to identify to species and reliably assign a size than adults, they were not included in the estimate of mean copepod size for each sample, although adults tend to be more common in CPR samples (Richardson et al. 2006). Each taxon was assigned a single size (the midpoint between their minimum and maximum lengths). We calculated the mean length of copepods in a CPR sample by multiplying abundance of adults of each taxon in the sample by their assigned length, and then dividing their sum by the total abundance of all adults within the sample.
Bergmann’s Rule could also potentially be influenced by spatial differences in the trophic structure of communities because body size is linked to the ecological role of a species (Woodward et al. 2005; Yvon‐Durocher et al. 2011). Because carnivorous copepods are generally larger than omnivorous ones (Mauchline 1998; Fig. S1 in Supporting information), which could affect observed relationships, we distinguished obvious differences in diets among taxa. We assigned a diet (carnivore or omnivore and herbivore combined, hereafter called omnivore) for each copepod taxon using a combination of dietary studies from the literature (see Richardson and Schoeman (2004)) and morphological differences in copepod mouthparts (Huys & Boxshall 1991). To calculate the proportion of omnivores we divided the summed abundance of omnivore within each sample by the total abundance of copepods within samples.
Environmental data
We used sea surface temperature (SST) as an estimate of ocean temperature for the near-surface CPR samples, and chlorophyll-aconcentration (Chl-a ) as a proxy for food availability to copepods. Chl-a is correlated with copepod growth and fecundity in herbivorous and omnivorous species (Richardson & Verheye 1998, 1999; Hirst & Bunker 2003; Bunker & Hirst 2004), and more Chl-a leads to more of these grazers and thus more carnivorous zooplankton (Richardson & Schoeman 2004). As in situ SST and Chl-aare not collected routinely on all CPR tows, we used remotely-sensed data. We matched the location and time of collection of CPR samples with estimates of SST and Chl-a averaged over eight days prior to sampling to limit loss of data due to cloud cover and to represent feeding conditions over the recent past. For SST, we used daily Group for High Resolution SST data, a cloud-free global product based on satellite, buoy and ship data, and interpolated at 0.2°×0.2° resolution (www.ghrsst.org). For surface Chl-afrom September 1997 until the end of 2016, we used daily data from the European Space Agency Ocean Colour Climate Change Initiative data (esa-oceancolour-cci.org). Because this is a merged product from MERIS, Aqua-MODIS, SeaWiFS and VIIRS satellites, these data provide better coverage than products from a single satellite, and are more accurate globally (Mélin et al.2017). For 2017 and 2018, we combined data from Aqua-MODIS and VIIRS, the only available satellites. Chl-a data were retrieved at 4 km resolution and aggregated to 0.2°×0.2° resolution to limit loss of data due to cloud cover and match the resolution of SST data. As there are no robust satellite Chl-a data before 1997, all analyses used CPR samples collected after this time.
We estimated predation rates on copepods by calculating the abundance of larger predatory invertebrates in CPR samples. We summed abundances of the primary invertebrate predators of copepods, including chaetognaths, amphipods, cnidarians, decapods, euphausiids and siphonophores. Although invertebrate predation is a crude indicator of total predation, it is collected concomitantly with the copepod data we used for the size analysis. We could not estimate potential fish predation because there is no global abundance dataset over time and space scales matching the CPR dataset.
To investigate whether oxygen is a driver of Bergmann’s Rule (Forsteret al. 2012; Rollinson & Rowe 2018), we used the World Ocean Atlas 2018 (WOA-18) (www.nodc.noaa.gov/OC5/woa18/). We had to use much coarser-resolution data (monthly climatology averaged at 1°×1° resolution) because we could find no observed global oxygen product at a finer resolution.
Statistical analyses
To test Bergmann’s Rule, we used a model-building approach. We fitted a weighted generalised linear mixed-effect model (GLMM, Bates et al. 2011), with fixed effects for all predictors (SST, Latitude, Oxygen, Chl-a , predator abundance, and the proportion of omnivores (Table S1). We included the proportion of omnivores in the model because omnivorous copepods are generally smaller than carnivorous ones (Fig. S1). Upon visual inspection of model residuals, their magnitude generally increased at higher fitted values. We thus used a gamma error structure with a log-link function (there were no zero values for size). Because estimates of mean length of copepods in a sample are more precise when there are more specimens, we weighted observations in the GLMM by the square-root of total copepod abundance within the sample.
To account for temporal and spatial correlation in the data, we used three random effects associated with the time and location of sampling. The first was a random intercept associated with differences amongst CPR surveys, which accounts for the use of different vessels, large-scale regional effects, and any minor methodological differences. The second was a random intercept of Tow within Survey, which adjusts for both temporal and spatial differences amongst tows. The last was a random slope for days elapsed on Tow within Survey, which accounts for spatiotemporal autocorrelation amongst samples on tows, which arises due to differing weather and collection conditions throughout the period of the tow. This helped remove any general linear trends over the period of the tow.
Due to the large number of samples (>100,000) and thus high statistical power, we did not assess the significance of predictors. Rather, we selected the best model using the Bayesian information criterion (BIC), which is based on the goodness of fit (log-likelihood ratio) relative to model complexity (number of parameters). BIC is suitable to fitting heuristic models with large numbers of observations; it more harshly penalising overfitting than commonly-used Akaike information criterion (Schwarz 1978; Aho et al. 2014).
From a preliminary analysis we found that SST, Latitude and Oxygen were strongly correlated (all r>0.59, Fig. S2) and could not be included together in the model. We thus first assessed their relative importance in separate models with the other predictors (Chl-a , predator abundance, and proportion of omnivores), and then retained the most important variable amongst SST, Latitude and Oxygen for inclusion in subsequent analysis.
We used the pseudo R-squared described in Nakagawa & Schielzeth (2013) to estimate the proportion of variation explained by the gamma GLMMs (Nakagawa et al. 2017). To interpret the ecological relevance of the drivers, we evaluated effect sizes using expected relationships of the mean length with predictors (Nakagawa & Cuthill 2007; Sullivan & Feinn 2012). We also evaluated ecological relevance by converting copepod length estimates to body mass (using Wet-Weight (mg) = 0.03493 × Length (mm)2.9878, Pearre Jr., 1980) because the food that is available to higher-level predators such as fish is related to body mass rather than size.
To assess the effect of influential points within the model, we performed sensitivity tests by iteratively identifying and removing groups of outliers and high-leverage observations; these procedures had no impact on overall results. Results presented thus include all available data. By mapping residuals globally both with and without random effects (Fig. S5), we confirmed that the use of random effects reduced spatial autocorrelation among samples; plotting residuals through time within Surveys and on Tows confirmed that our models reduced temporal autocorrelation. The code for the statistical analyses is publicly available on GitHub (www.github.com/maxcampb/Bergmann-Rule-Copepods).