Temperature and species distribution data sources
We used National Oceanic and Atmospheric Administration (NOAA) long-term surveys from three temperate marine continental shelf regions in the US: the Northeast (annual spring survey 1968-2018), the West Coast (triennial fall survey 1977-2004 and annual fall survey 2003-2018), and the Eastern Bering Sea (annual summer survey 1989-2018; earlier years omitted due to limited spatial extent (Lauth and Conner 2014)). These surveys use trawl gear and a random stratified or fixed station sampling design to target demersal and benthic fishes and invertebrates on the continental shelf, up to several hundred meters deep. The West Coast and Eastern Bering Sea raw datasets were downloaded using the R package “FishData”, and the Northeast data were obtained from the 2019 OceanAdapt release (Stuart and Pinsky 2019), a data portal to access NOAA trawl survey records (Lauth and Conner 2014, Politis et al. 2014, Keller et al. 2017). The Northeast dataset, which was pre-processed by OceanAdapt for quality control and taxonomic accuracy, contained records for 74 species. We limited our West Coast analysis to the 54 species that were recorded in both the triennial and the annual surveys. In the Eastern Bering Sea dataset, we downloaded data on the 100 most frequently observed taxa and proceeded with analysis for the 81 taxa that were identified to species. We retrieved higher taxonomy for all species using the R package “taxize” (Chamberlain and Szöcs 2013) and grouped species as either fishes (belonging to classes Actinopterygii or Elasmobranchii) or invertebrates (everything else). All data processing and analyses were conducted in R version 3.6.0 (R Core Team 2018). All code used is available at: https://github.com/afredston/range-edge-niches.
Throughout our analysis, we compared distribution data for a given species with temperature data from the preceding 12 months—specifically, the 12 months preceding the earliest possible start month for each region’s survey for analysis (March in the Northeast, May in the West Coast, and July in the Eastern Bering Sea). For example, range edges derived from the spring 1999 Northeast survey were compared to temperature records from March 1998 to February 1999.
We used two historical sea surface temperature (SST) datasets. The NOAA NCEI Optimum Interpolation Sea Surface Temperature dataset (NOAA NCEI 2018) is available daily from 1982 onward at 0.25ºx0.25º resolution; we downloaded this dataset for all regions. We also downloaded the Hadley Centre Global Sea Ice and Sea Surface Temperature dataset, available monthly at 1º resolution from 1870 (Rayner 2003) to fill in earlier years for the Northeast and West Coast regions. To ensure comparability between the two data sources, we performed mean bias correction and converted the daily SST records from the higher-resolution OISST dataset into monthly means for each grid cell (see Appendix 1); all temperature metrics described henceforth are based on monthly mean SSTs. We tested for change in mean, minimum, and maximum monthly regional SSTs over time by calculating the annual region-wide mean of grid cell-specific mean, minimum (coldest month), and maximum (warmest month) SSTs, and performing a linear regression of those region-wide means over time for each region with a significance threshold of α = 0.05.
Range edge positions were compared to warm and cold temperature extremes, defined as the warmest and coldest months of the 12 months preceding the survey (i.e., coldest month the previous winter and warmest month the previous summer). To generate edge-specific estimates of warm and cold extreme temperatures (see Range edge analysis ), we constructed generalized additive models (GAMs) of maximum and minimum monthly temperatures in each year along the axis of measurement for each region (see Spatiotemporal reconstruction of species ranges ) using the “mgcv” package in R (Wood 2017). Each regional GAM predicted warm or cold temperature extremes in each year, given a position along the axis (one GAM was fit for all years in a given region with axis position as a smoothed predictor estimated separately for each year, as well as a separate year factor).