Species Distribution Model
To characterize the realized spatial niche for the two groups of
recruits, we used Maxent, a SDM that uses presence-background data to
estimate species’ responses given a set of environmental variables based
on maximum entropy algorithms (Phillips et al. 2006; Phillips &
Dudík 2008). Maxent consistently out-competes other classical,
presence-only and machine learning techniques (Elith et al. 2006, Gastón
and García-Viñas 2011); it performs well for small sample sizes (Wiszet al. 2008; Shcheglovitova & Anderson 2013), and is one of the
most widely used algorithms in species distribution and ecological niche
modelling (Melo-Merino et al. 2020).
We selected the presence (environmental predictors of recruit locations)
and background data (environmental predictors randomly sampled across
the reefscape) using the dense point clouds and the polygon meshes
derived from them. We calculated slope, roughness, TEI, substrate type
and benthic cover in each dense point cloud. Then, we overlapped each
point cloud to its respective mesh and assigned each attribute to each
polygon of the mesh. In the case of invertebrate and substratum type
descriptors, the value for each polygon corresponded to the percentage
of points from the dense point cloud classified as each descriptor.
Next, we used polygons analogous to pixels within a raster in Maxent.
Ten recruits were surrounded by another invertebrate and the percentage
of points classified as that particular invertebrate reached 100%.
However, octocorals and scleractinian recruits are not able to settle on
other live benthic invertebrates. Thus, to prevent the models from
identifying the surface of invertebrates as suitable habitat for
recruitment, we adjusted those percentages from 100% to 90% to account
for the small areas adjacent to the benthic invertebrates where the
recruit settled.
The use of our fine-grain models (sub-cm resolution) presented a
challenge in that our target organisms were approximately the same size
as the faces of the mesh, and thus could be ‘seen’ in the environmental
data itself. To obtain the value for the predictors associated with each
presence, we delimited a 1-cm area around each recruit, and calculated
the median value of each predictor within that area. Next, within that
area, we selected the polygon of the mesh with the closest value to that
median. This essentially assigns the occurrence location to the most
environmentally similar neighboring location. This step would be
unnecessary when applying this procedure for larger organisms, at lower
spatial resolution, and for mobile taxa. In total, we recorded the
biological and geomorphological data associated with 210 octocoral and
113 scleractinian coral recruits. To minimize spatial autocorrelation,
avoid the introduction of biases due to sampling at adjacent localities,
and obtain a more homogeneous representation of the area of study, the
background data were selected from groups of four quadrats within each
of the eight transects. Within each group, we selected 30,000 random
background polygons (N = 240,000) and all recruit presences.
To select the environmental
predictors for model fitting, we ran a Maxent model with all the
environmental variables, and measured their importance for each taxon
using the permutation method included in the vip R package
(Supplementary Methods Figure 2, Greenwell & Boehmke 2020). For model
fitting, we included all the variables with a permutation importance
> 0.01 for at least one taxon (Warren et al. 2019),
and to avoid multicollinearity we performed a Spearman’s rank
correlation analysis and discarded the least important variable, out of
each pair of correlated variables (r >
|0.6|; Supplementary Methods Figure 3, Graham 2003).
The variables: % coral, % octocoral, % sponge, % calcareous rock, %
igneous rock, % sand, slope, and both, roughness and TEI calculated at
10-mm and 100-mm scales were included in model fitting.
We partitioned 70% of the recruits for training the model and the
remaining 30% for validating it (Hastie et al. 2001), and
randomly selected 30,000 background polygons. To prevent model
over-complexity and overfitting, we compared several models with
different combinations of feature classes and regularization multipliers
(Morales et al. 2017). In particular, we tested models with
linear, product, hinge and quadratic features (Merow et al.2013); and 1, 1.5, 2, 2.5, 3, 3.5, 4, regularizations multipliers,
resulting in a total of 35 combinations (Anderson & Gonzalez 2011;
Shcheglovitova & Anderson 2013; Muscarella et al. 2014). The
criteria for selecting optimal combinations of feature classes and
regularization multipliers was based first, on a method for testing
statistical significance of Ecological Niche Models (ENMs) predictions
(Bohl et al. 2019) and second, on the average area under the
Receiver Operating Characteristic curve (AUC). We used random
occurrences to train 100 models, and compared the AUCs from the null
models to the AUC obtained from those built with the empirical
occurrences. For the threshold-independent measure of overfitting we
subtracted the AUC value of the training data from the AUC value of the
test data (AUCDIFF), with values close to zero
indicating low levels of model overfitting (Warren & Seifert 2011).
After model selection, marginal response curves were used to illustrate
the marginal estimate of habitat suitability, and the distribution of
each environmental predictor in the occurrence data and background
points. We quantified variable importance, correlation coefficients,
fitted, selected and evaluated the performance of Maxent models, and
generated responses curves using the R package ENMtools 1.0 (Warrenet al. 2021).