STATISTICAL ANALYSIS
Differences in egg size between high- and low-density sites were assessed using a General Linear Mixed Model (GLMM). This included egg size as the response variable, density as a predictor (fixed effect), location (random effect) and female condition as a covariate. Similarly, differences in sperm length between low- and high-density sites were assessed using sperm length as the response variable, density as a predictor (fixed effect), location (random effect) and male condition as a covariate.
Principal component analysis (PCA) was used to reduce the motility parameters to two PCs with eigenvalues more than 1, explaining respectively 60.75% and 23.35% of the variance (total variance explained: 84.10%). PC1 was predominantly loaded by the three velocity measures (VAP, VSL & VCL) and is therefore interpreted as a measure of sperm swimming speed; PC2 was loaded most strongly by STR and LIN and is therefore interpreted as a measure of sperm swimming straightness. These two PCs (PC1 and PC2) were used in all subsequent analyses, and their relationship with the original motility parameters is detailed in Table 1. We analysed differences in sperm motility between low- and high-density populations at the two sperm concentrations (106 and 108, see above) using a General Linear Mixed Model (GLMM) within the lme4 package of R using PC scores (PC1 and PC2) as dependent variables, population density and sperm density as fixed factors with their interactions, location as a random effect and male condition as a covariate. The same model was run to analyse the percentage of motile sperm (arcsine-transformed before use). Means are presented with their standard errors (SE).
We analysed fertilisation success using Markov-chain Monte-Carlo generalized linear mixed models in R (package MCMCglmm, (Hadfield 2010)). With this Bayesian mixed model approach, we modelled the proportion of normally and abnormally fertilised eggs following a binomial distribution and obtained both an estimate of the components of variance and an estimate of the interval of credibility. All models were run for 1.3 × 107 iterations, with a thinning interval of 100 (i.e., only one iteration from every 100 in the Markov chain was used to estimate the posterior distribution of the parameters to reduce the occurrence of autocorrelation between successive iterations), and a burn-in of 3 × 105 (i.e., we discarded the first 3 × 105 models of the simulation to avoid issues with autocorrelation).
For the Bayesian models we included the fixed effects of population (3 levels, Geelong, Portarlington, and Williamstown), gamete age (time elapsed between spawning and fertilisation), body condition (see above), sperm concentration (fitted both as a linear and quadratic term), density of the male (high vs low), density of the female (high vs low), their interaction with each other and with sperm concentration. The random effects used in the model included block (V block), identity of male (V male), identity of female (V female), and an interaction between male and female IDs (V male × female). The residual variance (Ve ) of the model represents the variance between replicates within a given male and female pair.
A necessary step in Bayesian statistical analyses is to set priors before running the models. The term prior refers to the prior distribution of a parameter before the data are analysed. The level of information of the prior can vary from noninformative to highly informative. When knowledge about the relationship between the variables in the model is low, it is best to run the model with different priors and to check whether these different priors provide different posterior distributions (Hadfield 2010). We therefore ran the models using inverse Wishart priors (equivalent to an inverse gamma distribution with shape=scale=0.001; V=1, nu=0.002), parameter expanded priors (V=1, nu=1, alpha.mu=0, alpha.V=1000), and non-informative improper priors (V=1e-16, nu=-2). Although we present results from the model using parameter expanded priors, the conclusions did not qualitatively change according to prior specifications. We inspected the 95% highest posterior density (HPD) intervals associated with each fixed effect to check whether they overlapped with zero. A 95% HPD interval contains most of the posterior distribution and is analogous to a confidence interval in the frequentist approach; we considered estimates with 95% HPD intervals overlapping with zero as non-significant.
We ran an initial model for the proportion of normally fertilised eggs while excluding data from the highest sperm concentration to facilitate the interaction between sperm concentration and population density (i.e., including the highest sperm concentration made the fertilisation curve very complex). To model the proportion of abnormally fertilised eggs, we ran a second model while keeping data in the highest sperm concentration. The proportion of abnormally fertilised eggs was very low at other sperm concentrations.
We ran a third model, this time including data from all sperm concentrations and allowed variance components to vary with sperm concentration. We used the “idh()” function to model heterogeneous variance components for V male,V female, and V male × female according to each concentration. Hence, there were 20 variance components estimated in this model (i.e., V male,V female, and V male × female estimated separately for sperm concentrations and a singleV block and Ve ). By contrast to Gaussian data, with binomial data it is not recommended to compare different models using the deviance information criteria (an index produced by MCMCglmm models that balances the fit of the model based on the number of parameters used in the model). Thus, we cannot formally test whether model fit was improved by allowing heterogeneous variance components across sperm concentrations. Instead, we inspected the 95% HPD intervals associated with each random effect to check whether they overlapped. We considered estimates with non-overlapping 95% HPD intervals as significantly different (Hadfield 2010). Note that, as the lower limit of a variance component is bound to zero, its lower 95% HPD can be extremely close to, but cannot overlap zero. Thus, inspection of the HPDs cannot be formally used to test whether a variance component is significantly greater than zero (Hadfield 2010). Nevertheless, the 95% credible intervals around the variance estimates provide a measure of the precision of the estimate and allowed us to test whether V male × female differed across sperm concentrations.
Data Availability statement
Data and associated r-code for the fertilisation analysis can be found here: https://datadryad.org/stash/share/Yx-wxEJsD-SjrHkHG9lVShv1Qw-KFTAT55vBMvUBVQ4