Statistical modelling:
Statistical analyses were undertaken using R version 4.1.2 (R Core Team, 2022). Prior to statistical modelling, data exploration was conducted following Zuur et al. (2010). Autocorrelation was observed in the data using autocorrelation function (ACF) plots produced with the itsadug package (van Rij et al., 2015). Generalised Additive Models (GAMs) were fitted using the function bam within the mgcv package, which is optimised for large data sets. GAMs were fitted with a first-order autoregressive (AR(1)) correlation structure to account for the observed autocorrelation, and a negative binomial error distribution (theta values obtained using the function gam with the nb family) with a logarithmic link function to deal with zero-inflation in the data (Wood, 2011; Wood et al., 2015). The rho values for the AR(1) structure, which control the degree of permitted autocorrelation (Wood, 2017), were determined using the itsadug package and ACF plots. The parameter gamma was set to 1.2 to reduce potential overfitting of splines.
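As an illustration of the quantity that informs the choice of rho, the sketch below computes the lag-1 sample autocorrelation of a residual series, which is a common starting value for an AR(1) parameter. It is written in Python purely for illustration and is not the R code used in this study; the function name and the example residuals are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation, using the standard ACF estimator
    (lagged cross-products divided by the total sum of squares)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    if denom == 0:
        raise ValueError("residuals are constant; autocorrelation is undefined")
    numer = sum(centred[t] * centred[t - 1] for t in range(1, n))
    return numer / denom


# Hypothetical residuals from a model fitted without an AR(1) term,
# ordered in time within a single continuous deployment.
example_residuals = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_start = lag1_autocorrelation(example_residuals)
```

A value near 0 would suggest little serial dependence at lag 1, whereas a value near 1 would suggest strong dependence between consecutive hours. The series is assumed to be evenly spaced and free of gaps.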
The data were analysed at hourly resolution. The response variables were the number of minutes with porpoise detections in each hour (detection positive minutes, DPM; range 0-60) and the number of foraging buzzes (inter-click interval (ICI) <10 ms) recorded per hour. Explanatory variables included diel period as a factor, and month, temperature, noise, time difference to high tide and tidal range as smooth terms. Circular smoothers were used for month and time difference to high tide. Thin-plate regression splines with shrinkage, which return the simplest effective spline, were used for the remaining smooth terms. Generalised cross-validation and manual knot selection were used, with the chosen values selected visually based on the trade-off between the overall simplicity of the model and the explanatory power of the smooth plots. To decide which tidal variable was most appropriate for analysis, each was included in the full model and the resulting models were compared based on AIC score. Time difference to high tide resulted in the model with the lowest AIC and was used for further analysis.
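The DPM response described above can be sketched as follows: each detection timestamp is assigned to its hour, and the number of distinct minutes containing at least one detection is counted. This is a minimal Python illustration under assumed inputs (a list of detection timestamps); the function name and example data are hypothetical and do not reproduce the processing pipeline used in this study.

```python
from collections import defaultdict
from datetime import datetime


def hourly_dpm(detection_times):
    """Count detection positive minutes (DPM) per hour.

    Several detections within the same minute count once, so each
    hourly value lies between 1 and 60. Hours without any detection
    are absent from the result and would need to be added as zeros
    before modelling.
    """
    minutes_by_hour = defaultdict(set)
    for t in detection_times:
        hour = t.replace(minute=0, second=0, microsecond=0)
        minutes_by_hour[hour].add(t.minute)
    return {hour: len(mins) for hour, mins in sorted(minutes_by_hour.items())}


# Hypothetical detections: two in the same minute, one later in the
# same hour, and one in the following hour.
detections = [
    datetime(2020, 1, 1, 10, 5, 10),
    datetime(2020, 1, 1, 10, 5, 40),
    datetime(2020, 1, 1, 10, 17, 0),
    datetime(2020, 1, 1, 11, 59, 59),
]
dpm = hourly_dpm(detections)
```

Here the 10:00 hour has a DPM of 2 (minutes 5 and 17) and the 11:00 hour has a DPM of 1.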
The relatedness between the smooth terms in the model was measured using the function concurvity, in a similar manner to the variance inflation factors used for Generalised Linear Models (GLMs). Concurvity is measured on a scale of 0-1, with 0 indicating no concurvity and 1 indicating that terms are not identifiable from each other. Concurvity was not detected, so all terms were retained for analysis. Stepwise model selection was performed, in which non-significant interactions were dropped from the model (starting with the least significant) and model validation was repeated. Models were compared using AIC to choose the final model. Model performance was checked using gam.check, based on QQ and residual plots (Wood, 2006). Model goodness of fit was described by deviance explained and by the area under the receiver operating characteristic curve (AUC), calculated with the package caret (Kuhn, 2008). AUC was calculated by predicting a binomial response variable from the fitted model and comparing it to the observed presence/absence of the variable. This results in a value ranging from 0-1, with values closer to 1 indicating better model fit (Boyce et al., 2002). Graphical outputs were produced using the mgcViz package (Fasiolo et al., 2018) and ggplot2 (Wickham, 2009).
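To make the AUC statistic concrete, the sketch below computes it as the proportion of presence/absence pairs in which the presence record receives the higher predicted value, with ties counted as one half. This is a generic Python illustration and not the caret-based R calculation used in this study. It assumes that presence is coded as 1 and absence as 0, and that higher predicted values indicate a greater likelihood of presence; the function name and data are hypothetical.

```python
def auc(observed, predicted):
    """Area under the ROC curve via pairwise comparison.

    observed:  iterable of 0/1 values (absence/presence).
    predicted: iterable of model scores, higher meaning presence is
               more likely.
    """
    pos = [p for o, p in zip(observed, predicted) if o]
    neg = [p for o, p in zip(observed, predicted) if not o]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical hourly presence/absence and predicted values.
observed = [0, 0, 1, 1, 0, 1]
predicted = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
example_auc = auc(observed, predicted)
```

In this example, 8 of the 9 presence/absence pairs are ranked correctly, giving an AUC of 8/9. The pairwise loop is quadratic in the number of records, so a rank-based implementation would be preferable for large data sets.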