Network link prediction
We applied the following network link prediction (NLP) algorithms:
The plug-and-play algorithm (Dallas et al. 2017) predicts missing links through conditional probability estimation. The model was developed to infer, from a set of input parameters, the probability that an unobserved link exists but has remained undetected.
The Poisson N-mixture link prediction model (Fu et al. 2019) combines a Poisson N-mixture model with a low-rank collaborative filtering approach. Poisson N-mixture models are used in ecological research to account for imperfect detection in field observations (Royle 2004). Collaborative filtering based on low-rank matrix completion, in turn, is a popular approach to NLP in social network studies. Missing entries in a data matrix are completed from a small number of known entries under the assumption that the matrix has low rank, e.g. to predict consumer preferences (Candes & Plan 2010).
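The low-rank completion idea can be illustrated with a minimal Python sketch. It is not the model of Fu et al. (2019): it omits the Poisson N-mixture detection component, fixes the rank at 1, and uses plain alternating least squares. The function name complete_rank1 and the toy matrix are hypothetical and serve illustration only.

```python
def complete_rank1(matrix, n_iter=200):
    """Fill None cells with a rank-1 fit u[i] * v[j] (alternating least squares).

    Illustrative assumption: the data are well described by a single factor.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update each row factor from the observed cells of that row only.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Update each column factor from the observed cells of that column only.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    # Keep observed cells as they are; predict only the missing ones.
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]


# Toy matrix with one unobserved cell (None); rows are multiples of [1, 2].
observed = [[1.0, 2.0],
            [2.0, 4.0],
            [3.0, None]]
completed = complete_rank1(observed)
```

In this toy case the observed cells determine the rank-1 structure, so the missing cell is recovered from the pattern shared by the other rows.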
We provided ecological, morphological, and phylogenetic input parameters to these models (Table 1). Neither NLP model currently accounts for phylogenetic uncertainty. Therefore, we included only the majority-rule consensus host and parasite BI phylogenies and the dendrograms calculated through ward.D2, one of the most widely used clustering algorithms (Murtagh & Legendre 2014). To avoid overfitting, we reduced the number of input variables per parameter through principal coordinate analyses (PCoA) of the distance matrices of each parameter. Distance matrices of some parameters (Table 1) were inferred from dendrograms built with the clustering methods employed for the host niche dendrograms. Distance matrices were computed through the cophenetic function in R v4.0.0 (R Core Team 2021). To address missing data, we imputed the data matrix (see Dallas et al. 2017) through expectation-maximisation with bootstrapping as implemented in the R package Amelia (Honaker et al. 2011). Overall, we provided 9 input parameters consisting of 25 variables (Table 1).
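The PCoA step can be sketched in Python as follows. This is a minimal illustration of classical PCoA (Gower double-centring followed by an eigendecomposition), not the R code used in the study. The function name pcoa, the power-iteration solver, and the toy distance matrix are assumptions made to keep the example self-contained.

```python
import math
import random


def pcoa(dist, n_axes=2, n_iter=500, seed=0):
    """Classical PCoA: return one list of coordinates per retained axis.

    Only axes with a positive eigenvalue are kept, so fewer than n_axes
    axes may be returned.
    """
    n = len(dist)
    # Gower double-centring of the squared distances: B = -1/2 * J D^2 J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_axes):
        # Power iteration for the dominant eigenvector of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue belonging to v.
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(x * y for x, y in zip(v, w)) / sum(x * x for x in v)
        if eigval <= 1e-9:
            break
        # Scale the unit eigenvector by sqrt(eigenvalue) to get coordinates.
        axes.append([math.sqrt(eigval) * x for x in v])
        # Deflate B so that the next round finds the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return axes


# Toy distances among three taxa lying on a line at positions 0, 1 and 3.
toy_dist = [[0.0, 1.0, 3.0],
            [1.0, 0.0, 2.0],
            [3.0, 2.0, 0.0]]
coords = pcoa(toy_dist)
```

Because the toy distances are one-dimensional, a single axis reproduces them; for a real distance matrix, the leading axes would be retained as the reduced set of input variables.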
We determined model accuracy as the Area Under the Receiver Operating Characteristic curve (AUROC) statistic through 10-fold cross-validation. Each time, the algorithms were trained on 80% of the interaction matrix to predict the remaining 20%. We implemented the models in R v4.0.0 (R Core Team 2021) and MATLAB v9.9.0 (MathWorks, Natick, USA) using the provided codes (doi: 10.6084/m9.figshare.4965038; https://github.com/Hutchinson-Lab/Poisson-N-mixture). Following Dallas et al. (2017), we assessed variable importance of the plug-and-play algorithm by measuring the reduction in model performance resulting from 500-fold permutation of each of the input variables. For variable assessment and host-parasite link prediction, the algorithm was trained on the full dataset. This assessment was not performed for the Poisson N-mixture model because no implementation was available.
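The evaluation logic can be sketched in Python under simplifying assumptions. The sketch shows a rank-based AUROC, a random hold-out of 20% of the matrix cells, and permutation importance measured as the mean drop in AUROC. All function names, the toy data, and the stand-in predict function are hypothetical; the study itself used the published R and MATLAB codes.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def holdout_indices(n_cells, test_fraction=0.2, seed=0):
    """Randomly split cell indices into training and held-out sets."""
    rng = random.Random(seed)
    idx = list(range(n_cells))
    rng.shuffle(idx)
    n_test = round(n_cells * test_fraction)
    return idx[n_test:], idx[:n_test]


def permutation_importance(predict, rows, labels, n_perm=500, seed=0):
    """Mean drop in AUROC after shuffling each input variable across rows."""
    rng = random.Random(seed)
    baseline = auroc([predict(r) for r in rows], labels)
    drops = []
    for col in range(len(rows[0])):
        total = 0.0
        for _ in range(n_perm):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:]
                        for r, v in zip(rows, shuffled)]
            total += baseline - auroc([predict(r) for r in permuted], labels)
        drops.append(total / n_perm)
    return baseline, drops


# Toy data: two input variables per host-parasite pair, 1 = observed link.
rows = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.5], [0.1, 0.3]]
labels = [1, 1, 0, 0]


def predict(row):
    # Stand-in for a fitted model; it uses only the first variable.
    return row[0]


baseline, drops = permutation_importance(predict, rows, labels)
train_idx, test_idx = holdout_indices(10)
```

In this toy setting, shuffling the first variable lowers the AUROC, whereas shuffling the second leaves it unchanged because the stand-in model ignores that variable.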