Data assembly: Infection data, niche modelling, phylogenies
We assembled infection data through a survey of peer-reviewed literature. This survey resulted in an updated version (Supporting Information) of the list published by Cruz-Laufer et al. (2021a). For abundance weighting in downstream analyses, we also assembled infection parameters including the number of examined hosts, infected hosts, and parasites. If no infection parameters were reported, we considered a report as a single infected specimen.
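The fallback rule for reports without infection parameters can be illustrated with a minimal Python sketch. It is not the authors' code: the record type `InfectionReport`, its field names, and the helper `with_default_parameters` are hypothetical, and setting all three counts to 1 is one possible reading of "a single infected specimen".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfectionReport:
    """Hypothetical record of one literature report (field names are assumptions)."""
    examined_hosts: Optional[int] = None
    infected_hosts: Optional[int] = None
    parasites: Optional[int] = None


def with_default_parameters(report: InfectionReport) -> InfectionReport:
    """Apply the fallback rule when a report gives no infection parameters.

    Assumption: "a single infected specimen" is read here as one examined
    host, one infected host, and one parasite.
    """
    nothing_reported = (
        report.examined_hosts is None
        and report.infected_hosts is None
        and report.parasites is None
    )
    if nothing_reported:
        return InfectionReport(examined_hosts=1, infected_hosts=1, parasites=1)
    # Reports with at least one parameter are left untouched in this sketch.
    return report


# Usage with invented numbers: one bare report, one fully documented report.
bare = with_default_parameters(InfectionReport())
full = with_default_parameters(InfectionReport(20, 5, 12))
```

Partially documented reports (for example, only the number of examined hosts) are passed through unchanged here; how such cases were handled is not stated in the text.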
We built host niche dendrograms based on ecological, geographical, and morphological data (Table 1) available in FishBase (Froese & Pauly 2000) and accessed through the R package rfishbase (Boettiger et al. 2012). Missing trophic level and habitat data were added through a literature survey (see Supporting Information). Dendrograms were built through hierarchical clustering in R (Pavoine et al. 2009) based on a Gower’s distance matrix (Gower 1971). Gower’s distances were calculated using the function dist.ktab in the R package ade4 v1.7.16 (Pavoine et al. 2009). As suggested by Clark & Clegg (2017), we accounted for uncertainty in the host niche by applying a range of clustering algorithms implemented in the hclust function in R (incl. ward.D2, single, complete, average, mcquitty, median, and centroid) (R Core Team 2021). We tested for topological congruence of the resulting dendrograms using the congruence among distance matrices (CADM) test (Legendre & Lapointe 2004; Campbell et al. 2011) in the R package ape v5.4 (Paradis & Schliep 2019).
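To make the distance step concrete, the following minimal Python sketch computes a Gower-type distance for mixed trait data: quantitative traits contribute range-scaled absolute differences, nominal traits contribute a simple mismatch (0 or 1), and missing values are skipped. It is an illustration only and does not reproduce dist.ktab in ade4, which may scale and combine variables differently. The host labels, trait names, and trait values are invented.

```python
from itertools import combinations

# Invented trait table: two quantitative traits and one nominal trait.
HOSTS = {
    "host_a": {"trophic_level": 2.0, "max_length_cm": 10.0, "habitat": "benthopelagic"},
    "host_b": {"trophic_level": 3.0, "max_length_cm": 30.0, "habitat": "benthopelagic"},
    "host_c": {"trophic_level": 4.0, "max_length_cm": 20.0, "habitat": "pelagic"},
}


def trait_ranges(records, numeric_traits):
    """Range (max - min) of each quantitative trait, ignoring missing values."""
    ranges = {}
    for trait in numeric_traits:
        values = [r[trait] for r in records.values() if r.get(trait) is not None]
        ranges[trait] = max(values) - min(values) if values else 0.0
    return ranges


def gower_distance(x, y, ranges):
    """Mean per-trait dissimilarity over the traits observed in both records."""
    total, used = 0.0, 0
    for trait, xv in x.items():
        yv = y.get(trait)
        if xv is None or yv is None:
            continue  # skip traits missing in either record
        if trait in ranges:
            # Quantitative trait: absolute difference scaled by the trait range.
            total += abs(xv - yv) / ranges[trait] if ranges[trait] > 0 else 0.0
        else:
            # Nominal trait: 0 if the categories match, 1 otherwise.
            total += 0.0 if xv == yv else 1.0
        used += 1
    if used == 0:
        raise ValueError("records share no observed traits")
    return total / used


def gower_matrix(records, numeric_traits):
    """Pairwise distances keyed by (name_1, name_2) with names in sorted order."""
    ranges = trait_ranges(records, numeric_traits)
    return {
        (a, b): gower_distance(records[a], records[b], ranges)
        for a, b in combinations(sorted(records), 2)
    }


distances = gower_matrix(HOSTS, ["trophic_level", "max_length_cm"])
```

With these invented values, host_a and host_b differ by 0.5 (trophic level), 1.0 (length), and 0 (habitat), giving a distance of 0.5. A matrix of this kind is the input that each of the listed clustering algorithms would then turn into a dendrogram.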
As no previous phylogenetic study on fishes covers all the species known to host members of Cichlidogyrus, we conducted a new analysis (see Appendix S1.1) based on DNA sequence data accessed on GenBank (Appendix S2) to infer phylogenetic distances between hosts. For the parasites, we included morphometric and phylogenetic data from Cruz-Laufer et al. (2021b), i.e. morphological measurements and 100 randomly sampled Bayesian tree topologies from the post-burn-in fraction.