Data assembly: Infection data, niche modelling, phylogenies
We assembled infection data through a survey of peer-reviewed
literature. This survey resulted in an updated version (Supporting
Information) of the list published by Cruz-Laufer et al. (2021a).
For abundance weighting, we also assembled infection parameters
including the number of examined hosts, infected hosts, and parasites.
Where no infection parameters were reported (59% of reports, 61% of
interactions), we treated these reports as singular observations so that
they were taken into account while minimising their impact on downstream
analyses (eventually constituting 9.6% of infections).
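The handling of reports without infection parameters can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it assumes that the abundance weight is the reported number of parasites and that a singular observation receives a weight of 1. Neither detail is stated above, and the function name, field names, and example records are hypothetical placeholders.

```python
from collections import defaultdict


def interaction_weights(reports):
    """Sum a weight per (host, parasite) pair across literature reports.

    Each report is a dict with 'host', 'parasite' and an optional
    'n_parasites' count. Reports lacking a count are treated as a
    singular observation, i.e. given weight 1 (an assumption of this sketch).
    """
    weights = defaultdict(int)
    for report in reports:
        count = report.get("n_parasites")
        pair = (report["host"], report["parasite"])
        weights[pair] += 1 if count is None else count
    return dict(weights)


# Hypothetical reports; names and counts are placeholders, not study data.
reports = [
    {"host": "host_A", "parasite": "parasite_X", "n_parasites": 12},
    {"host": "host_A", "parasite": "parasite_X"},  # no parameters reported
    {"host": "host_B", "parasite": "parasite_Y"},  # no parameters reported
]
weights = interaction_weights(reports)
```

Under these assumptions, an interaction known only from reports without parameters still enters the weighted data, but with the smallest possible weight.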
We built host niche dendrograms based on ecological, geographical, and
morphological data (Table 1) accessed on FishBase (Froese & Pauly 2000)
through the R package rfishbase (Boettiger et al. 2012). Missing trophic level and habitat data were added through a
literature survey (see Supporting Information). Dendrograms were built
through hierarchical clustering in R (Pavoine et al. 2009)
based on a Gower’s distance matrix (Gower 1971). Gower’s distances were
calculated using the function dist.ktab in the R package ade4 v1.7.16 (Pavoine et al. 2009). As in Clark & Clegg
(2017), we accounted for uncertainty of the host niche by applying
several clustering algorithms implemented in the function hclust in R (incl. ward.D2, single, complete, average, mcquitty, median, and centroid) (R Core
Team 2022). We tested for topological congruence of the resulting
dendrograms using the congruence among distance matrices (CADM) test
(Legendre & Lapointe 2004; Campbell et al. 2011) in the package ape v5.4 (Paradis & Schliep 2019).
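The dendrogram construction can be sketched as follows. This is a Python illustration using only the standard library, not the ade4 and hclust calls used in the study. It implements the classic Gower (1971) coefficient, which may differ in scaling and variable handling from dist.ktab, and a naive agglomerative procedure covering only four of the linkage methods listed above. The host names, variables, and values are hypothetical placeholders.

```python
from itertools import combinations


def variable_ranges(records, numeric):
    """Map each variable to its observed range (numeric) or None (categorical)."""
    ranges = {}
    for key in {k for rec in records.values() for k in rec}:
        if key in numeric:
            vals = [r[key] for r in records.values() if r.get(key) is not None]
            ranges[key] = max(vals) - min(vals) if vals else 0.0
        else:
            ranges[key] = None
    return ranges


def gower_distance(a, b, ranges):
    """Gower dissimilarity; variables missing (None) in either record are skipped."""
    total, compared = 0.0, 0
    for key, rng in ranges.items():
        x, y = a.get(key), b.get(key)
        if x is None or y is None:
            continue
        if rng is None:  # categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
        else:  # numeric: absolute difference scaled by the range
            total += abs(x - y) / rng if rng > 0 else 0.0
        compared += 1
    if compared == 0:
        raise ValueError("no variable observed in both records")
    return total / compared


def agglomerate(names, dist, method):
    """Naive agglomerative clustering; returns a list of (a, b, height) merges."""
    if method not in {"single", "complete", "average", "mcquitty"}:
        raise ValueError(f"unsupported method: {method}")
    sizes = {name: 1 for name in names}  # current cluster label -> size
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        a, b = min(combinations(sorted(sizes), 2), key=lambda p: d[frozenset(p)])
        new = f"({a},{b})"
        for c in sizes:
            if c in (a, b):
                continue
            dac, dbc = d[frozenset((a, c))], d[frozenset((b, c))]
            if method == "single":
                val = min(dac, dbc)
            elif method == "complete":
                val = max(dac, dbc)
            elif method == "average":  # UPGMA: weighted by cluster size
                val = (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
            else:  # mcquitty (WPGMA): unweighted mean of the two distances
                val = (dac + dbc) / 2
            d[frozenset((new, c))] = val
        merges.append((a, b, d[frozenset((a, b))]))
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return merges


# Hypothetical host traits; placeholders only, not FishBase data.
hosts = {
    "host_A": {"trophic_level": 2.0, "habitat": "benthopelagic"},
    "host_B": {"trophic_level": 2.5, "habitat": "benthopelagic"},
    "host_C": {"trophic_level": 4.0, "habitat": "pelagic"},
    "host_D": {"trophic_level": None, "habitat": "pelagic"},
}
ranges = variable_ranges(hosts, numeric={"trophic_level"})
dist = {
    frozenset((a, b)): gower_distance(hosts[a], hosts[b], ranges)
    for a, b in combinations(hosts, 2)
}
dendrograms = {
    m: agglomerate(list(hosts), dist, m)
    for m in ("single", "complete", "average", "mcquitty")
}
```

Each entry of `dendrograms` holds the merge sequence for one linkage method. Comparing these sequences across methods is the step that the CADM test formalises in the actual analysis.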
As no previous phylogenetic study covers all the species known to host
members of Cichlidogyrus, we conducted a new analysis (see
Appendix S1.1) based on DNA sequence data accessed on GenBank (Appendix
S2) to infer phylogenetic distances between hosts. For the parasites, we
included morphometric and phylogenetic data from Cruz-Laufer et
al. (2021b), i.e. morphological measurements and 100 randomly sampled
Bayesian tree topologies from the post-burn-in fraction.
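Drawing a random subset of trees from the post-burn-in fraction can be sketched as below, again in Python for illustration only. The burn-in fraction of 0.25, the fixed seed, and the placeholder tree labels are assumptions of this sketch. The sampling settings actually used are those of Cruz-Laufer et al. (2021b) and are not reproduced here.

```python
import random


def sample_post_burnin(trees, burnin_fraction, n_samples, seed=None):
    """Discard the first burnin_fraction of trees, then sample without replacement."""
    if not 0 <= burnin_fraction < 1:
        raise ValueError("burnin_fraction must be in [0, 1)")
    start = int(len(trees) * burnin_fraction)
    retained = trees[start:]
    if n_samples > len(retained):
        raise ValueError("not enough post-burn-in trees to sample from")
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(retained, n_samples)


# Placeholder labels standing in for trees from a posterior sample.
posterior = [f"tree_{i}" for i in range(1000)]
subset = sample_post_burnin(posterior, burnin_fraction=0.25, n_samples=100, seed=1)
```

Sampling without replacement means each retained topology appears at most once in the subset.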