2.8 Gene family analysis
To identify gene families, protein sequences from S. peregrina and nine other dipteran species (Sarcophaga bullata, Stomoxys calcitrans, M. domestica, L. cuprina, Ceratitis capitata, Bactrocera oleae and D. melanogaster, with A. aegypti and A. gambiae of the family Culicidae as the outgroup) were first aligned using DIAMOND v0.9.30 (Buchfink et al. 2015), and the alignment results were then clustered using OrthoFinder v2.7 (Emms & Kelly 2015; Xu et al. 2019). To resolve the phylogenetic relationships among S. peregrina and the nine other species, single-copy gene families were aligned using MAFFT v7.0 (Kazutaka & Standley 2013) and then trimmed using Gblocks v0.91b (Castresana 2000). The phylogenetic tree was inferred by maximum likelihood as implemented in RAxML v8.2, with the GTRGAMMA model and 100 bootstrap replicates (Alexandros 2006; Stamatakis 2014). Divergence times were then estimated under a relaxed clock model using the MCMCTREE program implemented in PAML v4.9e (Yang 2007). The molecular clocks of the family Culicidae (105.91-234.53 Mya) and of Stomoxys calcitrans and M. domestica (26.97-36.96 Mya) were used for fossil calibration.
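The selection of single-copy families described above can be illustrated with a minimal Python sketch. It assumes a tab-separated gene-count table with one row per orthogroup and one count column per species, plus an optional "Total" column; this layout, the function name and the toy data are assumptions made for illustration and are not taken from the pipeline used in this study.

```python
import csv
import io


def single_copy_orthogroups(tsv_text, id_column="Orthogroup", skip_columns=("Total",)):
    """Return IDs of orthogroups that have exactly one gene in every species column.

    The column names are hypothetical defaults; adjust them to the actual table.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        # Collect the per-species gene counts, ignoring the ID and summary columns.
        counts = [
            int(value)
            for key, value in row.items()
            if key != id_column and key not in skip_columns
        ]
        # Single-copy means one gene per species, with no species missing.
        if counts and all(count == 1 for count in counts):
            selected.append(row[id_column])
    return selected


# Toy table with three species (hypothetical data, for illustration only).
EXAMPLE = (
    "Orthogroup\tS_peregrina\tM_domestica\tD_melanogaster\tTotal\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t2\t1\t1\t4\n"
    "OG0000003\t1\t0\t1\t2\n"
    "OG0000004\t1\t1\t1\t3\n"
)

if __name__ == "__main__":
    print(single_copy_orthogroups(EXAMPLE))
```

In this toy table only the first and last orthogroups are retained: the second has a duplicated gene in one species, and the third is missing from one species.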
Based on the gene families and phylogenetic relationships, expanded and contracted gene families were identified using Computational Analysis of Gene Family Evolution (CAFE) v4.2 (Tijl et al. 2006). To identify positively selected genes in S. peregrina, we retained orthologous groups among S. peregrina and the remaining seven species (after removal of the outgroup in evolutionary analysis) using Blastall (Camacho et al. 2009). We then performed likelihood ratio tests for selection (P<0.05) using Codeml with the branch-site model as implemented in the PAML package (Yang 2007). In addition, we analyzed chromosome synteny between S. peregrina and D. melanogaster based on genome-scale ortholog alignment using MCScanX (Wang et al. 2012).
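The likelihood ratio test for the branch-site model can be sketched as follows. The test statistic is twice the difference in log-likelihood between the alternative model (positive selection allowed on the foreground branch) and the null model. The sketch assumes the statistic is compared against a chi-square distribution with one degree of freedom, a common convention for this test; the null distribution actually used is not stated here, and the function name and log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test assuming a chi-square null with 1 degree of freedom.

    Returns (statistic, p_value). For 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    # Test statistic: 2 * (lnL_alternative - lnL_null), floored at zero because
    # small negative values can only arise from optimisation noise.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one orthologous group.
    stat, p = lrt_pvalue(lnl_null=-1234.56, lnl_alt=-1231.06)
    print(f"2*deltaLnL = {stat:.2f}, P = {p:.4f}, significant = {p < 0.05}")
```

With many orthologous groups tested, a multiple-testing correction is often applied to the resulting P values; whether one was used in this analysis is not stated.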