2.7 Phylogenetic analysis
We constructed phylogenetic trees using whole-genome sequences of S. chinensis and eight other aphid species: Daktulosphaira vitifoliae, Sipha flava, Aphis glycines, R. maidis, Ac. pisum, Myzus persicae, Diuraphis noxia, and E. lanigerum. The whitefly Bemisia tabaci was used as the outgroup. The aphid genome sequences and gene structure annotation files were downloaded from the NCBI genome database; genes containing mRNA information were retained, and the CDS was modified. For each gene, the longest sequence was selected as the representative sequence. Finally, the protein and CDS sequences of all genes were obtained. Orthologous groups were assigned using OrthoMCL (v2.0.9) (Li, Stoeckert & Roos, 2003) based on all-versus-all BLASTP results (E-value ≤ 1 × 10⁻⁵). Single-copy ortholog groups were extracted from the OrthoMCL results according to the following criteria: a group was considered a single-copy gene as long as it appeared in 50% of the species, and a group was filtered out if its shortest sequence was longer than 6000.
Multiple sequence alignments of the single-copy orthologous genes were performed using MAFFT (version 7.221; Katoh, Misawa, Kuma, & Miyata, 2002; Katoh & Standley, 2013), and conserved amino-acid sites were identified with Gblocks (version 0.91; Clore, 2014). RAxML (version 8.1.24; Stamatakis, 2014) was used to construct the phylogenetic tree under the GTRGAMMA model with 1000 bootstrap replicates (Castresana, 2000). The branch lengths of homologous genes were analyzed using PAML (Yang, 2007) and compared with the standard tree to eliminate abnormal genes, after which the tree was rebuilt using RAxML (Stamatakis, 2014). Species divergence times were then calculated using mcmctree (part of the PAML software, version 14.9), which was provided with the root number and the multiple sequence alignment results together with calibration point information. Divergence times within the evolutionary tree were obtained with 95% confidence intervals (Yang, 2007).
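The representative-sequence selection and the single-copy filtering described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the input data structures, and the reading of the criteria (at most one gene per species, presence in at least 50% of species, and removal of groups whose shortest sequence exceeds 6000) are assumptions made for illustration only.

```python
from collections import Counter


def longest_per_gene(transcripts):
    """Keep the longest sequence per gene as its representative.

    transcripts: iterable of (gene_id, transcript_id, sequence) tuples
    (a hypothetical input layout, not the NCBI annotation format).
    Returns {gene_id: (transcript_id, sequence)}.
    """
    best = {}
    for gene_id, tx_id, seq in transcripts:
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (tx_id, seq)
    return best


def single_copy_groups(groups, species, lengths,
                       min_fraction=0.5, max_shortest_len=6000):
    """Select single-copy ortholog groups under the assumed criteria.

    groups:  {group_id: [(species_name, gene_id), ...]}
    species: list of all species in the analysis
    lengths: {gene_id: sequence length}
    """
    kept = []
    for group_id, members in groups.items():
        copies = Counter(sp for sp, _gene in members)
        # Assumption: "single copy" means no species has more than one gene.
        if any(n > 1 for n in copies.values()):
            continue
        # Assumption: the group must be present in at least 50% of species.
        if len(copies) < min_fraction * len(species):
            continue
        # Drop the group if even its shortest sequence exceeds the cutoff.
        shortest = min(lengths[gene] for _sp, gene in members)
        if shortest > max_shortest_len:
            continue
        kept.append(group_id)
    return sorted(kept)
```

The thresholds are exposed as keyword arguments so that a stricter presence requirement (for example, all species) can be tested without changing the function body.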
Divergence times and fossil record ages were obtained from TimeTree (http://www.timetree.org/) and applied as calibration points. According to TimeTree (Johnson et al., 2018), the node dates were 28-61 MYA for Ac. pisum and Ap. glycines, 87-162 MYA for D. vitifoliae and S. flava, and 245-351 MYA for B. tabaci and D. vitifoliae.
2.8 Gene family expansion and contraction
We used CAFE (version 3.1) (Hahn et al., 2007) to analyze gene family expansion and contraction by comparing our genome with those of 9 other aphids (D. vitifoliae, S. flava, E. lanigerum, Ap. glycines, R. maidis, Ac. pisum, D. noxia and M. persicae). Briefly, the gene family counts for the 10 insects were obtained from the OrthoMCL results. The gene family counts for each species and the tree with divergence times were used as input to CAFE (with parameter “lambda -s, -t”); CAFE estimated the best rate of gene birth and death, and all branches shared the same rate. Expansion and contraction of gene families were measured by CAFE (Hahn, Demuth & Han, 2007). GO and KEGG enrichment analyses were conducted using Omicshare CloudTools (http://www.omicshare.com/) with the tool's default settings.
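As an illustration of how per-species gene family counts might be tabulated from ortholog groups before being passed to CAFE, the following Python sketch builds a tab-delimited count table. The function name and the input structure are hypothetical, and the column layout (a description column, a family ID column, then one column per species) is our understanding of the CAFE 3 input format and should be checked against the CAFE manual; this is not the script used in this study.

```python
from collections import Counter


def cafe_count_table(groups, species):
    """Build a tab-delimited gene family count table.

    groups:  {group_id: [(species_name, gene_id), ...]}
             (hypothetical layout for parsed OrthoMCL groups)
    species: species names, in the column order wanted in the table;
             these are assumed to match the tip labels of the dated tree.
    """
    lines = ["\t".join(["Description", "ID", *species])]
    for group_id in sorted(groups):
        # Count how many genes each species contributes to this family.
        counts = Counter(sp for sp, _gene in groups[group_id])
        row = ["(null)", group_id] + [str(counts.get(sp, 0)) for sp in species]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

Species absent from a family are written as 0 so that every row has the same number of columns.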