2.7 Phylogenetic analysis
We constructed phylogenetic trees using whole-genome sequences of
S. chinensis and eight other aphid species: Daktulosphaira vitifoliae,
Sipha flava, Aphis glycines, R. maidis, A. pisum, Myzus persicae,
Diuraphis noxia and E. lanigerum. The whitefly Bemisia tabaci was used
as the outgroup. The aphid genome sequences and gene structure
annotation files were downloaded from the NCBI genome database; genes
containing mRNA information were retained and the CDS was modified. For
each gene, the longest sequence was selected as the representative
sequence, and the protein and CDS sequences of all genes were then
obtained. Orthologous groups were assigned using OrthoMCL (v2.0.9)
(Li, Stoeckert & Roos, 2003) based on all-versus-all BLASTP results
(E-value ≤ 1 × 10⁻⁵).
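The representative-sequence selection and E-value filtering described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the `(gene_id, transcript_id, sequence)` record layout, and the assumption that BLASTP results are in 12-column tabular format (E-value in the 11th column) are all hypothetical choices made for illustration.

```python
def longest_per_gene(records):
    """Keep the longest sequence per gene.

    records: iterable of (gene_id, transcript_id, sequence) tuples
             (hypothetical layout, not taken from the source).
    Returns a dict gene_id -> (transcript_id, sequence).
    """
    best = {}
    for gene_id, transcript_id, seq in records:
        # Replace the stored entry only if this sequence is strictly longer.
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


def filter_blast_hits(lines, max_evalue=1e-5):
    """Keep tabular BLASTP rows whose E-value is <= max_evalue.

    Assumes the standard 12-column tabular layout, where the
    E-value is the 11th field (index 10).
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            kept.append(fields)
    return kept
```

In this sketch, ties in length keep the first sequence encountered; the source does not say how ties were resolved.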
Single-copy ortholog groups were extracted from the OrthoMCL results
according to the following criteria: a group was treated as single-copy
if it was present in at least 50% of species, and a gene family was
filtered out if the shortest sequence in its single-copy ortholog group
was longer than 6000. Multiple sequence alignments of the single-copy
ortholog genes were performed using MAFFT (version 7.221; Katoh,
Misawa, Kuma, & Miyata, 2002; Katoh & Standley, 2013), and conserved
amino-acid sites were identified with Gblocks (version 0.91, Clore,
2014). RAxML (version 8.1.24) (Stamatakis, 2014) was used to construct
the phylogenetic tree under the GTRGAMMA model with 1000 bootstrap
replicates (Castresana, 2000). The branch lengths of homologous genes
were analyzed using PAML (Yang, 2007) and compared with the standard
tree to eliminate abnormal genes, after which the tree was rebuilt
using RAxML (Stamatakis, 2014). Species divergence times were then
calculated using MCMCtree (part of the PAML package, version 14.9),
with the root number and the multiple sequence alignment results,
together with calibration point information, provided as input.
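The single-copy ortholog criteria given earlier in this section could be sketched as follows. This is one possible reading, not the authors' script: it assumes that "single-copy" means no species contributes more than one gene to a group, that the 50% threshold is inclusive, and that sequence lengths are supplied in whatever unit the 6000 cutoff refers to (the source does not state it). The function name and input layout are hypothetical.

```python
def select_single_copy(groups, n_species, min_fraction=0.5,
                       max_shortest_len=6000):
    """Return IDs of ortholog groups passing the single-copy criteria.

    groups: dict group_id -> list of (species, gene_id, seq_len)
            (hypothetical layout, for illustration only).
    n_species: total number of species in the analysis.
    """
    selected = []
    for group_id, members in groups.items():
        if not members:
            continue
        species = [sp for sp, _, _ in members]
        # Assumption: at most one gene per species in a single-copy group.
        if len(species) != len(set(species)):
            continue
        # Group must be present in at least min_fraction of the species.
        if len(set(species)) < min_fraction * n_species:
            continue
        # Drop the group if even its shortest sequence exceeds the cutoff.
        if min(length for _, _, length in members) > max_shortest_len:
            continue
        selected.append(group_id)
    return selected
```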
Divergence times within the evolutionary tree were obtained with 95%
confidence intervals (Yang, 2007). Divergence times and ages of fossil
records were derived from TimeTree (http://www.timetree.org/) and
applied as calibration points. According to TimeTree (Johnson et al.,
2018), the nodal dates were 28-61 MYA for Ac. pisum and Ap. glycines,
87-162 MYA for D. vitifoliae and S. flava, and 245-351 MYA for
B. tabaci and D. vitifoliae.

2.8 Gene family expansion and contraction
We used CAFE (version 3.1) (Hahn et al., 2007) to analyze gene family
expansion and contraction by comparing our genome with those of 9 other
aphids (D. vitifoliae, S. flava, E. lanigerum, Ap. glycines, R. maidis,
Ac. pisum, D. noxia and M. persicae).
Briefly, gene family counts for the 10 insects were obtained from the
OrthoMCL results. The number of gene families in each species and the
trees with divergence times were used as input for CAFE (with
parameters “lambda -s, -t”); the best rates of gene birth and death
were estimated by CAFE, with all branches sharing the same rate.
Expansion and contraction of gene families were measured by CAFE (Hahn,
Demuth & Han, 2007). GO and KEGG enrichment analyses were conducted
using Omicshare CloudTools with the tool’s default settings
(http://www.omicshare.com/).
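The step of deriving per-species gene family counts from OrthoMCL results for CAFE, described above, might look like the following sketch. It assumes OrthoMCL group lines of the form `group_id: species|gene species|gene ...` and a tab-delimited count table with `Description` and `ID` columns followed by one column per species; both layouts should be checked against the respective manuals. The function names and species codes are hypothetical.

```python
from collections import Counter


def parse_orthomcl_groups(lines):
    """Parse 'group_id: sp|gene sp|gene ...' lines.

    Returns a dict group_id -> Counter mapping species code to gene count.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        # The species code is the part before the first '|' in each member.
        counts[group_id.strip()] = Counter(
            member.split("|", 1)[0] for member in members.split()
        )
    return counts


def cafe_table(counts, species):
    """Format the counts as a tab-delimited table (assumed CAFE layout)."""
    rows = ["\t".join(["Description", "ID"] + list(species))]
    for group_id in sorted(counts):
        family = counts[group_id]
        # Counter returns 0 for species absent from the family.
        rows.append("\t".join(
            ["(null)", group_id] + [str(family[sp]) for sp in species]
        ))
    return "\n".join(rows) + "\n"
```

The species order passed to `cafe_table` fixes the column order, so it should match the tip labels of the dated tree supplied to CAFE.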