3.3 Genome annotation
A total of 124.22 Gb of raw data were generated on the Illumina platform,
and a total of 261 transcripts (280,520,495 bp in total) were assembled by
Trinity (Table S4). A total of 79,136,004 bp of repetitive sequences were
identified in the S. chinensis genome, accounting for 27.14% of the
assembly (Table S6). In total, 14,089 protein-coding genes (15,987
transcripts) were predicted, and 97.37% of the annotated genes were
located on the 13 chromosome-level scaffolds. The
average CDS length, number of exons per gene, exon length and intron
length were 1,536 bp, 7.3, 212 bp and 910 bp, respectively, similar to
those of most previously reported aphid species (Table S7, Figure S1).
BUSCO analysis identified 96.9%, 97.7%, 97.8% and 96.7% of the BUSCO
genes in the S. chinensis genome using the Eukaryota, Arthropoda,
Hemiptera and Insecta datasets, respectively, indicating that the gene
set is highly complete (Figure 4B). Up to 90% of the RNA-Seq reads were
assigned to a gene feature.
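The averages reported in Table S7 summarise exon-intron structure across the gene models. As a minimal sketch of how such averages can be derived from a GFF3 annotation, the Python function below assumes (none of this is stated in the original) that exon and CDS features point to their transcript through the Parent attribute, that every mRNA is counted rather than one representative isoform per gene, and that introns are the gaps between consecutive exons of the same transcript.

```python
"""Minimal sketch: average gene-structure statistics from GFF3 lines.

Assumptions (not from the original text): exon/CDS features link to their
mRNA via Parent=, every mRNA is counted, and introns are the gaps between
consecutive exons of the same mRNA.
"""
from collections import defaultdict
from statistics import mean


def parse_attributes(field):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def gene_structure_stats(gff_lines):
    exons = defaultdict(list)   # mRNA ID -> list of (start, end)
    cds_len = defaultdict(int)  # mRNA ID -> summed CDS length in bp
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        # A feature may list several comma-separated parents.
        for parent in attrs.get("Parent", "").split(","):
            if not parent:
                continue
            if ftype == "exon":
                exons[parent].append((start, end))
            elif ftype == "CDS":
                cds_len[parent] += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    exon_lengths, intron_lengths, exons_per_tx = [], [], []
    for spans in exons.values():
        spans.sort()
        exons_per_tx.append(len(spans))
        exon_lengths.extend(e - s + 1 for s, e in spans)
        # Intron = bases strictly between one exon's end and the next exon's start.
        intron_lengths.extend(nxt[0] - prev[1] - 1 for prev, nxt in zip(spans, spans[1:]))
    return {
        "mean_cds_length": mean(cds_len.values()) if cds_len else 0.0,
        "mean_exons_per_transcript": mean(exons_per_tx) if exons_per_tx else 0.0,
        "mean_exon_length": mean(exon_lengths) if exon_lengths else 0.0,
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0.0,
    }
```

Averaging per transcript and averaging per gene give different results when genes have several isoforms, so the choice of representative transcript should match whatever was used for Table S7.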
Of the 14,078 predicted genes, 12,584 (89.31%) were functionally
annotated by combining the 8,739 (65.81%) genes found in the GO database
with the 6,866 (51.71%) genes present in the KEGG database (Table 2).
In addition, 128 tRNAs, 32 rRNAs, 29 miRNAs and 81 snRNAs were
identified in the S. chinensis genome (Table S8).
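Because a gene can be annotated by more than one database, the combined count is the size of the union of the per-database gene sets, not the sum of their sizes. A minimal sketch follows; the gene IDs are hypothetical, and dividing by the total number of predicted genes is an assumption about how the percentages in Table 2 are expressed.

```python
"""Minimal sketch: per-database and combined functional-annotation counts."""


def annotation_summary(total_genes, hits_by_db):
    """Return {name: (count, percent)} for each database and for their union.

    hits_by_db maps a database name (e.g. "GO", "KEGG") to the set of gene IDs
    that received at least one annotation from that database.
    """
    summary = {}
    for db, genes in hits_by_db.items():
        summary[db] = (len(genes), 100 * len(genes) / total_genes)
    # A gene hit by several databases is counted only once in the union.
    annotated = set().union(*hits_by_db.values())
    summary["any"] = (len(annotated), 100 * len(annotated) / total_genes)
    return summary
```

For example, `annotation_summary(10, {"GO": {"g1", "g2", "g3"}, "KEGG": {"g3", "g4"}})` reports 4 annotated genes (40%) rather than 5, because g3 is shared.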
3.4 Phylogenomic analysis
To place the new genome assembly in a phylogenetic context and to
investigate gene family evolution among aphids, the proteins encoded by
the complete set of annotated protein-coding genes of S. chinensis were
compared with those of nine other insect species with fully sequenced
genomes. The corresponding proteins from the B. tabaci genome were used
to root the tree. A total of 3,479 single-copy ortholog groups extracted
by OrthoMCL were used to construct the phylogenetic tree. The results
showed that S. chinensis is a sister taxon to the woolly apple aphid
E. lanigerum. The two
Eriosomatinae species diverged from their common ancestor approximately
57.16 million years ago (Figure 5). Eriosomatinae and Aphidinae
(including Ap. glycines, R. maidis, Ac. pisum, M. persicae and D. noxia)
may have diverged from a common ancestor about 63.22 Mya. These
estimates are consistent with previous reports (Mather et al., 2020).
Within the family Aphididae, the subfamily Eriosomatinae is more closely
related to the subfamily Aphidinae than to the subfamily Chaitophorinae
(including S. flava).
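A single-copy ortholog group is one in which every species in the analysis contributes exactly one protein. The sketch below filters an OrthoMCL-style groups file, assuming lines of the form `group_id: taxon|protein taxon|protein ...`; the taxon codes used in the comments and tests are hypothetical, and the filter is an illustration of the selection criterion rather than the authors' pipeline.

```python
"""Minimal sketch: selecting strict single-copy ortholog groups from an
OrthoMCL-style groups file. Assumed line format (taxon codes hypothetical):
    OG_1001: scc|prot1 elan|prot7 btab|prot3
"""
from collections import Counter


def single_copy_groups(lines, taxa):
    taxa = set(taxa)
    selected = {}
    for line in lines:
        if ":" not in line:
            continue
        group_id, members = line.split(":", 1)
        proteins = members.split()
        # Count proteins per taxon using the prefix before "|".
        counts = Counter(p.split("|", 1)[0] for p in proteins)
        # Keep the group only if every analysed taxon contributes exactly one
        # protein and no other taxon is present.
        if set(counts) == taxa and all(n == 1 for n in counts.values()):
            selected[group_id.strip()] = sorted(proteins)
    return selected
```

Requiring exactly one copy per species avoids mixing paralogs into the concatenated alignment, at the cost of discarding groups that are missing from even one species.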
Significant expansion or contraction of gene families is often related
to the adaptive divergence of species. To elucidate key genomic changes
associated with adaptation, significantly expanded and contracted gene
families were analyzed across all nine aphids and B. tabaci.
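Once a birth-death analysis has produced, for each gene family, an inferred ancestral size, the size on a descendant branch and a p-value, expansions and contractions on that branch can be tallied as below. The original does not name the tool; CAFE is one commonly used option. The tuple layout and the 0.05 threshold are assumptions for illustration, not the study's settings.

```python
"""Minimal sketch: tallying significantly expanded and contracted gene
families on one branch. Input tuples (family, ancestral_size,
descendant_size, p_value) are a hypothetical layout; real values would come
from a birth-death model of gene family size evolution.
"""


def classify_branch(families, alpha=0.05):
    expanded, contracted = [], []
    for family, ancestral, descendant, p_value in families:
        if p_value >= alpha:
            continue  # size change not significant under the model
        if descendant > ancestral:
            expanded.append(family)
        elif descendant < ancestral:
            contracted.append(family)
    return expanded, contracted
```

Counting only significant changes is what separates the expanded and contracted totals reported below from the raw number of families whose size differs between ancestor and descendant.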
Eriosomatinae showed 40 expanded and 986 contracted gene families
compared to those of the common ancestor of Aphidinae and Eriosomatinae
(Figure S3A). KEGG and GO annotations suggest that most of the expanded
genes were involved in the detoxification of natural xenobiotics from
plants (Figure S3B, S3C). Gene
family evolution analysis indicated that the S. chinensis genome
displayed 235 expanded and 1,037 contracted gene families compared with
those of the common ancestor of S. chinensis and E. lanigerum. KEGG
annotations suggest that most of the expanded genes
were involved in IL-17 signaling pathway, arachidonic acid metabolism,
NF-kappa B signaling pathway, ovarian steroidogenesis, VEGF signaling
pathway, necroptosis, regulation of lipolysis in adipocytes, TNF
signaling pathway and C-type lectin receptor signaling pathway (Figure
S3E). The GO annotations suggest that most of the expanded genes were
involved in prostaglandin-endoperoxide synthase activity, arachidonate
15-lipoxygenase activity, nuclear nucleosome, ovarian cumulus expansion,
intrinsic apoptotic signaling pathway in response to osmotic stress,
regulation of fever generation, regulation of platelet-derived growth
factor production, response to lead ion, and chromatin assembly or
disassembly (Figure S3D, Table S9). The expanded gene families of
S. chinensis were therefore enriched not only for detoxification but
also for immune system functions.
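Statements that expanded genes are "mostly involved" in particular KEGG pathways or GO terms usually rest on an over-representation test. As a minimal sketch, assuming a one-sided hypergeometric test (the original does not state which test was used), the probability of seeing at least k annotated genes among n expanded-family genes, drawn from a background of N genes of which K carry the annotation, is computed exactly below with the standard library.

```python
"""Minimal sketch: one-sided hypergeometric over-representation test for a
single KEGG pathway or GO term among genes in expanded families."""
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Return P(X >= k) when n genes are drawn without replacement from N
    background genes, K of which carry the annotation.

    k: annotated genes among the expanded-family genes
    n: total genes in expanded families
    K: annotated genes in the background
    N: total background genes
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the hypergeometric probability mass from k up to the largest possible overlap.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

With N = 4, K = 2 and n = 2, observing both annotated genes (k = 2) gives p = 1/6. When many pathways or terms are tested, the resulting p-values are typically adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before terms are called enriched.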