3.3 Genome annotation
A total of 124.22 Gb of raw data was generated on the Illumina platform. A total of 261 transcripts (280,520,495 bp in total) were assembled with Trinity (Table S4). A total of 79,136,004 bp of repetitive sequences were identified in the S. chinensis genome, and repeats accounted for 27.14% of the genome (Table S6). In total, 14,089 protein-coding genes (15,987 transcripts) were predicted, and 97.37% of the annotated genes were located on the 13 chromosome-level scaffolds. The average CDS length, number of exons per gene, exon length and intron length were 1,536 bp, 7.3, 212 bp and 910 bp, respectively, similar to those reported for most other aphid species (Table S7, Figure S1). BUSCO analysis identified 96.9%, 97.7%, 97.8% and 96.7% of the Eukaryota, Arthropoda, Hemiptera and Insecta datasets, respectively, in the S. chinensis genome/gene set, indicating that the gene set is highly complete (Figure 4B). Up to 90% of RNA-Seq reads were assigned to a gene feature.
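The gene-structure statistics reported above (average CDS length, exons per gene, exon length and intron length) are summaries over exon coordinates. The sketch below is a minimal illustration, not the pipeline used in this study: it assumes one representative transcript per gene, treats CDS segments as exons (so UTR exons are ignored), and uses hypothetical toy coordinates rather than the S. chinensis annotation.

```python
from statistics import mean

# Hypothetical toy input (not study data): gene -> CDS segments as
# (start, end), 1-based and inclusive as in GFF3, one transcript per gene.
genes = {
    "gene1": [(100, 399), (600, 799), (1000, 1099)],
    "gene2": [(5000, 5599)],
}

def gene_structure_stats(genes):
    """Mean CDS length, exons per gene, exon length and intron length."""
    cds_lengths, exon_counts, exon_lengths, intron_lengths = [], [], [], []
    for exons in genes.values():
        exons = sorted(exons)
        lengths = [end - start + 1 for start, end in exons]
        cds_lengths.append(sum(lengths))
        exon_counts.append(len(exons))
        exon_lengths.extend(lengths)
        # Each intron is the gap between two consecutive exons.
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            intron_lengths.append(next_start - prev_end - 1)
    return {
        "mean_cds_length": mean(cds_lengths),
        "mean_exons_per_gene": mean(exon_counts),
        "mean_exon_length": mean(exon_lengths),
        # Single-exon genes contribute no introns.
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0.0,
    }

print(gene_structure_stats(genes))
```

Note that exon and intron means are taken over all exons and introns pooled, whereas CDS length and exon count are averaged per gene.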
Among the 14,078 predicted genes, 12,584 (89.31%) were functionally annotated, based on the combined results from the GO database (8,739 genes, 65.81%) and the KEGG database (6,866 genes, 51.71%) (Table 2). Additionally, non-coding RNAs were identified in the S. chinensis genome, including 128 tRNAs, 32 rRNAs, 29 miRNAs and 81 snRNAs (Table S8).
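Because a gene annotated in both GO and KEGG is counted only once, the overall number of functionally annotated genes is the size of the union of the per-database sets; this is why the GO and KEGG counts (8,739 + 6,866) sum to more than 12,584. A minimal sketch with hypothetical gene IDs (not the study's data):

```python
# Hypothetical toy data: gene IDs with at least one hit in each database.
annotations = {
    "GO": {"g1", "g2", "g3"},
    "KEGG": {"g2", "g4"},
}
all_genes = {"g1", "g2", "g3", "g4", "g5"}  # g5 has no annotation

def annotation_summary(annotations, all_genes):
    """Count and percentage of genes annotated per database and in any database."""
    total = len(all_genes)
    summary = {db: (len(ids), 100 * len(ids) / total) for db, ids in annotations.items()}
    # A gene annotated in several databases is counted once in the union.
    annotated_any = set().union(*annotations.values())
    summary["any"] = (len(annotated_any), 100 * len(annotated_any) / total)
    return summary

for db, (count, pct) in annotation_summary(annotations, all_genes).items():
    print(f"{db}: {count} genes ({pct:.2f}%)")
```

Here g2 is annotated by both databases, so the "any" count (4) is smaller than the per-database sum (5).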
3.4 Phylogenomic analysis
To place this new genome assembly in a phylogenetic context and to investigate gene family evolution among aphids, the proteins encoded by the complete set of annotated protein-coding genes of S. chinensis were compared with those of nine other insect species with fully sequenced genomes. The corresponding proteins from the B. tabaci genome were used to root the tree. A total of 3,479 single-copy ortholog groups extracted by OrthoMCL were used to construct the phylogenetic tree. The results showed that S. chinensis is a sister taxon to the woolly apple aphid E. lanigerum. The two Eriosomatinae species diverged from their common ancestor approximately 57.16 million years ago (Mya) (Figure 5). Eriosomatinae and Aphidinae (including Ap. glycines, R. maidis, Ac. pisum, M. persicae and D. noxia) may have diverged from a common ancestor about 63.22 Mya. These estimates are similar to those of previous reports (Mather et al., 2020). Within the family Aphididae, the subfamily Eriosomatinae is more closely related to the subfamily Aphidinae than to the subfamily Chaitophorinae (including S. flava).

Significant expansion or contraction of gene families is often associated with adaptive divergence between species. To elucidate key genomic changes associated with adaptation, significantly expanded and contracted gene families were analyzed across all nine aphids and B. tabaci. Compared with the common ancestor of Aphidinae and Eriosomatinae, Eriosomatinae showed 40 expanded and 986 contracted gene families (Figure S3A). KEGG and GO annotations suggest that most of these expanded genes are involved in the detoxification of plant-derived natural xenobiotics (Figure S3B, S3C). Gene family evolution analysis further indicated that the S. chinensis genome has 235 expanded and 1,037 contracted gene families compared with the common ancestor of S. chinensis and E. lanigerum.
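Single-copy ortholog groups, such as the 3,479 groups used to build the tree, are groups in which every species contributes exactly one protein. The sketch below filters groups under two assumptions: the input follows an OrthoMCL-style `group: taxon|protein ...` line layout, and the taxon codes (`schi`, `elan`, `btab`) are hypothetical abbreviations chosen for illustration.

```python
from collections import Counter

def single_copy_groups(lines, taxa):
    """Return IDs of groups in which each listed taxon has exactly one protein."""
    selected = []
    for line in lines:
        group_id, members = line.split(":", 1)
        # Count proteins per taxon using the prefix before "|".
        counts = Counter(member.split("|", 1)[0] for member in members.split())
        # Require every taxon present (no gaps, no extras) and no paralogs.
        if set(counts) == set(taxa) and all(n == 1 for n in counts.values()):
            selected.append(group_id.strip())
    return selected

# Hypothetical toy groups: OG2 has a paralog, OG3 lacks the outgroup.
groups = [
    "OG1: schi|p1 elan|p2 btab|p3",
    "OG2: schi|p4 schi|p5 elan|p6 btab|p7",
    "OG3: schi|p8 elan|p9",
]
print(single_copy_groups(groups, ["schi", "elan", "btab"]))  # ['OG1']
```

Requiring the outgroup in every retained group matters here because B. tabaci is used to root the tree.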
KEGG annotations suggest that most of the genes in the 235 gene families expanded in S. chinensis are involved in the IL-17 signaling pathway, arachidonic acid metabolism, NF-kappa B signaling pathway, ovarian steroidogenesis, VEGF signaling pathway, necroptosis, regulation of lipolysis in adipocytes, TNF signaling pathway and C-type lectin receptor signaling pathway (Figure S3E). GO annotations suggest that most of these expanded genes are involved in prostaglandin-endoperoxide synthase activity, arachidonate 15-lipoxygenase activity, nuclear nucleosome, ovarian cumulus expansion, intrinsic apoptotic signaling pathway in response to osmotic stress, regulation of fever generation, regulation of platelet-derived growth factor production, response to lead ion, and chromatin assembly or disassembly (Figure S3D, Table S9). Thus, the expanded gene families of S. chinensis were enriched not only for detoxification but also for immune system functions.
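The section does not state which statistical test underlies the KEGG and GO enrichment results. A common choice is a one-sided hypergeometric test, which asks how likely it is to observe at least k term-annotated genes among n expanded genes when K of N background genes carry that term. A minimal sketch under that assumption, using toy numbers and without multiple-testing correction:

```python
from math import comb

def hypergeom_sf(k, n, K, N):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric.

    N: background genes, K: background genes annotated to the term,
    n: genes in the test set (e.g. expanded families), k: test genes with the term.
    """
    total = comb(N, n)
    # Sum the probability mass of drawing i term genes for i = k .. min(n, K).
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Toy example: all 3 test genes carry a term held by 3 of 10 background genes.
print(hypergeom_sf(3, 3, 3, 10))  # about 0.0083 (= 1/120)
```

In practice, p-values across all tested terms are usually adjusted (for example with the Benjamini-Hochberg procedure) before a term is reported as enriched.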