2.5 Gene annotation
To predict repetitive regions, RepeatMasker (version 4.1.1) (Tarailo-Graovac & Chen, 2009) was used to screen the S. chinensis genome against the Hemiptera ch repeat database, with the command line "RepeatMasker -pa 4 -e ncbi -species Hemiptera ch -dir". To predict transposons and other repetitive regions, an aphid-specific database was generated using RepeatModeler (version 2.0.1, default parameters) (Flynn et al., 2020). The results of the RepeatMasker and RepeatModeler analyses were then combined.
For the RNA-seq-assisted method, Illumina RNA-seq reads were aligned to the S. chinensis genome using HISAT2 (version 2.1.0.5) (Kim et al., 2015), and the RNA-seq evidence was used for gene structure prediction with GETA (version 2.4.2). Gene structures were also predicted by homology to those of E. lanigerum, Ac. pisum, Myzus persicae, Aphis glycines and R. maidis using GeneWise (version 2.4.1) (Birney, Michele, & Durbin, 2004). The combined gene predictions from RNA-seq and homologous proteins were used to generate accurate and complete gene models for training Augustus (version 2.5.5) (Stanke et al., 2006), which was then used to perform gene prediction (Stanke et al., 2006; Blanco, Parra & Guigó, 2007). Finally, the gene prediction results were integrated and filtered against the Pfam database.
To annotate genes in the S. chinensis genome, we aligned them to seven functional databases using BLASTP with an E-value cutoff of 1 × 10⁻⁵. The databases used in the study were NCBI Non-Redundant Protein Sequence (Nr), Non-Redundant Nucleotide Sequence Database (Nt), SwissProt, Cluster of Orthologous Groups for eukaryotic complete genomes (KOG), the Integrated Resource of Protein Domains and Functional Sites (InterPro), Conserved Domain Database (CDD), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes Orthology database (KEGG) and evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG).
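The E-value filtering applied to the BLASTP alignments can be illustrated with a minimal Python sketch. It assumes the hits were written in the standard 12-column BLAST tabular format (-outfmt 6), which the text does not state, and it keeps the highest-scoring hit per query, which is a common convention rather than a documented step of this study. The function name and the example records are hypothetical.

```python
EVALUE_CUTOFF = 1e-5  # cutoff stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff.

    Assumes BLAST tabular columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # discard hits weaker than the E-value cutoff
        # Keep only the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical records, for illustration only.
example = "\n".join([
    "gene1\tP001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350",
    "gene1\tP002\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t150",
    "gene2\tP003\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25.0",
])
hits = best_hits(example)
print(hits)
```

In this example gene1 retains its stronger hit, while gene2 receives no annotation because its only hit has an E-value above the cutoff.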
Sequences potentially derived from bacteria, fungi and other microorganisms were removed by aligning the genome sequences to the Nt database. A local Blast2GO database was built for GO annotation, which was performed with Blast2GO (version 2.5). The KEGG Automatic Annotation Server (KAAS) was used to annotate the S. chinensis genome sequence, with the bi-directional best hit (BBH) method selected.
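The idea behind a bi-directional best hit assignment can be sketched as follows. This is a generic reciprocal-best-hit illustration, not the KAAS implementation, which applies its own score thresholds and orthology assignment rules. All function names, gene identifiers and scores below are hypothetical.

```python
def best_by_score(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward_hits, reverse_hits):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_by_score(forward_hits)  # genome gene -> reference gene
    reverse = best_by_score(reverse_hits)  # reference gene -> genome gene
    return sorted(
        (query, subject)
        for query, subject in forward.items()
        if reverse.get(subject) == query
    )


# Hypothetical hits: (query, subject, bit score).
forward = [
    ("geneA", "ref1", 300.0),
    ("geneA", "ref2", 120.0),
    ("geneB", "ref2", 200.0),
    ("geneC", "ref1", 90.0),
]
reverse = [
    ("ref1", "geneA", 290.0),
    ("ref1", "geneC", 95.0),
    ("ref2", "geneA", 110.0),
    ("ref2", "geneB", 210.0),
]
pairs = bidirectional_best_hits(forward, reverse)
print(pairs)
```

Here geneC is excluded because its best reference hit, ref1, has geneA rather than geneC as its own best hit in the reverse search.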