2.5 Gene annotation
To predict repetitive regions, RepeatMasker (version 4.1.1)
(Tarailo-Graovac & Chen, 2009) was used to screen the S. chinensis
genome against the Hemiptera ch repeat database with the parameters
RepeatMasker -pa 4 -e ncbi -species Hemiptera ch -dir. To predict
transposons and other repetitive regions, an aphid-specific repeat
database was generated using RepeatModeler (version 2.0.1, default
parameters) (Flynn et al., 2020). The results of the RepeatMasker and
RepeatModeler analyses were combined. For the RNA-seq assisted method,
Illumina RNA-seq data were aligned to the S.
chinensis genome using HISAT2 (version 2.1.0.5) (Kim et al., 2015).
The RNA-seq evidence was then used for gene structure prediction with
GETA (version 2.4.2). Gene structures were also predicted based on
homology to proteins from E. lanigerum, Ac. pisum, Myzus persicae,
Aphis glycines, and R. maidis using GeneWise (version 2.4.1) (Birney,
Michele, & Durbin, 2004). The combined gene predictions from RNA-seq
and homologous proteins were used to generate accurate and complete
gene models for training Augustus (version 2.5.5) (Stanke et al., 2006),
and the trained Augustus was then used to perform gene prediction
(Stanke et al., 2006; Blanco, Parra & Guigó, 2007). Finally, the gene
prediction results were integrated and filtered against the Pfam database. We
aligned the predicted genes to seven functional databases using BLASTP
with an E-value cutoff of 1 × 10⁻⁵ to annotate genes in the S. chinensis
genome. The databases used in the study were NCBI
Non-Redundant Protein Sequence (Nr), Non-Redundant Nucleotide Sequence
Database (Nt), SwissProt, Cluster of Orthologous Groups for eukaryotic
complete genomes (KOG), The Integrated Resource of Protein Domains and
Functional Sites (InterPro), Conserved Domain Database (CDD), Gene
Ontology (GO), Kyoto Encyclopedia of Genes and Genomes Orthology
database (KEGG), and evolutionary genealogy of genes: Non-supervised
Orthologous Groups (eggNOG). Sequences potentially derived from bacteria, fungi
and other microorganisms were removed by aligning the genome sequences
to the Nt database. A local Blast2GO
database was built for GO annotation and processed with Blast2GO
(version 2.5). The KEGG Automatic Annotation Server (KAAS) was
used to annotate the S. chinensis genome sequences, with the
bi-directional best hit (BBH) method selected.
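
As an illustration of the E-value filtering and the bi-directional best hit idea described above, the following minimal Python sketch may be helpful. It is not the pipeline used in this study and does not reproduce the internal KAAS procedure. It assumes BLAST tabular output (-outfmt 6, with the E-value in column 11 and the bit score in column 12). The function names best_hits and bidirectional_best_hits, and the gene and KO identifiers in the usage note, are hypothetical and introduced only for this example.

```python
import csv
import io

# E-value cutoff stated in the text (1e-5); everything else here is illustrative.
EVALUE_CUTOFF = 1e-5


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} from BLAST -outfmt 6 text.

    Hits with an E-value above the cutoff are discarded; for each query,
    the remaining hit with the highest bit score is kept.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def bidirectional_best_hits(forward, reverse):
    """Return sorted (query, subject) pairs that are each other's best hit.

    `forward` holds best hits of queries against the reference set,
    `reverse` holds best hits of reference sequences against the queries.
    """
    pairs = []
    for query, (subject, _, _) in forward.items():
        back = reverse.get(subject)
        if back is not None and back[0] == query:
            pairs.append((query, subject))
    return sorted(pairs)
```

With two tabular result files read into strings, one for each search direction, a call such as bidirectional_best_hits(best_hits(fwd_text), best_hits(rev_text)) would return only the reciprocal pairs that pass the cutoff in both directions.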