3.2 Genome annotation
A total of 15,710 genes were identified, with an average of 3.94 exons
per gene. The average transcript, CDS, and exon lengths were
7,635.25 bp, 1,404.03 bp, and 356.4 bp, respectively
(Additional file 1: Table S12). Moreover, the total
number of genes in the assembled genome is larger than that of each of
the five published genomes mentioned above (A. aegypti, A. gambiae,
D. melanogaster, L. cuprina, and M. domestica) (Additional file 1:
Figure S6 and Table S13). A total of 14,476 protein-coding genes were
annotated with potential functions, accounting for 92.14% of all genes
in the assembled genome (Additional file 1: Table
S14). We identified 11,425 genes that showed homology to proteins in the
InterPro database. A total of 7,999 genes were assigned to GO
classifications. Based on KEGG analysis, we could annotate 5,236 genes
and 130 KEGG metabolic pathways in the assembled genome. Additionally,
9,332 genes could be annotated in the Swiss-Prot database.
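
The annotation proportions above follow directly from the reported counts. As a minimal sketch (the variable and function names are illustrative and not part of the annotation pipeline), the share of the gene set covered by each database can be recomputed as follows; the functionally annotated fraction comes out at roughly 92.1%, in line with the value reported above.

```python
# Counts taken from the text; names below are illustrative only.
TOTAL_GENES = 15_710
ANNOTATED_GENES = 14_476

def percent_of_genes(count, total=TOTAL_GENES):
    """Return `count` as a percentage of the total predicted gene set."""
    return 100.0 * count / total

# Genes with a hit in each database, as reported in this section.
database_hits = {
    "InterPro": 11_425,
    "GO": 7_999,
    "KEGG": 5_236,
    "Swiss-Prot": 9_332,
}

annotated_pct = percent_of_genes(ANNOTATED_GENES)
per_database_pct = {name: percent_of_genes(n) for name, n in database_hits.items()}

if __name__ == "__main__":
    print(f"Functionally annotated: {annotated_pct:.2f}%")
    for name, pct in per_database_pct.items():
        print(f"{name}: {pct:.2f}%")
```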
De novo and homology-based predictions identified 256.07 Mb of
repetitive sequences, covering 45.70%
of the assembled genome. DNA transposons (69.21 Mb) represented the most
abundant TEs, accounting for 12.35% of the genome (Additional file 1:
Tables S15 and S16). Furthermore, 456,324 SSRs were detected, including
338,634, 58,559, 43,763, 12,124, 2,835, and 409 mono-, di-, tri-, tetra-,
penta-, and hexa-nucleotide repeats, respectively (Additional file 1:
Tables S17 and S18). In addition, 157 miRNAs, 50 rRNAs, 200 snRNAs, and
1,465 tRNAs were identified in the assembled genome (Additional file 1:
Table S19).
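
The repeat statistics above can be cross-checked with simple arithmetic. The sketch below, with illustrative names that are not taken from the analysis pipeline, sums the per-class SSR counts and derives the assembly size implied by each reported length and percentage pair. Both pairs imply an assembly of roughly 560 Mb; that figure is derived here from the numbers in this section rather than quoted from the assembly statistics.

```python
# SSR counts by motif length, as reported in the text.
ssr_counts = {
    "mono": 338_634,
    "di": 58_559,
    "tri": 43_763,
    "tetra": 12_124,
    "penta": 2_835,
    "hexa": 409,
}
REPORTED_SSR_TOTAL = 456_324

# The per-class counts should add up to the reported total.
ssr_total = sum(ssr_counts.values())

# Assembly size (Mb) implied by each length/percentage pair.
implied_from_repeats = 256.07 / 0.4570   # all repetitive sequences
implied_from_dna_te = 69.21 / 0.1235     # DNA transposons only

if __name__ == "__main__":
    print("SSR total matches:", ssr_total == REPORTED_SSR_TOTAL)
    print(f"Implied size (repeats): {implied_from_repeats:.1f} Mb")
    print(f"Implied size (DNA TEs): {implied_from_dna_te:.1f} Mb")
```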