3.1 Genome sequencing and de novo assembly
Sequencing of the fundatrigenia genome on the PacBio PS II platform generated 130 Gb of raw data with a read N50 of 21,033 bp. The raw contig-level assembly comprised 304,774,269 bp in 1,409 contigs, with a contig N50 of 2,961,835 bp (Table 1). K-mer analysis (K = 17) indicated a heterozygosity of 0.79% for S. chinensis and an estimated genome size of 273,985,190 bp (Figure S2). After removal of heterozygous sequences, the contig-level assembly comprised 271,416,320 bp in 378 contigs, with a contig N50 of 4,333,385 bp (Table 1).
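The N50 values reported here follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Hypothetical helper for illustration: sequences are sorted from
    longest to shortest and accumulated until the running sum reaches
    at least half of the total length; the length of the sequence that
    crosses this threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (made-up values, not data from this study).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
```

In the toy example the total is 290, so the threshold of 145 is reached by the two longest contigs (80 + 70 = 150), and the N50 is 70.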
Using PacBio and Hi-C data, the genome was assembled to chromosome level with a total length of 271,524,833 bp and a scaffold N50 of 20,405,002 bp (Table 1, S1). In total, 97.2% of the assembled sequence was anchored to 13 chromosomes, whose lengths ranged from 10,104,278 bp to 14,859,000 bp. The remaining 2.8% of the genome consists of 341 small scaffolds (Table 1; Figure 2, 3A). BUSCO analyses were performed against the Eukaryota, Arthropoda, Insecta and Hemiptera datasets. The S. chinensis genome assembly contains the highest number of conserved single-copy Arthropoda genes of any published aphid genome, indicating that the assembly is complete and of high quality (Figure 4A). The reads mapped to the assembled genome with a 97.70% mapping rate and 20 G average sequence depth (Table S2), and more than 86% of the assembled RNA-seq transcripts mapped to the genome (Table S3).
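As a rough consistency check on the anchoring figures, the reported total length and anchored percentage can be converted into approximate base counts. The sketch below assumes the rounded value of 97.2% quoted in the text, so the results are approximations and not values from Table 1; the function name `anchoring_summary` is hypothetical.

```python
def anchoring_summary(total_bp, anchored_fraction, n_unanchored_scaffolds):
    """Approximate anchored and unanchored base counts.

    Illustrative helper: anchored_fraction is a rounded percentage
    expressed as a fraction, so the outputs are estimates only.
    """
    anchored_bp = round(total_bp * anchored_fraction)
    unanchored_bp = total_bp - anchored_bp
    mean_unanchored_bp = unanchored_bp / n_unanchored_scaffolds
    return anchored_bp, unanchored_bp, mean_unanchored_bp


# Figures quoted in the text: 271,524,833 bp total, 97.2% anchored,
# 341 small unanchored scaffolds.
anchored, unanchored, mean_len = anchoring_summary(271_524_833, 0.972, 341)
```

Under this assumption roughly 7.6 Mb remain unanchored, which corresponds to a mean length on the order of 22 kb for the 341 small scaffolds.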