2.3 Genome assembly
We first estimated genome size and heterozygosity by k-mer analysis of the Illumina paired-end reads, using a k-mer length of 17 bases. K-mer counts and their frequency distribution were computed with Jellyfish (version 1.1.10, parameters set to -C, -m 17, -s 10G, -t 80) (Marcais & Kingsford, 2011), and the resulting k-mer profile was analysed and visualized with GenomeScope (version 2.0, parameters set to 12, 150) (Ranallo-Benavidez, Jaron, & Schatz, 2020). The PacBio sequencing data were then assembled into a draft genome using Wtdbg2 (version 2.5, parameters set to -t 8, -p 21, -S 4, -s 0.05, -g 274m, -L 5000) (Ruan & Li, 2020). Sequencing errors in the draft assembly were corrected with the long reads using NextPolish (Hu, Fang, Su, & Liu, 2019). In addition, the Illumina reads were mapped to the draft genome assembly using Bowtie2 (version 2.4.4, parameters set to --score-min L,-0.3,-0.3 -p 8 -I 0 -X 1000), and the alignments were used for further error correction in Pilon (version 1.23, with default parameters) (Walker et al., 2014). Finally, HaploMerger2 (with default parameters) and purge_haplotigs (parameters set to -m 4G; -t 60; -l value1, -m value2, -h value3; -t 60, -a 70) were used to remove the redundant heterozygous regions from the genome assembly (Huang, Kang, & Xu, 2017; Roach, Schmidt, & Borneman, 2018).
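The principle behind k-mer-based genome size estimation can be illustrated with a minimal Python sketch. It is not the model used by GenomeScope, which fits a mixture distribution to the k-mer profile. It is the simpler peak-based approximation, in which the genome size is roughly the total number of non-error k-mers divided by the depth of the main coverage peak. The function names, the `min_depth` error cutoff and the toy histogram are assumptions made for illustration only. The input format assumed is the two-column "depth count" text produced by a k-mer histogram tool such as `jellyfish histo`.

```python
def parse_histogram(lines):
    """Parse 'depth count' lines into a {depth: count} dictionary."""
    histogram = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        depth, count = int(fields[0]), int(fields[1])
        histogram[depth] = count
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Peak-based genome size estimate from a k-mer depth histogram.

    histogram: {depth: number of distinct k-mers observed at that depth}
    min_depth: hypothetical cutoff; k-mers below it are treated as
               sequencing errors and ignored.
    Returns (estimated_size, peak_depth).
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers occurs.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (invented numbers, not data from this study).
toy = parse_histogram(["1 1000", "2 300", "20 50", "21 100", "22 50"])
size, peak = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

In a highly heterozygous genome the tallest peak may correspond to heterozygous k-mers at half the homozygous depth, in which case this simple estimate would be inflated. That is one reason model-based tools are preferred in practice.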
To construct the chromosome-level genome assembly, Hi-C reads were aligned to the draft genome assembly using Juicer (version 1.5, with default parameters). An initial assembly was generated with the 3D de novo assembly (3D-DNA) pipeline (version 180114) with parameter “-r 3” (Dudchenko et al., 2017). The initial assembly was reviewed using Juicebox Assembly Tools (JBAT, version 1.11.0, with default parameters) (Dudchenko et al., 2018), resulting in the final chromosome-level genome assembly. The completeness of the genome assembly was assessed using BUSCO (v5.1.3) (Waterhouse et al., 2018) by searching for universal single-copy orthologous genes from the Eukaryota, Arthropoda, Insecta and Hemiptera datasets (odb10). The final assembly was further validated by aligning the Illumina short reads and RNA sequencing (RNA-seq) reads against the assembled genome sequence using HISAT2 (version 2.1.0.5) (Kim et al., 2015).
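As an illustration of how BUSCO completeness is usually summarised, the following Python sketch converts the four BUSCO category counts into the percentages of the conventional C/S/D/F/M notation, where complete (C) is the sum of single-copy (S) and duplicated (D) genes. The function name and the example counts are hypothetical and are not results from this study.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages.

    Returns a dictionary with the percentages of complete (C),
    single-copy (S), duplicated (D), fragmented (F) and missing (M)
    genes, plus the total number of genes searched (n).
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one BUSCO gene is required")

    def pct(count):
        return round(100.0 * count / total, 1)

    return {
        "C": pct(single + duplicated),  # complete = single-copy + duplicated
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
        "n": total,
    }


# Hypothetical counts for a lineage dataset of 1,000 genes.
summary = busco_summary(single=900, duplicated=50, fragmented=30, missing=20)
print(summary)
```

A high duplicated fraction (D) after haplotig purging would suggest that redundant heterozygous regions remain in the assembly, so S and D are worth reporting separately rather than C alone.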