2.3 Genome assembly
We first estimated genome size and heterozygosity by k-mer analysis of
the Illumina paired-end reads, using a k-mer length of 17 bases. K-mers
were counted with Jellyfish (version 1.1.10, parameters set to -C,
-m 17, -s 10G, -t 80), and the resulting k-mer frequency distribution
was modelled and visualized with GenomeScope (version 2.0, parameters
set to 12, 150) (Marçais & Kingsford, 2011; Ranallo-Benavidez, Jaron, &
Schatz, 2020). Then, PacBio sequencing data were used to assemble the
draft genome using Wtdbg2 (version 2.5, parameters set to -t 8, -p 21,
-S 4, -s 0.05, -g 274m, -L 5000) (Ruan & Li, 2020). Long reads were
used to correct sequencing errors using NextPolish (Hu, Fang, Su, &
Liu, 2019). In addition, Illumina sequencing data were mapped to the
draft genome assembly using Bowtie2 (version 2.4.4, parameters set to
--score-min L,-0.3,-0.3 -p 8 -I 0 -X 1000), and the alignments were used
for error correction in Pilon (version 1.23, with default parameters) (Walker et
al., 2014). Finally, HaploMerger2 (with default parameters) and
purge_haplotigs (parameters set to -m 4G; -t 60; -l value1, -m value2,
-h value3; -t 60, -a 70) were used to remove the heterozygous regions
from the genome (Huang, Kang, & Xu, 2017; Roach, Schmidt, & Borneman,
2018).
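
The k-mer based genome size estimate described above can be illustrated with a minimal Python sketch. It is not the GenomeScope model, which fits a mixture distribution to the k-mer spectrum. It uses the simpler textbook approximation of dividing the total number of k-mers by the depth of the main peak. The function names, the `min_depth` error cutoff and the toy histogram are assumptions made for illustration only. The input is assumed to be the two-column "depth count" text that `jellyfish histo` writes.

```python
def parse_histo(lines):
    """Parse two-column 'depth count' lines into a dict {depth: count}."""
    hist = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        hist[int(parts[0])] = int(parts[1])
    return hist


def estimate_genome_size(hist, min_depth=5):
    """Return (estimated genome size in bases, main peak depth).

    min_depth is a hypothetical cutoff: k-mers seen fewer times than this
    are treated as sequencing errors and ignored.
    """
    solid = {d: c for d, c in hist.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * c for d, c in solid.items())
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    # Toy histogram with invented numbers, not data from this study.
    toy = ["1 1000", "2 300", "18 10", "19 40", "20 100", "21 40", "22 10"]
    size, peak = estimate_genome_size(parse_histo(toy))
    print(f"peak depth = {peak}, estimated size = {size:.0f} bases")
```

In a highly heterozygous genome the tallest peak may be the heterozygous peak at roughly half the homozygous depth, in which case this shortcut overestimates the genome size. That is one reason a model-based tool such as GenomeScope is preferable for the actual analysis.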
To construct the chromosome-level genome assembly, Hi-C reads were
aligned to the draft genome assembly using Juicer (version 1.5, with
default parameters). An initial assembly was generated with the 3D de
novo assembly (3D-DNA) pipeline (version 180114) with the parameter
“-r 3” (Dudchenko et al., 2017). The initial
assembly was reviewed using Juicebox Assembly Tools (JBAT, version
1.11.0, with default parameters) (Dudchenko et al., 2018), resulting in
the final chromosome-level genome assembly. The completeness of the
genome assembly was assessed using BUSCO (v5.1.3) (Waterhouse et al.,
2018) to scan for universal single-copy orthologous genes from the
Eukaryota, Arthropoda, Insecta and Hemiptera datasets (odb10). The final assembly
was validated using the Illumina short reads and RNA sequencing
(RNA-seq) reads. The reads were aligned against the assembled genome
sequence using HISAT2 (version 2.1.0.5) (Kim et al., 2015).
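
As an illustration of how the BUSCO completeness assessment above can be summarized across lineage datasets, the following Python sketch parses the one-line score notation that BUSCO prints in its short summary (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing, n: number of orthologs searched). The exact line format is assumed here, the function name is hypothetical, and the example values are invented and are not results from this study.

```python
import re

# Assumed format of the one-line BUSCO score notation, for example:
#   C:95.0%[S:93.5%,D:1.5%],F:2.0%,M:3.0%,n:2510
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return a dict with the C, S, D, F, M percentages and n as an int."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line does not contain a BUSCO score string")
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


if __name__ == "__main__":
    # Invented example line, not a result reported in this paper.
    example = "C:95.0%[S:93.5%,D:1.5%],F:2.0%,M:3.0%,n:2510"
    scores = parse_busco_summary(example)
    # Complete BUSCOs are the sum of single-copy and duplicated BUSCOs.
    print(scores["C"], scores["S"] + scores["D"], scores["n"])
```

Running the parser on the summary of each lineage dataset (Eukaryota, Arthropoda, Insecta and Hemiptera) gives a small table of completeness values that can be compared side by side.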