3.2 | Effects of using different in silico mate
pairs on genome assembly of C. batrachus
The assemblies of C. batrachus generated using only paired-end libraries
were unsatisfactory: the NGA50 was only about 5.5 Kb, and the number of
complete BUSCOs (Benchmarking Universal Single-Copy Orthologs) was 1,614
(Table 1). Both the original in silico method (mate pairs
generated using one reference from the same genus) and the optimized
in silico method (conserved mate pairs generated using two
references from the same genus) significantly improved the genome
assembly of C. batrachus. Compared to the original in
silico method (using a single reference from the same genus, ‘mag’:
C. magur or ‘mac’: C. macrocephalus), the optimized in silico method
(using two references from the same genus, ‘mag’ and ‘mac’) reduced
misassemblies
(mag*: 23,519; mac*: 25,442 vs. mag-mac**: 14,535) and yielded a similar
NGA50 (mag*: 74.5 Kb; mac*: 39.1 Kb vs. mag-mac**: 67.3 Kb) and a
similar number of complete BUSCOs (mag*: 2,871; mac*: 2,659 vs.
mag-mac**: 2,788).
Compared to the original in
silico method, the optimized in silico method of generating
conserved mate pairs using three reference genomes (two from the same
genus, ‘mag’ and ‘mac’, and one from the same order, ‘mel’) drastically
decreased misassemblies (mag*: 23,519; mac*: 25,442; mel*: 18,552 vs.
mag-mac-mel**: 7,671), but did not increase the NGA50 (mag*: 74.5 Kb;
mac*: 39.1 Kb; mel*: 8.2 Kb vs. mag-mac-mel**: 5.5 Kb) or the number of
complete BUSCOs (mag*: 2,871; mac*: 2,659; mel*: 1,756 vs.
mag-mac-mel**: 1,618).
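To make the size of the misassembly reductions quoted above easier to compare, the counts can be turned into relative reductions. The following is a minimal sketch that uses only the misassembly numbers given in the text; the dictionary layout and the function names (`relative_reduction`, `reductions`) are illustrative assumptions and are not part of the original analysis.

```python
# Misassembly counts quoted in the text (single-reference vs. multi-reference runs).
SINGLE_REF = {"mag": 23519, "mac": 25442, "mel": 18552}
MULTI_REF = {"mag-mac": 14535, "mag-mac-mel": 7671}


def relative_reduction(before: int, after: int) -> float:
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


def reductions(combo: str) -> dict:
    """Relative reduction of a multi-reference run versus each of its single references.

    `combo` is a key of MULTI_REF such as "mag-mac" (hypothetical naming,
    mirroring the labels used in the text).
    """
    after = MULTI_REF[combo]
    return {
        ref: round(relative_reduction(SINGLE_REF[ref], after), 1)
        for ref in combo.split("-")
    }


if __name__ == "__main__":
    for combo in MULTI_REF:
        print(combo, reductions(combo))
```

Under these numbers, the two-reference run has roughly 38% to 43% fewer misassemblies than either single-reference run, and the three-reference run roughly 59% to 70% fewer.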
We compared the mate pairs generated using one reference genome
(C. batrachus) with the conserved mate pairs generated using two
reference genomes (C. batrachus and C. macrocephalus). We
found that the extra mate pairs in the target genome generated using one
reference were mostly inverted (45.76% to 47.21%), while the remaining
mate pairs in the target genome either displayed length deviations or
were mapped to different scaffolds of the target genome (Table S11).
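The three categories of discordant mate pairs named above (inverted, length deviation, different scaffolds) can be illustrated with a small classifier. This is only a sketch under stated assumptions: the `MatePair` record, the forward-reverse expected orientation, and the 20% length tolerance are hypothetical choices made for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    """Mapped positions of both reads of one mate pair (hypothetical record)."""
    scaffold1: str
    pos1: int
    strand1: str  # "+" or "-"
    scaffold2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify(pair: MatePair, insert_size: int, tolerance: float = 0.2) -> str:
    """Label a mate pair as mapped on the target genome.

    Assumptions (not from the source): the expected orientation is the
    leftmost read on "+" and the rightmost read on "-", and a pair deviates
    in length when its span differs from `insert_size` by more than
    `tolerance` (as a fraction of `insert_size`).
    """
    if pair.scaffold1 != pair.scaffold2:
        return "different_scaffolds"

    # Order the two reads by position so the orientation check is symmetric.
    if pair.pos1 <= pair.pos2:
        left_strand, right_strand = pair.strand1, pair.strand2
    else:
        left_strand, right_strand = pair.strand2, pair.strand1

    if (left_strand, right_strand) != ("+", "-"):
        return "inverted"

    span = abs(pair.pos2 - pair.pos1)
    if abs(span - insert_size) > tolerance * insert_size:
        return "length_deviation"

    return "concordant"


if __name__ == "__main__":
    example = MatePair("s1", 100, "+", "s1", 2100, "+")
    print(classify(example, insert_size=2000))
```

Tallying these labels over the mate pairs that appear only in the single-reference set would give proportions of the kind reported in Table S11, although the thresholds actually used there may differ.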