2.1 | Experimental design
We designed three experiments using published data and simulations. First, we tested how the phylogenetic distance between reference and target species affects the quality of target genome assemblies, using paired-end data from the walking catfish (Clarias batrachus) and a puffer fish (Takifugu bimaculatus) (Table S1). For C. batrachus, we selected as references the genomes of two species from the same genus, C. magur and C. macrocephalus, and one species from a different family but the same order, Ameiurus melas. For T. bimaculatus, we selected the genomes of two species from the same genus, T. rubripes and T. flavidus, one species from a different genus but the same family, Tetraodon nigroviridis, and one species from a different family but the same order, Mola mola. Secondly, we optimized the in silico mate-pair method by searching for conserved mate pairs generated using two or more references (Fig. 1) and used them to assemble the genomes with SOAPdenovo2 (Luo et al., 2012). Thirdly, we tested whether the optimized in silico method significantly improved the genome assembly of the mountain nyala (Tragelaphus buxtoni), which was sequenced from a highly degraded sample. To produce in silico mate pairs for assembling the T. buxtoni genome, we selected as references the genomes of two species from the same genus, Tragelaphus scriptus and Tragelaphus strepsiceros, one species from a different genus but the same family, Bos grunniens, and one species from a different family but the same order, Moschus moschiferus. Lastly, we simulated single-end ancient DNA reads from Takifugu flavidus sequencing data to test the optimized in silico method and compared it with a reference-guided approach, RaGOO.
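The selection of conserved mate pairs in the second experiment can be illustrated with a minimal sketch. It assumes, purely for illustration, that each in silico mate pair is represented as a tuple of two read sequences and that a pair counts as "conserved" when it is generated from at least two references. The function name `conserved_mate_pairs`, the `min_refs` parameter, and the toy reference labels are hypothetical and are not taken from the pipeline described here; the actual criterion is the one shown in Fig. 1.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (read 1 sequence, read 2 sequence).
MatePair = Tuple[str, str]


def conserved_mate_pairs(
    pairs_by_reference: Dict[str, Iterable[MatePair]],
    min_refs: int = 2,
) -> List[MatePair]:
    """Return mate pairs generated from at least `min_refs` references.

    Assumption: a pair is 'conserved' if the identical pair of read
    sequences is produced from two or more reference genomes.
    """
    support: Counter = Counter()
    for pairs in pairs_by_reference.values():
        # Deduplicate so each reference contributes at most one vote per pair.
        support.update(set(pairs))
    # Sort for a deterministic output order.
    return sorted(pair for pair, n in support.items() if n >= min_refs)


# Toy input: three hypothetical references and the mate pairs derived from each.
toy_pairs = {
    "ref_A": [("ACGT", "TTGA"), ("GGCA", "CCAT")],
    "ref_B": [("ACGT", "TTGA"), ("TTTT", "AAAA")],
    "ref_C": [("ACGT", "TTGA"), ("GGCA", "CCAT"), ("GGCA", "CCAT")],
}

in_two_or_more = conserved_mate_pairs(toy_pairs, min_refs=2)
in_all_three = conserved_mate_pairs(toy_pairs, min_refs=3)
```

In this toy input, raising `min_refs` from 2 to 3 keeps only the pair supported by every reference, which shows the trade-off between the number of retained mate pairs and the strictness of the conservation filter.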