Assembling the reference genome de novo
We used ALLPATHS-LG version 44099 (Gnerre et al., 2011), with the parameters PLOIDY=2 and PHRED_64=1, to assemble the draft genome of the black-faced spoonbill. The assembly was run on a workstation with 32 CPUs (2,199.882 MHz) and 377.8 GB of RAM; peak memory use was 342.1 GB.
We used the correction steps CleanCorrectedReads and ErrorCorrectJump in ALLPATHS-LG to remove 1.7% of paired-end reads and to correct 69.3% of mate-paired reads, using low-frequency k-mers as the criterion (K=25 and 96 for paired-end and mate-paired reads, respectively). The raw reads used for assembly amounted to 50.5 Gb (67.8%) of paired-end data and 28.8 Gb (57.5%) of mate-paired data (table S1). In total, 34,176 contigs with an N50 of 71.0 kb and a total length of 1.18 Gb were assembled. These contigs were then joined into 2,243 scaffolds (N50 = 4.2 Mb; table S1). The draft genome generated by ALLPATHS-LG is diploid and therefore contains heterozygous SNPs. For subsequent analyses, we randomly retained one of the two nucleotides at each heterozygous SNP to generate a pseudo-haploid reference genome.
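Two of the computations mentioned above, the N50 statistic and the collapse of heterozygous SNPs into a pseudo-haploid sequence, can be illustrated with a minimal Python sketch. It is not the script used in this study. The function names n50 and pseudo_haploid are hypothetical, and the sketch assumes that heterozygous sites are encoded as two-base IUPAC ambiguity codes in upper case, which the text above does not specify. Other encodings of heterozygous sites would need a different parser.

```python
import random

# IUPAC two-base ambiguity codes and the two nucleotides each one stands for.
# Assumption: heterozygous SNPs appear in the sequence as these codes.
IUPAC_HET = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def pseudo_haploid(seq, rng):
    """Replace every two-base ambiguity code by one of its two
    nucleotides, chosen at random; all other characters are kept."""
    return "".join(
        rng.choice(IUPAC_HET[base]) if base in IUPAC_HET else base
        for base in seq
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random choice is reproducible
    print(n50([2, 3, 4, 5, 6]))               # prints 5
    print(pseudo_haploid("ACGRTTYNA", rng))   # R -> A or G, Y -> C or T
```

Passing a seeded random.Random instance rather than using the module-level generator makes the choice of alleles repeatable, so the same pseudo-haploid reference can be regenerated for later analyses.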