Processing of genomic data and SNP calling for rice accessions
Demultiplexing of raw GBS data, read mapping and SNP calling were implemented in a pipeline using TOGGLe v0.3.3 (Monat et al., 2015). Reads were demultiplexed with process_radtags and mapped to the IRGSP-1.0 Nipponbare reference genome (Kawahara et al. 2013) using BWA (Li & Durbin, 2009) with the option -n 5 for the sub-commands aln and samse. The alignments were sorted with Picard SortSam and SAMtools view (http://broadinstitute.github.io/picard/; Li 2011). The GATK suite (McKenna et al. 2017) was used for downstream processing. We used RealignerTargetCreator to define suitable intervals for local realignment and IndelRealigner to realign reads locally around indels. Duplicates were removed with MarkDuplicates, available in Picard tools. The output BAM files were split into per-chromosome BAM files with BamTools. SNPs were called for each chromosome with GATK HaplotypeCaller, with the BadCigar read filter applied. High-confidence SNPs were identified with GATK VariantFiltration, which filtered variants on depth and quality (DP > 10, QUAL > 30).
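The final filtering step can be illustrated with a minimal sketch in Python. It is not the GATK implementation and was not part of the pipeline; it only shows the logic of the two thresholds on plain VCF lines. It assumes that DP is read from the INFO column and that records are retained when DP > 10 and QUAL > 30, which is one reading of the parameters given above. The function names are hypothetical.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=14;AF=0.5' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_thresholds(vcf_line, min_dp=10, min_qual=30.0):
    """Return True if a VCF record has DP > min_dp and QUAL > min_qual.

    Assumption: thresholds are retention criteria and DP is the
    site-level depth stored in the INFO column (column 8).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    qual = cols[5]
    if qual == ".":  # missing QUAL cannot pass the threshold
        return False
    depth = parse_info(cols[7]).get("DP")
    if not depth:  # DP absent or empty
        return False
    return int(depth) > min_dp and float(qual) > min_qual


def filter_vcf_lines(lines, min_dp=10, min_qual=30.0):
    """Keep header lines and records that pass both thresholds."""
    kept = []
    for line in lines:
        if line.startswith("#") or passes_thresholds(line, min_dp, min_qual):
            kept.append(line)
    return kept
```

Applied to the lines of a per-chromosome VCF file, `filter_vcf_lines` returns the header plus the records that satisfy both conditions; records with a missing QUAL or DP value are dropped.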
Genomic data from the 216 worldwide rice accessions were mapped against the IRGSP-1.0 Nipponbare reference genome using the same procedure. Mapping data were post-processed as described above. Analyses were conducted on the intersection of the set of SNPs identified with GBS data for YYT landraces and the set identified with whole-genome data for worldwide accessions.
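The intersection of the two SNP sets can be sketched as follows. This is an illustration only, not the code used for the analyses. It assumes that a SNP is shared when both call sets contain a record with the same chromosome and position; whether alleles were also compared is not stated here. The function names are hypothetical.

```python
def snp_sites(vcf_lines):
    """Collect (chromosome, position) pairs from VCF record lines."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def shared_sites(gbs_lines, wgs_lines):
    """Return the sorted sites present in both call sets.

    Assumption: sites are matched on chromosome and position only.
    """
    return sorted(snp_sites(gbs_lines) & snp_sites(wgs_lines))
```

Given the lines of the GBS-derived VCF and of the whole-genome VCF, `shared_sites` returns the positions on which downstream analyses would be run under this assumption.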