Shared and private SNPs across datasets
Virtually all SNPs in the P1+P2 dataset for each genus are shared with either the corresponding P1 or the corresponding P2 dataset (Figure 4). In contrast, the much larger P1 and P2 datasets consist primarily (60-90%) of private SNPs that are absent from the P1+P2 dataset. To test the hypothesis that these private SNPs are enriched for homeo-SNPs, we compared the SAM files produced by aligning unique reads to the P1, P2, and P1+P2 genomes, and associated each SNP with its underlying reads in the SAM files by position. For each SNP, we defined a multi-mapping index as the proportion of its underlying unique reads that mapped to both parental genomes. Because homeo-SNPs arise from multi-mapped reads, we expected the reads underlying private SNPs to show a higher incidence of multi-mapping. Indeed, the mean multi-mapping index was approximately twofold higher for private SNPs than for shared SNPs for both parents of both genera (Figure S1), suggesting that private SNPs are enriched for homeo-SNPs.
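The multi-mapping index described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that SNPs are given as (contig, position) pairs with 1-based positions, that a read underlies a SNP when its aligned reference span covers that position, and that a read counts as multi-mapped when its name also appears among the mapped reads in the alignment to the other parental genome. The function names (parse_sam, mapped_read_names, reads_over_snps, multi_mapping_index) are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam(lines):
    """Yield (read_name, contig, start, end) for each mapped SAM record.

    Coordinates are 1-based and inclusive. Header lines and unmapped
    reads (FLAG bit 0x4 set, or CIGAR '*') are skipped.
    """
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue
        start = int(fields[3])
        span = sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                   if op in REF_CONSUMING)
        yield fields[0], fields[2], start, start + span - 1


def mapped_read_names(lines):
    """Return the set of read names with at least one mapped record."""
    return {name for name, _, _, _ in parse_sam(lines)}


def reads_over_snps(lines, snps):
    """Map each (contig, pos) SNP to the set of read names covering it."""
    positions = defaultdict(list)
    for contig, pos in snps:
        positions[contig].append(pos)
    hits = {snp: set() for snp in snps}
    for name, contig, start, end in parse_sam(lines):
        for pos in positions.get(contig, ()):
            if start <= pos <= end:
                hits[(contig, pos)].add(name)
    return hits


def multi_mapping_index(snp_reads, other_parent_reads):
    """Proportion of each SNP's reads that also map to the other parent.

    Returns None for a SNP with no underlying reads.
    """
    return {
        snp: (len(reads & other_parent_reads) / len(reads) if reads else None)
        for snp, reads in snp_reads.items()
    }
```

In this sketch, the index for SNPs called against P1 would be obtained by passing the P1 SAM lines to reads_over_snps and the P2 SAM lines to mapped_read_names, and vice versa for P2. The linear scan over SNP positions per read is adequate for illustration only; a sorted or interval-indexed lookup would be preferable for full datasets.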