Example datasets
SNPfiltR is distributed via CRAN with a small example dataset. After installing the package with install.packages("SNPfiltR"), users can load this dataset by calling library(SNPfiltR); data(vcfR.example). The dataset contains 500 SNPs from 20 individual samples (10,000 genotypes), which is small enough to be distributed with SNPfiltR without pushing the package over the 1 Megabyte limit for CRAN packages. Nonetheless, this example dataset, a subset of a real empirical SNP dataset, retains sufficient resolution to generate informative examples of SNPfiltR functions and is designed for rapid testing and validation. For SNPfiltR functions that require an input 'popmap', which maps individual samples in the input vcf file to putative species/populations, a popmap for this example vcfR object can be accessed by calling data(popmap) once the package has been installed and loaded. A fully documented example SNP filtering pipeline using this example dataset is publicly available at: devonderaad.github.io/SNPfiltR/articles/reproducible-vignette.html.
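As a minimal sketch of the loading steps described above, the following R code attaches the package, loads both bundled objects, and inspects them. It assumes SNPfiltR and vcfR are already installed, and that the popmap uses the column names 'id' and 'pop'; the helper variable `samples` is introduced here only for illustration.

```r
# One-time installation (commented out so the script can be re-run safely)
# install.packages("SNPfiltR")

library(SNPfiltR)  # provides the example data
library(vcfR)      # provides the vcfR class used by the example data

# Load the bundled example objects into the workspace
data(vcfR.example)
data(popmap)

# Print a summary of the vcfR object (samples, chromosomes, variants)
vcfR.example

# Sample names are the column names of the genotype matrix,
# excluding the first column, which holds the FORMAT string
samples <- colnames(vcfR.example@gt)[-1]
length(samples)

# Inspect the popmap and check whether every sample has an assignment
# (assumes the popmap columns are named 'id' and 'pop')
head(popmap)
all(samples %in% popmap$id)
```

Checking sample names against the popmap before filtering is a cheap way to catch mismatched identifiers, which would otherwise surface later as errors in functions that group samples by population.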
I used additional example datasets to provide fully worked vignettes that integrate functions from SNPfiltR and vcfR into customizable, entirely R-based SNP filtering pipelines. These vignettes cover genomic datasets generated by Restriction-site Associated DNA sequencing (RADseq) (Davey & Blaxter, 2010) (available at: devonderaad.github.io/SNPfiltR/articles/scrub-jay-RADseq-vignette.html) and by the sequencing of Ultra-Conserved Elements (UCEs) (Faircloth et al., 2012) (available at: devonderaad.github.io/SNPfiltR/articles/scrub-jay-UCE-vignette.html). The RADseq vignette uses as input a vcf file containing 210,336 unfiltered SNPs for 115 individuals, called using Stacks v.2.41 (Rochette et al., 2019). This empirical dataset, sampled throughout the entire distribution of Scrub-Jays (genus Aphelocoma) across North America, will be publicly released via Dryad upon publication. The UCE vignette uses as input a vcf file containing 44,490 unfiltered SNPs for 28 samples, called using Phyluce (Faircloth, 2016) and GATK (McKenna et al., 2010). This dataset was the focus of McCormack et al. (2016), and is publicly available for download via the Dryad repository associated with that paper at: datadryad.org/stash/dataset/doi:10.5061/dryad.qh8sh.
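The general shape of such a pipeline can be sketched in a few lines of R. This is an illustrative outline rather than the code used in the vignettes: the thresholds (depth of 5, 80% completeness) are arbitrary placeholder values, and the bundled example data are written to a temporary file only so that the sketch is self-contained. In practice, the path would point to a user's own unfiltered vcf file.

```r
library(SNPfiltR)
library(vcfR)

# Stand-in for a user's unfiltered vcf file: write the bundled example
# data to a temporary gzipped vcf (write.vcf produces gzipped output)
data(vcfR.example)
vcf_path <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR.example, file = vcf_path)

# Step 1: read the vcf file into R as a vcfR object
raw <- read.vcfR(vcf_path, verbose = FALSE)

# Step 2: set genotypes below a minimum read depth to missing
# (depth = 5 is a placeholder threshold, not a recommendation)
filtered <- hard_filter(vcfR = raw, depth = 5)

# Step 3: retain only SNPs meeting a per-SNP completeness cutoff
# (cutoff = 0.8 is a placeholder threshold, not a recommendation)
filtered <- missing_by_snp(vcfR = filtered, cutoff = 0.8)

# Step 4: write the filtered dataset back to disk for downstream use
out_path <- tempfile(fileext = ".vcf.gz")
write.vcf(filtered, file = out_path)
```

Because each step takes a vcfR object and returns a vcfR object, steps can be reordered, repeated, or swapped for other filters without changing the surrounding code.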