Example datasets
SNPfiltR is distributed via CRAN with a bundled example dataset.
Users can install the package and load this example dataset in a single
line, by calling install.packages("SNPfiltR"); library(SNPfiltR);
data(vcfR.example). The small size of this example dataset, containing
500 SNPs from 20 individual samples (10,000 genotype calls in total),
allows for its distribution with the SNPfiltR package without pushing
the entire distribution over the 5 MB size guideline for CRAN packages.
Nonetheless, this example dataset, a subset of a real empirical SNP
dataset, retains sufficient resolution for generating informative
examples of SNPfiltR functions and is designed to offer rapid
testing and validation. For SNPfiltR functions that require an
input ‘popmap’ which maps individual samples in the input vcf file to
putative species/populations, a popmap for this example vcfR object can
be accessed by calling data(popmap) once the package has been
successfully installed. A fully documented example SNP filtering
pipeline using this small example SNP dataset is publicly available at:
(devonderaad.github.io/SNPfiltR/articles/reproducible-vignette.html).
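As a minimal sketch of the loading steps described above, the example objects can be loaded and checked against the dimensions stated in the text. This assumes the package has already been installed from CRAN, and it assumes that the popmap stores sample names in a column named id, which is not stated here and should be verified against the package documentation.

```r
# Install once from CRAN (left commented out so the sketch can be re-run safely)
# install.packages("SNPfiltR")

library(SNPfiltR)

# Load the bundled example objects into the workspace
data(vcfR.example)  # vcfR object holding the example SNP dataset
data(popmap)        # table mapping samples to putative species/populations

# In a vcfR object, the genotype matrix is stored in the 'gt' slot,
# with one row per SNP and a leading FORMAT column before the samples
sample_ids <- colnames(vcfR.example@gt)[-1]

n_snps      <- nrow(vcfR.example@gt)
n_samples   <- length(sample_ids)
n_genotypes <- n_snps * n_samples  # SNPs x samples

cat(n_snps, "SNPs x", n_samples, "samples =", n_genotypes, "genotype calls\n")

# Assumption: the popmap column holding sample names is called 'id'
all(sample_ids %in% popmap$id)
```

If the final check does not return TRUE, inspecting head(popmap) shows which column actually holds the sample names.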
I used additional example datasets to provide worked vignettes that
integrate functions from SNPfiltR and vcfR into fully
R-based, customizable SNP filtering pipelines for genomic datasets
generated by Restriction-site Associated DNA sequencing (RADseq)
(Davey & Blaxter, 2010) (available at:
devonderaad.github.io/SNPfiltR/articles/scrub-jay-RADseq-vignette.html)
and by the sequencing of Ultra-Conserved Elements (UCEs) (Faircloth et
al., 2012) (available at:
devonderaad.github.io/SNPfiltR/articles/scrub-jay-UCE-vignette.html).
The RADseq vignette uses as input a vcf file containing 210,336
unfiltered SNPs for 115 individuals, called using Stacks v.2.41
(Rochette et al., 2019). This empirical dataset, sampled throughout the
entire distribution of Scrub-Jays (genus Aphelocoma) across North
America, will be publicly released via Dryad upon publication. The UCE
vignette uses as input a vcf file containing 44,490
unfiltered SNPs for 28 samples, called using Phyluce (Faircloth, 2016)
and GATK (McKenna et al., 2010). This dataset was the focus of McCormack
et al. (2016) and is publicly available for download
via the Dryad repository associated with that paper at:
(datadryad.org/stash/dataset/doi:10.5061/dryad.qh8sh).
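For readers starting from their own unfiltered vcf file rather than the bundled example, a comparable entry point might look like the following sketch. It uses read.vcfR() from the vcfR package. The file path is a hypothetical placeholder, not a file distributed with either vignette, and the block does nothing if no file exists at that location.

```r
library(vcfR)

# Hypothetical path: replace with the location of a downloaded vcf file
vcf_path <- "path/to/unfiltered.vcf.gz"

if (file.exists(vcf_path)) {
  # Read the vcf file into memory as a vcfR object
  vcf <- read.vcfR(vcf_path, verbose = FALSE)

  # One row per SNP in the 'fix' slot; the 'gt' slot holds a FORMAT
  # column followed by one column per sample
  n_snps    <- nrow(vcf@fix)
  n_samples <- ncol(vcf@gt) - 1

  cat(n_snps, "SNPs for", n_samples, "samples\n")
}
```

Comparing the reported counts with those given in the text is a quick way to confirm that the intended file was read in full.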