Performance benchmarking
To evaluate the performance of SNPfiltR, I compared filtering
runtimes with the widely used program VCFtools (Danecek et al.,
2011). VCFtools is a highly efficient command-line program
written in Perl and C++, which is frequently used for filtering vcf
files according to various quality metrics. VCFtools can parse and
filter a vcf file without having to read the entire file into local
memory, offering an assumed advantage in efficiency over R-based
implementations such as SNPfiltR, especially for larger input
files. To objectively evaluate the utility of SNPfiltR compared
to a program like VCFtools, I benchmarked performance under a
simple, biologically plausible filtering scenario: a minimum depth of 5
per called genotype and a minimum genotype quality of 30. I then
compared runtimes across three approaches: 1) using the R function
SNPfiltR::hard_filter() on a vcf file that had already been read into
local memory as a vcfR object, 2) wrapping the R function
vcfR::read.vcfR() inside a call to SNPfiltR::hard_filter(), so that the
vcf file is first read into the R working environment as a vcfR object
and then filtered, and 3) passing the full path of the vcf file directly
to VCFtools, which filters the dataset and writes a new, filtered vcf
file. For each of these approaches, I
recorded the runtime for filtering each of eight vcf files subset from
a real empirical vcf file, each containing 100 samples and ranging from
10K to 500K SNPs. All benchmarking was performed on a 2.3 GHz Dual-Core
Intel Core i5 CPU, running macOS Big Sur 11.5.1, with 8 GB 2133 MHz
LPDDR3 SDRAM (i.e., a personal laptop with typical computing power), and
exact runtimes were recorded with a precision of 1/1000th of a second,
using the function microbenchmark() from the R package microbenchmark
(Mersmann et al., 2015) for iterations executed in R and the bash
command ‘time’ for iterations executed using VCFtools. A fully
documented example of this benchmarking process is available at:
(devonderaad.github.io/SNPfiltR/articles/performance-benchmarking.html#benchmark-10k-1).
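For orientation, the filtering scenario described above can be sketched from the command line roughly as follows. This is a minimal illustration rather than the exact benchmarking script: the file names benchmark.10K.vcf and benchmark.10K.filtered are hypothetical placeholders, the helper functions run_vcftools_filter and run_r_filter are names introduced here, and the R call is launched through Rscript only so that the whole sketch stays in one language. It assumes VCFtools, R, and the R packages SNPfiltR, vcfR, and microbenchmark are installed, and it skips any step whose program or input file is missing.

```shell
#!/usr/bin/env bash
# Hypothetical file names (assumptions); substitute your own vcf subset.
in_vcf="benchmark.10K.vcf"
out_prefix="benchmark.10K.filtered"
min_dp=5    # minimum depth per called genotype
min_gq=30   # minimum genotype quality per genotype

# Approach 3: VCFtools filters the file directly and writes
# <out_prefix>.recode.vcf; genotypes failing a threshold become missing.
run_vcftools_filter() {
    vcftools --vcf "$1" --minDP "$min_dp" --minGQ "$min_gq" --recode --out "$2"
}

# Approach 2: read the vcf into R as a vcfR object, then filter it,
# timing the combined call once with microbenchmark.
run_r_filter() {
    Rscript -e '
      args <- commandArgs(trailingOnly = TRUE)
      print(microbenchmark::microbenchmark(
        SNPfiltR::hard_filter(vcfR::read.vcfR(args[1]),
                              depth = as.numeric(args[2]),
                              gq = as.numeric(args[3])),
        times = 1))
    ' "$1" "$min_dp" "$min_gq"
}

if command -v vcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    # The time command reports real, user, and sys time on stderr.
    time run_vcftools_filter "$in_vcf" "$out_prefix"
else
    echo "Skipping VCFtools run: vcftools or $in_vcf not found" >&2
fi

if command -v Rscript >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    run_r_filter "$in_vcf"
else
    echo "Skipping R run: Rscript or $in_vcf not found" >&2
fi
```

Approach 1 differs from approach 2 only in that the vcfR object is created before timing begins, so the call to vcfR::read.vcfR() would sit outside the timed expression.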