Performance benchmarking
To evaluate the performance of SNPfiltR, I compared filtering runtimes with the widely used program VCFtools (Danecek et al., 2011). VCFtools is a highly efficient command-line program written in Perl and C++, which is frequently used for filtering vcf files according to various quality metrics. VCFtools can parse and filter a vcf file without reading the entire file into local memory, which is presumed to give it an efficiency advantage over R-based implementations such as SNPfiltR, especially for larger input files. To objectively evaluate the utility of SNPfiltR relative to a program like VCFtools, I benchmarked performance under a simple, biologically plausible filtering scenario, setting a minimum depth of 5 per called genotype and a minimum genotype quality of 30 per genotype. I then compared runtimes across three different approaches: 1) using the R function SNPfiltR::hard_filter() on a vcf file that has already been read into local memory as a vcfR object, 2) wrapping the R function vcfR::read.vcfR() inside a call to SNPfiltR::hard_filter(), to first read the given vcf file into the local R working environment as a vcfR object and then perform filtering on that vcfR object, and 3) directly specifying the full path to the given vcf file to VCFtools, which filters the dataset and outputs a new, filtered vcf file. For each of these approaches, I recorded the runtime for filtering each of eight vcf files, subset from a real empirical vcf file, each containing 100 samples and varying in size from 10K to 500K SNPs.
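As an illustration of the third approach, the filtering scenario above (minimum depth of 5 and minimum genotype quality of 30 per genotype) might be expressed as a VCFtools call along the following lines. This is a minimal sketch rather than the exact command used for the benchmark: the input and output file names are hypothetical placeholders, and the choice of the --recode and --recode-INFO-all output options is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical file names (not taken from the original benchmark).
in_vcf="subset_10K.vcf"
out_prefix="subset_10K.filtered"

# --minDP 5  : drop genotypes with per-genotype depth below 5
# --minGQ 30 : drop genotypes with genotype quality below 30
# --recode   : write a new vcf file (<out_prefix>.recode.vcf)
# --recode-INFO-all : keep all INFO fields in the output (assumed choice)
filter_cmd=(vcftools --vcf "$in_vcf" --minDP 5 --minGQ 30
            --recode --recode-INFO-all --out "$out_prefix")

# Show the command that would be run.
printf '%s ' "${filter_cmd[@]}"; echo

# Run it only if VCFtools is installed and the input file exists.
if command -v vcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
  "${filter_cmd[@]}"
fi
```

Building the command as an array keeps file paths containing spaces intact and makes it straightforward to reuse the same call inside a timing wrapper.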
All benchmarking was performed on a 2.3 GHz Dual-Core Intel Core i5 CPU, running macOS Big Sur 11.5.1, with 8 GB 2133 MHz LPDDR3 SDRAM (i.e., a personal laptop with typical computing power). Exact runtimes were recorded with a precision of 1/1000th of a second, using the function microbenchmark() from the R package microbenchmark (Mersmann et al., 2015) for iterations executed in R, and the bash command 'time' for iterations executed using VCFtools. A fully documented example of this benchmarking process is available at: devonderaad.github.io/SNPfiltR/articles/performance-benchmarking.html#benchmark-10k-1.
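The command-line half of this timing procedure can be sketched as follows, assuming bash's built-in 'time' keyword with TIMEFORMAT set to report wall-clock seconds to three decimal places. The helper name time_cmd, the subsets/ directory, and the VCFtools options shown are hypothetical and chosen only for illustration; the linked article documents the procedure actually used.

```bash
#!/usr/bin/env bash
# time_cmd (hypothetical helper): run a command and print its wall-clock
# runtime in seconds with millisecond precision, e.g. 0.123.
time_cmd() {
  local TIMEFORMAT='%3R'   # %3R = elapsed real time, three decimal places
  # The timed command's own output is discarded; the braces allow the
  # report written by `time` (on stderr) to be redirected to stdout.
  { time "$@" >/dev/null 2>&1; } 2>&1
}

# Time the VCFtools filter on each subset file, if VCFtools is installed.
# The subsets/ directory is an assumed location for the benchmark files.
if command -v vcftools >/dev/null 2>&1; then
  for vcf in subsets/*.vcf; do
    [ -e "$vcf" ] || continue   # skip when the glob matches nothing
    secs=$(time_cmd vcftools --vcf "$vcf" --minDP 5 --minGQ 30 \
                             --recode --out "${vcf%.vcf}.filtered")
    printf '%s\t%s\n' "$vcf" "$secs"
  done
fi
```

Reporting only the elapsed real time mirrors what microbenchmark() measures on the R side, so the two sets of runtimes are comparable as wall-clock durations.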