Predictors of formalin-fixed sample success
The amount of template DNA extracted from formalin-fixed samples weakly
predicted the number of high-quality SNPs (>5x coverage;
adjusted R² = 0.28), suggesting that ~200 ng of extracted template was
needed to recover >25% of SNPs at >5x coverage (Fig. 2A; S1; S5). Thus,
for samples that give low DNA yield, we recommend dividing extractions
into multiple replicates (from several pieces of tissue) to increase the
total amount of DNA recovered. However, a greater amount of extracted
endogenous DNA does not necessarily ensure downstream success as a
variety of factors can degrade DNA quality in formalin-fixed samples,
including specimen age, exposure to UV, temperature, and length of
formalin exposure (Hykin et al. 2015; Sawyer et al. 2012). Historical
samples typically contain highly fragmented DNA (Pääbo 1989; Ewart et
al. 2019), which could hinder library preparation if most fragments
are too short for target probes to bind efficiently, even when relatively
large amounts of DNA are extracted. The large genomes of amphibians may
also require higher extraction yields (~200 ng in this study) to capture
genome-wide targets successfully (McCartney-Melstad et al. 2016), whereas
studies of formalin-fixed reptiles have reported successful sequence
capture with DNA concentrations as low as 1–3 ng/μl (Hykin et al. 2015;
Ruane & Austin 2017).
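The regression and the ~200 ng guideline above can be illustrated with a minimal Python sketch. It assumes a simple linear model with a single predictor (template DNA in ng) fitted by ordinary least squares; the model actually behind Fig. 2A, including any transformation of either variable, is not specified here, so this is an illustration rather than the analysis itself. The names `fit_simple_ols`, `LinearFit`, and `x_for_target` are hypothetical. Adjusted R² uses the standard correction for p predictors: 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearFit:
    """Result of a one-predictor least-squares fit (hypothetical helper)."""
    slope: float
    intercept: float
    r2: float
    adj_r2: float

    def predict(self, x: float) -> float:
        """Fitted response (e.g., SNPs recovered) for template amount x."""
        return self.intercept + self.slope * x

    def x_for_target(self, y_target: float) -> float:
        """Invert the fitted line: template amount at which it reaches y_target."""
        if self.slope == 0:
            raise ValueError("slope is zero; the target cannot be reached by changing x")
        return (y_target - self.intercept) / self.slope


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> LinearFit:
    """Ordinary least squares of y on x with R² and adjusted R² (p = 1)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y with at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    p = 1  # a single predictor: template DNA
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return LinearFit(slope, intercept, r2, adj_r2)
```

For example, with template amounts as `x` and the percentage of SNPs recovered at >5x coverage as `y`, `fit_simple_ols(x, y).x_for_target(25.0)` would give the template amount at which the fitted line crosses 25%. With an adjusted R² as low as 0.28, any such inverse prediction carries wide uncertainty, so a value like ~200 ng is best read as an approximate guide.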
In addition, formalin-fixed sample extractions may contain high levels
of exogenous DNA, particularly when endogenous DNA yield is low. In each
of the four formalin-fixed samples that recovered <10% of SNPs,
exogenous sequence exceeded 30% (up to 81%), whereas the other six
samples yielded >94% endogenous sequence, suggesting that the proportion
of exogenous sequence is a strong predictor of sample success. Few
studies have quantified rates of exogenous DNA in fluid-preserved
specimens, but Hykin et al. (2015) found low rates of exogenous sequence
in a formalin-fixed lizard (only 0.27% of reads). By contrast, Lyra et
al. (2020) extracted DNA from ethanol-preserved frogs and identified a
high proportion of bacterial reads (based on BLAST searches) and a low
fraction of endogenous sequence (<0.5% of reads mapped to a closely
related reference transcriptome).
It therefore remains unclear how much endogenous DNA should be expected
from formalin-fixed extractions. Two of the highly contaminated samples
in this study were larvae that had been stored in formalin for several
years. The other two were adult specimens, and we are uncertain whether
the contamination occurred before or during tissue subsampling, or
whether the tissue subsamples contained so little usable DNA that any
exogenous DNA present was preferentially amplified (Pääbo 1989).
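The screening logic implied by these observations can be sketched in Python. This is a minimal illustration, not the pipeline used in this study: it assumes endogenous reads have already been counted (for example, as reads mapping to a reference for the study organism), and the default 30% cutoff simply mirrors the observation that all four poorly performing samples exceeded 30% exogenous sequence. It is a heuristic for flagging samples for re-extraction or closer inspection, not a validated threshold. `SampleReads`, `exogenous_fraction`, and `flag_high_exogenous` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SampleReads:
    """Read counts for one sample (hypothetical structure)."""
    name: str
    total_reads: int
    endogenous_reads: int  # assumed: reads assigned to the study organism, e.g. by mapping


def exogenous_fraction(sample: SampleReads) -> float:
    """Fraction of reads NOT assigned to the study organism."""
    if sample.total_reads <= 0:
        raise ValueError(f"{sample.name}: total_reads must be positive")
    if not 0 <= sample.endogenous_reads <= sample.total_reads:
        raise ValueError(f"{sample.name}: endogenous_reads out of range")
    return 1.0 - sample.endogenous_reads / sample.total_reads


def flag_high_exogenous(samples: Iterable[SampleReads], threshold: float = 0.30) -> List[str]:
    """Return names of samples whose exogenous fraction exceeds `threshold`.

    The 0.30 default is an assumption based on the pattern described in the
    text, not an established cutoff.
    """
    return [s.name for s in samples if exogenous_fraction(s) > threshold]
```

Keeping the cutoff as a parameter makes it easy to revisit once more formalin-fixed datasets report endogenous and exogenous read fractions.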
Another factor that may affect sample outcomes is the tissue type used
for extraction. Studies extracting DNA from formalin-fixed samples
typically use liver or muscle tissue (Hykin et al. 2015; Ruane & Austin
2017; Pierson et al. 2020). Hykin et al. (2015) compared extraction
success between these two tissue types and obtained higher yields from
the liver replicates of Anolis lizard samples.
Ruane and Austin (2017) successfully extracted DNA from snake liver
tissues, whereas Pierson et al. (2020) were unable to extract DNA
suitable for PCR or library preparation from salamander tail muscle.
Here we compared success between muscle and liver replicates of specimen
USNM 525133. The inferred rate of human contamination in the liver
replicate (6.3%) was roughly double that in the muscle replicate (2.9%),
but by all other measures the liver replicate outperformed the muscle
replicate, including total DNA extracted, fragment length, total loci,
total SNPs, and average coverage. Taken together, these results suggest
that DNA in formalin-fixed specimens may remain better preserved in
liver than in muscle tissue, but future studies could test this
hypothesis with larger sample sizes and with specimens spanning a range
of ages.