Telomere length estimates correlated between WGS and qPCR
We observed a correlation (r = 0.50 – 0.66) between telomere length estimates derived from qPCR and WGS data. Previous studies in humans observed moderate to high correlations between qPCR and WGS data (r = 0.66 – 0.95). For Populus trees, the lower correlations may reflect differences in how bioinformatic approaches filter out interstitial telomeric sequences and tolerate telomere repeat variants.
Telomere studies in humans avoid including interstitial telomeric repeats by identifying telomeric reads that align to the telomeric region of the reference genome. Programs such as TelSeq, TelomereHunter, and Telomerecat require reads to be aligned to a reference genome before telomere calculations (i.e., they take BAM files as input), and telomere length estimates are based on reads that align to the telomeric region of the genome. This approach reduces the probability of capturing interstitial telomeric repeats, but the utility and efficiency of these programs depend heavily on the completeness of the telomere-to-telomere assembly of the reference genome. While telomere-to-telomere assemblies are available for humans, most species lack high resolution across telomeric regions. In the Populus trichocarpa reference genome, the telomeric regions contain a substantial proportion of ambiguous bases (i.e., runs of N), which reduces the probability that reads align to them. In this study, we therefore limited telomere length estimation to programs that use unmapped sequence reads. As a result, telomere estimates from both K-seek and TRIP may include some interstitial telomeric repeats. Nonetheless, previous studies have shown that requiring consecutive telomeric repeats decreases the probability of capturing interstitial telomeric repeats and increases the correlation between qPCR and TRF telomere estimates. K-seek and TRIP considered only reads with more than four and seven consecutive repeats, respectively, reducing the probability of capturing interstitial telomeric repeats in this study. In contrast, Computel decreases the capture of interstitial telomeric repeats by using reads that align to the telomeric reference created by the program. For non-model species, it is possible to estimate telomere length from sequence data that have not been mapped to a reference genome; however, the potential contribution of interstitial repeats to those estimates must be considered.
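The consecutive-repeat filter described above can be illustrated with a minimal Python sketch. It is not the implementation used by K-seek or TRIP; the function names (`revcomp`, `longest_tandem_run`, `is_telomeric`) and the `min_repeats` parameter are hypothetical and chosen only for illustration. The sketch assumes the canonical plant motif TTTAGGG and checks both strands.

```python
import re

def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def longest_tandem_run(read, motif="TTTAGGG"):
    """Largest number of back-to-back copies of the motif found in the
    read, searching the motif and its reverse complement."""
    read = read.upper()
    best = 0
    for m in (motif, revcomp(motif)):
        # One regex match corresponds to one uninterrupted tandem run.
        for match in re.finditer("(?:%s)+" % re.escape(m), read):
            best = max(best, len(match.group(0)) // len(m))
    return best

def is_telomeric(read, min_repeats, motif="TTTAGGG"):
    """Keep a read only if it carries at least min_repeats consecutive
    copies of the motif (threshold is an assumption set by the user)."""
    return longest_tandem_run(read, motif) >= min_repeats

# Illustrative reads (made up, not real data).
terminal_like = "ACGT" + "TTTAGGG" * 5 + "ACGT"
interstitial_like = "TTTAGGG" * 2 + "ACGTACG" + "TTTAGGG" * 2
print(longest_tandem_run(terminal_like), longest_tandem_run(interstitial_like))
```

Raising `min_repeats` discards reads such as `interstitial_like`, in which short runs of the motif are interrupted by other sequence, at the cost of also discarding true telomeric reads that contain sequencing errors or variant repeats.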
Additional bioinformatic programs such as TelomereHunter and qmotif allow telomere repeats to deviate from the typical human telomere repeat, TTAGGG. Although telomeric repeats are generally conserved within a species, deviations from the typical telomere repeat have been reported in humans and are frequently incorporated into telomere calculations. In plants, telomere variants that deviate from the Arabidopsis type (TTTAGGG) have been reported among taxa, with some families such as Alliaceae exhibiting novel telomere sequences (e.g., CTCGGTTATGGG). However, to date there are limited empirical data comparing telomere repeat variation within species. In the present study, we searched only for the telomere repeat TTTAGGG, previously reported as the Populus telomeric sequence. Nevertheless, potential Populus telomere repeat variants can be detected by manual inspection of the Populus trichocarpa reference genome. Because our search excluded any such variants, accounting for them might increase the correlation between WGS and qPCR estimates. Thus, the identification of intraspecific telomere repeat variation in plants, coupled with new bioinformatic approaches that accommodate telomere repeat diversity, will improve telomere length estimates.
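As a rough illustration of how variant repeats might be tallied rather than inspected by eye, the Python sketch below walks along a read in motif-sized steps from the first exact TTTAGGG hit and counts units that differ from the canonical motif by at most one base. This is an assumption-laden sketch, not a method used in this study or by any of the programs named above: the function names and the `max_mismatch` parameter are hypothetical, only the forward strand and substitutions are considered, and the variant TTAAGGG in the example is invented for demonstration.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def tally_repeat_units(read, motif="TTTAGGG", max_mismatch=1):
    """Count canonical and near-canonical repeat units in one read.

    Starts at the first exact copy of the motif and advances one motif
    length at a time, stopping at the first unit that differs from the
    motif by more than max_mismatch substitutions."""
    read = read.upper()
    counts = Counter()
    start = read.find(motif)
    if start < 0:
        return counts  # no exact copy to anchor the reading frame
    k = len(motif)
    for i in range(start, len(read) - k + 1, k):
        unit = read[i:i + k]
        if hamming(unit, motif) > max_mismatch:
            break
        counts[unit] += 1
    return counts

# Made-up read: five canonical units and one hypothetical variant unit.
read = "AC" + "TTTAGGG" * 3 + "TTAAGGG" + "TTTAGGG" * 2 + "ACGTACG"
print(tally_repeat_units(read))
```

Summing such tallies across reads would give a first impression of which non-canonical units recur within telomeric arrays; distinguishing true variants from sequencing errors would require additional evidence.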
Telomere length estimation in plants is currently limited to programs that allow modification of the telomere repeat pattern and of species-specific genome features. The telomere repeat pattern is taxon-dependent, with most vertebrates sharing the human telomere repeat pattern, TTAGGG. Several of the programs listed above, including TelomereHunter, TelSeq, and Telomerecat, were created to identify human telomere repeats, limiting the repeat search to the vertebrate telomere type. In addition, these programs compute telomere estimates using human genome features, such as the number of chromosomes and genome length. Plants have different telomere repeat patterns, generally TTTAGGG, which deviate from the human telomeric type. To our knowledge, the only program that allows modification of both the telomere repeat pattern and genomic features is Computel. Computel uses species-specific genome features, including telomere pattern, number of chromosomes, and genome size. The highest correlation between WGS and qPCR (r = 0.66) was observed for Computel. Previous studies indicate that Computel performs similarly to other bioinformatic approaches, but as the field of telomere ecology expands, increased flexibility to modify the telomere repeat pattern and include species-specific genome features will be required to extend applications.
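To show why chromosome number and genome size matter, the following Python sketch converts a count of telomeric bases into a mean telomere length per chromosome end. It is a generic coverage-normalization calculation written for illustration, not Computel's published formula; the function name, its parameters, and all input values are assumptions. The sketch treats `genome_size_bp` and `n_chromosomes` as haploid values and assumes uniform sequencing depth.

```python
def mean_telomere_length(telomeric_bases, total_bases,
                         genome_size_bp, n_chromosomes):
    """Approximate mean telomere length (bp) per chromosome end.

    telomeric_bases: bases in reads classified as telomeric
    total_bases:     all sequenced bases in the library
    genome_size_bp:  haploid genome size in bp
    n_chromosomes:   haploid chromosome number
    """
    if min(telomeric_bases, total_bases, genome_size_bp, n_chromosomes) <= 0:
        raise ValueError("all inputs must be positive")
    depth = total_bases / genome_size_bp   # mean sequencing depth
    n_ends = 2 * n_chromosomes             # two telomeres per chromosome
    # Telomeric bases divided by depth gives telomeric bp per haploid
    # genome; dividing by the number of ends gives the per-end mean.
    return telomeric_bases / (depth * n_ends)

# Illustrative placeholder inputs, not measurements from this study.
length = mean_telomere_length(
    telomeric_bases=1_520_000,
    total_bases=8_000_000_000,
    genome_size_bp=400_000_000,
    n_chromosomes=19,
)
print(round(length))
```

With identical read counts, supplying human values for genome size and chromosome number would rescale the result, which is why programs that hard-code human genome features cannot be applied directly to plants.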
Accurate measurement of telomere length is needed to deploy telomeres as potential biomarkers of organismal response to abiotic and biotic stress. Although qPCR has been used extensively because of its accessibility and suitability for high-throughput analysis, it provides only a relative rather than an absolute measure of telomere length. Furthermore, the accuracy of qPCR in assaying telomere length is susceptible to variation in the reference control gene, primer efficiency, and inter-assay variability. WGS can provide a high-resolution assessment of the telomeric regions, allowing for precise quantification of absolute telomere length. In addition, WGS allows detection of mutations within the telomeric regions and permits telomere length assessment on an individual chromosome basis. Thus, while WGS can be computationally intensive and potentially cost-prohibitive for large-scale studies, sequence data from WGS can enhance the accuracy of current telomere length methods, particularly for techniques involving subtelomeric primers or probes.