Comparison of telomere length estimates from WGS data
Mean telomere length estimates differed significantly among the three bioinformatic approaches (H = 230.06, df = 2, p < 0.001; Figure 2). Despite these differences, estimates for individual genotypes were highly correlated across approaches (r = 0.86–0.99). This suggests that the approaches rank genotypes consistently, although the absolute scale of the estimates differs. Variation in telomere length estimates may be attributed to differences among the bioinformatic approaches, including how telomeric reads are identified (i.e., alignment or pattern matching), the minimum number of consecutive telomeric repeats required in a read, and whether genome coverage is considered.
Computel was initially designed to estimate mean telomere length in humans but allows species-specific modification of genome features, including genome size, number of chromosomes, and telomere sequence, to allow estimation across organisms. Here, we leveraged Computel to estimate telomere length in plants. Computel uses an alignment-based method, mapping reads from WGS data to a telomere reference created within the program. Only reads that align to the telomere reference are considered telomeric reads.
In contrast, K-seek was not developed to estimate telomere length but was created to identify and count simple sequence repeats from WGS in Drosophila. We leveraged K-seek to estimate telomere length by identifying and counting the number of predicted Populus telomere repeats from WGS within each genotype. K-seek considers short repeats that span a minimum of 50 bp within a read, so reads containing a minimum of seven telomeric repeats were identified in the analysis. This approach decreases the probability of capturing interstitial telomeres, which are telomeric repeats localized to intrachromosomal sites. However, unlike Computel, K-seek does not account for other factors known to influence telomere estimates, such as genome coverage.
Similarly, TRIP identifies short tandem repeat sequences from WGS, but this program was specifically created for telomere identification in insects. TRIP detects reads with more than four telomeric repeats per read and, like K-seek, does not consider genome coverage in telomere length estimates. Accounting for genome coverage can influence average telomere length estimates by reducing potential sequencing biases. Nersisyan and Arakelyan (2015) compared human telomere length estimates from short-read sequence data at varying degrees of coverage (0.2, 2 and 10X) across the same individuals. They observed that the accuracy of telomere estimates improved with higher genome coverage. In our study, we removed one individual from the analyses as an outlier because it exhibited low genome coverage (< 12.31X, Table S1). The individuals retained in our study had a minimum genome coverage of 15X, suggesting that this may be a reasonable requirement to precisely estimate telomere length using WGS (Table S1). However, further studies comparing the impact of varying genome coverage on telomere estimation are needed to support this recommendation in plants. Therefore, while there may be benefits to the matching approach used in K-seek and TRIP, ensuring that genome coverage is considered will be essential for future comparisons.
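The two methodological factors discussed above, the minimum number of consecutive telomeric repeats required in a read and the consideration of genome coverage, can be illustrated with a minimal Python sketch. It is not the implementation used by Computel, K-seek, or TRIP. The repeat unit (TTTAGGG, the Arabidopsis-type plant repeat), the function names, and the simple normalization (telomeric bases divided by coverage and by the number of chromosome ends) are all assumptions made for illustration only.

```python
import re

# Assumption: the Arabidopsis-type plant telomere repeat; substitute the
# repeat unit appropriate for the species under study.
TELOMERE_UNIT = "TTTAGGG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_repeat_run(read, unit=TELOMERE_UNIT):
    """Largest number of consecutive copies of `unit` found in `read`.

    Both strands are checked by also searching for the reverse
    complement of the repeat unit.
    """
    read = read.upper()
    best = 0
    for motif in (unit, reverse_complement(unit)):
        for match in re.finditer("(?:%s)+" % re.escape(motif), read):
            best = max(best, len(match.group()) // len(motif))
    return best


def telomeric_bases(reads, min_repeats, unit=TELOMERE_UNIT):
    """Sum the repeat bases over reads that meet the repeat threshold.

    Reads whose longest run is shorter than `min_repeats` are ignored,
    which mimics a pattern-matching filter on consecutive repeats.
    """
    total = 0
    for read in reads:
        run = longest_repeat_run(read, unit)
        if run >= min_repeats:
            total += run * len(unit)
    return total


def coverage_normalized_length(bases, coverage, n_chromosome_ends):
    """Hypothetical normalization: telomeric bases per chromosome end
    at 1X coverage. This is a simplification, not a published formula."""
    return bases / (coverage * n_chromosome_ends)
```

In this sketch, raising min_repeats from five to seven excludes reads with shorter repeat runs, so the same set of reads yields a smaller telomeric base count. Dividing by coverage places samples sequenced at different depths on a common scale, which a raw repeat count does not.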