Methylation data selection
Across all 74 samples, individuals were sequenced with an average of 51,482,011 (range: 24,942,071 - 82,760,425) reads. After conversion of unmapped bam files to fastq with SamToFastq.jar frompicard v1.118 , the quality of the sequencing reads was checked using Fastqc v0.11.5 (Andrews 2010) and Multiqc v1.8 (Ewels et al. 2016). Spiked-in sequences were used to estimate over- and underconversion using RefFreeDMA (Klughammeret al. 2015). Trimgalore v0.3.3 (implemented inRefFreeDMA; Krueger et al. , 2021) was used to remove adaptors to guarantee good mapping and remove low quality bases (≥20) as well as short read fragments (≥ 16 bp).
We used Bismark v0.22.3 (Krueger & Andrews 2011) to prepare and index the reference genome for subsequent mapping of the bisulfite-converted reads using Bowtie 2 v2.4.1 (Langmead & Salzberg 2012), allowing for one mismatch (–score_min L,0,-0.20). Methylation extraction was conducted with Bismark Extractor v0.22.3 (Krueger & Andrews 2011) with the ‘ignore’ option to remove unmethylated cytosines introduced during the end-repair step. The R-package Methylkit v.1.16.1 in R v.4.0 (Akalin et al. 2012; R Core Team, 2021) and its function methRead were used to load and analyse the methylation calls.
Methylation calls were filtered to those with a minimum coverage of 10 reads per CpG position. CpGs with a coverage >99.9th percentile most likely result from PCR bias and were removed. Across all 74 samples, there were on average 6,361,176 CpG positions before, and 539,538 CpG positions after, filtering for >10x coverage. Methylation call distributions between samples were normalized using thenormalizeCoverage function implemented in Methylkit to reduce the bias of systematic oversampling in some samples. We then merged CpG positions of the different samples using the unitefunction to make sure positions were sequenced in at least 70% (N=52) of the samples to facilitate a longitudinal analysis approach. This also meant we only covered CpG positions on autosomes, as CpGs from the sex chromosomes were excluded. Following Meng et al. (2010) and Sziráki et al. (2018), sites that showed little or no variation were removed as well, applying a threshold of a standard deviation <0.1. This resulted in 1,365,573 observations of 23,647 sites, used to determine methylation levels as the fractional methylation rate per CpG position per individual.