Methylation data selection
Across all 74 samples, individuals were sequenced with an average of
51,482,011 (range: 24,942,071 - 82,760,425) reads. After conversion of
unmapped bam files to fastq with SamToFastq.jar frompicard v1.118 , the quality of the sequencing reads was
checked using Fastqc v0.11.5 (Andrews 2010) and Multiqc
v1.8 (Ewels et al. 2016). Spiked-in sequences were used to
estimate over- and underconversion using RefFreeDMA (Klughammeret al. 2015). Trimgalore v0.3.3 (implemented inRefFreeDMA; Krueger et al. , 2021) was used to remove
adaptors to guarantee good mapping and remove low quality bases (≥20) as
well as short read fragments (≥ 16 bp).
We used Bismark v0.22.3 (Krueger & Andrews 2011) to
prepare and index the reference genome for subsequent mapping of the
bisulfite-converted reads using Bowtie 2 v2.4.1 (Langmead
& Salzberg 2012), allowing for one mismatch (–score_min L,0,-0.20).
Methylation extraction was conducted with Bismark Extractor
v0.22.3 (Krueger & Andrews 2011) with the ‘ignore’ option to remove
unmethylated cytosines introduced during the end-repair step. The
R-package Methylkit v.1.16.1 in R v.4.0 (Akalin et
al. 2012; R Core Team, 2021) and its function methRead were used
to load and analyse the methylation calls.
Methylation calls were filtered to those with a minimum coverage of 10
reads per CpG position. CpGs with a coverage >99.9th
percentile most likely result from PCR bias and were removed. Across all
74 samples, there were on average 6,361,176 CpG positions before, and
539,538 CpG positions after, filtering for >10x coverage.
Methylation call distributions between samples were normalized using thenormalizeCoverage function implemented in Methylkit to
reduce the bias of systematic oversampling in some samples. We then
merged CpG positions of the different samples using the unitefunction to make sure positions were sequenced in at least 70% (N=52)
of the samples to facilitate a longitudinal analysis approach. This also
meant we only covered CpG positions on autosomes, as CpGs from the sex
chromosomes were excluded. Following Meng et al. (2010) and
Sziráki et al. (2018), sites that showed little or no variation
were removed as well, applying a threshold of a standard deviation
<0.1. This resulted in 1,365,573 observations of 23,647 sites,
used to determine methylation levels as the fractional methylation rate
per CpG position per individual.