2 Design and implementation
MeStudio consists of several tools that can be run individually or as part of a pipeline, and it uses a naive string-matching algorithm to map motif sequences to the reference genome. Only three input files are required: (i) a FASTA file containing the genome sequence, (ii) a genomic annotation file in GFF3 format and (iii) a second GFF3 file containing the methylated nucleotide positions. The latter is automatically generated from the output of the SMRTlink software of Pacific Biosciences DNA sequencers. MeStudio produces several output files, including (i) a text file summarizing the statistics of methylation occurrences across the genomic features, (ii) distribution plots and (iii) BED files containing the protein annotation of the genes in which methylated motifs have been found. The workflow is shown in Figure 1.
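The motif-mapping step can be illustrated with a minimal Python sketch of naive string matching. It is not taken from the MeStudio source code: the function name `find_motif`, the 0-based coordinates and the handling of degenerate bases through an IUPAC lookup table are all assumptions made for illustration.

```python
# Illustrative sketch only; names and conventions are hypothetical,
# not MeStudio's actual implementation.

# IUPAC nucleotide codes: each motif symbol maps to the bases it may match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_motif(genome: str, motif: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) motif hit."""
    genome = genome.upper()
    motif = motif.upper()
    hits = []
    # Naive scan: test every alignment of the motif against the genome,
    # comparing one position at a time.
    for start in range(len(genome) - len(motif) + 1):
        if all(genome[start + i] in IUPAC[m] for i, m in enumerate(motif)):
            hits.append(start)
    return hits


print(find_motif("TTGAATCGGACTCAA", "GANTC"))  # prints [2, 8]
```

The scan runs in time proportional to the genome length multiplied by the motif length, which is acceptable for short motifs on bacterial-sized genomes. The sketch searches the forward strand only; covering the reverse strand would require a second pass with the reverse-complemented motif.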