3 Case study
To illustrate the performance of MeStudio, we used a recently published SMRT
dataset (diCenzo et al., 2022) comparing some of the methylation features
of two Sinorhizobium meliloti strains, 1021 and FSM-MA, grown to
stationary phase in minimal medium (Table 2, Figure 2B). On the
SMRT-assembled genomes of the two strains, MeStudio identified a total of
28 motifs (Table 2). All but six motifs (namely AGAAAAT, DCTGCAGGS,
RAGCWGCTY, RAGCWGCTY, RCTGCAGGS, TGGGCA) were common to both strains.
The number of retrieved methylated sites ranged from a few
(especially for private motifs, i.e., those present in one strain only) to
several thousand (e.g., GANTC, a classical motif that is methylated by
the CcrM DNA methylase and is involved in cell cycle regulation;
Mouammine and Collier, 2018). CDS and nCDS showed similar values, as
expected given that methylation is present on both DNA strands. Intergenic
sequences (tIG) showed the lowest number of methylated sites, whereas
sequences upstream of a gene (UP), which likely correspond to
putative promoter regions, showed values generally one order of
magnitude higher than tIG; in some cases, values differed by about
two-fold between strains (e.g., CTYCCAG and GCCAGG).
Finally, the presence of motifs in only one strain may suggest the
occurrence of strain-specific restriction-modification systems, although
the small number of methylated sites may also suggest alternative
hypotheses (e.g., methylation restricted to some genomic regions and
related to the regulation of expression at specific loci). Demo files for
input and output are available at https://github.com/combogenomics/MeStudio.
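
The distinction between shared and private motifs, and the fold difference between strains, can be sketched in a few lines of Python. This is only an illustration and not part of MeStudio: the strain labels, the counts, and the assignment of motifs to strains are placeholders and do not reproduce the values in Table 2, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Placeholder counts of methylated sites per motif for two strains.
# These numbers and motif-to-strain assignments are invented for illustration.
counts_a: Dict[str, int] = {"GANTC": 4000, "CTYCCAG": 120, "AGAAAAT": 3}
counts_b: Dict[str, int] = {"GANTC": 4100, "CTYCCAG": 60}


def split_motifs(
    a: Dict[str, int], b: Dict[str, int]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared motifs, motifs private to a, motifs private to b)."""
    keys_a, keys_b = set(a), set(b)
    return keys_a & keys_b, keys_a - keys_b, keys_b - keys_a


def fold_differences(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, float]:
    """For each shared motif, ratio of the larger to the smaller count."""
    folds: Dict[str, float] = {}
    for motif in set(a) & set(b):
        low, high = sorted((a[motif], b[motif]))
        if low > 0:  # skip motifs with a zero count to avoid division by zero
            folds[motif] = high / low
    return folds


shared, only_a, only_b = split_motifs(counts_a, counts_b)
folds = fold_differences(counts_a, counts_b)
```

With real data, the two dictionaries would be filled from the per-motif counts of each strain; motifs with a fold value near 2 would correspond to the kind of between-strain difference noted above.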
4 Discussion
We have described a novel software tool, MeStudio, for the analysis of
DNA methylation profiles obtained by single-molecule real-time (SMRT)
sequencing. Compared with the few existing tools, MeStudio offers several
novel and useful features: it produces GFF and BED files reporting the
positions of methylated sites and methylated motifs, the number of
methylated sites and their profiles for each genomic feature, and
graphical outputs. The genomic
features analysed include genic and intergenic regions (hence comprising
putative promoters), allowing the formulation of hypotheses on the
importance of DNA methylation in the regulation of gene expression and
in other relevant biological phenomena. Although developed for
prokaryotic genomes, MeStudio can handle any kind of sequence, provided
that a suitable set of input files is supplied (Figure 1). By reporting
motif occurrence and genomic localization, MeStudio lays the basis for
comparative analyses of DNA methylation profiles among strains, for
example in evolutionary studies of populations and species and in studies
of epigenomic modifications during adaptation and development.
Finally, MeStudio is user friendly: it is easy to install and can be run
as a pipeline with a single command-line call. The scripts were developed
in macOS and Linux environments, and support for Windows platforms may be
added in the near future.
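
As an illustration of how GFF output can be summarised downstream, the following Python sketch tallies records by feature class. It assumes a standard nine-column, tab-separated GFF3 layout in which each record is one methylated site and the third column holds the feature class (CDS, nCDS, tIG or UP); the actual column layout of MeStudio output may differ, and the example records and the function name are hypothetical.

```python
import io
from collections import Counter
from typing import Iterable


def count_by_feature(lines: Iterable[str]) -> Counter:
    """Count GFF3 records per value of the third (type) column."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip records that are not nine-column GFF3
        counts[fields[2]] += 1
    return counts


# Invented records standing in for a real output file.
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsketch\tCDS\t100\t104\t.\t+\t.\tID=site1\n"
    "chr1\tsketch\tCDS\t250\t254\t.\t-\t.\tID=site2\n"
    "chr1\tsketch\tUP\t900\t904\t.\t+\t.\tID=site3\n"
    "chr1\tsketch\ttIG\t1500\t1504\t.\t+\t.\tID=site4\n"
)
feature_counts = count_by_feature(example)
```

For a file on disk, the same function can be called on an open file handle, e.g. `with open(path) as handle: count_by_feature(handle)`.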