2.3 Post-processing
MeStudio implements a post-processing Python script named ms_analyzR, which takes the MeStudio core output as input. In
addition, to integrate comparative genomic analyses, a
“gene_presence_absence.csv” file produced by Roary (Page et
al., 2015) can be used to define the methylation level and patterns of
the core and dispensable genome fractions, as well as to annotate the
proteins encoded by the genes. ms_analyzR logs the total number of genes
found for each category (CDS, nCDS, tIG, US). Additionally, methylation
data are shown, such as i) total number of methylated sites, ii) total
number of methylated genes, iii) the ID of the most methylated gene
(geneID), and iv) the product of that gene. Integrating data from Roary
makes it possible to classify each geneID, together with the name of its
protein (as annotated by Prokka (Seemann, 2014)), as part of the core or
dispensable genome. All the information is saved into a log file,
together with plots accounting for the distribution of the methylations
(Fig. 2A). To ensure customizability, ms_analyzR also includes
two optional flags named “--make_chrom” and “--make_bed”. The
“--make_chrom” flag saves the GFFs into the previously specified output
directory at “chromosome level” rather than “category
level”. Each GFF produced will be characterized not by category (CDS,
nCDS, tIG and US) but by chromosomes (or contigs), maintaining the
MeStudio core-derived contents and layout unaltered. The
“--make_bed” flag produces a BED file for each feature, which
reports: i) the chrom column, with the name of each chromosome
or contig, ii) the start and iii) the end of the feature, iv) the
geneID found in that interval, v) the number of methylations found for
that geneID and, lastly, vi) the protein product of that geneID. Information
contained in BED files can be readily used to plot the distribution of
the methylation density for each feature, making use of the circlize R package (https://github.com/jokergoo/circlize) (Fig.
2B).
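
The outputs described in this section can be illustrated with a minimal Python sketch. It is not the ms_analyzR source code: the Feature record and the functions summarize, to_bed_lines and classify_genes are hypothetical names introduced only for illustration, and the example values are invented. The sketch assumes that each feature already carries its methylation count, that coordinates are written unchanged (no conversion between GFF and BED conventions is attempted), and that a gene is treated as core only when it is present in every genome, read from the “Gene” and “No. isolates” columns of the Roary table.

```python
import csv
import io
from collections import namedtuple

# Hypothetical per-gene record; the real MeStudio GFF layout is not reproduced here.
Feature = namedtuple("Feature", "chrom start end gene_id n_methylations product")


def summarize(features):
    """Return the four summary values described for the log file."""
    top = max(features, key=lambda f: f.n_methylations)
    return {
        "total_methylated_sites": sum(f.n_methylations for f in features),
        "total_methylated_genes": sum(1 for f in features if f.n_methylations > 0),
        "most_methylated_gene": top.gene_id,
        "product": top.product,
    }


def to_bed_lines(features):
    """Format each feature as one tab-separated line with the six listed columns."""
    return [
        "\t".join(str(value) for value in (
            f.chrom, f.start, f.end, f.gene_id, f.n_methylations, f.product))
        for f in features
    ]


def classify_genes(roary_csv_text, n_genomes):
    """Label each Roary gene as 'core' or 'dispensable'.

    Assumption: core means present in all n_genomes genomes.
    """
    reader = csv.DictReader(io.StringIO(roary_csv_text))
    return {
        row["Gene"]: "core" if int(row["No. isolates"]) == n_genomes else "dispensable"
        for row in reader
    }


# Invented example data, for illustration only.
features = [
    Feature("chr1", 100, 400, "GENE_0001", 3, "DNA polymerase"),
    Feature("chr1", 900, 1500, "GENE_0002", 7, "hypothetical protein"),
    Feature("plasmidA", 50, 350, "GENE_0003", 0, "transposase"),
]
roary_text = (
    '"Gene","Annotation","No. isolates"\n'
    '"dnaA","Chromosomal replication initiator","3"\n'
    '"tnpA","Transposase","1"\n'
)

summary = summarize(features)
bed_lines = to_bed_lines(features)
fractions = classify_genes(roary_text, n_genomes=3)
```

Keeping the BED rows as plain tab-separated text means the resulting file can be read directly into R as a data frame before plotting with circlize.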