2.3 Post-processing
MeStudio implements a post-processing Python script named ms_analyzR, which takes the MeStudio core output as input. In addition, to integrate comparative genomic analyses, a “gene_presence_absence.csv” file produced by Roary (Page et al., 2015) can be used to define the methylation level and patterns of the core and dispensable genome fractions, as well as to annotate the proteins encoded by the genes. ms_analyzR logs the total number of genes found for each category (CDS, nCDS, tIG, US). Additionally, methylation data are reported, namely i) the total number of methylated sites, ii) the total number of methylated genes, iii) the ID of the most methylated gene (geneID) and iv) the product of that gene. Integrating data from Roary makes it possible to classify each geneID, together with the name of its protein product (as annotated by Prokka (Seemann, 2014)), as part of the core or dispensable genome. All the information is saved into a log file, together with plots showing the distribution of the methylations (Fig. 2A). To ensure customizability, ms_analyzR also includes two optional flags named “--make_chrom” and “--make_bed”. The “--make_chrom” flag saves the GFFs into the previously specified output directory at “chromosome level” rather than “category level”: each GFF produced is organized not by category (CDS, nCDS, tIG and US) but by chromosome (or contig), leaving the MeStudio core-derived contents and layout unaltered. The “--make_bed” flag produces a BED file for each feature, reporting i) the chrom column, with the name of each chromosome or contig, ii) the start and iii) the end of the feature, iv) the geneID found in that interval, v) the number of methylations found for that geneID and lastly vi) the protein product of that geneID. The information contained in BED files can be readily used to plot the distribution of the methylation density for each feature, making use of the circlize R package (https://github.com/jokergoo/circlize) (Fig. 2B).
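As a rough illustration of the six-column BED layout described above, the following Python sketch writes such records from memory and picks out the most methylated gene. It is not taken from the ms_analyzR source: the `Feature` record, the `write_bed` and `most_methylated` helpers, and all toy values are hypothetical and serve only to make the column order concrete.

```python
import io
from collections import namedtuple

# Hypothetical record type; field names are illustrative, not ms_analyzR's own.
Feature = namedtuple(
    "Feature", "chrom start end gene_id n_methylations product"
)


def write_bed(features, handle):
    """Write one tab-separated line per feature, in the column order given in the text:
    chrom, start, end, geneID, number of methylations, protein product."""
    for f in features:
        fields = [f.chrom, f.start, f.end, f.gene_id, f.n_methylations, f.product]
        handle.write("\t".join(str(x) for x in fields) + "\n")


def most_methylated(features):
    """Return the feature with the highest methylation count (first one wins on ties)."""
    return max(features, key=lambda f: f.n_methylations)


# Toy, made-up values used only to exercise the sketch.
# Note: the BED convention is 0-based, half-open coordinates; whether ms_analyzR
# converts GFF coordinates accordingly is not stated in the text.
features = [
    Feature("contig_1", 100, 400, "gene_0001", 3, "hypothetical protein"),
    Feature("contig_1", 900, 1500, "gene_0002", 7, "hypothetical protein"),
    Feature("contig_2", 50, 350, "gene_0003", 0, "hypothetical protein"),
]

buffer = io.StringIO()  # an open file handle would work the same way
write_bed(features, buffer)
bed_text = buffer.getvalue()
top = most_methylated(features)
```

A file of this shape can be read as a plain tab-separated table, which is the kind of input typically passed to genome-wide plotting tools such as circlize.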