The key facts of translations are meant to validate the fuzzy weights. In our case, fuzzy weights are based on logical reasoning and logical thinking; fuzzy logic is not concerned with data as such but with a thought process that imitates the human brain. Humans generally come to depend on their own creations and try to understand their use through machine language, or so-called machine teaching, which is where fuzzy concepts are interpreted in terms of binary language. These translations become more complicated the deeper we go, and herein lies the originality of our approach: introducing fuzzy weights, or logic based on switching-theory logical design, that produces binary keys instead of values. We concentrated on reducing the complexity of machine teaching through fuzzy weights.
In the realm of reservoir engineering, the application of machine learning has emerged as a transformative force, offering unprecedented insights into reservoir parameter characterization. In this study, we present a comprehensive analysis of four distinct machine learning models, namely Bagging, Extra Tree Regressor, XGBoost, and Ridge, to elucidate their efficacy in predicting permeability, a critical parameter for reservoir characterization. Our findings reveal a nuanced picture of each model's performance. The Bagging model, while demonstrating an impressive training accuracy of 0.99, exhibits some uncertainty in high-permeability predictions, raising doubts about its applicability for reservoir characterization. In contrast, the Extra Tree Regressor model outshines the Bagging model with a training accuracy of 100% and a prediction accuracy of 99.8%. It boasts lower absolute and absolute percentage errors, reinforcing its suitability for permeability prediction. The XGBoost model, meanwhile, takes a unique approach by emphasizing the density-corrected log over the gamma-ray and sonic logs. Despite achieving remarkable training and prediction accuracy exceeding 99%, its reliance on the corrected density log introduces a mean absolute percentage error above 10%, warranting closer scrutiny. The Ridge model, in contrast, struggles, as is evident from its high AIC reading, signifying its limited compatibility with permeability prediction. Joint plot and lmplot analyses further showcase model behaviors. The Extra Tree model exhibits a 99% confidence interval, underscoring its reliability with minimal underpredictions. Conversely, the Bagging and Ridge models show susceptibility to high uncertainties in permeability predictions, particularly at extreme values. Our study concludes that the Extra Tree Regressor model excels in permeability prediction, with potential applications in reservoir interval assessments. The XGBoost model, while competent in sandstone reservoir prediction, bears a higher uncertainty burden. The Bagging and Ridge models, due to their uncertainty challenges, are less suitable for non-reservoir and sandstone reservoir interval predictions. The correlation of high permeability with elevated porosity, reduced water saturation, and lower gamma-ray readings highlights the reservoir intervals' distinct characteristics. These observations underscore the reliability of our models and their potential contributions to reservoir engineering practices.
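A minimal sketch of the four-model comparison described above, assuming the well logs sit in a CSV with illustrative column names (GR, DT, RHOB, PERM) and a standard train/test split; this is not the authors' exact setup:

```python
# Sketch: compare four regressors for permeability prediction (illustrative only).
import pandas as pd
from sklearn.ensemble import BaggingRegressor, ExtraTreesRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Hypothetical well-log dataset: gamma-ray, sonic, and density logs as predictors.
logs = pd.read_csv("well_logs.csv")  # assumed file with GR, DT, RHOB, PERM columns
X, y = logs[["GR", "DT", "RHOB"]], logs["PERM"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Bagging": BaggingRegressor(random_state=42),
    "ExtraTrees": ExtraTreesRegressor(random_state=42),
    "XGBoost": XGBRegressor(random_state=42),
    "Ridge": Ridge(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: R2={r2_score(y_te, pred):.3f}, "
          f"MAPE={100 * mean_absolute_percentage_error(y_te, pred):.1f}%")
```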
Prof. Roberto Grobman

Keywords: skin, genetics, algorithms, biomarkers, wrinkles, aging, artificial intelligence

Abstract

Introduction: Skin, being the largest organ system in the body, is of utmost importance when it comes to timely diagnostics and treatment of skin conditions. Historically, diagnostics have depended on symptoms and the doctor's experience. Today, with advances in technology, it is possible to diagnose skin conditions more accurately and earlier. Skin imaging and deep learning have contributed immensely to very early diagnosis and hence a better prognosis. Artificial intelligence (AI) techniques have been applied in clinical genomics to identify genetic markers for predisposed conditions such as melanoma and psoriasis.

Methods and results: Research and analysis of three studies were performed to obtain collective data on the current trends in skin disease diagnosis and the mapping of genetic markers. AI shows a lot of promise in the prediction of skin conditions and early treatment.

Conclusion: Skin disease prognosis has been improved by the use of skinomics, microarrays, and AI techniques for accurate diagnostics and treatment.

Introduction

The skin is the largest organ of the body, composed of epidermis, dermis, and subcutaneous tissues, and containing blood vessels, lymphatic vessels, nerves, and muscles; it can perspire, perceive the external temperature, and protect the body. Covering the entire body, the skin protects multiple tissues and organs from external invasions, including artificial skin damage, chemical damage, adventitious viruses, and challenges to the individual's immune system. Skin diseases have a big impact on everyday life, and detecting underlying issues as early as possible is gaining importance. It is necessary to develop automatic methods in order to increase the accuracy of diagnosis for multiple types of skin disease. Skin diseases and conditions are extremely prevalent, yet diagnostics are based on symptoms and the experience of the doctor. These are often not fool-proof and sometimes require a trial-and-error approach to diagnosis. Over the past few years, image processing techniques have achieved rapid development in medicine. For example, the skin disease varicella was detected by Oyola and Arroyo through colour transformation, equalization, and edge detection, and the images of varicella were then collected and classified through the Hough transform. The final empirical results demonstrated better diagnostic performance in the detection of varicella, and preliminary tests were also conducted on varicella and herpes zoster on that basis. Sumithra et al. proposed a novel approach for automatic segmentation and classification of skin lesions using SVM and k-nearest neighbor (k-NN) classifiers. Kumar and Singh established the relationship of skin cancer images across different types of neural network; medical images were then fed into this skin cancer classification system for training and testing based on the MATLAB image processing toolbox.

Bioinformatics is a research field that uses computer-based tools to investigate life sciences questions, employing "big data" results from large-scale DNA sequencing, whole genomes, transcriptomes, metabolomes, populations, and biological systems, which can only be comprehensively viewed in silico. The epidermis was among the earliest targets of bioinformatics studies because it represents one of the most accessible targets for research. Consequently, bioinformatics methods in the fields of skin biology and dermatology generated a large volume of data, which led to the origination of the term "skinomics." Skinomics data are directed toward epidermal differentiation, malignancies, inflammation, allergens and irritants, the effects of ultraviolet (UV) light, wound healing, the microbiome, stem cells, etc. Cultures of cutaneous cell types (keratinocytes, fibroblasts, melanocytes, etc.), as well as skin from human volunteers and from animal models, have been extensively experimented on. In this article we present combined research information on diagnostic imaging and the application of bioinformatics in skin diseases.

Methods and results

Bioinformatics is an interdisciplinary field of knowledge that combines computer science, biology, biomedical sciences, and statistics. It is oriented toward the application and development of new computational methods to expand biological, biomedical, or epidemiological knowledge. We used a dataset provided by Transceptar Technologies/FullDNA, from Israel. The TRCPR18 algorithm developed by Transceptar Technologies is AI-based and allows the analysis of millions of data points in a few seconds, taking into account the orientation of the gene and proceeding with various types of predisposition calculations. The Transceptar/FullDNA algorithm analyzes more than 61 skin-related conditions, and this information was used to confirm previous research. Recent developments in high-speed technologies have led to a major revolution in biological and biomedical research, in which bioinformatics today plays an increasingly central role in the analysis of large amounts of data. Literature from three studies was reviewed to summarise modern advances in skin disease diagnostics using artificial intelligence (AI), bioinformatics, skin imaging, and machine learning.

Imaging and deep learning applications: A study conducted by Patnaik et al. researched an approach that uses various computer-vision (deep learning) techniques to automatically predict various kinds of skin disease. The system uses three publicly available image recognition architectures, namely Inception V3, Inception ResNet V2, and MobileNet, with modifications for the skin disease application, and predicts the skin disease based on maximum voting across the three networks. The study involved developing a comprehensive plan to test the special features and general functionality across a range of platform combinations, initiated by the test process. The method uses pre-trained image recognizers, with modifications, to identify skin images. By using deep learning and ensembling, the results showed a higher accuracy rate along with identification of more diseases. Previous models reported a maximum of six skin diseases with an accuracy level of 75%, compared to as many as twenty diseases with an accuracy of 88% in the study conducted by Patnaik et al. This demonstrates that deep learning algorithms have huge potential in real-world skin disease diagnosis.

Microarray and skinomics applications: The most commonly used and highly preferred methodology in skinomics is DNA microarray technology, such as Affymetrix and Illumina. DNA microarrays are a perfect medium, as they simultaneously measure the expression of the entire genome. Printed cDNA arrays, originated by Brown at Stanford, are often homemade, inexpensive, and can compare two samples on the same chip. Commercial alternatives such as oligonucleotide microarrays are also available, but somewhat more expensive. These techniques enable personalized medicine and will find broad applications in the future. Microarray technology can be applied in skin ageing studies, UV damage studies, transcriptional studies in melanoma, and wound healing studies. Genome-wide association studies (GWAS) comprise the examination of many common DNA polymorphisms in a large population cohort to detect associations of polymorphisms with a given disease. Such polymorphisms can point to the genes where disease-causing mutations may map. GWAS are particularly useful in the analysis of diseases, such as psoriasis, that are common and have a strong genetic component.

Artificial intelligence in clinical genomics: Most artificial intelligence techniques have been adapted to address the various steps involved in clinical genomic analysis, including variant calling, genome annotation, variant classification, and phenotype-to-genotype correspondence, and perhaps eventually they can also be applied to genotype-to-phenotype predictions. AI has proven to be highly effective in the following areas:

Variant calling: The clinical interpretation of genomes is sensitive to the identification of individual genetic variants among the millions populating each genome, necessitating extreme accuracy. Standard variant-calling tools are prone to systematic errors associated with the subtleties of sample preparation, sequencing technology, sequence context, and the sometimes unpredictable influence of biology, such as somatic mosaicism. AI algorithms can learn these biases from a single genome with a known gold standard of reference variant calls and produce superior variant calls.

Phenotype-to-genotype mapping: The molecular diagnosis of skin disease often requires both the identification of candidate pathogenic variants and a determination of the correspondence between the diseased individual's phenotype and the phenotypes expected to result from each candidate pathogenic variant. AI algorithms can significantly enhance the mapping of phenotype to genotype, especially through the extraction of higher-level diagnostic concepts embedded in medical images and EHRs.

Genotype-to-phenotype prediction: The ultimate purpose of clinical genetics is to provide diagnoses and forecasts of future disease risk. Although not many successful predictions have been reported in the literature yet, a few simple studies have been shown to accurately predict conditions, which is promising.

Conclusion

AI systems have surpassed the performance of state-of-the-art methods and have gained FDA clearance for a variety of clinical diagnostics, especially imaging-based diagnostics. The availability of large datasets for training, together with advances in AI algorithms, is driving this surge of productivity. Deep-learning algorithms have shown tremendous promise in a variety of clinical genomics tasks such as variant calling, genome annotation, and functional impact prediction. It is possible that more generalized AI tools will become the standard in these areas, especially for clinical genomics tasks where inference from complex data recurs frequently. The application of AI in medicine is a burgeoning area of development in light of the major impact it could potentially have on healthcare provision. The application of machine learning to medical imaging of skin lesions has been the most impactful, and demonstrates the potential of this technology in medical practice.
We proposed an approach that maintains its originality by quantifying principal components with respect to the generated errors and rectifying these errors through retrieved language encryption, or translations, of the traditional neural networks that our approach resembles. Even though the proposed method resembles traditional neural networks, our concept of translations varies from person to person according to their perception or behaviour, reflecting the characteristics of human visualisation.
Limited research has assessed the spatiotemporal distribution and chronic health effects of NO2 exposure, especially in developing countries, due to the lack of historical NO2 data. A gap-filling model was first adopted to impute the missing satellite NO2 column densities; then an ensemble machine learning model incorporating three base learners was developed to estimate the spatiotemporal pattern of monthly mean NO2 concentrations at 0.05° spatial resolution from 2005 to 2020 in China. Further, we applied the exposure dataset with epidemiologically derived exposure-response relations to estimate the annual NO2-associated mortality burdens in China. The coverage of satellite NO2 column densities increased from 46.9% to 100% after gap-filling. The ensemble model predictions were in good agreement with observations, with overall, temporal, and spatial cross-validation (CV) R2 of 0.88, 0.82, and 0.73, respectively. In addition, our model can provide accurate historical NO2 concentrations, with both by-year CV R2 and external separate-year validation R2 reaching 0.80. The estimated national NO2 levels showed an increasing trend during 2005-2011, then decreased gradually until 2020, especially in 2012-2015. The estimated annual mortality burden attributable to long-term NO2 exposure ranged from 305 thousand to 416 thousand and varied considerably across provinces in China. This satellite-based ensemble model can provide reliable long-term NO2 predictions at a high spatial resolution with complete coverage for environmental and epidemiological studies in China. Our results also highlight the heavy disease burden from NO2 and call for more targeted policies to reduce emissions of nitrogen oxides in China.
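A minimal sketch of an ensemble with three base learners blended by a linear meta-learner, in the spirit of the model above; the choice of base learners (random forest, XGBoost, LightGBM), the features, and the file name are assumptions rather than the authors' configuration:

```python
# Sketch: stack three base learners to predict monthly NO2 (illustrative configuration).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Assumed columns: gap-filled satellite NO2 column, met covariates, ground NO2 label.
data = pd.read_csv("no2_training.csv")
X, y = data.drop(columns=["ground_no2"]), data["ground_no2"]

ensemble = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("xgb", XGBRegressor(random_state=0)),
        ("lgb", LGBMRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),  # meta-learner blends the base predictions
)
# 10-fold CV R2, analogous in spirit to the overall CV R2 reported above.
print(cross_val_score(ensemble, X, y, cv=10, scoring="r2").mean())
```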
The NEXUS area covers approximately 30% of the Brazilian territory. In order to assist preservation and sustainable development policies in that region, this study proposes to replicate the work done by Yeh et al. in Africa, in which a convolutional neural network estimates indicators from satellite images, each covering a region of approximately 45 km². This work compares the size and distribution of Brazil's census tracts with those in Africa to determine whether the scale of the images can be maintained and to define the clusters that will be used. To avoid biasing the model, special care must be taken in selecting clusters, such as keeping a balance between urban and rural sectors and, most importantly, making sure that there is little to no overlap between clusters. To that end, two approaches were proposed: the first samples tracts in each municipality as centroids for clusters; the second merges neighboring urban tracts into a single group and fits clusters to these groups.
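A minimal sketch of the overlap constraint, assuming each cluster is approximated by a roughly 45 km² square footprint (about 6.7 km per side) around a tract centroid in projected coordinates; the greedy filter and coordinates are illustrative:

```python
# Sketch: greedily keep candidate clusters whose ~45 km² footprints do not overlap.
from shapely.geometry import box

SIDE_M = 6700  # ~6.7 km side gives roughly 45 km² per cluster

def footprint(x, y, side=SIDE_M):
    """Square cluster footprint centred on a tract centroid (projected coords, metres)."""
    half = side / 2
    return box(x - half, y - half, x + half, y + half)

def select_non_overlapping(centroids):
    """Greedy filter: accept a candidate only if it intersects no accepted footprint."""
    accepted = []
    for x, y in centroids:
        cand = footprint(x, y)
        if not any(cand.intersects(f) for f in accepted):
            accepted.append(cand)
    return accepted

# Hypothetical projected centroids; real ones would come from census-tract shapefiles.
clusters = select_non_overlapping([(0, 0), (3000, 0), (20000, 0)])
print(len(clusters))  # the second candidate overlaps the first and is dropped -> 2
```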
The Amazon rainforest has been subject to intensive deforestation in recent decades, driven, for example, by illegal logging and the creation of pasture areas. A characteristic pattern of deforestation seen from space is the "fishbone" shape, which usually appears near roads, rivers, and their tributaries. Other, more subtle patterns still need to be identified. These fishbone images are spatiotemporal patterns that need to be further explored with feature extraction methods. In computer vision, morphological features such as flatness, compactness, circularity, perimeter, area, and centroid are well known for characterizing the appearance of an object. In this work, we aim to characterize the shapes of deforestation in its early stages and its evolution in time, particularly in the Amazon rainforest. Thus, we propose to analyze satellite images of these regions, cropping and segmenting them using shape features.
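A minimal sketch of extracting the morphological features listed above from a binary deforestation mask with OpenCV; the input mask and thresholds are assumptions:

```python
# Sketch: compute shape features (area, perimeter, circularity, compactness, centroid)
# for segmented deforestation patches in a binary mask (illustrative input).
import cv2
import numpy as np

mask = cv2.imread("deforestation_mask.png", cv2.IMREAD_GRAYSCALE)  # assumed binary mask
_, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, closed=True)
    if area == 0 or perimeter == 0:
        continue
    circularity = 4 * np.pi * area / perimeter**2   # 1.0 for a perfect circle
    hull_area = cv2.contourArea(cv2.convexHull(c))
    compactness = area / hull_area if hull_area else 0  # solidity-style compactness
    m = cv2.moments(c)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # patch centroid
    print(f"area={area:.0f} perim={perimeter:.0f} circ={circularity:.2f} "
          f"comp={compactness:.2f} centroid=({cx:.0f},{cy:.0f})")
```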
Species Distribution Modelling (SDM) is widely used by ecologists to monitor biodiversity and manage wildlife. In recent decades, Artificial Intelligence (AI) and Machine Learning (ML) techniques have become popular and have been successfully applied to different tasks, including SDM. The objective of this article was to evaluate machine learning models for Species Distribution Modelling in the Amazon Basin region near Manaus (AM), based on meteorological and aerosol data collected by the GoAmazon 2014/15 project. The techniques were evaluated with respect to their accuracy, and the Decision Tree Classifier and the Maximum Entropy Model obtained good predictive performance.
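A minimal sketch of the two best-performing techniques, using scikit-learn's decision tree and logistic regression (the standard maximum-entropy classifier) on hypothetical presence/absence data; the file and column names are illustrative:

```python
# Sketch: evaluate a decision tree and a maximum-entropy (logistic) classifier for SDM.
import pandas as pd
from sklearn.linear_model import LogisticRegression  # logistic regression == MaxEnt classifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumed layout: meteorological/aerosol covariates plus a binary 'presence' label.
data = pd.read_csv("goamazon_sdm.csv")
X, y = data.drop(columns=["presence"]), data["presence"]

for name, clf in [("DecisionTree", DecisionTreeClassifier(max_depth=5, random_state=0)),
                  ("MaxEnt", LogisticRegression(max_iter=1000))]:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```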
The Brazilian National Institute for Space Research (INPE) produces research that helps to understand climate and weather dynamics in Brazil and in the world, with significant impacts on national public and private strategic planning. Among the information essential for these studies are rainfall data. In this context, ensuring the quality of these data has a direct impact on the reliability of the forecasts and analyses generated from them. Thus, this study, a partnership between INPE, the Laboratory of Atmospheric Physics and the Polytechnic School of USP, and the ARM-DoE (Atmospheric Radiation Measurement Climate Research Facility), aimed to establish computational tools to manage the quality of rainfall data in accordance with the main international guidelines. To this end, the study proposed the development of a specific toolkit for data from the Micro Rain Radar (MRR), the PARSIVEL2 and RD80 disdrometers, and rain gauges, to help researchers from INPE, USP, and partners to: standardize the preparation of raw data into internationally accepted formats; process figures to support quick analyses; analyze and process data quality; and, finally, record metadata and quality analyses for publication in international data repositories.
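A minimal sketch of the kind of quality-control check such a toolkit applies to rain-gauge records, here a physical-range test and a flat-line (stuck sensor) test; the thresholds, column names, and file layout are assumptions, not the toolkit's actual rules:

```python
# Sketch: simple QC flags for a rain-gauge time series (illustrative thresholds).
import pandas as pd

MAX_MM_PER_HOUR = 200.0  # assumed physical upper bound for hourly rainfall
FLATLINE_HOURS = 48      # assumed window for a stuck-sensor test on nonzero values

def qc_rain_gauge(series: pd.Series) -> pd.DataFrame:
    """Return the series with boolean QC flags; True means the test failed."""
    out = pd.DataFrame({"rain_mm": series})
    out["range_fail"] = (series < 0) | (series > MAX_MM_PER_HOUR)
    # Flat-line test: identical nonzero values repeated over the whole window.
    repeated = series.rolling(FLATLINE_HOURS).apply(lambda w: w.nunique() == 1, raw=False)
    out["flatline_fail"] = (repeated == 1) & (series > 0)
    return out

rain = pd.read_csv("gauge.csv", index_col=0, parse_dates=True)["rain_mm"]  # assumed file
flags = qc_rain_gauge(rain)
print(flags[flags.range_fail | flags.flatline_fail].head())
```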
Key measures of socioeconomic indicators are essential for making informed policy decisions, but due to the high costs and operational difficulties of traditional data collection efforts, obtaining reliable socioeconomic data remains a challenge, particularly in developing countries. This work presents a deep learning methodology to estimate socioeconomic indicators using satellite imagery. The neural network model developed was trained on the Brazilian region of Vale do Ribeira with the goal of analyzing the socioeconomic indicator of income. The preliminary results showed that models using nightlight (NL) or multispectral daytime (MS) imagery performed better than models trained only on RGB bands, and that models trained exclusively on NL or MS imagery performed similarly to one another and nearly as well as the combined MS+NL model. Finally, the model yielded a low performance (R2 = 0.289), but it is still promising given that the dataset employed was considerably smaller than the one used in the original study it attempted to replicate.
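A minimal sketch of the transfer-learning pattern common in this line of work: a pretrained CNN embeds each satellite tile and a ridge regression maps embeddings to income. The ResNet-18 backbone, the RGB-only input, and the random arrays standing in for real tiles are all assumptions, not the authors' exact pipeline:

```python
# Sketch: pretrained-CNN features + ridge regression for an income indicator.
import numpy as np
import torch
from torchvision.models import resnet18, ResNet18_Weights
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep 512-d embeddings
backbone.eval()

def embed(tiles: torch.Tensor) -> np.ndarray:
    """tiles: (N, 3, 224, 224) RGB tensor, normalized per the pretrained weights."""
    with torch.no_grad():
        return backbone(tiles).numpy()

# Hypothetical arrays; in practice tiles come from satellite mosaics per census sector.
tiles = torch.randn(64, 3, 224, 224)
income = np.random.rand(64)
X = embed(tiles)
reg = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X[:48], income[:48])
print("R2:", r2_score(income[48:], reg.predict(X[48:])))
```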
Large urban centers like the Metropolitan Region of São Paulo (MASP) are impacted by air pollution, especially by inhalable particulate matter (PM10). Persistent exceedance events (PEE) are defined as exceedance events that last for many consecutive days and occur simultaneously at many air quality monitoring stations across the MASP. This study aims to develop a predictive model for the occurrence of PEE in the MASP based on surface meteorological variables. Hourly PM10 concentrations from 12 air quality monitoring stations in the MASP between 2005 and 2021 were provided by the São Paulo State Environmental Agency (CETESB). Daily data on surface meteorological variables were provided by the IAG/USP meteorological station. PEE were identified using the following criteria: exceedance events that occurred simultaneously in at least 50% of monitoring stations and persisted for at least 5 consecutive days. PEE occurrence was represented as a time series of a binary variable. The resulting daily dataset had 6204 rows and 13 attributes, with no missing values. The dataset was divided into a training set (80%) and a test set (20%). A logistic regression model was applied, with PEE occurrence (positive = 1) as the target value. The Variance Inflation Factor and the stepwise feature selection method were applied to obtain an optimized subset of predictors. Model accuracy was assessed by the ROC curve and by a confusion matrix. Results indicate that PEE can be satisfactorily predicted from surface meteorological variables using logistic regression. As next steps, we intend to extract easy-to-communicate classification rules, aiming to support the development of warning systems for poor air quality conditions in the MASP.
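A minimal sketch of the VIF screening and logistic regression described above, using statsmodels; the file name, column names, and the VIF < 10 rule of thumb are assumptions:

```python
# Sketch: VIF screening followed by logistic regression for PEE occurrence.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("pee_daily.csv")  # assumed: daily met variables + binary 'pee' target
X = df.drop(columns=["pee"])
y = df["pee"]

# Drop the highest-VIF predictor until all VIFs fall below 10 (common rule of thumb).
while True:
    vifs = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
        index=X.columns,
    )
    if vifs.max() < 10:
        break
    X = X.drop(columns=[vifs.idxmax()])

model = sm.Logit(y, sm.add_constant(X)).fit()
print(model.summary())
```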
In the last two decades, renewable energies have confirmed their potential to reshape the energy matrix of several countries, reinforcing the set of actions to better protect the environment. For instance, solar and wind energy production can reduce the production and release of pollutants associated with electricity generation. Unfortunately, the rise in atmospheric temperature has been linked to increased emissions of gases such as carbon dioxide (CO2) and methane (CH4) from manifold sources, such as thermal power stations for electricity generation. The Amazon basin has gained special importance in Brazil and abroad, and more recently, the use of satellites, fixed instrumented stations, and airborne surveys has provided data for studies of environmental impact. This work deals with the assessment of solar irradiation over the city of Manaus, the largest city in the Western Amazon region, using Data Science tools, in order to help evaluate renewable energy potential in that region.
The Amazon rainforest has a great influence on the global energy balance and carbon fluxes, being responsible for the net removal of approximately 4 million tons of carbon per year via photosynthetic activity. Climate change and deforestation have impacts on the carbon budget in Amazonia, transforming CO2 sink areas into sources. Given the complexity of the factors that govern carbon exchange in the Amazon and its influence on biological processes, the use of Data Science strategies can promote a better understanding of the main environmental factors under different scenarios and also assist public policies to mitigate global warming effects. This study aims to identify the environmental factors that determine the temporal variability of carbon exchanges between the biosphere and the atmosphere in the Tapajós National Forest, in the Amazon, applying Data Science strategies to an integrated set of environmental data from energy and carbon fluxes and remote sensing. The specific objective is to assess the influence of a selected set of environmental variables on the variability of carbon exchanges, using an artificial neural network classification model to identify the variables with the greatest impact on source, sink, and neutrality scenarios in the Tapajós National Forest. Data Science strategies were applied to an integrated dataset of ground-based carbon flux measurements and remote sensing data for the period between 2002 and 2006. An artificial neural network (ANN) classification model was developed to identify the environmental variables with the greatest impact on carbon source, sink, and neutrality conditions. The average global score of the ANN model was 65%. It was possible to identify the predictor variables with the greatest impact on the carbon sink condition: radiation at the top of the atmosphere, sensible and latent energy fluxes, and leaf area index. Thus, the ANN model, combined with an ensemble of Data Science strategies, can improve the understanding of CO2 flux variability and be a powerful tool to promote new knowledge.
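A minimal sketch of an ANN classifier for the source/sink/neutral conditions, assuming the four influential predictors named above as columns; the architecture, file, and column names are illustrative:

```python
# Sketch: ANN classification of carbon source/sink/neutral conditions (illustrative).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed layout: flux/remote-sensing predictors + 'condition' in {source, sink, neutral}.
data = pd.read_csv("tapajos_fluxes.csv")
features = ["toa_radiation", "sensible_heat", "latent_heat", "leaf_area_index"]
X_tr, X_te, y_tr, y_te = train_test_split(
    data[features], data["condition"],
    test_size=0.3, stratify=data["condition"], random_state=0,
)

ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
ann.fit(X_tr, y_tr)
print("global score:", ann.score(X_te, y_te))  # analogous to the 65% average score reported
```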
Complex networks are a highly flexible and easily applied method. They allow extracting relevant information from a system, such as its organization and dynamics, as well as different indices that capture particular characteristics. This work studies the communities present in the rain network of the Amazon basin during the austral summer. Summer was chosen due to the presence of the South American monsoon system (SAMS), since this is the main mechanism modulating precipitation over South America. Once the communities were obtained, the minimum correlation value (MCV) was varied in order to verify the spatial variations of the communities; this showed how certain communities are composed of subcommunities while others simply disappear. Finally, it is shown that the spatial distribution of the subcommunities is related to the presence of the SAMS. However, more detailed analyses are needed for each of these communities.
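A minimal sketch of building the rainfall correlation network and extracting communities at a chosen minimum correlation value (MCV); the data layout and the greedy modularity algorithm are assumptions, since the study does not specify the community detection method:

```python
# Sketch: correlation network over rainfall grid points, communities at a chosen MCV.
import networkx as nx
import pandas as pd
from networkx.algorithms.community import greedy_modularity_communities

rain = pd.read_csv("rain_gridpoints.csv")  # assumed: one column per grid point, rows = time
corr = rain.corr().values
MCV = 0.5  # minimum correlation value; varied in the study to probe subcommunities

G = nx.Graph()
G.add_nodes_from(range(corr.shape[0]))
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[0]):
        if corr[i, j] >= MCV:
            G.add_edge(i, j)  # link grid points whose rainfall series co-vary strongly

communities = greedy_modularity_communities(G)
print(f"MCV={MCV}: {len(communities)} communities, sizes={[len(c) for c in communities]}")
```

Re-running with a higher MCV prunes weak edges, which is how the splitting of communities into subcommunities (or their disappearance) can be observed.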
In this work, machine learning techniques were applied to detect clusters present in satellite and weather radar images. The technique used was the unsupervised clustering algorithm DBSCAN. This algorithm was used to extract the morphological characteristics of atmospheric systems that occurred between February 1 and March 30, 2014 (rainy season) and between September 15 and October 15, 2014 (dry season). The morphological characteristics are extracted at different thresholds (235 K, 220 K, and 210 K) of cloud-top brightness temperature observed in the infrared channel of the GOES-13 satellite, and also from the precipitation estimated at the reflectivity thresholds (20 dBZ, 30 dBZ, and 40 dBZ) of the SIPAM meteorological radar in the city of Manaus. The results present the number of clusters identified by the algorithm and describe the characteristics of the clusters during the diurnal cycle and in both seasons.
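A minimal sketch of the DBSCAN step, clustering the coordinates of pixels colder than one of the brightness-temperature thresholds; the array loading and the eps/min_samples settings are assumptions:

```python
# Sketch: DBSCAN clusters of cold cloud-top pixels at the 235 K threshold (illustrative).
import numpy as np
from sklearn.cluster import DBSCAN

tb = np.load("goes13_ir_tb.npy")        # assumed 2-D brightness-temperature grid (K)
rows, cols = np.where(tb <= 235)        # pixels colder than the convective threshold
coords = np.column_stack([rows, cols])

labels = DBSCAN(eps=3, min_samples=10).fit_predict(coords)  # pixel-space neighbourhood
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise
print(f"{n_clusters} cloud clusters at the 235 K threshold")
```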
Brazil is one of the countries with the highest incidence of lightning in the world, and the characterization of this phenomenon can help in the development of public policies and in decision-making by authorities to mitigate the socio-economic damage it may cause. This work presents analyses of spatio-temporal patterns of lightning in Brazil in 2020, generated with the Self-Organizing Map (SOM) technique. The analysis considers lightning activity accumulated over hourly, daily, and monthly periods across the different Brazilian states. The seasonal variation of lightning was also evaluated, considering the four seasons of 2020. The results showed that the self-organizing maps were efficient in identifying spatio-temporal patterns of lightning, which are highly variable events. Thus, these results can support the development of new tools or analyses in which spatio-temporal lightning information is important, for example in warning and forecasting systems.
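A minimal sketch of a SOM over per-state lightning counts using the MiniSom library; the 4x4 grid, the z-score normalization, and the input layout are assumptions:

```python
# Sketch: self-organizing map over monthly lightning counts per Brazilian state.
import pandas as pd
from minisom import MiniSom

counts = pd.read_csv("lightning_2020.csv", index_col=0)  # assumed: states x 12 monthly counts
X = ((counts - counts.mean()) / counts.std()).values     # z-score each month column

som = MiniSom(x=4, y=4, input_len=X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(X)
som.train_random(X, num_iteration=1000)

# Map each state to its best-matching unit to reveal groups with similar seasonality.
for state, vec in zip(counts.index, X):
    print(state, som.winner(vec))
```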
Abstract
Feature interaction is a crucial concept when it comes to machine learning interpretability. These interactions make it easier for the person using a machine learning tool to know how the model made its prediction. Intrinsic models such as generalized linear models derive their interpretability from detecting feature interactions in the data. However, these models do not search the whole sample space for interactions and assume all interactions with the target feature are identical. A hybrid model combining Rough Set theory and generalized linear models is proposed. Rough Set theory uses the concepts of information granulation and approximations of regions with meaningful information to find feature interactions in the whole sample space. The detected features are then modeled with a generalized linear model for prediction. The dataset used in the experiment was drawn from the wunderground.com online weather site, specifically the Kariki Farm online weather station. The proposed methodology for the research is the CRISP-DM method.
Keywords: machine interpretability, Rough Set, generalized linear model, weather prediction.
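A minimal sketch of the Rough Set building block the hybrid relies on: lower and upper approximations of a decision class computed from equivalence classes (information granules) over condition attributes; the toy weather table is illustrative, not the Kariki Farm data:

```python
# Sketch: rough-set lower/upper approximation of a decision class (toy example).
from collections import defaultdict

# Toy weather table: (humidity, wind) condition attributes, 'rain' decision.
rows = [
    ({"humidity": "high", "wind": "low"},  "rain"),
    ({"humidity": "high", "wind": "low"},  "rain"),
    ({"humidity": "high", "wind": "high"}, "no_rain"),
    ({"humidity": "high", "wind": "high"}, "rain"),
    ({"humidity": "low",  "wind": "low"},  "no_rain"),
]

def approximations(rows, attrs, target):
    """Lower approx: objects certainly in target; upper: objects possibly in target."""
    granules = defaultdict(list)
    for i, (cond, dec) in enumerate(rows):
        key = tuple(cond[a] for a in attrs)   # equivalence class under attrs
        granules[key].append((i, dec))
    lower, upper = set(), set()
    for members in granules.values():
        ids = {i for i, _ in members}
        decisions = {d for _, d in members}
        if decisions == {target}:
            lower |= ids                       # whole granule agrees -> certain
        if target in decisions:
            upper |= ids                       # some member matches -> possible
    return lower, upper

low, up = approximations(rows, ["humidity", "wind"], "rain")
print("lower:", low, "upper:", up)
# The boundary region (upper minus lower) marks granules where the attributes
# cannot separate the classes -- the interactions the GLM stage would examine.
```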