Sequence Data Processing and Statistical Analysis
(1) The DNA matrix was calculated. The UPARSE program (Edgar, 2013) was used to cluster the sequences into operational taxonomic units (OTUs) at a minimum identity threshold of 97%. The RDP-Classifier software v2.11 (https://sourceforge.net/projects/rdp-classifier/) was used to classify and annotate the sequences of each leaf sample, with the confidence threshold set to 50%. Microorganisms whose sequences differ by a genetic distance of less than 3% are generally considered to belong to the same taxonomic unit (Wang et al., 2007).
(2) The data were homogenized, the number of OTUs at different taxonomic levels (genus and phylum) was calculated, and the corresponding rarefaction curves were plotted. The alpha diversity of each sample was assessed using the Chao1, ACE, Shannon, and Simpson indices.
(3) The phyllosphere microbiota of leaf samples from different regions were compared using two multivariate methods: principal component analysis (PCA) (Gewersand et al., 2021), in which the correlations used were linear, and nonmetric multidimensional scaling (NMDS).
(4) The linear discriminant analysis effect size (LEfSe) algorithm, which applies a Kruskal–Wallis (KW) rank-sum test, was used to compare the distribution of OTUs between the leaves of rubber trees in different regions.
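As an illustration of the alpha-diversity indices named in step (2), the following minimal sketch computes Chao1, Shannon, and Simpson values from a vector of OTU counts for one sample. It is not the software used in this study. The bias-corrected form of Chao1, the natural-log Shannon index, and the Simpson dominance form (sum of squared proportions) are assumptions, since the text does not state which variants were applied, and the count vector is hypothetical. ACE is omitted for brevity.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant)."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), natural log (assumed base)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson dominance index D = sum(p_i ** 2) (assumed form)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical OTU counts for a single leaf sample.
sample_counts = [10, 5, 2, 1, 1, 1, 0]

chao1_value = chao1(sample_counts)
shannon_value = shannon(sample_counts)
simpson_value = simpson(sample_counts)
print(chao1_value, round(shannon_value, 4), round(simpson_value, 4))
```

Chao1 estimates richness from the numbers of singleton and doubleton OTUs, whereas the Shannon and Simpson indices also reflect how evenly reads are distributed among OTUs.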
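The PCA in step (3) can be sketched as follows, assuming a samples-by-OTUs abundance table and NumPy. The table values, the function name `pca_scores`, and the choice to centre each OTU column without further scaling are illustrative assumptions; the text does not specify the software or data transformation that was used. NMDS is not shown.

```python
import numpy as np


def pca_scores(table, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    x = np.asarray(table, dtype=float)
    x = x - x.mean(axis=0)  # centre each OTU (column) on its mean
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical relative abundances: 4 leaf samples (rows) x 3 OTUs (columns).
otu_table = [
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.20, 0.30, 0.50],
    [0.25, 0.25, 0.50],
]

scores, explained = pca_scores(otu_table)
print(scores.shape, explained)
```

Each row of `scores` gives the coordinates of one sample on the first two components, which is what an ordination plot of samples from different regions would display.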
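The Kruskal–Wallis test referred to in step (4) can be illustrated with a short sketch that computes the H statistic, with the standard correction for ties, for one OTU across groups of samples. This covers only the rank-sum step and is not the LEfSe implementation. The function name and the abundance values are hypothetical.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i  # size of this block of tied values
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical abundances of one OTU in leaves from three regions.
region_a = [1, 2, 3]
region_b = [4, 5, 6]
region_c = [7, 8, 9]

h_stat = kruskal_wallis_h([region_a, region_b, region_c])
# With three groups (2 degrees of freedom), the chi-square upper tail is exp(-H/2).
p_value = math.exp(-h_stat / 2)
print(h_stat, p_value)
```

H is compared against a chi-square distribution with one fewer degree of freedom than the number of groups; the closed-form p-value shown here holds only for the three-group case.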