2.3 Soil microbial community (SMC) analysis
Soil microbial community (SMC) composition was analyzed through
Illumina® MiSeq whole genome sequencing (Chen et al., 2017). Soil DNA
was extracted from the fresh soil samples stored at -20oC (N=75; Section 2.2) using the
FastDNATM SPIN kit (MP Biomedicals, Solon, OH, USA)
and following the manufacturer’s instructions. DNA concentration and
purity were measured with the NanoDrop One kit (Thermo Fisher
Scientific, MA, USA). The V4–V5 region of the bacterial 16S rRNA genes
were amplified using the primer sets 515F (5’-GTGCCAGCMGCCGCGGTAA-3’)
and 907R (CCGTCAATTCMTTTRAGTTT) from Invitrogen (Invitrogen, Carlsbad,
CA, USA). PCR reactions were carried out using the BioRad S1000
apparatus (Bio-Rad Laboratory, CA, USA). PCR products were mixed in
ratios of equal density as specified by the GeneTools Analysis Software
(Version4.03.05.0, SynGene). Accordingly, a 50 µl volume of PCR mix
contained 25 μl Premix Taq (Takara Biotechnology, Dalian Co. Ltd.,
China), 1 μl of each rRNA primer (10 M) and 3 μl of DNA (20 ng
μl-1) template. The PCR mix was amplified following
the thermos-cycling protocol: (i) 5 min at 94°C for initialization; (ii)
30 cycles of 30 s at 94°C for denaturation, 30 s at 52°C for annealing,
and 30 s at 72°C for extension; (iii) 10 min at 72°C for final
elongation. The PCR products were purified with the EZNA Gel Extraction
Kit (Omega, USA).
A whole genome sequencing library was generated using the NEBNext®
Ultra™ DNA Library Prep Kit for Illumina® (New England Biolabs, MA, USA)
following the manufacturer’s recommendations. The quality of the library
was assessed with the Qubit@ 2.0 Fluorometer (Thermo Fisher Scientific,
MA, USA) and the Agilent Bioanalyzer 2100 system (Agilent Technologies,
Waldbron, Germany). Then, the generated library was sequenced on the
Illumina Hiseq 2500 platform from which 250 paired-end, base pair (bp)
reads were created (Guangdong Magigene Biotechnology Co., Ltd.
Guangzhou, China). Paired-end raw reads were filtered and
quality-checked following the Trimmomatic v0.33 tool
(http://www.usadellab.org/cms/?page=trimmomatic). Paired-end clean reads were merged using the FLASH v1.2.11 software
(https://ccb.jhu.edu/software/FLASH/).
Genome sequences were assigned to each soil sample (Section 2.2) based
on their unique barcode and primer using the Mothur v1.35.1 software
(http://www.mothur.org).
The detection of potential chimaera and clustering of the retrieved rRNA
sequences were performed with the usearch v10 software
(http://www.drive5.com/usearch/).
Operational taxonomic unit (OTU) were defined as clusters of sequences
with ≥97% similarity. For each sequence, the silva database
(https://www.arb-silva.de/) was used to annotate taxonomic information,
setting the confidence threshold to default at ≥0.5. According to the
taxonomic results, OTU relative abundance diagrams were built from which
rich infrared images were obtained using Origin 9.1 (OriginLab,
Northampton, MA, USA).
Data availability: The raw sequence data for 16S rRNA gene have been
deposited under NCBI Sequence Read Archive with accession number
PRJNA674491.
2.4 Molecular ecological networks
(MENs)
To investigate the effects of reclamation time on interactions and key
species of SMC, phylogenetic molecular ecological networks (MENs; Zhou
et al., 2011) were established per reclamation site (r8, r11, r14 and
r17) using the information retrieved from the SMC sequence data (Section
2.3). MENs were built using the computational pipeline MENA
(http://ieg4.rccc.ou.edu/mena)
and plotted using the Cytoscape v3.7.1 software (Doncheva et al., 2018)
for further analysis and characterization. Only those OTUs detected in
all the analyzed soil samples were considered to build the MENs. Each
MEN was characterized in the light of its topology (Tu et al., 2020) –
i.e. the total number of nodes (species or OTUs) and links between the
nodes. The quantity of nodes within a network was quantified by
establishing a power-law distribution (Deng et al., 2012), for which we
quantified the coefficient of determination
(R 2, between 0 and 1) to classify the network
into scale-free (i.e. most nodes have few neighbors; Girvan and Newman
2002; Tu et al., 2020). The links between the nodes were characterized
in the light of the average node degree or average connectivity (avgK),
which was used to classify the network into a small-world (i.e. the
average distance between two nodes is short, meaning that the nodes in
the network are closely related with each other; Amaral et al., 2000;
Zhou et al., 2010). The average geodesic distance (GD) between nodes was
also calculated (Deng et al., 2012). On one hand, AvgK refers to the
number of nodes directly connected to a particular node, where ‘higher
average degree’ (connectivity) indicates a more complex network (Deng et
al., 2012). Nodes with higher degrees were considered to be the central
nodes in the network structure (Layeghifard et al., 2017). Accordingly,
nodes with higher connectivity (degree) in the network were defined as
top nodes. On other hand, GD is the shortest path between two nodes.
When GD is low indicates that the nodes within a network are close to
each other, indicating that the network is complex (Deng et al., 2012).
We used the metric ‘average clustering coefficient’ (avgCC) to describe
how well a node was connected to its neighbors using a scale ranging
from 0 to 1, where 1 denotes a fully connected node and 0 an unconnected
node (Deng et al., 2012). Further MEN attributes were retrieved by
establishing/detecting ‘Modules’ within each MEN. A ‘Module’ was defined
using the metric ‘modularity’ (i.e. ranging for 0 to 1, it is the degree
to which a network can be divided into communities or modules) as a
group of OTUs with high connectivity within a MEN but with few
connections outside the group. The higher the modularity is, the more
modules a network can be divided into and, thus, the less complex a
network is (Olesen et al., 2007; Deng et al., 2012). To estimate
‘modularity’, we used the ‘greedy modularity optimization approach (Deng
et al., 2012). This approach enabled us to retrieve module connectivity
metrics such as ‘within-module’ connectivity (Zi) and ‘among-module’
connectivity (Pi), considering all the nodes within the modules. These
metrics were used to classify the nodes within a given MEN according to
their role in the network (Newman, 2006; Deng et al., 2012) -i.e.
peripherals (Zi ≤ 2.5, Pi ≤ 0.62), connectors (Zi ≤ 2.5, Pi
> 0.62), module hubs (Zi > 2.5, Pi ≤ 0.62),
and network hubs (Zi > 2.5, Pi > 0.62).
2.5 Statistical Analysis
Statistically significant differences between reclamation sites in terms
of the studied key soil attributes were evaluated with the one-way ANOVA
test at the 95 % and 99 % confidence levels, following normality
testing through Shapiro-Wilk test and T test. The correlation between
the OTU’s abundance profile and the studied key soil attributes (Section
2.2.) was assessed through Pearson’s correlation tests
(r2).
Differences between the MEN retrieved for each reclamation site were
evaluated through similarity matrices (Zhang and Horvath, 2005; Horvath
and Dong, 2008). We also generated similarity matrices to evaluate the
correlation between different MENs. To do so, Spearman’s rank
correlation coefficients were used and similarity thresholds ranging
between 0.01 and 0.99 were tried to then choose an optimum similarity
threshold (Newman, 2004). The optimum similarity threshold was found
through an iterative process that evaluated whether the statistical
distribution of the eigenvalues’ spacing contained within the
eigenvectors originated from each similarity matrix followed a Poison
distribution (Newman, 2006; Deng et al., 2012). This test was carried
out using the computational pipeline MENA (Deng et al., 2012 and 2016).
We also investigated the relationship between modules (with
>5 nodes) of MENs and soil attributes through Pearson’s
correlation tests (r2) (Guimera et al., 2007). All
other statistical tests were undertaken using the statistical software
SPSS v20 (IBM, USA).