Association rule
The matrix usually contains a large amount of data; therefore, data mining techniques are used to extract useful knowledge. We followed the association rule approach proposed by Agrawal et al. (1993).
An association rule is intended to capture a certain type of dependence among the species represented in the database. A rule is defined as an implication of the form G1->G2. For example, the rule G1->G2 between two species means that species 2 is very likely to be observed wherever species 1 is observed, so that the two form an association {G1, G2}.
The significance of an association rule is measured by its support and confidence. The support of the rule G1->G2 is the percentage of samples in which G1 and G2 occur together. The confidence of the rule G1->G2 is an estimate of the conditional probability of G2 given G1. If the confidence of G1->G2 is 1, then whenever G1 occurs at a particular site, G2 occurs at that site too.
First, the binary phytoplankton data used to identify phytoplankton associations were constructed (Table 1), where “S” represents the sampling site or time series and “G” represents an algal species. Second, the support of each phytoplankton association was calculated. For instance, the association {G1, G3} has 18% support because species G1 and G3 occur together in 2 of the 11 samples (Table 2). Finally, we calculated the confidence of each phytoplankton association (Table 3). For example, the confidence of the association {G1, G3} is 0.5 because G3 occurs in half of the samples that contain G1.
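The two calculations above can be sketched in a few lines of Python. This is only an illustration, not the arules workflow used in the study: the presence/absence records below are hypothetical (they are not the data of Table 1) and were chosen so that G1 and G3 co-occur in 2 of 11 samples and G1 occurs in 4. The function names `support` and `confidence` are likewise assumptions made for this sketch.

```python
# Hypothetical presence/absence data: 11 samples (S1..S11), each given as the
# set of species observed. Illustrative values only, NOT the study's Table 1.
samples = [
    {"G1", "G3"},  # S1
    {"G1", "G3"},  # S2
    {"G1", "G2"},  # S3
    {"G1"},        # S4
    {"G2"},        # S5
    {"G2", "G3"},  # S6
    {"G3"},        # S7
    {"G2"},        # S8
    set(),         # S9 (none of the three species observed)
    {"G2", "G3"},  # S10
    {"G3"},        # S11
]


def support(itemset, samples):
    """Fraction of samples that contain every species in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= s for s in samples) / len(samples)


def confidence(antecedent, consequent, samples):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, samples) / support(antecedent, samples)


sup_g1_g3 = support({"G1", "G3"}, samples)        # 2 of 11 samples
conf_g1_g3 = confidence({"G1"}, {"G3"}, samples)  # 2 of the 4 samples with G1
print(f"support = {sup_g1_g3:.0%}, confidence = {conf_g1_g3:.1f}")
```

Note that confidence is directional: with these hypothetical records, the rule G3->G1 would have a different confidence from G1->G3, because G3 occurs in more samples than G1.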
We identified phytoplankton associations as those with both support >= 50% and confidence >= 0.8.
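As a rough illustration of how the two thresholds act together, the sketch below enumerates every directed rule between pairs of species and keeps those that pass both cut-offs. It rests on several assumptions: the records are hypothetical, only pairs are considered for brevity, and the names `MIN_SUPPORT`, `MIN_CONFIDENCE`, and `count` are introduced here for illustration. It is not the procedure implemented in the arules package.

```python
from itertools import permutations

# Hypothetical presence/absence records (illustrative only, not study data).
samples = [
    {"G1", "G2"},
    {"G1", "G2"},
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G3"},
    {"G2"},
]

MIN_SUPPORT = 0.5     # support >= 50%
MIN_CONFIDENCE = 0.8  # confidence >= 0.8


def count(itemset):
    """Number of samples containing every species in `itemset`."""
    return sum(itemset <= s for s in samples)


species = sorted(set().union(*samples))
n_samples = len(samples)

rules = []
for a, b in permutations(species, 2):  # directed rule a -> b
    n_both = count({a, b})
    n_a = count({a})
    sup = n_both / n_samples
    conf = n_both / n_a if n_a else 0.0
    if sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE:
        rules.append((a, b, sup, conf))

for a, b, sup, conf in rules:
    print(f"{a}->{b}: support = {sup:.2f}, confidence = {conf:.2f}")
```

With these hypothetical records only G1->G2 is retained: G1 and G2 co-occur in 4 of 7 samples, and G2 is present in every sample containing G1. The reverse rule G2->G1 has the same support but fails the confidence threshold, because G2 also occurs in samples without G1.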
All analyses were performed in R (R Development Core Team 2013). Specifically, we used the R packages ‘arules’ for the affinity analysis (Hahsler et al., 2014), ‘vegan’ for detrended correspondence analysis (DCA) and redundancy analysis (RDA) (Oksanen et al., 2013), and ‘packfor’ for forward selection (Dray et al., 2013).