2.3.3 Random Forest Model
Analogously, this paper also applies a random forest model. The
relationships among the 204 properties calculated by the GFA algorithm were
analyzed using the Pearson correlation coefficient matrix, and the results
were rendered as a figure with the Python package Yellowbrick.
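As a minimal sketch of the correlation step, the Pearson matrix can be computed directly with NumPy. The data here are a synthetic stand-in (31 samples, 5 properties) rather than the 204 GFA properties, and the helper name `pearson_matrix` is hypothetical; the plotting call used in the original work is not specified, so none is shown.

```python
import numpy as np

def pearson_matrix(X):
    """Return the (n_features, n_features) Pearson correlation matrix of X."""
    X = np.asarray(X, dtype=float)
    # rowvar=False: each column is a variable (property), each row a sample.
    return np.corrcoef(X, rowvar=False)

# Synthetic stand-in for the property table (assumption, not the paper's data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(31, 5))
# Make property 4 an exact linear function of property 0, so r = 1 for that pair.
X_demo[:, 4] = 2.0 * X_demo[:, 0] + 1.0

corr = pearson_matrix(X_demo)
```

The resulting matrix is symmetric with a unit diagonal, so a heatmap of it needs only one triangle. Constant columns would yield undefined (NaN) coefficients and should be dropped before this step.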
Principal component analysis (in both 2D and 3D) [32] was used to analyze the relationships between
attributes for the 31-composite data set. Preprocessing of the data set
included the following steps. First, we scaled the numerical values of
the properties to the range 0 to 1 and then retained the 54 properties
whose variance exceeded 0.05. Next, we standardized the data to a mean of
0 and a variance of 1. Finally, we obtained nine properties using Lasso
feature selection. The Pearson correlation coefficient matrix heatmap of
these nine properties supported the credibility of the results.
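The preprocessing steps described above could be chained as in the following sketch, which assumes scikit-learn. The data are synthetic, and the Lasso penalty `alpha=0.05`, the step names, and the target `y` are illustrative assumptions; the source does not state the settings actually used.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data (assumption): 31 samples, 12 properties.
rng = np.random.default_rng(0)
n_samples = 31
base = np.linspace(0.0, 1.0, n_samples)
# Eight well-spread properties: shuffled copies of an even grid on [0, 1].
dense = np.column_stack([rng.permutation(base) for _ in range(8)])
# Four near-constant properties: all zeros except a single spike.
spikes = np.zeros((n_samples, 4))
spikes[np.arange(4), np.arange(4)] = 50.0
# Hypothetical target that depends only on properties 0 and 1.
y = 3.0 * dense[:, 0] - 2.0 * dense[:, 1] + 0.01 * rng.normal(size=n_samples)
X = np.hstack([dense * 100.0, spikes])

pipe = Pipeline([
    ("minmax", MinMaxScaler()),                       # scale each property to [0, 1]
    ("variance", VarianceThreshold(threshold=0.05)),  # drop variance <= 0.05
    ("standardize", StandardScaler()),                # mean 0, variance 1
    ("lasso", SelectFromModel(Lasso(alpha=0.05))),    # keep nonzero Lasso coefficients
])
X_selected = pipe.fit_transform(X, y)

# Map the surviving columns back to their original indices.
kept = pipe.named_steps["variance"].get_support(indices=True)
selected = kept[pipe.named_steps["lasso"].get_support()]
```

Applying the variance threshold after min-max scaling makes the 0.05 cutoff comparable across properties measured in different units. The number of properties that Lasso retains depends strongly on `alpha`, so in practice that value would need tuning, for example by cross-validation.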