Results:
A total of 1143 ICH patients were identified in the MIMIC-III database,
including 760 survivors and 383 non-survivors. Table 1 shows the clinical
characteristics of patients who survived and of those who died during
hospitalization. Table A.2 in the Appendices shows how the
physiological characteristics and laboratory parameters of survivors and
non-survivors changed over time.
First, we used all 122 variables to construct five models, and used
learning curves and grid search to determine the optimal hyperparameters.
The prediction performance of the models after 5-fold cross-validation
is compared in Table 2.
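The tuning and evaluation step described above can be sketched as follows. This is a minimal illustration only: it assumes scikit-learn, uses synthetic data in place of the MIMIC-III cohort, shows only the GBDT model, and the hyperparameter grid and the use of AUROC as the selection criterion are hypothetical, since the values searched are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

# Synthetic stand-in for the patient feature matrix (NOT the MIMIC-III data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           weights=[0.66, 0.34], random_state=0)

# 5-fold cross-validation, stratified so each fold keeps the class ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical grid: the values actually searched are an assumption.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=cv)
search.fit(X, y)

# Re-evaluate the selected configuration on the metrics reported in Table 2.
metric_names = ["accuracy", "precision", "recall", "f1", "roc_auc"]
scores = cross_validate(search.best_estimator_, X, y, cv=cv, scoring=metric_names)
metrics = {name: scores[f"test_{name}"].mean() for name in metric_names}

print(search.best_params_)
print({name: round(value, 2) for name, value in metrics.items()})
```

Because the same folds are used for selection and evaluation in this sketch, the reported scores are somewhat optimistic; a nested cross-validation would avoid that at extra computational cost.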
The GBDT model achieved the best accuracy (0.87), F1 score (0.80), and
AUROC (0.93) of the five models. Naïve Bayes had the best recall (0.85),
but its accuracy and precision were the lowest. The KNN model had the
lowest recall (0.60), F1 score
(0.70) and AUROC (0.87).
We then used RF feature importance and LASSO regression to select the
most important variables. Each method selected its top 39 variables
(Table A.3 in the Appendices), and the intersection of the two sets, 18
variables in total, was taken as the screened variables. The importance
ranking of these 18 variables is shown in Fig. 1. The importance score
is normalized to lie between 0 and 1; the closer the score is to 1, the
more important the variable.
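The intersection-based screening can be sketched as below. The sketch makes several assumptions: scikit-learn is used, the data are synthetic, `k = 10` of 30 features stands in for 39 of 122, LASSO is fitted as a linear `Lasso` on the 0/1 outcome with a hypothetical `alpha`, and importance is normalized by dividing by the maximum RF importance. None of these details is confirmed by the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the MIMIC-III cohort); names are hypothetical.
X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                           random_state=0)
names = [f"var_{i}" for i in range(X.shape[1])]
k = 10  # illustrative; the paper keeps the top 39 of 122 per method

# Method 1: rank variables by random forest feature importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_order = np.argsort(rf.feature_importances_)[::-1][:k]
rf_top = {names[i] for i in rf_order}

# Method 2: rank by absolute LASSO coefficient on standardized inputs,
# ignoring variables whose coefficient was shrunk to exactly zero.
X_std = StandardScaler().fit_transform(X)
coef = np.abs(Lasso(alpha=0.01).fit(X_std, y).coef_)
lasso_order = np.argsort(coef)[::-1][:k]
lasso_top = {names[i] for i in lasso_order if coef[i] > 0}

# Screened variables: those chosen by both methods.
selected = sorted(rf_top & lasso_top)

# Normalized importance in [0, 1]; the top RF variable scores exactly 1.
scaled = rf.feature_importances_ / rf.feature_importances_.max()
importance = dict(zip(names, scaled))
ranked = sorted(selected, key=lambda n: importance[n], reverse=True)

print(ranked)
```

Taking the intersection keeps only variables that both a tree-based and a linear criterion agree on, which is why the final set is smaller than either top-k list.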
We then retrained the five models on the 18 selected variables and
observed the change in each metric, as shown in Table 3. The ROC
curves of these predictive models are presented in Fig. 2.
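One way to obtain an ROC curve and AUROC for a retrained model is from out-of-fold predicted probabilities, sketched below. This is an assumption about the procedure, not a description of how Fig. 2 was produced; it uses scikit-learn, synthetic data with 18 features, and default GBDT settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in with 18 input variables (NOT the MIMIC-III data).
X, y = make_classification(n_samples=300, n_features=18, n_informative=6,
                           random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each patient's probability comes from a model that never saw that patient.
proba = cross_val_predict(GradientBoostingClassifier(random_state=0),
                          X, y, cv=cv, method="predict_proba")[:, 1]

# Points of the ROC curve and the area under it.
fpr, tpr, thresholds = roc_curve(y, proba)
auroc = roc_auc_score(y, proba)

print(f"AUROC = {auroc:.2f}")
```

Plotting `tpr` against `fpr` for each of the five models on the same axes would give a figure of the kind referred to as Fig. 2.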
Compared with the models built on all variables, the GBDT model showed a
small decline in precision, recall, and F1 score, but it still had the
best AUROC of the five models. Overall, the predictive performance of
the five models did not decrease markedly. The input variables of our
models were therefore reduced from 122 to 18, which greatly improves
their practicability.