1. Introduction
Intracerebral hemorrhage (ICH) is a serious disease caused by the
rupture of blood vessels in the brain and is common in the intensive
care unit (ICU) [1]. Unlike ischemic strokes, which are often preceded
by a transient ischemic attack, ICH appears suddenly and without
warning [2]. Partly as a consequence, ICH is the most fatal form of
stroke [3,4]. Two million people per year are affected by ICH,
accounting for 10-15% of the world’s cerebral stroke patients [1,5].
Mortality among ICH patients approaches 40-50%, and disability in
survivors is common [3,4,6,7]. Although much progress has been made in
ICU research, clinical outcomes after ICH have not improved
significantly in the last few decades [6,8,9]. Therefore, identifying
at-risk patients early so that they can receive timely treatment may be
an effective approach to controlling ICH.
Several prognostic tools have been proposed for predicting mortality and
functional outcome in ICH. These tools can help clinicians select the
best treatment for ICH patients, facilitate communication between
clinicians and patients, and serve as indicators for the optimal
allocation of medical resources in the ICU [10,11,12]. Peng et al. [13]
built random forest (RF), support vector machine (SVM), and logistic
regression models to predict the 30-day mortality of ICH patients.
Eighteen indicators, including demographic information, physiological
characteristics, and laboratory parameters, were used, and the results
showed that RF had the best predictive performance. However, the
majority of the patients included in that study were Asian, and both the
number of cases (423 ICU patients) and the number of model parameters
were small, so the generalizability of the results is limited.
As the limitations above illustrate, most existing prognostic tools for
ICH patients suffer from insufficient cases or too few parameters.
Moreover, few studies have added temporal information to the indicators
of ICH patients, even though such information captures how the disease
changes over time and can make model predictions more accurate. These
shortcomings call into question the accuracy and applicability of
existing model predictions [13]. In this study, we aim to use machine
learning to develop and validate an in-hospital mortality prediction
model for ICH. The publicly accessible ICU database MIMIC-III [14] was
used for data selection and model development. We added temporal
information to the indicators of ICH patients and analyzed a broader set
of variables that might affect mortality in ICH patients. Then we
compared a random forest (RF) model with gradient boosting decision tree
(GBDT), decision tree, K-nearest neighbor (KNN), and naïve Bayes models.
For explainability, we also used RF feature importance and LASSO
regression to select the most important features.
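
As a rough illustration of the kind of comparison described above, the
following Python sketch fits the five model families on synthetic data
and then selects features by RF importance and by LASSO coefficients. It
is not the pipeline used in this study: the scikit-learn library, the
synthetic dataset, the hyperparameters, the 5-fold cross-validated AUC
metric, and all variable names are assumptions made for illustration,
and no MIMIC-III data are involved.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient table: 500 rows, 20 candidate features,
# binary outcome (assumed here to play the role of in-hospital mortality).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# The five model families named in the text, with illustrative settings.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    # KNN is distance-based, so features are standardized first.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "NaiveBayes": GaussianNB(),
}

# Mean 5-fold cross-validated AUC per model (the metric is an assumption).
auc = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Feature selection, route 1: rank features by RF impurity-based importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_top = np.argsort(rf.feature_importances_)[::-1][:5]

# Feature selection, route 2: LASSO regression on standardized features;
# features whose coefficients are shrunk exactly to zero are dropped.
X_std = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.02).fit(X_std, y)
lasso_selected = np.flatnonzero(lasso.coef_)

for name, score in sorted(auc.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12}: AUC = {score:.3f}")
print("Top RF features (column indices):", rf_top.tolist())
print("LASSO-selected features (column indices):", lasso_selected.tolist())
```

In this sketch the two selection routes can disagree: RF importance
ranks every feature, whereas LASSO keeps only those with nonzero
coefficients, so the overlap between the two lists is one simple way to
judge which features are robustly informative.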