1. Introduction
Intracerebral hemorrhage (ICH) is a serious disease caused by the rupture of blood vessels in the brain and is common in the intensive care unit (ICU) [1]. Unlike ischemic stroke, which is often preceded by a transient ischemic attack, ICH typically occurs suddenly and without warning [2]. ICH is the most lethal form of stroke [3,4]. It affects about two million people each year, accounting for 10–15% of stroke patients worldwide [1,5]. Mortality among ICH patients approaches 40–50%, and disability is common among survivors [3,4,6,7]. Although critical care research has advanced considerably, clinical outcomes after ICH have not improved significantly in recent decades [6,8,9]. Identifying high-risk patients early, so that they can receive timely treatment, is therefore a promising approach to improving outcomes after ICH.
Several prognostic tools have been proposed for predicting mortality and functional outcome in ICH. These tools can help clinicians select the best treatment for ICH patients, facilitate communication between clinicians and patients, and guide the allocation of medical resources in the ICU [10,11,12]. Peng et al. [13] built random forest (RF), support vector machine (SVM), and logistic regression models to predict 30-day mortality in ICH patients. Using 18 variables covering demographic information, physiological characteristics, and laboratory parameters, they found that RF achieved the best predictive performance. However, most patients in that study were Asian, and both the sample size (423 ICU patients) and the number of variables were small, which limits the generalizability of the results.
More generally, many existing prognostic tools for ICH are built on small cohorts or few variables, and few studies incorporate temporal information, which captures how a patient's condition changes over time and can make predictions more accurate. These limitations call into question the accuracy and applicability of existing models [13]. In this study, we use machine learning to develop and validate a model that predicts in-hospital mortality in ICH patients. Data were drawn from the publicly accessible ICU database MIMIC-III [14] and used for model development. We incorporated temporal information into the patients' variables and analyzed a broader set of variables that might affect mortality in ICH patients. We then compared a random forest (RF) model with gradient boosting decision tree (GBDT), decision tree, k-nearest neighbor (KNN), and naïve Bayes models. To improve interpretability, we also used RF feature importance and LASSO regression to identify the most important features.
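The text does not say how temporal information is encoded. One common option is to summarize each time-stamped measurement over an observation window. The sketch below illustrates that idea under stated assumptions, not the study's actual method: each variable is assumed to be a list of (hours since ICU admission, value) pairs, the 24-hour window is an assumption, and `temporal_features` and its feature set (min, max, mean, last value, and least-squares slope) are hypothetical.

```python
from statistics import mean


def temporal_features(series, window_hours=24.0):
    """Summarize (hours_since_admission, value) pairs inside the first window_hours.

    Hypothetical feature set (an assumption, not taken from the study):
    min, max, mean, last value, and least-squares slope in units per hour.
    Every feature is None when no measurement falls inside the window;
    the slope is None when fewer than two distinct time points are available.
    """
    # Keep only measurements in [0, window_hours], ordered by time.
    points = sorted((t, v) for t, v in series if 0.0 <= t <= window_hours)
    if not points:
        return {"min": None, "max": None, "mean": None, "last": None, "slope": None}

    times = [t for t, _ in points]
    values = [v for _, v in points]

    # Ordinary least-squares slope of value against time (the trend).
    slope = None
    if len(points) >= 2:
        t_bar, v_bar = mean(times), mean(values)
        denom = sum((t - t_bar) ** 2 for t in times)
        if denom > 0:
            slope = sum((t - t_bar) * (v - v_bar) for t, v in points) / denom

    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "last": values[-1],  # most recent value within the window
        "slope": slope,
    }


# Illustrative (made-up) Glasgow Coma Scale readings; the 30 h reading
# falls outside the assumed 24 h window and is ignored.
features = temporal_features([(0, 14), (6, 12), (12, 10), (30, 8)])
print(features)
```

For these made-up readings, the window keeps scores 14, 12, and 10, and the slope is about -0.33 points per hour. A static model sees only one value per variable. Features like these also let it see whether a patient is deteriorating.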