Introduction
Abnormal uterine bleeding is a common complaint in premenopausal women, reported by about five percent of them (2). Endometrial ablation (EA) is one of the treatment options for this common problem. Because of its less invasive nature (lower intra-operative complication risk, shorter recovery time, and lower post-operative morbidity) and its low cost, EA appears to be an attractive surgical alternative to hysterectomy for menorrhagia (3–7). However, long-term follow-up shows a decrease in patient satisfaction and treatment efficacy. Because it provides permanent relief, the more invasive hysterectomy remains the most effective treatment of abnormal uterine bleeding (8–15).
According to the literature, several factors present before endometrial ablation appear to influence the success or failure rate of the procedure. Younger age, dysmenorrhea, parity of five or more, a thicker pre-procedural endometrium, menstruation lasting longer than seven days, an intramural leiomyoma on transvaginal sonography, a history of sterilization or caesarean section, and a greater uterine depth are among the factors that may negatively affect the outcome (1,2,8,9,11–18).
To optimize the counselling of patients with abnormal uterine bleeding, a prediction model based on the combined influence of the above-mentioned predictors could provide better insight into the individual prognosis after endometrial ablation. In times of personalised medicine, this can improve individual care, leading to fewer re-interventions, lower healthcare costs, and greater patient satisfaction. With the use of a prediction model, shared decision making can be optimized (19).
For this reason, Stevens et al. (1) developed two multivariate prediction models to help counsel patients on the risk of failure of EA and of surgical re-intervention within two years after EA. The developed prediction models have a clinically acceptable c-index of 0.68 and 0.71, respectively. In addition, Stevens et al. are performing an external validation of these two prediction models, using retrospective data from similar patient groups in two non-university teaching hospitals in the Netherlands. The results of this validation will follow. In the field of gynaecology, many prediction models are developed using multivariate logistic regression as the standard approach. These models are based on a combination of predictors that are significantly related to the outcome of interest. However, this method cannot automatically estimate the interconnection between predictors and can therefore overestimate the influence of an individual predictor (20,21).
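To make the c-index quoted above more concrete: for a binary outcome it can be read as the proportion of (event, non-event) patient pairs in which the patient with the event received the higher predicted risk, with ties counted as one half. The following is a minimal sketch of that definition only. The function name `c_index` and the five risk/outcome values are made up for illustration and are not data or code from the study.

```python
from itertools import product


def c_index(predicted_risk, outcome):
    """Concordance statistic for a binary outcome (1 = event, 0 = no event).

    Returns the fraction of (event, non-event) pairs in which the event
    case has the higher predicted risk; tied pairs count as 0.5.
    """
    events = [p for p, y in zip(predicted_risk, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted_risk, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0      # concordant pair
        elif e == n:
            score += 0.5      # tied pair
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (illustration only).
risks = [0.10, 0.40, 0.35, 0.80, 0.20]
outcomes = [0, 0, 1, 1, 0]  # 1 = surgical re-intervention

print(f"{c_index(risks, outcomes):.2f}")  # 5 of 6 pairs concordant -> 0.83
```

A value of 0.5 corresponds to predictions no better than chance, and 1.0 to perfect discrimination.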
We were also interested in other statistical techniques for developing a prediction model. In recent years, machine learning (ML) methods have been increasingly used in the development of clinical prediction models. Machine learning is a scientific discipline that focuses on models that learn directly and automatically from data (20,22). A potential advantage of machine learning methods over traditional statistical strategies is the ability to capture complex, nonlinear relationships in the data (23,24). ML algorithms use training data with well-defined input and output variables. This makes it possible to define a model with predictors that can be applied to new, similar data. Unlike logistic regression models, this can be done without a priori assumptions about which variables are relevant (25).
Random forest is a machine learning method used for classification and regression that operates by constructing a large ensemble of decision trees on training data (22,23,26). Each tree in the random forest is built using a bootstrap sample randomly drawn from the training dataset. This reduces variance and corrects for a single decision tree's tendency to overfit the training set. Each tree in the forest gives an individual prediction of the outcome measure. For a classification problem (in this case, surgical re-intervention or no surgical re-intervention after EA), the final random forest model aggregates the predictions of all the trees in the forest, by majority vote or by averaging the predicted class probabilities (21,23,27).
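The mechanism described above (bootstrap samples, one tree per sample, aggregation over trees) can be sketched in a few lines. This is a deliberately simplified illustration, not the model developed in this study. As assumptions, each "tree" is a one-split decision stump on one randomly chosen predictor, whereas a real random forest grows deeper trees and samples predictors at every split. The names `fit_stump`, `fit_forest` and `predict_risk` and the eight toy patients (age in years, duration of menstruation in days) are hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """One-split tree: pick the threshold with the fewest misclassifications."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        # Each side predicts its observed event rate as a probability.
        p_left = sum(left) / len(left)
        p_right = sum(right) / len(right) if right else p_left
        errors = sum(min(side.count(0), side.count(1)) for side in (left, right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, p_left, p_right)
    return best[1:]


def fit_forest(rows, labels, n_trees=200, seed=0):
    """Fit every stump on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)          # random predictor per tree
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict_risk(forest, row):
    """Average the per-tree probabilities into one predicted risk."""
    probs = [pl if row[f] <= t else pr for f, t, pl, pr in forest]
    return sum(probs) / len(probs)


# Toy data, invented for illustration: [age, days of menstruation].
rows = [[35, 8], [38, 9], [41, 7], [44, 5], [47, 6], [50, 4], [36, 9], [49, 5]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = surgical re-intervention

forest = fit_forest(rows, labels)
print(predict_risk(forest, [36, 9]), predict_risk(forest, [49, 4]))
```

Averaging over many trees fitted to different bootstrap samples is what smooths out the instability of any single tree. Here the averaged probability is returned as the predicted risk rather than a hard class label.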
The aim of this study was to develop a random forest prediction model to predict the chance of surgical re-intervention within two years after EA. A further aim was to compare the performance of the random forest model with that of the previously published multivariate logistic regression model (1). In both models, surgical re-intervention within two years after EA is used as the primary outcome measure.