Cross-validation across two distinct populations shows strong performance.
We further assessed the model's ability to extrapolate across cohorts by simulating two distinct populations, both generated with the approach described in the previous section. We then varied one parameter at a time over a grid while holding the others fixed. For these experiments we used ExtraTreeRegressor, a representative machine learning base learner. A minimal sketch of this evaluation loop follows.
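The sketch below illustrates the cross-population setup: train ExtraTreeRegressor on one simulated population, then sweep the test population's termination rate over a grid. The simulation procedure itself is described in the previous section and is not reproduced here; `simulate_population()` is a hypothetical stand-in, and its feature construction is illustrative only.

```python
import numpy as np
from sklearn.tree import ExtraTreeRegressor

def simulate_population(termination_rate, n=10_000, seed=0):
    """Hypothetical placeholder for the simulation from the previous section.

    Returns a feature matrix X (features derived from the termination rate)
    and a target vector y (stand-in for the retention-curve values).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(loc=termination_rate, scale=termination_rate / 4, size=(n, 5))
    y = X.sum(axis=1)  # illustrative target; the real target is the curve
    return X, y

# Train on one population with a fixed average termination rate.
train_rate = 0.0008
X_train, y_train = simulate_population(train_rate, seed=1)
model = ExtraTreeRegressor(random_state=0).fit(X_train, y_train)

# Evaluate on test populations whose termination rates differ from training.
for test_rate in [0.0002, 0.0004, 0.0008, 0.0012]:
    X_test, y_test = simulate_population(test_rate, seed=2)
    pred = model.predict(X_test)
    rel_err = np.mean(np.abs(pred - y_test) / np.abs(y_test))
    print(f"test rate {test_rate}: mean relative error {rel_err:.2%}")
```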
The most important factor we observed was the termination rate. With the training set termination rate fixed, the best performance is achieved when the test population is most similar to the training set, and performance degrades gradually as the two termination rates diverge (Fig. 4a, Fig. S6-7). For example, when the average training set termination rate is 0.0008, the model achieves an error rate of 5.464% on both metrics when the test set termination rate is also 0.0008. The error rate rises at both tails as the test set termination rate moves away from the training set rate: at a test set termination rate of 0.0002, the model's error rate is 9.18% for absolute error and 9.29% for cumulative error; at 0.0012, it is 18.82% for both absolute and cumulative error. This behavior is expected: if the termination rates of the two populations differ too much, the corresponding feature distributions (derived from the termination rate) no longer overlap between the populations, making the patterns difficult to predict. Nevertheless, the error is much lower than that of directly reusing the training curve, which would incur an expected 50% error when trained at a termination rate of 0.0008 and tested at 0.0012, since the two rates differ by a factor of 1.5.
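For intuition on the 50% baseline figure, a quick back-of-the-envelope check, assuming (as the figure implies) that the curve scales roughly linearly with the termination rate: reusing the training curve unchanged incurs a relative error equal to the rate mismatch.

```python
# Naive baseline: carry the training curve over unchanged. Under an
# approximately linear dependence on the termination rate, the expected
# relative error is the fractional mismatch between the two rates.
train_rate, test_rate = 0.0008, 0.0012
naive_error = abs(test_rate - train_rate) / train_rate
print(f"expected naive-baseline error: {naive_error:.0%}")  # -> 50%
```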