External validation. In the external validation of the η model, 651 ILs containing 7352 data points are used as the
training set and 181 ILs containing 1886 data points are used as the
testing set. The detailed statistical parameters are listed in Table 4.
The value $R^2_{\text{testing}}$ = 0.9108 is close
to $R^2_{\text{training}}$ = 0.9091. The
experimental vs. calculated η values of the model for the
external validation are presented in Figure 6c. The data
points in the testing set follow the same trend as those in the training set,
indicating that the ln η(T, P, I)-QSPR model
has good predictive ability for the η of ILs at variable
temperature and pressure.
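The external validation described above can be illustrated with a minimal sketch. It assumes that the split is made at the level of whole ILs, so that no IL contributes data points to both sets, and it uses an ordinary least-squares fit on synthetic data as a stand-in for the actual QSPR model. The function names `r_squared` and `split_by_il`, the 20% test fraction, and all data below are hypothetical and are not taken from this work.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def split_by_il(il_ids, test_fraction, seed=0):
    """Return boolean masks (train, test); each IL lands in exactly one set."""
    il_ids = np.asarray(il_ids)
    rng = np.random.default_rng(seed)
    unique_ils = rng.permutation(np.unique(il_ids))
    n_test = max(1, int(round(test_fraction * unique_ils.size)))
    test_mask = np.isin(il_ids, unique_ils[:n_test])
    return ~test_mask, test_mask

# Synthetic stand-in data: 40 hypothetical ILs, 10 points each, 3 descriptors.
rng = np.random.default_rng(1)
il_ids = np.repeat(np.arange(40), 10)
X = rng.normal(size=(400, 3))
ln_eta = X @ np.array([1.5, -0.7, 0.3]) + 2.0 + rng.normal(scale=0.2, size=400)

train, test = split_by_il(il_ids, test_fraction=0.2)

# Least-squares fit with an intercept, using the training ILs only.
A_train = np.column_stack([np.ones(train.sum()), X[train]])
coef, *_ = np.linalg.lstsq(A_train, ln_eta[train], rcond=None)
A_test = np.column_stack([np.ones(test.sum()), X[test]])

r2_train = r_squared(ln_eta[train], A_train @ coef)
r2_test = r_squared(ln_eta[test], A_test @ coef)
print(f"R2_training = {r2_train:.4f}, R2_testing = {r2_test:.4f}")
```

A testing R² close to the training R², as reported for the η model, suggests that the fit generalises to ILs that were not seen during training.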
Y-randomized analysis. After 1000 repetitions of Y-random validation, the resulting statistics were lower than 0.00586 and 0.00044, respectively, far
below the corresponding values of the ln η(T, P, I)-QSPR model, indicating that the model was
not affected by chance correlation.
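The Y-randomization test can be sketched as follows: the response vector is shuffled, the model is refitted on the shuffled response, and the resulting R² is recorded for each repetition. This is only an illustration under stated assumptions. The least-squares model, the synthetic data, and the names `fit_r2` and `y_randomization` are hypothetical, and the statistics actually reported in this work may be defined differently.

```python
import numpy as np

def fit_r2(X, y):
    """Fit least squares with an intercept and return the training R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_repeats=1000, seed=0):
    """Refit on permuted responses and return the R^2 of every repetition."""
    rng = np.random.default_rng(seed)
    return np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_repeats)])

# Synthetic stand-in data: 500 points, 3 descriptors (not the data of this work).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -0.7, 0.3]) + 2.0 + rng.normal(scale=0.2, size=500)

r2_true = fit_r2(X, y)
r2_rand = y_randomization(X, y, n_repeats=1000)
print(f"R2 of the real model:           {r2_true:.4f}")
print(f"largest R2 after randomization: {r2_rand.max():.4f}")
```

If the randomized R² values stay far below the R² of the real model, the descriptor-response relationship is unlikely to be a chance correlation.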
3.4.2. Model comparison: before and after data pre-screening
Similar to the analysis of the ρ(T, P, I)-QSPR
model, this work also carried out a data pre-screening process for the η dataset. A QSPR model was built for the initial η dataset using the same descriptors as in Eq. (16). The detailed LOIO-CV
results for that model are shown in Table 6. Although the model built without
data pre-screening has high $Q^2$ ($Q^2_{\text{LOCO}}$ = 0.8935 and $Q^2_{\text{LOAO}}$ = 0.8913) and low MAE
($\text{MAE}_{\text{LOCO}}$ = 0.3153 and $\text{MAE}_{\text{LOAO}}$ = 0.3211),
its performance declines noticeably when its stability is
reassessed after data pre-screening
($Q^2_{\text{LOCO}}$ = 0.8815 and $Q^2_{\text{LOAO}}$ = 0.8806;
$\text{MAE}_{\text{LOCO}}$ = 0.3691 and $\text{MAE}_{\text{LOAO}}$ = 0.3755).
The model is not as stable as the one built after data
pre-screening, for which $Q^2_{\text{LOCO}}$ = 0.8863 and $Q^2_{\text{LOAO}}$ = 0.8866. These
results indicate that models built without data
pre-screening are less stable. Thus, it is necessary to carry out data
pre-processing before building the QSPR model to ensure a balanced
distribution of the dataset and to obtain a stable model.
Table 6. Comparison of model stability before and after data
pre-screening for viscosity.
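The leave-one-group-out statistics compared above can be illustrated with a short sketch. It assumes that LOCO and LOAO denote leaving out all data points of one cation or one anion at a time, and that $Q^2$ is computed as 1 - PRESS / SS_tot with the overall mean of the response in SS_tot. Both are common conventions but are assumptions here. The least-squares model, the synthetic data, and the name `leave_one_group_out` are hypothetical.

```python
import numpy as np

def leave_one_group_out(X, y, groups):
    """Return (Q2, MAE) from leave-one-group-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    y_pred = np.empty_like(y)
    for g in np.unique(groups):
        held_out = groups == g
        # Fit on every group except g, then predict the held-out group.
        A_fit = np.column_stack([np.ones((~held_out).sum()), X[~held_out]])
        coef, *_ = np.linalg.lstsq(A_fit, y[~held_out], rcond=None)
        A_out = np.column_stack([np.ones(held_out.sum()), X[held_out]])
        y_pred[held_out] = A_out @ coef
    press = np.sum((y - y_pred) ** 2)
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    mae = np.mean(np.abs(y - y_pred))
    return q2, mae

# Synthetic stand-in data: 300 points with hypothetical cation and anion labels.
rng = np.random.default_rng(3)
cations = rng.integers(0, 12, size=300)
anions = rng.integers(0, 8, size=300)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -0.7, 0.3]) + 2.0 + rng.normal(scale=0.2, size=300)

q2_loco, mae_loco = leave_one_group_out(X, y, cations)
q2_loao, mae_loao = leave_one_group_out(X, y, anions)
print(f"Q2_LOCO = {q2_loco:.4f}, MAE_LOCO = {mae_loco:.4f}")
print(f"Q2_LOAO = {q2_loao:.4f}, MAE_LOAO = {mae_loao:.4f}")
```

Running the same routine on the dataset before and after pre-screening would give the kind of side-by-side comparison summarised in Table 6.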