External validation. For the external validation of the η model, 651 ILs (7352 data points) were used as the training set and 181 ILs (1886 data points) as the testing set. The detailed statistical parameters are listed in Table 4. The testing-set value, R²_testing = 0.9108, is close to the training-set value, R²_training = 0.9091. The experimental vs. calculated η values for the external validation are presented in Figure 6c. The data points of the testing set follow the same trend as those of the training set, indicating that the ln η(T, P, I)-QSPR model has excellent predictive ability for the η of ILs at variable temperature and pressure.
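The external validation described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares model as a stand-in for the QSPR equation, synthetic descriptors and targets, and hypothetical names (`il_ids`, `fit_ols`, `predict`); none of these come from this work. The split is made by IL rather than by data point, so that no IL contributes to both sets.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Least-squares fit with an intercept (stand-in for the QSPR model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Synthetic data (assumption): 600 points spread over 60 hypothetical ILs.
rng = np.random.default_rng(0)
n_points, n_ils = 600, 60
il_ids = rng.integers(0, n_ils, size=n_points)
X = rng.normal(size=(n_points, 3))
y = X @ np.array([1.5, -0.8, 0.4]) + 2.0 + rng.normal(scale=0.5, size=n_points)

# Hold out whole ILs, so every point of a test IL is unseen during fitting.
test_ils = rng.choice(n_ils, size=12, replace=False)
test = np.isin(il_ids, test_ils)

coef = fit_ols(X[~test], y[~test])
r2_training = r_squared(y[~test], predict(X[~test], coef))
r2_testing = r_squared(y[test], predict(X[test], coef))
print(f"R2_training = {r2_training:.4f}, R2_testing = {r2_testing:.4f}")
```

Similar values of the two statistics suggest that the fitted model generalizes to ILs it was not trained on, which is the argument made in the text.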
Y-randomization analysis. After 1000 repetitions of Y-randomization validation, the statistics of the randomized models were lower than 0.00586 and 0.00044, respectively, far below those of the ln η(T, P, I)-QSPR model, indicating that the model is not the result of chance correlation.
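A minimal sketch of a Y-randomization test is given below. It assumes an ordinary least-squares model and synthetic data as stand-ins for the QSPR model and the η dataset; the function names `fit_r2` and `y_randomization` are hypothetical. The target values are shuffled while the descriptors are kept fixed, the model is refitted, and the R² of each randomized model is recorded.

```python
import numpy as np

def fit_r2(X, y):
    """Fit least squares with an intercept and return the fitted R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_repeats=1000, seed=0):
    """Return the R^2 of models refitted on randomly permuted targets."""
    rng = np.random.default_rng(seed)
    return np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_repeats)])

# Synthetic data (assumption): 2000 points, 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = X @ np.array([1.5, -0.8, 0.4]) + 2.0 + rng.normal(scale=0.5, size=2000)

r2_real = fit_r2(X, y)
r2_random = y_randomization(X, y, n_repeats=1000)
print(f"R2 (real targets)      = {r2_real:.4f}")
print(f"R2 (randomized, max)   = {r2_random.max():.5f}")
print(f"R2 (randomized, mean)  = {r2_random.mean():.5f}")
```

If the randomized models reach an R² close to that of the real model, the original fit is likely due to chance correlation; values near zero, as reported in the text, argue against it.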
3.4.2. Model comparison: before and after data pre-screening
As in the analysis of the ρ(T, P, I)-QSPR model, this work also carried out a data pre-screening process for the η dataset. A QSPR model was built for the initial η dataset using the same descriptors as in Eq. (16). The detailed LOIO-CV results for that model are shown in Table 6. The model built without data pre-screening has high Q² (Q²_LOCO = 0.8935 and Q²_LOAO = 0.8913) and low MAE (MAE_LOCO = 0.3153 and MAE_LOAO = 0.3211), but both deteriorate markedly when its stability is reassessed after data pre-screening (Q²_LOCO = 0.8815 and Q²_LOAO = 0.8806; MAE_LOCO = 0.3691 and MAE_LOAO = 0.3755). It is therefore less stable than the model built after data pre-screening, for which Q²_LOCO = 0.8863 and Q²_LOAO = 0.8866. These results indicate that a model built without data pre-screening is less stable. Thus, it is necessary to carry out data pre-screening before building the QSPR model to ensure a balanced and stable distribution of the dataset and to obtain a stable model.
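The leave-one-ion-out statistics compared above can be illustrated with a short sketch. It assumes that LOCO and LOAO denote leaving out all data points of one cation or one anion at a time, that Q² is computed as 1 − PRESS/SS_tot about the overall mean, and that an ordinary least-squares model stands in for the QSPR equation; the data and the names `cation`, `anion` and `leave_one_group_out` are hypothetical.

```python
import numpy as np

def leave_one_group_out(X, y, groups):
    """Leave-one-group-out CV; returns (Q^2, MAE) of the held-out predictions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    y_cv = np.empty_like(y)
    for g in np.unique(groups):
        held = groups == g
        # Fit on every point whose group differs from the held-out one.
        A_fit = np.column_stack([np.ones((~held).sum()), X[~held]])
        A_out = np.column_stack([np.ones(held.sum()), X[held]])
        coef, *_ = np.linalg.lstsq(A_fit, y[~held], rcond=None)
        y_cv[held] = A_out @ coef
    q2 = 1.0 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)
    mae = np.mean(np.abs(y - y_cv))
    return q2, mae

# Synthetic data (assumption): each point is labelled with a cation and an anion.
rng = np.random.default_rng(2)
n = 600
cation = rng.integers(0, 20, size=n)
anion = rng.integers(0, 15, size=n)
X = rng.normal(size=(n, 3))
y = X @ np.array([1.5, -0.8, 0.4]) + 2.0 + rng.normal(scale=0.5, size=n)

q2_loco, mae_loco = leave_one_group_out(X, y, cation)
q2_loao, mae_loao = leave_one_group_out(X, y, anion)
print(f"Q2_LOCO = {q2_loco:.4f}, MAE_LOCO = {mae_loco:.4f}")
print(f"Q2_LOAO = {q2_loao:.4f}, MAE_LOAO = {mae_loao:.4f}")
```

Running the same routine on the dataset before and after pre-screening would give the kind of paired Q² and MAE values compared in Table 6.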
Table 6. Comparison of model stability before and after data pre-screening for viscosity.