Figure 2. Correlation between the experimental and calculated values of ρ and η: β, γ, and χ are constants in (a) and (c); β, γ, and χ are variables in (b) and (d).
3.2. ρ(T, P, I)-QSPR model
The ρ(T, P, I)-QSPR model is given by Eq. (15). Detailed parameter values are listed in Table C1 of the Supporting Information (atomic-distribution-matrix.docx).
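$n = 19335$; $R^2 = 0.9922$; $Q^2_{\mathrm{LOCO}} = 0.9905$; $Q^2_{\mathrm{LOAO}} = 0.9894$;
$n_{\mathrm{training}} = 15015$; $R^2_{\mathrm{training}} = 0.9922$; $\mathrm{MAE}_{\mathrm{training}} = 9.3290$ kg/m³;
$n_{\mathrm{testing}} = 4320$; $R^2_{\mathrm{testing}} = 0.9921$; $\mathrm{MAE}_{\mathrm{testing}} = 9.3606$ kg/m³.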
In Eq. (15), $I_{\mathrm{IL}}$, $I_{\mathrm{C}}$, and $I_{\mathrm{A}}$ are the norm indices ($I$) of the IL, the cation, and the anion, respectively, and $n_{\mathrm{C}}$ and $n_{\mathrm{A}}$ are the numbers of cations and anions (e.g., $n_{\mathrm{C}}$ and $n_{\mathrm{A}}$ of 1-methyl-3-(3-(trimethylammonio)propyl)-1H-imidazolium bis(dicyanamide) are 1 and 2, respectively).
The high $R^2$ and low MAE show that the ρ(T, P, I)-QSPR model calculates the ρ of ILs well. The experimental ρ values and the values calculated with Eq. (15) are listed in Table S1 of the Supporting Information (exp-cal-values.xlsx).
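As a reminder of how the two reported statistics are usually obtained from paired experimental and calculated values, the following minimal sketch computes them in plain Python. It assumes the conventional definitions ($R^2 = 1 - SS_{\mathrm{res}}/SS_{\mathrm{tot}}$ and the mean of absolute residuals); the source does not state its exact formulas. The function names and the four density values are hypothetical and are not taken from the dataset.

```python
from math import fsum


def r_squared(y_exp, y_calc):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_exp = fsum(y_exp) / len(y_exp)
    ss_res = fsum((e - c) ** 2 for e, c in zip(y_exp, y_calc))
    ss_tot = fsum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_exp, y_calc):
    """Mean of |experimental - calculated|, in the units of the inputs."""
    return fsum(abs(e - c) for e, c in zip(y_exp, y_calc)) / len(y_exp)


# Made-up densities in kg/m3, for illustration only.
rho_exp = [1200.0, 1300.0, 1400.0, 1500.0]
rho_calc = [1210.0, 1290.0, 1405.0, 1495.0]

r2 = r_squared(rho_exp, rho_calc)
mae = mean_absolute_error(rho_exp, rho_calc)
print(f"R2 = {r2:.4f}, MAE = {mae:.4f} kg/m3")  # R2 = 0.9950, MAE = 7.5000 kg/m3
```

Applied to the two columns of Table S1, the same two functions would be expected to reproduce the reported values, provided the paper uses these definitions.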
3.3.1. Model validation
Internal validation. The distributions of cations and anions in the ρ dataset are shown in Figures 3a-b. The cations are distributed more evenly than the anions. As shown in Figure 3a, the cations with the most data points are [C4mim] (9.97%), [emim] (7.57%), [C6mim] (5.75%), [meim] (4.16%), and [mC4pyr] (3.93%). Notably, the ILs containing [emim] are in the testing set. Although ILs containing [C4mim] are the most numerous in the ρ dataset, the results validated by LOCO-CV are acceptable, with an MAE of 7.2524 kg/m³. Similarly, as shown in Figure 3b, the anions with the most data points are [N(SO2CF3)2] (28.49%), [BF4] (11.83%), [PF6] (4.62%), [N(CN)2] (3.57%), and [N(SO2F)2] (3.38%). Although [N(SO2CF3)2] accounts for a relatively high proportion of the anions, it is paired with as many as 265 cations. Consequently, the MAE for ILs containing [N(SO2CF3)2], as verified by LOAO-CV, is favorable at 6.7168 kg/m³.

The validation results of the ρ(T, P, I) model are illustrated in Figure 4. Scatter plots of the LOCO-CV and LOAO-CV results are shown in Figures 4a-b. The $Q^2$ values for LOCO-CV and LOAO-CV are 0.9905 and 0.9894, respectively, which demonstrates the high stability of the model in predicting the ρ of ILs containing novel cations and anions. Figure 4d shows the absolute error distributions for the ρ(T, P, I) model, LOCO-CV, and LOAO-CV. LOCO-CV has more points in the 0 to 10 kg/m³ error range than LOAO-CV, which further indicates that the model is more stable when predicting ILs with new cations. The detailed statistical parameters of the internal validation are listed in Table 2.
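The grouped cross-validation described above can be sketched as follows: every record that shares one cation (or one anion) is held out together, the model is refitted on the rest, and $Q^2$ and MAE are computed from the pooled held-out predictions. This is an illustrative sketch only. The one-descriptor straight-line fit is a stand-in for Eq. (15), the $Q^2$ definition ($1 - \mathrm{PRESS}/SS_{\mathrm{tot}}$) is assumed, and the labels, the `index` descriptor, and the density values are all hypothetical.

```python
from collections import defaultdict
from math import fsum


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = fsum(xs) / n, fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_group_out(records, group_key):
    """Hold out all records sharing one group label, refit, and predict them.

    Returns (q2, mae) computed from the pooled held-out predictions.
    """
    folds = defaultdict(list)
    for idx, rec in enumerate(records):
        folds[rec[group_key]].append(idx)

    predicted = [0.0] * len(records)
    for held_out in folds.values():
        held = set(held_out)
        train = [r for i, r in enumerate(records) if i not in held]
        a, b = fit_line([r["index"] for r in train], [r["rho"] for r in train])
        for i in held_out:
            predicted[i] = a + b * records[i]["index"]

    observed = [r["rho"] for r in records]
    mean_obs = fsum(observed) / len(observed)
    press = fsum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = fsum((o - mean_obs) ** 2 for o in observed)
    mae = fsum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
    return 1.0 - press / ss_tot, mae


# Made-up records: "index" is a placeholder descriptor, "rho" is in kg/m3.
records = [
    {"cation": "C1", "anion": "A1", "index": 1.0, "rho": 1201.0},
    {"cation": "C1", "anion": "A2", "index": 2.0, "rho": 1362.0},
    {"cation": "C2", "anion": "A1", "index": 1.5, "rho": 1285.0},
    {"cation": "C2", "anion": "A2", "index": 2.5, "rho": 1440.0},
    {"cation": "C3", "anion": "A1", "index": 0.8, "rho": 1150.0},
    {"cation": "C3", "anion": "A2", "index": 1.8, "rho": 1310.0},
]

q2_loco, mae_loco = leave_one_group_out(records, "cation")  # cation-wise folds
q2_loao, mae_loao = leave_one_group_out(records, "anion")   # anion-wise folds
print(f"cation-wise: Q2 = {q2_loco:.4f}, MAE = {mae_loco:.2f} kg/m3")
print(f"anion-wise:  Q2 = {q2_loao:.4f}, MAE = {mae_loao:.2f} kg/m3")
```

The key design point is that the fold label is the ion, not the data point, so no measurement of the held-out ion is seen during fitting.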
It is worth noting that the validation results for LOO ($Q^2_{\mathrm{LOILO}} = 0.9907$ and $Q^2_{\mathrm{LODPO}} = 0.9921$) are generally higher than those for LOIO ($Q^2_{\mathrm{LOCO}} = 0.9905$ and $Q^2_{\mathrm{LOAO}} = 0.9894$), especially for LODPO-CV, as can be seen in Table 2. The “pseudo-high” accuracy of LOO-CV is even more evident in the MAE: the MAE of LOILO-CV is 10.2623 kg/m³, which is lower than that of LOAO-CV ($\mathrm{MAE}_{\mathrm{LOAO}} = 11.3498$ kg/m³). These results suggest that LOILO does not accurately evaluate the stability of QSPR models for new anions and cations, and they demonstrate directly that validating IL property models with LOO yields a “pseudo-high” estimate of model stability. There is therefore a strong need to use LOIO-CV when evaluating QSPR models of ILs, so that the resulting model is more realistic and stable. Moreover, the absolute error (AE) distribution of LOIO-CV is consistent with that of the training set of the ρ(T, P, I) model, and most of the errors are within the range of 0 to 10 kg/m³. This further confirms that the model is suitable for predicting the ρ of ILs.
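The mechanism behind such “pseudo-high” accuracy can be illustrated with a toy example. When a single data point is held out, other measurements of the same ion at nearby temperatures remain in the training set, so the model is never tested on a truly new ion. The sketch below is not the paper's model: it uses a one-nearest-neighbour predictor, a hypothetical descriptor `index`, and synthetic densities generated from an arbitrary formula, and it assumes that point-wise and cation-wise folds correspond to the paper's LODPO and LOCO schemes.

```python
from math import fsum


def nearest_neighbor_prediction(query, training):
    """Return rho of the training record closest to `query` in (index, T)."""
    def distance(rec):
        d_index = rec["index"] - query["index"]
        d_temp = (rec["T"] - query["T"]) / 100.0  # arbitrary temperature scaling
        return d_index ** 2 + d_temp ** 2
    return min(training, key=distance)["rho"]


def cross_validated_mae(records, fold_label):
    """MAE when all records sharing a fold label are held out together."""
    labels = [fold_label(i, rec) for i, rec in enumerate(records)]
    errors = []
    for i, rec in enumerate(records):
        training = [r for j, r in enumerate(records) if labels[j] != labels[i]]
        errors.append(abs(rec["rho"] - nearest_neighbor_prediction(rec, training)))
    return fsum(errors) / len(errors)


# Synthetic data: three hypothetical cations, each measured at three temperatures.
records = [
    {"cation": c, "index": idx, "T": t,
     "rho": 1000.0 + 100.0 * idx - 0.5 * (t - 300.0)}
    for c, idx in [("C1", 1.0), ("C2", 2.0), ("C3", 3.0)]
    for t in (300.0, 310.0, 320.0)
]

mae_point_out = cross_validated_mae(records, lambda i, rec: i)               # one point per fold
mae_cation_out = cross_validated_mae(records, lambda i, rec: rec["cation"])  # one cation per fold
print(mae_point_out, mae_cation_out)  # prints 5.0 100.0
```

In this toy case the point-wise error reflects only the small temperature step between measurements of the same cation, whereas the cation-wise error reflects the much larger difference between cations, which is the quantity of interest when predicting new ILs.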