1. Introduction
Ionic liquids (ILs), composed of organic cations and organic or inorganic anions, have been widely used in absorption and separation1, 2, synthesis3, catalysis4, 5 and electrochemistry6, 7 owing to favorable properties such as gas solubility, thermal stability and low volatility. Density (ρ) and viscosity (η) are key process parameters required in many applications, such as chemical process simulation, equipment sizing, lubrication and refrigeration8, 9. In addition, other properties, such as heat capacity, speed of sound and surface tension, are often estimated from these two basic properties. Given the vast number of ILs, measuring ρ and η experimentally for every IL over a range of temperatures and pressures is impractical. Accordingly, computational tools are particularly important for filling the gaps in IL property databases. Furthermore, computational methods are also valuable for the property-directed design of ILs.
Quantitative structure-property relationship (QSPR) modeling is one of the commonly used approaches for calculating the physical properties of chemical substances10, 11. QSPR has been widely applied to ILs, especially to temperature- and pressure-dependent properties12-14. By combining QSPR with group contribution (GC), Paduszyński15 built a model for estimating the ρ of ILs at different temperatures using the most comprehensive collection of data reported to date. Das et al.16 developed a QSPR model for predicting the η of ILs based on a multilayered variable selection strategy, with Q²_LOO = 0.713. Mirkhani and Gharagheizi17 proposed a linear QSPR model for predicting the η of 293 ILs, using genetic function approximation for parameter selection, with R²_training = 0.8096. Validation is an indispensable step in establishing a QSPR model, and most studies adopt both external and internal validation18, 19. Although a few QSPR models with external and internal validation have been developed for the temperature- and pressure-dependent properties of ILs, the stability and reliability of these models remain open to question.
Because every IL consists of a cation and an anion, the property of an IL whose cation and anion both occur in the training set can be obtained directly from the contributions of those two ions. However, most studies have ignored the criterion that the cation and the anion of an IL in the testing set should not both reappear in the training set; otherwise, the external and internal (leave-one-out cross-validation, LOO-CV) validations yield “pseudo-high” accuracy. Recently, Makarov et al.20 analyzed published QSPR models21, 22 for the melting point of ILs using five-fold cross-validation (5-CV) and found that the traditional validation method gives “pseudo-high” accuracy. For temperature- and pressure-dependent properties of ILs such as ρ and η, however, the number of data points per IL varies considerably with the temperature and pressure ranges covered, so K-fold cross-validation (K-CV) has difficulty balancing the distribution of data points across different types of ions. It is therefore necessary to establish an easy-to-implement internal validation method that evaluates the QSPR model efficiently and accurately. Furthermore, the stability of a QSPR model also depends on the accuracy and distribution of the dataset. The reliability of the experimental data was evaluated in most previous studies15, 23-25, whereas the distribution of data points was ignored. For example, in the QSPR model of heat capacity developed by Sattari et al.26, 1528 data points were used for [C4mim][PF6], accounting for 41% of the total dataset. Similarly, in our previous work on a heat capacity QSPR model23, [C4mim][PF6] accounted for 21% of the total dataset. A QSPR model is usually fitted by the least-squares method, whose objective function is the minimum sum of squared errors10, so an IL with a disproportionate number of data points has a disproportionate influence on the fit. Thus, data points should be collected selectively to achieve a balanced distribution.
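The data-distribution issue discussed above can be illustrated with a short sketch. It computes the share of data points contributed by each IL and then caps the number of points per IL. The function names (`il_shares`, `cap_per_il`), the cap value and the toy records are hypothetical, and the even subsampling shown here is only one simple way to balance a dataset; it is not necessarily the procedure used in this work.

```python
from collections import Counter, defaultdict


def il_shares(records):
    """Return the fraction of all data points contributed by each IL."""
    counts = Counter(il for il, _ in records)
    total = sum(counts.values())
    return {il: n / total for il, n in counts.items()}


def cap_per_il(records, max_points):
    """Keep at most max_points records per IL, spread evenly over the
    original order (an assumed balancing rule, for illustration only)."""
    by_il = defaultdict(list)
    for rec in records:
        by_il[rec[0]].append(rec)
    balanced = []
    for recs in by_il.values():
        if len(recs) <= max_points:
            balanced.extend(recs)
        else:
            step = len(recs) / max_points  # > 1, so picked indices are distinct
            balanced.extend(recs[int(i * step)] for i in range(max_points))
    return balanced


# Hypothetical toy data: (IL name, temperature in K); values are made up.
records = [("[C4mim][PF6]", 290 + i) for i in range(8)]
records += [("[C2mim][BF4]", 298), ("[C6mim][Cl]", 298)]

shares_before = il_shares(records)            # one IL holds 8 of 10 points
balanced = cap_per_il(records, max_points=2)
shares_after = il_shares(balanced)            # the same IL now holds 2 of 4 points
```

Because every data point adds one squared-error term to the least-squares objective, reducing the share of a single IL in this way reduces its weight in the fit correspondingly.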
Further, for temperature- and pressure-dependent properties, the temperature and pressure terms are usually treated as constant terms common to all ILs27, 28. Our previous works23, 29, 30 indicate, however, that these terms are affected by the structure of the IL, so it is necessary to introduce structural descriptors into the temperature and pressure terms.
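To make the distinction concrete, the two treatments can be contrasted in a generic linear form. This is an illustrative assumption only and not the functional form of the models developed here: y denotes the property (or a transform of it), I_i are structural descriptors, and a_i, b, c, b_i and c_i are fitted coefficients.

```latex
% Illustrative only: a generic linear form, not the models of this work.
% y: property (or a transform of it); I_i: structural descriptors (hypothetical);
% a_i, b, c, b_i, c_i: fitted coefficients.
\begin{align}
  y &= \sum_i a_i I_i + b\,T + c\,P
      && \text{(temperature and pressure terms shared by all ILs)} \\
  y &= \sum_i a_i I_i + \Big(\sum_i b_i I_i\Big) T + \Big(\sum_i c_i I_i\Big) P
      && \text{(structure-dependent temperature and pressure terms)}
\end{align}
```

In the first form every IL has the same temperature and pressure slopes, whereas in the second the slopes vary with the ions that make up the IL.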
In this contribution, two f(T, P, I)-QSPR models for ρ and η were established using a method that balances the distribution of data points and treats the temperature and pressure effects according to the structures of ILs. A novel internal validation method, leave-one-ion-out cross-validation (LOIO-CV), was proposed to address the “pseudo-high” accuracy of LOO-CV for ILs. The models were also subjected to external validation following the principle that the cation and anion of an IL in the testing set do not both appear in the training set. Analysis of the statistical results showed that the two models achieve good predictive power and stability, providing useful guidance for the future rapid screening and design of functional ILs.
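The leave-one-ion-out idea can be sketched in a few lines of Python. The sketch assumes that each fold holds out every data point whose IL contains one particular ion (cation or anion) and trains on the remainder; this reading of LOIO-CV, the function name `loio_splits` and the toy ion labels are assumptions made for illustration, and the exact procedure is the one defined later in this work.

```python
def loio_splits(ils):
    """Yield ((kind, ion), train_idx, test_idx) for each distinct ion.

    ils: list of (cation, anion) pairs, one pair per data point.
    All points whose IL contains the held-out ion go to the test fold,
    so that ion never appears in the corresponding training fold.
    """
    ions = sorted({("cation", c) for c, _ in ils} |
                  {("anion", a) for _, a in ils})
    for kind, ion in ions:
        pos = 0 if kind == "cation" else 1
        test_idx = [i for i, il in enumerate(ils) if il[pos] == ion]
        train_idx = [i for i, il in enumerate(ils) if il[pos] != ion]
        yield (kind, ion), train_idx, test_idx


# Hypothetical toy dataset: five data points from four ILs.
ils = [("C4mim", "PF6"), ("C4mim", "BF4"), ("C2mim", "PF6"),
       ("C2mim", "BF4"), ("C4mim", "PF6")]

splits = list(loio_splits(ils))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Unlike LOO-CV, where removing a single data point leaves other points of the same IL in the training set, each fold here removes every occurrence of the held-out ion, regardless of how many temperature and pressure points the affected ILs have.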