1. Introduction
Ionic liquids (ILs), composed of organic cations and organic/inorganic
anions, have been widely used in absorption and
separation1, 2, synthesis3, catalysis4, 5 and electrochemistry6, 7
owing to their favorable properties such as gas solubility, thermal
stability and low volatility. Density (ρ) and viscosity (η)
are key process parameters required in a large number of
applications such as chemical process simulation, equipment sizing,
lubrication and refrigeration8, 9. In addition,
other properties, such as heat capacity, speed of sound and surface
tension, are often estimated from these two basic properties.
Given the vast number of ILs, it is impractical to measure ρ and η
experimentally at varying temperature and pressure
for all ILs. Accordingly, computational tools are particularly important
for filling the gaps in IL property databases. Furthermore, computational
methods are also valuable for the property-directed design of ILs.
Quantitative structure-property relationship (QSPR) modeling is one of the
commonly used approaches for calculating the physical properties of
chemical substances10, 11. To date, QSPR has been
widely applied to ILs, especially to their temperature- and
pressure-dependent properties12-14. By incorporating group
contribution (GC), Kamil Paduszyński15 built a QSPR
model for estimating the ρ of ILs at different temperatures with
the most comprehensive collection of data reported to date. In Das et
al.’s investigation16, a QSPR model based on a multilayered
variable selection strategy was developed for predicting
the η of ILs with $Q^2_{\mathrm{LOO}}$ = 0.713. Mirkhani and Gharagheizi17
proposed a linear QSPR model for predicting the η of 293 ILs, using
genetic function approximation for parameter selection, with
$R^2_{\mathrm{training}}$ = 0.8096. Validation is an
indispensable step in establishing a QSPR model, and both
external and internal validation are adopted in most
studies18, 19. Although a few QSPR models have been
developed with external and internal validation for the
temperature- and pressure-dependent properties of ILs, the stability and
reliability of these models remain in question.
Because ILs consist of both cations and anions, the property of an IL
whose cation and anion both appear in the training set can be obtained
directly from the contributions of those two ions. However,
most studies have ignored the criterion that the cation and anion of an IL
in the testing set should not both reappear in the training set; otherwise,
“pseudo-high” accuracy is obtained in both external and internal
(leave-one-out cross-validation, LOO-CV) validation. Recently, Makarov
et al.20 analyzed published QSPR
models21, 22 for the melting point of ILs by
five-fold cross-validation (5-CV) and found that the traditional validation
method gives “pseudo-high” accuracy. However, for temperature- and
pressure-dependent properties of ILs such as ρ and η, the
data points available for each IL vary considerably with the temperature
and pressure conditions covered, so K-fold cross-validation (K-CV)
struggles to balance the distribution of data points across different
types of ions. It is therefore necessary to establish an easy-to-implement
internal validation method that evaluates QSPR models efficiently and
accurately.
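The ion-wise hold-out idea motivated above can be illustrated with a minimal sketch. It assumes one plausible reading of an ion-level cross-validation: each distinct ion is held out in turn, every data point containing that ion forms the validation fold, and the model is fitted on the rest. The exact procedure is not specified in this section, and the function name `leave_one_ion_out_splits` and the toy ion labels are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_ion_out_splits(
    ion_pairs: Sequence[Tuple[str, str]],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_ion, train_idx, test_idx) for every distinct ion.

    ion_pairs holds one (cation, anion) pair per data point, so an IL
    measured at many temperatures and pressures appears many times.
    Assumption: cation and anion labels never collide.
    """
    ions = sorted({ion for pair in ion_pairs for ion in pair})
    for ion in ions:
        # Every data point containing the held-out ion goes to the test fold.
        test_idx = [i for i, pair in enumerate(ion_pairs) if ion in pair]
        # The training fold never sees the held-out ion.
        train_idx = [i for i, pair in enumerate(ion_pairs) if ion not in pair]
        yield ion, train_idx, test_idx


# Hypothetical toy dataset: one (cation, anion) pair per data point.
toy_pairs = [
    ("C4mim", "PF6"),
    ("C4mim", "PF6"),
    ("C4mim", "BF4"),
    ("C2mim", "BF4"),
    ("C2mim", "NTf2"),
]
splits = list(leave_one_ion_out_splits(toy_pairs))
```

Unlike LOO-CV, where the remaining data points of the same IL stay in the training fold, each fold here forces the model to predict at least one ion it has never seen. The number of folds equals the number of distinct ions, independent of how many data points each IL contributes.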
Furthermore, the stability of a QSPR model also depends on the
accuracy and distribution of the dataset. The reliability of
experimental data was evaluated in most previous
studies15, 23-25, while the distribution of data
points was ignored. For example, in the QSPR model of heat capacity
developed by Sattari et al.26, 1528 data points were
used for [C4mim][PF6], accounting for 41% of the total
dataset. Similarly, in our previous work on a heat capacity QSPR
model23, [C4mim][PF6] accounted for 21% of
the total dataset. QSPR models are usually fitted by the least
squares method, whose objective is to minimize the sum of squared
errors10, so an over-represented IL dominates the fitted
parameters. Data points should therefore be collected selectively to
give a balanced distribution. Further, for
temperature- and pressure-dependent properties, the temperature and
pressure terms are usually treated as constant terms shared by all
ILs27, 28. However, our previous
works23, 29, 30 show that these terms are
affected by the structure of the IL, so it is necessary to introduce
descriptors into the temperature and pressure terms.
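The imbalance described above can be made concrete with a short sketch. It measures the share of the dataset contributed by each IL and then caps the number of points per IL, keeping evenly spaced points so that the retained subset still spans the original ordering (for example, the measured temperature range). The capping rule, the function names and the toy labels are assumptions for illustration only, not the selection method used in this work.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def share_per_il(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of all data points contributed by each IL."""
    counts: Dict[Hashable, int] = defaultdict(int)
    for label in labels:
        counts[label] += 1
    return {label: n / len(labels) for label, n in counts.items()}


def cap_points_per_il(labels: Sequence[Hashable], max_points: int) -> List[int]:
    """Return indices to keep so that no IL has more than max_points points."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    by_il: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_il[label].append(i)
    kept: List[int] = []
    for idx in by_il.values():
        if len(idx) <= max_points:
            kept.extend(idx)
        elif max_points == 1:
            kept.append(idx[0])
        else:
            # Evenly spaced picks, always including the first and last point.
            last = len(idx) - 1
            kept.extend(idx[k * last // (max_points - 1)] for k in range(max_points))
    return sorted(kept)


# Hypothetical toy labels: IL "A" has 8 data points, IL "B" has 2.
toy_labels = ["A"] * 8 + ["B"] * 2
before = share_per_il(toy_labels)
kept = cap_points_per_il(toy_labels, max_points=3)
after = share_per_il([toy_labels[i] for i in kept])
```

In this toy case, IL "A" drops from 80% to 60% of the dataset, which reduces its weight in the sum of squared errors minimized by a least squares fit.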
In this contribution, two f(T, P, I)-QSPR
models for ρ and η were established using a method for
balancing the distribution of data points and a treatment of temperature
and pressure effects that depends on the structures of ILs. A novel
internal validation method, leave-one-ion-out cross-validation
(LOIO-CV), was proposed to address the “pseudo-high” accuracy of LOO-CV
for ILs. The models were also assessed by external validation
following the principle that the cation and anion of an IL in the
testing set do not both appear in the training set. Analysis of the
statistical results showed that the two models achieved good predictive
power and stability, providing useful guidance for the future rapid
screening and design of functional ILs.