2. Methodology
2.1. Database
The IL data were collected from the National Institute of Standards and Technology (NIST) database31. In total, 19335 ρ data points for 972 ILs and 9238 η data points for 832 ILs were included in the dataset. For ρ, the temperature and pressure ranges were 221.314 ~ 473.15 K and 0.0815 ~ 251.5 MPa; for η, they were 253.15 ~ 438.15 K and 0.06 ~ 300 MPa. The total dataset contains 501 cations, including imidazolium (im), pyridinium (py), pyrrolidinium (pyr), ammonium (N), phosphonium (P), piperidinium (pip), morpholinium (mor), sulfonium (S), triazolium (Trl), and propylpyrazolium (pyra), among others. It contains 154 anions, such as bis[(trifluoromethyl)sulfonyl]imide [(N(SO2CF3)2)-], tetrafluoroborate [(BF4)-], hexafluorophosphate [(PF6)-], dicyanamide [(N(CN)2)-], tetracyanoborate [(B(CN)4)-], trifluoroacetate [(C(CN)3)-], tris(pentafluoroethyl)trifluorophosphate [(PF3(C2F5)3)-], halide [(X)-], thiocyanate [(SCN)-], alkoxy-alkylsulfates [(RSO3)-], and alkyl-sulfate [(RSO4)-]. Geminal dicationic ILs (GDILs) were also collected in this work (e.g., 1-methyl-3-(3-(trimethylammonio)propyl)-1H-imidazolium bis(dicyanamide)). Information on these ILs, together with the corresponding experimental values of ρ and η, is given in Tables S1 ~ S2 of the Supporting Information (exp-cal-values.xlsx).
2.2. Data pre-processing
In the NIST database, a single IL can have a very large number of data points measured at different temperatures and pressures. If all of these points were used for modeling, a few ILs would account for a large share of the dataset. Because the model is fitted by the least squares method32, 33, such over-represented ILs would dominate the fit and reduce the reliability of the QSPR model. Therefore, data points were collected at intervals of 5 K in temperature and 2.5 MPa in pressure.
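The interval-based selection described above can be illustrated with a minimal sketch. The exact selection rule is not specified in the text, so the code assumes one plausible interpretation: each IL's points are binned on a 5 K by 2.5 MPa grid and the first point encountered in each cell is kept. The function name `thin_points` and the tuple layout are hypothetical.

```python
def thin_points(points, dt=5.0, dp=2.5):
    """Keep one data point per (IL, temperature bin, pressure bin) cell.

    points: iterable of (il_name, T_in_K, P_in_MPa, value) tuples.
    dt, dp: bin widths in K and MPa (5 K and 2.5 MPa, as in the text).
    Assumption: the first point seen in a cell is the one retained.
    """
    kept = {}
    for il, temperature, pressure, value in points:
        key = (il, int(temperature // dt), int(pressure // dp))
        # setdefault leaves an already-filled cell untouched
        kept.setdefault(key, (il, temperature, pressure, value))
    # dicts preserve insertion order, so the input order is retained
    return list(kept.values())


if __name__ == "__main__":
    # Hypothetical density-like records for two ILs, "A" and "B"
    raw = [
        ("A", 298.15, 0.1, 1.0),
        ("A", 299.15, 0.1, 1.1),  # same 5 K bin as the first point
        ("A", 303.15, 0.1, 1.2),
        ("A", 298.15, 5.0, 1.3),  # different pressure bin
        ("B", 298.15, 0.1, 2.0),
    ]
    for row in thin_points(raw):
        print(row)
```

Binning per IL keeps densely measured ILs from contributing many near-duplicate points, while sparsely measured ILs are left unchanged.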
2.3. f(T, P, I)-QSPR model
f(T, P, I)-QSPR models were established to describe the relationship of ρ and η with structure, temperature and pressure23. The preliminary f(T, P, I)-QSPR models are shown as Eqs. (1)-(2).
Here, ρ is the density of the ILs in units of kg∙m−3, η is the viscosity of the ILs in units of Pa∙s, T is the temperature in K, and P is the pressure in kPa. α is a variable related to the IL structure. In most studies, the parameters β, γ, and χ are treated as constants shared by all ILs27, 34. Our previous works23, 29 showed that treating these three coefficients as variables specific to each IL makes the model more accurate, so the same strategy is adopted in the present work.
2.4. Proposed norm descriptors
Step matrices (MS), namely the full step matrix (MS F), the adjacent step matrix (MS A), the adjacent-interphase step matrix (MS AB) and the adjacent-interphase-jump step matrix (MS ABC), are used to reflect the connectivity of atoms, as given by Eqs. (3)-(6). On this basis, two further step matrices (MS ABC_cyc and MS bon_cyc), given by Eqs. (7)-(8), are defined to represent the interaction of adjacent-interphase-jump atoms on a ring and the interaction of atoms on different bonds on a ring, respectively. To better capture the properties of the atoms in a molecule, the property matrices (MP) listed in Table 1 are used. The properties of each atom are given in S1 of the Supporting Information (atom properties.xlsx).
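As an illustration of how step matrices can encode atom connectivity, the sketch below builds them for a small molecular graph. Eqs. (3)-(6) are not reproduced here, so the definitions in the code are assumptions: the full step matrix is taken as the number of bonds on the shortest path between two atoms, and the adjacent, adjacent-interphase and adjacent-interphase-jump matrices are taken to retain only entries of at most 1, 2 and 3 steps, respectively. The function names are hypothetical and the ring-specific matrices of Eqs. (7)-(8) are not covered.

```python
from collections import deque


def step_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[start][v] = d
    return dist


def truncate(dist, max_step):
    """Keep entries with 1 <= step <= max_step and zero out the rest."""
    return [[d if 1 <= d <= max_step else 0 for d in row] for row in dist]


# Heavy-atom skeleton of n-butane, C0-C1-C2-C3, as a 0/1 adjacency matrix
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

full = step_distances(adjacency)             # assumed analogue of MS F
adjacent = truncate(full, 1)                 # assumed analogue of MS A
adjacent_interphase = truncate(full, 2)      # assumed analogue of MS AB
adjacent_interphase_jump = truncate(full, 3) # assumed analogue of MS ABC

if __name__ == "__main__":
    for name, matrix in [("full", full), ("adjacent", adjacent),
                         ("adjacent-interphase", adjacent_interphase)]:
        print(name, matrix)
```

For a four-atom chain the longest path is three steps, so the three-step truncation coincides with the full matrix; the matrices only begin to differ for larger molecules.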
Table 1. The property matrices (MP).