3.3 2D-QSAR model
3.3.1 SVM and MLR Models
To build the property-activity relationship models, we first removed the
molecular properties whose values were the same for almost all
molecules. Ten molecular properties with large numerical differences
between molecules were then retained as descriptors of the molecules.
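The filtering step described above can be sketched as follows. The text does not state the exact criteria, so both thresholds here are assumptions made for illustration: a property is treated as "the same for almost all molecules" when its most common value covers more than 95% of the molecules, and "large numerical differences" is measured as the standard deviation of min-max-scaled values. The function name `select_descriptors` and the toy table are hypothetical.

```python
from collections import Counter
from statistics import pstdev


def select_descriptors(table, k=10, max_mode_fraction=0.95):
    """Pick the k most variable descriptors from a descriptor table.

    table maps a descriptor name to its list of values, one per molecule.
    Both selection criteria are assumptions, not the authors' stated method.
    """
    spread = {}
    for name, values in table.items():
        # Step 1: drop descriptors whose most common value covers
        # almost all molecules (nearly constant columns).
        mode_count = Counter(values).most_common(1)[0][1]
        if mode_count / len(values) > max_mode_fraction:
            continue
        lo, hi = min(values), max(values)
        if hi == lo:
            continue  # fully constant column, nothing to rank
        # Step 2: score the rest by the standard deviation of
        # min-max-scaled values, so different units are comparable.
        scaled = [(v - lo) / (hi - lo) for v in values]
        spread[name] = pstdev(scaled)
    # Keep the k descriptors with the largest spread.
    return sorted(spread, key=spread.get, reverse=True)[:k]


# Toy table with hypothetical descriptor names and values.
toy = {
    "A": [1.0, 1.0, 1.0, 1.0],  # constant, removed in step 1
    "B": [0.0, 1.0, 0.0, 1.0],
    "C": [0.0, 0.1, 0.2, 3.0],
}
print(select_descriptors(toy, k=2))
```

With a real descriptor table, `k=10` would match the number of properties kept in this section; a different variability measure could change which properties are selected.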
The ten properties are ALogP, ES_Sum_ssCH2, Dipole_mag, Dipole_X,
Jurs_DPSA_1, Jurs_PPSA_3, Jurs_RNCS, IAC_Total, Kappa_3_AM, and
ES_Sum_aaCH. ALogP is the logarithm of the partition coefficient of a
substance between n-octanol (oil) and water; it reflects how the
substance distributes between the oil and water phases.
ES_Sum_ssCH2 represents the sum of the electrotopological state
(E-state) values for CH2 groups with two single bonds. Dipole_mag is
the magnitude of the dipole moment. Dipole_X, the X component of the
dipole moment, is a 3D electronic descriptor that indicates the strength
and orientation behavior of a molecule in an electrostatic field.
Jurs_DPSA_1 represents the partial positive solvent-accessible surface
area minus the partial negative solvent-accessible surface area.
Jurs_PPSA_3 is the sum, over all positively charged atoms in a molecule,
of the product of solvent-accessible surface area and partial charge.
Jurs_RNCS represents the solvent-accessible surface area of the most
negative atom divided by the relative negative charge. IAC_Total is the
total information of atomic composition. Kappa_3_AM is the third-order
alpha-modified kappa shape index, a topological index used to
characterize molecular shape. ES_Sum_aaCH represents the sum of the
E-state values for CH groups with two aromatic bonds.

The MLR model is described by the following equation:

\(\text{pIC}_{50} = 10.2930 - 0.3937 \times \text{ALogP}
+ 0.0044 \times \text{ES\_Sum\_ssCH2}
+ 0.0853 \times \text{Dipole\_mag}
- 0.0073 \times \text{Dipole\_X}
+ 0.0014 \times \text{Jurs\_DPSA\_1}
- 0.0239 \times \text{Jurs\_PPSA\_3}
- 0.1298 \times \text{Jurs\_RNCS}
- 0.0568 \times \text{IAC\_Total}
+ 0.5199 \times \text{Kappa\_3\_AM}
- 0.0639 \times \text{ES\_Sum\_aaCH}\)

All of the training set and test set molecules were scored using these
ten properties to construct a model that predicts the biological
activity of each molecule in the TCM database. The prediction accuracy
of the SVM and MLR models was then verified. The high correlation
coefficients (\(R^2\)) of the SVM model (\(R^2\) = 0.7925) and the MLR
model (\(R^2\) = 0.8942) indicate that both models predict activity
accurately. Therefore, we used the SVM and MLR models to predict the
biological activity of compounds from the TCM database.
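
As an illustration of how the MLR equation above is applied, the sketch below evaluates it for one molecule and computes an R2 value from observed and predicted activities. The coefficients are copied from the equation; the function names and the example descriptor values are hypothetical and are not data from this study. The sketch also assumes that R2 is the coefficient of determination, which the text does not state explicitly.

```python
# Intercept and coefficients copied from the MLR equation in the text.
INTERCEPT = 10.2930
COEFFICIENTS = {
    "ALogP": -0.3937,
    "ES_Sum_ssCH2": 0.0044,
    "Dipole_mag": 0.0853,
    "Dipole_X": -0.0073,
    "Jurs_DPSA_1": 0.0014,
    "Jurs_PPSA_3": -0.0239,
    "Jurs_RNCS": -0.1298,
    "IAC_Total": -0.0568,
    "Kappa_3_AM": 0.5199,
    "ES_Sum_aaCH": -0.0639,
}


def predict_pic50(descriptors):
    """Evaluate the MLR equation for one molecule.

    descriptors maps each of the ten property names to its value.
    """
    return INTERCEPT + sum(
        coef * descriptors[name] for name, coef in COEFFICIENTS.items()
    )


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical molecule: every descriptor set to zero except ALogP.
molecule = {name: 0.0 for name in COEFFICIENTS}
molecule["ALogP"] = 2.0
print(round(predict_pic50(molecule), 4))
```

In practice, `r_squared` would be called with the experimental pIC50 values of the test set and the corresponding model predictions.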