3.3 2D-QSAR model
3.3.1 SVM and MLR Models
When we built the property-activity relationship models, molecular properties whose values were the same for almost all molecules were removed first. Ten properties whose values differed widely between molecules were then selected as descriptors: ALogP, ES_Sum_ssCH2, Dipole_mag, Dipole_X, Jurs_DPSA_1, Jurs_PPSA_3, Jurs_RNCS, IAC_Total, Kappa_3_AM, and ES_Sum_aaCH.

ALogP is the logarithm of the partition coefficient of a substance between n-octanol (oil) and water; it reflects how the substance distributes between the oil and water phases. ES_Sum_ssCH2 is the sum of the electrotopological state (E-state) values for CH2 groups with two single bonds. Dipole_mag is the magnitude of the dipole moment. Dipole_X is the X component of the dipole moment, a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. Jurs_DPSA_1 is the partial positive solvent-accessible surface area minus the partial negative solvent-accessible surface area. Jurs_PPSA_3 is the sum, over all positively charged atoms in a molecule, of the product of solvent-accessible surface area and partial charge. Jurs_RNCS is the solvent-accessible surface area of the most negative atom divided by the relative negative charge. IAC_Total is the total information content of the atomic composition. Kappa_3_AM is the third-order alpha-modified kappa shape index, a topological index used to characterize molecular shape. ES_Sum_aaCH is the sum of the E-state values for aromatic CH carbons with two aromatic bonds.

The MLR model is given by:

\[
\begin{aligned}
\text{pIC}_{50} = {} & 10.2930 - 0.3937 \times \text{ALogP} + 0.0044 \times \text{ES\_Sum\_ssCH2} + 0.0853 \times \text{Dipole\_mag} \\
& - 0.0073 \times \text{Dipole\_X} + 0.0014 \times \text{Jurs\_DPSA\_1} - 0.0239 \times \text{Jurs\_PPSA\_3} \\
& - 0.1298 \times \text{Jurs\_RNCS} - 0.0568 \times \text{IAC\_Total} + 0.5199 \times \text{Kappa\_3\_AM} \\
& - 0.0639 \times \text{ES\_Sum\_aaCH}
\end{aligned}
\]

All training-set and test-set molecules were characterized by these ten properties, and the resulting models were used to predict the biological activity of each molecule in the TCM database. The accuracy of the SVM and MLR model predictions has been verified.
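As an illustration of how the MLR equation above can be applied to a single molecule, the following minimal Python sketch evaluates it from a dictionary of descriptor values. The intercept and coefficients are copied from the equation; the function name `predict_pic50` and the dictionary layout are assumptions made for this example, not part of the original workflow.

```python
# Intercept and coefficients copied from the MLR equation in the text.
MLR_INTERCEPT = 10.2930
MLR_COEFFICIENTS = {
    "ALogP": -0.3937,
    "ES_Sum_ssCH2": 0.0044,
    "Dipole_mag": 0.0853,
    "Dipole_X": -0.0073,
    "Jurs_DPSA_1": 0.0014,
    "Jurs_PPSA_3": -0.0239,
    "Jurs_RNCS": -0.1298,
    "IAC_Total": -0.0568,
    "Kappa_3_AM": 0.5199,
    "ES_Sum_aaCH": -0.0639,
}


def predict_pic50(descriptors):
    """Return the MLR-predicted pIC50 for one molecule.

    `descriptors` maps each of the ten descriptor names to its numeric
    value (hypothetical input format chosen for this sketch).
    """
    missing = sorted(set(MLR_COEFFICIENTS) - set(descriptors))
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    # Linear model: intercept plus the weighted sum of descriptor values.
    return MLR_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in MLR_COEFFICIENTS.items()
    )
```

Raising an error on a missing descriptor, rather than silently treating it as zero, avoids producing a prediction from an incomplete descriptor set.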
The verification results show that both models have high prediction accuracy, as reflected in the squared correlation coefficients of the SVM model (\(R^2\) = 0.7925) and the MLR model (\(R^2\) = 0.8942). We therefore used the SVM and MLR models to predict the biological activity of compounds from the Chinese medicine database.
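The text does not state how the reported values were calculated. As one possible reading, the sketch below computes the squared Pearson correlation between observed and predicted activities. The function name `r_squared` and the choice of this definition (rather than, for example, the coefficient of determination) are assumptions made for illustration.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: the reported R2 is the squared correlation coefficient;
    the source does not specify the exact definition used.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n < 2:
        raise ValueError("at least two data points are required")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Sums of centered cross-products and centered squares.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("R2 is undefined when either series is constant")

    return cov * cov / (var_obs * var_pred)
```

In practice, `observed` would hold the experimental pIC50 values of the test-set molecules and `predicted` the corresponding model outputs.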