2.7 Data and Statistical Analysis
The raw intensity values of all features were log transformed using MetaboAnalyst 5.0 to reduce heteroscedasticity and correct for skewed data distributions. Before log transformation, zero values were replaced with 1/5 of the minimum intensity value of the corresponding feature. Log-transformed feature intensity values were used for all analyses unless stated otherwise.
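The zero-replacement and log-transformation step can be sketched in a few lines of Python. This is an illustration only, not the MetaboAnalyst implementation: the function name `log_transform`, the dictionary layout, the use of base 10, and the reading of "minimum" as the smallest positive value of each feature are all assumptions made for the example.

```python
import math


def log_transform(features):
    """Log10-transform intensities, feature by feature (illustrative sketch).

    `features` maps a feature name to its list of raw intensities.
    Assumptions (not stated in the text): base-10 logarithm, and zeros are
    replaced by 1/5 of the smallest positive intensity of the same feature.
    """
    transformed = {}
    for name, values in features.items():
        positives = [v for v in values if v > 0]
        if not positives:
            raise ValueError(f"feature {name!r} has no positive intensities")
        replacement = min(positives) / 5.0
        transformed[name] = [
            math.log10(v if v > 0 else replacement) for v in values
        ]
    return transformed


# Hypothetical feature with one zero intensity.
raw = {"feature_A": [100.0, 0.0, 1000.0]}
logged = log_transform(raw)
```

In this toy input the zero is replaced by 100 / 5 = 20 before the logarithm is taken, so the feature keeps a finite value for every sample.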
The EZInfo 2.0 software (Umetrics, Umeå, Sweden) was used to perform multivariate analysis on the metabolomics data. Data were centered and Pareto scaled upon import into EZInfo 2.0. Principal component analysis (PCA), an unsupervised method, was used to visualize metabolic variation between saline and cisplatin treatment at each timepoint of the study. Orthogonal partial least squares discriminant analysis (OPLS-DA), a supervised discriminatory analysis, was used for the pairwise discrimination of treatment groups at each timepoint. For each OPLS-DA model, metabolites were ranked by their correlation (p(corr)) values and variable importance in projection (VIP) values to select a subset of metabolites for identification. Features with p(corr) > 0.4 or p(corr) < -0.4 (i.e., |p(corr)| > 0.4) and VIP > 1 were considered important discriminators of the groups being compared.
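For illustration, Pareto scaling and the feature-selection rule can be written as follows. This is a minimal sketch rather than the EZInfo procedure: the function names are hypothetical, the sample standard deviation is assumed, and the p(corr) and VIP values would in practice come from the fitted OPLS-DA model rather than being computed here.

```python
import math
import statistics


def pareto_scale(values):
    """Mean-center a feature and divide by the square root of its SD.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # A constant feature carries no variation; return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / math.sqrt(sd) for v in values]


def is_discriminator(p_corr, vip, p_corr_cutoff=0.4, vip_cutoff=1.0):
    """Apply the selection rule |p(corr)| > 0.4 and VIP > 1."""
    return abs(p_corr) > p_corr_cutoff and vip > vip_cutoff


scaled = pareto_scale([2.0, 4.0, 6.0])

# Hypothetical (p(corr), VIP) pairs taken from an OPLS-DA model.
candidates = {"f1": (-0.55, 1.3), "f2": (0.20, 2.0), "f3": (0.60, 0.9)}
selected = [name for name, (pc, vip) in candidates.items()
            if is_discriminator(pc, vip)]
```

Only `f1` passes both thresholds in this example: `f2` fails the p(corr) cutoff and `f3` fails the VIP cutoff.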
Features were analyzed by two-way ANOVA with Benjamini-Hochberg false discovery rate (FDR) correction to find features that differed significantly between saline and cisplatin treatment. The DEGreport (1.30.3) R package was used to generate hierarchical clusters of the features that two-way ANOVA identified as significantly altered by treatment. The Z-scores presented in the time course figures (Figure 5) were calculated by centering each feature to its mean and dividing by the standard deviation of the feature. Clusters were selected for further analysis based on time course patterns of clinical interest, focusing on features that differed between saline- and cisplatin-treated mice at the early timepoints (days 1 and 2). Individual features that were significantly different by two-way ANOVA with FDR correction were further analyzed by pairwise t-tests comparing saline- vs. cisplatin-treated mice at each timepoint, with p-values adjusted for multiple comparisons using Bonferroni correction. p < 0.05 was considered statistically significant for all univariate data analyses.
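The Z-score calculation and the two multiple-comparison adjustments can be sketched as below. The code is a plain-Python illustration, not the R workflow used in the study: the function names are hypothetical, the sample standard deviation is assumed for the Z-scores, and the input p-values are invented for the example.

```python
import statistics


def z_scores(values):
    """Center a feature to its mean and divide by its standard deviation.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bonferroni(p_values):
    """Return Bonferroni adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented p-values, for illustration only.
p_raw = [0.01, 0.04, 0.03, 0.005]
p_bh = benjamini_hochberg(p_raw)
p_bonf = bonferroni(p_raw)
z = z_scores([1.0, 2.0, 3.0])
```

For the same inputs, Bonferroni-adjusted p-values are never smaller than the Benjamini-Hochberg ones, which reflects the more conservative control applied to the pairwise t-tests.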
Univariate and multivariate receiver operating characteristic (ROC) curves were generated using MetaboAnalyst 5.0. Multivariate ROC curves were generated using linear support vector machine classification, with features ranked from highest to lowest univariate area under the ROC curve (AUROC).
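As a rough illustration of the univariate ranking step, the AUROC of a single feature can be computed as the probability that a randomly chosen sample from one group scores higher than a randomly chosen sample from the other (ties counted as one half). The sketch below is not the MetaboAnalyst implementation: the function names, the 0/1 label coding, and the direction-agnostic convention of reporting max(AUC, 1 - AUC) are assumptions made for the example, and the linear support vector machine step is not shown.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def rank_features_by_auroc(features, labels):
    """Rank features from highest to lowest univariate AUROC.

    `features` maps a feature name to one value per sample; `labels` holds
    1 for one group (e.g. cisplatin) and 0 for the other (e.g. saline).
    Assumption: the direction of change is ignored, so a feature that is
    consistently lower in group 1 scores as well as one that is higher.
    """
    ranked = []
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        auc = auroc(pos, neg)
        ranked.append((name, max(auc, 1.0 - auc)))
    # sorted() is stable, so ties keep their original order.
    return sorted(ranked, key=lambda item: -item[1])


# Toy data: four samples, two per group.
labels = [0, 0, 1, 1]
features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, 3.0, 2.0, 4.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_features_by_auroc(features, labels)
```

Features `a` and `c` separate the two groups perfectly (in opposite directions), while `b` overlaps between groups and is ranked last.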