2.7 Data and Statistical Analysis
The raw intensity values of all features were log-transformed in
MetaboAnalyst 5.0 to reduce heteroscedasticity and correct skewed data
distributions. Before log transformation, zero values were replaced with
1/5 of the minimum positive intensity value of the corresponding feature.
Log-transformed feature intensities were used for all analyses unless
stated otherwise.
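The zero-replacement and log-transformation step can be sketched as follows. This is an illustrative sketch, not MetaboAnalyst's code: it assumes a samples-by-features matrix, a base-10 logarithm, and that "minimum intensity" means the smallest positive value of each feature. The function name `log_transform_with_zero_fill` is hypothetical.

```python
import numpy as np

def log_transform_with_zero_fill(X, fraction=0.2, base=10.0):
    """Replace zeros in each feature (column) with `fraction` times the
    smallest positive value of that feature, then log-transform.

    Assumptions (not taken from the source): rows are samples, columns are
    features, the log base is 10, and a feature with no positive values is
    set to NaN rather than guessed.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]  # a view, so edits below modify X in place
        positive = col[col > 0]
        if positive.size == 0:
            col[:] = np.nan
            continue
        col[col == 0] = fraction * positive.min()
    return np.log(X) / np.log(base)
```

For example, a feature with intensities `[0, 100]` becomes `[log10(20), log10(100)]`, because the zero is filled with 1/5 of 100.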
EZInfo 2.0 software (Umetrics, Umeå, Sweden) was used to perform
multivariate analysis of the metabolomics data. Data were mean-centered
and Pareto-scaled upon import into EZInfo 2.0. Principal component
analysis (PCA) was used to visualize unsupervised metabolic variation
between saline- and cisplatin-treated mice at each timepoint of the study.
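Mean-centering followed by Pareto scaling divides each centered feature by the square root of its standard deviation, which damps large-intensity features less aggressively than unit-variance scaling. The sketch below shows that scaling and a PCA computed by singular value decomposition; it illustrates the general technique and is not EZInfo's internal implementation. The function names are hypothetical.

```python
import numpy as np

def pareto_scale(X):
    """Mean-center each feature (column) and divide by sqrt(standard deviation)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    # Guard against constant features; their centered values are already 0.
    sd = np.where(sd == 0, 1.0, sd)
    return centered / np.sqrt(sd)

def pca_scores(X_scaled, n_components=2):
    """Return PCA scores and explained-variance ratios via SVD of the
    (already centered and scaled) data matrix."""
    U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equals X_scaled @ Vt[:k].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

Plotting the first two score columns, colored by treatment group, gives the kind of unsupervised overview described above.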
Orthogonal partial least squares discriminant analysis (OPLS-DA), a
supervised discriminatory method, was used for pairwise discrimination
of the treatment groups at each timepoint. For each OPLS-DA model,
metabolites were ranked by their correlation-scaled loading (p(corr))
and variable importance in projection (VIP) values to select a subset of
metabolites for identification. Features with |p(corr)| > 0.4 (i.e.,
p(corr) > 0.4 or p(corr) < -0.4) and VIP > 1 were considered important
discriminators of the groups being compared.
Features were analyzed by two-way ANOVA with Benjamini-Hochberg false
discovery rate (FDR) correction to identify features that differed
significantly between saline and cisplatin treatment. The DEGreport
(1.30.3) R package was used to generate hierarchical clusters of the
features that two-way ANOVA identified as significantly altered by
treatment.
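The Benjamini-Hochberg adjustment applied to the per-feature ANOVA p-values can be sketched as follows. The input is assumed to be one treatment-term p-value per feature; fitting the two-way ANOVA itself is not shown. The function name is hypothetical. The logic follows the standard step-up procedure, which is also what R's `p.adjust(method = "BH")` computes.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking the running minimum from the largest p downward.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q
```

For p-values `[0.01, 0.04, 0.03, 0.20]`, the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.20]`. A feature is then called significant if its adjusted value is below 0.05.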
The Z-scores presented in the time-course figures (Figure 5) were
calculated by centering each feature on its mean and dividing by the
standard deviation of that feature. Clusters were selected for further
analysis based on time-course patterns of clinical interest, focusing on
features that were differentially expressed between saline- and
cisplatin-treated mice at the early timepoints (days 1 and 2). Individual
features that were found to be significantly different by two-way ANOVA
and FDR correction were further analyzed by pairwise t-tests comparing
saline vs. cisplatin treated mice at each timepoint, with p-values
adjusted for multiple comparisons using Bonferroni correction.
A p-value < 0.05 was considered statistically significant for all
univariate analyses.
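The per-feature Z-scoring and the follow-up pairwise tests can be sketched together as below. The sketch makes several assumptions not stated in the source: the sample standard deviation (n - 1) is used for Z-scores; the pairwise comparison is an equal-variance two-sample t-test (`scipy.stats.ttest_ind` with its default settings); and the Bonferroni factor is the number of timepoints tested for that feature. The function names and the dictionary layout are hypothetical.

```python
import numpy as np
from scipy import stats

def zscore_features(X):
    """Center each feature (column) on its mean and divide by its
    sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pairwise_ttests_bonferroni(saline, cisplatin, alpha=0.05):
    """Compare saline vs. cisplatin for one feature at each timepoint.

    saline, cisplatin: dicts mapping timepoint -> 1-D array of that
    feature's log intensities. Returns {timepoint: (adjusted p, significant)}.
    """
    timepoints = sorted(saline)
    n_tests = len(timepoints)
    results = {}
    for tp in timepoints:
        _, p = stats.ttest_ind(saline[tp], cisplatin[tp])
        p_adj = min(p * n_tests, 1.0)  # Bonferroni: multiply by number of tests, cap at 1
        results[tp] = (p_adj, p_adj < alpha)
    return results
```

Capping the adjusted value at 1 keeps it interpretable as a probability. The 0.05 threshold is applied to the adjusted value.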
Univariate and multivariate receiver operating characteristic (ROC)
curves were generated using MetaboAnalyst 5.0. Multivariate ROC curves
were generated using linear support vector machine (SVM) classification,
with features ranked from highest to lowest univariate area under the
ROC curve (AUROC).
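The univariate AUROC ranking that feeds the multivariate models can be sketched as below. AUROC is computed as the probability that a randomly chosen cisplatin sample has a higher value than a randomly chosen saline sample, with ties counted as one half (the Mann-Whitney formulation). Two details are assumptions rather than documented MetaboAnalyst behavior. First, features are ranked direction-agnostically using max(AUROC, 1 - AUROC). Second, the top-ranked features would then be passed to a linear SVM, which is not shown here. The function names are hypothetical.

```python
import numpy as np

def auroc(values, labels):
    """P(value_pos > value_neg) over all positive/negative pairs; ties count 1/2."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = values[labels]
    neg = values[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def rank_features_by_auroc(X, labels):
    """Return feature indices ordered from highest to lowest AUROC, plus the AUROCs."""
    X = np.asarray(X, dtype=float)
    aucs = np.array([auroc(X[:, j], labels) for j in range(X.shape[1])])
    # Assumption: a feature that is lower in the positive group is treated
    # as equally discriminative as one that is higher.
    aucs = np.maximum(aucs, 1.0 - aucs)
    order = np.argsort(-aucs, kind="stable")
    return order, aucs
```

With labels `[0, 0, 1, 1]`, a feature that perfectly separates the groups scores 1.0 regardless of direction. A feature whose groups overlap on one of four pairs scores 0.75.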