Statistical Analysis
Sample size calculation: The null hypothesis was that the mean change in
composite score B at 12 weeks does not differ between the two treatment
groups. A previous study of 375 women with breast cancer randomised to
placebo reported a mean Hot Flush Score at randomisation of 15.7
(SD=11.7). A 3.6-point (~25%) reduction in score was reported in that
study, and a similar reduction was expected in women randomised to
placebo in this trial. The standard deviation of the
change from baseline was reported as 7.1. A clinically relevant
reduction was defined as an additional ≥20% reduction with folic acid
over and above the placebo effect, which translates to a ≥7-point
(~45%) reduction in Hot Flush Score at 12 weeks. To
detect a true 3.4-point mean difference in the change in Hot Flush Score
with folic acid compared to placebo (a 7-point reduction less the
3.6-point placebo reduction), using a two-sided type 1 error α = 0.05,
80% power and a within-group standard deviation of 7.1 for the
change from baseline, 70 patients were required per arm, i.e. 140 in
total. We planned to include 162 women in the study to account for a
15% rate of loss to follow-up. The statistical analysis plan is
uploaded as Supplementary Document 3.
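
The arithmetic behind the sample size can be checked with a short sketch. It assumes the standard normal-approximation formula for comparing two means, plus Guenther's small-sample correction for the t-test; the source does not state which formula or software routine was used, and the function name `n_per_arm_normal` is illustrative only. The step from 140 to 162 is reproduced here by inflating by 15% and rounding up to an even number, which is one plausible reading and not a documented step.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_normal(delta, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sample comparison of means
    (normal approximation, equal allocation, two-sided alpha)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


delta = 3.4  # target difference in change in Hot Flush Score
sd = 7.1     # within-group SD of the change from baseline

n_normal = n_per_arm_normal(delta, sd)

# Guenther's correction: add z_(1-alpha/2)^2 / 4 per arm to approximate
# the sample size required by a t-test rather than a z-test.
z_alpha = NormalDist().inv_cdf(0.975)
n_t = ceil(n_normal + z_alpha ** 2 / 4)

total = 2 * n_t

# Assumption: inflate by 15% for loss to follow-up, then round up to an
# even number so that both arms are the same size.
inflated = ceil(total * 115 / 100)
planned = inflated + inflated % 2

print(ceil(n_normal), n_t, total, planned)
```

Under these assumptions the normal approximation alone gives 69 per arm, and the t-test correction brings this to the 70 per arm quoted in the text.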
Analysis: All outcome measures were recorded longitudinally at
screening, 4, 8 and 12 weeks. The primary analysis compared the
treatment groups in terms of the primary outcome measure, change in
daily Hot Flush Score at 12 weeks from randomisation, using a two-sample
t-test. A significance level of p < 0.05 was used. The
primary outcome measure was also analysed using a linear regression
model, which estimated treatment effects adjusted for clinically relevant
baseline covariates (number of hot flushes at screening and folate level
at baseline) and stratification factors (healthy versus cancer as a
categorical variable and BMI as a continuous variable). All outcomes were analysed using
multi-level mixed effects models, where repeated measurements from
baseline through to 12 weeks were analysed as random effects and
clinically relevant baseline covariates and stratification factors were
forced into the model as fixed effects. Where the shape of the data
appeared to be quadratic over time (week), time was included as a
quadratic term in the model. The mean change in serum folate at week 12
from baseline was compared between the groups using a two-sample t-test.
A planned sensitivity analysis was performed which accounted for missing
data via multiple imputation for the primary outcome analysis. Women
were required to have data available for week 1 to be included. The
imputation used a regression-based model with a bootstrap approach.
For women with complete data up to a particular
week, a multiple regression model was developed that included the
outcome at that visit as the dependent variable and outcomes at previous
visits, treatment, site, and stratification variables as independent
variables. Models were constructed separately for subsequent visits.
Missing values were imputed sequentially from week 2 to week 12.
This was repeated 100 times, resulting in 100 complete analysis
datasets. Each dataset was analysed separately and the results were then
combined into one inference.30 A sensitivity analysis using the Last
Observation Carried Forward (LOCF) imputation procedure was also
performed, which used the last observed value for a participant to fill
in missing values. The sensitivity analyses were performed both unadjusted
and adjusted as described above. Stata v16.0 was used for the analysis.
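
The step of combining the 100 imputed-data analyses into one inference can be illustrated with a minimal sketch. It assumes Rubin's rules, the usual pooling method for multiple imputation; the text cites reference 30 without naming the method, so this is an assumption. The function name `pool_rubin` and the numbers in the usage example are hypothetical and are not trial data.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-dataset estimates and standard errors with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total_variance)


# Hypothetical treatment-effect estimates from three imputed datasets.
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(round(est, 3), round(se, 3))
```

The pooled standard error exceeds the average per-dataset standard error because the between-imputation variance carries the extra uncertainty due to the missing data.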