Statistical Analysis
Sample size calculation: The null hypothesis was that the mean change in composite score B at 12 weeks does not differ between the two treatment groups. Published data from 375 women with breast cancer randomised to placebo reported a mean Hot Flush Score at randomisation of 15.7 (SD = 11.7) and a 3.6-point (~25%) reduction on placebo, with a standard deviation of 7.1 for the change from baseline. A reduction of this size was therefore expected in women randomised to placebo. A clinically relevant effect was defined as an additional reduction of ≥20% with folic acid over and above the placebo effect. This corresponds to a total reduction of ≥7 points (~45%) in Hot Flush Score at 12 weeks, and therefore to a between-group difference of 3.4 points (7.0 − 3.6). To detect a true 3.4-point mean difference in the change in Hot Flush Score between folic acid and placebo, with a two-sided type 1 error of α = 0.05, 80% power and a within-group standard deviation of 7.1 for the change from baseline, 70 patients were required per arm (140 in total). To allow for an anticipated 15% loss to follow-up, we planned to recruit 162 women. The statistical analysis plan is provided as Supplementary Document 3.
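The sample size arithmetic above can be reproduced approximately with the standard formula for comparing two means. This is a sketch, and the exact formula the authors used is an assumption. It applies the normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² σ²/δ² and adds the small-sample correction z_{1-α/2}²/4, which approximates a t-test-based calculation. The function name `n_per_arm` and the 15% inflation convention are illustrative. With δ = 3.4, σ = 7.1, α = 0.05 and 80% power, the unrounded result is about 69.4, so 70 women are needed per arm.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Per-arm sample size to detect a mean difference `delta` (two-sided test).

    Normal-approximation formula plus a z_{1-alpha/2}^2 / 4 correction,
    which approximates the corresponding t-test calculation (assumed method).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # ~1.960 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # ~0.842 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    n += z_alpha ** 2 / 4                 # small-sample correction
    return math.ceil(n)

baseline_mean = 15.7   # mean Hot Flush Score at randomisation (published data)
placebo_drop = 3.6     # reduction expected on placebo
target_drop = 7.0      # clinically relevant total reduction with folic acid
sd_change = 7.1        # SD of the change from baseline

delta = target_drop - placebo_drop            # 3.4-point between-arm difference
extra_pct = delta / baseline_mean             # ~0.22, the additional >=20% reduction
per_arm = n_per_arm(delta, sd_change)         # 70
total = 2 * per_arm                           # 140

# Assumed convention: inflate by 15% (integer maths avoids float rounding),
# then round up to an even number so both arms are the same size.
inflated = math.ceil(total * 115 / 100)       # 161
recruit = inflated + inflated % 2             # 162
print(per_arm, total, round(extra_pct, 2), recruit)
```

The figure of 162 is consistent with multiplying 140 by 1.15 (161) and rounding up to an even number, giving 81 women per arm. Dividing 140 by 0.85, another common convention for loss to follow-up, would instead give 165.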
Analysis: All outcome measures were recorded longitudinally at screening and at 4, 8 and 12 weeks. The primary analysis used a two-sample t-test to compare the treatment groups on the primary outcome, which was the change in daily Hot Flush Score from randomisation to 12 weeks. Statistical significance was set at p < 0.05. The primary outcome was also analysed with a linear regression model that estimated the treatment effect adjusted for clinically relevant baseline covariates and for the stratification factors. The baseline covariates were the number of hot flushes at screening and the baseline folate level. The stratification factors were healthy versus cancer (categorical) and BMI (continuous). All outcomes were analysed with multilevel mixed-effects models. In these models, the repeated measurements from baseline to 12 weeks were modelled with random effects, and the clinically relevant baseline covariates and stratification factors were forced into the model as fixed effects. Where the data appeared quadratic over time (weeks), time was included as a quadratic term. The mean change in serum folate from baseline to week 12 was compared between the groups using a two-sample t-test.
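The unadjusted primary comparison is a pooled-variance (equal-variance) two-sample t-test on the change from randomisation to 12 weeks. The sketch below implements it with the Python standard library only. The data are hypothetical and the function names are illustrative. To avoid a statistics package, the p-value comes from numerically integrating the t density with Simpson's rule. In practice, Stata's `ttest` command with the `by()` option or SciPy's `scipy.stats.ttest_ind` would be used. The covariate-adjusted regression and the mixed-effects models are not reproduced here.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, 1 - 2 * integral of the t density over [0, |t|].

    The integral uses composite Simpson's rule; steps must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def pooled_t_test(x, y):
    """Equal-variance two-sample t-test of mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    diff = mean(x) - mean(y)
    t = diff / se
    return diff, t, df, two_sided_p(t, df)

# Hypothetical 12-week changes in Hot Flush Score (negative = improvement).
folic_acid = [-4.0, -5.0, -6.0, -7.0, -8.0]
placebo = [-1.0, -2.0, -3.0, -4.0, -5.0]
diff, t, df, p = pooled_t_test(folic_acid, placebo)
print(f"difference={diff:.1f} t={t:.2f} df={df} p={p:.4f}")
```

For these made-up data, the folic acid minus placebo difference is −3.0 points, with t = −3.0 on 8 degrees of freedom and a two-sided p of about 0.017. The same function could be applied to the week-12 change in serum folate.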
A planned sensitivity analysis of the primary outcome accounted for missing data using multiple imputation. Women were required to have data available at week 1 to be included. Imputation used a regression-based model with a bootstrap approach. For women with complete data up to a given week, a multiple regression model was fitted with the outcome at that visit as the dependent variable. The independent variables were the outcomes at previous visits, treatment, site and the stratification variables. A separate model was constructed for each subsequent visit, and missing values were imputed sequentially from week 2 to week 12. This process was repeated 100 times to give 100 complete datasets. Each dataset was analysed separately, and the results were combined into a single inference.30 A further sensitivity analysis used Last Observation Carried Forward (LOCF) imputation, in which missing values were replaced with the participant's last observed value. Both sensitivity analyses were performed unadjusted and adjusted as described above. All analyses were performed in Stata v16.0.
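Two parts of the missing-data handling are simple to show in isolation: LOCF, and the pooling of the per-imputation results into one estimate. The sketch below assumes the pooling uses Rubin's rules, which is the standard approach. The document cites ref. 30 for this step but does not name the method. The data, estimates and function names are hypothetical, and the sequential regression-based bootstrap imputation itself is not reproduced.

```python
import math
from statistics import mean, variance

def locf(visits):
    """Last Observation Carried Forward for one participant's visit series.

    None marks a missing visit; it is replaced by the most recent observed
    value. Gaps before the first observation stay None.
    """
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates with Rubin's rules (assumed method).

    Returns (pooled estimate, total standard error). Needs m >= 2 imputations.
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # average of the m estimates
    w = mean(se ** 2 for se in std_errors)         # within-imputation variance
    b = variance(estimates)                        # between-imputation variance
    total_var = w + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)

# Hypothetical weekly Hot Flush Scores for one participant (None = missing).
print(locf([20.0, 18.5, None, None, 15.0, None]))

# Hypothetical treatment-effect estimates from three imputed datasets.
pooled, pooled_se = pool_rubin([-3.0, -3.4, -3.2], [1.0, 1.0, 1.0])
print(round(pooled, 2), round(pooled_se, 3))
```

In the LOCF example, the two missing weeks after the second visit take the value 18.5 and the final missing week takes 15.0. For the three made-up estimates, the pooled difference is −3.2 with a total standard error of about 1.03. This is slightly larger than the within-imputation value of 1.0 because of between-imputation variability. A full analysis would also compute Rubin's degrees of freedom for the t reference distribution, which is omitted here.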