Test Methods
The index test was AAGL stage, and the reference standard was AAGL surgical complexity level. Using the coded surgical data, each participant's case was reviewed and an AAGL score, AAGL stage and AAGL surgical complexity level were retrospectively assigned in the same manner as for a patient being staged in real time at laparoscopy. Where detail on the size of lesions was missing, the maximum number of possible points for that region was used to calculate the AAGL score. Maximum scores were used so as not to underestimate the severity of disease.
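The handling of missing lesion sizes can be illustrated with a minimal sketch. The region names and point values below are hypothetical placeholders rather than the published AAGL values, and the function name is an assumption made for illustration; only the rule itself (substitute the regional maximum when size is unrecorded) comes from the method described above.

```python
from typing import Mapping, Optional

# Hypothetical maximum points per anatomical region.
# These are placeholders, NOT the published AAGL point values.
MAX_POINTS = {"region_a": 8, "region_b": 6, "region_c": 9}


def total_score(
    recorded: Mapping[str, Optional[int]],
    max_points: Mapping[str, int] = MAX_POINTS,
) -> int:
    """Sum regional points, using the regional maximum where size is missing.

    `recorded` maps each affected region to its points, or to None when
    disease was coded in that region but lesion size was not documented.
    Regions absent from `recorded` are treated as unaffected (0 points).
    """
    total = 0
    for region, points in recorded.items():
        if points is None:
            # Size detail missing: assign the maximum for this region
            # so that disease severity is not underestimated.
            total += max_points[region]
        else:
            total += points
    return total


# region_a has a documented score of 3; region_b has disease of unknown size.
example = total_score({"region_a": 3, "region_b": None})
print(example)
```

In this sketch a case with one documented region and one region of unknown lesion size receives the documented points plus the full regional maximum, which biases the score upward rather than downward, consistent with the stated intent.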
The de-identified database was presented to three expert observers who were either MIGS surgeons or fellows in their final year of MIGS training. First, the three observers were asked to allocate an AAGL surgical complexity level (A to D) for each case, as defined in the paper by Abrão et al (2). A single reference AAGL surgical complexity level allocation was then developed by consensus. Next, the three observers were blinded to the AAGL surgical complexity level and asked to allocate an AAGL stage for each case. Staging was allocated twice. For the first allocation run, observers were left to interpret the AAGL staging tool and schematic in the paper by Abrão et al (2) independently. The three observers then met, discussed the tool and developed consensus rules of interpretation. They then performed a second staging allocation run for each case, blinded to the first. The stages from the second allocation run were used in the final analysis, to optimise interobserver agreement.