Results
A total of 272 patients were included in the final analysis. The database used in the Leonardi et al. (19) and Espada et al. (18) papers contained 204 patients; this was reduced to 194 after incomplete data were identified in 10 cases. The database used in the Rao et al. (20) paper included 78 patients, all of whom had complete data and were therefore included.
Summary data are presented in Tables 1 and 2. Overall, the AAGL stage assigned by the three observers correctly predicted the corresponding AAGL surgical complexity level in 175 to 180 of the 272 cases (64.3 to 66.2%). The overall performance of the AAGL system in predicting the AAGL level of laparoscopic surgical complexity, in terms of kappa and weighted kappa scores, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio and negative likelihood ratio, is summarised in Tables 3, 4 and 5. The best performance among the three observers for sensitivity, specificity, PPV and NPV (95% CIs) for stage 1 to predict complexity level A was 98.7% (96.8 to 100.5), 64.2% (55.8 to 72.8), 77.0% (71.0 to 82.9) and 97.5% (94.1 to 100.9), respectively. For stage 2 to predict level B, the corresponding values were 30.4% (11.6 to 49.2), 95.6% (93.0 to 98.1), 35.3% (12.6 to 58.0) and 93.5% (90.5 to 96.6), respectively. For stage 3 to predict level C, they were 10.0% (3.4 to 16.5), 94.8% (91.6 to 97.9), 42.1% (19.9 to 64.3) and 71.5% (65.9 to 77.1), respectively. For stage 4 to predict level D, they were 95.0% (85.4 to 104.5), 91.7% (88.2 to 95.1), 47.5% (32.0 to 63.0) and 99.6% (98.7 to 100.4), respectively. The performance of score thresholds 8, 15 and 21 for predicting the corresponding skill levels (A – D) is reported in Table 6, and the corresponding ROC curves are shown in Figure 1.
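For readers who wish to check the arithmetic behind these percentages, the following is a minimal Python sketch of how a proportion and a 95% confidence interval can be computed from counts. It uses an unclipped normal-approximation (Wald) interval. Several of the intervals reported above extend beyond 100%, which this type of interval can produce, but whether the authors used this method is an assumption. The function names and the 2x2 argument names (`tp`, `fp`, `fn`, `tn`) are hypothetical and do not come from the study.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) in percent, rounded to one decimal.

    Uses the normal-approximation (Wald) interval without clipping,
    so the bounds may fall below 0% or above 100%.
    """
    if total <= 0:
        raise ValueError("total must be a positive count")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return tuple(round(100 * v, 1) for v in (p, p - half_width, p + half_width))


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table of counts.

    tp, fp, fn, tn are hypothetical argument names for true positives,
    false positives, false negatives and true negatives.
    """
    return {
        "sensitivity": proportion_with_wald_ci(tp, tp + fn),
        "specificity": proportion_with_wald_ci(tn, tn + fp),
        "ppv": proportion_with_wald_ci(tp, tp + fp),
        "npv": proportion_with_wald_ci(tn, tn + fn),
    }


# Overall accuracy range quoted in the text: 175 to 180 correct out of 272.
lowest_accuracy = proportion_with_wald_ci(175, 272)[0]
highest_accuracy = proportion_with_wald_ci(180, 272)[0]
print(lowest_accuracy, highest_accuracy)
```

With the counts stated in the text, the two point estimates come out at 64.3 and 66.2, matching the reported accuracy range. An interval method bounded to the range 0 to 100%, such as the Wilson score interval, would be an alternative design choice when estimates lie close to 100%.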