Comparison of performance between the AI network and doctors
The AI system outperformed the average performance of the 13 doctors in the detection of all types of malformations, as shown in Table 3 and Figure 4a: the doctors' diagnostic accuracy [65.4% (95% CI 57.3–73.7%), p = 0.002], sensitivity [88.2% (95% CI 82.3–94.1%), p = 0.003], specificity [63.3% (95% CI 54.6–72.0%), p = 0.041], and AUC [0.758 (95% CI 0.694–0.821), p = 0.004] were all lower than those of the AI system.
When the AI performance was compared with that of the three groups of doctors separately, the AI model performed similarly to the expert doctors in terms of accuracy [78.9% (95% CI 75.2–82.5%), p = 0.528], sensitivity [77.5% (95% CI 73.7–81.4%), p = 0.521], and AUC [0.853 (95% CI 0.800–0.905), p = 0.681], whereas the AI outperformed the competent doctors [accuracy: 69.6% (95% CI 75.2–85.2%), p = 0.016; sensitivity: 67.5% (95% CI 59.7–75.3%), p = 0.021; AUC: 0.793 (95% CI 0.777–0.809), p = 0.001] and the trainees [accuracy: 51.5% (95% CI 39.4–63.6%), p = 0.001; sensitivity: 48.6% (95% CI 36.0–61.2%), p = 0.003; AUC: 0.654 (95% CI 0.538–0.770), p = 0.008]. However, the specificity of the AI did not differ significantly from that of any of the three groups of doctors. The performance comparison between the AI system and the doctor subgroups is shown in Table 3 and Figure 4b.
The developed AI algorithm could analyze 7–8 images per second and took only 113 s to complete the diagnosis of the 812 ultrasound images. This was significantly less than the average time taken by the 13 doctors (113 s vs. 11,571 s, p = 0.001). In the subgroup comparisons, the AI diagnosis time was also shorter than that of each of the three groups of doctors [113 s vs. 8,864 s (expert), p = 0.02; vs. 12,801 s (competent), p = 0.003; vs. 12,663 s (trainee), p = 0.001].
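As a quick arithmetic check of the timing figures above (a minimal sketch using only the numbers reported in this section, not part of the study's analysis):

```python
# Sanity check of the reported throughput: 812 images diagnosed in 113 s.
total_images = 812
ai_seconds = 113

# Throughput in images per second; should fall in the stated 7-8 images/s range.
images_per_second = total_images / ai_seconds  # ~7.2 images/s

# Fold speed-up of the AI over the doctors' average total reading time (11,571 s).
doctor_seconds = 11571
speedup = doctor_seconds / ai_seconds  # ~102-fold faster

print(f"{images_per_second:.1f} images/s, {speedup:.0f}x faster than the doctors' average")
```

The computed throughput (about 7.2 images/s) is consistent with the stated 7–8 images per second, and the 113 s total corresponds to roughly a 102-fold reduction in reading time relative to the doctors' average.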