AI tests and comparison with human doctors
An external test set of 812 images from 449 patients was used to
evaluate the performance of the AI networks. The diagnostic accuracy,
specificity, and sensitivity of the AI in identifying CNS malformations
were calculated, and ROC curves were generated to evaluate the
performance of the established AI algorithm.
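The paper does not include its evaluation code; the following is a
minimal sketch of how these metrics might be computed for a single
binary class (malformation vs. normal), assuming hypothetical arrays
y_true and y_score of per-image ground-truth labels and network scores
and a scikit-learn implementation:

```python
# Sketch of the per-class evaluation described above (assumed names;
# not the authors' code).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

def evaluate_binary(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and ROC/AUC for one class.

    y_true  -- ground-truth labels (1 = malformation, 0 = normal)
    y_score -- network confidence scores for the malformation class
    """
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    fpr, tpr, _ = roc_curve(y_true, y_score)   # points on the ROC curve
    auc = roc_auc_score(y_true, y_score)       # area under the ROC curve
    return accuracy, sensitivity, specificity, fpr, tpr, auc
```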
The performance of the AI was then compared with that of doctors, who
reviewed the same images in a separate reading test. In this test, the
images were shown one by one on a personal computer screen in random
order, and each image was accompanied by 13 diagnostic choices (12
types of CNS abnormality and normal).
Ultrasound doctors from different hospitals with varying levels of
expertise, defined as more than 10 years of experience (expert), 5-10
years (competent), and 1 year (trainee), reviewed each image, selected
the single best diagnosis, and moved on to the next image without the
option of returning to a previous one. The time taken to read each
image was recorded. All the doctors were blinded to the true diagnoses
of the images.
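As an illustration of this reading protocol (not the authors'
software), a forced-choice presentation loop with per-image timing
could be sketched as follows, where show_image and ask_choice are
hypothetical placeholders for the actual display and response-capture
calls:

```python
# Hypothetical sketch of the forced-choice reading test: images are
# shown in random order, each with 13 diagnosis options, with no
# backtracking and the reading time recorded per image.
import random
import time

DIAGNOSES = [f"abnormality_{i}" for i in range(1, 13)] + ["normal"]  # 13 choices

def run_reading_test(image_paths, show_image, ask_choice):
    """show_image and ask_choice stand in for the actual UI calls."""
    order = list(image_paths)
    random.shuffle(order)                 # random presentation order
    results = []
    for path in order:                    # one pass; no returning to prior images
        show_image(path)
        start = time.perf_counter()
        choice = ask_choice(DIAGNOSES)    # reader picks the single best diagnosis
        elapsed = time.perf_counter() - start
        results.append({"image": path, "diagnosis": choice, "seconds": elapsed})
    return results
```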