Experiments and Results

Data analysis

We started with 626 pregnant women with gestational ages between 22 and 26 weeks who underwent prenatal examinations at the First Hospital of Jilin University in China from January 2018 to June 2019. Of these, 90 cases had a lateral ventricular width of 10 mm or more, and 16 cases had a lateral ventricular width greater than 15 mm. The 90 ventriculomegaly cases and the cases with lateral ventricular width near 10 mm were selected from 22616 pregnant women; the remaining normal cases were randomly selected from all normal cases. The average lateral ventricular (LV) width, defined as the larger of the left and right lateral ventricle widths, was 7 mm across the 626 cases (see Figure S1(a)). The average gestational age was 23.8 weeks. There were 49212 stored freeze-frame images (Figure S2), a mean of 78.6 images per case. Each frame had a size of 768x576 pixels.

Picking out brain images

We randomly selected 70 cases as the validation set and another 70 cases as the test set. The remaining 486 cases formed the training set, which contained 2731 brain images and 35687 other images. The 2731 brain images and the same number of randomly selected other images were used for classification training. Training was terminated after 20 epochs, and the model with the best overall validation accuracy was chosen as the final model. A total of 376 brain images and 4967 other images from the 70 test cases were successfully tested, and the overall test accuracy was 99.8%. The classification accuracy was 100% (376/376) for the brain images and 99.8% (4955/4967) for the other images. The sensitivity and precision (positive predictive value) for brain images were 100% (376/376) and 96.9% (376/388), respectively.
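The percentages in this subsection follow directly from four confusion counts. The minimal Python sketch below recomputes them. The helper name `binary_metrics` is hypothetical and not part of the original pipeline; the counts are taken from the test results reported above, where 4967 - 4955 = 12 other images were labeled as brain images.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return standard binary classification metrics from confusion counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Brain images are the positive class; counts come from the reported test set.
m = binary_metrics(tp=376, fn=0, fp=4967 - 4955, tn=4955)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

In this sketch the ratio 376/388 is the `precision` entry, because 388 is the number of images predicted to be brain images (376 correct plus 12 incorrect).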

Picking out TV and TT planes and localization of brain region

We randomly selected 60 cases as the test set. The remaining 566 cases, which had 2094 TV-TT plane images and 1044 other images, formed the training set. The 1044 other images and the same number of randomly selected TV-TT plane images were used for training. Training was terminated after 20 epochs, and the last model was chosen as the final model. A total of 210 TV-TT plane images and 108 other images were successfully tested. AP@0.5 and AP@0.75 were both 0.992, mAP@[.5,.95] was 0.92, and mAR@[.5,.95] was 0.945. For each image, we then took the first detected object, i.e., the one with the highest confidence percentage, as the result. The overall test accuracy was 98.1% (312/318). The detection accuracy was 97.6% (205/210) for the TV-TT plane images and 99.1% (107/108) for the other images. The sensitivity and precision (positive predictive value) for TV-TT plane images were 97.6% (205/210) and 99.5% (205/206), respectively.
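To make the detection step concrete, the following Python sketch shows how the highest-scoring detection can be selected and how the intersection over union (IoU) behind AP@0.5 and AP@0.75 is computed. It is an illustration under stated assumptions, not the detector used in this study: the `Detection` type, the function names, and all box coordinates and scores are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


class Detection(NamedTuple):
    box: Box
    score: float  # detector confidence in [0, 1]


def top_detection(dets: Sequence[Detection]) -> Optional[Detection]:
    """Return the detection with the highest score, or None if there is none."""
    return max(dets, key=lambda d: d.score, default=None)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth brain region and detector output for one frame.
truth: Box = (100, 100, 500, 400)
dets = [Detection((120, 110, 510, 400), 0.97), Detection((300, 50, 700, 300), 0.41)]

best = top_detection(dets)
overlap = iou(truth, best.box)
print(f"score={best.score:.2f}, IoU={overlap:.3f}, "
      f"hit@0.5={overlap >= 0.5}, hit@0.75={overlap >= 0.75}")
```

A detection counts as correct at AP@0.5 when its IoU with the annotated region is at least 0.5, and at AP@0.75 when it is at least 0.75.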

Predicting the lateral ventricular width

The lateral ventricular width shown in each brain region image was determined by doctors. Of the 2304 TV-TT planes, 1431 had a confirmed lateral ventricular width. The other planes either did not show the lateral ventricle clearly or did not allow the lateral ventricular width to be determined.
We performed two experiments. The first used all 626 cases, corresponding to 1431 images with known lateral ventricular width, to train and test the regression model. The second used the 610 cases with lateral ventricular width less than 15 mm, corresponding to 1351 images, to train and test the model.
For the first experiment, 60 cases were randomly selected as the test set, which had 141 images. Another 60 cases were randomly selected as the validation set, which had 132 images. The remaining 506 cases, which had 1158 images, formed the training set. Training was terminated after 100 epochs, and the model with the lowest mean squared error (MSE) was chosen as the final model. The mean absolute error (MAE) on the test set was 1.01 mm. More than 65% of the test images had an absolute error of less than 1 mm (Figure 2(a), Figure S3(a) and Table S1).
For the second experiment, 58 cases were randomly selected as the test set, which had 107 images. Another 58 cases were randomly selected as the validation set, which had 118 images. The remaining 495 cases, which had 1124 images, formed the training set. Training was terminated after 100 epochs, and the model with the lowest MSE was chosen as the final model. The MAE on the test set was 0.54 mm. More than 82% of the test images had an absolute error of less than 1 mm (Figure 2(b), Figure S3(b) and Table S2).
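The two image-level quantities reported for each experiment, the MAE and the share of images with an error below 1 mm, can be computed as in the short Python sketch below. The function names and the four width pairs are hypothetical and serve only to illustrate the calculation; they are not data from this study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all images, in mm."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def fraction_within(y_true, y_pred, tol_mm=1.0):
    """Share of images whose absolute error is strictly below tol_mm."""
    hits = sum(abs(t - p) < tol_mm for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical LV widths (mm): doctor-confirmed values and model predictions.
true_mm = [6.0, 7.5, 9.0, 12.0]
pred_mm = [6.4, 7.0, 10.5, 11.8]

mae = mean_absolute_error(true_mm, pred_mm)
share = fraction_within(true_mm, pred_mm)
print(f"MAE = {mae:.2f} mm, share under 1 mm = {share:.0%}")
```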
We also evaluated how well the two models predict lateral ventricular width at the case level. For each test case, we set the predicted LV width to the largest predicted LV width over all of its TV and TT planes. For the first model, 235 TV and TT planes from the 60 test cases were tested, and the MAE was 1.47 mm (Figure 2(c), Figure S3(c) and Table S3). For the second model, 203 TV and TT planes from the 58 test cases were tested, and the MAE was 0.73 mm (Figure 2(d), Figure S3(d) and Table S4). With a threshold of 10 mm for both models, the sensitivity was 100% (8/8) and 75% (6/8), and the precision was 57% (8/14) and 86% (6/7), respectively (Figure 2(c-d)).
Figure 2(c) and Figure S3(c) show one case with a large prediction error of 9.2 mm: the true LV width was 4.4 mm and the predicted width was 13.6 mm. We analyzed the prediction result for this case. It had three TV or TT planes, with predicted LV widths of 4.94 mm, 5.60 mm and 13.6 mm, respectively (Figure S4). Under the rule we used, the predicted LV width of this case was therefore set to 13.6 mm. We found that the last image (Figure S4(c)) was not a regular TV or TT plane, so the 9.2 mm error does not reflect the model's behavior on regular planes.
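The case-level rule and the 10 mm threshold described above can be sketched in Python as follows. The function names and case identifiers are hypothetical. The three plane predictions and the true width of `case_A` are those of the outlier case discussed above; `case_B` and `case_C` are invented for illustration. Treating a width of exactly 10 mm as positive is an assumption that follows the definition of ventriculomegaly used in the data description.

```python
def case_level_width(plane_preds):
    """Map each case to the largest predicted LV width over its planes."""
    widths = {}
    for case_id, width_mm in plane_preds:
        widths[case_id] = max(width_mm, widths.get(case_id, width_mm))
    return widths


def screening_counts(pred_by_case, true_by_case, threshold_mm=10.0):
    """Count true/false positives and negatives at the given width threshold."""
    tp = fp = fn = tn = 0
    for case_id, true_mm in true_by_case.items():
        flagged = pred_by_case[case_id] >= threshold_mm   # assumption: >= threshold
        positive = true_mm >= threshold_mm
        if flagged and positive:
            tp += 1
        elif flagged:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


plane_preds = [
    ("case_A", 4.94), ("case_A", 5.60), ("case_A", 13.6),  # outlier case from the text
    ("case_B", 11.2), ("case_B", 10.1),                    # hypothetical
    ("case_C", 6.3),                                       # hypothetical
]
true_by_case = {"case_A": 4.4, "case_B": 10.8, "case_C": 6.0}

pred_by_case = case_level_width(plane_preds)
tp, fp, fn, tn = screening_counts(pred_by_case, true_by_case)
print(pred_by_case)
print(f"sensitivity = {tp / (tp + fn):.0%}, precision = {tp / (tp + fp):.0%}")
```

Because the rule takes the maximum over planes, a single irregular plane is enough to turn a normal case into a false positive, which is what happens to `case_A` here.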

Interpretation of the results using heat maps

We generated heat maps and their corresponding overlay images for all test images (Figure 3 and Figure 4). All results were reviewed by an expert. For the first experiment, 97 of the 141 heat maps were activated on/around the lateral ventricular regions. Moreover, all 141 heat maps were activated at the upper-left corner. Figure 3 shows some examples. For the second experiment, 74 of the 107 heat maps were activated on/around the lateral ventricular regions, and 28 of these were also activated on other regions. Another 34 heat maps were not activated on/around the lateral ventricular regions. Figure 4 shows some examples.
We can see that for images with large lateral ventricular width, the heat maps were activated on/around the lateral ventricular regions, as we expected. We performed further analysis to investigate this phenomenon. Figure S5(a) and Figure S5(c) show the distribution of lateral ventricular width for images whose heat maps did not activate the lateral ventricular regions in the first and second experiments, respectively. Compared with Figure S5(b) and Figure S5(d), which show the corresponding distributions for images whose heat maps did activate the lateral ventricular regions, the mean LV width was much smaller (p<0.001 for both experiments).
These results indicate that the regression models can successfully locate the lateral ventricular regions in images with large lateral ventricular width and then predict the width from these regions with small error.
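The comparison of mean LV width between images with and without activation on the lateral ventricular regions can be illustrated with a two-sample test. The text does not state which test produced the reported p-values, so the Python sketch below uses a permutation test on the difference of means purely as an assumption. The function name and both width lists are hypothetical and are not data from this study.

```python
import random


def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)            # fixed seed for reproducibility
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # random relabeling of the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:     # small tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical LV widths (mm) for the two groups of test images.
not_activated = [5.1, 5.8, 6.0, 6.3, 6.9, 7.2]
activated = [7.8, 8.5, 9.9, 10.4, 12.1, 15.3]

p = permutation_p_value(not_activated, activated)
print(f"p = {p:.4f}")
```

A permutation test makes no assumption about the shape of the width distributions, which is convenient when one group contains a few very large values.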