In Table 3, for the T2 sequence, BFC with either scaled or whitening outperforms the baselines. In particular, BFC with whitening achieves the best AUCs of 0.91 and 0.80 on P-x and LC-A, respectively. However, these findings are not consistent with the results for ADC and hDWI. For ADC, the models preprocessed with BFC or NF underperform the baselines; instead, the baseline models obtain the highest AUCs, with scaled alone and whitening alone achieving 0.73 and 0.72 on P-x and LC-A, respectively. For hDWI, BFC and NF contribute only limited improvement over the baselines. On P-x, the AUC increases modestly from 0.73 (scaled only) to 0.80 (scaled with NF); on LC-A, the best AUC is only 0.65, obtained using scaled with BFC. Across the three sequences, these results show that the preprocessing approaches can improve CM-Net's classification performance for some sequences when our two datasets are combined. However, none of the methods considerably boosts the generalization of the joint models compared with the separate models for P-x and LC-A (Table 2). This indicates that the preprocessing methods are probably insufficient to resolve domain shift fundamentally. A possible reason is that the severe discrepancies stem from inter-site differences (Table 1) rather than only from the intensity distributions of the heterogeneous mpMRI sequences (see details in Supplementary Figure 2).
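The scaled and whitening options compared above are intensity normalizations. The sketch below assumes that "scaled" denotes per-image min-max scaling to [0, 1] and "whitening" denotes per-image z-score normalization; the definitions used in this study may differ, and BFC and NF are not modeled. The function names are hypothetical. Both transforms are affine maps of the intensities, so they remove global offset and gain differences between sites but leave nonlinear intensity differences and anatomical or acquisition differences untouched. This offers one way to see why such preprocessing alone may not resolve domain shift.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_max_scale(intensities: Sequence[float]) -> List[float]:
    """Assumed 'scaled' preprocessing: map one image's intensities linearly onto [0, 1]."""
    lo, hi = min(intensities), max(intensities)
    span = hi - lo
    if span == 0:  # constant image: nothing to scale
        return [0.0 for _ in intensities]
    return [(v - lo) / span for v in intensities]


def whiten(intensities: Sequence[float]) -> List[float]:
    """Assumed 'whitening' preprocessing: shift and rescale one image's intensities
    to zero mean and unit (population) standard deviation."""
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:  # constant image: no spread to normalize
        return [0.0 for _ in intensities]
    return [(v - mu) / sigma for v in intensities]


# Hypothetical voxel intensities from two sites that differ only by a global gain of 10.
site_a = [100.0, 200.0, 300.0]
site_b = [1000.0, 2000.0, 3000.0]
print(min_max_scale(site_a), min_max_scale(site_b))  # both [0.0, 0.5, 1.0]
```

Because the two hypothetical sites differ only by a linear gain, both normalizations map them to the same values; any discrepancy that survives such normalization must come from nonlinear or structural differences.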
Cross-domain Malignancy Classification and Lesion Detection
We emphasize the importance of transferring knowledge from a large-scale public dataset to a small-scale target domain. We evaluate the malignancy estimation performance of CMD²A-Net (the architecture is shown in Figure 4 and described in detail in the Methods section). The P-x dataset serves only as a source domain, whereas either LC-A or LC-B is also used as the source domain for knowledge transfer between the local cohorts. The scaled method was employed for image preprocessing. Because the types of available MR sequences may vary across healthcare institutions, we employed ensemble learning[4] to handle multiple sequences, allowing either a single sequence or multiple sequences to be used in our framework. Three common metrics were adopted to evaluate classification performance: AUC, sensitivity (SEN), and specificity (SPE).
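The sequence ensemble and the three metrics can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the CMD²A-Net implementation: it assumes the ensemble averages the malignancy probabilities of per-sequence models (soft voting), that AUC is computed as the probability that a malignant case outscores a benign one (ties counted as 0.5), and that SEN and SPE use a decision threshold of 0.5. All function names and scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ensemble_probabilities(per_sequence: Dict[str, Sequence[float]]) -> List[float]:
    """Assumed soft-voting ensemble: average the malignancy probabilities that each
    available sequence model (e.g. T2, ADC, hDWI) assigns to the same cases.
    Works with a single sequence or with several."""
    if not per_sequence:
        raise ValueError("at least one sequence is required")
    columns = list(per_sequence.values())
    n_cases = len(columns[0])
    if any(len(col) != n_cases for col in columns):
        raise ValueError("every sequence model must score the same cases")
    return [sum(col[i] for col in columns) / len(columns) for i in range(n_cases)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """SEN = TP / (TP + FN) and SPE = TN / (TN + FP) at an assumed threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_malignant = s >= threshold
        if y == 1:
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    sen = tp / (tp + fn) if tp + fn else float("nan")
    spe = tn / (tn + fp) if tn + fp else float("nan")
    return sen, spe


# Hypothetical scores for four cases (1 = malignant, 0 = benign) from two sequence models.
labels = [1, 1, 0, 0]
probs = ensemble_probabilities({"T2": [0.9, 0.6, 0.4, 0.2],
                                "ADC": [0.7, 0.5, 0.3, 0.1]})
print(auc(labels, probs), sensitivity_specificity(labels, probs))  # prints 1.0 (1.0, 1.0)
```

Averaging probabilities lets the same code run whether an institution provides one sequence or several; other combination rules (weighted averaging, majority voting) would slot into `ensemble_probabilities` without changing the metric functions.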
Table 4. Malignancy classification results in the target domains for four source-target domain combinations.