Preprocessing method   T2 P-x   T2 LC-A   ADC P-x   ADC LC-A   hDWI P-x   hDWI LC-A
Scaled                 0.89     0.67      0.73      0.54       0.73       0.54
Whitening              0.87     0.73      0.65      0.72       0.56       0.54
Scaled + BFC           0.90     0.71      0.67      0.65       0.76       0.65
Whitening + BFC        0.91     0.80      0.68      0.68       0.73       0.55
Scaled + NF            0.89     0.75      0.66      0.61       0.80       0.56
Whitening + NF         0.84     0.72      0.64      0.66       0.79       0.57
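The table does not define its preprocessing variants. Assuming "Scaled" denotes min-max rescaling of voxel intensities to [0, 1] and "Whitening" denotes per-volume z-score normalization (both are assumptions, not confirmed by the source), the two base variants could be sketched as below. The BFC and NF steps are not modeled, and the function names are hypothetical.

```python
import numpy as np


def scale_intensity(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Assumed 'Scaled' variant: min-max rescaling of intensities to [0, 1]."""
    v = volume.astype(np.float64)
    vmin, vmax = v.min(), v.max()
    # eps guards against division by zero for a constant-intensity volume
    return (v - vmin) / (vmax - vmin + eps)


def whiten_intensity(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Assumed 'Whitening' variant: z-score (zero mean, unit std) normalization."""
    v = volume.astype(np.float64)
    return (v - v.mean()) / (v.std() + eps)


vol = np.array([[0.0, 50.0], [100.0, 200.0]])
print(scale_intensity(vol))   # values close to [[0, 0.25], [0.5, 1]]
print(whiten_intensity(vol))  # mean close to 0, standard deviation close to 1
```

Min-max scaling keeps the shape of the intensity histogram but is sensitive to outlier voxels, whereas z-scoring aligns first and second moments across scans; which one helps more depends on the sequence, consistent with the mixed pattern in the table.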
The results of the separate models trained on P-x, LC-A, and LC-B are shown in Table 2. For all three sequences (i.e., T2, ADC, and hDWI), the three separate models achieve relatively high AUCs when tested within their own domains, but their AUCs drop sharply when they are tested directly on unseen domains. These results reveal a substantial cross-domain discrepancy (i.e., domain shift) among the four datasets. Note that, for the T2 sequence, the separate models of LC-A and LC-B achieve their highest testing AUCs (0.66 and 0.67) on the unseen domain LC-C, marginally higher than the AUCs (0.61) obtained within their own domains. A potential reason for these biased predictions is the small number of testing samples on LC-C (only 29). As for the joint models in the table, they bring no remarkable improvement over the separate models for any sequence; instead, they may even degrade performance because of cross-site heterogeneity.
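The evaluation protocol described above, in which each per-site model is tested on every site so that diagonal entries are within-domain AUCs and off-diagonal entries are unseen-domain AUCs, can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `auc`, `cross_domain_auc`, and `bootstrap_auc_ci` are hypothetical, each model is assumed to be a callable that maps a feature vector to a scalar score, and labels are assumed to be binary (0/1). The bootstrap interval is an added illustration of why a test set as small as LC-C's 29 cases gives unstable AUC estimates.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Dataset = Tuple[List[List[float]], List[int]]  # (feature vectors, binary labels)
Scorer = Callable[[List[float]], float]        # hypothetical model interface


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def cross_domain_auc(models: Dict[str, Scorer],
                     test_sets: Dict[str, Dataset]) -> Dict[Tuple[str, str], float]:
    """Test every per-site model on every site's test set.

    Keys are (train_site, test_site); train_site == test_site gives the
    within-domain AUC, any other pair gives an unseen-domain AUC.
    """
    results = {}
    for train_site, model in models.items():
        for test_site, (x, y) in test_sets.items():
            results[(train_site, test_site)] = auc(y, [model(xi) for xi in x])
    return results


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # keep only resamples containing both classes
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With a few dozen test cases, the interval returned by `bootstrap_auc_ci` is typically wide, so a gap such as 0.66 versus 0.61 may fall within sampling noise.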