
Identifying the Best Image Classification Algorithm for COVID-19 Diagnosis with a Small, Imbalanced Chest X-Ray Dataset
Joyce Yang
Notre Dame High School

Corresponding Author: jyang23@ndsj.org



In this project, I study families of deep neural networks trained on publicly available chest X-ray datasets in order to identify the best image classification algorithm for automating the diagnosis of respiratory illnesses. Specifically, the trained networks are used to classify anonymized chest X-ray images into three classes: healthy, COVID-19, and non-COVID pneumonia. As in most real-world applications, publicly available chest X-ray datasets are scarce, and ground-truth data for COVID-19 diagnosis is especially hard to come by. The first variable I introduce to improve the predictive power of the networks is pretraining on a domain-relevant dataset that is much larger than the transfer learning dataset. To address the class imbalance in the training data, the second variable is a customized data sampling configuration, using either the equal-weight-per-epoch method or the fixed-fraction-per-batch method. As the control for each neural network, pretrained weights learned from the classic ImageNet dataset are used, and no customized sampling method is applied. During transfer learning, two scikit-learn metrics, average precision and F1 score, are computed. Precision and Recall are then calculated manually from each network's confusion matrix and reported along with its hyperparameters. The most significant observation is that Recall for the control group is consistently below 0.6, a clear indicator of underperformance on COVID-19 prediction. The DenseNet family performs significantly better; surprisingly, DenseNet169 achieves one of the highest Precision and Recall pairs, at 0.870 and 0.837, respectively. With more than 82 million COVID-19 cases worldwide, the need for efficient, accurate, and mass diagnosis of patients is apparent and growing.
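The equal-weight-per-epoch method is not specified in detail here. The following is a minimal Python sketch under one plausible reading: every image is weighted by the inverse of its class size, and one epoch of indices is drawn with replacement, so healthy, COVID-19, and pneumonia images are equally likely to be seen. The function name `equal_weight_per_epoch` and the label encoding (0 = healthy, 1 = COVID-19, 2 = pneumonia) are hypothetical.

```python
import random
from collections import Counter


def equal_weight_per_epoch(labels, epoch_size=None, seed=None):
    """Return one epoch of sample indices with every class equally likely.

    Assumption: each image is weighted by 1 / (number of images in its class),
    so every class carries the same total weight. Indices are drawn with
    replacement, which means minority-class images (e.g. COVID-19) repeat
    within an epoch.
    """
    rng = random.Random(seed)
    class_counts = Counter(labels)
    weights = [1.0 / class_counts[y] for y in labels]
    k = len(labels) if epoch_size is None else epoch_size
    return rng.choices(range(len(labels)), weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 0 = healthy, 1 = COVID-19, 2 = pneumonia.
    labels = [0] * 800 + [1] * 50 + [2] * 150
    epoch = equal_weight_per_epoch(labels, seed=0)
    # Each class should make up roughly one third of the drawn indices.
    print(Counter(labels[i] for i in epoch))
```

In PyTorch, the same per-image weights could instead be passed to `torch.utils.data.WeightedRandomSampler` and handed to a `DataLoader`, which gives an equivalent effect without building the index list by hand.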
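The fixed-fraction-per-batch method can be read as guaranteeing that every mini-batch contains a set share of each class. A minimal sketch under that assumption: the per-class shares (`fractions`), the batch size, and the function name `fixed_fraction_batches` are all hypothetical, and images within a class are drawn with replacement so that the small COVID-19 class can always fill its share.

```python
import random
from collections import defaultdict


def fixed_fraction_batches(labels, fractions, batch_size, num_batches, seed=None):
    """Yield batches of indices in which each class fills a fixed share.

    fractions maps class label -> share of each batch (shares should sum to 1).
    Every class in fractions must have at least one image, because indices
    within a class are drawn with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Images per class per batch; any rounding remainder goes to the largest share.
    per_class = {y: int(f * batch_size) for y, f in fractions.items()}
    largest = max(fractions, key=fractions.get)
    per_class[largest] += batch_size - sum(per_class.values())

    for _ in range(num_batches):
        batch = []
        for y, n in per_class.items():
            batch.extend(rng.choices(by_class[y], k=n))
        rng.shuffle(batch)  # mix classes so their order inside a batch is random
        yield batch


if __name__ == "__main__":
    # Hypothetical labels: 0 = healthy, 1 = COVID-19, 2 = pneumonia.
    labels = [0] * 800 + [1] * 50 + [2] * 150
    # Assumed split: half healthy, a quarter COVID-19, a quarter pneumonia.
    shares = {0: 0.5, 1: 0.25, 2: 0.25}
    for batch in fixed_fraction_batches(labels, shares, batch_size=32, num_batches=2, seed=0):
        print(sorted(labels[i] for i in batch))
```

Compared with the per-epoch weighting, this controls the class mix of every gradient step rather than only its expected value over an epoch.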
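Precision and Recall are calculated manually from each confusion matrix. A small sketch of that calculation for a single class, assuming scikit-learn's orientation of the matrix (rows are true classes, columns are predicted classes); F1 is included because it follows from the same two quantities. The function name is hypothetical, and the matrix in the example is made up for illustration rather than taken from this study.

```python
def per_class_metrics(cm, k):
    """Precision, recall, and F1 for class k from a square confusion matrix.

    Convention (same as scikit-learn's confusion_matrix): cm[i][j] counts
    images whose true class is i and whose predicted class is j.
    """
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: every prediction of k
    actual_k = sum(cm[k])                    # row sum: every true member of k
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up 3x3 matrix ordered healthy, COVID-19, pneumonia (not study data).
    cm = [
        [90, 2, 8],
        [3, 40, 7],
        [5, 6, 39],
    ]
    p, r, f1 = per_class_metrics(cm, k=1)  # class 1 = COVID-19
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    # precision=0.833 recall=0.800 f1=0.816
```

With `average=None`, scikit-learn's `precision_score`, `recall_score`, and `f1_score` return these same per-class values directly from label arrays.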
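Average precision, as documented for scikit-learn's `average_precision_score`, is the step-wise sum AP = Σ_n (R_n - R_(n-1)) P_n taken over decreasing score thresholds. The stdlib sketch below reimplements that definition for one binary task so the metric can be checked by hand; treating COVID-19 as one-vs-rest against the other two classes, the function name, and the example scores are assumptions for illustration.

```python
def average_precision(y_true, scores):
    """Step-wise average precision for one binary task (e.g. COVID-19 vs. rest).

    y_true holds 1 for positives and 0 for negatives; scores are the model's
    confidence that each image is positive. Each distinct score is one
    threshold; AP adds (recall gain) * precision at every threshold.
    Returns 0.0 when there are no positives (a choice made for this sketch).
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        # Consume all images tied at this score so a tie counts as one threshold.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Made-up labels and scores for four images.
    ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(f"{ap:.4f}")  # 0.8333
```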
Chest X-ray imaging is a cost-effective and widely used diagnostic technique for the early screening of respiratory illnesses.