Machine Learning for Outlier Detection in Algal and Cyanobacterial Fluorescence Signals
  • Husein Almuhtaram,
  • Arash Zamyadi,
  • Ron Hofmann
Husein Almuhtaram
University of Toronto

Corresponding Author: [email protected]

Arash Zamyadi
Water Research Australia
Ron Hofmann
University of Toronto

Abstract

Many drinking water utilities drawing from waters susceptible to harmful algal blooms (HABs) are implementing monitoring tools that can alert them to the onset of potential blooms. Some have invested in fluorescence-based online monitoring probes to measure chlorophyll a and phycocyanin, two pigments found in cyanobacteria, but it is not clear how best to use the data generated this way. Previous studies have focused on correlating phycocyanin fluorescence with cyanobacteria cell counts. However, not all utilities collect cell count data, so this method cannot be applied in some cases. Instead, this paper proposes a novel machine learning approach that determines when a utility needs to respond to an HAB by identifying outliers in chlorophyll a and phycocyanin fluorescence data, without the need for corresponding cell counts or biovolume. Four existing algorithms are evaluated on data collected at four buoys in Lake Erie from 2014 to 2019: k-means clustering, One-Class Support Vector Machine (SVM), elliptic envelope, and Isolation Forest (iForest). When trained and tested on data collected at different buoys, the iForest algorithm performed best in terms of training computation time and true positive rate, and second best in terms of false positive rate. In a more realistic application, where the algorithms are trained on historical phycocyanin data collected at the same location as the testing data, all the algorithms except k-means accurately identified anomalies in phycocyanin data coinciding with real cyanobacteria bloom events. Therefore, One-Class SVM, elliptic envelope, and iForest are promising algorithms for detecting potential HABs using fluorescence data.
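The train-on-historical, test-on-new workflow described in the abstract can be illustrated with a minimal sketch using scikit-learn's implementations of three of the four algorithms. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian fluorescence values (in arbitrary units), the 2% contamination setting, the feature scaling before the One-Class SVM, and the helper names `make_detectors` and `flag_outliers`. k-means is left out because using it for outlier detection requires an additional distance-threshold rule that the abstract does not specify.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Assumed share of outliers in the historical record (hypothetical value).
CONTAMINATION = 0.02


def make_detectors(contamination=CONTAMINATION, seed=0):
    """Return three unsupervised outlier detectors with illustrative settings."""
    return {
        "iForest": IsolationForest(contamination=contamination, random_state=seed),
        # The SVM is sensitive to feature scale, so standardize first (an assumption).
        "One-Class SVM": make_pipeline(
            StandardScaler(), OneClassSVM(nu=contamination, gamma="scale")
        ),
        "elliptic envelope": EllipticEnvelope(
            contamination=contamination, random_state=seed
        ),
    }


def flag_outliers(train, test):
    """Fit each detector on `train` and return boolean outlier flags for `test`."""
    flags = {}
    for name, model in make_detectors().items():
        model.fit(train)
        # scikit-learn detectors predict -1 for outliers and +1 for inliers.
        flags[name] = model.predict(test) == -1
    return flags


# Synthetic stand-in data: columns are [chlorophyll a, phycocyanin] fluorescence.
rng = np.random.default_rng(0)
historical = rng.normal(loc=[5.0, 1.0], scale=[1.0, 0.3], size=(2000, 2))
baseline = rng.normal(loc=[5.0, 1.0], scale=[1.0, 0.3], size=(200, 2))
bloom = rng.normal(loc=[15.0, 8.0], scale=[1.0, 0.5], size=(20, 2))  # simulated bloom
new_readings = np.vstack([baseline, bloom])
is_bloom = np.arange(len(new_readings)) >= len(baseline)

flags = flag_outliers(historical, new_readings)
for name, flagged in flags.items():
    tpr = flagged[is_bloom].mean()   # share of simulated bloom readings flagged
    fpr = flagged[~is_bloom].mean()  # share of baseline readings flagged
    print(f"{name}: true positive rate {tpr:.2f}, false positive rate {fpr:.2f}")
```

No cell counts or biovolume enter the fit. The detectors see only fluorescence readings, and the bloom labels are used solely to score the flags afterwards. Real probe data are noisier and more seasonal than this synthetic example, so the contamination setting would need to be tuned per site.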
Jun 2021. Published in Water Research, volume 197, 117073. DOI: 10.1016/j.watres.2021.117073