Application of Statistical Learning Theory to Plankton Image Analysis

Qiao Hu, Ph.D., 2006
Cabell Davis, Hanu Singh, Advisors

A fundamental problem in limnology and oceanography is the inability to quickly identify and map distributions of plankton. This thesis addresses the problem by applying statistical machine learning to images collected by the Video Plankton Recorder. The research focuses on developing a real-time automatic plankton recognition system for estimating plankton abundance. The system includes four major components: pattern representation/feature measurement, feature extraction/selection, classification, and abundance estimation. After an extensive study of a traditional learning vector quantization (LVQ) neural network (NN) classifier built on shape-based features, and of different pattern representation methods, I developed a classification system that combines multi-scale co-occurrence matrix features with a support vector machine classifier. This new method outperforms the traditional shape-based NN classifier by 12% in classification accuracy. Subsequent plankton abundance estimates are improved by more than 50% in regions of low relative abundance. Two rejection metrics were developed. One is based on the Euclidean distance in feature space for the NN classifier. The other is a dual-classification system. The dual-classification method yields abundance estimates almost as good as those from human labeling on a very large real-world data set. The distance rejection metric for the NN classifier might be more useful for rejecting outliers.
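
The two rejection ideas mentioned above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the prototype vectors, the class names ("copepod", "diatom"), the threshold `max_dist`, and the rule that a dual-classification sample is kept only when two classifiers agree are all assumptions made for illustration.

```python
import math

def nearest_prototype(x, prototypes):
    """Return (label, distance) of the closest prototype by Euclidean distance.

    prototypes: list of (label, vector) pairs, standing in for the
    codebook vectors of an LVQ-style classifier (hypothetical values).
    """
    best_label, best_dist = None, math.inf
    for label, vec in prototypes:
        d = math.dist(x, vec)  # Euclidean distance in feature space
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

def classify_with_distance_rejection(x, prototypes, max_dist):
    """Reject a sample whose nearest prototype is farther than max_dist."""
    label, d = nearest_prototype(x, prototypes)
    return label if d <= max_dist else "rejected"

def dual_classify(label_a, label_b):
    """Assumed agreement rule: keep the label only if both classifiers agree."""
    return label_a if label_a == label_b else "rejected"

# Toy two-dimensional feature space with two hypothetical classes.
prototypes = [("copepod", (0.0, 0.0)), ("diatom", (4.0, 0.0))]
near = classify_with_distance_rejection((0.5, 0.0), prototypes, max_dist=1.0)
far = classify_with_distance_rejection((2.0, 9.0), prototypes, max_dist=1.0)
```

In this toy setup the first sample lies close to a prototype and keeps its label, while the second lies far from every prototype and is rejected as an outlier. The agreement rule in `dual_classify` trades coverage for precision: fewer samples are labeled, but those that are have been confirmed by two independent decisions.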