Testing a Multiple Machine Learning Tool (HYDRA) for the Bioassessment of Fresh Waters
15 September 2014
Maria João Feio, Carlos Viana-Ferreira, Carlos Costa
Abstract

We developed a bioassessment tool (HYDRA) to predict the taxa present at a site based on the best performing machine learning tool (Support Vector Machines [SVM], Multi-Layer Perceptron [MLP], K-Nearest Neighbour [KNN]) for each taxon. HYDRA differs from standard models based on discriminant function analysis (DFA) in 2 main ways: 1) HYDRA predicts taxa directly without a priori reference-site classification, and 2) all environmental variables provided for model building contribute to predictions instead of only those that best explain differences among groups. Probabilities of taxon occurrence were used to calculate the Observed/Expected index (O/E50), based on taxa predicted with >50% probability of occurrence. We tested the hypothesis that a combination of models (HYDRA) would perform better than each model alone (SVM, MLP, KNN). We measured performance as: 1) taxon prediction accuracy, 2) precision given by the O/E50 standard deviation (SD O/E50), 3) accuracy of the validation O vs E linear regression, and 4) sensitivity to impairment. We used 3 data sets covering a wide range of environmental conditions (Yukon Territory, Great Lakes, Australian Capital Territory) and calculated O/E50 values for reference, validation, and sites with 3 known levels of simulated impairment. We created 4 quality classes (Good—Severe) and used the 10th percentile O/E50 values of training sites to define the boundary between Good (= reference) and Moderate classes. HYDRA was the best solution for all data sets and was able to distinguish levels of impairment. Taxon prediction accuracy was not related to taxonomic group. Models (SVM, MLP, KNN) varied in accuracy among data sets, and accuracy seemed to depend on the distribution of the taxa across training sites. SVM provided good models, but showed poor sensitivity with 1 data set, which indicated inability to deal with low-richness communities.
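The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the conventional predictive-model reading of O/E50, which the abstract does not spell out: E is the sum of the occurrence probabilities of taxa predicted with probability > 0.5, and O is the number of those taxa actually found at the site. The function names, the taxon names, the accuracy scores, and the linear-interpolation percentile are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, Set


def select_best_model(accuracy: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick, for each taxon, the model name with the highest accuracy score."""
    return {taxon: max(scores, key=scores.get) for taxon, scores in accuracy.items()}


def oe50(probabilities: Dict[str, float], observed: Set[str]) -> float:
    """O/E50 for one site, using only taxa predicted with probability > 0.5."""
    expected_taxa = {t: p for t, p in probabilities.items() if p > 0.5}
    expected = sum(expected_taxa.values())  # E: summed probabilities (assumed definition)
    if expected == 0:
        raise ValueError("no taxon predicted with probability > 0.5")
    found = sum(1 for t in expected_taxa if t in observed)  # O: expected taxa found
    return found / expected


def percentile(values: Iterable[float], q: float) -> float:
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    s = sorted(values)
    if not s:
        raise ValueError("percentile of an empty sequence")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Hypothetical per-taxon accuracy of each model on held-out sites.
accuracy = {
    "Baetidae": {"SVM": 0.80, "MLP": 0.85, "KNN": 0.70},
    "Chironomidae": {"SVM": 0.90, "MLP": 0.60, "KNN": 0.75},
}
best = select_best_model(accuracy)

# Hypothetical predicted probabilities and observed taxa at one test site.
site_probabilities = {"Baetidae": 0.9, "Chironomidae": 0.6, "Elmidae": 0.4}
site_observed = {"Baetidae", "Elmidae"}
score = oe50(site_probabilities, site_observed)

# Hypothetical O/E50 values of training sites; their 10th percentile marks
# the boundary between the Good and Moderate classes.
training_scores = [0.8, 0.9, 1.0, 1.1, 1.2]
good_moderate_boundary = percentile(training_scores, 10)
```

In this sketch a site scoring below `good_moderate_boundary` would fall outside the Good class. How the remaining class boundaries are set is not stated in the abstract, so they are left out.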

© 2014 by The Society for Freshwater Science.
Maria João Feio, Carlos Viana-Ferreira, and Carlos Costa "Testing a Multiple Machine Learning Tool (HYDRA) for the Bioassessment of Fresh Waters," Freshwater Science 33(4), 1286-1296, (15 September 2014). https://doi.org/10.1086/678768
Received: 12 April 2013; Accepted: 1 June 2014; Published: 15 September 2014
JOURNAL ARTICLE
11 PAGES

KEYWORDS
artificial neural networks
bioassessment
invertebrates
K-Nearest Neighbor
reference condition approach
streams
support vector machines