Ecological Modelling 195 (2001) 195–206

Predictive modelling of macroinvertebrate assemblages for stream habitat assessments in Queensland (Australia)

Huong Hoang a, Friedrich Recknagel a,*, Jonathan Marshall b, Satish Choy b

a Department of Soil and Water, Adelaide University, PMB 1, 5064 Glen Osmond, Adelaide, Australia
b Queensland Department of Natural Resources, 1345 Ipswich Road, 4106 Rocklea, Australia

Abstract

This paper describes the iterative approach towards predictive Artificial Neural Network (ANN) models for 37 macroinvertebrate taxa based on 896 stream data sets from the Queensland stream system. Data preprocessing and sensitivity analyses proved to be crucial in order to create data consistency and non-redundancy in the context of this approach. The model validation by means of 167 independent data sets revealed 73% as lowest rate and 82% as average rate of correct ANN predictions of stream site habitats. The increase of correct predictions was 30%, if ANNs and the statistical stream model AusRivAS were compared based on the same data sets. The validation of the ANN models justified their application to the prediction and assessment of stream habitats based on an independent database for test sites. Implications to stream management and research were drawn from prediction results. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Artificial neural networks; Aquatic macroinvertebrates; Bio-assessment; Stream habitats; Sensitivity analysis; AusRivAS

www.elsevier.com/locate/ecolmodel

1. Introduction

Habitat conditions and food web structures in freshwater streams are very much determined by the directed water flow. Water currents sort particles according to their size and weight, with more rapid flow transporting the largest particles. Thus, in rapidly flowing streams, the bottom consists primarily of very coarse stones, whereas the bottom in quiet habitats is made up of sand and silt deposits. Floods can mechanically disturb the stream bottom and destroy stream habitats (Hauer and Lamberti, 1996). Food webs in streams therefore are dominated by benthic macroinvertebrates, which are adapted to fluctuating flow (Allan, 1995; Lampert and Sommer, 1997). They are either attached to the substratum or good swimmers that can occasionally rest in dead water regions. Because of the importance of macroinvertebrates to stream ecosystems, great efforts are undertaken to preserve healthy and restore degraded stream habitats.

Water pollution in streams quite often occurs as catastrophic short-term events such as pulses of toxins. Water samples can be collected and chemically analysed over a longer period of time and combined, but may underestimate or even overlook such catastrophic pollution events. Therefore, so-called bioassessment of stream water quality by macroinvertebrate assemblages can be used instead (e.g. Hilsenhoff, 1988). Macroinvertebrates in streams generally have life cycles of a year or more, which expose them to pollutants over long periods of time and integrate the effects of short-term pollution episodes. Thus there has been much emphasis on finding organisms that can be used to indicate the average water quality in a stream.

* Corresponding author. E-mail address: [email protected] (F. Recknagel).

0304-3800/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.

PII: S0304-3800(01)00306-4

H. Hoang et al. / Ecological Modelling 195 (2001) 195–206

Stream modelling based on ecological knowledge and sufficient stream monitoring data can substantially facilitate and further improve the assessment of stream habitats. Different modelling techniques are available and have been applied to freshwater streams. Moss et al. (1987) developed a statistical model for predicting the macroinvertebrate community expected to occur at certain stream sites in the UK depending on specific habitat conditions. Simpson et al. (1997) used a similar model for freshwater streams in Australia and New Zealand. Even though these models have been applied with some success to a variety of stream systems, they cannot overcome the inherent constraint of statistical models, namely their inability to cope with the non-linearity and high complexity of stream systems.

Walley and Fontama (1998) and Pudmenzky et al. (1998) have developed artificial neural network (ANN) models for the same stream databases in the UK and Australia used by Moss et al. (1987) and Simpson et al. (1997) to carry out statistical modelling. A comparison of techniques indicates that ANN models have higher accuracy and allow a greater complexity in the predictions of macroinvertebrate communities for specific habitats. Schleiter et al. (1999) have demonstrated for German streams that population dynamics of macroinvertebrates can be modelled by ANN as well.

Artificial neural networks belong to a new generation of inductive modelling techniques that take their inspiration from aspects of biological information processing. They have the ability to extract temporal or spatial patterns and knowledge from highly non-linear and complex data. Based on such patterns and knowledge they can explain processes and predict future conditions of aquatic ecosystems (Recknagel and Wilson, 2000).

The aim of this study was the development of ANN models to predict habitat condition based on the distribution of macroinvertebrates in the Queensland river system. The predictions were based on a comprehensive database. This database was previously subject to statistical modelling (Coysh et al., 2000) and a preliminary ANN case study (Pudmenzky et al., 1998). The development and validation of the ANN models are documented. Validation results of the ANN and the statistical models are compared on the basis of the same data.

2. Study sites and methods

2.1. The Queensland stream system

The Queensland river and stream network spreads over the territory of the federal state of Queensland (Australia). A diverse range of climatic conditions occurs over the state, ranging from high rainfall areas (1600 mm per year) in the tropical Northeast to low rainfall areas (200 mm per year) in the Southeast. The majority of sites in the stream database are situated in relatively high order streams of coastal lowland areas (Fig. 1).

The stream database contains information taken from wet and dry seasons on water quality, habitat characteristics and occurrence patterns of macroinvertebrates for 36 stream sites.

Fig. 1. Map of Queensland indicating locations of reference sites of the Queensland stream database.

Sites that are in near pristine condition are designated as ‘reference sites’. These are used as a standard in order to assess the health of all remaining sites, which are considered as ‘test sites’. Each site is characterized by the following two data sets:

1. Potential predictor variables are environmental variables that are relatively stable under the influence of human impacts. In the context of the modelling framework, site specific habitat features are used to predict the occurrence of invertebrate taxa at a site not affected by environmental stress. Habitat characteristics such as altitude, stream order and annual rainfall are suitable as such predictor variables. By contrast, chemical variables such as dissolved oxygen, pH and nutrient concentrations could easily be affected by anthropogenic impacts and would not be suitable as predictor variables. They would cause misleading predictions on the membership of test sites to the reference site groups. Data for habitat characteristics included 39 potential predictor variables consisting of categorical or continuous data. Only some categorical variables, such as stream order, were formed by classification schemes; most of them were represented just as empirical criteria for habitat characteristics such as soil type and vegetation type. These categorical variables are not ordinal or interval scaled. The 39 potential predictor variables were used as input variables of the ANN models.

2. Presence or absence of the 37 most common macroinvertebrate taxa in the Queensland stream system was used as the output variables of the ANN models. Outputs in the database take only two values: zero if the taxon is absent and one if the taxon is present.

2.2. Stream data preprocessing and modelling

The aim of the model was to determine biological conditions of sites with respect to reference conditions based on the presence and absence of invertebrate taxa. The model was trained by means of reference data. Therefore, the model outputs strictly reflect ‘reference condition’. The assessment of the health of specific sites is then a measure based on the comparison between observed and predicted site data.

The flowchart in Fig. 2 shows the steps of data preprocessing and modelling that were taken in developing a predictive ANN model for the Queensland stream system.

As a result of a manual consistency test of the raw database, 79 data sets with incomplete and unreasonable data were removed. The remaining data sets were then randomly divided into 650 training and 167 validation data sets. The following step of ANN design resulted in the selection of 39 environmental variables considered as input nodes and 37 macroinvertebrate taxa considered as output nodes. A single hidden layer with 15 neurons was set up for the initial ANN. The ANN training was carried out over 5000 iterations using the feedforward backpropagation algorithm and the sigmoid transfer function. Fig. 3 represents the general ANN architecture of the stream habitat model. In the context of the ANN validation process the target was set to achieve correct predictions for 75% of the cases in the validation data set. This aim was finally reached by the repeated refinement of 37 taxon-specific ANNs. This process was supported by results from sensitivity analyses. After satisfying the predefined validation criteria, the 37 trained ANNs were applied to each site of a test database in order to assess the health status of a further 1159 test sites ranging from mildly to severely degraded. According to Fig. 2, stream site predictions were carried out where output values in the range of 0.5–1 were interpreted as presence (1) and values below 0.5 as absence (0). As a result, implications could be drawn with regard to the rationalisation of stream monitoring, stream management and research.
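The network design described above can be illustrated with a minimal sketch in plain Python. It is not the software used in this study: the class name `TinyANN`, the weight initialisation range, the learning rate and the squared-error loss are assumptions made for illustration only. What is taken from the text is the layout (39 inputs, one hidden layer of 15 neurons, 37 outputs), the sigmoid transfer function, backpropagation training and the 0.5 threshold for presence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyANN:
    """Single-hidden-layer feedforward network with sigmoid units (illustrative)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra entry for the bias term.
        # The initial range (-0.5, 0.5) is an assumption, not from the paper.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb)))
               for row in self.w_out]
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update for a single site (squared-error loss assumed)."""
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Output deltas: error times the derivative of the sigmoid.
        d_out = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
        # Hidden deltas, computed with the output weights before they are updated.
        d_hid = [h * (1.0 - h) * sum(d_out[k] * self.w_out[k][j]
                                     for k in range(len(d_out)))
                 for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

    def predict_presence(self, x, threshold=0.5):
        # Outputs in the range 0.5-1 are read as presence (1), below 0.5 as absence (0).
        _, out = self.forward(x)
        return [1 if o >= threshold else 0 for o in out]


# Dimensions reported in the text: 39 inputs, 15 hidden neurons, 37 taxa outputs.
net = TinyANN(n_in=39, n_hidden=15, n_out=37)
presence = net.predict_presence([0.0] * 39)  # untrained, so the values are arbitrary
```

In practice the inputs would be scaled to a common range before training, and `train_step` would be called repeatedly over the 650 training data sets; both details are left out here because the text does not specify them.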

3. Results and discussion

3.1. Input sensitivity analysis

A comprehensive sensitivity analysis was conducted for the 37 ANN models for specific macroinvertebrate taxa as part of the validation process. The sensitivity of each model was quantified in terms of percentage output change over the range of input data. As an example, Fig. 4 shows the sensitivity of Cladocera to 10 input variables. It allows conclusions to be drawn not only about the role of certain inputs as driving variables but also about the environmental preferences of macroinvertebrate taxa. From Fig. 4 it can be concluded that at higher data ranges the variables altitude, channel width, silt/clay ratio, and daily maximum temperature change Cladocera from presence towards absence in a sigmoid shape. Cladocera is known to be freely swimming (nektonic) and requires slow flowing water at depth. While water depth increases, flow velocity decreases slightly towards high order downland streams at low altitudes with small channel widths. The saddle shape of the sensitivity to water depth indicates an optimum depth for Cladocera from 0.2 to 1.2 m. A similar saddle shaped relationship was discovered for water temperature, which indicates optimum conditions for Cladocera in the range from 10 to 30°C. This range appears to be the same as that suggested to be optimal for Daphnia species regarding physiological effects such as ingestion and reproduction rates (Lampert and Sommer, 1997). All the examples discussed above indicate distinct driving variables for Cladocera that need to be kept as such in the ANN model. Other inputs in Fig. 4, such as sand percentage and slope, show low sensitivity for Cladocera. These variables appear to be redundant and may just cause noise in the ANN model. Therefore, it seems justified to remove them from the ANN model.

Fig. 2. Approach for data preprocessing and ANN modelling of Queensland streams.
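The sensitivity measure of Section 3.1 can be sketched as a one-at-a-time sweep. The text only states that sensitivity is the percentage output change over the range of an input; holding all other inputs at fixed baseline values, the number of sweep steps and the stand-in model `toy_predict` are assumptions introduced here for illustration.

```python
import math


def input_sensitivity(predict, baseline, index, low, high, steps=20):
    """Sweep one input from low to high while the others stay at baseline.

    Returns (curve, percent_change). The curve is a list of (input, output)
    pairs; percent_change is the spread of the model output over the sweep,
    expressed as a percentage of the 0-1 output range.
    """
    curve = []
    for s in range(steps + 1):
        value = low + (high - low) * s / steps
        x = list(baseline)
        x[index] = value
        curve.append((value, predict(x)))
    outputs = [y for _, y in curve]
    return curve, 100.0 * (max(outputs) - min(outputs))


def toy_predict(x):
    # Hypothetical stand-in for a trained taxon model: only input 0 matters.
    return 1.0 / (1.0 + math.exp(-(4.0 * x[0] - 2.0)))


baseline = [0.5, 0.5]
_, driving = input_sensitivity(toy_predict, baseline, 0, 0.0, 1.0)
_, redundant = input_sensitivity(toy_predict, baseline, 1, 0.0, 1.0)
print(round(driving, 2), redundant)
```

An input whose sweep produces almost no output change, like the second input of the toy model, would be a candidate for removal in the sense discussed for sand percentage and slope. The shape of the returned curve (sigmoid or saddle) is what Fig. 4 displays for Cladocera.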

In the same way as discussed for Cladocera, the sensitivity analysis was applied to refine the 37 ANN models for all macroinvertebrate taxa in the framework of Fig. 2 in order to improve model validity. The results of the sensitivity analysis are summarised in Table 1.

3.2. ANN validation by means of 167 independent data sets

After refining each taxon-specific ANN model based on repeated sensitivity analysis and validation, the rate of correct predictions of stream sites in the validation data set was better than 73% for every model. Fig. 5 illustrates the development of correct predictions from the initial models prior to sensitivity analysis to the final refined models. It demonstrates that each macroinvertebrate model was improved, even though the specific improvement rates differed widely.
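As a small worked example of the validation measure, the rate of correct predictions per taxon can be computed by comparing observed and predicted presence/absence site by site. The function name and the toy data below are hypothetical; the sketch assumes that a prediction counts as correct when the thresholded output equals the observed 0/1 value.

```python
def correct_prediction_rates(observed, predicted):
    """Percentage of sites predicted correctly, one value per taxon.

    observed and predicted are lists of site rows; each row holds one
    0/1 entry per taxon.
    """
    n_sites = len(observed)
    n_taxa = len(observed[0])
    rates = []
    for t in range(n_taxa):
        hits = sum(1 for o_row, p_row in zip(observed, predicted)
                   if o_row[t] == p_row[t])
        rates.append(100.0 * hits / n_sites)
    return rates


# Toy data: 4 validation sites, 3 taxa (not taken from the study).
observed = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
predicted = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0]]
rates = correct_prediction_rates(observed, predicted)
lowest = min(rates)
average = sum(rates) / len(rates)
```

The lowest and the average of these per-taxon rates correspond to the two summary figures quoted for the 167 validation data sets.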

Fig. 3. ANN architecture for the stream habitat model based on the feedforward backpropagation algorithm.


Fig. 4. Sensitivity of the taxon Cladocera to changes of inputs within their data range.


Fig. 5. Correct predictions of macroinvertebrate taxa before and after exclusions of redundant or irrelevant inputs.


Table 1. Summary of the input sensitivity of 37 macroinvertebrate (MI) taxa by means of percentages of output change within the range of input change

In another attempt to validate the ANN models, predictions of stream sites of the training and validation data sets were made and evaluated by the ratio of observed to expected (O/E) data. The results clearly indicate that there is a 99.55% correspondence with the reference sites for the training data (Fig. 6a) and still an 85.03% correspondence with the reference sites for the validation data (Fig. 6b).

In order to evaluate the relative performance of the ANN models, a comparison was conducted with the model AusRivAS (Australian River Assessment System) that was developed for the assessment of the health of Australian rivers by statistical modelling (Coysh et al., 2000). It was applied to the Queensland stream system based on the same training and validation data of reference sites as used for ANN modelling in this study.

The O/E data calculated from the outputs of the ANN and AusRivAS models are compared in Fig. 6. An O/E value in the range from 0.8 to 1.2, corresponding to the central 80% of reference sites, was selected to indicate whether a specific site was biologically degraded or not. This criterion was previously suggested for applications of AusRivAS models (Coysh et al., 2000). Under optimal conditions the O/E values of both models should fall within this range for reference samples, as reference samples were used for developing and validating both models.
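The O/E criterion can be made concrete with a short sketch. The text does not spell out how O and E are counted for the ANN models, so the sketch assumes that O is the number of taxa observed at a site and E is the number of taxa the models predict as present under reference conditions. The function names and category labels are illustrative; only the 0.8 to 1.2 band is taken from the text.

```python
def oe_ratio(observed, predicted):
    """Observed/expected ratio for one site from 0/1 vectors over the taxa.

    Assumption: O = taxa observed at the site, E = taxa predicted present.
    Returns None when no taxon is expected, because the ratio is undefined.
    """
    expected = sum(predicted)
    if expected == 0:
        return None
    return sum(observed) / expected


def classify_site(oe, low=0.8, high=1.2):
    """Place a site relative to the 0.8-1.2 reference band."""
    if oe is None:
        return "undefined"
    if oe < low:
        return "impaired"
    if oe > high:
        return "richer than reference"
    return "within reference band"


# Toy sites with five taxa each (not taken from the study).
sites = [
    ([1, 1, 0, 1, 0], [1, 1, 1, 1, 1]),  # fewer taxa observed than expected
    ([1, 1, 1, 1, 0], [1, 1, 1, 1, 0]),  # observed matches expected
    ([1, 1, 1, 1, 1], [1, 1, 1, 0, 0]),  # more taxa observed than expected
]
labels = [classify_site(oe_ratio(obs, pred)) for obs, pred in sites]
```

Applied to reference sites, the share of sites labelled as within the reference band is the kind of correspondence percentage reported for Fig. 6.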

The results of the model comparison in Fig. 6 clearly show that the ANN models identify the reference sites as unimpaired much better than AusRivAS does. While the ANN models identified 99.55% of the training data as corresponding to reference sites, AusRivAS identified only 60.10% (Fig. 6a). The ANN models performed similarly well for the validation data (85.03%), compared to 54.71% for the AusRivAS models (Fig. 6b).

The results of this study clearly illustrate the superiority of the ANN models over the AusRivAS models in terms of validity.

The validation results of the ANN models finally justified their application to the prediction and assessment of stream test sites.

Fig. 6. Distribution of observed/expected ratios for predicted stream sites using the AusRivAS and the ANN models. Plots are presented for the training (a) and the independent validation (b) data of reference sites.


Fig. 7. Distribution of observed/expected ratios for predicted stream sites of the test data.

3.3. ANN prediction and assessment of stream test sites

The application of the validated ANN models to an independent test data set of stream sites resulted in a slightly positively skewed distribution of the O/E data (Fig. 7) that might be representative of a field data set. Whilst the largest share of the data (42.71%) is in the range of 0.8 ≤ O/E ≤ 1.2 and indicates no degradation effects at the corresponding stream sites, 27.49% of the data indicate mildly to moderately impaired sites. For 30.8% of the data, richer invertebrate communities are indicated than at reference sites, which might be subject to research for causal clarification.

4. Conclusions

The paper documents a predictive stream model developed for the Queensland stream system by means of artificial neural networks (ANN). Thirty-seven ANN models were trained and validated by means of 896 stream data sets. Data preprocessing and sensitivity analyses proved to be crucial in order to maintain data consistency in the context of this modelling approach. The model validation by means of 167 independent data sets revealed 73% as lowest rate and 82% as average rate of correct ANN predictions of stream site habitats. The comparison with the alternative statistical stream model AusRivAS resulted in a more than 30% better identification of training as well as validation data known as reference sites.

The validation results justified successful applications of the ANN models to the prediction and assessment of stream habitats based on an independent database for test sites.

Future research will focus on the following issues:

1. Representations of categorical data need to be improved based on rational classification. Ordered categorical criteria should not be split as their order clearly affects weights of the connection links in neural networks. Conversely, non-ordered categorical criteria (simple notations) should be split into separate inputs to avoid misinterpretation of notations as arithmetic numbers.

2. Comprehensive sensitivity analyses will further contribute to the elucidation of the relationship between environmental condition and the distribution of stream macroinvertebrates.

3. Macroinvertebrate data need to be classified either taxonomically or ecologically (functional groups, trophic levels, etc.). The classification of macroinvertebrates may overcome difficulties for ANN training caused by the distribution of presence/absence (1/0) data in cases where species had too high or too low probability of occurrence.

4. Cause and nature of overpopulation as predicted for some test sites need to be clarified.

5. Training and validation using databases from other Australian stream systems will contribute to a generalisation of the ANN stream models.
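The distinction drawn in issue 1 can be sketched as two encodings. Ordered categories stay a single input whose numeric value preserves the order, while non-ordered categories are split into one 0/1 input per category so that the network cannot read an arbitrary code as a magnitude. The function names, the scaling to the 0 to 1 range and the example categories are assumptions for illustration; they are not taken from the database.

```python
def encode_ordered(value, ordered_levels):
    """Keep an ordered category as one input, scaled to the range 0-1."""
    position = ordered_levels.index(value)
    return position / (len(ordered_levels) - 1)


def encode_unordered(value, categories):
    """Split a non-ordered category into separate 0/1 inputs, one per category."""
    return [1.0 if value == c else 0.0 for c in categories]


# Hypothetical levels and categories, chosen only to show the two encodings.
stream_orders = [1, 2, 3, 4, 5]
soil_types = ["sand", "clay", "loam"]

order_input = encode_ordered(3, stream_orders)
soil_inputs = encode_unordered("clay", soil_types)
```

With the non-ordered encoding, a soil type coded as 3 is no longer treated as three times a soil type coded as 1, which is the misinterpretation the issue warns against.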

Acknowledgements

The authors wish to thank Diane Conrick, Chris Marshall, Marcus Stowar, Senad Adams and other members of the Queensland Department of Natural Resources who assisted with the collection of stream data used in this study. Additionally, the authors thank Peter Negus and two anonymous reviewers for valuable comments, which led to an improved paper.

References

Allan, J.D., 1995. Stream Ecology. Structure and Functioning of Running Waters. Chapman & Hall, London.

Coysh, J., Nichols, S., Simpson, S., Norris, R., Barmuta, L., Chessman, B., Blackman, P., 2000. Australian River Assessment System — National River Health Program Predictive Model Manual. CRC Freshwater Ecology, University of Canberra, Canberra, ACT.

Hauer, F.R., Lamberti, G.A. (Eds.), 1996. Methods in Stream Ecology. Academic Press, New York.

Hilsenhoff, W.L., 1988. Rapid field assessment of organic pollution with a family-level biotic index. J. N. Am. Benthol. Soc. 7, 65–68.

Lampert, W., Sommer, U., 1997. Limnoecology: The Ecology of Lakes and Streams. Oxford University Press, New York.

Moss, D., Furse, M.T., Wright, J.F., Armitage, P.D., 1987. The prediction of macro-invertebrate fauna of unpolluted running-water sites in Great Britain using environmental data. Freshwater Biol. 17, 41–52.

Pudmenzky, A., Marshall, J., Choy, S., 1998. Preliminary application of artificial neural network model for predicting macroinvertebrates in rivers. Freshwater Biological Monitoring Report No. 9, The State of Queensland, Department of Natural Resources, 1–7.

Recknagel, F., Wilson, H., 2000. Elucidation and prediction of aquatic ecosystems by artificial neural networks. In: Lek, S., Guegan, J.F. (Eds.), Artificial Neuronal Networks: Application to Ecology and Evolution. Springer Verlag, Berlin, New York.

Schleiter, I.M., Borchardt, D., Wagner, R., Dapper, T., Schmidt, K.-D., Schmidt, H.-H., Werner, H., 1999. Modelling water quality, bioindication and population dynamics in lotic ecosystems using neural networks. Ecol. Model. 120, 271–286.

Simpson, S., Norris, R., Barmuta, L., Blackman, P., 1997. Australian River Assessment System — National River Health Program Predictive Model Manual (first draft). CRC Freshwater Ecology, University of Canberra, Canberra, ACT.

Walley, W.J., Fontama, V.N., 1998. Neural network predictors of average score per taxon and number of families at unpolluted river sites in Great Britain. Water Res. 32 (2), 613–622.