Analytica Chimica Acta 582 (2007) 390–399

Chemometric application in classification and assessment of monitoring locations of an urban river system

Prakash Raj Kannel a,∗, Seockheon Lee a, Sushil Raj Kanel b, Siddhi Pratap Khan c

a Water Environment & Remediation Research Center, Korea Institute of Science and Technology, P.O. Box 131, Cheongryang, Seoul 130-650, Republic of Korea
b Department of Environmental Science and Engineering, Gwangju Institute of Science and Technology, 1 Oryong-dong, Buk-gu, Gwangju 500-712, Republic of Korea
c Groundwater Resources Development Project, Department of Irrigation, Nepal

∗ Corresponding author. Tel.: +82 2 958 5848; fax: +82 2 958 6854. E-mail address: [email protected] (P.R. Kannel).

Received 30 April 2006; received in revised form 2 July 2006; accepted 6 September 2006
Available online 8 September 2006

Abstract

The study presents the application of selected chemometric techniques (cluster analysis, principal component analysis, factor analysis and discriminant analysis) to classify river water quality and to evaluate pollution data. Seventeen stations in the Bagmati river basin in the Kathmandu Valley, Nepal, monitored for 16 physical and chemical parameters in 4 seasons during the period 1999–2003, were selected for this study. The results allowed natural clusters of monitoring stations with similar pollution characteristics to be determined, and identified the main discriminant variables that are important for regional water quality variation as well as the possible pollution sources affecting the river water quality. The analysis grouped the 17 monitoring sites into 3 regions with 5 major discriminating variables: EC, DO, CL, NO2N and BOD. Results revealed that some locations were under the high influence of municipal contamination and some others under the influence of minerals. This study demonstrated that chemometric methods are effective for river water classification and for rapid assessment of water quality using representative sites; they could serve to optimize cost and time without losing any significance of the outcome.

© 2006 Elsevier B.V. All rights reserved.
0003-2670/$ – see front matter. doi:10.1016/j.aca.2006.09.006

Keywords: Water quality; Classification; Bagmati River; Chemometrics

1. Introduction

With an increased understanding of the importance of freshwater systems to public benefit and to aquatic life, assessment and classification of water quality for effective management options is becoming a concern. The development of a surface water monitoring network is a critical element in the assessment, restoration, and protection of stream water quality [1]. The assessment of water quality contamination requires monitoring of a wide range of physical, chemical and biological parameters. The usual situation is the measurement of multiple parameters, taken at different monitoring times and from many monitoring stations. Therefore a complex data matrix is frequently needed to evaluate water quality [2]. However, long-term water quality monitoring programs produce large data sets that are often difficult to interpret [3]. Selection of the most suitable statistical methods is fundamental to obtaining meaningful results, especially when assessing large and irregular chemical datasets [4].

Chemometric methods (also known as multivariate statistical techniques) are increasingly in use and provide several avenues for exploratory assessment of water quality data sets and classification of water quality. Chemometric methods identify the natural clustering pattern and group variables on the basis of similarities between the samples. The most common chemometric methods for classification are cluster analysis (CA) and principal component analysis (PCA) with factor analysis (FA). Discriminant analysis (DA) is used to confirm the groups found by means of CA and PCA. These multidimensional data analysis methods are increasingly in use for environmental studies dealing with measurements and monitoring.


The application of different chemometric methods, viz. cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), helps to reduce the complexity of large data sets and offers better interpretation and understanding of water quality [5–14]. Environmental data are not, in general, normally distributed [15]. However, most multivariate methods, such as correlation analysis, FA and DA, are based on the normal distribution of the data [15]. Serious violations of normality, such as excessive skewness and outliers, can impair the variogram structure and the results. The raw data set therefore has to be transformed into a normal data matrix using transformation methods. Various methods are available to transform the data, such as the logarithmic, power and Box–Cox transformations. Logarithmic transformation is widely applied to normalize positively skewed data sets, which are common in environmental data. However, data sets in environmental sciences do not always follow the lognormal distribution; in such cases a power transformation is needed, and the Box–Cox transformation is one of the most frequently used of these [16,17]. The Box–Cox method, developed by Box and Cox [17], is a general and widely used method for transforming data in linear models [18].

In this study, a data matrix obtained during a 5-year monitoring period is subjected to different multivariate statistical techniques (CA, PCA, FA and DA) to extract information about the similarities or dissimilarities between sampling sites, to classify the river water into clusters, to identify the important discriminant variables, and to identify the possible pollution sources affecting the river water quality.

Fig. 1. Map of Kathmandu Valley showing water quality monitoring stations R1–R10 along Bagmati River and T1–T7 in its tributaries.

2. Materials and methods

2.1. Study area

The Bagmati river basin (drainage area 3500 km²) is drained by the Bagmati river with six major tributaries: Hanumante khola, Manahara khola, Dhobi khola, Tukucha khola, Bishnumati khola and Nakkhu khola. This study covers the drainage area of the Bagmati river within the Kathmandu valley (Fig. 1), which covers an area of about 650 km² with altitude varying from 1220 to 2800 m above mean sea level. The valley is nearly round in shape with a diameter of about 25–30 km and lies in the warm temperate climate zone, receiving an annual average rainfall of about 1900 mm. The climate is of the warm temperate semitropical type, with the warmest months in May–June and the coldest months in January–February; the temperature ranges from about 0 to above 34 °C. The land use pattern according to 2000 estimates is: forestry 31%, agriculture 41% and non-agriculture land 28% [19]. The main river Bagmati originates at Shivapuri Lekh north of Kathmandu city. The average flow of the river at Khokana is 18.7 m³ s⁻¹. The river is an important source of water for drinking, industry, irrigation and recreation for about 1.6 million people. The river provides almost 92% of the wet season and 60% of the dry season municipal water supply [20]. Water use of this river is 66% for irrigation, 31% for water supply and 3% for industrial use [21]. The river is the holiest in Nepal, with a number of temples located on its banks, and is worshiped by millions of Hindu people from all over the world. Every Hindu wishes to be cremated on the banks of this river [22], and devotees conduct daily rituals and bathe in the river.

2.2. Monitored parameters and analytical methods

Water samples were collected seasonally (four times a year), in the pre-monsoon, monsoon, post-monsoon and winter seasons, at 17 sites for a period of 5 years (1999–2003). The monitoring stations (Fig. 1) cover about 25 km along the main river, with 10 stations (R1–R10) along the main stem and seven stations (T1–T7) along the six different tributaries: Hanumante khola (T1), Manahara khola (T2), Dhobi khola (T3), Tukucha khola (T4), Bishnumati khola (T5, T7) and Nakkhu khola (T6). The data sets used in this study comprised 16 continuously monitored water quality parameters: water temperature, pH, electrical conductivity (EC), dissolved oxygen (DO), total suspended solids (TSS), calcium (Ca), magnesium (Mg), total alkalinity as CaCO3 (Talk), chlorides (CL), orthophosphates as phosphorus (PO4P), total phosphorus (TP), ammonia nitrogen (NH4N), nitrate nitrogen (NO3N), nitrite nitrogen (NO2N), 5-day biochemical oxygen demand as O2 (BOD) and chemical oxygen demand as O2 (COD). All the water quality parameters were sampled, preserved and analyzed as per the standard methods of APHA-AWWA-WPCF [23] and USGS [24]. The basic statistics of the data set and methods of analysis on river water quality are summarized in Table 1 and the results are in Table 2.

Table 1
Water quality parameters, units and analytical methods used during 1999–2003 for surface waters of the Bagmati River

| S. no. | Parameters | Abbreviation | Units | Analytical methods | Instruments |
|---|---|---|---|---|---|
| 1 | Temperature | Wtemp | °C | Instrumental | Thermometer |
| 2 | pH | pH | pH units | Instrumental | Portable HACH meter |
| 3 | Dissolved oxygen | DO | mg L⁻¹ | Instrumental | HANNA, HI 9142 |
| 4 | Electrical conductivity | EC | μS cm⁻¹ | Instrumental | Conductivity meter |
| 5 | Total suspended solids | TSS | mg L⁻¹ | Filtration and gravimetric | Temperature controllable oven |
| 6 | Total alkalinity | Talk | mg L⁻¹ | Digital titrimetric, electronic | HACH-DREL/2000, USA |
| 7 | Calcium | Ca | mg L⁻¹ | Digital titrimetric, complexometric | HACH-DREL/2000 |
| 8 | Magnesium | Mg | mg L⁻¹ | Digital titrimetric | HACH-DREL/2000 |
| 9 | Chloride | CL | mg L⁻¹ | Digital titrimetric | HACH-DR/2000 |
| 10 | Inorganic phosphorus | PO4P | mg L⁻¹ | Phosphomolybdate method | UV 2000 spectrophotometer |
| 11 | Total phosphorus | TP | mg L⁻¹ | Ascorbic acid reduction method | UV 2000 spectrophotometer |
| 12 | Ammonia nitrogen | NH4N | mg L⁻¹ | Nesslerization method | UV 2000 spectrophotometer |
| 13 | Nitrite nitrogen | NO2N | mg L⁻¹ | Diazotisation method | UV 2000 spectrophotometer |
| 14 | Nitrate nitrogen | NO3N | mg L⁻¹ | Cadmium reduction | HACH-DR/2000 |
| 15 | Biochemical oxygen demand | BOD | mg L⁻¹ | 5-Day incubation, 20 °C | BOD incubator and DO meter |
| 16 | Chemical oxygen demand | COD | mg L⁻¹ | Potassium dichromate oxidation | Refluxing assembly |

2.3. Data treatment and chemometric methods

The standardized skewness and standardized kurtosis were determined to see whether the sample comes from a normal distribution. Values of these statistics outside the range of −2 to +2 indicate significant departures from normality. The statistical analysis showed that most of the data are not normally distributed and are skewed (standardized skewness and kurtosis were −0.79 to 34.94 and −4.25 to 131.89, respectively). The Kolmogorov–Smirnov (K–S) statistic [25] was also used to test the goodness of fit of the data to the normal distribution. According to the K–S test, not all the variables were normally distributed with 95% confidence. The Box–Cox transformation was used to transform the data set to normal form. The transformation procedure is designed to determine an optimal transformation for Y while fitting a linear regression model [25]

$$ W = \beta_0 + \beta_1 X + \varepsilon $$

where W is the transformed variable, X is the regression variable, $\beta_0$ and $\beta_1$ are the unknown parameters and $\varepsilon$ is the error. The dependent variable W is related to Y according to

$$ W = 1 + \frac{(Y + \lambda_2)^{\lambda_1} - 1}{\lambda_1 G^{\lambda_1 - 1}} \quad \text{for } \lambda_1 \neq 0, $$

$$ W = G \ln(Y + \lambda_2) \quad \text{for } \lambda_1 = 0 $$

where $\lambda_1$ and $\lambda_2$ are the transformation parameters and G is the geometric mean of $Y + \lambda_2$. The optimal transformation is the one that minimizes the mean squared error for W.

The quantitative variables had different units of measurement and widely differing ranges. Because the water quality parameters were measured in different units, it is first necessary to standardize the data before cluster analysis [26]. Standardization ensures that each variable has the same influence in the analysis. The data were standardized (to the z score with mean = 0 and S.D. = 1) by applying the following equation [27]:

$$ z = \frac{x - x'}{\sigma} $$

where x is the original value of the measured parameter, z the standardized value, x′ the average value of the variable and σ the standard deviation.

Spearman rank-order correlations (Spearman R coefficient) were used to study the correlation structure between variables. Spearman coefficients are computed from the ranks of the data values rather than from the values themselves. Consequently, they are less sensitive to outliers than the Pearson coefficients.
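The data treatment steps of Section 2.3 (Box–Cox transformation, z-score standardization and Spearman rank correlation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' software workflow: the function names are invented for the example, the parameters λ1 = 0 and λ2 = 0 are arbitrary (the paper does not report the fitted values), and the sample standard deviation is assumed for σ. The inputs are the mean EC and DO values of stations R1–R8 from Table 2a.

```python
import math
import statistics


def box_cox(y, lam1, lam2=0.0):
    """Box-Cox transform in the form quoted in the text.

    G is the geometric mean of Y + lam2; Y + lam2 must be strictly positive.
    """
    shifted = [v + lam2 for v in y]
    if min(shifted) <= 0:
        raise ValueError("Y + lam2 must be strictly positive")
    g = statistics.geometric_mean(shifted)
    if lam1 == 0:
        return [g * math.log(s) for s in shifted]
    return [1 + (s ** lam1 - 1) / (lam1 * g ** (lam1 - 1)) for s in shifted]


def z_scores(x):
    """Standardize to mean 0 and S.D. 1 (sample S.D. is an assumption)."""
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)
    return [(v - mean) / sd for v in x]


def average_ranks(x):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den


def spearman(a, b):
    """Spearman R: the Pearson coefficient computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Mean EC and DO at stations R1-R8 (Table 2a)
ec = [51.80, 91.71, 104.65, 185.76, 315.76, 304.88, 452.82, 476.59]
do = [7.69, 6.69, 6.19, 3.98, 3.02, 3.26, 2.41, 2.22]

ec_z = z_scores(box_cox(ec, lam1=0.0))   # transform, then standardize
rho = spearman(ec, do)                   # rank correlation on the raw means
```

For these eight station means the EC and DO rankings are exact mirror images of each other, so `rho` evaluates to −1; the value of −0.72 in Table 4 refers to the larger data set analyzed in the paper.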

Table 2a
Mean values of water quality measurements along the Bagmati River and its tributaries in 1999–2003, given as average (S.D.)

| Parameters | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 |
|---|---|---|---|---|---|---|---|---|
| Water temperature | 19.24 (5.79) | 20.11 (5.58) | 20.35 (5.44) | 20.99 (4.10) | 21.10 (5.63) | 20.76 (5.84) | 20.44 (6.41) | 20.69 (5.51) |
| pH | 7.37 (1.14) | 7.16 (0.70) | 7.16 (0.78) | 7.21 (0.65) | 7.17 (0.58) | 7.25 (0.62) | 7.21 (0.47) | 7.21 (0.45) |
| EC | 51.80 (16.28) | 91.71 (86.31) | 104.65 (77.42) | 185.76 (155.02) | 315.76 (208.78) | 304.88 (189.14) | 452.82 (231.16) | 476.59 (291.98) |
| DO | 7.69 (1.04) | 6.69 (1.42) | 6.19 (1.31) | 3.98 (1.82) | 3.02 (2.17) | 3.26 (2.41) | 2.41 (1.65) | 2.22 (1.69) |
| TSS | 102.23 (119.16) | 134.46 (164.30) | 125.89 (117.55) | 202.82 (218.20) | 199.01 (236.37) | 512.68 (741.67) | 427.06 (489.67) | 435.24 (514.72) |
| Ca | 8.53 (5.12) | 10.48 (7.04) | 11.89 (9.93) | 14.77 (8.47) | 19.30 (7.93) | 25.56 (12.43) | 30.22 (10.72) | 29.64 (9.05) |
| Mg | 5.73 (7.01) | 5.91 (4.99) | 5.35 (5.11) | 8.14 (6.23) | 10.25 (10.64) | 7.65 (5.27) | 13.23 (8.35) | 13.66 (8.73) |
| Total alkalinity | 28.07 (13.70) | 35.33 (27.47) | 37.67 (21.56) | 67.33 (52.78) | 111.49 (73.12) | 116.05 (72.48) | 162.10 (98.01) | 169.46 (98.68) |
| Chloride | 5.44 (2.95) | 6.94 (3.84) | 7.67 (4.41) | 13.63 (8.98) | 24.36 (18.26) | 22.00 (15.27) | 30.73 (15.92) | 39.33 (30.65) |
| PO4 as P | 0.31 (0.49) | 0.37 (0.58) | 0.37 (0.71) | 0.75 (0.97) | 1.70 (1.73) | 2.09 (1.98) | 1.76 (1.64) | 1.89 (1.95) |
| TP as P | 0.37 (0.56) | 0.40 (0.58) | 0.41 (0.71) | 0.84 (1.04) | 1.97 (2.14) | 2.39 (2.25) | 1.88 (1.87) | 1.98 (2.10) |
| Ammonia as N | 1.70 (2.49) | 3.95 (4.79) | 3.78 (3.96) | 7.54 (8.12) | 13.04 (13.36) | 9.77 (11.37) | 11.84 (10.33) | 15.85 (16.88) |
| Nitrate as N | 1.55 (1.75) | 0.93 (0.97) | 1.24 (1.61) | 2.23 (2.91) | 2.29 (4.02) | 2.01 (3.06) | 4.16 (6.52) | 3.36 (5.61) |
| Nitrite as N | 0.08 (0.13) | 0.17 (0.28) | 0.22 (0.23) | 0.28 (0.28) | 0.21 (0.31) | 0.27 (0.31) | 0.31 (0.53) | 0.45 (0.67) |
| BOD | 5.41 (4.22) | 12.14 (8.11) | 15.87 (10.59) | 41.36 (38.32) | 56.03 (49.44) | 47.96 (44.58) | 49.24 (35.83) | 62.65 (43.07) |
| COD | 16.78 (15.97) | 26.68 (15.27) | 31.04 (16.01) | 59.84 (41.28) | 85.08 (52.75) | 65.69 (46.07) | 82.34 (60.52) | 102.05 (62.89) |

S.D.: standard deviation.

Table 2b
Mean values of water quality measurements along the Bagmati River and its tributaries in 1999–2003, given as average (S.D.)

| Parameters | R9 | R10 | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|---|---|
| Water temperature | 19.66 (5.73) | 20.36 (6.01) | 19.48 (5.98) | 21.16 (5.51) | 20.91 (5.12) | 20.14 (4.84) | 21.13 (5.82) | 20.15 (6.00) | 20.13 (6.04) |
| pH | 7.40 (0.58) | 7.61 (0.50) | 7.45 (0.52) | 7.56 (0.80) | 7.26 (0.39) | 7.24 (0.38) | 7.37 (0.46) | 8.20 (0.70) | 7.60 (0.61) |
| EC | 435.47 (233.22) | 382.24 (218.68) | 378.76 (242.45) | 103.24 (51.82) | 520.24 (280.62) | 889.59 (312.64) | 489.29 (286.05) | 178.71 (41.35) | 167.18 (90.12) |
| DO | 2.24 (1.74) | 3.62 (2.29) | 4.01 (2.57) | 6.41 (1.76) | 2.14 (1.64) | 1.84 (1.39) | 2.27 (1.66) | 7.58 (1.46) | 6.11 (1.64) |
| TSS | 243.90 (239.00) | 250.46 (355.69) | 177.01 (213.16) | 237.04 (271.83) | 451.72 (459.86) | 251.97 (200.50) | 387.29 (516.27) | 334.95 (478.01) | 160.71 (174.46) |
| Ca | 29.76 (8.60) | 34.96 (11.21) | 34.34 (13.03) | 11.09 (5.09) | 28.23 (8.66) | 41.34 (10.38) | 36.99 (13.73) | 33.31 (9.73) | 26.73 (16.53) |
| Mg | 13.98 (8.32) | 11.54 (8.62) | 15.93 (9.35) | 7.07 (5.87) | 12.56 (6.45) | 16.09 (9.48) | 15.68 (11.29) | 9.95 (5.53) | 10.15 (10.36) |
| Total alkalinity | 151.20 (85.67) | 137.03 (80.08) | 172.28 (103.41) | 44.30 (24.68) | 184.09 (101.67) | 275.78 (88.48) | 190.16 (105.33) | 93.19 (34.41) | 82.64 (50.89) |
| Chloride | 29.14 (16.66) | 23.44 (14.52) | 22.80 (15.49) | 7.50 (3.61) | 39.35 (22.95) | 69.84 (40.23) | 29.34 (19.11) | 4.70 (2.33) | 16.48 (21.49) |
| PO4 as P | 2.95 (2.29) | 2.09 (1.76) | 2.12 (2.60) | 0.39 (0.56) | 2.49 (2.31) | 5.71 (4.49) | 2.45 (3.03) | 0.22 (0.29) | 0.44 (0.84) |
| TP as P | 3.03 (2.40) | 2.32 (2.15) | 2.20 (2.72) | 0.44 (0.56) | 2.69 (2.66) | 6.61 (7.13) | 2.73 (3.54) | 0.26 (0.33) | 0.49 (0.88) |
| Ammonia as N | 18.84 (18.96) | 12.27 (13.97) | 12.00 (13.31) | 1.79 (1.79) | 23.10 (37.56) | 37.89 (50.78) | 21.00 (25.54) | 1.36 (1.67) | 3.02 (3.67) |
| Nitrate as N | 3.91 (7.51) | 1.54 (2.82) | 2.15 (3.34) | 1.66 (1.99) | 4.68 (7.98) | 4.89 (9.05) | 4.22 (7.71) | 1.57 (2.65) | 2.27 (3.96) |
| Nitrite as N | 0.47 (0.75) | 0.36 (0.49) | 0.27 (0.37) | 0.20 (0.29) | 0.31 (0.56) | 0.43 (0.76) | 0.36 (0.58) | 0.20 (0.43) | 0.17 (0.30) |
| BOD | 56.80 (47.46) | 39.09 (47.07) | 43.92 (45.33) | 17.17 (12.56) | 66.25 (39.98) | 80.29 (65.34) | 58.75 (47.72) | 7.10 (4.49) | 16.14 (10.34) |
| COD | 93.10 (50.90) | 59.15 (55.34) | 61.07 (50.04) | 27.43 (16.56) | 95.59 (51.38) | 130.45 (69.95) | 97.87 (70.05) | 20.87 (30.72) | 24.07 (11.83) |

S.D.: standard deviation.


2.3.1. Cluster analysis (CA)

CA classifies objects (cases) into classes (clusters), so that each object is similar to the others within a class but different from those in other classes with respect to a predetermined selection criterion [28,29]. Hierarchical agglomerative clustering is the most common approach and is typically illustrated by a dendrogram [30]. CA using Ward's method is regarded as a very efficient method and was applied to the standardized data, considering previous reports in the literature [28,31,29,6,32]. Many applications of CA to water quality assessment have been reported [33,31,6–10,14]. Cluster analysis using the city block distance dampens the effect of outliers, as the differences across dimensions are not squared. The distance between two monitoring locations using the city block distance is given by [25]

$$ d(x, y) = \sum_{i=1}^{p} |x_i - y_i| $$

where $d(x, y)$ denotes the city block distance between two items represented by $x_i$ and $y_i$, and p is the dimension of the variable space.
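As a rough illustration of this step, the sketch below computes city block distances and merges clusters with the Lance–Williams form of Ward's update. It is an assumption-laden example rather than the procedure used by the authors' software: `city_block` and `ward_clusters` are hypothetical helper names, Ward's update is formally derived for Euclidean distances and is applied to city block distances here only because the paper names that combination, and the input is limited to the mean EC and DO of six stations taken from Table 2.

```python
import math
import statistics


def city_block(x, y):
    """d(x, y) = sum over the p dimensions of |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))


def ward_clusters(points, n_clusters):
    """Agglomerate until n_clusters remain; returns sorted lists of point indices."""
    clusters = {i: [i] for i in range(len(points))}
    dist = {(i, j): city_block(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(points)
    while len(clusters) > n_clusters:
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        n_a, n_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        updated = {}
        for k, members in clusters.items():
            n_k = len(members)
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            total = n_a + n_b + n_k
            # Lance-Williams update for Ward's method
            updated[(k, next_id)] = math.sqrt(
                ((n_a + n_k) * d_ak ** 2 + (n_b + n_k) * d_bk ** 2 - n_k * d_ab ** 2) / total
            )
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        dist.update(updated)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


def z(column):
    """z-score one variable (sample S.D. assumed)."""
    m, s = statistics.fmean(column), statistics.stdev(column)
    return [(v - m) / s for v in column]


# Mean EC and DO of three upstream and three city stations (Table 2)
sites = ["R1", "R2", "R3", "R7", "R8", "R9"]
ec = [51.80, 91.71, 104.65, 452.82, 476.59, 435.47]
do = [7.69, 6.69, 6.19, 2.41, 2.22, 2.24]

points = list(zip(z(ec), z(do)))
groups = [[sites[i] for i in members] for members in ward_clusters(points, n_clusters=2)]
```

With only two variables and six stations this is far smaller than the 17 × 16 problem in the paper, but it shows the mechanics: `groups` separates the upstream stations R1–R3 from the city stations R7–R9.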

2.3.2. Principal component analysis (PCA)

Principal component analysis [29] is a technique widely used for reducing the dimensions of multivariate problems. As a non-parametric method of classification, it makes no assumptions about the underlying statistical data distribution [6,32,9]. It reduces the dimensionality of a data set by explaining the correlation amongst a large number of variables in terms of a smaller number of underlying factors (principal components or PCs) without losing much information [34,35,6,9,32]. The principal component (PC) can be expressed as [5]

$$ z_{ij} = a_{i1} x_{1j} + a_{i2} x_{2j} + \cdots + a_{im} x_{mj} $$

where z is the component score, a is the component loading, x is the measured value of the variable, i is the component number, j is the sample number and m is the total number of variables.
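The component scores defined above can be illustrated with a small dependency-free sketch. It assumes that the PCs are extracted from the correlation matrix of the standardized variables, that the coefficients $a_{im}$ are the unit-length eigenvectors, and that a component is retained when its eigenvalue exceeds one (the criterion the paper applies in its results). The Jacobi eigen-solver and all function names are illustrative choices, and the inputs are the mean EC, DO, TSS and BOD of stations R1–R8 from Table 2a.

```python
import math
import statistics


def standardize(columns):
    """z-score each variable to mean 0 and sample S.D. 1."""
    out = []
    for col in columns:
        m, s = statistics.fmean(col), statistics.stdev(col)
        out.append([(v - m) / s for v in col])
    return out


def correlation_matrix(z):
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / (n - 1) for zk in z] for zi in z]


def jacobi_eigh(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues (descending) and eigenvectors (one per row) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


# Mean values at stations R1-R8 (Table 2a)
variables = {
    "EC": [51.80, 91.71, 104.65, 185.76, 315.76, 304.88, 452.82, 476.59],
    "DO": [7.69, 6.69, 6.19, 3.98, 3.02, 3.26, 2.41, 2.22],
    "TSS": [102.23, 134.46, 125.89, 202.82, 199.01, 512.68, 427.06, 435.24],
    "BOD": [5.41, 12.14, 15.87, 41.36, 56.03, 47.96, 49.24, 62.65],
}

x = standardize(list(variables.values()))          # x[m][j]: variable m, sample j
corr = correlation_matrix(x)
values, vectors = jacobi_eigh(corr)
explained = [100 * lam / len(values) for lam in values]    # % of total variance
retained = [i for i, lam in enumerate(values) if lam > 1.0]
# z_ij = a_i1 x_1j + ... + a_im x_mj for every retained component i
scores = [[sum(a * xm[j] for a, xm in zip(vectors[i], x)) for j in range(len(x[0]))]
          for i in retained]
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why an eigenvalue of one corresponds to the variance carried by a single original variable.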

2.3.3. Factor analysis (FA)

The rotation of the axes defined by PCA produces new groups of variables called varifactors (VFs). VFs usually group the studied variables in accordance with common features and can include unobservable, hypothetical, latent variables [6,32]. In factor analysis, the basic concept is expressed as [5]

$$ z_{ij} = a_{f1} f_{1i} + a_{f2} f_{2i} + \cdots + a_{fm} f_{mi} + e_{fi} $$

where z is the measured value of a variable, a the factor loading, f the factor score, e the residual term accounting for errors or other sources of variation, i the sample number, j the variable number, and m the total number of factors.
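The axis rotation that turns PCs into varifactors can be sketched as below. The paper reports a varimax rotation (Table 5) but does not describe the algorithm, so this example assumes the classical raw varimax procedure (no Kaiser row normalization) carried out by successive planar rotations of factor pairs. The unrotated loadings are hypothetical numbers chosen only to exercise the code, and `varimax` and `varimax_criterion` are illustrative names.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance-like spread of the squared loadings."""
    n = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in squared) - sum(squared) ** 2 / n
    return total


def varimax(loadings, sweeps=50, tol=1e-10):
    """Rotate a loadings matrix (variables x factors) to maximize the varimax criterion."""
    rows = [list(r) for r in loadings]
    n, k = len(rows), len(rows[0])
    for _ in range(sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [r[p] ** 2 - r[q] ** 2 for r in rows]
                v = [2 * r[p] * r[q] for r in rows]
                a, b = sum(u), sum(v)
                c = sum(ui ** 2 - vi ** 2 for ui, vi in zip(u, v))
                d = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion in the (p, q) plane
                phi = 0.25 * math.atan2(d - 2 * a * b / n, c - (a * a - b * b) / n)
                largest = max(largest, abs(phi))
                cs, sn = math.cos(phi), math.sin(phi)
                for r in rows:
                    r[p], r[q] = r[p] * cs + r[q] * sn, -r[p] * sn + r[q] * cs
        if largest < tol:
            break
    return rows


# Hypothetical unrotated loadings of five variables on two factors
unrotated = [
    [0.80, 0.45],
    [0.75, 0.50],
    [0.70, -0.40],
    [0.65, -0.55],
    [0.60, 0.10],
]
rotated = varimax(unrotated)
```

The rotation is orthogonal, so each variable keeps its communality (row sum of squared loadings); only the distribution of the loadings across the factors changes, which is what makes the varifactors easier to interpret.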

2.3.4. Discriminant analysis (DA)

DA provides statistical classification of samples sharing common properties and is performed with prior knowledge of the membership of objects in a particular group. It builds up a discriminant function for each group operating on raw data [29,9,26,5,36]. Such a function represents a surface dividing our data space into regions. DA has the same discriminant ability for experimental data with and without standardization [9]. The discriminant function has the form [9,29,36]

$$ f(G_i) = k_i + \sum_{j=1}^{n} w_{ij} p_{ij} $$

where i is the number of groups ($G_i$), $k_i$ is the constant inherent to each group, n is the number of parameters used to classify a set of data into a given group, and $w_{ij}$ is the weight coefficient assigned by DA to a given selected parameter ($p_{ij}$).

In this study, DAs were performed on the data matrix by using the standard, forward stepwise and backward stepwise modes. In forward stepwise mode, variables are included step-by-step, beginning with the most significant, until no significant changes are obtained. In backward stepwise mode, variables are removed step-by-step, beginning with the least significant, until no significant changes are obtained [37].
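One common way to obtain functions of the form $f(G_i) = k_i + \sum_j w_{ij} p_{ij}$ is linear discriminant analysis with a pooled within-group covariance matrix. The sketch below takes that route as an assumption (the paper does not state how its software computes the coefficients) and further assumes equal prior probabilities for the groups. It covers only the standard mode, not the stepwise variable selection. The grouping of stations into LP, MP and HP follows the clusters reported in the paper, the two predictors are the station mean EC and DO from Table 2, and all function names are illustrative.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(groups):
    """Within-group covariance pooled over groups (divisor: N minus number of groups)."""
    dim = len(next(iter(groups.values()))[0])
    s = [[0.0] * dim for _ in range(dim)]
    total = 0
    for rows in groups.values():
        mu = mean_vector(rows)
        total += len(rows)
        for r in rows:
            d = [r[j] - mu[j] for j in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    s[a][b] += d[a] * d[b]
    dof = total - len(groups)
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def classification_functions(groups):
    """Return {group: (k_i, [w_i1, ..., w_in])}, assuming equal priors."""
    s = pooled_covariance(groups)
    funcs = {}
    for name, rows in groups.items():
        mu = mean_vector(rows)
        w = solve(s, mu)                                   # w_i = S^-1 mu_i
        k = -0.5 * sum(wj * mj for wj, mj in zip(w, mu))   # k_i = -mu_i' S^-1 mu_i / 2
        funcs[name] = (k, w)
    return funcs


def classify(funcs, p):
    """Assign the sample p to the group with the largest f(G_i)."""
    scores = {g: k + sum(wj * pj for wj, pj in zip(w, p)) for g, (k, w) in funcs.items()}
    return max(scores, key=scores.get)


# (EC, DO) station means from Table 2, grouped as in the paper's clusters
groups = {
    "LP": [(51.80, 7.69), (91.71, 6.69), (104.65, 6.19),
           (103.24, 6.41), (178.71, 7.58), (167.18, 6.11)],      # R1-R3, T2, T6, T7
    "MP": [(185.76, 3.98), (315.76, 3.02), (304.88, 3.26),
           (382.24, 3.62), (378.76, 4.01)],                      # R4-R6, R10, T1
    "HP": [(452.82, 2.41), (476.59, 2.22), (435.47, 2.24),
           (520.24, 2.14), (889.59, 1.84), (489.29, 2.27)],      # R7-R9, T3-T5
}
funcs = classification_functions(groups)
predicted = {name: classify(funcs, mean_vector(rows)) for name, rows in groups.items()}
```

With equal priors, choosing the largest $f(G_i)$ is equivalent to choosing the group whose mean is nearest in Mahalanobis distance, so each group centroid is assigned to its own group. The coefficients produced here are not comparable to Table 3a, which was fitted on all measurements and on transformed data.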

3. Results and discussions

The dendrogram of the location pattern (locations of different pollution state) resulting from the CA (with Ward's method and city block distance) of the measured data set is presented in Fig. 2. The dendrogram shows that the monitoring locations can be grouped into three main clusters. Cluster I is formed by the monitoring sites R1, R2, R3, T2, T6 and T7 and corresponds to the less polluted sites. These stations lie in rural areas far from municipal pollution, except station T2, which is located in the tributary Manahara khola and flows through areas with less polluting activities. Ha and Pokhrel [22], in a study of the upstream area of the Bagmati (Gaurighat watershed), which covers stations R1, R2 and R3, found that the pollution contribution from living beings is 89% of BOD, 73% of TN and 97% of TP, and that from industries is 5% of BOD (with no nutrients). Land use (forest, agricultural, residential, commercial) contributes 6% of BOD, 27% of TN and 3% of TP. The contribution of animal-generated pollution is small, as rural people use animal wastes as fertilizer in the fields. This confirmed that the pollution in these areas is mainly from human wastes.

Fig. 2. Dendrogram of clustering of sampling sites according to surface water quality characteristics of the Bagmati River and its tributaries using Ward's method and city block distance metric.

Cluster II, formed by R4, R5, R6, R10 and T1, corresponds to the medium pollution sites, all of which lie in the city area. Station T1 is located in a tributary (Hanumante khola) that carries high pollution as it flows through the Bhaktapur municipality. Cluster III, formed by R7, R8, R9, T3, T4 and T5, corresponds to the highly polluted sites. The three tributaries Dhobi khola, Tukucha khola and Bishnumati khola, which flow through the Kathmandu municipality and contain stations T3, T4 and T5, respectively, are heavily polluted by municipal wastes. Station T4, characterized by a long cluster distance, is highly polluted, as its stream flows through the heart of the Kathmandu municipality. It is to be noted that station R10 in cluster II is far from the others in terms of geographic location. This indicates the existence of assimilative capacity and self-purification of the river as it flows downstream.

The surface waters in clusters II and III are degraded because of inadequate treatment facilities, which have accelerated the discharge of untreated wastes and wastewater from domestic sources, industries and hospitals [38,39]. The major polluting industries in the Kathmandu valley are vegetable oil, distilleries, leather, carpet, beer and dairy [40]. MOPE [41] has reported that water pollution in the urban areas is mainly related to the municipal sewerage system and storm-water drainage.

On the basis of eigenvalues >1, PCA yielded three principal components explaining about 90.5% of the total variance. According to the eigenvalue criterion, only the PCs with eigenvalues greater than one are considered important. This criterion is based on the fact that the average eigenvalue of the autoscaled data is just one [27]. This allowed all 17 monitoring stations, described by the original 16 variables, to be represented in the reduced space of three calculated principal components (PCs). The scatter plot (Fig. 3) of the first two principal component scores (neglecting the effect of the third PC, which has a variance of 7.53%) shows a clear differentiation of the monitoring locations as grouped by CA.

Fig. 3. Scatter plot of the principal component scores for Box–Cox transformed monitoring data showing three clusters.

The river water classification obtained from CA was further confirmed through DA. The discriminant functions were calculated using (i) mean data, excluding the variables water temperature and pH, as their inclusion resulted in linearly dependent variables and they have a high Wilks' lambda (>0.85) [27], and (ii) all measured data (to confirm the classification formed using mean data). The DA resulted in two orthogonal discriminant functions. The mean data analysis resulted in discriminant function 1 with a p-value of 0.0001, Wilks' lambda of 0.000176 and 94.66% of the variance, and discriminant function 2 with a p-value of 0.0579, Wilks' lambda of 0.054388 and 5.34% of the variance. The first function, carrying most of the information (94.66%), is statistically significant at the 95% confidence level. The plot of the discriminant functions (Fig. 4a) clearly shows that the monitoring locations can be divided into three groups: high pollution region (HP), medium pollution region (MP) and low pollution region (LP). The analysis using all the monitored data, plotted in Fig. 4(b), resulted in standard mode in discriminant function 1 with a p-value of 0.0000 and Wilks' lambda of 0.296, and discriminant function 2 with a p-value of 0.2208 and Wilks' lambda of 0.935. Most of the information is contained in the first discriminant factor, with 96.86% of the total variance.

Fig. 4. (a) Plot of discriminant functions for Box–Cox transformed mean monitoring data showing three clusters: LP, MP and HP. (b) Discriminant functions for Box–Cox transformed data for all monitoring data showing three clusters: LP, MP and HP.

396 P.R. Kannel et al. / Analytica Chimica Acta 582 (2007) 390–399

Table 3a
Classification function coefficients of DA for all data measurements in Bagmati river basin

Parameters              Standard mode                 Forward/backward stepwise mode
                           HP       LP       MP          HP       LP       MP
Water temperature       0.280   −0.399    0.144
pH                     −0.198    0.277   −0.103
EC                      0.982   −0.730   −0.327       0.961   −0.785   −0.215
DO                     −0.989    1.270   −0.330      −0.874    1.190   −0.379
TSS                     0.664   −0.676    0.011       0.731   −0.740    0.010
Ca                      0.337   −0.626    0.353
Mg                      0.342   −0.373    0.038
Talk                   −0.636    0.926   −0.341
CL                      0.161   −0.053   −0.124
PO4P                    1.254   −1.553    0.289
TP                     −1.127    1.025    0.192
NH3N                    0.117   −0.044   −0.085
NO3N                   −0.043    0.175   −0.162
NO2N                    0.308   −0.465    0.186       0.277   −0.430    0.186
BOD                     0.188   −0.734    0.660       0.683   −1.042    0.432
COD                     0.492   −0.400   −0.109
Constant               −2.334   −2.861   −1.251      −2.166   −2.602   −1.188

Table 3b
Classification matrix of DA for all data measurements in Bagmati river basin

Regions     Predicted regions (%)
                   HP       LP       MP
Standard mode
HP              70.59     0.00    29.41
LP               1.96    88.24     9.80
MP              27.06    16.47    56.47
Percent of cases correctly classified: 72.66%
Forward/backward stepwise modes
HP              66.67     0.00    33.33
LP               0.00    87.25    12.75
MP              30.59    23.53    45.88
Percent of cases correctly classified: 67.82%

Table 4
Spearman rank correlation coefficients on raw mean data of the Bagmati River and its tributaries (1999–2003)

                    Water temperature     pH     EC     DO    TSS     Ca     Mg   Talk     CL   PO4P     TP   NH4N   NO3N   NO2N    BOD    COD
Water temperature                1.00
pH                               0.00   1.00
EC                              −0.04   0.11   1.00
DO                               0.15   0.26  −0.72   1.00
TSS                              0.25  −0.24  −0.04  −0.13   1.00
Ca                              −0.10   0.28   0.82  −0.47  −0.08   1.00
Mg                              −0.06   0.38   0.57  −0.25  −0.03   0.59   1.00
Talk                            −0.05   0.17   0.92  −0.64   0.03   0.84   0.60   1.00
CL                              −0.20  −0.12   0.84  −0.73   0.01   0.64   0.43   0.78   1.00
PO4P                            −0.14  −0.06   0.64  −0.62   0.02   0.43   0.31   0.59   0.64   1.00
TP                              −0.17  −0.07   0.65  −0.63   0.00   0.44   0.33   0.61   0.63   0.96   1.00
NH4N                            −0.12   0.03   0.70  −0.56  −0.11   0.52   0.50   0.60   0.65   0.67   0.64   1.00
NO3N                            −0.23   0.07   0.04   0.06   0.02   0.12   0.29   0.06   0.12   0.18   0.16   0.25   1.00
NO2N                             0.00  −0.20   0.23  −0.27  −0.04   0.15   0.00   0.15   0.28   0.18   0.23   0.13   0.04   1.00
BOD                             −0.05   0.08   0.73  −0.63   0.02   0.56   0.53   0.68   0.68   0.51   0.47   0.64   0.19   0.07   1.00
COD                             −0.09  −0.09   0.65  −0.62   0.16   0.47   0.43   0.60   0.64   0.55   0.50   0.57   0.27   0.04   0.84   1.00

Bold values are coefficients higher than or equal to 0.7; italic values are higher than or equal to 0.5.

The plot between two discriminant factors resulted in the three pollution regions with some overlapping.

To find the discriminating variables, the data were subjected to standard, forward stepwise and backward stepwise DA (Table 3a). The standard mode yielded classification matrices assigning 72.66% of cases correctly using 16 discriminant variables (Table 3b). The forward and backward stepwise modes yielded classification matrices assigning 67.82% of cases correctly using only 5 discriminant variables (Table 3b), with little difference in the match for each region compared with the standard mode (apart from some differences in the medium pollution region). Thus, the DA results suggest that EC, DO, TSS, NO2N and BOD are the most significant parameters (Table 3a) for discriminating between the three regions: LP, MP and HP.

The correlation matrix, presented in Table 4, showed high interdependence between particular variables, such as the high
correlations between EC and Talk, BOD and COD, and CL and EC. The redundancy of information suggests the application of FA in order to reduce the dimensionality of the dataset.

With the eigenvalue criterion (eigenvalue > 1), FA with varimax rotation (on mean data) resulted in three varifactors (VFs) that together account for 91.47% of the total variance (Table 5). Factor 1, with the highest grouping power, is highly correlated with the parameters EC, DO, Ca, Mg, Talk, CL, PO4P, TP, NH4N, NO3N, NO2N, BOD and COD. It thus describes the chemical factor of pollution. DO is negatively correlated because BOD, COD, NO3N and NH4N are oxidized at the expense of dissolved oxygen. The main sources of NH4N, PO4P, TP and CL are domestic and municipal wastewaters, while the sources of NO3N, NO2N, Ca and Mg are anthropogenic as well as runoff from agricultural fields. The high levels of BOD and COD at urban sites (Table 2) indicate organic contamination, mainly from municipal wastewater. The contribution from industries is only 7% [40]. Ha and Pokhrel [22], in a study of the Gaurighat watershed area (the area above station R4), reported that industries contribute 5% of the BOD and no nutrients. This is further evident from the high correlation between BOD and COD (Table 4). Treatment facilities are almost absent in the valley; the only one treats wastewater from the area above station R4.

Factor 2, correlated with water temperature and suspended solids, is a physical factor and indicates their origin as runoff with a high load of solids and wastes from agricultural fields, households and municipalities. Factor 3 correlates with two parameters (pH, Ca) and can be considered a physicochemical factor, the origin of which is likely the dissolution of soil constituents, mainly carbonates.

Table 5
Factor loadings (varimax rotation)

Parameters                VF1      VF2      VF3
Water temperature       0.002    0.908   −0.205
pH                     −0.241   −0.070    0.937
EC                      0.966    0.094    0.020
DO                     −0.890   −0.332    0.246
TSS                     0.434    0.750    0.230
Ca                      0.816    0.053    0.534
Mg                      0.910   −0.022    0.277
Talk                    0.969    0.092    0.148
Chloride                0.948    0.171   −0.161
PO4P                    0.956    0.104   −0.160
TP                      0.950    0.119   −0.165
NH3N                    0.946    0.141   −0.247
NO3N                    0.845    0.223   −0.033
NO2N                    0.848    0.181    0.007
BOD                     0.917    0.285   −0.240
COD                     0.929    0.249   −0.213
Eigenvalue             11.686    1.722    1.228
Variance (%)           73.035   10.761    7.676

The pollution sources in the river system are identified by plotting the factor scores against the monitoring locations (Fig. 5). High scores correspond to a high influence of the factor on the sampling sites. Factor score 1 (corresponding to the chemical factor) is crucial for the condition of the monitoring locations in the medium pollution region (MP) and the high pollution region (HP).

Fig. 5. Factor scores (with varimax rotation) for Box–Cox transformed monitoring data.

The regions HP and MP, located in the city area, are mainly polluted by human wastes. The human-generated pollution in the municipalities cannot be treated on-site because of limited land, and it reaches the river system very quickly owing to the lack of sewage and municipal wastewater treatment facilities, open defecation on riverbanks, etc. [22].

Further, the tributary stations T3, T4 and T5 (cluster III) and the main river stations R7, R8 and R9 (cluster III) are loaded with higher values of factor score 1. The reason is that the tributaries Dhobi khola (T3), Tukucha khola (T4) and Bishnumati khola (T5) carry heavily polluted water. The highest factor score at station T4 suggests that it is the most polluted station; it lies on the tributary Tukucha khola, which flows entirely through the heart of Kathmandu city. The factor scores are lowest at the stations R1, R2, R3, T2, T6 and T7 (cluster I), indicating a comparatively less polluted region. The loading of factor 2 (water temperature and TSS) generally increases as the river moves downstream through the city area, accumulating pollution. It is to be noted that there is a higher inflow of suspended solids where the river meets the tributaries Manahara (T2), Dhobi khola (T3), Bishnumati khola (T5) and Nakkhu khola (T6) (see Table 2). The rise in temperature together with increased suspended solids points to wastewater inflow from domestic, municipal and industrial sources, including sediment flow from the catchment and bank erosion.
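As a quick arithmetic check on the factor analysis, the percentage of variance carried by each varifactor in Table 5 can be recovered from its eigenvalue. The sketch below assumes, as is usual when FA is run on a correlation matrix, that each of the 16 standardized parameters contributes unit variance, so the total variance is 16. The function names and the extra eigenvalue of 0.5 are illustrative only and do not come from the study.

```python
# Eigenvalues of the three retained varifactors, as listed in Table 5.
TABLE5_EIGENVALUES = [11.686, 1.722, 1.228]
# Assumption: 16 standardized parameters, each contributing unit variance.
N_PARAMETERS = 16


def kaiser_retained(eigenvalues, threshold=1.0):
    """Keep only the eigenvalues above the threshold (eigenvalue > 1 criterion)."""
    return [ev for ev in eigenvalues if ev > threshold]


def explained_variance(eigenvalues, n_parameters):
    """Percentage of the total variance attributed to each eigenvalue."""
    return [100.0 * ev / n_parameters for ev in eigenvalues]


# 0.5 is a made-up fourth eigenvalue, added only to show the filter at work.
retained = kaiser_retained(TABLE5_EIGENVALUES + [0.5])
percent = explained_variance(retained, N_PARAMETERS)
total = sum(percent)

for name, ev, pct in zip(["VF1", "VF2", "VF3"], retained, percent):
    print(f"{name}: eigenvalue {ev:.3f} -> {pct:.3f}% of variance")
print(f"Total explained: {total:.3f}%")
```

Under that assumption the computed shares agree with the tabulated 73.035, 10.761 and 7.676 to within the rounding of the eigenvalues, and their sum is close to the 91.47% quoted in the text.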

Factor score 3 (corresponding to pH and Ca) identifies the point T6 on the tributary Nakkhu khola, which flows through areas with limestone. A cement factory lies between station R10 and the Nakkhu khola–Bagmati River junction, and extensive limestone mining takes place in the upstream region of the Nakkhu khola watershed.
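The redundancy that motivated the factor analysis was read from the Spearman rank correlations in Table 4. For readers who want to reproduce that kind of screening, the following is a minimal sketch of the coefficient: each variable is replaced by its ranks (tied values share the mean rank) and the ordinary Pearson correlation is taken on the ranks. The station values are hypothetical and are not data from the study; the helper names are likewise illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical station means: EC rises as DO falls at every station.
ec = [120.0, 95.0, 310.0, 540.0, 410.0, 150.0]
do = [8.1, 8.4, 5.2, 1.9, 3.0, 7.7]
rho = spearman(ec, do)
print(f"Spearman rho (EC vs DO): {rho:.2f}")
```

Because the coefficient depends only on ranks, it is insensitive to the skewed distributions typical of raw water quality data, which is a common reason for preferring it to the Pearson coefficient on untransformed values.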

4. Conclusions

The chemometric analysis confirmed the classification of the water quality variables measured during the period 1999–2003 in the Bagmati River basin within the Kathmandu valley. Using CA, it grouped the 17 monitoring locations into three regions: cluster I (low pollution), cluster II (medium pollution) and cluster III (high pollution); this grouping was confirmed by PCA and DA. The DA results suggested that EC, TSS, DO, NO2N and BOD are the most significant parameters for discriminating between the three regions. Three varifactors obtained from FA indicated that the parameters responsible for water-quality variations are related to chemical (organics, dissolved oxygen, nutrients, solutes and minerals), physical (suspended solids and water temperature) and physicochemical/mineral factors. The results revealed that some locations were under the influence of high municipal contamination (such as the stations located in and around city areas) and others under the influence of minerals (such as station T6). The results showed that chemometric analysis is effective for river water classification and for rapid assessment of water quality; by using representative sites, the classification could serve to optimize cost and time without losing any significance of the outcome.
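The DA result summarized above rests on classification functions of the kind listed in Table 3a. Such functions are conventionally applied by computing one linear score per region and assigning a sample to the region with the highest score. The sketch below does this with the five-variable stepwise coefficients. It assumes the inputs are already on whatever transformed and scaled basis the coefficients were fitted on, which the text does not specify; the sample values and the helper names are purely hypothetical.

```python
# Stepwise-mode classification function coefficients from Table 3a.
COEFFICIENTS = {
    "HP": {"EC": 0.961, "DO": -0.874, "TSS": 0.731, "NO2N": 0.277, "BOD": 0.683},
    "LP": {"EC": -0.785, "DO": 1.190, "TSS": -0.740, "NO2N": -0.430, "BOD": -1.042},
    "MP": {"EC": -0.215, "DO": -0.379, "TSS": 0.010, "NO2N": 0.186, "BOD": 0.432},
}
CONSTANTS = {"HP": -2.166, "LP": -2.602, "MP": -1.188}


def classification_scores(sample):
    """Linear score of a sample for each region: constant + sum(coef * value)."""
    scores = {}
    for region, weights in COEFFICIENTS.items():
        weighted = sum(weights[var] * sample[var] for var in weights)
        scores[region] = CONSTANTS[region] + weighted
    return scores


def classify(sample):
    """Assign the sample to the region with the highest classification score."""
    scores = classification_scores(sample)
    return max(scores, key=scores.get)


# Hypothetical samples, assumed to be on the scale used to fit the functions.
polluted = {"EC": 1.5, "DO": -1.5, "TSS": 1.0, "NO2N": 0.5, "BOD": 1.5}
clean = {"EC": -1.0, "DO": 1.0, "TSS": -1.0, "NO2N": -0.5, "BOD": -1.0}

for label, sample in [("polluted", polluted), ("clean", clean)]:
    print(label, "->", classify(sample))
```

With these made-up inputs, the sample with high EC and BOD and low DO scores highest for HP, and the opposite profile scores highest for LP, which matches the signs of the coefficients in Table 3a.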

Acknowledgements

This work was supported by Korea Institute of Science and Technology (South Korea), Groundwater Resources Development Project and Melamchi Water Supply Board (Nepal).

References

[1] Y. Ouyang, Water Res. 39 (2006) 2621.
[2] D. Chapman, in: D. Chapman on behalf of UNESCO, WHO and UNEP (Ed.), Water Quality Assessment, Chapman & Hall, London, 1992, 585 pp.
[3] W. Dixon, B. Chiswell, Review of aquatic monitoring program design, Water Res. 30 (1996) 1935.
[4] V.H. McNeil, M.E. Cox, M.M. Preda, Assessment of chemical water types and their spatial variation using multi-stage cluster analysis, Queensland, Australia, J. Hydrol. 310 (2005) 181.
[5] K.P. Singh, A. Malik, S. Sinha, Water quality assessment and apportionment of pollution sources of Gomti River (India) using multivariate statistical techniques: a case study, Anal. Chim. Acta 538 (2005) 355.
[6] M. Vega, R. Pardo, E. Barrado, L. Deban, Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis, Water Res. 32 (1998) 3581.
[7] J.Y. Lee, J.Y. Cheon, K.K. Lee, S.Y. Lee, M.H. Lee, Statistical evaluation of geochemical parameter distribution in a ground water system contaminated with petroleum hydrocarbons, J. Environ. Qual. 30 (2001) 1548.
[8] S. Adams, R. Titus, K. Pietesen, G. Tredoux, C. Harris, Hydrochemical characteristic of aquifers near Sutherland in the Western Karoo, South Africa, J. Hydrol. 241 (2001) 91.
[9] D.A. Wunderlin, M. Diaz, M.M.V. Ame, S.F. Pesce, A.C. Hued, M. Bistoni, Pattern recognition techniques for the evaluation of spatial and temporal variations in water quality. A case study: Suquia River basin (Cordoba-Argentina), Water Res. 35 (2001) 2881.
[10] R. Reghunath, T.R.S. Murthy, B.R. Raghavan, The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India, Water Res. 36 (2002) 2437.
[11] S.D. Brown, R.K. Skogerboe, B.R. Kowalski, Pattern recognition assessment of water quality data: coal strip mine drainage, Chemosphere 9 (1980) 265.
[12] M.M. Morales, P. Marti, A. Llopis, L. Campos, S. Sagrado, Anal. Chim. Acta 394 (1999) 109.
[13] E. Perona, I. Bonilla, P. Mateo, Sci. Total Environ. 241 (1999) 75.
[14] S. Shrestha, F. Kazama, Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji River basin, Japan, Environ. Modell. Software 22 (2007) 464–475.
[15] C. Reimann, P. Filzmoser, Normal and lognormal data distribution in geochemistry: death of a myth. Consequences for statistical treatment of geochemical and environmental data, Environ. Geol. 39 (1999) 1001.
[16] G.E.P. Box, D.R. Cox, An analysis of transformations, J. Roy. Stat. Soc. Ser. B 26 (1964) 211.
[17] C.S. Zhang, O. Selinus, J. Schedin, Statistical analyses on heavy metal contents in till and root samples in an area of southeastern Sweden, Sci. Total Environ. 212 (1998) 217.
[18] L.J. Edwards, A.A. Hamilton, Errors-in-variables and the Box–Cox transformation, Comput. Stat. Data Anal. 20 (1995) 131.
[19] MPPW, Optimizing water use in Kathmandu Valley Project, Ministry of Physical Planning Works, HMG/Nepal, Draft Final Report, ADB TA 3700 NEP, 2003.
[20] CBS, A compendium on environment statistics Nepal, Central Bureau of Statistics, Kathmandu, Nepal, 1998.
[21] MHPP, Bagmati Basin Water Management Strategy and Investment Program, His Majesty's Government of Nepal, Ministry of Housing and Physical Planning, Kathmandu, Nepal, 1994.
[22] S.-R. Ha, D. Pokhrel, Water quality management planning zone development by introducing a GIS tool in Kathmandu Valley, Nepal, Water Sci. Technol. 44 (2001) 209.
[23] APHA-AWWA-WPCF, Standard Methods for Examination of Water and Wastewater, 19th ed., American Public Health Association, American Water Works Association, Water Pollution Control Federation, Washington, DC, 1995.
[24] USGS, Methods for Collection and Analysis of Water Samples for Dissolved Minerals and Gases. Techniques of Water-Resources Investigations, U.S. Geological Survey, Washington, DC, 1974.
[25] Statgraphics, Statgraphics Centurion XV, 2006. http://www.statgraphics.com/.
[26] P.A. Rogerson, Statistical Methods for Geography, Sage Publications, London, 2001.
[27] T. Kowalkowski, R. Zbytniewski, J. Szpejna, B. Buszewski, Application of chemometrics in river water classification, Water Res. 40 (2006) 744.
[28] M.J. Adams, The principles of multivariate data analysis, in: P.R. Ashurst, M.J. Dennis (Eds.), Analytical Methods of Food Authentication, Blackie Academic & Professional, London, UK, 1998, p. 350.
[29] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, 3rd ed., Prentice Hall, Englewood Cliffs, NJ, 1992, 642 pp.
[30] J.E. McKenna Jr., An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis, Environ. Modell. Software 18 (2003) 205.
[31] M.A.S. Graca, C.N. Coimbra, The elaboration of indices to assess biological water quality: a case study, Water Res. 32 (1998) 380.
[32] B. Helena, R. Pardo, M. Vega, E. Barrado, J.M. Fernandez, L. Fernandez, Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis, Water Res. 34 (2000) 807.
[33] G.N. Chen, Assessment of environmental water with fuzzy cluster analysis and fuzzy recognition, Anal. Chim. Acta 271 (1993) 115.
[34] J.E. Jackson, A User's Guide to Principal Components, Wiley, New York, 1991.
[35] R.R. Meglen, Examining large databases: a chemometric approach using principal component analysis, Mar. Chem. 39 (1992) 217.
[36] K.P. Singh, A. Malik, D. Mohan, S. Sinha, Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India): a case study, Water Res. 38 (2004) 3980.
[37] J.S. Camara, M.A. Alves, J.C. Marques, Multivariate analysis for the classification and differentiation of Madeira wines according to main grape varieties, Talanta 68 (2006) 1512.
[38] MOPE, State of the Environment, Ministry of Population and Environment (MOPE), Kathmandu, His Majesty’s Government, Kathmandu, Nepal, 2000.
[39] UNPDC, Final report of Conservation and Development Master Plan for Bagmati, Bishnumati and Dhobikhola River Corridors, United Nation Development Committee (UNPDC), Kathmandu, Nepal, 1999.
[40] S.R. Devkota, C.P. Neupane, Industrial pollution inventory of the Kathmandu Valley and Nepal, Kathmandu: Industrial Pollution Control Management Project, (HMG/MOI/UNIDO/91029), 1994.
[41] MOPE, ICIMOD, SACEP, NORAD, UNEP, State of the Environment: Nepal, 2001, 181 pp.