Representing soil pollution by heavy metals using continuous limitation scores

www.elsevier.com/locate/cageo

Author’s Accepted Manuscript

Representing soil pollution by heavy metals using continuous limitation scores

Marija Romić, Tomislav Hengl, Davor Romić, Stjepan Husnjak

PII: S0098-3004(07)00090-8
DOI: doi:10.1016/j.cageo.2007.05.002
Reference: CAGEO 1831

To appear in: Computers & Geosciences

Received date: 23 April 2005
Revised date: 11 October 2006

Cite this article as: Marija Romić, Tomislav Hengl, Davor Romić and Stjepan Husnjak, Representing soil pollution by heavy metals using continuous limitation scores, Computers & Geosciences (2007), doi:10.1016/j.cageo.2007.05.002

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Representing soil pollution by heavy metals using continuous limitation scores

Marija Romić (a, 1), Tomislav Hengl (b), Davor Romić (a), Stjepan Husnjak (a)

(a) Faculty of Agriculture, Svetosimunska 25, 10000 Zagreb, Croatia

(b) European Commission, Directorate General JRC, Institute for Environment and Sustainability, TP 280, Via E. Fermi 1, I-21020 Ispra (VA), Italy

Abstract

The paper suggests a methodology to represent overall soil pollution in a sampled area using continuous limitation scores. The interpolated heavy metal concentrations are first transformed to limitation scores using an exponential transfer function determined by two threshold values: the permissible concentration (0 limitation points) and the concentration indicating seriously polluted soil (4 limitation points). The limitation scores can then be summed to produce a map of cumulative limitation scores that shows the most critically polluted areas. The methodology is illustrated using 784 soil samples analyzed for Cd, Cr, Cu, Ni, Pb and Zn in the central region of Croatia. The samples were taken on 1 × 1 and 2 × 2 km grids from a fixed depth of 0–20 cm. Heavy metal concentrations in soil were determined by ICP-OES after microwave-assisted aqua regia digestion. The sampled concentrations were interpolated using block regression-kriging with geology and land cover maps, terrain parameters and industrialization parameters as auxiliary predictors. The results showed that the best auxiliary predictors are the geological map, groundwater depth, NDVI, slope and distance to urban areas. The spatial prediction was satisfactory for Cd, Ni, Pb and Zn, and somewhat less satisfactory for Cu and Cr. The final map of cumulative limitation scores showed that 33.5% of the total area is suitable for organic agriculture and 7.2% of the total area is seriously polluted by one or more heavy metals. This procedure can be used to assess the suitability of soils for agricultural production and as a basis for possible legal commitments to maintain soil quality.

Key words: heavy metal concentrations, regression-kriging, limitation scores, spatial planning, GIS

Submitted on 23 April 2005 to Computers and Geosciences, special issue Geostats-UK conference 2005, Belfast; first revision on 21 March 2006; second revision on 11 October 2006.

1 Tel.: +385-1-2394014; fax: +385-1-2315300. E-mail address: [email protected]

1 Introduction

The problem of soil pollution by heavy metals has received increasing attention in the last few decades. In Europe, decision makers and spatial planners increasingly require information on soil quality for different purposes: to locate areas suitable for organic (ecologically clean) farming and agro-tourism; to select sites suitable for the conversion of agricultural to non-agricultural land, particularly for urbanization; to set up protection zones for groundwater pumped for drinking water; and to estimate the costs of remediating contaminated areas. Heavy metals occur naturally in rocks and soils, but increasing quantities are being released into the environment by anthropogenic activities. Every decision on measures relating to soil quality and management, whether statutory regulations or practical actions, must be based on reliable and comparable data on the state of the soil in the given area. Society must consider various aspects to provide a sustainable environment, including soils free of heavy metal pollution. The first is to identify environments (or areas) in which anthropogenic loading of heavy metals puts ecosystems and their inhabitants at health risk. Maps indicating areas with pollution risks can provide decision-makers or local authorities with critical information for delineating areas suitable for the planned land use or for soil clean-up (Van der Gaast et al., 1998; Broos et al., 1999). Maximum permissible concentrations of heavy metals in soil are now regulated by law in many countries.

Preprint submitted to Computers and Geosciences, 12 October 2006

Before any solution to soil heavy metal pollution can be suggested, a distinction needs to be made between natural anomalies and those resulting from human activities. Natural concentrations and distributions of potentially toxic metals can themselves present health problems, as in the case of chromium, cobalt and particularly nickel in ultramafic soils (Proctor and Baker, 1994). Rock type and geological-geochemical processes can change markedly over a relatively small area, resulting in great spatial variability in the soil content of elements. Soils in the vicinity of urban areas and industry are exposed to inputs of potentially toxic elements, and the situation of agricultural soils is further complicated by the continuous application of agrochemicals.

In practice, soil pollution by heavy metals is commonly assessed by interpolating heavy metal concentrations sampled at point locations, so that each heavy metal is represented in a separate map (Webster and Oliver, 2001; Juang et al., 2003). The first problem with maps of separate heavy metal concentrations (hereafter HMCs) is that the limiting values for polluted soils are commonly set as crisp boundaries. For example, a soil is polluted by zinc and not suitable for organic agriculture if the measured values are larger than 150 mg kg−1 (Official Gazette, 2001). This means that a soil with a zinc concentration of 149 mg kg−1 and a soil with a concentration of 151 mg kg−1 will be classified differently, although the difference may be due to measurement or interpolation error. Similarly, if the concentration of zinc at one location is 151 mg kg−1 and at a neighboring location 300 mg kg−1, both locations will be classified as not suitable, although the latter shows twice the concentration. The second problem with HMCs is that different elements come in different ranges of values, which makes it difficult to get a picture of the overall soil quality. For example, the threshold value for zinc is 150 mg kg−1 and for cadmium 0.8 mg kg−1. If we measure Zn = 130 mg kg−1 (suitable) and Cd = 1.1 mg kg−1 (not suitable) at a point, the location is unsuitable, but how serious is the problem? Now imagine a case with tens of HMCs: how can we combine these values into a compound picture of soil quality?

To present overall polluted areas, Romic and Romic (2003) applied factor analysis prior to interpolation and then interpolated only the first factor, which indicated anthropogenic loads of heavy metals. Van der Gaast et al. (1998) used maps of background values of soil contaminants, focusing on the 90th percentiles. Hanesch et al. (2001) tested fuzzy classification algorithms to distinguish different sources of pollution. Amini et al. (2004) classified HMCs using unsupervised fuzzy k-means to partition the values optimally; the final outputs are maps of memberships to each cluster, which commonly reflect combinations of the most correlated heavy metals. In all these examples the procedures are statistically valid, but the meaning of such factors and continuous memberships is hard to interpret. In practice, decision makers usually have no training in (geo)statistics and simply wish to see which areas are polluted.

In this paper, we propose an approach that interpolates sampled heavy metal concentrations using numerous environmental predictors and then represents overall pollution using continuous limitation scores. We advocate cumulative limitation scores because they can be summed and used to represent areas of high overall pollution. Such visualizations can supplement maps of separate HMCs so that end-users can more easily delineate areas of high overall pollution and focus their actions where they are most needed.

2 Materials and methods

2.1 Spatial interpolation

For spatial interpolation of HMCs we used regression-kriging (Odeh et al., 1995), also known as universal kriging (Webster and Oliver, 2001) or kriging with external drift (Goovaerts, 1997) (see also the article on regression-kriging published in the same issue of this journal). This technique is especially attractive because it can employ both our empirical knowledge about the distribution of HMCs and the spatial autocorrelation between the point samples. It also minimizes the artificial point patterns in the final predictions that are typical of plain kriging techniques (Juang et al., 2003). We used the generic framework suggested by Hengl et al. (2004), which requires several processing steps in the R (http://r-project.org), Integrated Land and Water Information System (ILWIS; Unit Geo Software Development, 2001; http://itc.nl/ilwis/) and GSTAT (http://gstat.org) packages. The first step is the logit transformation of all target variables (in this case HMCs), which in most cases ensures the normality of residuals. The original target variable (z) is first transformed to a relative (indicator) variable by:

$$z^{++} = \ln\left(\frac{z^{+}}{1 - z^{+}}\right); \qquad 0 < z^{+} < 1 \qquad (1)$$

where $z^{+}$ is the target variable standardised to the 0 to 1 range:

$$z^{+} = \frac{z - z_{\min}}{z_{\max} - z_{\min}}; \qquad z_{\min} < z < z_{\max} \qquad (2)$$

and $z_{\min}$ and $z_{\max}$ are the minimum and maximum values that z can take. All new predicted values will therefore lie between these two thresholds. For example, if we measured a concentration of Zn = 88 mg kg−1, the indicator variable is (88 − 0)/1000 = 0.088 and the logit transform is logit(Zn) = −2.338. In this case, the lower threshold (0) is the physical limit of the values, while the upper threshold (1000) is an arbitrary number. Setting these thresholds prevents predictions that have no physical meaning (e.g. negative values).
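A minimal Python sketch of Eqs. 1, 2 and 6 reproduces the worked Zn value above and shows that the back-transform recovers the original concentration. It assumes z_min = 0 and z_max = 1000 mg kg−1 as in the Zn example; the function names are hypothetical.

```python
import math

def logit_transform(z, z_min=0.0, z_max=1000.0):
    """Eqs. 1-2: standardise z to the (0, 1) range, then take the logit."""
    if not z_min < z < z_max:
        raise ValueError("z must lie strictly between z_min and z_max")
    z_plus = (z - z_min) / (z_max - z_min)        # Eq. 2
    return math.log(z_plus / (1.0 - z_plus))      # Eq. 1

def back_transform(z_pp, z_min=0.0, z_max=1000.0):
    """Eq. 6: map a logit-scale value back to the original units."""
    return math.exp(z_pp) / (1.0 + math.exp(z_pp)) * (z_max - z_min) + z_min

zn_logit = logit_transform(88.0)
print(round(zn_logit, 3))                   # -2.338
print(round(back_transform(zn_logit), 6))   # 88.0
```

Because the back-transform is bounded by z_min and z_max, no back-transformed prediction can fall outside these limits, which is the purpose of the thresholds.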

The advantage of transforming the original heavy metal concentrations is that they are all brought to a common range (e.g. −10 to 10), so the results of the statistical analysis can be directly compared. For example, variograms of several variables can be displayed together (see Figure 4). This would not be possible with the original values, whose ranges can be very different: for Cd the standard deviation is 0.34 mg kg−1 and for Cr it is 15.9 mg kg−1, almost fifty times larger.

After the transformation, the HMCs are estimated using the regression-kriging model:

$$z(\mathbf{s}_0) = \mathbf{q}_0^{T} \cdot \boldsymbol{\beta} + \boldsymbol{\lambda}_0^{T} \cdot \mathbf{e} \qquad (3)$$

where $\mathbf{q}_0$ is a vector of p + 1 predictors at $\mathbf{s}_0$, $\boldsymbol{\beta}$ is a vector of p + 1 estimated drift model coefficients, $\boldsymbol{\lambda}_0$ is a vector of n kriging weights and $\mathbf{e}$ is a vector of n residuals.
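To make Eq. 3 concrete, the NumPy sketch below computes the trend term and the kriged residual term for a few prediction points. It rests on assumptions not stated in the paper: the drift coefficients are estimated by ordinary least squares (rather than GLS), the residuals are interpolated by simple kriging with an exponential covariance whose nugget, partial sill and range are given, and all names and data are hypothetical. It is not the GSTAT implementation used in the study.

```python
import numpy as np

def exp_cov(h, nugget, psill, corr_range):
    """Exponential covariance psill*exp(-h/corr_range); the nugget is added only at h == 0."""
    c = psill * np.exp(-h / corr_range)
    return np.where(h == 0.0, c + nugget, c)

def regression_kriging(coords, q, z, coords0, q0, nugget, psill, corr_range):
    """Eq. 3 sketch: trend q0^T beta (OLS) plus simple kriging of the residuals, lambda0^T e.

    coords (n, 2) and q (n, p+1, first column = 1) describe the samples, z (n,) holds the
    logit-transformed values; coords0 (m, 2) and q0 (m, p+1) describe the prediction points.
    """
    beta, *_ = np.linalg.lstsq(q, z, rcond=None)   # drift coefficients (OLS, not GLS)
    e = z - q @ beta                               # residuals at the sample points
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords[:, None, :] - coords0[None, :, :], axis=-1)
    C = exp_cov(d, nugget, psill, corr_range)      # (n, n) sample-to-sample covariances
    c0 = exp_cov(d0, nugget, psill, corr_range)    # (n, m) sample-to-target covariances
    lam = np.linalg.solve(C, c0)                   # kriging weights, one column per target
    return q0 @ beta + lam.T @ e

# Synthetic example: one slope-like predictor plus noise
gen = np.random.default_rng(0)
coords = gen.uniform(0, 10_000, size=(50, 2))
slope = gen.uniform(0, 20, size=50)
q = np.column_stack([np.ones(50), slope])
z = -2.0 + 0.05 * slope + gen.normal(0, 0.3, size=50)
pred = regression_kriging(coords, q, z, coords[:3], q[:3],
                          nugget=0.02, psill=0.07, corr_range=2000.0)
print(np.allclose(pred, z[:3]))   # True: kriging honours the sampled values
```

Far from all samples the kriging weights vanish and the prediction falls back to the regression trend, which is why good auxiliary predictors matter most where sampling is sparse.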

The prediction accuracy of our interpolation technique was analyzed using two common measures, the mean prediction error (MPE):

$$\mathrm{MPE} = \frac{1}{l} \sum_{j=1}^{l} \left[ z(\mathbf{s}_j) - z^{*}(\mathbf{s}_j) \right] \qquad (4)$$

and the root mean square prediction error (RMSPE):

$$\mathrm{RMSPE} = \sqrt{\frac{1}{l} \sum_{j=1}^{l} \left[ z(\mathbf{s}_j) - z^{*}(\mathbf{s}_j) \right]^{2}} \qquad (5)$$

both calculated at the validation points, where $z^{*}(\mathbf{s}_j)$ is the observed value at the j-th validation point, $z(\mathbf{s}_j)$ is the corresponding prediction and l is the number of validation points. For detailed instructions on how to run regression-kriging, see the companion article on regression-kriging published in the same issue of this journal.
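As a minimal sketch, assuming z(s_j) is the prediction and z*(s_j) the observed value at the j-th validation point, Eqs. 4 and 5 translate directly into NumPy. The relative_rmspe helper, which expresses RMSPE as a fraction of the standard deviation of the observations (the kind of ratio reported in Section 3.3), is a hypothetical addition; its exact definition in the study is not stated.

```python
import numpy as np

def mpe(predicted, observed):
    """Eq. 4: mean of (prediction - observation) over the l validation points."""
    diff = np.asarray(predicted, float) - np.asarray(observed, float)
    return float(np.mean(diff))

def rmspe(predicted, observed):
    """Eq. 5: root mean square of (prediction - observation) over the validation points."""
    diff = np.asarray(predicted, float) - np.asarray(observed, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_rmspe(predicted, observed):
    """RMSPE divided by the standard deviation of the observed values (assumed definition)."""
    return rmspe(predicted, observed) / float(np.std(np.asarray(observed, float), ddof=1))

obs = [0.30, 0.45, 0.25, 0.60]     # hypothetical validation values
pred = [0.35, 0.40, 0.30, 0.50]    # hypothetical predictions at the same points
print(round(mpe(pred, obs), 4))    # -0.0125
print(round(rmspe(pred, obs), 4))  # 0.0661
```

An MPE close to zero indicates unbiased predictions, while a relative RMSPE well below 1 indicates that the model explains a useful part of the variation.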


After the regression modelling in R, we fitted variogram models to the residuals using the automatic fitting options in GSTAT. In all cases, we used the exponential model and an initial variogram with nugget = 0, sill = sample variance and range = 10% of the spatial extent of the data. Once we had determined the most significant predictors and estimated the regression model and the variogram of residuals, we created GSTAT scripts to produce the final predictions. Note that GSTAT in fact implements the so-called “kriging with external drift” approach, which is computationally slower but gives the same predictions. Finally, the fitted HMCs were back-transformed to the original scale by:

$$z(\mathbf{s}_0) = \frac{e^{z^{++}(\mathbf{s}_0)}}{1 + e^{z^{++}(\mathbf{s}_0)}} \cdot (z_{\max} - z_{\min}) + z_{\min} \qquad (6)$$
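To make the variogram fitting set-up concrete, the sketch below computes a classical empirical variogram of residuals and the initial exponential model described earlier in this section (nugget 0, sill equal to the sample variance, range 10% of the spatial extent). It is not GSTAT's automatic fitting routine, which then refines such starting values by (weighted) least squares. Interpreting "spatial extent" as the bounding-box diagonal, using the sample variance with ddof=1, the lag settings and all names are assumptions. The exponential model is written with a range parameter a, as in gstat's Exp model, so the practical range is about 3a.

```python
import numpy as np

def empirical_variogram(coords, values, n_lags=10, max_dist=None):
    """Classical semivariance estimate, 0.5*mean((z_i - z_j)^2), in equal-width lag bins."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    i, j = np.triu_indices(len(values), k=1)             # every distinct pair once
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (values[i] - values[j]) ** 2
    if max_dist is None:
        max_dist = h.max() / 2.0                          # common rule of thumb (assumption)
    edges = np.linspace(0.0, max_dist, n_lags + 1)
    lag, gamma, npairs = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (h > lo) & (h <= hi)
        if in_bin.any():
            lag.append(h[in_bin].mean())
            gamma.append(half_sq[in_bin].mean())
            npairs.append(int(in_bin.sum()))
    return np.array(lag), np.array(gamma), np.array(npairs)

def initial_exponential_params(coords, values):
    """Starting values: nugget 0, partial sill = sample variance, range = 10% of the extent."""
    coords = np.asarray(coords, float)
    extent = float(np.linalg.norm(coords.max(axis=0) - coords.min(axis=0)))  # bounding-box diagonal
    return {"nugget": 0.0, "psill": float(np.var(values, ddof=1)), "range": 0.1 * extent}

def exponential_variogram(h, nugget, psill, a):
    """gamma(h) = nugget + psill * (1 - exp(-h / a)) for h > 0, and 0 at h = 0."""
    h = np.asarray(h, float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-h / a)), 0.0)

gen = np.random.default_rng(1)
pts = gen.uniform(0, 50_000, size=(200, 2))   # hypothetical sample coordinates (m)
res = gen.normal(size=200)                     # stand-in for logit-scale residuals
lag, gamma, npairs = empirical_variogram(pts, res)
start = initial_exponential_params(pts, res)
initial_model = exponential_variogram(lag, start["nugget"], start["psill"], start["range"])
```

Comparing `gamma` with `initial_model` at each lag shows how far the starting model is from the data before the automatic fit adjusts nugget, sill and range.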

In practice, regression-kriging consists of four main steps. After the regression coefficients have been estimated, the trend can be fitted using a map calculation in ILWIS:

Zn_t_REG = -1.5812 + 0.0197*SLOPE + 0.0001*URBAN - 0.0001*ROADS - 0.0225*WINDE90 - 0.0423*WINDE180 + 0.0293*WINDE225 - 0.0422*WINDE270 - 0.029*WINDE360 + 0.1646*GEO03 + 0.1178*GEO04 - 0.147*GEO07 - 0.2396*GEO08 - 0.1602*GEO09 - 0.409*GEO11 + 0.2147*GEO12

where Zn_t_REG is the fitted trend on the transformed variable and SLOPE, URBAN, WINDE90, etc. are the predictor maps (rasters). The second step is to derive the residuals and fit their variogram. The third step is to interpolate the residuals to all locations by kriging in GSTAT (Pebesma, 2004). Finally, the fitted trend and the interpolated residuals are summed and back-transformed using the following command:

Zn_RK.mpr = iff(isundef(MASK), ?, exp(Zn_t_REG+RES_Zn_t)/(1+exp(Zn_t_REG+RES_Zn_t))*1000)

where RES_Zn_t is the map of interpolated residuals, Zn_RK is the final regression-kriging prediction map and MASK is the map used to restrict the output to the areas of interest. In this case, only agricultural soils were sampled and analyzed.
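For readers working outside ILWIS, the two map calculations above can be mirrored with NumPy. This is a sketch under assumptions: rasters are co-registered arrays of equal shape held in a dictionary keyed by the ILWIS map names, the residuals have already been interpolated, NaN stands in for the ILWIS undefined value (?), the mask is a Boolean array that is True in the area of interest, and the function names are hypothetical. The coefficients are those of the Zn trend shown above, and z_min = 0 and z_max = 1000 follow the back-transformation command.

```python
import numpy as np

# Zn trend model on the logit scale, copied from the ILWIS expression above
ZN_INTERCEPT = -1.5812
ZN_COEFS = {
    "SLOPE": 0.0197, "URBAN": 0.0001, "ROADS": -0.0001,
    "WINDE90": -0.0225, "WINDE180": -0.0423, "WINDE225": 0.0293,
    "WINDE270": -0.0422, "WINDE360": -0.029,
    "GEO03": 0.1646, "GEO04": 0.1178, "GEO07": -0.147, "GEO08": -0.2396,
    "GEO09": -0.1602, "GEO11": -0.409, "GEO12": 0.2147,
}

def fitted_trend(rasters, coefs=ZN_COEFS, intercept=ZN_INTERCEPT):
    """Counterpart of Zn_t_REG: intercept plus the weighted sum of predictor rasters."""
    shape = np.shape(next(iter(rasters.values())))
    trend = np.full(shape, intercept, dtype=float)
    for name, b in coefs.items():
        trend += b * np.asarray(rasters[name], float)
    return trend

def rk_map(trend, residuals, mask, z_min=0.0, z_max=1000.0):
    """Counterpart of Zn_RK: add kriged residuals, back-transform (Eq. 6), mask with NaN."""
    s = trend + residuals
    z = 1.0 / (1.0 + np.exp(-s)) * (z_max - z_min) + z_min   # same as exp(s)/(1+exp(s))
    return np.where(mask, z, np.nan)

shape = (3, 4)
rasters = {name: np.zeros(shape) for name in ZN_COEFS}   # dummy co-registered rasters
rasters["SLOPE"][:] = 5.0
trend = fitted_trend(rasters)
zn = rk_map(trend, residuals=np.zeros(shape), mask=np.ones(shape, dtype=bool))
```

Writing the logistic as 1/(1 + exp(−s)) gives the same values as the ILWIS form but avoids overflow for large positive s.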

2.2 Continuous limitation scores

Traditionally, suitability maps are derived as Boolean maps (yes or no), where a location is suitable only if none of the hazardous HMCs exceeds its threshold value (see Table 1). In ILWIS, such a spatial query looks like this:

SUITABLE{dom=Bool} = iff((Cd_RK>0.8) OR (Cr_RK>50) OR (Cu_RK>50) OR (Ni_RK>30) OR (Pb_RK>50) OR (Zn_RK>150), 0, 1)

This means that only the areas that do not exceed ANY of the given thresholds can be considered suitable for agricultural production. The obvious problem is that the intensity of pollution within the polluted areas remains unknown. Our approach differs in that we also want to represent the overall soil pollution spatially. For this, we use the concept of limitation scores.
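The Boolean query above can be mirrored in NumPy to show the problem concretely. In this sketch, interpolated concentrations are assumed to be arrays in a dictionary keyed by element symbol, the thresholds are the X1 values of Table 1, and the function name is hypothetical.

```python
import numpy as np

# Permissible concentrations X1 (mg kg-1), as used in the SUITABLE query and Table 1
X1 = {"Cd": 0.8, "Cr": 50.0, "Cu": 50.0, "Ni": 30.0, "Pb": 50.0, "Zn": 150.0}

def boolean_suitability(hmc):
    """1 where no metal exceeds its X1 threshold, 0 elsewhere (same logic as SUITABLE)."""
    exceeds = np.zeros(np.shape(hmc["Cd"]), dtype=bool)
    for metal, limit in X1.items():
        exceeds |= np.asarray(hmc[metal], float) > limit
    return (~exceeds).astype(int)

# Two hypothetical cells: the Zn = 130, Cd = 1.1 case from the introduction, and a clean cell
hmc = {"Cd": np.array([1.1, 0.3]), "Cr": np.array([40.0, 40.0]),
       "Cu": np.array([20.0, 20.0]), "Ni": np.array([25.0, 25.0]),
       "Pb": np.array([30.0, 30.0]), "Zn": np.array([130.0, 90.0])}
print(boolean_suitability(hmc))   # [0 1]
```

Raising Cd in the first cell from 1.1 to 11 mg kg−1 would not change this output, which illustrates the loss of information that motivates limitation scores.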

Table 1

Transformation coefficients calculated for given threshold concentrations. X1: maximum concentration of contaminant to maintain multifunctionality; X2: serious soil pollution. Official threshold levels used in Croatia.

Metal   X1 (mg kg−1)   X2 (mg kg−1)   ln(b0)    b1
Cd      0.8            2              0.392     1.756
Cr      50             100            -9.083    2.322
Cu      50             100            -9.083    2.322
Ni      30             60             -7.897    2.322
Pb      50             150            -5.731    1.465
Zn      150            300            -11.634   2.322

After the HMCs have been interpolated, they can be converted to limitation scores, which then allows different maps of HMCs to be summed. Such scoring systems are often used in land evaluation studies (Triantafilis et al., 2001). For each evaluation parameter, thresholds and limitation scores are predefined and then applied to the whole area. For example, a slope map is typically used to assign suitability scores to an area. Triantafilis et al. (2001) assigned each slope class a limitation score based on empirical rules: 0 for the 0–2% slope class, 1 for 2–8%, 3 for 9–16%, 9 for 17–25% and 27 for slopes >25%. Note that in this case the limitation scores increase exponentially with slope: although slopes in the third class are only about two and a half times steeper than those in the second class, the third class receives three times as many limitation points.
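The class-based scoring of Triantafilis et al. (2001) described above amounts to a lookup table, sketched below in Python. How slopes between the published class limits (e.g. 8–9%) are handled is not stated, so treating each upper bound as inclusive is an assumption, as is the function name.

```python
import bisect

# Upper bounds (% slope) of the first four classes and the scores of all five classes.
# Treating each bound as inclusive (<= 2, <= 8, <= 16, <= 25) is an assumption.
SLOPE_BOUNDS = [2.0, 8.0, 16.0, 25.0]
SLOPE_SCORES = [0, 1, 3, 9, 27]

def slope_limitation(slope_pct):
    """Class-based limitation score for a slope given in percent."""
    return SLOPE_SCORES[bisect.bisect_left(SLOPE_BOUNDS, slope_pct)]

print([slope_limitation(s) for s in (1, 5, 12, 20, 40)])   # [0, 1, 3, 9, 27]
```

Every slope within a class receives the same score, so a 9% and a 16% slope are treated identically; the continuous transfer function proposed in this paper avoids this step behaviour.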

We propose that, instead of classifying HMCs, limitation scores can be derived directly using a simple transfer function that converts HMCs to limitation scores (LS). A flexible transfer function, also used in this paper, is the exponential function:

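$$\mathrm{LS} = \begin{cases} b_0 \cdot \mathrm{HMC}^{\,b_1} - 1 & \text{if } \mathrm{HMC} \geq X_1 \\ 0 & \text{if } \mathrm{HMC} < X_1 \end{cases} \qquad (7)$$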

where LS are the limitation scores, b0 and b1 are the coefficients, HMC is the heavy metal concentration and X1 is the permissible or baseline concentration. An example of how HMCs are transformed to limitation scores is shown in Figure 1. Here we also assume that the cost of remediation increases exponentially with HMC. The coefficients b0 and b1 can be estimated by fitting the linear regression model:

$$\ln(\mathrm{LS} + 1) = \ln(b_0) + b_1 \cdot \ln(\mathrm{HMC}) \qquad (8)$$

In this case we used three known points to estimate the two unknowns. For example, for Cr we used LS + 1 = 0 for HMC = 0, LS + 1 = 1 for HMC = 50 and LS + 1 = 5 for HMC = 100. The first threshold for Cr (50 mg kg−1) is the permissible concentration and the second threshold (100 mg kg−1) is the critical concentration above which the soil is classified as polluted. After determining the coefficients of the transfer function for each HMC, we can derive limitation scores directly in ILWIS using:

LS_Zn{dom=value;vr=0.0:50.0:0.1}=iff(Zn_RK<150, 0, exp(-11.634+2.322*ln(Zn_RK))-1)

where LS_Zn is the derived map of limitation scores for Zn and Zn_RK is the Zn concentration interpolated using regression-kriging.
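The coefficients in Table 1 can be reproduced by solving Eq. 8 exactly through the two threshold points, (X1, LS + 1 = 1) and (X2, LS + 1 = 5), which gives b1 = ln 5 / ln(X2/X1) and ln(b0) = −b1·ln(X1); the point at HMC = 0 cannot enter the log-scale regression directly because ln 0 is undefined. The NumPy sketch below computes the coefficients, applies Eq. 7 and sums the scores of all six metals into a cumulative limitation score. The function names are hypothetical and the interpolated concentrations are assumed to be co-registered arrays.

```python
import numpy as np

# Thresholds from Table 1 (mg kg-1): X1 = permissible, X2 = serious soil pollution
THRESHOLDS = {"Cd": (0.8, 2.0), "Cr": (50.0, 100.0), "Cu": (50.0, 100.0),
              "Ni": (30.0, 60.0), "Pb": (50.0, 150.0), "Zn": (150.0, 300.0)}

def transfer_coefficients(x1, x2, ls_at_x2=4.0):
    """Solve Eq. 8 through (X1, LS+1 = 1) and (X2, LS+1 = 1 + ls_at_x2); return (ln b0, b1)."""
    b1 = np.log(1.0 + ls_at_x2) / np.log(x2 / x1)
    ln_b0 = -b1 * np.log(x1)             # ln(LS+1) = ln(1) = 0 at HMC = X1
    return ln_b0, b1

def limitation_scores(hmc, x1, x2):
    """Eq. 7: 0 below X1, b0 * HMC**b1 - 1 at or above X1."""
    ln_b0, b1 = transfer_coefficients(x1, x2)
    hmc = np.asarray(hmc, float)
    safe = np.maximum(hmc, x1)           # avoids log(0); cells below X1 are set to 0 anyway
    return np.where(hmc < x1, 0.0, np.exp(ln_b0 + b1 * np.log(safe)) - 1.0)

def cumulative_scores(hmc_maps):
    """Sum the limitation scores of all metals, cell by cell."""
    return sum(limitation_scores(hmc_maps[m], *THRESHOLDS[m]) for m in THRESHOLDS)

for metal, (x1, x2) in THRESHOLDS.items():
    ln_b0, b1 = transfer_coefficients(x1, x2)
    print(metal, round(ln_b0, 3), round(b1, 3))   # e.g. Zn -11.634 2.322

# Two hypothetical cells: one below every X1, one at or above several thresholds
hmc_maps = {"Cd": np.array([0.5, 2.0]), "Cr": np.array([40.0, 100.0]),
            "Cu": np.array([20.0, 60.0]), "Ni": np.array([25.0, 30.0]),
            "Pb": np.array([30.0, 50.0]), "Zn": np.array([130.0, 300.0])}
print(cumulative_scores(hmc_maps))
```

The printed coefficients match the ln(b0) and b1 columns of Table 1 to three decimals. A metal at exactly X2 contributes 4 points, so the cumulative score separates, for example, a cell with one seriously polluting metal from a cell with three.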

Fig. 1. Transforming HMCs to limitation scores: the function is determined by two thresholds, the permissible or baseline concentration (0 limitation points) and serious soil pollution (4 limitation points).

2.3 Study area and sampling methods

The study area includes Zagreb city and the surrounding county in north-western Croatia (E15°20’–16°44’, N45°25’–46°05’) (Figure 2). The region exhibits a variety of soils developed on diverse lithologies. The main bodies of the mountain ranges (Zumberak, Medvednica and some smaller ranges) consist mainly of Palaeozoic and Mesozoic rocks: para-metamorphic and ortho-metamorphic rocks, igneous rocks and clastic sedimentary rocks. A major portion of the region consists of Tertiary rocks (limestones, marls, clastites and igneous rocks) and Quaternary rocks (mostly alluvial sediments of the rivers and their tributaries) (Miko et al., 2001). The dominant land uses in the area are the cultivation of mostly corn and vegetables in the floodplain region, and vineyards, orchards and some pastures in the hilly part.


The data set used is part of a multi-element geochemical mapping project covering 3700 km² of agricultural land in the Zagreb region, Croatia. The basic sampling grid was a square mesh with sampling points at intervals of 2 × 2 km, and 1 × 1 km in the areas of higher urbanization. By halving the grid spacing in these areas, we aimed to reduce the spatial prediction error where we expected higher values. A total of 784 topsoil samples were collected. We used composite samples made up of 10 increments collected from the upper 20 cm of soil in a cross pattern, with a 5 m distance between increments. Site descriptions were recorded at the time of sampling to document each sample location in relation to land use and major environmental features. Soil samples were digested with aqua regia in accordance with the HR ISO 11466 procedure at the Analytical Laboratory of the Faculty of Agriculture, University of Zagreb. Heavy metal concentrations (Cd, Cr, Cu, Ni, Pb and Zn) in the soil extracts were determined by inductively coupled plasma optical emission spectrometry (Vista MPX AX, Varian). The choice of method was dictated by Croatian Government regulations, which define the limit values of potentially toxic substances (including trace metals) in agricultural soils (Official Gazette, 1992), as well as soil quality criteria for organic production (Official Gazette, 2001).

For this study area, we assumed that the distribution of HMCs is systematic, i.e. controlled by environmental and anthropogenic factors. Indeed, Romic and Romic (2003) showed that the distribution of HMCs in part of the study area is primarily controlled by: (a) geology; (b) industrial impact (traffic, heating plants, chemical industry and airports); and (c) external factors, as some heavy metals are brought by the Sava River, which has been exposed to intensive pollution by mining, industries and cities in recent history. A portion of the heavy metals is wind-blown from the industrial region of northern Italy (Antonic and Legovic, 2004).

Based on empirical knowledge of the study area, we compiled a list of potential predictors to be used as auxiliary data in the RK system. Eight GIS layers were prepared in ILWIS:

Geological map (GEO): This layer was produced from the geological map of Croatia at a scale of 1:100 000. This comprised six map sheets that were first georeferenced and then converted into a polygon map in ILWIS GIS. Each map unit was converted to an indicator map, which was then used as a predictor.

Land cover map (LAND): This layer was produced by classifying a LANDSAT 7 satellite image from 20 March 2003. The date corresponds to the sampling period. It was also selected because the vegetation was just starting to develop, so the visibility of the soil surface was good. The LAND map was also used to mask out areas, such as forests and water bodies, that were not of interest for the project.

Normalized Difference Vegetation Index (NDVI): The NDVI, derived from the same LANDSAT image, reflects the actual green biomass, i.e. vegetation cover. Urbanized areas typically show small or even negative NDVI values.

Depth to the water table (lnGWD): This is an approximate variable that was derived in two steps: first, the water table elevation map was estimated by fitting a second-degree trend surface to the point map showing the water table at the main rivers; then the difference between the original DEM and this map, which approximates the fall of the water table, was computed. The GWD represents the flooding potential of an area and can be used to estimate the level of the groundwater table. The map was log-transformed to emphasize smaller depths.

Slope (SLOPE): Slope was derived in ILWIS as a standard terrain parameter (see http://spatial-analyst.net for terrain parameterization scripts).

Distance to urban areas (URBAN): This variable describes the proximity to sources of pollution. URBAN was derived in two steps: first, the urban areas were extracted from the land cover map; then the distance to these areas was computed using the distance operation in ILWIS. As a result, all areas outside the urban areas received their distance to the nearest urban area, while the urban areas themselves received a value of zero.

Distance to roads (ROADS): Similar to the URBAN map, this map was derived as the distance to the road network.


Fig. 2. Location of the study area (Zagreb city and Zagreb county): sampled locations used for interpolation and control, and the mask map showing agricultural areas. The key auxiliary predictors: geological and land cover maps, distance to urban areas, NDVI and ln of groundwater depth. [Figure panels: Croatia; geological map; land cover map; distance to urban areas; NDVI from Landsat image; depth to ground water (lnGWD); scale bar 0–25 km.]

Wind exposition (WINDE): Wind exposition was calculated as relative slope insolation for eight positions (azimuths): 45°, 90°, 135°, 180°, 225°, 270°, 315° and 360°, using a vertical exposition angle of 5°. For example, the map named WINDE45 corresponds to an azimuth of 45°.
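To illustrate how two of these layers might be reproduced outside ILWIS, the NumPy sketch below fits a second-degree trend surface to water-table points and converts it to lnGWD, and computes the distance to urban cells. Everything here is an assumption for illustration: the synthetic data, the grid and cell size, the function names, the convention depth = DEM minus water-table elevation, and the 0.1 m clipping that keeps the logarithm defined. The brute-force distance only suits small rasters; in the study these operations were carried out in ILWIS.

```python
import numpy as np

def quadratic_design(x, y):
    """Columns of a full second-degree trend surface: 1, x, y, x^2, xy, y^2."""
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def water_table_surface(x_pts, y_pts, wt_elev, x_grid, y_grid):
    """Fit the 2nd-degree trend to water-table points and evaluate it on the grid."""
    coef, *_ = np.linalg.lstsq(quadratic_design(x_pts, y_pts), wt_elev, rcond=None)
    return (quadratic_design(x_grid.ravel(), y_grid.ravel()) @ coef).reshape(x_grid.shape)

def ln_groundwater_depth(dem, wt_surface, min_depth=0.1):
    """lnGWD sketch: ln(DEM - water-table elevation), depths clipped at min_depth (assumed)."""
    return np.log(np.maximum(dem - wt_surface, min_depth))

def distance_to_urban(urban_mask, cell_size):
    """URBAN sketch: distance to the nearest urban cell, 0 inside urban cells (brute force)."""
    rows, cols = np.indices(urban_mask.shape)
    ur, uc = np.nonzero(urban_mask)
    d = np.hypot(rows[..., None] - ur, cols[..., None] - uc)   # (rows, cols, n_urban)
    return d.min(axis=-1) * cell_size

gen = np.random.default_rng(2)
x_pts, y_pts = gen.uniform(0, 1000, 30), gen.uniform(0, 1000, 30)
wt = 100.0 + 0.01 * x_pts - 0.005 * y_pts            # hypothetical river water-table levels
yg, xg = np.mgrid[0:1000:100.0, 0:1000:100.0]        # 10 x 10 grid with 100 m cells
wt_grid = water_table_surface(x_pts, y_pts, wt, xg, yg)
dem = wt_grid + 2.0                                   # hypothetical DEM, 2 m above the water table
lngwd = ln_groundwater_depth(dem, wt_grid)
urban = np.zeros((10, 10), dtype=bool)
urban[0, 0] = True
dist = distance_to_urban(urban, cell_size=100.0)
```

The logarithm compresses large depths and stretches small ones, which is the stated reason for log-transforming the GWD map.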

3 Results and discussion

3.1 Regression analysis

The first screening of the data showed that almost all HMCs have asymmetrical distributions, with most values concentrated at the lower end (positively skewed). After the logit transformation, the distributions were approximately normal (Figure 3), which allowed further statistical analysis. This confirms that the logit transformation is an important step prior to the actual interpolation. The step-wise regression analysis in R selected the geological map (GEO), groundwater depth (GWD), NDVI, slope (SLOPE) and distance to urban areas (URBAN) as the best auxiliary predictors (Table 2). The number of predictors selected by step-wise filtering remained fairly large (on average, 19 out of 31), which suggests that most of the pre-selected predictors are relevant for mapping HMCs.
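The study used step-wise regression in R. As a hedged stand-in that shows the idea, the NumPy sketch below implements plain forward selection by AIC, using the n ln(RSS/n) + 2k form that R's extractAIC applies to linear models; unlike R's step() in its default "both" direction, it never drops a term once added. The data and function names are hypothetical.

```python
import numpy as np

def aic_ols(X, y):
    """AIC of an OLS fit up to a constant: n*ln(RSS/n) + 2k (k = number of coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

def forward_select(P, y):
    """Greedy forward selection of columns of P by AIC; the intercept is always included."""
    n, p = P.shape
    selected, remaining = [], list(range(p))
    best_aic = aic_ols(np.ones((n, 1)), y)
    while remaining:
        scores = []
        for j in remaining:
            X = np.column_stack([np.ones(n), P[:, selected + [j]]])
            scores.append((aic_ols(X, y), j))
        aic, j = min(scores)
        if aic >= best_aic:
            break                      # no remaining candidate lowers the AIC
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return selected

gen = np.random.default_rng(3)
P = gen.normal(size=(200, 6))          # six hypothetical candidate predictors
y = 1.0 + 2.0 * P[:, 0] - 1.5 * P[:, 2] + gen.normal(0, 0.5, 200)
print(sorted(forward_select(P, y)))    # expected to include columns 0 and 2
```

With indicator-coded geological units counted as separate columns, the candidate list grows quickly, which is consistent with the paper reporting 31 candidate predictors.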


The auxiliary predictors accounted for 31.5% of the total variability on average (Table 2). The estimation was most satisfactory for Cu (R² = 0.51) and Cd (R² = 0.46), and somewhat less satisfactory for Zn (R² = 0.20) and Cr (R² = 0.17). The interpolated maps of HMCs agree with our empirical knowledge. In particular, the floodplains and lowest terraces of the Sava River were strongly associated with high concentrations of Cd, Zn and Pb. This suggests that the recent sedimentation of river deposits is the most probable cause of heavy metal accumulation (Romic and Romic, 2003). Emissions from anthropogenic sources are especially dominant in the southeastern part of Zagreb city (Velika Gorica), an area that has been expanding rapidly in the last decade. Direct industrial emission is the most prominent source, especially of Pb, Cd and Zn.

The geological map was the most useful predictor in all cases. However, the geological strata were not always the direct cause of the HMCs. For example, high copper concentrations were actually related to land use. The hilly and mountainous regions surrounding Zagreb are geologically heterogeneous. In the northern part, the old Palaeozoic and Mesozoic mountain core is bordered by belts of Tertiary hills, while the south-western part consists of Tertiary sediments and Pleistocene loams, forming well-protected, amphitheatre-shaped areas. These locations have been occupied almost exclusively by vineyards for many decades. Accumulation of copper in vineyard soils is the most common effect of the continuing protection of grapevines against fungal diseases.

In the case of Cd, the strong correlation was probably due to the clear connection between Cd and the geological material: several carbonate soils developed on limestone also contain rather high cadmium concentrations. Romic et al. (2004) studied the origin and preferential retention of metals in the vineyard topsoils of NW Croatia using multivariate statistics and pointed out the importance of CaCO3 for cadmium retention in soil.

3.2 Geostatistical analysis

Spatial autocorrelation of the residuals was distinct in all cases except Cu. The variogram of Cu showed a pure nugget effect (Goovaerts, 1997), which means that the residuals were practically uncorrelated (pure regression analysis is sufficient). As mentioned above, in part of the area the origin of Cu can be related to copper-based fungicides, i.e. to the land use system (vineyards). Because the vineyard plots are very irregularly placed, it is hard to detect any spatial correlation at this working scale (2 × 2 km grids). The range of spatial correlation varied from 2 km (Cd) to 10 km (Pb), about 6 km on average (Table 2). The automatically fitted variograms in GSTAT are shown in Figure 4. The shape of the fitted variogram gives an idea of the intensity and spatial extent of the horizontal diffusion of HMCs in an open environment: in this case, Pb diffuses farther (longer range), while Cu does not seem to diffuse at all (pure nugget effect).

Fig. 3. Histogram of an HMC (Cd) before and after logit transformation. The logit transformation brings the distribution of the target variable close to normal, which is an important prerequisite for regression-kriging.

Because the short-range (nugget) variation was rather high, we decided to use the block-kriging option in GSTAT to derive the final predictions. We set the block size to 100 m to correspond to the output grid size (Hengl, 2006). Block kriging does not give output much different from punctual estimates. However, it smooths out local outliers (usually very high HMCs), and the final prediction error (UK variance) is much lower, i.e. the predictions are more precise (Hengl, 2006).


Fig. 5. (a) Interpolated maps for Cd, Cr, Cu, Ni, Pb and Zn (mg kg−1); masked areas (white) are forests and water bodies. (b) Map of cumulative limitation scores (LS) showing overall soil pollution.

3.3 Final maps of heavy metals

Spatial prediction, when cross-checked at control points, was satisfactory for Cd, Ni, Pb and Zn, and somewhat less satisfactory for Cu and Cr. In all cases, the RMSPE at the 50 control points was at most about 80% of the original variation of the heavy metals (Table 2). On average, the RMSPE at control points was 56% of the total variation (STD) of the HMCs.
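For reference, the two validation statistics in Table 2 can be computed from the observed and predicted values at the control points as follows. The sign convention for MPE (predicted minus observed) is an assumption. The second part simply divides the reported RMSPE by STD for each metal. Averaging these ratios reproduces the figure of about 56%, and Cr is the highest at about 80% (12.8/15.9).

```python
import numpy as np


def mpe_rmspe(observed, predicted):
    """Mean prediction error and root mean square prediction error at control points.

    The sign convention (predicted minus observed) is an assumption.
    """
    err = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(err.mean()), float(np.sqrt(np.mean(err ** 2)))


# STD of the original data and RMSPE at the control points, copied from Table 2
STD = {"Cd": 0.34, "Cr": 15.9, "Cu": 108.4, "Ni": 18.1, "Pb": 15.7, "Zn": 32.8}
RMSPE = {"Cd": 0.217, "Cr": 12.8, "Cu": 16.1, "Ni": 9.2, "Pb": 9.5, "Zn": 21.7}

ratios = {m: RMSPE[m] / STD[m] for m in STD}
mean_ratio = sum(ratios.values()) / len(ratios)
for metal, r in ratios.items():
    print(f"{metal}: RMSPE/STD = {r:.0%}")
print(f"average: {mean_ratio:.0%}")
```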

Fig. 4. Variograms estimated for residuals after transformation and regression analysis. Because logit transforms are used, all HMCs are on the same scale, so the variograms can be compared directly. In this case, Cd is correlated at shorter and Pb at longer distances, while Cu is not correlated at all.

Interpolated maps for Cd, Cr, Cu, Ni, Pb and Zn are shown in Figure 5a, and the summary map of overall pollution in Figure 5b. In this case, the reddish areas (high values) indicate problematic areas where the concentrations are either above the recommended limit or above the critical limit. At first glance, the distributions of the heavy metals appear to be strongly correlated, especially those of Pb, Zn and Ni. Further statistical analysis of the interpolated maps confirmed that all HMCs are correlated, with correlation coefficients (r) ranging from 0.24 to 0.72. The most strongly correlated metals in soil were Zn and Ni (r = 0.72), Ni and Cd (r = 0.71), and Zn and Pb (r = 0.70). Cr and Cu are the least strongly correlated with the other heavy metals.
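Correlation coefficients between interpolated maps can be obtained by flattening the grids and discarding masked cells (forests, water bodies) before computing Pearson's r. The following is a minimal sketch. It assumes the maps are NumPy arrays of equal shape with masked cells stored as NaN, a convention chosen here only for illustration. The grids are synthetic and do not reproduce the values reported above.

```python
import numpy as np


def map_correlations(maps):
    """Pearson correlation matrix between co-registered grids.

    Cells that are NaN in any map (e.g. masked forests or water bodies) are dropped.
    """
    names = list(maps)
    stack = np.stack([np.asarray(maps[n], dtype=float).ravel() for n in names])
    valid = ~np.isnan(stack).any(axis=0)   # keep cells defined in every map
    return names, np.corrcoef(stack[:, valid])


# Synthetic 50 x 50 grids: Zn and Ni share a common pattern, Cu does not
rng = np.random.default_rng(1)
common = rng.normal(size=(50, 50))
grids = {
    "Zn": common + 0.5 * rng.normal(size=(50, 50)),
    "Ni": common + 0.5 * rng.normal(size=(50, 50)),
    "Cu": rng.normal(size=(50, 50)),
}
grids["Zn"][:10, :10] = np.nan   # a masked block (e.g. a forest)
names, r = map_correlations(grids)
print(names)
print(np.round(r, 2))
```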

The final suitability map (Figure 6a) shows that 33.5% of the total area is suitable for organic agricultural production. These are the areas where none of the HMCs exceeds the permissible concentration (i.e. all HMCs ≤ X1). On the other hand, 7.2% of the total area is critically polluted by one or more heavy metals (any HMC > X2). This leaves about 59.3% of the area as marginally suitable for organic agriculture. This map can be compared with the map of cumulative limitation scores, which shows much greater detail and contrast (Figure 6b). The area around the city of Velika Gorica, close to the Zagreb airport, and the areas where the vineyards are located are clearly the most critically polluted.
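The crisp three-class map can be derived cell by cell from the interpolated HMC maps and the two thresholds. A cell is suitable when no HMC exceeds X1, not suitable when any HMC exceeds X2, and marginally suitable otherwise. The following is a minimal NumPy sketch. The threshold values and toy grids are hypothetical and are not the limits from the Croatian regulations used in the paper.

```python
import numpy as np

SUITABLE, MARGINAL, NOT_SUITABLE = 0, 1, 2


def suitability_classes(maps, x1, x2):
    """Crisp three-class map from co-registered HMC grids.

    0 (suitable): every HMC <= X1; 2 (not suitable): any HMC > X2; 1 otherwise.
    Masked (NaN) cells fall into class 1 here and should be excluded beforehand.
    """
    names = list(maps)
    stack = np.stack([np.asarray(maps[n], dtype=float) for n in names])   # k x rows x cols
    lo = np.array([x1[n] for n in names]).reshape(-1, 1, 1)
    hi = np.array([x2[n] for n in names]).reshape(-1, 1, 1)
    classes = np.full(stack.shape[1:], MARGINAL)
    classes[(stack <= lo).all(axis=0)] = SUITABLE
    classes[(stack > hi).any(axis=0)] = NOT_SUITABLE
    return classes


def class_shares(classes):
    """Percentage of cells in each class (suitable, marginal, not suitable)."""
    counts = np.bincount(classes.ravel(), minlength=3)
    return 100.0 * counts / counts.sum()


# Toy 2 x 2 grids and hypothetical thresholds (mg/kg), for illustration only
maps = {"Cu": [[50, 150], [50, 250]], "Pb": [[80, 90], [120, 90]]}
x1 = {"Cu": 100.0, "Pb": 100.0}
x2 = {"Cu": 200.0, "Pb": 200.0}
classes = suitability_classes(maps, x1, x2)
print(classes)
print(class_shares(classes))
```

Applied to full-resolution maps, the shares returned by `class_shares` correspond to the area percentages quoted above.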

Fig. 6. Comparison of crisp and continuous interpretation maps: (a) Boolean map showing locations suitable for organic agricultural production; (b) cumulative limitation scores.

4 Discussion and Conclusion

The developed procedure for geostatistical analysis of HMC data enabled us to identify a number of contamination hotspots and to map the cumulative contamination by heavy metals. Regression-kriging proved to be a powerful interpolation technique because it exploits both the linear relationships with auxiliary predictors and the spatial autocorrelation of the residuals. An alternative to regression-kriging would be multivariable interpolation of all HMCs at once (e.g. co-kriging), which is also possible in the GSTAT package (Pebesma, 2004). This might be computationally demanding: even the separate interpolation of the HMCs in this case study took several hours on a standard PC. In the current case study, each parameter was evaluated and interpolated separately, which allowed a more in-depth exploratory analysis (step-wise regression).
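The two components of regression-kriging can be illustrated with a minimal sketch: an ordinary least squares trend on the auxiliary predictors, plus simple kriging of the regression residuals with an assumed exponential covariance. This is a simplification. Universal kriging, as used with GSTAT in the paper, estimates the trend by generalized least squares and accounts for its uncertainty, whereas here the OLS trend is taken as fixed. The data, predictors and variogram parameters are synthetic.

```python
import numpy as np


def exp_cov(d, nugget, psill, a):
    """Exponential covariance; the nugget only contributes at zero separation."""
    return psill * np.exp(-d / a) + nugget * (d == 0)


def pairwise_dist(p, q):
    """Euclidean distances between the rows of p (n x 2) and q (m x 2)."""
    return np.sqrt(((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1))


def regression_kriging(xy, X, z, xy_new, X_new, nugget, psill, a):
    """Sketch of RK: OLS trend on auxiliary predictors + simple kriging of residuals."""
    # 1) Deterministic part: ordinary least squares with an intercept
    A = np.column_stack([np.ones(len(z)), X])
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    # 2) Stochastic part: simple kriging (zero mean) of the regression residuals
    K = exp_cov(pairwise_dist(xy, xy), nugget, psill, a)
    k = exp_cov(pairwise_dist(xy, xy_new), nugget, psill, a)   # n x m
    weights = np.linalg.solve(K, k)                             # n x m
    trend = np.column_stack([np.ones(len(xy_new)), X_new]) @ beta
    return trend + weights.T @ resid


# Synthetic example: 200 samples with three auxiliary predictors
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 10_000.0, size=(200, 2))
X = rng.normal(size=(200, 3))
z = 1.0 + X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.3, size=200)
xy_new = rng.uniform(0.0, 10_000.0, size=(5, 2))
X_new = rng.normal(size=(5, 3))
print(regression_kriging(xy, X, z, xy_new, X_new, nugget=0.05, psill=0.05, a=2000.0))
```

With `psill=0` (a pure nugget effect, as for Cu) the kriging weights vanish at unsampled locations, and RK reduces to plain regression. This is consistent with the remark that pure regression analysis is sufficient when the residuals are spatially uncorrelated.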

Table 2
Summary results for regression and geostatistical analysis of the data. STD: standard deviation of the original data; m: number of predictors selected in the step-wise regression; RSE: residual square error, i.e. the residuals remaining after regression; Nugget, Sill and Range: parameters of the variogram fitted to the residuals (Range in m); MPE: mean prediction error at control points; RMSPE: root mean square prediction error at control points. Columns m to RSE refer to the step-wise regression fit, Nugget to Range to the residuals, and MPE and RMSPE to the control points.

| HMC | STD | m | Most significant predictors | R² | RSE | Nugget | Sill | Range | MPE | RMSPE |
|---|---|---|---|---|---|---|---|---|---|---|
| Cd | 0.34 | 28 | GEO, NDVI, lnGWD | 0.46 | 0.498 | 0.020 | 0.229 | 860 | -0.020 | 0.217 |
| Cr | 15.9 | 15 | GEO, URBAN | 0.17 | 0.293 | 0.000 | 0.089 | 1152 | 1.6 | 12.8 |
| Cu | 108.4 | 18 | GEO, WINDE180, lnGWD | 0.51 | 0.746 | 0.395 | 0.395 | – | -5.0 | 16.1 |
| Ni | 18.1 | 17 | GEO, SLOPE | 0.28 | 0.432 | 0.092 | 0.202 | 2372 | 0.7 | 9.2 |
| Pb | 15.7 | 22 | GEO, LAND, WINDE45 | 0.27 | 0.455 | 0.122 | 0.228 | 3852 | 1.0 | 9.5 |
| Zn | 32.8 | 15 | GEO, SLOPE, WINDE270 | 0.20 | 0.341 | 0.080 | 0.119 | 1708 | -0.5 | 21.7 |

An advantage of using limitation scores is that the map of cumulative limitation scores can be directly interpreted as a map of overall soil pollution. This is not the case when factors, fuzzy classes or probabilities of exceeding a threshold value are used to represent pollution by heavy metals. Such maps can supplement the maps of separate HMCs and serve decision makers who require a single map representing the amount of overall pollution. Note that the formulas in Eq. 7 can easily accommodate any model relating cost to concentration. The key property of the limitation scores is that they are standardized and can therefore be summed over different HMCs. Their drawback is that high overall pollution can result either from very high values of a single element or from the cumulative effect of a large number of HMCs (Figure 7). The map of cumulative limitation scores should therefore be used only to delineate the most critical areas, after which the user needs to return to the separate maps of HMCs.
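The following is a minimal sketch of standardizing and summing scores. It also shows how the two cases in Figure 7 can be told apart by reporting the share of the largest single-element score alongside the total. The piecewise-linear score function and the threshold values are hypothetical placeholders. They are not Eq. 7 and not the regulatory limits used in the paper.

```python
# Hypothetical thresholds (X1 = recommended, X2 = critical), mg/kg.
# Placeholders only: these are NOT the regulatory limits used in the paper.
THRESHOLDS = {
    "Cd": (1.0, 3.0), "Cr": (100.0, 200.0), "Cu": (100.0, 200.0),
    "Ni": (50.0, 100.0), "Pb": (100.0, 200.0), "Zn": (200.0, 400.0),
}


def limitation_score(conc, x1, x2):
    """Hypothetical piecewise-linear score (not Eq. 7): 0 below X1, 1 at X2, linear above."""
    return max(0.0, (conc - x1) / (x2 - x1))


def summarize(sample):
    """Per-element scores, their sum, and which element dominates (and by how much)."""
    scores = {m: limitation_score(sample[m], *THRESHOLDS[m]) for m in THRESHOLDS}
    total = sum(scores.values())
    dominant = max(scores, key=scores.get)
    share = scores[dominant] / total if total > 0 else 0.0
    return scores, total, dominant, share


# Site (a): every element moderately elevated (score 0.5 each)
site_a = {m: x1 + 0.5 * (x2 - x1) for m, (x1, x2) in THRESHOLDS.items()}
# Site (b): only Cu strongly elevated (score 3.0), all others at X1 (score 0)
site_b = {m: x1 for m, (x1, _) in THRESHOLDS.items()}
site_b["Cu"] = 100.0 + 3.0 * (200.0 - 100.0)

for label, site in (("a", site_a), ("b", site_b)):
    _, total, dominant, share = summarize(site)
    print(f"site {label}: total LS = {total:.1f}, largest = {dominant} ({share:.0%} of total)")
```

Both sites receive the same cumulative score of 3.0, which is the ambiguity illustrated in Figure 7. The share of the largest single score (about 17% versus 100%) separates them.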

Note that we did not evaluate soil acidity, which is also an important factor in soil pollution. Mol et al. (2003) showed that the mobility, i.e. bioavailability, of heavy metals in soil increases as soils become more acidic. In our case study, most of the soils showed a neutral to slightly acidic reaction (average pH 6.8, standard deviation 1.01). In areas where soil acidity is a more serious problem, it would also be important to map soil pH and then either convert this variable to limitation scores or use it to calculate weighted limitation scores from the input concentration values.

Fig. 7. High cumulative limitation scores can be the result of (a) the cumulative effect of multiple elements, or (b) a single element that shows very high values. (Both panels show Cd, Cr, Cu, Ni, Pb and Zn.)

Our hope is that this methodological framework will open new perspectives. The next step will be to develop methods that relate the cumulative limitation scores directly to remediation costs (Broos et al., 1999), i.e. to estimate the financial losses associated with the constrained use of the land. Different ratios could have been used for different HMCs; a more objective approach would be to work with real figures from real-life projects and then adjust the coefficients statistically. Another idea for future research is to use magnetic susceptibility field images (Schmidt et al., 2005) as predictors in mapping soil pollutants. The only requirement would be that such images are available for the whole study area. One could also design a dedicated case study to observe how HMCs change at different scales, i.e. at different distances and with auxiliary predictors at different grid sizes. Our experience suggests that much more detailed maps of auxiliary predictors are needed, especially those related to urbanization, such as maps of active heating plants and traffic density.

In addition, one might use error propagation methods (Heuvelink, 1998) to derive the composite uncertainty of the final soil pollution map. At present, multiple conditional simulations with such a large amount of data are almost impossible in GSTAT because of the computational complexity and the size of the input maps. Geostatistical simulations would give an idea of the propagated uncertainty, and could also be used as input to more complex environmental data modelling.
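One generic way to propagate errors into the cumulative score is Monte Carlo simulation, as described by Heuvelink (1998). The following is a minimal sketch for a single grid cell. It draws joint realizations of the HMCs from a multivariate normal distribution (predictions as means, kriging standard deviations, and an assumed cross-correlation matrix), converts each draw to limitation scores, and summarizes the distribution of their sum. The score function, the Gaussian errors on the concentration scale and all numbers are hypothetical. Conditional simulation, as discussed above, would additionally preserve the spatial correlation between cells.

```python
import numpy as np


def ls_piecewise(c, x1, x2):
    """Hypothetical limitation score: 0 below X1, 1 at X2, linear beyond (not Eq. 7)."""
    return np.maximum(0.0, (c - x1) / (x2 - x1))


def propagate_ls(mean, sd, corr, x1, x2, n_sim=10_000, seed=0):
    """Monte Carlo mean and standard deviation of the cumulative score at one cell."""
    mean = np.asarray(mean, dtype=float)
    sd = np.asarray(sd, dtype=float)
    cov = np.outer(sd, sd) * np.asarray(corr, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_sim)   # n_sim x n_metals
    totals = ls_piecewise(draws, np.asarray(x1, float), np.asarray(x2, float)).sum(axis=1)
    return float(totals.mean()), float(totals.std())


# Illustrative values for two metals at one grid cell (all hypothetical)
mean = [150.0, 90.0]               # predicted concentrations
sd = [20.0, 15.0]                  # kriging standard deviations
corr = [[1.0, 0.7], [0.7, 1.0]]    # assumed cross-correlation of prediction errors
x1, x2 = [100.0, 50.0], [200.0, 100.0]
print(propagate_ls(mean, sd, corr, x1, x2))
```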

References

Amini, M., Afyuni, M., Fathianpour, N., Khademi, H., Fluhler, H., 2004. Continuous soil pollution mapping using fuzzy logic and spatial interpolation. Geoderma 124 (3-4), 223–233.

Antonic, O., Legovic, T., 1999. Estimating the direction of an unknown air pollution source using a digital elevation model and a sample of deposition. Ecological Modelling 124 (1), 85–95.

Broos, M. J., Aarts, L., van Tooren, C. F., Stein, A., 1999. Quantification of the effects of spatially varying environmental contaminants into a cost model for soil remediation. Journal of Environmental Management 56 (2), 133–145.

Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York, 483 pp.

Hanesch, M., Scholger, R., Dekkers, M. J., 2001. The application of fuzzy c-means cluster analysis and non-linear mapping to a soil data set for the detection of polluted sites. Physics and Chemistry of the Earth, Part A: Solid Earth and Geodesy 26 (11-12), 885–891.

Hengl, T., Heuvelink, G., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 122 (1-2), 75–93.

Hengl, T., 2006. Finding the right pixel size. Computers & Geosciences 32 (9), 1283–1298.

Heuvelink, G., 1998. Error propagation in environmental modelling with GIS. Taylor & Francis, London, UK.

Juang, K. W., Chen, Y. S., Lee, D. Y., 2003. Using sequential simulation to assess the uncertainty of delineating heavy-metal contaminated soils. Environmental Pollution 127, 229–238.

Miko, S., Halamic, J., Peh, Z., Galovic, L., 2001. Geochemical baseline mapping of soils developed on diverse bedrock from two regions in Croatia. Geologica Croatica 54 (1), 53–118.

Mol, G., Vriend, S., van Gaans, P., 2003. Monitoring soil acidification. Conceptual considerations and practical solutions based on current practice in the Netherlands. Chemical Geology 203 (1-2), 3417–3441.

Odeh, I., McBratney, A., Chittleborough, D., 1995. Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging. Geoderma 67 (3-4), 215–226.

Official Gazette, 1992. Regulation on protection of agricultural land in the Republic of Croatia (in Croatian). No. 15/92. Narodne novine, Zagreb, Croatia.

Official Gazette, 2001. Regulation on organic agricultural production and products quality (in Croatian). No. 12/01. Narodne novine, Zagreb, Croatia.

Pebesma, E. J., 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences 30 (7), 683–691.

Proctor, J., Baker, A., 1994. The importance of nickel for plant growth in ultramafic (serpentine) soils. In: Ross, S. (Ed.), Toxic metals in soil-plant system. Wiley, New York, pp. 417–432.

Reimann, C., de Caritat, P., 2005. Distinguishing between natural and anthropogenic sources for elements in the environment: regional geochemical surveys versus enrichment factors. Science of the Total Environment 337 (1-3), 91–107.

Romic, M., Romic, D., 2003. Heavy metals distribution in agricultural topsoils in urban areas. Environmental Geology 43, 795–805.

Romic, M., Romic, D., Dolanjski, D., Stricevic, I., 2004. Heavy metals accumulation in topsoil from the wine-growing regions. Part 1. Factors which control retention. Agriculturae Conspectus Scientificus 69, 1–10.

Schmidt, A., Yarnold, R., Hill, M., Ashmore, M., 2005. Magnetic susceptibility as proxy for heavy metal pollution: a site study. Journal of Geochemical Exploration 85 (3), 109–117.

Triantafilis, J., Ward, W., McBratney, A., 2001. Land suitability assessment in the Namoi Valley of Australia, using a continuous model. Australian Journal of Soil Research 39, 273–290.

Unit Geo Software Development, 2001. ILWIS 3.0 Academic user's guide. ITC, Enschede. URL http://www.itc.nl/ilwis/

Van der Gaast, N., Leenaers, H., Zegwaard, J., 1998. The grey areas in soil pollution risk mapping: the distinction between cases of soil pollution and increased background levels. Journal of Hazardous Materials 61 (1-3), 249–255.

Webster, R., Oliver, M., 2001. Geostatistics for Environmental Scientists. Statistics in Practice. John Wiley & Sons, Chichester.