
Geomorphology 76 (2006) 392–410

www.elsevier.com/locate/geomorph

Prediction of landslide susceptibility using rare events logistic regression: A case-study in the Flemish Ardennes (Belgium)

M. Van Den Eeckhaut a,b,⁎, T. Vanwalleghem a,c, J. Poesen a, G. Govers a, G. Verstraeten a,b, L. Vandekerckhove d

a Physical and Regional Geography Research Group, K.U. Leuven, Redingenstraat 16, B-3000 Leuven, Belgium

b Fund for Scientific Research – Flanders, Belgium

c K.U. Leuven Research Fund, Belgium

d Land Division, Ministry of Flanders, Belgium

Received 23 August 2005; received in revised form 1 December 2005; accepted 2 December 2005

Available online 19 January 2006

Abstract

In this article a statistical multivariate method, i.e., rare events logistic regression, is evaluated for the creation of a landslide susceptibility map in a 200 km² study area of the Flemish Ardennes (Belgium). The methodology is based on the hypothesis that future landslides will have the same causal factors as the landslides initiated in the past. The information on the past landslides comes from a landslide inventory map obtained by detailed field surveys and by the analysis of LIDAR (Light Detection and Ranging)-derived hillshade maps. Information on the causal factors (e.g., slope gradient, aspect, lithology, and soil drainage) was extracted from digital elevation models derived from LIDAR and from topographical, lithological and soil maps. In landslide-affected areas, however, we did not use the present-day hillslope gradient. In order to reflect the hillslope condition prior to landsliding, the pre-landslide hillslope was reconstructed and its gradient was used in the analysis. Because of their limited spatial occurrence, the landslides in the study area can be regarded as “rare events”. Rare events logistic regression differs from ordinary logistic regression because it takes into account the low proportion of 1s (landslides) to 0s (no landslides) in the study area by incorporating three correction measures: the endogenous stratified sampling of the dataset, the prior correction of the intercept and the correction of the probabilities to include the estimation uncertainty. For the study area, significant model results were obtained, with pre-landslide hillslope gradient and three different clayey lithologies being important predictor variables. Receiver Operating Characteristic (ROC) curves and the Kappa index were used to validate the model. Both show a good agreement between the observed and predicted values of the validation dataset. Based on a qualified judgement, the created landslide susceptibility map was classified into four classes, i.e., very high, high, moderate and low susceptibility. If interpreted correctly, this classified susceptibility map is an important tool for the delineation of zones where prevention measures are needed and human interference should be limited in order to avoid property damage due to landslides.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Landslide susceptibility map; Rare events logistic regression; Deep-seated landslides; Pre-landslide hillslope gradient

⁎ Corresponding author. Physical and Regional Geography Research Group, K.U. Leuven, Redingenstraat 16, B-3000 Leuven, Belgium. Tel.: +32 16 326428; fax: +32 16 326400.

E-mail address: [email protected] (M. Van Den Eeckhaut).

0169-555X/$ - see front matter © 2005 Elsevier B.V. All rights reserved.

doi:10.1016/j.geomorph.2005.12.003

1. Introduction

Hilly lands adjacent to large cities are often populated. For example, the combination of a close location to cities such as Brussels and Ghent and a green and hilly character has turned the Flemish Ardennes in Belgium into an attractive residential area. Although newspapers in the region regularly report material damage due to the initiation or reactivation of a landslide, few people are fully aware of their impact. A detailed field survey in combination with the interpretation of LIDAR-derived hillshade maps in the Flemish Ardennes revealed that a 200 km² study area is marked by at least 116 old, large landslides and 29 recently initiated shallow landslides (Fig. 1; Ost et al., 2003; Van Den Eeckhaut et al., 2005a). Most of the large deep-seated landslides are located under forest and can be classified as dormant. They have been inactive during more than one annual cycle of seasons but their causes of movement remain apparent (Cruden and Varnes, 1996). However, human activities on or in the vicinity of these inactive landslides decrease the stability of the inherently unstable hillslopes; especially important are land levelling by locally removing and adding hillslope material for the construction of houses and other infrastructure, and poor water management such as the absence of a drainage system, the creation of ponds and the construction of swimming pools. Analysis of newspaper articles, interviews with local residents and technical reports has resulted in a database consisting of recorded initiations and reactivations of 28 shallow and 27 deep-seated landslides since 1900 (Van Den Eeckhaut et al., 2005b). These initiations or reactivations mainly occurred after periods of persistent rainfall between January and March. They often caused damage to buildings and infrastructure (Fig. 2), economic losses and psychological problems.

Fig. 1. Location of the study area in Belgium. The hilly character and the valley asymmetry are visible on the LIDAR-derived hillshade map (DEM of Flanders, published by OC GIS-Vlaanderen [MVG-LIN-AWZ and MVG-LIN-AMINAL] in 2005). White polygons: 29 shallow and 116 large, deep-seated landslides.

In order to reduce the damage caused by landslide initiations and reactivations, a landslide susceptibility map is needed. As defined by Brabb (1984) such a map portrays the spatial distribution of the actual and potential slope failures. In contrast with landslide hazard maps, a susceptibility map provides no information on the timing and the magnitude of the predicted landslide event (Carrara et al., 1995; Guzzetti et al., 2005). In the case of the Flemish Ardennes where no information on the timing of the deep-seated landslides is available, a landslide susceptibility map would be a useful tool not only for geomorphologists but also for local and regional authorities, decision makers, planners, insurance companies and inhabitants of the region. Authorities, for example, will need to link specific land use regulations to the susceptibility zones, so that prevention measures (e.g., limitation of human activities in the area) could be taken on hillslope sections prone to landslide initiation. Similarly remediation strategies could be designed for hillslopes prone to landslide reactivation.

Fig. 2. Photographs of landslides in the study area. (a) Main scarp of an old deep-seated landslide which was reactivated in March 2001 (Maarkedal, June 2003). Person indicates scale. Note tilted trees on the left; (b) orthophoto (provided by OC GIS-Vlaanderen in 1998) of the same deep-seated landslide. The slice that fell off the main scarp in March 2001 was part of a football field; (c) damage to a garden caused by a landslide initiated in December 2003 (Kluisbergen, March 2004). Persons are standing on the foot of the landslide.

There are many methods to create landslide susceptibility maps. More information on methodologies for spatial landslide susceptibility assessments can be found in review papers (e.g., Soeters and van Westen, 1996; Guzzetti et al., 1999; Dai et al., 2002; Chung and Fabbri, 2005). These methods all have their own advantages and disadvantages, but most of them are founded on the same conceptual model (Carrara et al., 1995; Guzzetti et al., 1999):

1) mapping of landslides in the study area;

2) mapping of environmental factors which are supposed to be directly or indirectly correlated with slope instability;

3) estimation of the relative contribution of these factors in generating slope-failures; and

4) classification of hillslope sections into domains of different susceptibility levels.

This conceptual model is based on the hypothesis that “the past and the present landslide locations are the key to the future” (Carrara et al., 1995; Zêzere, 2002). More specifically, locations susceptible to landslides will be selected because of their similarity in environmental characteristics to those of the landslides already mapped in the study area. Among all the methodologies used to model landslide susceptibility, statistical and physically-based models are the most widely used (e.g., Guzzetti et al., 1999; Dai et al., 2002). For modelling the susceptibility to landslides in large areas where detailed geotechnical data are lacking, which hampers the use of physically-based models, logistic regression is nowadays a common statistical modelling technique (e.g., Carrara et al., 1995; Atkinson and Massari, 1998; Begueria and Lorente, 1999; Vanacker et al., 2003; Ayalew and Yamagishi, 2005). For such an analysis the study area is divided into mapping units such as grid cells, terrain units, slope units or unique condition units. Each mapping unit is given a code, typically ‘1’ or ‘0’, to indicate, respectively, the ‘presence’ or ‘absence’ of a landslide. With logistic regression one tries to find the best-fitting model describing the relationship between the dependent variable (in this case the presence or absence of landslides) and a set of independent variables. In the domain of political sciences King and Zeng (2001) recently reported that ordinary logistic regression sharply underestimates probabilities if the number of 1s (presence) in the population is dozens to thousands of times smaller than the number of 0s (absence). To enable the application of logistic regression for such rare events, they suggested three correction measures: i.e., an endogenous stratified sampling of the dataset, a prior correction of the intercept and a correction of the probabilities to include the estimation uncertainty. Landslides in regions such as the Flemish Ardennes can be regarded as rare, because their spatial distribution is usually much more limited than that of slopes without landslides, although reactivations of each landslide can occur regularly. To our knowledge, only Begueria and Lorente (1999) dealt with mass movement using a method suitable for rare spatial events, through their study of 136 debris flows in a 54.6 km² study area in the Spanish Pyrenees. However, they only applied the first two correction measures suggested by King and Zeng (2001).

The main objective of this study is to evaluate the suitability of rare events logistic regression to create a landslide susceptibility map. As this is a relatively new technique, a detailed description of the methodology is given first. Then rare events logistic regression is applied to a 200 km² study area in the Flemish Ardennes. Attention is paid to the readability and classification of the obtained landslide susceptibility map because it will be consulted for different purposes.

2. Study area

In the study area of the Flemish Ardennes, topographical, lithological, hydrological and seismic characteristics have contributed to the presence of landslides. Fig. 1 shows the hilly character of the study area. Altitudes range from 20 m a.s.l. in the valley of the River Scheldt to 150 m a.s.l. on the Tertiary hills in the southern part of the area. Hillslopes are generally not very steep. Only 2% of the hillslope sections have slope gradients larger than 20%. Due to the valley asymmetry, these steeper slope sections are located on hillslopes with a south to northwest orientation.

The lithology consists of loose Tertiary marine sediments which are overlain by Quaternary aeolian loess with varying thickness. The Tertiary sediments can be grouped into an alternation of clayey sand layers and clay layers with a dip of less than 0.4% to the NNE (Jacobs et al., 1999a). A more detailed description of the lithologies which outcrop in the study area is summarized in Table 1 (Jacobs et al., 1999a,b). The Kortrijk Formation covers about 67% of the whole study area. Only the three uppermost members of this formation are present. The Saint-Maur Member (KoSm) is a silty clay, located in the flatter areas of both the most southern part of the study area and the valley of the River Scheldt (Fig. 1). This member is overlain by the Moen Member (KoMo). Comprising 40% of the total study area, this layer consists of clayey coarse silts with intercalations of clay layers. The member is further overlain by the Aalbeke Member (KoAa), a relatively thin layer of homogeneous blue massive clays containing more than 50% of clay. The Aalbeke Member encircles the Tielt Formation (Tt), with micaceous and glauconitic clayey sands and a clay content ranging between 27% and 30%, covering a relatively wide part of the study area (i.e., 27%). The overlying Ghent Formation is subdivided into the clays of the Merelbeke Member and the glauconitic sands of the Vlierzele Member. The other three youngest Tertiary formations are found in less than 1.5% of the study area. On the tops of the highest Tertiary hills, located in the southern part of the study area, the fine sands of the Lede Formation are overlain by homogeneous blue clays and glauconitic sandy clays of the Maldegem Formation. These formations are further overlain by the yellow brownish glauconitic sands of the Diest Formation, which are locally lithified and protect the hills against erosion.

Table 1. Geological layers in the Flemish Ardennes (after Jacobs et al., 1999a,b)

| Chrono-stratigraphy (period) | Chrono-stratigraphy (epoch) | Formation | Member | Lithology | Average thickness (m) |
|---|---|---|---|---|---|
| Tertiary | Pliocene | NP | NP | NP | NP |
| | Miocene | Diest (Di) | | Glauconitic, oxidized sand | 2 to 3 |
| | Oligocene | NP | NP | NP | NP |
| | Eocene | Maldegem (Ma) | | Homogeneous blue clay and glauconitic sandy clay | 2 to 3 |
| | | Lede (Ld) | | Sand | 5 |
| | | Gent (Ge) | Vlierzele (GeVl) | Glauconitic sand with clay lenses | 5 |
| | | | Merelbeke (GeMe) | Dark clay with sand lenses | 5 |
| | | Tielt (Tt) | | Micaceous and glauconitic clayey sand, alternating with clay layers | 20 to 30 |
| | | Kortrijk (Ko) | Aalbeke (KoAa) | Homogeneous blue massive clay | 10 |
| | | | Moen (KoMo) | Clayey coarse silt to fine sand with clay layers | 45 |
| | | | St.-Maur (KoSm) | Silty clay | 27 |
| | | | Mont-Héribu (KoMh) | Clayey sand, sandy or silty clay | NP |
| | Paleocene | NP | NP | NP | NP |

Member: lithostratigraphical subdivision of rock bodies within a formation (Whittow, 1984); NP: not present in study area.

Lithology, with its alternation of clayey sand layers and clay layers, has an important influence on the hydrology in the study area, as perched water tables are built up in the permeable sandy layers resting on less permeable clays. Many springs occur where these perched water tables are cut by topography. The exfiltrating groundwater has led to a high drainage and valley density which is visible in Fig. 1.

Several historical earthquakes, including one in 1938 which occurred in the village of Nukerke in the study area (M=5.6; the strongest local record for the 20th century), prove that the faults crossing and bordering the region are still active (De Vos et al., 1993; De Vos, 1997). For the initiation of the old deep-seated landslides, a seismic origin, perhaps in combination with persistent rainfall, cannot be excluded (Ost et al., 2003).

Land use and land cover correspond to lithology, soil type, topography and hydrology pattern. Croplands are located on the plateaus of the lower hills, and pastures dominate on gentle and moderately sloping hillslopes. The Tertiary hills and steep hillslopes are forested (I.W.O.N.L., 1987).

3. Materials and methods

The creation of the landslide susceptibility map is based on the aforementioned conceptual model suggested by Carrara et al. (1995) and Guzzetti et al. (1999). Its fourfold structure is used during the description of the materials and methods.

3.1. Mapping of landslides

The selected 200 km² study area is part of the 430 km² study area investigated by Ost et al. (2003) and Van Den Eeckhaut et al. (2005a). The landslide inventory maps they constructed are digital vector maps resulting from a detailed field survey. During the field survey, observed landslides were mapped on a 1:10,000 topographical map published by the National Geographic Institute (NGI) in 1972. Although landslide inventory maps are generally created through a combination of aerial photo interpretation (API) and detailed field checks (Carrara et al., 1992; van Westen et al., 1999; Metternicht et al., 2005), API proved not to be an effective mapping technique in the Flemish Ardennes (Van Den Eeckhaut et al., 2005a), because 85% of the landslides observed in the field are partly or completely covered with forest. The difficulty in using API for delineating landslides on densely forested hillslopes was also noted by Brardinoni et al. (2003) and Schulz (2004). Recently the field survey-based landslide inventory map of Van Den Eeckhaut et al. (2005a) was checked and corrected in a GIS environment (IDRISI Kilimanjaro) using LIDAR-derived slope, hillshade and contour maps with a 5 m resolution. Detailed information on the creation of these maps will be given by Van Den Eeckhaut et al. (submitted for publication). During this check most attention was paid to the correction of the boundaries of the landslides mapped in the field, because, compared to the mapped landslides on the topographical map, direct digitizing on the LIDAR-derived maps enabled a more accurate mapping of these boundaries. However, as the landslides are old, not all landslide parts could be corrected with the same accuracy. As in Schulz (2004), the main scarp locations were mapped with high certainty, the lateral margins or flanks with less certainty, and the landslide toes with the least certainty, reflecting their subdued morphology.

Following the division proposed by Keefer (1984), the landslide inventory distinguishes between 29 shallow (depth < 3 m) and 116 deep-seated (depth ≥ 3 m) landslides. Guzzetti et al. (1999) recommend not creating one landslide susceptibility map for different types of landslides. Due to the small number of shallow complex landslides (i.e., 29) and the fact that eight of them are located within the accumulation zone of a deep-seated landslide, no susceptibility map was created for the shallow landslides. A typical large, deep-seated landslide (e.g., Fig. 2a, b) has an affected area ranging from 0.35 to 30.1 ha with a mean of 4.4 ha (200 m long and 220 m wide), and has a main scarp height ranging from 3 to 18 m with a mean of 8.5 m. Generally, it is located on a slope section with a gradient of 16%. The 116 deep-seated landslides are all earth slides, which can be subdivided into ‘rotational’, ‘complex’ and ‘possible’ ones. Containing 95 of the 116 landslides, the rotational slides are the dominant group. The 10 complex slides are rotational slides with flow characteristics in the accumulation zone. Finally, as recommended by Guzzetti et al. (2000), 13 suspicious hillslope sections or possible landslides, mainly characterized by an unclear main scarp, were also mapped.

In this study all 116 deep-seated landslides will be used for rare events logistic regression analysis. As we preferred to use grid cells as terrain units, the landslide inventory was rasterized. A 10 m resolution was chosen in order to create a detailed landslide susceptibility map. Logistic regression assumes the data to be statistically independent (Cliff and Ord, 1981). However, the landslide and non-landslide cells usually show a positive spatial autocorrelation (Hosmer and Lemeshow, 1989; Vanacker et al., 2003), i.e., cells adjacent to a (non-)landslide cell tend also to be (non-)landslide cells. Therefore, when logistic regression analysis using all the cells within the study area is performed, computed test statistics are too often declared significant under the null-hypothesis (Legendre and Legendre, 1998). In order to avoid this problem, only one cell of each of the 116 landslides is used for the creation of the susceptibility map. As our aim is to predict the probability of landslide initiation, it is only relevant to select a cell from the depletion zone (Atkinson and Massari, 1998). Thus, the central cell of the depletion zone was selected to represent each landslide (Fig. 3), which means that one of the main sources of error in a landslide inventory map, the boundaries of the landslides (Carrara et al., 1992; Malamud et al., 2004; Van Den Eeckhaut et al., 2005a), is not carried into the analysis.

3.2. Mapping of environmental factors expected to affect slope stability

The predictor variables used in this study are topographical (terrain height, hillslope gradient, aspect, profile and plan curvature), lithological, hydrological (soil drainage, upslope contributing area and distance to rivers) and seismic (distance to faults) causal factors (Table 2). All topographical variables are derived from the LIDAR data (DEM of Flanders, provided by OC GIS-Vlaanderen [MVG-LIN-AWZ and MVG-LIN-AMINAL] in 2005) using the standard routines available in IDRISI Kilimanjaro. Normally LIDAR-derived maps have a finer resolution (e.g., 1 m), but for the analysis with the above-noted landslide data, grid size was increased to 10 m. In this study the ‘pre-landslide hillslope gradient’ was used instead of the actual hillslope gradient. For non-landslide cells the gradient is the same as that calculated from the LIDAR-derived DEM. For the depletion area of a landslide, the gradient prior to the landsliding was generally smaller than the present-day gradient. As shown in Fig. 3, the original slope of the depletion zone was reconstructed as the difference in altitude between the highest and lowest points of the depletion zone (i.e., Ht−Hb) divided by the horizontal distance between the points (Le). Then the slope was assigned to the central landslide cell of the depletion zone.

Fig. 3. Typical landslide in the Flemish Ardennes (Nieuw Kerkhof, Ronse) with indications of the depletion zone and its central cell, which is used in the rare events logistic regression analysis. This figure also illustrates how the pre-landslide hillslope gradient was estimated.
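The reconstruction of the pre-landslide gradient described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example elevations are hypothetical, and the multiplication by 100 assumes that the ratio (Ht−Hb)/Le is expressed in percent, as the gradients are elsewhere in the text.

```python
def pre_landslide_gradient(h_top: float, h_bottom: float, horizontal_length: float) -> float:
    """Return the reconstructed gradient of a depletion zone in percent.

    h_top and h_bottom are the altitudes (m) of the highest and lowest points
    of the depletion zone (Ht and Hb); horizontal_length is the horizontal
    distance (m) between them (Le).
    """
    if horizontal_length <= 0:
        raise ValueError("horizontal length must be positive")
    return 100.0 * (h_top - h_bottom) / horizontal_length


# Hypothetical depletion zone: 12 m altitude difference over 80 m.
gradient = pre_landslide_gradient(h_top=62.0, h_bottom=50.0, horizontal_length=80.0)
print(f"pre-landslide gradient: {gradient:.1f} %")
```

The resulting value would then be assigned to the central cell of the depletion zone in place of the present-day gradient.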

Information on lithology and soil drainage is derived from the 1:50,000 Tertiary geological map and the 1:20,000 soil map, published by OC GIS-Vlaanderen in 2001. The contents of both maps are available as digital vector data, and they were converted to raster data with a 10 m resolution. The upslope contributing area, used as a surrogate for discharge of both surface runoff and subsurface flow, was calculated in the WaTEM model (Desmet and Govers, 1996; Van Oost et al., 2000). Distance to rivers and faults was determined in IDRISI Kilimanjaro after digitizing the rivers from the 1:10,000 topographical map published by NGI in 1972, and the faults from the geological map of the Brabant Massif (De Vos et al., 1993). As already mentioned, the timing of the initiation of the large, deep-seated landslides is not known. Therefore land use, which is subject to changes within decades, is not incorporated as a predictor variable.

In order to check the representativeness of the central cell of the depletion zone in terms of the predictor variables, Chi² tests were made for each variable to test whether our data sampled from the 116 landslide cells (central cell of each depletion area) have the same frequency distribution as the data sampled from all the cells located within depletion zones. In all cases the hypothesis of the same distribution was accepted at a significance level of 0.01.

Table 2. Response and predictor variables used in rare events logistic regression modelling

| Variable | Type | Source |
|---|---|---|
| Response (dependent) | | |
| Presence of a landslide (1–0) | Categorical | Ost et al. (2003); Van Den Eeckhaut et al. (2005a) |
| Predictor (independent) | | |
| Terrain height (m, above sea level) | Continuous | DHM, OC GIS-Vlaanderen ᵃ |
| Pre-landslide hillslope gradient (%) | Continuous | DHM, OC GIS-Vlaanderen ᵃ |
| Aspect (°) | Categorical | DHM, OC GIS-Vlaanderen ᵃ |
| North (N) | Dummy | |
| Northeast (NE) | Dummy | |
| East (E) | Dummy | |
| Southeast (SE) | Dummy | |
| South (S) | Dummy | |
| Southwest (SW) | Dummy | |
| West (W) | Dummy | |
| Northwest (NW) | Reference group | |
| Profile curvature (m⁻¹) | Continuous | DHM, OC GIS-Vlaanderen ᵃ |
| Plan curvature (m⁻¹) | Continuous | DHM, OC GIS-Vlaanderen ᵃ |
| Lithology | Categorical | OC GIS-Vlaanderen ᵇ |
| Diest Formation (Di) | Dummy | |
| Maldegem Formation (Ma) | Dummy | |
| Lede Formation (Ld) | Dummy | |
| Gent Formation (Ge) | Reference group | |
| Tielt Formation (Tt) | Dummy | |
| Kortrijk Formation Aalbeke Member (KoAa) | Dummy | |
| Kortrijk Formation Moen Member (KoMo) | Dummy | |
| Kortrijk Formation St-Maur Member (KoSm) | Dummy | |
| Soil drainage | Categorical | OC GIS-Vlaanderen ᵇ |
| Dry soils (D) | Reference group | |
| Wet soils (W) | Dummy | |
| Very wet soils (VW) | Dummy | |
| No information on soil drainage (NI) | Dummy | |
| Upslope contributing area (ha) | Continuous | DHM, OC GIS-Vlaanderen ᵃ |
| Distance to nearest river (m) | Continuous | NGI ᶜ |
| Distance to faults (m) | Continuous | De Vos et al. (1993) |

All maps have a 10 m resolution. ᵃ Data provided in 2004; ᵇ 2001; ᶜ 1972.

3.3. Estimation of relative contribution of factors controlling slope failures

3.3.1. Univariate tests of association

A univariate analysis preceded the multivariate rare events logistic regression. For both the same sample was used. Chi² (χ²) was used to test the association between each predictor variable and the occurrence of landslides. Then Cramer's V (Kendall and Stuart, 1979) was derived from Chi² to test the strength and type of association:

$$V = \sqrt{\frac{\chi^{2}}{N \, \min(R-1,\; C-1)}} \qquad (1)$$

where N is the sample size, R is the number of rows in the contingency table, and C is the number of columns.

In fact V compares the observed distribution with the one expected under no relationship, and it standardizes this comparison, eliminating the effect of N and the size and shape of the contingency table. Values of V range between 0 and 1 (Kendall and Stuart, 1979; Vanacker et al., 2003).
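As a worked illustration of Eq. (1), the following Python sketch computes χ² and Cramer's V from a contingency table using only the standard library. The function names and the example counts are hypothetical and are not taken from the study; the expected frequencies are the usual ones under independence (row total times column total divided by N).

```python
import math


def chi_square(table):
    """Return (chi-square statistic, sample size N) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (N * min(R - 1, C - 1)))."""
    stat, n = chi_square(table)
    n_rows, n_cols = len(table), len(table[0])
    return math.sqrt(stat / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical counts: rows = landslide / non-landslide cells,
# columns = two classes of a categorical predictor.
example_table = [[30, 10], [10, 30]]
print(round(cramers_v(example_table), 3))
```

For this example table χ² equals 20 with N = 80, so V = 0.5; a table with identical rows would give V = 0.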


3.3.2. Rare events logistic regression

Ordinary logistic regression (OLR) describes the relationship between a dichotomous response variable (Y, i.e., the presence or absence of a landslide) and a set of explanatory variables (x1, x2, …, xn). The explanatory variables may be continuous or discrete (with dummy variables) and do not need a normal frequency distribution. The logistic response function can be written as (Allison, 2001):

$$P(Y = 1) = p = \frac{1}{1 + e^{-(\alpha + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n)}} \qquad (2)$$

where p is the probability of occurrence of a landslide, α is the intercept and βi is the coefficient for the independent variable xi, estimated by maximum likelihood.

Eq. (2) can be linearized with the following transformation in which the natural logarithm of the odds, log(p/(1−p)), called the logit, is linearly related with the independent variables:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n = \alpha + \beta X \qquad (3)$$
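The relationship between Eqs. (2) and (3) can be checked numerically with a short Python sketch. The function names, the intercept and the single coefficient below are hypothetical values chosen for illustration only; they are not estimates from this study.

```python
import math


def logistic_probability(intercept, coefficients, x):
    """Eq. (2): p = 1 / (1 + exp(-(alpha + sum(beta_i * x_i))))."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def logit(p):
    """Eq. (3): natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical model with one predictor (e.g. a gradient of 16 %).
alpha, betas = -1.3, [0.05]
p = logistic_probability(alpha, betas, [16.0])
# Applying the logit to p recovers the linear predictor alpha + beta * x.
print(round(p, 4), round(logit(p), 4))
```

Applying the logit to the output of the logistic function returns the linear predictor, which is why the coefficients can be read as changes in log-odds per unit change of a predictor.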

In logistic regression the significance of the coefficients βi is tested with the Wald test, which is obtained by comparing the maximum likelihood estimate of every βi with its estimated standard error (Hosmer and Lemeshow, 1989). A coefficient is significant if the tested null hypothesis that the estimated coefficient is 0 can be rejected at a 0.01 (or 0.05) significance level.
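A minimal sketch of such a test is given below. It assumes the common form of the Wald statistic, z = βi / SE(βi), referred to a standard normal distribution for a two-sided p-value; the text does not state which form was used in the analysis, and the coefficient and standard error are hypothetical.

```python
import math


def wald_test(coefficient, standard_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = coefficient / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient and standard error.
z, p_value = wald_test(coefficient=0.12, standard_error=0.03)
print(f"z = {z:.2f}, p = {p_value:.5f}, significant at 0.01: {p_value < 0.01}")
```

Squaring z gives the equivalent chi-square form of the statistic with one degree of freedom.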

Despite its popularity, logistic regression may cause some problems if the total area affected by landslides is much smaller than the total study area. In such study areas the occurrence of landslides can be seen as rare events, as noted before. The term ‘rare event data’ was introduced in political sciences by King and Zeng (2001) to describe binary dependent variables with dozens to thousands of times fewer 1s than 0s. When modelling rare events, many statistical procedures, including logistic regression, can sharply underestimate the probability of occurrence of the event (King and Zeng, 2000, 2001). To avoid these problems, King and Zeng (2001) incorporated three corrections into ordinary logistic regression and called their method “rare events logistic regression”.

The first correction deals with the selection of a representative sample. Instead of selecting an equal number of landslide and non-landslide cells, 1 to 5 times more non-landslide cells have to be selected. This type of selection is known as choice-based or endogenous stratified sampling. As mentioned before, for each of the 116 landslides only one cell, i.e., the central cell of the depletion zone, was used. Stratified random sampling of 580 cells located outside a mapped landslide was undertaken. As a landslide prediction model does not have scientific significance without measuring the validity of the results (Chung et al., 2002; Chung and Fabbri, 2005), the sample of 696 cells was subdivided into calibration and validation datasets. The calibration dataset contained 80% of the cells (93 landslide cells and 465 non-landslide cells), and the validation set contained the remaining 20% of the cells (23 landslide cells and 115 non-landslide cells).
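The sampling and splitting step can be sketched as follows. This is only an illustration under stated assumptions: the cell identifiers are hypothetical integers, the pool of 20,000 candidate non-landslide cells is invented, and splitting the landslide cells first and then drawing five non-landslide cells per landslide cell is one way to reproduce the reported counts, not necessarily the procedure used in the study.

```python
import random

def sample_and_split(landslide_cells, candidate_cells, ratio=5,
                     calibration_share=0.8, seed=0):
    """Draw `ratio` non-landslide cells per landslide cell, then split 80/20."""
    rng = random.Random(seed)
    ones = list(landslide_cells)
    rng.shuffle(ones)
    zeros = rng.sample(list(candidate_cells), ratio * len(ones))

    n_ones_cal = round(calibration_share * len(ones))  # landslide cells for calibration
    n_zeros_cal = ratio * n_ones_cal                   # keep the 1:5 ratio in both sets

    calibration = ([(c, 1) for c in ones[:n_ones_cal]]
                   + [(c, 0) for c in zeros[:n_zeros_cal]])
    validation = ([(c, 1) for c in ones[n_ones_cal:]]
                  + [(c, 0) for c in zeros[n_zeros_cal:]])
    return calibration, validation

# Hypothetical cell identifiers: 116 landslide cells, 20,000 candidate non-landslide cells.
calibration, validation = sample_and_split(range(116), range(1000, 21000))
```

With 116 landslide cells this yields 93 + 465 = 558 calibration cells and 23 + 115 = 138 validation cells, matching the counts given above.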

Selecting on the dependent variable may introduce a sampling bias in the logistic coefficients, and therefore a second correction, called prior correction, is needed (King and Zeng, 2000, 2001). By using the actual fraction of 1s in the population, τ, and the observed fraction of 1s in the sample data, ȳ, the corrected intercept α can be calculated from the uncorrected estimate α̂ as:

$$\alpha = \hat{\alpha} - \ln\left[\left(\frac{1-\tau}{\tau}\right)\left(\frac{\bar{y}}{1-\bar{y}}\right)\right] \qquad (4)$$
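The prior correction of Eq. (4) is a single subtraction, as the short Python sketch below shows. The function name, the uncorrected intercept of −5.0 and the population fraction τ = 0.0001 are hypothetical values used for illustration; only the sample fraction of 93 landslide cells in 558 calibration cells comes from the text.

```python
import math

def prior_corrected_intercept(alpha_hat, tau, y_bar):
    """Eq. (4): correct the estimated intercept for endogenous stratified sampling.

    alpha_hat: uncorrected maximum likelihood estimate of the intercept
    tau:       fraction of 1s in the population
    y_bar:     fraction of 1s in the sample
    """
    return alpha_hat - math.log(((1.0 - tau) / tau) * (y_bar / (1.0 - y_bar)))

y_bar = 93 / 558        # sample fraction of landslide cells in the calibration set
tau = 0.0001            # hypothetical population fraction (assumption)
alpha_hat = -5.0        # hypothetical uncorrected intercept (assumption)

alpha = prior_corrected_intercept(alpha_hat, tau, y_bar)
```

Because landslide cells are far more frequent in the sample than in the population, the corrected intercept is lower than the uncorrected one. When τ equals ȳ the correction vanishes.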

However, calculation of the probabilities pi, by substituting the corrected intercept α of Eq. (4) into Eq. (2), results in an underestimation of the probabilities, because the estimation uncertainty on the coefficients βi is neglected. The third correction deals with this problem. By adding a correction factor Ci to pi, corrected probabilities are obtained as:

$$P(Y_i = 1) = p_i + C_i \qquad (5)$$

For each observation, Ci is calculated as:

$$C_i = (0.5 - p_i)\, p_i\, (1 - p_i)\, X\, V(\beta)\, X' \qquad (6)$$

where X is a 1×(n+1) vector of values for each independent variable, X′ is the transpose of X and V(β) is the variance–covariance matrix.
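A minimal Python sketch of Eqs. (5) and (6) is given below. It assumes that the first element of X is a constant 1 for the intercept (one reading of the 1×(n+1) dimension) and uses a hypothetical 2×2 variance–covariance matrix; none of the numbers come from the study.

```python
def correction_factor(p_i, x, v):
    """Eq. (6): C_i = (0.5 - p_i) p_i (1 - p_i) X V(beta) X'."""
    n = len(x)
    # Quadratic form X V(beta) X' written out with plain loops.
    quadratic_form = sum(x[j] * v[j][k] * x[k] for j in range(n) for k in range(n))
    return (0.5 - p_i) * p_i * (1.0 - p_i) * quadratic_form

def corrected_probability(p_i, x, v):
    """Eq. (5): P(Y_i = 1) = p_i + C_i."""
    return p_i + correction_factor(p_i, x, v)

# Hypothetical inputs: intercept term plus one predictor, and an invented V(beta).
x = [1.0, 30.0]
v = [[4.0, -0.1],
     [-0.1, 0.0036]]
p_i = 0.1

c_i = correction_factor(p_i, x, v)
p_corrected = corrected_probability(p_i, x, v)
```

For probabilities below 0.5 the factor is positive, so rare events are shifted upwards; at exactly 0.5 the correction is zero.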

Model fitting via logistic regression is sensitive to collinearities among the independent variables (Hosmer and Lemeshow, 1989). Therefore multicollinearity diagnostic statistics produced by linear regression, i.e., Variance Inflation Factors (VIF) and Tolerance (TOL), were calculated using the SAS software, and variables with VIF > 2 and TOL < 0.4 were excluded from the logistic analysis (Allison, 2001). After exclusion of the highly correlated explanatory variables, stepwise logistic regression was carried out in SAS in order to incorporate only the predictor variables with an important contribution to the presence of landslides. In this study the significance level of the score Chi² for entering the model was set at 0.15. The significance level of the Wald Chi² for a variable to stay was set at 0.05. Then rare events logistic regression was computed with the relogit module in the Zelig software, a product working with the R software (Imai et al., 2005). Several parameters (i.e., Akaike's information criterion (AIC), the negative of twice the log likelihood (−2 log L), area under the ROC curve (AUC), Somers' Dxy and Goodman–Kruskal gamma) were used to analyse the appropriateness of the models and to select the best model (SAS, 1985). AIC penalizes the log-likelihood for estimating more parameters and thus selects a model that fits well but has a minimum number of parameters. In general, lower values of AIC and −2 log L correspond to more desirable models (Allison, 2001). AUC, Somers' Dxy and Goodman–Kruskal gamma vary between 0 and 1, with large values corresponding to stronger associations between the predicted and observed values (Allison, 2001).
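The penalty that AIC applies can be made explicit with the standard definition AIC = −2 log L + 2k, where k is the number of estimated parameters. The short Python sketch below applies this definition to the −2 log L values reported in Table 5; the parameter counts (intercept plus the number of coefficients listed in Table 4) are our reading of the tables and should be treated as an assumption.

```python
def aic(minus_two_log_l, n_parameters):
    """Akaike's information criterion: -2 log L + 2k."""
    return minus_two_log_l + 2 * n_parameters

# (-2 log L from Table 5, assumed number of estimated parameters incl. intercept)
models = {
    "intercept only": (502.8, 1),
    "univariate model": (134.2, 2),
    "multivariate model 1": (93.3, 5),
    "multivariate model 2": (299.5, 9),
}

aic_values = {name: round(aic(m2ll, k), 1) for name, (m2ll, k) in models.items()}
```

Under these assumptions the computed values agree with the AIC values listed in Table 5 (504.8, 138.2, 103.3 and 317.5).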

Finally the predicted probabilities were computed for the validation set and the 138 observations were classified into one of the response levels (i.e., 1 or 0) using different probabilities as cut-off values. Cells with a probability above the cut-off value were classified as landslide cells whereas those with lower probabilities were classified as non-landslide cells. For each cut-off value, a confusion matrix was constructed, which enabled the determination of the percentage of correctly classified observations and the number of false positive (FP), false negative (FN), true positive (TP) and true negative (TN) observations. The obtained model is valid if a cut-off value leads to a high percentage of correctly classified observations and a low number of false positive and false negative observations (Gobin et al., 2001). The validity of a model is graphically represented on a ROC (Receiver Operating Characteristic) curve. To enable the creation of ROC curves, the sensitivity and specificity were calculated for the different cut-off values. The sensitivity is the number of correctly predicted landslide cells (TP) over the total number of observed landslide cells (TP+FN), whereas the specificity is the number of correctly predicted non-landslide cells (TN) over the total number of observed non-landslide cells (FP+TN). ROC curves plot ‘sensitivity’ versus ‘1 − specificity’ of a logistic model as the cut-off value varies from 0 to 1. In our study the curves were used to evaluate the prediction models. A ROC curve would run vertically from (0, 0) to (0, 1) and then horizontally to (1, 1) for a model with perfect accuracy. On the other hand, a model performing no better than random guessing would run diagonally from (0, 0) to (1, 1) (Lasko et al., 2005).
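The construction of a confusion matrix and of the points of a ROC curve can be sketched in a few lines of Python. The probabilities and labels below are hypothetical and serve only to show the mechanics; the function names are ours.

```python
def confusion_counts(probabilities, labels, cutoff):
    """Return (TP, FN, FP, TN) for cells classified as 1 when p > cutoff."""
    tp = fn = fp = tn = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p > cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def roc_points(probabilities, labels, cutoffs):
    """Return (1 - specificity, sensitivity) for each cut-off value."""
    points = []
    for cutoff in cutoffs:
        tp, fn, fp, tn = confusion_counts(probabilities, labels, cutoff)
        sensitivity = tp / (tp + fn)   # TP over observed landslide cells
        specificity = tn / (fp + tn)   # TN over observed non-landslide cells
        points.append((1.0 - specificity, sensitivity))
    return points

# Hypothetical predicted probabilities and observed labels (illustration only).
probabilities = [0.9, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 1, 0]

counts = confusion_counts(probabilities, labels, cutoff=0.5)
curve = roc_points(probabilities, labels, cutoffs=[0.0, 0.5, 1.0])
```

At a cut-off of 0.5 this toy example gives TP = 2, FN = 1, FP = 0 and TN = 2; the cut-offs 0 and 1 give the end points (1, 1) and (0, 0) of the curve.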

A second validation method used in this study is the Kappa index (Hoehler, 2000). Cohen's Kappa index (κ) determines the agreement between two classifications with a nominal or ordinal scale and is calculated as (Cohen, 1960; Kappa Multicalc Page, 2005):

$$\kappa = \frac{P_{\mathrm{obs}} - P_{\mathrm{exp}}}{1 - P_{\mathrm{exp}}} \qquad (7)$$

where Pobs is the observed agreement and Pexp is the expected agreement, which are calculated as:

$$P_{\mathrm{obs}} = \frac{TP + TN}{N} \qquad (8)$$

and

$$P_{\mathrm{exp}} = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N^2} \qquad (9)$$

A κ value of 1 is obtained in case of a perfect agreement between the model and reality, whereas a κ value of 0 means that the agreement is no better than chance. In between these two values, agreement changes from slight (<0.20) over fair (0.20–0.40), moderate (0.40–0.60), substantial (0.60–0.80) to nearly perfect (0.80–1). The worst case, i.e., agreement is worse than chance, results in negative κ values. Byrt et al. (1993) observed that κ gives an underestimation of the suggested agreement when bias and/or prevalence effects are present. Prevalence effects occur when the overall proportion of positive and negative results is substantially different from 50%, whereas bias effects become important when the real and modelled classifications differ in proportion of positive results. Therefore Byrt et al. (1993) suggested the following equation for Prevalence and Bias Adjusted Kappa (PABAK):

$$\mathrm{PABAK} = 2P_{\mathrm{obs}} - 1 \qquad (10)$$

For this equation, the aforementioned range of κ is also applied. Although Hoehler (2000) suggested that this correction of κ is usually unnecessary, PABAK was used in our analysis as both prevalence and bias effects are present.

3.4. Classification of hillslope sections into different landslide susceptibility levels

Based on the variables derived from digital maps, a landslide susceptibility map was created showing values ranging between 0 and 1, being the probability that the cell will be located within a landslide depletion zone. However, given the fact that the presence of a landslide is a rare event, the probabilities will generally be very low and the susceptibility map will be very difficult to interpret by persons not familiar with the theory behind the methodology used. The landslide susceptibility map for the Flemish Ardennes provides not only scientific results but also a tool for regional and local authorities, planners and residents. In order to facilitate map interpretation for these users, the landslide susceptibility map was classified based on the authors' judgement. In former studies the number of landslide susceptibility classes generally varied from two (i.e., stable or unstable, e.g., Begueria and Lorente, 1999) to five (very low, low, moderate, high or very high susceptibility; e.g., AGS Subcommittee, 2000). In this study the landslide susceptibility was classified into four classes: low, moderate, high and very high.

4. Results and discussion

4.1. Univariate tests of association

For all the predictor variables collected for this analysis, the Chi² tests confirm a highly significant difference between the distribution of values for the cells affected by landslides and that for the unaffected cells (significance level 0.01; Table 3). Pre-landslide hillslope gradient (Cramer's V = 0.921), lithology (0.425), terrain height (0.419), and profile and plan curvature (0.328 and 0.316, respectively) have the highest predictive power. These results suggest that landslide depletion zones are mainly located on steeper (>25%), concave hillslope sections with clayey material (Tt, KoAa, KoMo; Table 1) and at altitudes between 60 and 105 m above sea level. However, in this univariate analysis, the intercorrelation between variables was not taken into account. In the next section, the results of the multivariate rare events logistic regression model are presented and discussed.

Table 3
Results of univariate analysis

| Significant predictor variable (significance level is 0.01) | χ² | Cramer's V |
|---|---|---|
| Terrain height above sea level (m) | 122.1 | 0.419 |
| Pre-landslide hillslope gradient (%) | 589.9 | 0.921 |
| Aspect (°) | 62.3 | 0.299 |
| Upslope contributing area (ha) | 15.2 | 0.148 |
| Profile curvature (m⁻¹) | 74.9 | 0.328 |
| Plan curvature (m⁻¹) | 69.6 | 0.316 |
| Lithology | 125.6 | 0.425 |
| Soil drainage | 35.7 | 0.226 |
| Distance to rivers (m) | 8.8 | 0.112 |
| Distance to faults (m) | 12.6 | 0.135 |

4.2. Rare events logistic regression

Calculating VIF and TOL using SAS revealed that the variable terrain height together with plan or profile curvature had to be excluded from the logistic analysis in order to avoid multicollinearity. The stepwise logistic regression carried out in SAS resulted in a model including the predictor variables of pre-landslide hillslope gradient and lithology of the Tielt (Tt), Kortrijk Aalbeke (KoAa) and Kortrijk Moen (KoMo) Formations, characterized by clayey layers (Table 1). The other independent variables in Table 2 were not significant at the 0.05 level. The four significant predictor variables were used for the rare events logistic regression with the Zelig software, and the following relation, written in the form of Eq. (3), was found:

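$$\mathrm{Logit}(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = -23.76 + (0.46 \times \text{pre-landslide hillslope gradient}) + (6.87 \times \mathrm{Tt}) + (8.99 \times \mathrm{KoAa}) + (6.77 \times \mathrm{KoMo}) \qquad (11)$$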

Eq. (11) is referred to as ‘multivariate model 1’ in Tables 4 and 5. Wald Chi² tests revealed that the four independent variables are significant at the 0.01 significance level (Table 4). The intercept and coefficients of the most important univariate model, i.e., the model based on pre-landslide hillslope gradient, are also shown in Table 4.
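To show how Eq. (11) translates into a probability for a single grid cell, the following Python sketch evaluates the logit and back-transforms it with Eq. (2). It assumes that the gradient is expressed in percent and that the three lithologies enter as 0/1 dummy variables with every other lithology as reference; the function name and the example cell (30% gradient on the Tielt Formation) are hypothetical.

```python
import math

INTERCEPT = -23.76
GRADIENT_COEFFICIENT = 0.46
LITHOLOGY_COEFFICIENTS = {"Tt": 6.87, "KoAa": 8.99, "KoMo": 6.77}

def model1_probability(gradient_percent, lithology=None):
    """Evaluate Eq. (11) and convert the logit to a probability with Eq. (2)."""
    logit = INTERCEPT + GRADIENT_COEFFICIENT * gradient_percent
    # Dummy variables: add the coefficient only for the three clayey formations.
    logit += LITHOLOGY_COEFFICIENTS.get(lithology, 0.0)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical cell: 30% pre-landslide gradient on the Tielt Formation.
p_example = model1_probability(30.0, "Tt")

# Odds ratio for one percent increase in gradient, as listed in Table 4.
gradient_odds_ratio = math.exp(GRADIENT_COEFFICIENT)
```

For this example cell the logit equals −23.76 + 13.8 + 6.87 = −3.09. The same gradient on a lithology outside the three clayey formations gives a much lower probability, because the lithology term drops out.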

In Table 5 the uni- and multi-variate models are compared. The values of AIC and −2 log L are comparable and therefore only AIC is discussed here. The decrease in AIC observed between the model with only the intercept (504.8) and the univariate model (138.2) indicates that the predictive power of the model increased drastically by including the pre-landslide hillslope gradient as a variable. The multivariate model 1 has an AIC of 103.3, meaning that the model performance was improved by the addition of the three other predictor variables. However, as the decrease is more modest, the predictive power of the lithological variables is less important than that of the pre-landslide

Table 4
Results of rare events logistic regression modelling

| Model / variable | Coefficient (β) | Standard error | Wald χ² | Pr (>χ²) | Odds ratio (e^β) |
|---|---|---|---|---|---|
| Univariate model | | | | | |
| Intercept α | −15.54 | 0.67 | −22.75 | 1.83e−81 | |
| Pre-landslide hillslope gradient | 0.38 | 0.04 | 10.74 | 1.35e−24 | 1.46 |
| Multivariate model 1 | | | | | |
| Intercept α | −23.76 | 2.15 | −11.03 | 1.05e−25 | |
| Pre-landslide hillslope gradient | 0.46 | 0.06 | 8.38 | 4.47e−16 | 1.58 |
| Tt | 6.87 | 1.41 | 4.89 | 1.33e−06 | 962.95 |
| KoAa | 8.99 | 1.66 | 5.41 | 9.48e−08 | 8022.46 |
| KoMo | 6.77 | 1.49 | 4.55 | 6.52e−06 | 871.31 |
| Multivariate model 2 | | | | | |
| Intercept α | −16.01 | 1.04 | −15.43 | 7.31e−45 | |
| Terrain height | 0.005 | 7.58e−03 | 6.30 | 3.36e−10 | 1.005 |
| NE | −3.10 | 0.95 | −3.25 | 1.21e−03 | 0.045 |
| E | −3.01 | 0.94 | −3.20 | 1.46e−03 | 0.05 |
| SE | −1.21 | 0.51 | −2.36 | 1.85e−02 | 0.30 |
| Tt | 3.06 | 0.71 | 4.32 | 1.83e−05 | 21.33 |
| KoAa | 4.67 | 0.78 | 5.96 | 4.56e−09 | 106.70 |
| KoMo | 2.84 | 0.81 | 3.50 | 5.04e−04 | 17.12 |
| Wet | 1.25 | 0.28 | 4.43 | 1.13e−05 | 3.49 |

All predictor variables are significant (significance level is 0.01). Multivariate model 1 corresponds to Eq. (11). Multivariate model 2 was obtained after predictor variable pre-landslide hillslope gradient was removed from the dataset.


hillslope gradient. These results confirm what was already observed in Section 4.1, namely that pre-landslide hillslope gradient is the most important controlling factor for the initiation of landslides in the study area. The area under

Table 5
Comparison of the multivariate and univariate rare events logistic regression models

| Evaluation statistic | Univariate model | Multivariate model 1 | Multivariate model 2 |
|---|---|---|---|
| −2 log L intercept only | 502.8 | 502.8 | 502.8 |
| −2 log L intercept and covariates | 134.2 | 93.3 | 299.5 |
| AIC intercept only | 504.8 | 504.8 | 504.8 |
| AIC intercept and covariates | 138.2 | 103.3 | 317.5 |
| Area under the ROC curve (AUC) | 0.987 | 0.992 | 0.909 |
| Somers' Dxy | 0.972 | 0.983 | 0.819 |
| Goodman–Kruskal gamma | 0.972 | 0.983 | 0.819 |

Univariate model is only based on pre-landslide hillslope gradient. Multivariate model 1 (Eq. (11)) contains pre-landslide hillslope gradient and three lithological layers rich in clay (Tt, KoAa, and KoMo) as independent variables. Multivariate model 2 is based on terrain height, three lithologies rich in clay, slope aspect and the presence of wet soils. −2 log L: −2 log(likelihood); AIC: Akaike's information criterion; see 3.3.2. for more information on evaluation statistics.

the ROC curve (AUC), Somers' Dxy and Goodman–Kruskal gamma, all calculated from the concordant and discordant pairs of the calibration dataset, are very high for both models (Table 5). Compared to the values of the univariate model, those of the multivariate model 1 are slightly better. AUC, for example, is 0.992 and 0.987 for the multi- and uni-variate models, respectively. This small difference is also visible on the ROC curve shown in Fig. 4. Nevertheless, Table 5 and Fig. 4 confirm that the multivariate model 1 performs better than the univariate model and that not only pre-landslide hillslope gradient but also lithology is an important factor controlling the initiation of landslides in the study area. All significant predictor variables have a positive influence on the initiation of landslides (Table 4). An odds ratio (e^β) of 1.58 for the continuous variable pre-landslide hillslope gradient indicates that the odds of landslide initiation increase by 58%, namely 100(e^β − 1), for every percent increase of the hillslope gradient. For the three categorical lithological variables, e^β gives the change in the odds compared to the reference category, i.e., the Gent Formation, characterized by a clay layer overlain by a sandy layer with clay lenses. The odds of landsliding for the Tielt, Kortrijk Aalbeke and Kortrijk Moen Formations are about 960, 8020 and 870 times higher than in the Gent Formation, respectively. Comparison of confusion matrices (Table

Fig. 4. Receiver Operating Characteristic (ROC) curves of calibration and validation datasets. Inset: a detail of the upper left part of the graph. See Table 4 for more information on the different models.


6a) of the calibration dataset for a range of cut-off values revealed that for the multivariate model 1, the highest percentage of correctly classified observations and the lowest number of false positive and false negative observations were obtained with a cut-off value of 0.00005. In total 96% of the 558 cells were correctly classified (Table 6b). Only 1 of the 93 landslide cells and 20 of the 465 non-landslide cells were incorrectly classified. The very low cut-off value is due to the limited proportion of landslide-affected areas, and it corresponds well with the value of 0.00006 used in this analysis.

The above-mentioned results indicate that we successfully calibrated a multivariate rare events logistic regression model based mainly on pre-landslide hillslope gradient. In order to evaluate the effects of other variables, the relogit module was also run without pre-landslide hillslope gradient. As the multicollinearity analysis noted before showed that terrain height was collinear only with pre-landslide hillslope gradient, terrain height was included in the dataset analysed. The resultant rare events logistic regression model is referred to as ‘multivariate model 2’ (Tables 4 and 5). In this model, not only lithologies rich in clay (Tt, KoAa and KoMo; Table 1), but also terrain height, slope aspect (NE, E, and SE) and soil drainage conditions (wet soils) become important and significant predictor variables (Table 4). Terrain height has a relatively minor influence on the occurrence of landslides as its odds ratio (e^β) is close to unity. Slope aspect reflects valley asymmetry. As reported before, slopes facing the south to northwest are steeper than slopes facing the north to southeast. Analysis of the odds ratios reveals that, compared to the reference category (NW; Table 2), 22, 20 and 3 times fewer landslides are expected to occur on NE, E and SE hillslopes, respectively. Apart from the slope steepness, wind-driven rain mainly blowing from between S and WSW (Bodeux, 1977; Sneyers and Vandiepenbeeck,

Table 6
Confusion matrices

(a)

| | Logistic model: 1 | Logistic model: 0 |
|---|---|---|
| Reality: 1 | TP | FN |
| Reality: 0 | FP | TN |

% correctly classified pixels = (TP+TN)/N

(b)

| | Logistic model: 1 (116) | Logistic model: 0 (442) |
|---|---|---|
| Reality: 1 (93) | 92 | 1 |
| Reality: 0 (465) | 20 | 445 |

% correctly classified pixels = 96%

(c)

| | Logistic model: 1 (22) | Logistic model: 0 (116) |
|---|---|---|
| Reality: 1 (23) | 20 | 3 |
| Reality: 0 (115) | 2 | 113 |

% correctly classified pixels = 96%

(a) Definition (TP: true positive; FN: false negative; FP: false positive; TN: true negative; N: total number of observations); (b) confusion matrix of calibration dataset for logistic regression model 1 using a cut-off value of 0.00005. All cells with a probability above 0.00005 were classified as a landslide and were given a value of 1, remaining cells were classified as no landslide and were given a value of 0; (c) same as (b) but for validation dataset. Again the best result was obtained with a cut-off value of 0.00005.


1995) might favour the presence of landslides on S- to WSW-oriented slopes through its positive effect on shear stress. Table 4 also shows that wet soils are approximately 3.5 times more susceptible to landsliding than dry soils. However, both Table 5 and Fig. 4 show that the multivariate model 1 performs better than the multivariate model 2, although the latter is not a bad model.

The validation of the multivariate model 1 was performed through its application to the 138 cells which were not used for the calibration of the model. The resultant confusion matrix (Table 6c) shows that for a cut-off value of 0.00005, 133 of the 138 cells (96%) were correctly classified. Three of the 23 landslide cells were incorrectly classified, because they had very low pre-landslide hillslope gradients. The ROC curve of the validation dataset (Fig. 4) is only slightly lower than the curve for the calibration dataset of the multivariate model 1. A Cohen's Kappa κ of 0.87 and a PABAK of 0.93 were obtained, reflecting a nearly perfect agreement between reality and the model. The validation of the multivariate model 2 was also performed. Again 0.00005 was the best cut-off value, and corresponding κ and PABAK were 0.48 and 0.58, respectively, indicating a moderate agreement between reality and the model.
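The reported agreement statistics can be checked against the validation confusion matrix of Table 6c (TP = 20, FN = 3, FP = 2, TN = 113) by applying Eqs. (7) to (10). The Python sketch below is only such a check; the function names are ours.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Eqs. (7)-(9): Cohen's Kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    p_obs = (tp + tn) / n
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)

def pabak(tp, fn, fp, tn):
    """Eq. (10): prevalence and bias adjusted Kappa."""
    n = tp + fn + fp + tn
    return 2.0 * (tp + tn) / n - 1.0

# Counts taken from Table 6c (validation dataset, multivariate model 1).
tp, fn, fp, tn = 20, 3, 2, 113

kappa_validation = round(cohen_kappa(tp, fn, fp, tn), 2)
pabak_validation = round(pabak(tp, fn, fp, tn), 2)
```

Both values round to the figures given in the text (κ = 0.87 and PABAK = 0.93).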

Based on the validation results, we can conclude that the multivariate regression model 1 is capable of predicting hillslope sections vulnerable to landslides in the study area, and therefore a susceptibility map based on Eq. (11) can contribute to the assessment of the hazardous zones. This susceptibility map, created using IDRISI Kilimanjaro, is shown in Fig. 5. As no continuous grid data of pre-landslide hillslope gradient are available, the LIDAR-derived DEM was used. This only affects calculation for the depletion zones of existing landslides, where the main scarp cell will have steeper hillslope gradients than the values used for the calibration and validation of the model. Although values on the susceptibility map range from 0 to 1, most cells have very low probabilities. Only 7.8% of the study area has a probability larger than 0.00005. In case of independent observations, the sum of all probabilities on the susceptibility map should be the number of positives. Due to the spatial autocorrelation, the sum of the probabilities for all cells is much larger than 116, even when for the mapped landslides only the central cell of the depletion zone is taken into account. Overall, the susceptibility map is quite similar to the hillslope gradient map. The aforementioned valley asymmetry, directly incorporated in the multivariate model 2, is also visible; hillslope sections facing the south to northwest are clearly more susceptible to landslides than the hillslope sections facing the north to southeast, which have more gentle slope gradients. The 116 deep-seated landslides, also shown in Fig. 5, are located in areas with higher landslide susceptibility. Finally, it should be

Fig. 5. Landslide susceptibility map of the study area from rare events logistic regression. The 116 large, deep-seated landslides are indicated using white polygons. Values in legend are probabilities calculated with the multivariate model 1 (Eq. (11)).


noted that some roads and railroads with steep banks are indicated as susceptible to landslides. This observation reflects the use of detailed topographic information obtained by LIDAR, and it is acceptable as small slope failures were often observed along road or railroad banks during our field survey.

An important question is whether the use of rare events logistic regression really increased the quality of the created landslide susceptibility map. In order to get an idea of the difference between ordinary logistic regression (OLR) and rare events logistic regression, OLR was carried out using the same sample (i.e., the 558 cells used for the calibration of the rare events logistic models) as well as two larger samples. Each of the larger samples contained 93 landslide cells (i.e., the landslide cells taken from the calibration dataset of the rare events logistic regression) and 25,000 or 79,453 non-landslide cells, which comprised about 1.3% or 4% of the total study area, respectively. The conclusions drawn from the comparison of the logistic models were similar to those reported by King and Zeng (2001). OLR overestimated the observations with low P(Y=1) and underestimated those with large P(Y=1). The underestimation became larger as the difference in sample size between the landslide and non-landslide cells increased. The uncorrected landslide probabilities of the 25,093 and the 79,546 samples are ca. 10 and 40 times smaller than the probabilities calculated with rare events logistic regression, respectively, which shows that the assessment of landslide susceptibility using OLR is problematic when landslides are relatively rare.

Table 7
Classification of landslide susceptibility cells into four classes based on the multivariate model 1

| Probability | Susceptibility to landslides | % in study area | Cum. | % in depletion area | Cum. | % in landslide area | Cum. |
|---|---|---|---|---|---|---|---|
| 0.0012–1 | Very high | 2.4 | 2.4 | 85.4 | 85.2 | 23.9 | 23.9 |
| 0.00005–0.0012 | High | 5.4 | 7.8 | 9.3 | 94.7 | 29.5 | 53.4 |
| 0.00001–0.00005 | Moderate | 6.9 | 14.7 | 2.0 | 96.7 | 20.6 | 74.0 |
| 0–0.00001 | Low | 85.3 | 100 | 3.3 | 100 | 26.0 | 100 |

Cum.: cumulative distribution.
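The class limits of Table 7 can be applied to a predicted probability with a simple threshold function, sketched below in Python. The function name is hypothetical, and the treatment of values lying exactly on a class limit (assigned to the lower class here) is an assumption, as the table does not specify it.

```python
def susceptibility_class(probability):
    """Map a predicted probability to one of the four classes of Table 7."""
    if probability > 0.0012:
        return "very high"
    if probability > 0.00005:
        return "high"
    if probability > 0.00001:
        return "moderate"
    return "low"

# Hypothetical probabilities, one per class (illustration only).
examples = [0.04, 0.0001, 0.00002, 0.0]
classes = [susceptibility_class(p) for p in examples]
```

The four example probabilities fall in the very high, high, moderate and low classes, respectively.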


4.3. Classification of hillslope sections into classes of susceptibility to landsliding

The obtained landslide susceptibility map (Fig. 5) was classified into four different categories: very high, high, moderate and low susceptibility (Table 7), based on the authors' judgement about the cut-off values after many trials. All cells with probabilities above 0.0012 were classified as very highly susceptible. This class contains only 2.4% of the total study area, but it covers 85% of the cells located within landslide depletion zones and 24% of the cells within the total area affected by landslides (Table 7). The lower limit for the high susceptibility class was set at 0.00005, because it provided the best combination of true positives and true negatives for both the calibration and validation datasets.

Fig. 6. Orthophoto (provided by OC GIS-Vlaanderen in 1998) showing a large, deep-seated landslide (Zwalm; black line), with the classified landslide susceptibility cells with a 10 m resolution. This figure illustrates that the susceptibility of a site also depends on the susceptibility upslope of the site. A reactivation of this deep-seated landslide can cause damage to buildings A and B downslope, located in a zone with low landslide susceptibility.

In order to limit property damage due to landslides in the study area, this susceptibility map should be consulted by local and regional authorities and land use planners. Land use changes planned on vulnerable sites should be regulated. Prevention measures such as constructing appropriate drainage systems and foundations should be taken to maintain the stability of the site. Special attention should be paid to those locations with very high to moderate susceptibilities to landsliding. These three classes comprise almost 15% of the total study area (Table 7). However, persons using the susceptibility map have to be aware that the model is especially


designed for the prediction of landslide initiation zones and that not all past and future accumulation zones are indicated on the map. If one wants to check whether a certain place is susceptible to landsliding, it is necessary to check the susceptibility of not only the place but also adjacent areas, especially upslope areas. The importance of this is illustrated with a case study from the study area. Fig. 6 shows a part of an orthophoto for the study area with the classified susceptibility map. An old, deep-seated landslide is indicated on the photo. This figure clearly shows the good classification result for the landslide depletion zone, but the accumulation zone is only partly classified as susceptible to landsliding. According to the map, some large buildings in Fig. 6 are located on a safe site. However, the present-day boundaries of the deep-seated landslide may move downslope if the landslide is reactivated; then these buildings could be severely damaged. Indeed, about twenty years ago, small reactivations of the landslide already caused damage to the buildings (Owner of the building, personal communication). Therefore, a protection wall was constructed at the toe of the landslide immediately upslope of the buildings. This example also shows that for the evaluation of existing landslide areas a combined use of a landslide susceptibility map and a landslide inventory map is most appropriate.

5. Conclusions

This study has demonstrated that rare events logistic regression is suitable for the creation of landslide susceptibility maps. This relatively new technique in geomorphological studies was evaluated in a 200 km² study area in the Flemish Ardennes where the landslide-affected area is considerably smaller than the unaffected area. The best-calibrated model (the multivariate model 1, Eq. (11)) incorporates the most important factors causing landslides in the study area: pre-landslide hillslope gradient and the outcrop of a clayey lithology. Comparison of the ROC curves and measures of association including Akaike's information criterion, the area under the ROC curve, Somers' Dxy and Goodman–Kruskal gamma for the multivariate and univariate models revealed that the susceptibility to landsliding is mainly determined by pre-landslide hillslope gradient. Removing this variable from the dataset resulted in a worse but still acceptable model, containing terrain height, slope aspect, the presence of clayey lithologies and wet soils as significant predictor variables. The multivariate model 1 was validated using ROC curves, Cohen's Kappa, and the prevalence and bias adjusted Kappa. These all showed a nearly perfect agreement between the observed and predicted values of the validation dataset, which allowed us to conclude that the model is capable of predicting hillslope sections prone to landsliding. Due to the high spatial predictive power of pre-landslide hillslope gradient, the created susceptibility map is quite similar to the slope gradient map. Because of the rare landslide occurrence in the study area, most grid cells in the landslide susceptibility map have very low probabilities. The cells in the map were therefore classified into four susceptibility classes. Special attention should be paid when human interventions are planned in zones with very high to moderate landslide susceptibilities. The fact that the model was calibrated and validated with information from the landslide depletion zone should be considered when using the susceptibility map. As landslide accumulation zones are not always completely classified as susceptible to landslides, it is important to check the susceptibility of not only the site itself but also its surroundings, especially the area upslope of the site. Finally, a comparison of ordinary logistic regression with rare events logistic regression indicated that the proposed statistical method performs well in determining landslide susceptibility where landslides are rare spatial objects. In other areas where landslides are abundant, the method may perform less efficiently compared to ordinary logistic regression.

Acknowledgements

This research is supported by the Fund for Scientific Research – Flanders. The authors thank M.C. Vanmaercke-Gottigny and L. Ost for providing valuable background information on the landslides in the study area. AMINAL (Land Division) is thanked for providing us the LIDAR data. We are also grateful to Prof. G. King and Prof. L. Zeng for clarifying some of the results obtained during this analysis and to the editor and reviewers for their critical remarks that helped in improving the manuscript.

References

AGS Sub-committee, 2000. Landslide risk management. AustralianGeomechanics Technical Report 35, 49–92.

Allison, P.D., 2001. Logistic Regression Using the SAS System:Theory and Application. Wiley Interscience, New York. 288 pp.

Atkinson, P.M., Massari, R., 1998. Generalised linear modelling ofsusceptibility to landsliding in the central Apennines. Computers& Geosciences 24, 373–385.

Ayalew, L., Yamagishi, H., 2005. The application of GIS-basedlogistic regression for landslide susceptibility mapping in the

409M. Van Den Eeckhaut et al. / Geomorphology 76 (2006) 392–410

Kakuda–Yahiko Mountains, central Japan. Geomorphology 65,15–31.

Begueria, S., Lorente, A., 1999. Landslide hazard mapping bymultivariate statistics; comparison of methods and case study inthe Spanish Pyrenees. The Damocles Project Work, Contract NoEVG1-CT 1999-00007. Technical Report, 20 pp.

Bodeux, A., 1977. Wind Velocity and Wind Direction in Belgium (inDutch). Koninklijk Meteorologisch Instituut, Ukkel. 171 pp.

Brabb, E.E., 1984. Innovative approaches to landslide hazardmapping. Proceeding of the 4th International Symposium onLandslides. Toronto, vol. 1, pp. 307–324.

Brardinoni, F., Slaymaker, O., Hassan, M.A., 2003. Landslideinventory in a rugged forested watershed: a comparison betweenair-photo and field survey data. Geomorphology 54, 179–196.

Byrt, T., Bishop, L., Carlin, J.B., 1993. Bias, prevalence and kappa.Journal of Clinical Epidemiology 46, 423–429.

Carrara, A., Cardinali, M., Guzzetti, F., 1992. Uncertainty in assessinglandslide hazard and risk. ITC Journal 2, 172–183.

Carrara, A., Cardinali, M., Guzzetti, F., Reichenbach, P., 1995. GIS technology in mapping landslide hazard. In: Carrara, A., Guzzetti, F. (Eds.), Geographical Information Systems in Assessing Natural Hazards. Kluwer Acad. Publ., Dordrecht, pp. 135–176.

Chung, C.F., Kojima, H., Fabbri, A.G., 2002. Stability analysis of prediction models for landslide hazard mapping. In: Allison, R.J. (Ed.), Applied Geomorphology: Theory and Practice. John Wiley and Sons, London, pp. 1–19.

Chung, C.F., Fabbri, A.G., 2005. Systematic procedures of landslide hazard mapping for risk assessment using spatial prediction models. In: Glade, T., Anderson, M., Crozier, M.J. (Eds.), Landslide Hazard and Risk. John Wiley and Sons, Chichester, pp. 139–174.

Cliff, A.D., Ord, J.K., 1981. Spatial Processes: Models and Applications. Pion, London. 266 pp.

Cohen, J.A., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.

Cruden, D.M., Varnes, D.J., 1996. Landslide types and processes. In: Turner, A.K., Schuster, R.L. (Eds.), Landslides: Investigation and Mitigation. Transportation Research Board Special Report, vol. 247. National Academy Press, Washington D.C., pp. 36–75.

Dai, F.C., Lee, C.F., Ngai, Y.Y., 2002. Landslide risk assessment and management: an overview. Engineering Geology 64, 65–87.

Desmet, P.J.J., Govers, G., 1996. Comparison of routing algorithms for digital elevation models and their application for predicting ephemeral gullies. International Journal of Geographical Information Systems 10, 311–331.

De Vos, W., Verniers, J., Herbosch, A., Vanguestaine, M., 1993. A new geological map of the Brabant Massif, Belgium. Geological Magazine 130, 606–611.

De Vos, W., 1997. Influence of the granitic batholith of Flanders on Acadian and later deformation (Brabant Massif, Belgium). Aardkundige Mededelingen 8, 49–52.

Gobin, A., Campling, P., Feyen, J., 2001. Logistic modelling to identify and monitor local land management systems. Agricultural Systems 67, 1–20.

Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, central Italy. Geomorphology 31, 181–216.

Guzzetti, F., Cardinali, M., Reichenbach, P., Carrara, A., 2000. Comparing landslide maps: a case study in the upper Tiber River basin, central Italy. Environmental Management 25, 247–263.

Guzzetti, F., Reichenbach, P., Cardinali, M., Galli, M., Ardizzone, F., 2005. Probabilistic landslide hazard assessment at the basin scale. Geomorphology 72, 272–299.

Hoehler, F.K., 2000. Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. Journal of Clinical Epidemiology 53, 499–503.

Hosmer, D.W., Lemeshow, S., 1989. Applied Regression Analysis. Wiley, New York. 307 pp.

Imai, K., King, G., Lau, O., 2005. Zelig: everyone's statistical software, Version 2.1-3, User's Manual. 197 pp. (http://gking.harvard.edu/zelig/docs/zelig.pdf).

I.W.O.N.L., 1987. Text Clarifying the Belgian Soil Map, Map Sheet 98E Ronse (in Dutch). Uitgegeven onder de auspiciën van het Instituut tot aanmoediging van het Wetenschappelijk Onderzoek in Nijverheid en Landbouw, Brussels. 163 pp.

Jacobs, P., De Ceukelaire, M., De Breuck, W., De Moor, G., 1999a. Text clarifying the Belgian geological map, Flemish region, Map Sheet 29 Kortrijk, map scale 1/50,000 (in Dutch). Ministerie van Economische zaken en Ministerie van de Vlaamse Gemeenschap, Brussels. 68 pp.

Jacobs, P., Van Lancker, V., De Ceukelaire, M., De Breuck, W., De Moor, G., 1999b. Text clarifying the Belgian geological map, Flemish region, Map Sheet 30 Geraardsbergen, map scale 1/50,000 (in Dutch). Ministerie van Economische zaken en Ministerie van de Vlaamse Gemeenschap, Brussels. 58 pp.

Kappa Multicalc Page, 2005. (http://medcalc3000.com/Kappa_MC.htm) website visited 23/06/2005.

Keefer, D.K., 1984. Landslides caused by earthquakes. Geological Society of America Bulletin 95, 406–421.

Kendall, M., Stuart, A., 1979. The Advanced Theory of Statistics: Inference and Relationship. Griffin, London. 748 pp.

King, G., Zeng, L., 2000. Explaining rare events in international relations. International Organisation 55, 693–715.

King, G., Zeng, L., 2001. Logistic regression in rare events data. Political Analysis 9, 137–163.

Lasko, T.A., Bhagwat, J.G., Zou, K.H., Ohno-Machado, L., 2005. The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics 38, 404–415.

Legendre, P., Legendre, L., 1998. Numerical Ecology, Second English edition. Elsevier, Amsterdam. 853 pp.

Malamud, B.D., Turcotte, D.L., Guzzetti, F., Reichenbach, P., 2004. Landslide inventories and their statistical properties. Earth Surface Processes and Landforms 29, 687–711.

Metternicht, G., Hurni, L., Gogu, R., 2005. Remote sensing of landslides: an analysis of the potential contribution to geo-spatial systems for hazard assessment in mountainous environments. Remote Sensing of Environment 98, 284–303.

Ost, L., Van Den Eeckhaut, M., Poesen, J., Vanmaercke-Gottigny, M.C., 2003. Characteristics and spatial distribution of large landslides in the Flemish Ardennes (Belgium). Zeitschrift für Geomorphologie 47, 329–350.

SAS Institute, 1985. SAS User's Guide, Version 5 edition. SAS Institute, Cary, NC. 956 pp.

Schulz, W.H., 2004. Landslides mapped using LIDAR imagery, Seattle, Washington. USGS Open File Report 2004-1396. Washington. 11 pp.

Sneyers, R., Vandiepenbeeck, M., 1995. Notice Sur le Climat de la Belgique (in French). Koninklijk Meteorologisch Instituut van België, Ukkel. 62 pp.

Soeters, R., van Westen, C.J., 1996. Slope stability recognition analysis and zonation. In: Turner, A.K., Schuster, R.L. (Eds.), Landslides: Investigation and Mitigation. Transportation Research Board Special Report, vol. 247. National Academy Press, Washington D.C., pp. 129–177.

Vanacker, V., Vanderschraeghe, M., Govers, G., Willems, E., Poesen, J., Deckers, J., De Bievre, B., 2003. Linking hydrological, infinite slope stability and land-use change models through GIS for assessing the impact of deforestation on slope stability in high Andes watersheds. Geomorphology 52, 299–315.

Van Den Eeckhaut, M., Poesen, J., Verstraeten, G., Vanacker, V., Moeyersons, J., Nyssen, J., van Beek, L.P.H., 2005a. The effectiveness of hillshade maps and expert knowledge in mapping old deep-seated landslides. Geomorphology 67, 351–363.

Van Den Eeckhaut, M., Poesen, J., Verstraeten, G., Govers, G., 2005b. Exploratory study on the landslides in the Flemish Ardennes: Part I. Study area, literature analysis, mapping and classification of the landslides (+Fiches), statistical and spatial analysis and methodology for creating a landslide risk map (in Dutch). Rapport in Opdracht van de Vlaamse Gemeenschap. AMINAL, Afdeling Land, Brussels. 154 pp.

Van Den Eeckhaut, M., Poesen, J., Verstraeten, G., Vanacker, V., Moeyersons, J., Nyssen, J., van Beek, L.P.H., Vandekerckhove, L. (submitted for publication). The use of LIDAR-derived images for mapping old landslides under forest. Remote Sensing and Environment.

Van Oost, K., Govers, G., Desmet, P., 2000. Evaluating the effects of changes in landscape structure on soil erosion by water and tillage. Landscape Ecology 15, 577–589.

van Westen, C.J., Seijmonsbergen, A.C., Mantovani, F., 1999. Comparing landslide hazard maps. Natural Hazards 20, 137–158.

Whittow, J., 1984. Dictionary of Physical Geography. Penguin Reference, London. 591 pp.

Zêzere, J.L., 2002. Landslide susceptibility assessment considering landslide typology, a case study in the area north of Lisbon (Portugal). Natural Hazards and Earth System Sciences 2, 73–82.