ORIGINAL PAPER

An assessment of multivariate and bivariate approaches in landslide susceptibility mapping: a case study of Duzkoy district

Taskin Kavzoglu • Emrehan Kutlug Sahin • Ismail Colkesen

Received: 8 April 2013 / Accepted: 1 November 2014 / Published online: 9 November 2014
© Springer Science+Business Media Dordrecht 2014

Abstract Landslide susceptibility maps are valuable sources for disaster mitigation works and for the future investments of local authorities in unstable, hazard-prone areas. However, there are limitations and uncertainties inherent in landslide susceptibility assessment. To address them, many methods have been suggested and applied in the literature, which are generally categorized as bivariate and multivariate. In this paper, the most popular and widely used multivariate methods [support vector regression (SVR), logistic regression (LR) and decision tree (DT)] and bivariate methods [frequency ratio (FR), weight of evidence (WOE) and statistical index (SI)] were compared with respect to their performance in landslide susceptibility modeling. The Duzkoy district of Trabzon Province was selected because of its unique topographical and lithological characteristics, which magnify the shallow landslide risk potential. Slope, lithology, land cover, aspect, normalized difference vegetation index, soil thickness, drainage density, topographical wetness index and elevation were employed as landslide occurrence factors. Accuracy measures based on the confusion matrix (i.e., overall accuracy and Kappa coefficient) and the receiver operating characteristic (ROC) curve were employed to compare the performances of the methods. Furthermore, McNemar’s test was employed to analyze the statistical significance of differences in method performances. The results indicated that the multivariate approaches (i.e., SVR, LR and DT) outperformed the bivariate methods (i.e., FR, SI and WOE) by about 13 %. Among the multivariate approaches, the SVR method performed best with the highest accuracy, while the FR method was the most effective and accurate bivariate method. Interpretation of AUC values and McNemar’s test results revealed that the SVR method was superior in modeling landslide susceptibility compared with the other multivariate and bivariate methods.

Keywords Landslide Susceptibility mapping · Support vector machine · Decision tree · Weight of evidence · Logistic regression · Frequency ratio

T. Kavzoglu (✉) · E. Kutlug Sahin · I. Colkesen
Department of Geodetic and Photogrammetric Engineering, Gebze Institute of Technology, Cayirova Campus, Gebze 41400, Kocaeli, Turkey
e-mail: [email protected]

Nat Hazards (2015) 76:471–496
DOI 10.1007/s11069-014-1506-8

1 Introduction

Landslides are destructive natural disasters affecting a large number of people and properties. Producing a landslide susceptibility map is an important task in decision making, helping citizens, planners and engineers to reduce the loss of life and property that landslides may cause in the future. The results of a landslide susceptibility assessment can help to identify potential risk zones and their degree of susceptibility.

Production of accurate, up-to-date and reliable landslide susceptibility maps has been a hot

topic for landslide-related studies. In recent years, many methods have been applied and

evaluated in the literature for producing landslide susceptibility maps. These methods have

been commonly classified as qualitative and quantitative approaches. According to Guz-

zetti et al. (1999), qualitative approaches are subjective and portray the hazard zoning in

descriptive terms. These approaches can be divided into two groups, namely geomor-

phologic analysis and map combination. In the first group, the landslide susceptibility is

determined directly either in the field or by the interpretation of images through geomorphologic analysis (Rupke et al. 1988; Aleotti and Chowdhury 1999; Guzzetti et al. 1999;

Fall et al. 2006; Kouli et al. 2010; Bui et al. 2011). In the second group, map combination

is based on combining a number of landslide influence factor maps (Saaty 1980; Soeters

and Van Westen 1996; Ayalew et al. 2004; Kavzoglu et al. 2014).

Landslide susceptibility assessment is usually performed using two common approaches, namely qualitative and quantitative approaches. Generally, qualitative approaches are

based on expert opinions. On the other hand, the quantitative approaches, such as statistical

and probabilistic approaches, can be considered as more objective due to their data-

dependent characteristics (Kanungo et al. 2009). Quantitative methods can be generally

categorized into bivariate and multivariate methods that focus on the analysis of numerical

data and statistics expressing the relationship between instability factors and landslides

(Bui et al. 2011). In bivariate analysis, each factor map is combined with the landslide

distribution map, and weighting values based on landslide densities are calculated for each

factor. There are several methods [e.g., frequency ratio (FR), statistical index (SI) analysis and weight of

evidence (WOE)] in the literature using bivariate statistics for the assessment of landslide

susceptibility. Recently, the Bayesian probability model, using the WOE approach, has

been applied for landslide susceptibility assessment (Nandi and Shakoor 2009; Regmi et al.

2010; Schicker and Moon 2012). Also, SI, another bivariate statistic algorithm, has been

employed to determine landslide susceptibility (Van Westen 1997; Yalcin 2008; Bui et al.

2011; Yilmaz et al. 2012). Another bivariate method is the FR analysis that calculates the

probabilistic relationship between dependent (i.e., landslides) and independent variables

(e.g., slope, aspect and lithology) (Sarkar et al. 1995; Akgun et al. 2008; Yilmaz 2009;

Demir et al. 2013). A main drawback of the statistical methods is their dependence on data

structure. Statistical methods require assumptions about underlying sampling distributions

and statistical properties of samples (e.g., normal distribution). In addition, the size of the

training data set is also very important if statistical estimates are to be reasonable. In the

case of limited training set size, for example, it is difficult to define decision boundaries in

the feature space. Therefore, effectiveness of statistical methods highly depends on the data

set size and its characteristics. Distribution-free multivariate approaches should be pre-

ferred particularly when the data are not in the form of normal distribution. Lately, some

distribution-free multivariate methods including artificial neural networks (Gomez and

Kavzoglu 2005; Yilmaz 2009; Pradhan and Lee 2010), support vector machines (SVMs)

(Yao et al. 2008; Yilmaz 2010; Kavzoglu et al. 2014), decision trees (DTs) (Hwang et al.

2009; Abdallah 2010; Bui et al. 2012) and logistic regression (LR) (Ayalew and Yamagishi

2005; Yao et al. 2008; Pradhan and Lee 2010) have been utilized to produce landslide

susceptibility maps. In recent years, several studies were reported related to the comparison

of bivariate and multivariate methods. For example, Akgun (2012) compared the perfor-

mances of LR with two bivariate methods, multi-criteria decision and likelihood ratio.

Schicker and Moon (2012) analyzed the landslide susceptibility maps produced by weights

of evidence and LR in a landslide susceptibility assessment. Devkota et al. (2013) applied

certainty factor and index of entropy as bivariate methods and compared their perfor-

mances with LR method. Althuwaynee et al. (2014) investigated the performance of

Dempster–Shafer-based evidential belief function and analytical hierarchy process, and

modeling performances were compared with the traditional LR method.

In this paper, multivariate (SVM, logistic regression and DT) and bivariate (FR, WOE and SI) methods were utilized for landslide susceptibility mapping, and their modeling performances were analyzed with statistical measures. For this purpose, nine thematic maps representing slope, lithology, land cover, aspect, soil thickness/slope, NDVI, drainage density, topographical wetness index (TWI) and elevation were employed for landslide susceptibility mapping of a region that experienced severe landslides in the past. Method performances were then analyzed using contingency matrices, ROC curves and a test of the statistical significance of differences (i.e., McNemar’s test).

2 Study area

Duzkoy district of Trabzon Province located in the Black Sea region of Turkey was chosen

as a study area, covering about 171 km² of land located north of the 41st parallel and east of

the 39th meridian (Fig. 1a). Historical landslide activities in the region have had a serious

impact on many villages, superstructure, substructure, roads, agricultural lands and other

infrastructural developments. Climatic conditions and geological structure of the study area

make it suitable for landslide activity. Its topographical features also play an important role

in landslide occurrence. The elevations in the study area range between 230 and 2,286 m,

and the slope angles range between 0° and 62°. Geological units of the study area are mainly formed by Lias–Dogger (Jlh), Upper Jurassic–Lower Cretaceous (Jcr), Upper

Cretaceous–Paleocene (Cru1, Cru2, Cru3, Cru4 and Cru5c) and Eocene (c2, c3, Ev) epochs

(Fig. 1b). Landslide susceptibility mapping using statistical techniques is based on a well-

defined inventory of all known landslides in an area. Thus, an important step is the

production of landslide inventory map of the study area, indicating the locations of the past

landslides and potential non-landslide sites. Landslide-affected areas, published by the

General Directorate of Mineral Research as 1:25,000 scale maps, were defined as vector

polygons. Almost all landslides in the study area can be recognized as shallow landslides

because of the region’s geological and topographical conditions. As reported by Yalcin

et al. (2011), high-intensity rainfalls produce flash floods that cause shallow landslides with

translational characteristics. It should be noted that landslide polygons in the inventory

map depict recent shallow translational landslide zones. Landslides in the study area primarily lie along the northeast–southwest direction. Non-landslide areas were

obtained from an approach proposed by Gomez and Kavzoglu (2005) that is based on two

basic facts: landslide activity is not likely to happen on river channels and on terrains with

slope angles lower than 5°. All polygons consisting of landslide and non-landslide areas

were converted to 30-m resolution rasterized thematic map to be used in modeling pro-

cesses. Thematic maps of the study area were prepared at a scale of 1:25,000 with posi-

tional accuracy of ±12.5 m. Landslide inventory map and all factor maps were stored in

raster format on a grid with a 30 m × 30 m pixel size, which contains 364,080 pixels laid out in

555 columns and 656 rows. The inventory map contains 25 distinct landslide events

covered by 833 pixels with an average per-event area of 4.5 ha and a total area of 75 ha.

The inventory map also includes 14 non-landslide areas (1,251 pixels in total) with an

average area of 10 ha and a total area of 113 ha.
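The grid and inventory figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming square 30 m pixels as stated in the text; the constant and function names are illustrative and are not part of the original study.

```python
# Pixel size stated in the text (30 m x 30 m); names here are illustrative.
PIXEL_SIZE_M = 30.0
PIXEL_AREA_HA = PIXEL_SIZE_M ** 2 / 10_000.0  # 900 m^2 = 0.09 ha


def pixels_to_hectares(n_pixels: int) -> float:
    """Convert a pixel count on the 30 m grid to an area in hectares."""
    return n_pixels * PIXEL_AREA_HA


# Grid dimensions and inventory pixel counts quoted in the text.
grid_pixels = 555 * 656
landslide_ha = pixels_to_hectares(833)
non_landslide_ha = pixels_to_hectares(1251)

print(grid_pixels, round(landslide_ha, 2), round(non_landslide_ha, 2))
```

Under these assumptions, the 833 landslide pixels correspond to roughly 75 ha and the 1,251 non-landslide pixels to roughly 113 ha, in line with the totals reported above.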

3 Landslide-related parameters

Determination of landslide-related factors and preparing corresponding data sets are cru-

cial steps in the production of landslide susceptibility maps. The topographic and mete-

orological characteristics of the study area were major landslide-conditioning factors. For

this study, nine factors including slope, lithology, land cover, aspect, soil thickness/slope,

NDVI (normalized difference vegetation index), drainage density, TWI and elevation were utilized

for landslide susceptibility mapping. It should be mentioned that recent studies conducted

in the Black Sea region also considered similar conditioning factors (Yalcin et al. 2011;

Akgun et al. 2008; Nefeslioglu et al. 2008). In the reclassification of factor maps, natural

breaks strategy was employed for all data sets except for the DEM data. As highlighted by

Grozavu et al. (2013), this strategy identifies significant changes in the histogram distri-

bution and sets class breaks which best group similar values and maximize the differences

between the classes. For the DEM data, there was no need for natural breaks as the

histogram shows normal distribution curve characteristics.

The slope of the terrain is one of the most important factors contributing to landslides. In general, the causes of landslides are associated with the slope angle, so landslide risk can be expected to increase as the slope angle increases.

Fig. 1 a Study area containing landslide and non-landslide areas and b geological map of the study area

Because of this consideration, most researchers have used slope angle as a major factor in

landslide occurrence (Cevik and Topal 2003; Bui et al. 2011; Costanzo et al. 2012). A

digital elevation model (DEM) produced from 1:25,000-scale topographic maps was used

to estimate the slope angles, which ranged from 0° to 62°. They were reclassified into eight natural break intervals expressed in decimal degrees: (1) 0°–8.6°, (2) 8.6°–14.2°, (3) 14.2°–19.3°, (4) 19.3°–24.2°, (5) 24.2°–28.8°, (6) 28.8°–33.7°, (7) 33.7°–40.1° and (8) 40.1°–62.0°.
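Reclassification of a continuous factor into the listed intervals amounts to a lookup against the class upper bounds. The sketch below illustrates this for slope, assuming that a value lying exactly on a break belongs to the lower class (the text does not state the boundary convention); the names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds (degrees) of the eight slope classes listed in the text.
SLOPE_UPPER_BOUNDS = [8.6, 14.2, 19.3, 24.2, 28.8, 33.7, 40.1, 62.0]


def slope_class(slope_deg: float) -> int:
    """Return the 1-based slope class for a slope angle in degrees.

    Assumption: a value exactly equal to a break falls in the lower class.
    """
    if not 0.0 <= slope_deg <= SLOPE_UPPER_BOUNDS[-1]:
        raise ValueError("slope outside the 0-62 degree range of the study area")
    # Index of the first upper bound that is >= slope_deg, shifted to 1-based.
    return bisect_left(SLOPE_UPPER_BOUNDS, slope_deg) + 1


print(slope_class(10.0), slope_class(30.0), slope_class(62.0))
```

The same lookup could be reused for the other natural-break factors by swapping in their class bounds.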

Lithology of the terrain is one of the main factors having direct impact on mass

movement (Dai et al. 2001; Yesilnacar and Topal 2005). Landslide activities can be

directly affected by the rock mass properties of the land surface. The lithological map of

the study area published by the General Directorate of Mineral Research and Exploration

of Turkey in 1998 was used after grouping lithological units into ten classes (Fig. 1b).

These classes corresponded to (1) c3 (granite, granodiorite), (2) Ev (pyroclastics, greenish-

gray basalt and andesite, sandy limestone, and tuff), (3) c2 (granite, granodiorite, quartz

diorite and diorite), (4) Cru5a (gray marl, gray-white clayey, micritic/sandy limestone and

tuff), (5) Cru4 (rhyolite, rhyodacitic lava and pyroclastics), (6) Cru3 (basalt, andesite, lava

and pyroclastics), (7) Cru2 (rhyodacite, dacitic lava and pyroclastics), (8) Cru1 (basalt,

andesite, lava and pyroclastics (sandstone, sandy/clayey limestone)), (9) JCr (sandy and

cherty reef limestone and dolomite limestone, clayey and sandy limestone) and (10) Jlh

(basalt, andesite, dacitic lava and pyroclastics). It should be pointed out that preliminary

analysis on the study area showed that landslides mainly occurred in Cru2 and Cru1

formations.

Land use/cover type is another factor associated with landslide. This factor is used to

take into consideration the natural and man-made environmental impacts on land surface.

Environmental and human-induced impacts on land surface, such as deforestation, con-

struction activities, road cuts, exploitation of natural resources, may contribute to landslide

occurrence. For inclusion of land use/cover information, 30-m resolution Landsat

ETM+ imagery was selected to produce the land use/cover map of the study area. After a detailed analysis of forest maps and aerial photographs, it was decided that the study area is mainly covered by nine land use/cover types, namely green tea, hazelnut, deciduous,

coniferous, pasture, rocky areas, water (i.e., pond), agricultural lands (including corn,

potato and green bean) and urban lands (including roads, buildings and concrete surfaces).

Maximum likelihood classifier, a traditional supervised classification method, was used to

produce thematic maps of the study area.

Slope aspect is measured in degrees clockwise from 0 to 360, where 0 is north-facing,

90 is east-facing, 180 is south-facing, and 270 is west-facing. Aspect-associated parameters

such as exposure to sunlight, drying winds, rainfall (degree of saturation) and disconti-

nuities are important factors for landslides (Dai et al. 2001). Aspect map derived from the

DEM data was reclassified into ten categories as flat (−1°), north (0°–22.5°, 337.5°–360°), northeast (22.5°–67.5°), east (67.5°–112.5°), southeast (112.5°–157.5°), south (157.5°–202.5°), southwest (202.5°–247.5°), west (247.5°–292.5°) and northwest (292.5°–337.5°).
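The aspect reclassification can be expressed compactly because every compass sector spans 45° and the north sector wraps around 0°/360°. The following is a minimal sketch; the function name is hypothetical, negative values are treated as flat following the −1° convention above, and the lower bound of each sector is assumed to be inclusive.

```python
ASPECT_NAMES = [
    "north", "northeast", "east", "southeast",
    "south", "southwest", "west", "northwest",
]


def aspect_class(aspect_deg: float) -> str:
    """Map an aspect angle (degrees clockwise from north) to a compass class.

    Negative values (e.g., -1) denote flat terrain. Assumption: the lower
    bound of each 45-degree sector is inclusive.
    """
    if aspect_deg < 0:
        return "flat"
    # Shift by half a sector so that north covers 337.5-360 and 0-22.5.
    sector = int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8
    return ASPECT_NAMES[sector]


print(aspect_class(-1), aspect_class(350.0), aspect_class(90.0))
```

Shifting by half a sector before the integer division avoids a special case for the two north ranges.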

Soil thickness is one of the most important predisposing factors in shallow landslides

(Santacana et al. 2003; Alparslan 2011; Segoni et al. 2012). Slope movements, such as

translational or rotational slope failures, occur when shear stress exceeds the shear strength of the

materials forming the slope (Gray and Leiser 1982). The soil thickness on a hillslope, which

coincides with the failure depth, is a critical parameter in performing a slope-instability

analysis (Ho et al. 2012). In this paper, a soil thickness map, classified into nine classes considering the relation between terrain slope (°) and soil depth (cm), was produced by the General

Directorate of Rural Services. In the construction of soil thickness map, both soil depth and

slope angle were taken into consideration as a common strategy reported in the literature.

In total, nine classes were formed by considering various soil depth and slope angle combinations (see Table 2). For example, Class I in the soil thickness map represents areas with slopes steeper than 30° and soil depths ranging from 0 to 20 cm. Vegetation plays an

important role in controlling soil erosion and can help to stabilize the slope by providing

mechanical strength to the subsoil (Singhal and Srivastava 2004). NDVI is a measure of the amount and condition of green vegetation. In landslide susceptibility assessment, NDVI can be utilized as an important indicator of vegetation stress that may be associated with landslide events. Chang et al. (2007) point out that the loss of vegetative cover is one of the major causes of landslides. In the literature, it is reported that landslide-prone areas were usually located in grassland, afforested areas and bare soils (Ercanoglu 2005; Yilmaz 2010). In this study, NDVI values were obtained by combining the visible red band and the near-infrared band of the Landsat ETM+ data. The NDVI map was classified into eight classes using natural break intervals, from −0.08 to 0.74.
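NDVI is conventionally computed as the normalized difference between near-infrared and red reflectance. The sketch below shows this standard definition for a single pixel; the function name and the handling of a zero denominator are assumptions made for illustration only.

```python
def ndvi(nir: float, red: float) -> float:
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red).

    Assumption: when both bands are zero, 0.0 is returned instead of
    raising a division error.
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


# Dense vegetation reflects strongly in the near-infrared band.
print(round(ndvi(0.5, 0.1), 3))
```

Values close to 1 indicate dense green vegetation, while values near zero or below are typical of bare soil, rock or water.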

The concept of topographic wetness index (TWI) was introduced by Beven and

Kirkby (1979) through a terrain analysis-based hydrologic model (TOPMODEL). TWI

can be thought of as an abstract parameter used as a basis for estimating the local soil

moisture status and, thus, landslide areas due to surface topographic effect on hydrology

response (Gomez and Kavzoglu 2005). Soil moisture can be strongly related to slope

stability. For this reason, TWI data were reclassified into six classes from 0.58 to 17.2 m

and used in this study.
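In the TOPMODEL formulation, TWI is commonly written as ln(a / tan β), where a is the upslope contributing area per unit contour length and β is the local slope angle. The following is a minimal sketch of that definition; the function name and the minimum-slope clamp used to avoid division by zero on flat cells are assumptions and are not taken from the original study.

```python
import math


def topographic_wetness_index(
    specific_catchment_area: float,
    slope_deg: float,
    min_slope_deg: float = 0.1,
) -> float:
    """TWI = ln(a / tan(beta)) for one cell.

    specific_catchment_area: upslope area per unit contour length (> 0).
    min_slope_deg: assumed lower clamp so that flat cells stay finite.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


# Same contributing area: the gentler slope yields the higher index.
print(topographic_wetness_index(100.0, 10.0) > topographic_wetness_index(100.0, 40.0))
```

Cells with a large contributing area and a gentle slope receive high values, which is why the index is used as a proxy for local soil moisture.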

Drainage density is the total length of all streams and rivers in a drainage basin divided

by the total area of the drainage basin. Drainage density is often used for susceptibility

mapping by many researchers (Sarkar and Kanungo 2004; Suzen and Doyuran 2004;

Yalcin 2008). In the study, a kernel density algorithm was used for drainage density

evaluations. The factor map was reclassified into eight classes with natural break intervals

ranging from 0 to 4.66 km⁻¹.

Lastly, different elevation levels cause varying temperature and rainfall conditions that

are suitable for different plant types to grow. Vivas (1992) indicates that these conditions

are likely to affect slope stability for the data sets considered in this study. For the study

area, topographical elevation maps were produced from the DEM (30 m × 30 m). Elevations ranging from 230 to 2,286 m were then reclassified into 10 elevation classes based on an equal interval strategy.
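An equal interval strategy divides the value range into classes of identical width. The sketch below applies it to the elevation range quoted above; the function name and the choice to place the maximum value in the last class are assumptions.

```python
def equal_interval_class(
    value: float,
    vmin: float = 230.0,
    vmax: float = 2286.0,
    n_classes: int = 10,
) -> int:
    """Return the 1-based equal-interval class of a value in [vmin, vmax]."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside the classified range")
    width = (vmax - vmin) / n_classes  # 205.6 m for the defaults
    # The maximum value would fall into class n_classes + 1, so clamp it.
    return min(int((value - vmin) // width) + 1, n_classes)


print(equal_interval_class(230.0), equal_interval_class(1300.0), equal_interval_class(2286.0))
```

With the default arguments, each elevation class spans 205.6 m.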

4 Methods for landslide susceptibility modeling

Two groups of methods (bivariate and multivariate methods) were compared in a landslide

susceptibility problem. SVMs, LR and DTs were applied as examples of multivariate

approach, while FR, WOE and SI analysis were applied as bivariate methods. Perfor-

mances of the methods were compared with each other in terms of several accuracy

measures. To apply multivariate methods (SVR, LR and DT), training and test data sets

including landslide and non-landslide zones were created using a landslide inventory map.

Training and testing data sets were created with equal numbers of samples for landslide

and non-landslide class using random pixel selection method. As a result, 900 pixels were

selected as training data (i.e., 450 pixels for landslide and 450 pixels for non-landslide) and

700 pixels were selected as testing data (i.e., 350 pixels for landslide and 350 pixels for

non-landslide). Method performances in relation to the ground reference map were ana-

lyzed using both ROC curves and confusion matrices. In addition to these metrics,

McNemar’s test was utilized to determine the significance of differences in method

performances.
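The sampling and evaluation steps described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the training and testing pixels are assumed to be disjoint, Kappa is computed for a two-class confusion matrix, and McNemar's statistic uses the continuity-corrected form that is common in the remote sensing literature, which may differ from the exact variant used by the authors.

```python
import random


def balanced_sample(landslide_ids, stable_ids, n_train=450, n_test=350, seed=0):
    """Randomly draw disjoint training/testing pixels, equal per class."""
    rng = random.Random(seed)
    train, test = [], []
    for label, ids in ((1, landslide_ids), (0, stable_ids)):
        picked = rng.sample(list(ids), n_train + n_test)
        train += [(pixel, label) for pixel in picked[:n_train]]
        test += [(pixel, label) for pixel in picked[n_train:]]
    return train, test


def overall_accuracy_and_kappa(tp, fn, fp, tn):
    """Overall accuracy and Cohen's Kappa from a 2 x 2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


def mcnemar_chi2(f12, f21):
    """Continuity-corrected McNemar statistic (requires f12 + f21 > 0).

    f12: samples correct for method 1 only; f21: correct for method 2 only.
    """
    return (abs(f12 - f21) - 1) ** 2 / (f12 + f21)


train, test = balanced_sample(range(833), range(1000, 2251))
print(len(train), len(test))
```

A McNemar statistic above 3.84, the chi-square critical value for one degree of freedom at the 95 % confidence level, indicates that two methods differ significantly in performance.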

4.1 Multivariate methods

In multivariate models, all independent variables and the landslide-affected areas are treated together to determine landslide susceptibility. The multivariate model, unlike

the bivariate model, evaluates the relative contribution of each variable by putting more

emphasis on the variables known to contribute to landslide occurrence (Nandi and Shakoor

2009). Several methods based on multivariate theory were suggested and applied in the

literature for landslide susceptibility assessment. In this study, the most popular multi-

variate methods, namely SVMs, LR and DTs analysis, were chosen to test the robustness of

the multivariate approaches.

4.1.1 Support vector machine regression

SVM, a kernel-based advanced learning algorithm, has lately become one of the most

popular methods for solving classification and regression problems. In recent years, SVMs

have been successfully applied to many problems including landslide susceptibility map-

ping (Pourghasemi et al. 2013; Pradhan 2013; Kavzoglu et al. 2014; Peng et al. 2014). The

basic idea behind the SVM for a given binary classification problem is to find an optimal

separating hyperplane that maximizes the margin to the nearest training data points

(Vapnik 2000; Scholkopf and Smola 2002).

In susceptibility assessment, most regression techniques (e.g., LR) are based on linear

functions. In support vector regression (SVR), the hyperplane is derived from the values

calculated through the dot product. For nonlinear data sets as in landslide susceptibility

modeling, it is not possible to separate landslide and non-landslide classes using a linear

function in feature space. To overcome this problem, the data set is projected into a high-

dimensional space through the use of kernel functions (e.g., linear, polynomial or radial

basis functions), making it possible to separate the two classes with a plane (Ballabio and

Sterlacchini 2012).

In SVR, the main goal is to estimate an unknown continuous-valued function based on a

finite number of noisy samples (Burges and Scholkopf 1997; Durbha et al. 2007). A

regression problem is learned from the training patterns and used to predict the target values

of unseen input vectors. SVR tries to locate a regression hyperplane with small risk in a high-dimensional feature space. Among the various types of SVR, the most popular one uses the ε-insensitive loss function and tries to find an optimum hyperplane from which the distance to all data points is minimum (Cristianini and Shawe-Taylor 2000; Yeh et al. 2011).

In SVR theory, consider a regression problem with training samples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is the $i$th sample value of the input vector, $y_i$ is the corresponding output and $n$ is the number of training samples. The SVR approximates the relationship between the input and output data points in the following form:

$$F(x_i) = \langle w \cdot \phi(x_i) \rangle + b \tag{1}$$

where $w$ is a vector in the feature space, $\phi(x_i)$ is a mapping from the input space to the feature space $F$, $b$ is the bias term and $\langle \cdot \rangle$ denotes the inner product in $F$. When the ε-insensitive loss function is used, the SVR can be expressed in an optimization form (Vapnik 2000).

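$$\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right) \quad \text{s.t.} \quad \begin{cases} y_i - \langle w \cdot \phi(x_i) \rangle - b \le \varepsilon + \xi_i \\ \langle w \cdot \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n \end{cases} \tag{2}$$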

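where $C > 0$ is the regularization constant; under the ε-insensitive loss function, the slack variables $\xi_i$ and $\xi_i^*$ are described as:

$$\left|\xi^{(*)}\right|_{\varepsilon} = \begin{cases} 0 & \text{if } \left|\xi^{(*)}\right| \le \varepsilon \\ \left|\xi^{(*)}\right| - \varepsilon & \text{otherwise} \end{cases} \tag{3}$$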

In SVR, several parameters including the regularization or penalty parameter ($C$), the threshold value ($\varepsilon$) and the kernel function parameter (e.g., the $\gamma$ value for the radial basis function) must be defined by the user for a given problem. The basic steps of SVR were summarized by Smola and Scholkopf (2004) and are shown in Fig. 2. The input pattern for which a prediction is to be made is mapped into feature space by the mapping function ($\Phi$). Then, dot products are computed with the training pixels under the map $\Phi$. This corresponds to evaluating kernel functions $k(x, x_i)$. Finally, the dot products are added up using the weights $\nu_i = \alpha_i - \alpha_i^*$. This, plus the constant term $b$, yields the final prediction output. For landslide susceptibility estimation, each pixel in the image, represented by a vector, is fed into the model, and the susceptibility level is thus estimated.
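The prediction steps summarized above (kernel evaluation against the training samples, weighted summation and addition of the bias term) can be sketched as follows. This is an illustrative outline rather than the authors' implementation: the radial basis function kernel is assumed, training of the weights is not shown, and all function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the epsilon tube, linear outside it (cf. Eq. 3)."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, weights, bias, gamma):
    """Weighted sum of kernel evaluations plus the bias term.

    weights holds one (alpha_i - alpha_i*) value per support vector; these
    are assumed to come from an already trained model.
    """
    kernel_sum = sum(
        w * rbf_kernel(x, sv, gamma) for w, sv in zip(weights, support_vectors)
    )
    return kernel_sum + bias


# Toy example with two one-dimensional support vectors.
print(round(svr_predict([0.0], [[0.0], [1.0]], [0.5, -0.25], 0.1, 1.0), 4))
```

In practice, a factor vector for each pixel would be passed to a function such as `svr_predict` to obtain its susceptibility value.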

4.1.2 Logistic Regression

LR is a statistical method for analyzing factor sets in which there are one or more inde-

pendent variables. The outcome is measured with a dichotomous variable such as 0 or 1 or

true and false (Menard 2001). The main objective of LR is to find the best-fitting (yet

reasonable) model to describe the relationship between the presence or absence of land-

slides (dependent variable) and a set of independent parameters such as slope, lithology,

aspect and elevation. LR model is based on the generalized linear model that can be

calculated by the following formula:

Fig. 2 General architecture of support vector regression algorithm

$$P = \frac{1}{1 + e^{-Z}} \tag{4}$$

where $P$ is the probability of an event and $Z$ is a value from $-\infty$ to $+\infty$, defined by the linear form

$$Z = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \tag{5}$$

where $B_0$ is the intercept of the regression function, $n$ is the number of independent variables and the coefficients ($B_1, B_2, \dots, B_n$) measure the contribution of each of the independent geographical variables ($X_1, X_2, \dots, X_n$) (Ayalew and Yamagishi 2005). In the LR model, the dependent variable can be expressed as:

$$\operatorname{Logit}(p) = \ln\left(\frac{p}{1 - p}\right) = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \tag{6}$$

where $p$ is the probability that the dependent variable, which takes only the values 0 and 1, equals 1, and $p/(1 - p)$ is the so-called odds or likelihood ratio. Probabilities vary between 0 and 1. As a probability gets closer to 1, the numerator of the odds becomes larger relative to the denominator, and the odds become an increasingly large number. On the contrary, if a probability gets closer to 0, the numerator of the odds becomes smaller relative to the denominator (Ayalew and Yamagishi 2005).
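The relationship between the linear form, the probability and the logit can be illustrated numerically. The following is a minimal sketch in which the coefficient values are invented for demonstration and the function names are hypothetical.

```python
import math


def linear_predictor(intercept, coefficients, factors):
    """Z = B0 + B1*X1 + ... + Bn*Xn."""
    return intercept + sum(b * x for b, x in zip(coefficients, factors))


def logistic(z):
    """Probability P = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds ln(p / (1 - p)); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Invented coefficients for two factors, for illustration only.
z = linear_predictor(-1.0, [0.5, 2.0], [2.0, 0.25])
p = logistic(z)
print(z, round(p, 4), round(logit(p), 4))
```

Applying `logit` to the output of `logistic` recovers the linear predictor, which is the identity expressed by the logit transformation.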

4.1.3 Decision trees

DTs, a nonparametric supervised learning method, have long been popular in machine

learning arena for the solution of a variety of classification and regression problems

(Breiman et al. 1984; Quinlan 1993; Niuniu and Yuxun 2010; Bui et al. 2012). A DT is a

flow-chart-like structure consisting of a root node (containing all the data), a set of internal

nodes (splits) and a set of terminal nodes (leaves). Each node of the tree structure makes a

binary decision that separates either one class or some of the classes from the remaining

classes (Xu et al. 2005). DT construction involves the recursive partitioning of a set of

training data, which is split into increasingly homogenous subsets on the basis of test

applied to one or more of the feature values (Pal and Mather 2003). When the target

variables are continuous, a DT constructs a regression tree with respect to relationship

between features (i.e., landslide factors) and target value (i.e., landslide-related suscepti-

bility). The construction of a regression tree is based on binary recursive partitioning,

which is an iterative process that splits the data into partitions. Initially, all the training

samples are used to determine the structure of the tree. Then, the algorithm evaluates every possible binary split and selects the one that partitions the data into two parts while minimizing the sum of the squared deviations from the mean in the separate parts. The splitting process is then applied to each of the new branches. The process

continues until each node reaches a user-specified minimum node size (i.e., the number of

training samples at the node) (Xu et al. 2005).

Up to now, many techniques have been developed for DT induction (e.g., ID3, CART

and C4.5). The general idea of DT induction is the same across the different types of DT models. Each

induction method employs a learning algorithm to identify a model that best fits the

relationships between the continuous features and target of input data (Nefeslioglu et al.

2010). In this study, classification and regression trees (CART) algorithm introduced by

Breiman et al. (1984) was applied to construct a regression tree for determining landslide

susceptibility of the study area. CART constructs binary trees, and splits are selected using

the Twoing index given by the formula in Eq. 7. In this equation, L and R refer to the left

and right sides of a given split, respectively, and P(i|t) is the relative frequency of class i at

node t (Breiman 1996). In the case of regression, CART tries to find splits that minimize the squared prediction error. The prediction in each leaf is based on the weighted mean for the node (Rokach and Maimon 2008).

$$\text{Twoing}(t) = \frac{P_L P_R}{4} \left( \sum_i \left| P(i \mid t_L) - P(i \mid t_R) \right| \right)^2 \tag{7}$$
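The two splitting criteria mentioned above can be illustrated with a short sketch: a search over one feature for the binary split that minimizes the summed squared deviations in the two parts, and a direct evaluation of the Twoing index of Eq. 7. This is a simplified illustration for a single feature with hypothetical function names; it does not reproduce the full CART procedure.

```python
def sum_squared_deviation(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys, min_node_size=1):
    """Return (threshold, cost) of the best binary split on one feature."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(min_node_size, len(pairs) - min_node_size + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        cost = sum_squared_deviation(left) + sum_squared_deviation(right)
        if best is None or cost < best[1]:
            # Threshold halfway between the two neighbouring feature values.
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, cost)
    return best


def twoing(p_left, p_right, class_freq_left, class_freq_right):
    """Twoing index of Eq. 7 for one candidate split."""
    diff = sum(abs(a - b) for a, b in zip(class_freq_left, class_freq_right))
    return p_left * p_right / 4 * diff ** 2


# Toy data: slope angle (degrees) against a susceptibility value.
print(best_regression_split([5, 10, 30, 35], [0.1, 0.1, 0.9, 0.9]))
```

In a full tree, the same search would be repeated over all features and applied recursively to each resulting branch until the minimum node size is reached.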

4.2 Bivariate methods

In bivariate statistical analysis, each factor map is combined with the landslide distribution

map, and weighting values based on landslide densities are calculated for each parameter

class (Suzen and Doyuran 2004). Bivariate approaches are considered robust and flexible

for landslide susceptibility assessment (Van Westen et al. 2003; Suzen and Doyuran 2004;

Thiery et al. 2007). However, several restrictions were reported including model

requirement of categorical/reclassified input data sets, the large sensitivity to the accuracy

of the thematic data and loss of data sensitivity in forced individual analysis of causative

factors (Thiery et al. 2007; Schicker and Moon 2012). In the literature, there are several

bivariate statistical methods that have been used for landslide susceptibility mapping. In

this study, the FR, weights of evidence and SI methods were evaluated for the particular

problem under consideration.

4.2.1 Weight of evidence

Bayesian probability model, known as the weights of evidence (WOE), is an objective

approach for the definition and selection of parameter weights in prediction modeling

(Bonham-Carter 1994). The method is applied where sufficient data are available to

estimate the relative importance of the evidence by statistical means. The basic principle of

WOE is the concept of prior and conditional/posterior probability. The prior probability of

landslide occurrence is estimated from all existing evidence based on the density of landslide-affected areas in the study site. Posterior probability takes into account other evidence modifying the initial estimate of landslide occurrence. It is obtained according

to the density of known landslide locations in each class of the variables, addressed as

‘‘evidence’’ such as slope, lithology and aspect (Armas 2012). Detailed information about

WOE method can be found in Regmi et al. (2010) and Van Westen et al. (2003). A pair of

weights, W? and W-, is calculated for the classes of landslide-affected factors using the

following equations (Regmi et al. 2010).

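$$W^{+} = \ln\left[\frac{w_1/(w_1+w_2)}{w_3/(w_3+w_4)}\right] = \ln\left[\frac{\text{Landslide area in the considered class}/\text{Total landslide area}}{\text{Stable area in the considered class}/\text{Total stable area}}\right] \qquad (8)$$

$$W^{-} = \ln\left[\frac{w_2/(w_1+w_2)}{w_4/(w_3+w_4)}\right] \qquad (9)$$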

where w1 is the number of landslide pixels present in a given factor class, w2 is the number
of landslide pixels not present in the same factor class, w3 is the number of pixels in
the given factor class in which no landslide pixels are present and w4 is the number of
pixels in which neither a landslide nor the given factor class is present.

A positive weight (W+) indicates that the causative factor is present at the landslide
location, and this weight is an indication of the positive correlation between the presence of the
causative factor and landslides. A negative weight (W−) indicates the absence of the
causative factor and shows the negative correlation (Dahal et al. 2008). The difference
between the W+ and W− weights is known as the weights contrast (C), which reflects the overall
spatial association between a predictor variable and landslide occurrence.
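The weights in Eqs. 8 and 9 and the contrast C can be sketched in a few lines. The pixel counts below are taken from the 28.83–33.71° slope class of Table 2; the total pixel count (364,077) is assumed to be the sum of the slope class counts in that table, and the function name is illustrative only.

```python
import math


def weights_of_evidence(w1, w2, w3, w4):
    """Return (W+, W-, C) from the four pixel counts defined for Eqs. 8 and 9.

    This sketch does not handle classes without landslide pixels:
    math.log raises ValueError when w1 (or w2) is zero.
    """
    w_plus = math.log((w1 / (w1 + w2)) / (w3 / (w3 + w4)))
    w_minus = math.log((w2 / (w1 + w2)) / (w4 / (w3 + w4)))
    return w_plus, w_minus, w_plus - w_minus


landslide_in_class = 235   # landslide pixels in the slope class (Table 2)
class_pixels = 48114       # all pixels in the slope class (Table 2)
total_landslide = 833      # all landslide pixels
total_pixels = 364077      # assumption: sum of slope class pixel counts

w1 = landslide_in_class
w2 = total_landslide - landslide_in_class
w3 = class_pixels - landslide_in_class
w4 = (total_pixels - total_landslide) - w3

w_plus, w_minus, contrast = weights_of_evidence(w1, w2, w3, w4)
print(round(w_plus, 2), round(w_minus, 2), round(contrast, 2))
```

Under these assumptions the rounded values agree with the W+, W− and C entries reported for this class in Table 2.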

4.2.2 Frequency ratio

FR calculates the probabilistic relationship between dependent (landslides) and independent
(slope, aspect, lithology, etc.) variables. FR analysis is used to estimate the density
of landslide occurrence within each parameter class and the factor weights based on the class
distribution and the landslide density. FR results give the relative susceptibility to landslide
occurrence: a greater index value represents a higher landslide risk, and a lower value
represents a lower risk. After calculation of the ratios, the parameters are
normalized and aggregated to create the landslide susceptibility map. To obtain the landslide
susceptibility index (LSI), the following equation is applied.

$$\mathrm{LSI} = \sum_{i=1}^{n} \mathrm{FR}_i \qquad (10)$$

where FR_i is the frequency ratio of factor i and n is the total number of input factors.
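A minimal sketch of the FR calculation and of the summation in Eq. 10 is given below. The pixel counts come from the slope rows of Table 2, with the total pixel count assumed to be the sum of the slope class counts; the normalization step mentioned in the text is omitted, and the example pixel is hypothetical.

```python
def frequency_ratio(landslide_in_class, class_pixels, total_landslide, total_pixels):
    """Share of landslide pixels in the class divided by the class's share of the area."""
    landslide_share = landslide_in_class / total_landslide
    area_share = class_pixels / total_pixels
    return landslide_share / area_share


def landslide_susceptibility_index(fr_values):
    """Eq. 10: sum of the FR values of the classes a pixel belongs to."""
    return sum(fr_values)


TOTAL_LANDSLIDE = 833
TOTAL_PIXELS = 364077  # assumption: sum of slope class pixel counts in Table 2

fr_gentle = frequency_ratio(20, 35309, TOTAL_LANDSLIDE, TOTAL_PIXELS)   # 0.01-8.56 degrees
fr_steep = frequency_ratio(235, 48114, TOTAL_LANDSLIDE, TOTAL_PIXELS)   # 28.83-33.71 degrees

# Hypothetical pixel: slope 28.83-33.71 degrees, lithology Cru2, land cover pasture,
# using the FR values listed in Table 2 for these three classes only.
lsi = landslide_susceptibility_index([2.13, 4.03, 0.92])
print(round(fr_gentle, 2), round(fr_steep, 2), round(lsi, 2))
```

An FR above 1 marks a class that holds more landslide pixels than its area share would suggest, so the steep class contributes far more to the LSI than the gentle one.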

4.2.3 Statistical index

The SI method, proposed by Van Westen (1997), was employed in this study under a GIS
environment to produce the landslide susceptibility map. In this method, the weight for each
parameter class (lithology, slope, aspect, etc.) is defined as the natural logarithm of the
landslide density in the class divided by the landslide density in the entire map. The weight
will be negative when the landslide density is lower than
normal and positive when it is higher than normal. This method is based upon the formula
given below.

$$w_i = \ln\left(\frac{\mathrm{densClass}}{\mathrm{densMap}}\right) = \ln\left[\frac{N_{pix}(S_i)/N_{pix}(N_i)}{\sum N_{pix}(S_i)/\sum N_{pix}(N_i)}\right] \qquad (11)$$

where wi is the weight given to the parameter class, densClass is the landslide density within

the parameter class and densMap is the landslide density within the entire map. Npix (Si) is

the number of landslide pixels in parameter class i, and Npix (Ni) is the total number of pixels

in the same parameter class. The SI method is based on statistical correlation of the landslide

inventory map with the illustrative attributes of the parameter maps. It means that the wi is

only calculated for landslide-occurred classes. If the parameter class contains no landslide

occurrence, it will have no correlation with the landslide inventory (Bui et al. 2011).
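Eq. 11 can be illustrated with the same slope counts from Table 2. This is a sketch under the assumption that the map total equals the sum of the slope class pixel counts; the function name is hypothetical, and classes without landslides are returned as `None` because the logarithm is undefined for them.

```python
import math


def statistical_index(landslide_in_class, class_pixels, total_landslide, total_pixels):
    """Eq. 11: natural log of class landslide density over map landslide density."""
    if landslide_in_class == 0:
        return None  # no landslide in the class, so no weight is calculated
    dens_class = landslide_in_class / class_pixels
    dens_map = total_landslide / total_pixels
    return math.log(dens_class / dens_map)


TOTAL_LANDSLIDE = 833
TOTAL_PIXELS = 364077  # assumption: sum of slope class pixel counts in Table 2

si_gentle = statistical_index(20, 35309, TOTAL_LANDSLIDE, TOTAL_PIXELS)   # 0.01-8.56 degrees
si_steep = statistical_index(235, 48114, TOTAL_LANDSLIDE, TOTAL_PIXELS)   # 28.83-33.71 degrees
print(round(si_gentle, 2), round(si_steep, 2))
```

The gentle class receives a negative weight (density below the map average) and the steep class a positive one, as the text describes.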

5 Result and discussion

Performances of the multivariate and bivariate methods were analyzed in this study for a

landslide susceptibility mapping problem of Duzkoy district of Trabzon province, Turkey.

While SVR, LR and DTs were applied as the multivariate methods, FR, SI and WOE were

evaluated as bivariate methods. It should be noted that produced landslide susceptibility

maps were reclassified as very high, high, moderate, low and very low susceptibility

classes using natural break approach in ArcGIS software.

5.1 Multivariate methods

The SVR process was carried out using a radial basis function kernel with an ε-insensitive loss
function to produce the landslide susceptibility map of the study area (Fig. 4a). The regularization
parameter C, the threshold value ε and the radial basis function parameter known as the kernel
width γ were selected by employing a cross-validation strategy, which divides the training
data randomly into k folds (subsets) of equal size. Each fold is used as the validation data
once, while the remaining data are used as the training data. The training process is
repeated k times so that all folds are used for testing, and the generalization performance is
evaluated using the root-mean-square error (RMSE). Finally, the average validation rate is
used to determine the requested parameter combination (Ito and Nakano 2003; Wu and
Wang 2009). After applying tenfold cross-validation, the optimum values of C, ε and γ
were determined as 125, 0.001 and 0.05, respectively. RMSE was calculated as 0.15 when

the trained SVR model was applied to the test data set. It should be noted that the final SVR

model consisted of 560 support vectors. When the SVR model was analyzed, it was

observed that slope, lithology and NDVI were the most effective factors while TWI and

aspect were the least effective factors. The developed regression model was applied to the

entire data set including nine landslide factor layers to calculate the susceptibilities of each

pixel. It can be seen from the figure that high and very high susceptible areas were mainly

located in the center of the study area. On the other hand, the northwest and southeast sides

of the study area were found to be less susceptible for landslide.
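The k-fold selection procedure described above can be sketched without any machine learning library. In the example below a simple RBF-weighted smoother stands in for SVR (it is not SVR, and it has only one parameter), and the data are synthetic; all function names and values are assumptions chosen to show the fold logic and the RMSE-based choice of a kernel width.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rbf_smoother(train_x, train_y, gamma):
    """Stand-in model (NOT SVR): RBF-weighted average of the training targets."""
    def predict(x):
        weights = [math.exp(-gamma * (x - xi) ** 2) for xi in train_x]
        total = sum(weights)
        if total == 0.0:
            return sum(train_y) / len(train_y)
        return sum(w * yi for w, yi in zip(weights, train_y)) / total
    return predict


def cross_validated_rmse(xs, ys, gamma, k=10, seed=0):
    """Average RMSE over k folds, each fold serving once as validation data."""
    fold_rmse = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = rbf_smoother(train_x, train_y, gamma)
        sq_err = [(predict(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / k


# Synthetic 1-D data (hypothetical), and a small grid of candidate kernel widths.
xs = [i / 50 for i in range(100)]
ys = [math.sin(3 * x) for x in xs]
grid = [0.05, 1.0, 50.0]
scores = {g: cross_validated_rmse(xs, ys, g) for g in grid}
best_gamma = min(scores, key=scores.get)
print(best_gamma, scores)
```

With a real SVR the grid would span C, ε and γ jointly, but the fold bookkeeping and the selection by lowest average RMSE stay the same.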

In performing LR analysis, the independent variables were slope, lithology, land cover,
aspect, NDVI, soil thickness, drainage density, TWI and elevation, while the dependent variable
was the landslide areas in the inventory map. The ground reference map comprising 833
landslide pixels and 1,251 non-landslide pixels was used to produce the dependent variable.
The coefficients obtained by MATLAB software for the final LR model are shown in
Table 1. A positive regression coefficient indicates a positive relationship between
the factor and landslide occurrence, whereas a negative coefficient indicates a negative
relationship. When the coefficient values given in the table were analyzed,
it was found that slope was the most contributing factor related to landslide

occurrence. Among the other parameters, NDVI and elevation were detected as other

effective factors. Also, the coefficients estimated for TWI and aspect were close to 0,

indicating minor impacts compared to the other factors. This finding coincides with the

results of SVR model. Landslide susceptibility map produced by LR was reclassified into

five susceptibility classes as very low, low, moderate, high and very high (Fig. 4b). It was

observed that high landslide susceptibility zones were generally situated in the central and

northeast part of the study area. On the other hand, the northeast of the study area was

generally covered by low susceptibility zones.
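To show how the coefficients in Table 1 turn factor values into a susceptibility score, the sketch below applies the standard logistic function to a weighted sum. The coefficients are those of Table 1; how each factor was coded or scaled before fitting is not stated here, so the input scores in the example are purely hypothetical.

```python
import math

# Coefficients as listed in Table 1.
COEFFICIENTS = {
    "slope": 1.7803,
    "lithology": 0.4356,
    "land_cover": 0.5064,
    "aspect": -0.0678,
    "soil_thickness": -0.1334,
    "ndvi": 1.2454,
    "drainage_density": 0.2648,
    "twi": 0.0519,
    "elevation": -0.6757,
}
INTERCEPT = -14.5517


def landslide_probability(factor_scores):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value
                        for name, value in factor_scores.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical class scores for two pixels (the real coding scheme is an assumption).
base = {name: 3.0 for name in COEFFICIENTS}
steeper = dict(base, slope=6.0)
p_base = landslide_probability(base)
p_steeper = landslide_probability(steeper)
print(p_base, p_steeper)
```

Raising only the slope score increases the predicted probability, which reflects the large positive slope coefficient in Table 1.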

In this study, CART algorithm was applied to construct regression trees for determining

landslide susceptibility of the study area. Twoing index was used to construct CART tree

structure. Tenfold cross-validation strategy was used to define optimal regression tree

structure. Minimum number of observations per tree leaf was set to three, and all variables

were considered for each decision split in this study. Root-mean-square error was esti-

mated as 0.12 when the optimal regression tree model (Fig. 3) was applied to the test data

set. The regression tree structure was composed of 18 variables and 19 leaves representing

susceptibility level. The top-down induction of the DT indicates that variables in the higher

order of the tree structure were more important for analyzing the landslide susceptibility.

The tree structure showed that important factors related to landslide susceptibility were

ordered as follows: slope, lithology, NDVI, aspect, soil thickness, drainage density, TWI,

elevation and land cover. Among the other landslide-related factors, the slope was selected

as a root node by the algorithm. This finding could be a good indicator about the effec-

tiveness of the slope factor for the data set considered in this study.

The final regression model was applied to the whole data set to produce the landslide
susceptibility map (Fig. 4c). It can be seen from the figure that landslide-susceptible
regions were mainly located in the center, while low and very low susceptibility zones were
located in the northwest and southeast of the study area.

5.2 Bivariate methods

The application of FR method consists of three steps. First, ratio for landslide occurrence

and non-occurrence is calculated for each factor’s class. Second, ratio of each factor’s class

to the total area of the factor is determined, and, finally, FRs for each factor are calculated

by dividing the landslide occurrence ratio by the area ratio. Applying the main steps of the

method, FRs were estimated (Table 2). The FRs of each factor’s subclasses were evaluated
to determine the LSI, computed from the ratings of each factor’s type or range. The
susceptibility map was reclassified using the natural break approach (Fig. 5a). As a result, the
southwest and center of the study area were mostly identified as very high and high
susceptibility zones, while the northwest and southeast of the study area were largely
described as low and very low susceptibility zones.

To perform the SI method, landslide density was determined using the landslide-occurred
areas (833 pixels in total) for each parameter (Table 2). Then, the weight of each value was
calculated, and all weighted factor maps were aggregated to produce the landslide
susceptibility map. The resulting map was reclassified into five susceptibility classes using natural breaks
(Fig. 5b). Similar to the previous susceptibility maps, the central part of the study area was
generally identified as highly susceptible to landslides, while the north was determined to have
low susceptibility to landslides.

Table 1 Logistic regression coefficients estimated for landslide-conditioning factors

                                                            Collinearity statistics
Factor            Logistic regression coefficient  Wald     Tolerance  VIF
Slope                      1.7803                  138.113  0.536      1.865
Lithology                  0.4356                   43.401  0.658      1.519
Land cover                 0.5064                   15.293  0.861      1.161
Aspect                    -0.0678                    1.004  0.950      1.052
Soil thickness            -0.1334                    1.589  0.928      1.077
NDVI                       1.2454                  104.670  0.546      1.831
Drainage density           0.2648                    3.776  0.686      1.458
TWI                        0.0519                    0.190  0.770      1.299
Elevation                 -0.6757                   45.525  0.538      1.859
Intercept                -14.5517                   80.657

To construct a regression model using WOE algorithm, weight and contrast values were

calculated for each landslide-related factor (Table 2). Contrast value of C is positive for a

positive spatial association, and negative for a negative spatial association (Pradhan et al.

2010). If the C value is positive, the factor is favorable for the occurrence of landslides,

whereas if it is negative, it is unfavorable. C values close to zero indicate
little relation to the occurrence of landslides. Some examples of relationships between
landslides and landslide-related factors are given in Table 2. For lithology, the contrast value is high in
the Cru2 and Jlh formations, whereas c3, Cru1 and Cru4 are less vulnerable. In the
case of TWI, the classes are less vulnerable than those of the other factors, which means that
the TWI factor plays an insignificant role in the WOE modeling process. The resulting susceptibility
map was reclassified to show five levels of susceptibility using the natural break approach

(Fig. 5c).

Fig. 3 Decision tree model for landslide susceptibility assessment for the study area

Fig. 4 Landslide susceptibility map produced by a SVR, b LR and c DT

Table 2 Weights estimated for data layers based on landslide-occurred lands

Columns: Class; No. of pixels; Percentage of class (a); No. of landslide; Percentage of landslide (b); FR; SI; W+; W-; C

Class                   Pixels   (a) %  Lsl.   (b) %      FR      SI      W+      W-       C

Slope (°)
0.01–8.56               35,309    9.70    20    2.40    0.25   -1.40   -1.40    0.08   -1.47
8.56–14.17              61,834   16.98    39    4.68    0.28   -1.29   -1.29    0.14   -1.43
14.17–19.30             65,297   17.93    91   10.92    0.61   -0.50   -0.50    0.08   -0.58
19.30–24.19             61,245   16.82   110   13.21    0.79   -0.24   -0.24    0.04   -0.28
24.19–28.83             58,217   15.99   189   22.69    1.42    0.35    0.35   -0.08    0.43
28.83–33.71             48,114   13.22   235   28.21    2.13    0.76    0.76   -0.19    0.95
33.71–40.07             27,338    7.51   128   15.37    2.05    0.72    0.72   -0.09    0.80
40.07–62.05              6,723    1.85    21    2.52    1.37    0.31    0.31   -0.01    0.32

Lithology
Gama3                      823    0.23     0    0.00    0.00    0.00    0.00    0.00    0.00
Ev                     102,207   28.07     1    0.12    0.00   -5.45   -5.45    0.33   -5.78
Gama2a                  19,289    5.30     0    0.00    0.00    0.00    0.00    0.05   -0.05
Kru5c                   19,280    5.30    35    4.20    0.79   -0.23   -0.23    0.01   -0.24
Cru4                     7,690    2.11     0    0.00    0.00    0.00    0.00    0.02   -0.02
Cru3                    59,261   16.28    44    5.28    0.32   -1.13   -1.13    0.12   -1.25
Cru2                    46,610   12.80   430   51.62    4.03    1.39    1.39   -0.59    1.98
Cru1                    83,088   22.82   189   22.69    0.99   -0.01   -0.01    0.00   -0.01
Jcr                     13,529    3.72    59    7.08    1.91    0.65    0.65   -0.04    0.68
Jlh                     12,303    3.38    75    9.00    2.66    0.98    0.98   -0.06    1.04

Land cover
Green tea                  684    0.19     0    0.00    0.00    0.00    0.00    0.00    0.00
Hazelnut                 5,991    1.65    21    2.52    1.53    1.00    0.43   -0.01    0.44
Deciduous               97,683   26.83   200   24.01    0.89   -0.11   -0.11    0.04   -0.15
Coniferous              76,590   21.04    88   10.56    0.50    1.00   -0.69    0.12   -0.81
Pasture                131,255   36.05   276   33.13    0.92   -0.08   -0.08    0.04   -0.13
Rocky lands             13,471    3.70    47    5.64    1.52    0.42    0.42   -0.02    0.44
Water                   11,924    3.28    56    6.72    2.05    0.72    0.72   -0.04    0.76
Agricultural lands      26,285    7.22   144   17.29    2.39    0.87    0.87   -0.11    0.99
Urban lands                197    0.05     1    0.12    2.22    0.80    0.80    0.00    0.80

Aspect
Flat                       730    0.20     0    0.00    0.00    0.00    0.00    0.00    0.00
N                       50,424   13.85   115   13.81    1.00    2.00    0.00    0.00    0.00
NE                      43,217   11.87   110   13.21    1.11    0.11    0.11   -0.02    0.12
E                       46,322   12.72   209   25.09    1.97    2.00    0.68   -0.15    0.83
SE                      43,799   12.03    59    7.08    0.59   -0.53   -0.53    0.05   -0.58
S                       34,821    9.56    22    2.64    0.28   -1.29   -1.29    0.07   -1.36
SW                      37,154   10.20    17    2.04    0.20   -1.61   -1.61    0.09   -1.70
W                       50,124   13.77    95   11.40    0.83   -0.19   -0.19    0.03   -0.22
NW                      57,487   15.79   206   24.73    1.57    0.45    0.45   -0.11    0.56

Soil thickness (cm)/Slope (°)
0–2/>30                 52,698   14.47   167   20.05    1.39    0.33    0.33   -0.07    0.39
0–20/>30                 9,493    2.61     0    0.00    0.00    0.00    0.00    0.03   -0.03
20–50/12–20             16,716    4.59    98   11.76    2.56    0.94    0.94   -0.08    1.02
20–50/20–30              1,165    0.32     0    0.00    0.00    0.00    0.00    0.00    0.00
20–50/>30              179,385   49.27   385   46.22    0.94    3.00   -0.06    0.06   -0.12
50–90/6–12               1,648    0.45    96   11.52   25.46    3.24    3.24   -0.12    3.36
50–90/12–20             12,558    3.45     0    0.00    0.00    0.00    0.00    0.04   -0.04
50–90/20–30             65,816   18.08    87   10.44    0.58    3.00   -0.55    0.09   -0.64
50–90/>30               24,601    6.76     0    0.00    0.00    0.00    0.00    0.07   -0.07

NDVI
-0.08–0.28               3,620    0.99     1    0.12    0.12   -2.11   -2.11    0.01   -2.12
0.28–0.35               12,148    3.34     9    1.08    0.32    4.00   -1.13    0.02   -1.15
0.38–0.41               24,327    6.68    17    2.04    0.31   -1.19   -1.19    0.05   -1.23
0.41–0.46               45,823   12.59    52    6.24    0.50    4.00   -0.70    0.07   -0.77
0.46–0.52               62,589   17.19    86   10.32    0.60   -0.51   -0.51    0.08   -0.59
0.52–0.57               75,941   20.86   154   18.49    0.89   -0.12   -0.12    0.03   -0.15
0.57–0.63               79,466   21.83   201   24.13    1.11    0.10    0.10   -0.03    0.13
0.63–0.74               60,165   16.53   313   37.58    2.27    0.82    0.82   -0.29    1.11

Drainage density (km^-1)
0.0–0.33               204,314   56.12   250   30.01    0.53    5.00   -0.63   -0.95    0.33
0.33–0.97               49,237   13.52    99   11.88    0.88   -0.13   -0.13   -0.93    0.80
0.97–1.63               42,596   11.70   147   17.65    1.51    0.41    0.41   -1.01    1.42
1.63–2.45               51,900   14.26   337   40.46    2.84    1.04    1.04   -1.32    2.36
2.45–4.66               16,033    4.40     0    0.00    0.00    0.00    0.00   -0.85    0.85

TWI
-0.42 to -0.16             154    0.05     0    0.00    0.00    6.00    0.00    0.00    0.00
-0.16–1.78              92,839   32.62   223   26.77    0.82   -0.20    0.05    0.08   -0.03
1.78–3.71              137,011   48.14   341   40.94    0.85    6.00    0.08    0.13   -0.05
3.71–5.64               33,553   11.79    72    8.64    0.73   -0.31   -0.06    0.04   -0.10
5.64–7.57               11,366    3.99    24    2.88    0.72    6.00   -0.08    0.01   -0.09
7.57–17.18               9,705    3.41    36    4.32    1.27    0.24    0.48   -0.01    0.49

Elevation (m)
230–500                  7,321    2.01     0    0.00    0.00    0.00    0.00    0.02   -0.02
500–700                 13,059    3.59    89   10.68    2.98    1.09    1.09    0.15    0.94
700–900                 20,814    5.72    31    3.72    0.65   -0.43   -0.43    0.10   -0.53
900–1100                36,972   10.15    29    3.48    0.34    7.00   -1.07    0.14   -1.21
1,100–1,300             47,114   12.94   243   29.17    2.25    0.81    0.81    0.48    0.33
1,300–1,500             54,968   15.10   172   20.65    1.37    7.00    0.31    0.39   -0.08
1,500–1,700             62,836   17.26   241   28.93    1.68    0.52    0.52    0.53   -0.01
1,700–1,900             68,616   18.85    28    3.36    0.18    7.00   -1.72    0.24   -1.97
1,900–2,100             47,473   13.04     0    0.00    0.00    0.00    0.00    0.14   -0.14
2,100–2,286              4,905    1.35     0    0.00    0.00    0.00    0.00    0.01   -0.01

Fig. 5 Landslide susceptibility map produced by a FR, b SI and c WOE

Accuracy statistics (i.e., overall accuracy and kappa coefficient) and ROC curves were

calculated using the test data sets to analyze the results produced by the bivariate and

multivariate methods considered in this study. All landslide susceptibility maps were

categorized into five susceptibility levels as: very low, low, moderate, high and very high.

It should be noted that the lands classified as very high, high and moderate were considered

as landslide zones and the rest (i.e., low and very low) was considered as non-landslide

zones in the accuracy assessment stage. The overall accuracy values were calculated as 94.434,
93.858, 89.827, 85.077, 82.486 and 81.833 % for SVR, LR, DT, FR, SI and WOE,
respectively (Table 3). It can be seen that the multivariate approaches (i.e., SVR, LR and DT)
produced significantly higher accuracies than the bivariate ones (i.e., FR, SI and WOE) in
all cases. Considering the overall accuracies and kappa coefficients, the SVR and LR methods
produced similar performances and outperformed the other methods (i.e., DT, FR, SI and
WOE).
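The accuracy assessment described above, collapsing the five susceptibility classes into landslide and non-landslide and then deriving overall accuracy and kappa from the confusion matrix, can be sketched as follows. The counts in the example are hypothetical and are not the study's test data; the function names are illustrative.

```python
# Classes treated as "landslide" in the accuracy assessment, as stated in the text.
LANDSLIDE_CLASSES = {"very high", "high", "moderate"}


def confusion_counts(predicted_classes, reference_labels):
    """Count TP, FP, FN, TN after collapsing five classes into two."""
    tp = fp = fn = tn = 0
    for cls, is_landslide in zip(predicted_classes, reference_labels):
        predicted = cls in LANDSLIDE_CLASSES
        if predicted and is_landslide:
            tp += 1
        elif predicted and not is_landslide:
            fp += 1
        elif not predicted and is_landslide:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def overall_accuracy_and_kappa(tp, fp, fn, tn):
    """Overall accuracy and Cohen's kappa from a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for illustration only.
accuracy, kappa = overall_accuracy_and_kappa(tp=40, fp=5, fn=10, tn=45)
print(accuracy, kappa)
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance alone.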

For the validation of the results, the area under the ROC curve, or simply AUC, was also
applied. In the ROC analysis, a susceptibility map is compared with a data set showing the
landslide/non-landslide occurrences in the same area. While AUC values between 0.7
and 0.9 indicate reasonable discrimination ability, values higher than 0.9 are typical of
highly accurate classification models (Swets 1988). AUC values <0.5 indicate that
the method has no power to discriminate. In this study, the ROC curves were

plotted based on the number of correctly classified pixels (true-positive) and the number of

the incorrectly identified pixels (false-positive). The AUC values of the ROC curve for

SVR, LR, DT, FR, SI and WOE methods were estimated as 0.985, 0.984, 0.980, 0.921,

0.911 and 0.893, respectively (Fig. 6). These results indicated that SVR, LR and DT

methods were effective for determining landslide susceptibility in the study area, but SVR

produced the best score within the multivariate analysis methods. When comparing

bivariate methods to each other based on AUC values, the FR method was found to be the

most effective one.
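One way to see what the AUC measures is through its rank interpretation: it equals the probability that a randomly chosen landslide pixel receives a higher susceptibility score than a randomly chosen non-landslide pixel. The sketch below uses this pairwise definition, which is not necessarily how the curves in Fig. 6 were computed; the scores are hypothetical.

```python
def auc_from_scores(landslide_scores, stable_scores):
    """AUC as the fraction of (landslide, stable) pairs ranked correctly.

    Ties count as half a correct ranking. Quadratic in the number of
    pixels, so this is only suitable for small illustrative inputs.
    """
    wins = 0.0
    for p in landslide_scores:
        for q in stable_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(landslide_scores) * len(stable_scores))


# Hypothetical susceptibility scores for a handful of test pixels.
perfect = auc_from_scores([0.9, 0.8], [0.1, 0.2])
partial = auc_from_scores([0.9, 0.4], [0.5, 0.1])
print(perfect, partial)
```

A value of 1.0 means every landslide pixel outranks every stable pixel, while 0.5 corresponds to a ranking no better than chance.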

In addition to the assessments of modeling performances using standard accuracy

metrics (i.e., overall accuracy, kappa coefficient and AUC values), McNemar’s test was

employed to analyze statistical significance of differences in modeling performances of

multivariate and bivariate methods. McNemar’s Chi-squared statistic (Eq. 12) is a
nonparametric test applied to a 2 × 2 contingency table.

$$\chi^2 = \frac{\left(\left|n_{ij} - n_{ji}\right| - 1\right)^2}{n_{ij} + n_{ji}} \qquad (12)$$

where $n_{ij}$ denotes the number of pixels that are misclassified by method i but correctly
classified by method j, and $n_{ji}$ denotes the number of pixels that are misclassified by method
j but correctly classified by method i (Japkowicz and Shah 2011). If the observed statistic
value estimated with Eq. 12 is larger than $\chi^2_{1,0.05} = 3.84$, the null hypothesis can be
rejected at the 95 % confidence level. In other words, methods i and j differ in their
performances, so the difference in accuracy is said to be statistically significant.

Table 3 Susceptibility mapping results in terms of overall accuracy and kappa coefficient values

Methods                    Overall accuracy (%)  Kappa
Support vector regression  94.434                0.882
Logistic regression        93.858                0.869
Decision trees             89.827                0.784
Frequency ratio            85.077                0.687
Statistical index          82.486                0.635
Weight of evidence         81.883                0.612

McNemar’s Chi-squared test was applied to the susceptibility maps, and the statistical results are
given as a symmetric matrix in Table 4. It should be noted that the calculated statistics
greater than the critical Chi-squared table value ($\chi^2_{1,0.05} = 3.84$) are shown in bold in the
table. When the performances of SVR and LR were compared, they were found to produce
statistically similar performances (1.04 < 3.84). When the performances of the bivariate
approaches were compared with each other, it was found that the SI method showed
statistically similar performance to the FR and WOE methods.
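McNemar's statistic in Eq. 12 depends only on the pixels on which two methods disagree. The sketch below counts those pixels and applies the continuity-corrected formula; the label lists and counts are hypothetical, since the per-pixel disagreement counts behind Table 4 are not reported.

```python
CRITICAL_VALUE = 3.84  # chi-squared critical value, 1 degree of freedom, 5 % level


def disagreement_counts(reference, pred_i, pred_j):
    """Return (n_ij, n_ji): pixels misclassified by one method only."""
    n_ij = n_ji = 0
    for truth, a, b in zip(reference, pred_i, pred_j):
        if a != truth and b == truth:
            n_ij += 1  # method i wrong, method j right
        elif b != truth and a == truth:
            n_ji += 1  # method j wrong, method i right
    return n_ij, n_ji


def mcnemar_statistic(n_ij, n_ji):
    """Eq. 12 with continuity correction."""
    if n_ij + n_ji == 0:
        return 0.0  # the two methods never disagree
    return (abs(n_ij - n_ji) - 1) ** 2 / (n_ij + n_ji)


counts = disagreement_counts([1, 1, 0, 0, 1], [0, 1, 0, 1, 1], [1, 1, 1, 0, 1])
clear_gap = mcnemar_statistic(30, 12)   # hypothetical counts
small_gap = mcnemar_statistic(15, 12)   # hypothetical counts
print(counts, clear_gap > CRITICAL_VALUE, small_gap > CRITICAL_VALUE)
```

Pixels that both methods classify correctly, or both misclassify, do not enter the statistic, which is why two methods with similar accuracies can still differ significantly.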

Fig. 6 ROC statistics for the methods used in landslide susceptibility assessment

Table 4 McNemar’s statistic test results for the multivariate and bivariate approaches

       SVR    LR     DT     FR     SI     WOE
SVR    –      1.04   10.20  9.00   11.35  13.05
LR            –      7.50   6.93   10.58  12.43
DT                   –      37.44  6.32   11.19
FR                          –      2.54   7.78
SI                                 –      0.54
WOE                                       –

Please note that calculated statistics greater than the critical value ($\chi^2_{1,0.05} = 3.84$), indicating statistical significance, are shown in bold

6 Conclusions

Landslide susceptibility assessment is a complex and multi-step process that has been

investigated by many researchers in the literature. Up to now, a variety of methods have

been suggested for estimation of landslide susceptibility and their performances were

analyzed based on various statistical measures. In this study, performances of bivariate and

multivariate approaches were evaluated for the determination of landslide susceptibility of

Duzkoy district of Trabzon province, Turkey. These approaches were assessed using slope,
lithology, land cover, aspect, NDVI, soil thickness, drainage density, TWI and elevation factors.

SVR, LR and DT methods were applied as multivariate approaches, while FR, SI and

WOE methods were used as bivariate approaches. Overall accuracy, kappa coefficient and

ROC curves were employed in the stage of performance evaluation. In addition to these

performance evaluation measures, McNemar’s test statistic was applied to assess the

statistical significance of the differences in method performances.

When the results of multivariate and bivariate methods were analyzed, some important

findings were deduced. Firstly, the multivariate methods (i.e., SVR, LR, DT) clearly

outperformed the bivariate methods in all cases, with an improvement of up to about 13 % in overall accuracy. The

results of the ROC curves also confirmed this finding. Also, McNemar’s statistical test

showed that the accuracy level reached by the multivariate methods compared with the

bivariate methods was statistically significant. Secondly, results showed that the FR

method produced the best performance (85.1 %) among the bivariate methods. When the

statistical significance of differences in performances of bivariate methods was analyzed, it

was found that the FR method was superior to the WOE method, whereas performance

difference with the SI method was statistically insignificant. Thirdly, it was seen that the

SVR and LR methods showed similar performances (94.4 and 93.9 %, respectively),

significantly higher than the DT method. Also, the statistical test results supported the

finding that difference in performances of SVR and LR methods was statistically insig-

nificant. It should be noted that while the LR method has a simple mathematical structure

that can be easily programmed, the SVR method requires user defined parameters

depending on the selected kernel function that highly affects its performance. Finally,

when the produced landslide susceptibility maps were analyzed, it was observed that high

susceptibility sites were mostly situated in the region between northeast and southwest of

the study area. When the susceptibility maps were analyzed in detail, it was found that

landslide susceptible sites were mainly located on the lithological units of Cru2 and Cru1

and on slope angles between 20° and 35°, which points to the latest landslide events in

the study area. In addition, it was observed that the northwestern-facing and north-facing

parts of the study area located at elevations of between 800 and 1,400 m carry the highest

hazard potential. When the susceptibility maps were analyzed in terms of land cover types,

it was observed that landslides in the region generally occurred in pasture and deciduous

lands. In addition, it was found that high susceptibility areas were mainly observed on the

land characterized by slopes steeper than 20° and a soil depth of 20–50 cm.

Landslide susceptibility information is most commonly required at the local government

level for planning urban development particularly in developing countries. This informa-

tion is also vital for disaster management planning made by state agencies. Findings in this

study showed that most of the urban lands and main roads in the region were located in

very high and high susceptibility zones. Therefore, future landslide activities may cause

major damages or casualties in the study area. Landslide susceptibility maps produced in

this study provide invaluable information in developing strategies for disaster mitigation

works and future investigation for provincial authorities.

References

Abdallah C (2010) Spatial distribution of block falls using volumetric GIS-decision-tree models. Int J ApplEarth Obs 12:393–403

Akgun A (2012) A comparison of landslide susceptibility maps produced by logistic regression, multi-criteria decision, and likelihood ratio methods: a case study at Izmir. Turkey. Landslides 9(1):93–106

Akgun A, Dag S, Bulut F (2008) Landslide susceptibility mapping for a landslide-prone area (Findikli, NEof Turkey) by likelihood-frequency ratio and weighted linear combination models. Environ Geol54:1127–1143

Aleotti P, Chowdhury R (1999) Landslide hazard assessment: summary review and new perspectives. BullEng Geol Env 58:21–44

Alparslan E (2011) Landslide susceptibility mapping in Yalova, Turkey, by remote sensing and GIS.Environ Eng Geosci 17:255–265

Althuwaynee OF, Pradhan B, Park HJ, Lee JH (2014) A novel ensemble bivariate statistical evidential belieffunction with knowledge-based analytical hierarchy process and multivariate statistical logisticregression for landslide susceptibility mapping. Catena 114:21–36

Armas I (2012) Weights of evidence method for landslide susceptibility mapping. Prahova Subcarpathians.Romania. Nat Hazards 60:937–950

Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for landslide susceptibilitymapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65:15–31

Ayalew L, Yamagishi H, Ugawa N (2004) Landslide susceptibility mapping using GIS-based weightedlinear combination, the case in Tsugawa area of Agano River, Niigata Prefecture, Japan. Landslides1:73–81

Ballabio C, Sterlacchini S (2012) Support vector machines for landslide susceptibility mapping: the StafforaRiver Basin case study, Italy. Math Geosci 44:47–70

Beven KJ, Kirkby MJ (1979) A physically based, variable contributing area model of basin hydrology.Hydrol Sci Bull 24:43–69

Bonham-Carter GF (1994) Geographic information systems for geoscientists: modelling with GIS. Perg-amon, Oxford

Breiman L (1996) Bagging predictors. Mach Learn 24:123–140Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth,

BelmontBui DT, Lofman O, Revhaug I, Dick O (2011) Landslide susceptibility analysis in the Hoa Binh province of

Vietnam using statistical index and logistic regression. Nat Hazards 59:1413–1444Bui DT, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam using

support vector machines, decision tree, and naive Bayes models. Math Probl Eng. vol. 2012, Article ID974638, doi:10.1155/2012/974638

Burges CJC, Scholkopf B (1997) Improving the accuracy and speed of support vector learning machine. In:Mozer MC, Jordan MI, Petsche T (ed) Advances in neural information processing systems 9. Cam-bridge, MIT Press, pp 375–381

Cevik E, Topal T (2003) GIS-based landslide susceptibility mapping for a problematic segment of thenatural gas pipeline, Hendek (Turkey). Environ Geol 44:949–962

Chang YL, Liang LS, Han CC, Fang JP, Liang WY, Chen KS (2007) Multisource data fusion for landslideclassification using generalized positive Boolean functions. IEEE T Geosci Remote 45(6):1697–1708

Costanzo D, Rotigliano E, Irigaray C, Jimenez-Peralvarez JD, Chacon J (2012) Factors selection in landslide susceptibility modelling on large scale following the GIS matrix method: application to the river Beiro basin (Spain). Nat Hazard Earth Sys 12:327–340

Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, Cambridge

Dahal RK, Hasegawa S, Nonomura A, Yamanaka M, Masuda T, Nishino K (2008) GIS-based weights-of-evidence modelling of rainfall-induced landslides in small catchments for landslide susceptibility mapping. Environ Geol 54:311–324

Dai FC, Lee CF, Li J, Xu ZW (2001) Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ Geol 40:381–391

Demir G, Aytekin M, Akgun A, Ikizler SB, Tatar O (2013) A comparison of landslide susceptibility mapping of the eastern part of the North Anatolian Fault Zone (Turkey) by likelihood-frequency ratio and analytic hierarchy process methods. Nat Hazards 65:1481–1506

Devkota KC, Regmi AD, Pourghasemi HR, Yoshida K, Pradhan B, Ryu IC, Dhital MR, Althuwaynee OF (2013) Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling-Narayanghat road section in Nepal Himalaya. Nat Hazards 65(1):135–165

Durbha SS, King RL, Younan NH (2007) Support vector machines regression for retrieval of leaf area index from multiangle imaging spectroradiometer. Remote Sens Environ 107:348–361

Ercanoglu M (2005) Landslide susceptibility assessment of SE Bartin (West Black Sea region, Turkey) by artificial neural networks. Nat Hazard Earth Sys 5(6):979–992

Fall M, Azzam R, Noubactep C (2006) A multi-method approach to study the stability of natural slopes and landslide susceptibility mapping. Eng Geol 82:241–263

Gomez H, Kavzoglu T (2005) Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng Geol 78:11–27

Gray DH, Leiser AT (1982) Biotechnical slope protection and erosion control. Van Nostrand Reinhold Company, New York

Grozavu A, Plescan S, Patriche CV, Margarint MC, Rosca B (2013) Landslide susceptibility assessment: GIS application to a complex mountainous environment. In: Kozak J et al (eds) The Carpathians: integrating nature and society towards sustainability, environmental science and engineering. Springer, Berlin, pp 31–44

Guzzetti F, Carrara A, Cardinali M, Reichenbach P (1999) Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31:181–216

Ho JY, Lee KT, Chang TC, Wang ZY, Liao YH (2012) Influences of spatial distribution of soil thickness on shallow landslide prediction. Eng Geol 124:38–46

Hwang S, Guevarra IF, Yu B (2009) Slope failure prediction using a decision tree: a case of engineered slopes in South Korea. Eng Geol 104:126–134

Ito K, Nakano R (2003) Optimizing support vector regression hyperparameters based on cross-validation. Proceedings of the International Joint Conference on Neural Networks 1–4:2077–2082

Japkowicz N, Shah M (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press, New York

Kanungo DP, Arora MK, Sarkar S, Gupta RP (2009) Landslide susceptibility zonation (LSZ) mapping: a review. J South Asia Disaster Stud 2:81–105

Kavzoglu T, Sahin EK, Colkesen I (2014) Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11:425–439

Kouli M, Loupasakis C, Soupios P, Vallianatos F (2010) Landslide hazard zonation in high risk areas of Rethymno Prefecture, Crete Island, Greece. Nat Hazards 52:599–621

Menard S (2001) Applied logistic regression analysis, 2nd edn. Sage Publication, Thousand Oaks

Nandi A, Shakoor A (2009) A GIS based landslide susceptibility evaluation using bivariate and multivariate statistical analyses. Eng Geol 110:11–20

Nefeslioglu HA, Duman TY, Durmaz S (2008) Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey). Geomorphology 94(3–4):401–418

Nefeslioglu HA, Sezer E, Gokceoglu C, Bozkir AS, Duman TY (2010) Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Math Probl Eng

Niuniu X, Yuxun L (2010) Review of decision trees. Computer science and information technology (ICCSIT), 2010 3rd IEEE International Conference, pp. 105–109

Pal M, Mather PM (2003) An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens Environ 86:554–565

Peng L, Niu RQ, Huang B, Wu XL, Zhao YN, Ye RQ (2014) Landslide susceptibility mapping based on rough set theory and support vector machines: a case of the Three Gorges area, China. Geomorphology 204:287–301

Pourghasemi HR, Jirandeh AG, Pradhan B, Xu C, Gokceoglu C (2013) Landslide susceptibility mapping using support vector machine and GIS at the Golestan Province, Iran. J Earth Syst Sci 122(2):349–369

Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365

Pradhan B, Lee S (2010) Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environ Modell Softw 25:747–759

Pradhan B, Oh HJ, Buchroithner M (2010) Weights-of-evidence model applied to landslide susceptibility mapping in a tropical hilly area. Geomat Nat Hazards Risk 3:199–223

Quinlan JR (1993) C4.5: programs for machine learning. Kaufmann Publishers, San Mateo

Regmi NR, Giardino JR, Vitek JD (2010) Modeling susceptibility to landslides using the weight of evidence approach: western Colorado, USA. Geomorphology 115:172–187

Rokach L, Maimon O (2008) Data mining with decision trees: theory and applications. World Scientific Publishing, Singapore, Series in Machine Perception and Artificial Intelligence

Rupke J, Cammeraat E, Seijmonsbergen AC, Van Westen CJ (1988) Engineering geomorphology of the Widentobel catchment, Appenzell and Sankt Gallen, Switzerland: a geomorphological inventory system applied to geotechnical appraisal of slope stability. Eng Geol 26:33–68

Saaty TL (1980) The analytic hierarchy process: planning, priority setting, resource allocation. McGraw-Hill

Santacana N, Baeza B, Corominas J, De Paz A, Marturia J (2003) A GIS-based multivariate statistical analysis for shallow landslide susceptibility mapping in La Pobla de Lillet area (Eastern Pyrenees, Spain). Nat Hazards 30:281–295

Sarkar S, Kanungo DP (2004) An integrated approach for landslide susceptibility mapping using remote sensing and GIS. Photogramm Eng Rem S 70:617–625

Sarkar S, Kanungo DP, Mehrotra GS (1995) Landslide hazard zonation: a case study in Garhwal Himalaya, India. Mountain Research and Development 15:301–309

Schicker R, Moon V (2012) Comparison of bivariate and multivariate statistical approaches in landslide susceptibility mapping at a regional scale. Geomorphology 161:40–57

Scholkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization and beyond. MIT Press, Cambridge

Segoni S, Rossi G, Catani F (2012) Improving basin-scale shallow landslides modelling using reliable soil thickness maps. Nat Hazards 61:85–101

Singhal PK, Srivastava P (2004) Challenges in sustainable development. Anmol publication, India

Smola AJ, Scholkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222

Soeters R, Van Westen CJ (1996) Slope instability recognition analysis and zonation. In: Turner KT, Schuster RL, editors. Landslides: investigation and mitigation. Transportation Research Board National Research Council, Special Report No 247, Washington, DC, pp. 129–177

Suzen ML, Doyuran V (2004) Data driven bivariate landslide susceptibility assessment using geographical information systems: a method and application to Asarsuyu catchment, Turkey. Eng Geol 71:303–321

Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240:1285–1293

Thiery Y, Malet JP, Sterlacchini S, Puissant A, Maquaire O (2007) Landslide susceptibility assessment by bivariate methods at large scales: application to a complex mountainous environment. Geomorphology 92:38–59

Van Westen CJ (1997) Statistical landslide hazard analysis ILWIS 2.1 for windows application guide. ITC Publication, Enschede

Van Westen CJ, Rengers N, Soeters R (2003) Use of geomorphological information in indirect landslide susceptibility assessment. Nat Hazards 30:399–419

Vapnik VN (2000) The nature of statistical learning theory, 2nd edn. Springer-Verlag, New York

Vivas L (1992) Los andes venezolanos. Academia Nacional de la Historia, Caracas

Wu KP, Wang SD (2009) Choosing the kernel parameters for support vector machines by the inter-cluster distance in the feature space. Pattern Recogn 42(5):710–717

Xu M, Watanachaturaporn P, Varshney PK, Arora MK (2005) Decision tree regression for soft classification of remote sensing data. Remote Sens Environ 97:322–336

Yalcin A (2008) GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): comparisons of results and confirmations. Catena 72:1–12

Yalcin A, Reis S, Aydinoglu AC, Yomralioglu T (2011) A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena 85(3):274–287

Yao X, Tham LG, Dai FC (2008) Landslide susceptibility mapping based on support vector machine: a case study on natural slopes of Hong Kong, China. Geomorphology 101:572–582

Yeh CY, Huang CW, Lee SJ (2011) A multiple-kernel support vector regression approach for stock market price forecasting. Expert Syst Appl 38:2177–2186

Yesilnacar E, Topal T (2005) Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng Geol 79:251–266

Yilmaz I (2009) Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat-Turkey). Comput Geosci 35:1125–1138

Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ Earth Sci 61:821–836

Yilmaz C, Topal T, Suzen ML (2012) GIS-based landslide susceptibility mapping using bivariate statistical analysis in Devrek (Zonguldak-Turkey). Environ Earth Sci 65:2161–2178
