Bayesian Space-time modelling of malaria incidence in Sucre state-Venezuela



Noname manuscript No.(will be inserted by the editor)

Bayesian Space-Time modelling of malaria incidence in Sucre state-Venezuela

SPATIAL SPECIAL ISSUE

Desirée Villalta · Lelys Guenni · Yasmin Rubio-Palis · Raúl Ramírez Arbeláez

Received: date / Accepted: date

Abstract Malaria is a parasitic infectious tropical disease that causes high mortality rates in the tropical belt. In Venezuela, Sucre state is considered the state with the third-highest disease prevalence. This paper presents a hierarchical log-Poisson regression space-time model within a Bayesian approach to represent the incidence of malaria in Sucre state, Venezuela, during the period 1990-2002 in the 15 municipalities of the state. For full Bayesian modeling, the logarithm of the relative risk of the disease for each district is expressed as an additive model that includes a multiple regression with socio-economic and climatic covariates; a random effect that captures the spatial heterogeneity in the study

Desirée Villalta
Departamento de Cómputo Científico y Estadística, División de Ciencias Físicas y Matemáticas, Universidad Simón Bolívar, Caracas, Venezuela
Tel.: +58-212-9063233
Fax: +58-212-9063232
E-mail: [email protected]

Lelys Guenni
Departamento de Cómputo Científico y Estadística, División de Ciencias Físicas y Matemáticas, Universidad Simón Bolívar, Caracas, Venezuela
Tel.: +58-212-9063233
Fax: +58-212-9063232
E-mail: [email protected]

Yasmin Rubio-Palis
Dirección de Salud Ambiental, MPP Salud/Biomed-Universidad de Carabobo, Maracay, Venezuela
Tel.: +58-212-243-241-9997
Fax: +58-212-243-241-3876
E-mail: [email protected]

Raúl Ramírez Arbeláez
Centro de Estadística y Software Matemático, Universidad Simón Bolívar, Caracas, Venezuela
Tel.: +58-212-9063233
Fax: +58-212-9063232
E-mail: [email protected]


region, and a CAR (conditionally autoregressive) component that recognizes the effect of nearby municipalities on the transmission of the disease each year. Model estimation and predictive inference were carried out through the implementation of computer code in the WinBUGS software, which makes use of Markov chain Monte Carlo (MCMC) methods. For model selection, the minimum posterior predictive loss criterion (D) was used. Moran's I statistic was calculated to test the independence of the residuals of the resulting model. Finally, we verify the model fit by using the Bayesian p-value; in most cases the selected model captures the spatial structure between the relative risks of nearby municipalities each year. For years with a poor model fit, the Student's t distribution is used as an alternative model for the local spatial random effect, with a better fit to the tail behavior of the data probability distribution.

Keywords Bayesian hierarchical models · log-Poisson regression model · malaria · areal models

1 Introduction

Malaria, or paludism, is an infectious disease whose transmission occurs mainly in tropical and subtropical regions [25]. Despite long-term efforts to minimize its impact on society, the disease continues to pose a high risk to a large number of inhabitants of different continents. It remains an important human pathology and is concentrated in large areas of Central and South America, Haiti and the Dominican Republic in the Caribbean, the African continent, India and South-East Asia, the Middle East and Oceania [26]. During 2006 the disease was present in 109 countries and territories.

According to estimates from WHO/PAHO (World Health Organization/Pan-American Health Organization), every year between 300 and 500 million people acquire the infection and more than 1 million deaths can be attributed to the disease [26].

Social crises, uncontrolled migration, environmental degradation, deficient sanitary services, the proliferation of poor households, low levels of education and a lack of governmental action to control the disease are some of the factors contributing to its spread worldwide.

There are many examples in the literature that demonstrate the importance of different factors in its spatial and temporal dynamics, such as socio-economic variables, climatic conditions, migration from and towards malarious regions, and anthropogenic factors modifying the landscape.

As mentioned by [14], vector-transmitted diseases are highly complex and dynamic systems in time and space. In this research a space-time model for the incidence of malaria in Sucre state, Venezuela, during the period 1990-2002 is presented. The aim of the modeling effort is to understand the main covariates driving the dynamics of the disease and to provide a spatial representation of the differences in relative risk among the municipalities or districts that make up the state.

Malaria incidence in Sucre has always been of great concern because of the large number of cases reported in the state each year. This is reflected in the following studies: [2], [6], [8], [7], [3], [14], where many authors conclude that eradicating the disease might be a formidable task, owing to the alarming poverty indicators of the state and to the environmental and social conditions of its population, which favor the proliferation of the disease transmission vector.

This paper is organized as follows: Section 2 describes the proposed model and the methodologies for model selection and model checking; Sections 3 and 4 give a more detailed description of the study region and of the data sets used in the analysis, respectively. Section 5 shows the results of fitting the log-Poisson linear models and of model selection using the DIC and D criteria, as well as the behavior of the model residuals and the posterior predictive checks of the selected model using the Bayesian p-value. An interpretation of the model parameters, based on inspection of their posterior distributions, is attempted. Finally, some conclusions and recommendations are presented in the last section.

2 Model Description

2.1 Standardized Malaria Incidence Rate

In general, risk is defined as the combination of the probability of an event and its negative consequences (see footnote 1). The Standardized Incidence Rate (SIR), or Standardized Mortality Ratio (SMR), is the ratio between the number of observed disease cases ($y_i$) and the expected number of cases in the region ($E_i$) [1], that is,

$$\mathrm{SMR}_i = \Psi_i = \frac{y_i}{E_i}, \qquad i = 1, \ldots, k \tag{1}$$

where $k$ is the number of subregions (in our case the number of municipalities or districts) and

$$E_i = p^{*} n_i = \frac{\sum_{j=1}^{k} y_j}{\sum_{j=1}^{k} n_j}\, n_i,$$

$p^{*}$ being the overall proportion of disease incidence and $n_i$ the total population in district $i$.
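As a minimal sketch of equation (1), the following Python snippet computes the expected counts $E_i$ and the raw relative risks $\Psi_i$. The case and population figures are hypothetical (they are not data from the study), and the function name `standardized_incidence` is introduced here only for illustration.

```python
import numpy as np

def standardized_incidence(cases, population):
    """Return expected counts E_i and raw relative risks Psi_i = y_i / E_i."""
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    p_star = cases.sum() / population.sum()   # overall incidence proportion p*
    expected = p_star * population            # E_i = p* n_i
    return expected, cases / expected

# Hypothetical counts for three districts (not data from the study).
y = [30, 5, 15]
n = [10_000, 10_000, 30_000]
E, psi = standardized_incidence(y, n)
print(E)    # expected cases per district
print(psi)  # raw relative risk per district
```

By construction the expected counts add up to the total number of observed cases, so the raw relative risks are measured against the state-wide rate.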

This incidence rate $\Psi_i$ is an estimate of the relative risk of disease infection in municipality $i$. A value greater than 1 indicates a disease incidence greater than expected for the region and therefore constitutes an alarm for the public health authorities in charge [23], [16]. To improve on this estimator, hierarchical Bayesian models have been developed; they play an important role in modeling the spatial structure of epidemiological data and overcome the problems that arise in the estimation of the Standardized Incidence Rate [16].

1 See UNISDR, Terminology on Disaster Risk Reduction (2009), for the concepts of hazard, vulnerability and risk.

Fig. 1 shows an example of the raw relative risk, calculated from the local incidence rate and the global observed rate, for all years and the 15 municipalities of Sucre state. The eastern region of the state shows great variability in the relative risks from year to year, whereas the western region shows relative risks lower than 0.5. For all years the Cajigal district (see Fig. 3) has a relative risk greater than 1, which is a strong indication that this municipality should be considered a critical malaria region within the state. The behavior of the raw relative risks shows that malaria was present in every year of the study. The objective of this work is to propose a model with temporal and spatial components that explains the dynamics of the vector-borne disease and, at the same time, identifies the socio-economic and climatic explanatory variables related to disease incidence in Sucre state.

Fig. 1 Observed relative risk maps for the 15 municipalities of Sucre state during years 1990-2002 (one map per year; legend classes: 0 - 1 and greater than 1)

2.2 Poisson-Gamma model

Initially, we consider a region of interest that has been divided into $k$ conterminous subregions (municipalities), where $Y_i$ represents the number of malaria cases in subregion $i$. In epidemiology a Poisson-Gamma model is usually assumed for these quantities, where the mean is $\lambda_i = E_i \Psi_i$. Therefore,

$$Y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \Psi_i \sim \mathrm{Gamma}(a, c), \quad a > 0,\ c > 0 \tag{2}$$

where $E_i$ is a known value and the relative risk $\Psi_i$ is the quantity to be estimated. In this case the maximum likelihood estimate of $\Psi_i$ is the one given in (1). This estimate has limitations: for example, when the disease is rare and the study areas are small ([19], [17]), it cannot capture the spatial dependence of the quantity of interest. Another aspect to be considered is homogeneity within each area, since the risk $\Psi_i$ is assumed to be constant within each region.
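To illustrate how model (2) moderates the raw estimate (1), the sketch below uses the standard Poisson-Gamma conjugacy. It assumes that $\mathrm{Gamma}(a, c)$ is parameterized by shape $a$ and rate $c$, which the text does not state explicitly; under that assumption the posterior is $\Psi_i \mid y_i \sim \mathrm{Gamma}(a + y_i, c + E_i)$. The counts, the hyper-parameter values and the function name are hypothetical.

```python
import numpy as np

def gamma_posterior_mean(y, E, a, c):
    """Posterior mean of Psi_i when y_i ~ Poisson(E_i * Psi_i) and
    Psi_i ~ Gamma(shape=a, rate=c); the posterior is Gamma(a + y_i, c + E_i)."""
    y = np.asarray(y, dtype=float)
    E = np.asarray(E, dtype=float)
    return (a + y) / (c + E)

# Hypothetical observed and expected counts (not data from the study).
y = np.array([0, 2, 40])
E = np.array([0.5, 4.0, 20.0])
raw = y / E                                    # raw estimate from equation (1)
smooth = gamma_posterior_mean(y, E, a=2.0, c=2.0)
print(raw)
print(smooth)
```

Each smoothed value lies between the raw ratio and the prior mean $a/c$, and districts with small expected counts are pulled towards the prior mean most strongly, which is the stabilizing effect sought for rare diseases in small areas.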

2.3 Log-Poisson Model

The authors of [9] propose a log-normal model for the relative risk ($\Psi_i$) when the data are better explained by a selected set of covariates (stored in the matrix $X_i$). Moreover, [4] expands this idea by including the influence of adjacent neighbors on the disease rate of the $i$-th area. The model can be written as:

$$Y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \Psi_i = \exp(\alpha + \beta' X_i + v_i + b_i) \tag{3}$$

where $\alpha$ is the intercept and $\beta$ is the vector of regression coefficients; $v_i$ represents the local random heterogeneity component, with normal distribution $v_i \sim N(0, 1/\tau_v)$ and precision $\tau_v$. $b_i$ is the CAR component, which accounts for the influence of adjacent neighbors on the disease rate of area $i$; its structure is defined as:

$$b_i \mid b_j,\ j \neq i \sim N\!\left(\bar{b}_i, \frac{1}{\tau_b m_i}\right) \tag{4}$$

where $m_i$ is the number of neighbors of district $i$, $\bar{b}_i = m_i^{-1} \sum_{j \in \delta_i} b_j$ is the mean of the neighboring effects, and $\tau_b$ is the precision (scale) parameter. The weights $1/m_i$ are stored in the neighbor matrix:

$$W_{ij} = \begin{cases} 1/m_i & \text{if } j \in \delta_i \\ 0 & \text{otherwise} \end{cases}$$

where $\delta_i$ is the set of neighbors of district $i$.

Posterior samples from model (3) are obtained by sampling from the full conditional probability distributions, following the Gibbs sampling algorithm [1].
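The neighbor weights and the conditional distribution (4) can be sketched as follows. This is only an illustration: the 4-district chain adjacency is hypothetical (it is not the Sucre adjacency), and the function names `car_weights` and `car_conditional` are introduced here for the example.

```python
import numpy as np

def car_weights(adjacency):
    """Row-normalised weights W_ij = 1/m_i for neighbours j of i, else 0."""
    A = np.asarray(adjacency, dtype=float)
    m = A.sum(axis=1)                      # m_i: number of neighbours of district i
    if np.any(m == 0):
        raise ValueError("every district needs at least one neighbour")
    return A / m[:, None], m

def car_conditional(b, W, m, tau_b):
    """Mean and variance of b_i given all other effects, for every i."""
    mean = W @ b                           # average of the neighbouring effects
    var = 1.0 / (tau_b * m)                # variance 1 / (tau_b * m_i)
    return mean, var

# Hypothetical chain of four districts: 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W, m = car_weights(A)
b = np.array([1.0, 0.0, 2.0, -1.0])
mean, var = car_conditional(b, W, m, tau_b=2.0)
print(mean, var)
```

Districts with more neighbors receive a smaller conditional variance, since their spatial effect is informed by more adjacent areas.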

2.4 Space-time model

Extending model (3) to several years simultaneously, we can write

$$Y_{it} \sim \mathrm{Poisson}(\lambda_{it}) \tag{5}$$

with $\lambda_{it} = E_{it} \Psi_{it}$, $t = 1, \ldots, T$, where $T$ is the number of years; in this case $T = 13$.

We consider two equations for $\Psi_{it}$. The first is the simple model (6), and the second is the complex or CAR model (7), which assumes that the disease rates of neighboring districts are not independent.

$$\Psi_{it} = \exp(\alpha_t + \beta_t' X_{it} + v_{it}) \tag{6}$$

$$\Psi_{it} = \exp(\alpha_t + \beta_t' X_{it} + v_{it} + b_{it}) \tag{7}$$

In models (6) and (7), the year-specific parameters form the vectors $(\alpha_1, \alpha_2, \ldots, \alpha_T)$, $(\beta_1, \beta_2, \ldots, \beta_T)$, $(\tau_{h1}, \tau_{h2}, \ldots, \tau_{hT})$ and $(\tau_{b1}, \tau_{b2}, \ldots, \tau_{bT})$; for each year $t$ the spatial effects form the vectors $(b_{1t}, b_{2t}, \ldots, b_{kt})$ and $(v_{1t}, v_{2t}, \ldots, v_{kt})$; and $X_{it}$ represents the covariate matrix.

For model (7) the full posterior conditional distributions of the parameters $\alpha_t$, $\beta_t$, $b_{it}$, $v_{it}$, $\tau_{bt}$ and $\tau_{ht}$ were calculated.

2.5 Prior distributions for the model parameters

For the parameter vectors of the model defined in (7), a flat (uniform) prior was used for $\alpha_t$ and $\beta_t$; $v_{it}$ and $b_{it}$ are assumed to be normally distributed, and a Gamma distribution is used for $\tau_{ht}$ and $\tau_{bt}$. The prior probability distributions are set as follows:

$$\begin{aligned}
p(\alpha_t) &\propto 1 \\
p(\beta_t) &\propto 1 \\
v_{it} &\sim N(0, 1/\tau_{ht}) \\
b_{it} \mid b_{-it} &\sim N\!\left(\bar{b}_{it}, \frac{1}{\tau_{bt} m_{it}}\right) \\
\tau_{ht} &\sim \mathrm{Gamma}(a_h, d_h) \\
\tau_{bt} &\sim \mathrm{Gamma}(a_c, d_c)
\end{aligned} \tag{8}$$

with the hyper-parameters of the Gamma distributions set to $a_h = a_c = 0.5$ and $d_h = d_c = 0.0005$. $b_{-it}$ denotes the CAR components excluding $b_{it}$, and $m_{it}$ is the number of districts adjacent to district $i$ at time $t$. The number and spatial configuration of the districts do not change with time, so $m_i$ could be used, but the notation $m_{it}$ was adopted instead.


2.5.1 Posterior conditional distributions

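Assuming prior parameter independence, the posterior conditional probability distributions of the parameter vector $(\alpha_t, \beta_t, v_{it}, b_{it}, \tau_{ht}, \tau_{bt})$ are as follows, where $n$ denotes the number of districts ($n = k$):

– Posterior conditional distribution for $\alpha_t$

$$P(\alpha_t \mid Y, \beta_t, \tau_b, \tau_h, v, b) \propto \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{n} \left( -E_{it} \exp(\alpha_t + \beta_t' X_{it} + v_{it} + b_{it}) + \alpha_t y_{it} \right) \right)$$

– Posterior conditional distribution for $\beta_t$

$$P(\beta_t \mid Y, \alpha_t, \tau_b, \tau_h, v, b) \propto \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{n} \left( -E_{it} \exp(\alpha_t + \beta_t' X_{it} + v_{it} + b_{it}) + y_{it}\, \beta_t' X_{it} \right) \right)$$

– Posterior conditional distribution for each CAR effect $b_{it}$

$$P(b_{it} \mid b_{-it}, Y, \beta_t, \alpha_t, \tau_b, \tau_h, v) \propto \exp\left( \sum_{t=1}^{T} \left( -E_{it} \exp(\alpha_t + \beta_t' X_{it} + v_{it} + b_{it}) + b_{it} y_{it} - \frac{\tau_{bt} m_{it}}{2} (b_{it} - \bar{b}_{it})^2 \right) \right)$$

– Posterior conditional distribution for each random effect $v_{it}$

$$P(v_{it} \mid Y, \beta_t, \alpha_t, \tau_b, \tau_{ht}, b) \propto \prod_{t=1}^{T} \exp\left( v_{it} y_{it} - E_{it} \exp(\alpha_t + \beta_t' X_{it} + v_{it} + b_{it}) - \frac{\tau_{ht}}{2} v_{it}^2 \right)$$

– Posterior conditional distribution for $\tau_{bt}$

$$P(\tau_{bt} \mid Y, \alpha_t, \beta_t, \tau_{ht}, v, b) \propto \prod_{t=1}^{T} \tau_{bt}^{n/2} \exp\left( -\frac{\tau_{bt}}{2} \sum_{i=1}^{n} m_{it} (b_{it} - \bar{b}_{it})^2 \right) \tau_{bt}^{a_c - 1} \exp(-d_c \tau_{bt})$$

– Posterior conditional distribution for $\tau_{ht}$

$$P(\tau_{ht} \mid Y, \alpha_t, \beta_t, \tau_{bt}, v, b) \propto \prod_{t=1}^{T} \tau_{ht}^{n/2 + a_h - 1} \exp\left\{ -\tau_{ht} \left( d_h + \frac{1}{2} \sum_{i=1}^{n} v_{it}^2 \right) \right\}$$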
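For a single year, the conditional distribution of $\tau_{ht}$ has the kernel of a Gamma distribution with shape $n/2 + a_h$ and rate $d_h + \frac{1}{2}\sum_i v_{it}^2$, so this parameter can be drawn directly within a Gibbs sampler. The sketch below shows one such draw; it is not the WinBUGS code used in the study, the function name `draw_tau_h` is hypothetical, and the effects `v` are simulated placeholders.

```python
import numpy as np

def draw_tau_h(v_t, a_h=0.5, d_h=0.0005, rng=None):
    """One Gibbs draw of tau_ht given the n heterogeneity effects of year t."""
    rng = np.random.default_rng() if rng is None else rng
    v_t = np.asarray(v_t, dtype=float)
    shape = v_t.size / 2.0 + a_h
    rate = d_h + 0.5 * np.sum(v_t ** 2)
    # NumPy parameterises the Gamma by shape and scale, so scale = 1 / rate.
    return rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(0)
v = rng.normal(0.0, 0.5, size=15)          # placeholder effects for 15 districts
draws = np.array([draw_tau_h(v, rng=rng) for _ in range(5000)])
print(draws.mean())                        # close to shape / rate
```

The remaining conditionals (for $\alpha_t$, $\beta_t$, $v_{it}$ and $b_{it}$) do not have a standard form, which is why a general-purpose MCMC tool is convenient for them.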

2.6 Model Selection

As an initial analysis, the Deviance Information Criterion (DIC) proposed by [24] was used for model comparison. The DIC outputs from WinBUGS are shown in Table 1; the DIC values do not differ much between the different models. An alternative to DIC is the minimum posterior predictive loss criterion proposed by [12], which can be easily implemented from the posterior simulation. This criterion selects the model that minimizes the squared loss function $D$, computed by generating new replicates from the posterior predictive distribution given the observations $y_{i,obs}$.

The $D$ criterion is calculated as

$$D = G + P \tag{9}$$

where

$$G = \sum_{i=1}^{N} (\mu_i - y_{i,obs})^2, \qquad P = \sum_{i=1}^{N} \sigma_i^2 \tag{10}$$

In (10), $N$ is the number of observations, $\mu_i = E(Y_{i,rep} \mid y)$ is the posterior predictive mean and $\sigma_i^2 = \mathrm{Var}(Y_{i,rep} \mid y)$ is the corresponding variance. In equations (9) and (10), $G$ is a goodness-of-fit term and $P$ is a penalty term for model complexity. If the expected value of future predictions is similar to the observed data, simulations from the posterior predictive distribution will on average be close to the data, and the $G$ component will be small.

Note that the values of $\mu_i$ and $\sigma_i^2$ can be calculated from samples from the posterior predictive distribution. For a model with $g$ parameters, $\theta^{(g)}$, the posterior predictive distribution is written as

$$p(y_{i,rep} \mid y) = \int p(y_{i,rep} \mid \theta^{(g)})\, p(\theta^{(g)} \mid y)\, d\theta^{(g)}$$

where $y_{rep} = (y_{1,rep}, \ldots, y_{k,rep})$ is a replicate of the data vector $Y$.

Each posterior sample ($\theta^{*}$) is used to obtain a replicate $y_{i,rep}$ by simulating from $p(y_{i,rep} \mid \theta^{(g)} = \theta^{*})$. The resulting sample $y^{*}_{i,rep}$ has marginal distribution $p(y_{i,rep} \mid y)$. With samples from this distribution we can obtain $\mu_i$ and $\sigma_i^2$ [1].

Since the proposed model is spatio-temporal, the above criterion is generalized to the $T$ years considered in model (7).

In summary, the steps followed to calculate the $D$ criterion are as follows:

1. Assume the Markov chains are stationary for each iteration $n = 1, \ldots, N$ and each year $t = 1, \ldots, T$.

2. Select samples $\theta_{it}^{(n)} \sim P(\theta \mid y)$, where $\theta_{it}^{(n)} = \left(\alpha_t^{(n)}, \beta_t^{(n)}, v_{it}^{(n)}, b_{it}^{(n)}, \tau_{ht}^{(n)}, \tau_{bt}^{(n)}\right)$, with $i = 1, \ldots, k$.

3. Calculate the relative risk $\Psi_{it}^{(n)} = \exp\left(\alpha_t^{(n)} + \beta_t^{(n)\prime} X_{it} + v_{it}^{(n)} + b_{it}^{(n)}\right)$.

4. Simulate replicates $y_{it,rep}^{(n)}$ for each year and each district: $Y_{it,rep} \mid \Psi_{it}^{(n)} \sim \mathrm{Poisson}\left(E_{it} \Psi_{it}^{(n)}\right)$.

5. Repeat steps 2 to 4 until $i = k$, $t = T$ and $n = N$.

After all samples $y_{it,rep}^{(n)}$ have been simulated, the approximate value of the $D$ criterion is calculated as:

$$D = D(N, T, k) = \sum_{t=1}^{T} \left\{ \sum_{i=1}^{k} \left[ \frac{1}{N} \sum_{n=1}^{N} \left( y_{it,rep}^{(n)} - y_{it,obs} \right)^2 \right] \right\}$$
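The Monte Carlo form of $D$ and its decomposition into $G + P$ can be sketched as below. The posterior draws of $\Psi_{it}$, the expected counts and the observations are simulated stand-ins (they are not output from the fitted model), and the function name `predictive_loss` is introduced only for this example. When the predictive variance is computed with divisor $N$, the averaged squared difference equals $G + P$ exactly.

```python
import numpy as np

def predictive_loss(y_obs, y_rep):
    """Return (D, G, P) from replicates y_rep of shape (N, k, T)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_rep = np.asarray(y_rep, dtype=float)
    mu = y_rep.mean(axis=0)                # posterior predictive mean
    sigma2 = y_rep.var(axis=0)             # posterior predictive variance (divisor N)
    G = np.sum((mu - y_obs) ** 2)          # goodness-of-fit term
    P = np.sum(sigma2)                     # penalty term
    D = np.sum(np.mean((y_rep - y_obs) ** 2, axis=0))
    return D, G, P

rng = np.random.default_rng(1)
k, T, N = 15, 13, 2000
E = rng.uniform(5, 50, size=(k, T))        # hypothetical expected counts E_it
psi = rng.gamma(2.0, 0.5, size=(N, k, T))  # stand-in for posterior draws of Psi_it
y_obs = rng.poisson(E)                     # stand-in for the observed counts
y_rep = rng.poisson(E * psi)               # one Poisson replicate per posterior draw
D, G, P = predictive_loss(y_obs, y_rep)
print(D, G + P)
```

In practice `psi` would be replaced by the relative risks computed from the MCMC output, and the model with the smallest $D$ would be preferred.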

2.7 Model residual analysis

Once a model had been selected using the $D$ criterion, the model residuals were analyzed to check the assumption of residual independence. The model residuals for year $t$ and district $i$ are calculated as:

$$r_{it} = y_{it} - E_{it} \left( \frac{1}{N} \sum_{n=1}^{N} \Psi_{it}^{(n)} \right)$$

where $\Psi_{it}^{(n)}$ is a sample from the posterior distribution of the relative risk, built from the posterior sample of the model parameters for each year.

Independent residuals for each year must show an unstructured spatial pattern, with no clustering or bands. An example is shown in Fig. 2, which corresponds to the residuals of year 1998 for the 15 districts of Sucre state. There is no evidence of clustering, and the residuals appear to be randomly distributed.

Fig. 2 Model residuals for year 1998 (map of the Sucre state municipalities, marked with + and - symbols)

Residual independence can be checked analytically using Moran's I statistic [22], [10], which provides a measure of spatial correlation for areal data and has been widely used since 1990.

Moran's I is a global statistic that uses all observations. It does not measure the local spatial correlation structure, but it detects whether values above or below the mean of all observations tend to cluster in space. The original spatial dependence measures developed by [18] are based on notions of binary contiguity among spatial units.

Moran's I statistic for each year is calculated as:

$$M = \frac{I \sum_{i=1}^{I} \sum_{j=1}^{I} W_{ij} (r_i - \bar{r})(r_j - \bar{r})}{\sum_{i,j} W_{ij} \sum_{i=1}^{I} (r_i - \bar{r})^2} \tag{11}$$

where $I$ is the total number of districts, $\bar{r}$ is the mean of the residuals and $W_{ij}$ is the $(i, j)$ entry of the neighbor matrix $W$ (of size $15 \times 15$ in this case).

The values of Moran's I statistic typically vary between $+1$ and $-1$, where $+1$ indicates perfect positive autocorrelation (perfect concentration) and $-1$ indicates perfect negative autocorrelation (perfect dispersion); a value close to zero corresponds to a completely random pattern, which is what is expected under the model assumptions. The null hypothesis to be tested is that there is no spatial association among the residuals, against the alternative of spatial dependence.

[1] propose the following steps to calculate Moran's I, which are applicable to data for one year. A generalization to multiple years is as follows:

1. Construct the matrix $R$ of size $k \times T$ with the residuals $r_{it}$ estimated for each year $t$ ($t = 1, \ldots, T$) and each district $i$ ($i = 1, \ldots, k$) in Sucre state.

2. Apply equation (11) to each column of $R$ to obtain the vector $M_{obs,t}$, where each entry is Moran's I for year $t$.

3. Resample 1000 replicates of each column of $R$ and repeat step 2 to obtain $M_{rep,t}$.

4. Identify the quantiles $q_{\alpha/2}$ and $q_{1-\alpha/2}$ of the samples $M_{rep,t}$ for each year, for, say, $\alpha = 0.05$.

5. Verify whether each value $M_{obs,t}$ lies within this interval.

Under the assumption of no spatial correlation of the residuals, $M_{obs,t}$ should lie within the interval $(q_{\alpha/2}, q_{1-\alpha/2})$ for each year. Otherwise we assume spatial correlation.
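The procedure above can be sketched in Python as follows. Two points are assumptions made for this illustration: the resampling step is implemented as a random permutation of the residuals across districts, which is one common choice but is not specified in the text, and the 15-district chain adjacency is hypothetical rather than the Sucre neighbor matrix. The residual matrix is called `R` here, with one column per year.

```python
import numpy as np

def morans_i(r, W):
    """Moran's I of the residual vector r under the weight matrix W."""
    r = np.asarray(r, dtype=float)
    z = r - r.mean()
    return r.size * (z @ W @ z) / (W.sum() * (z @ z))

def moran_interval_test(R, W, n_rep=1000, alpha=0.05, rng=None):
    """For each column (year) of R: observed I, reference interval, inside flag."""
    rng = np.random.default_rng() if rng is None else rng
    results = []
    for t in range(R.shape[1]):
        obs = morans_i(R[:, t], W)
        # Assumption: resampling = permuting the residuals across districts.
        rep = np.array([morans_i(rng.permutation(R[:, t]), W)
                        for _ in range(n_rep)])
        lo, hi = np.quantile(rep, [alpha / 2, 1 - alpha / 2])
        results.append((obs, lo, hi, bool(lo <= obs <= hi)))
    return results

# Hypothetical chain of 15 districts, each adjacent to the next one.
k = 15
A = np.zeros((k, k))
idx = np.arange(k - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0
W = A / A.sum(axis=1, keepdims=True)       # W_ij = 1/m_i for neighbours

rng = np.random.default_rng(0)
R = np.column_stack([np.arange(k, dtype=float),   # residuals with a spatial trend
                     rng.normal(size=k)])         # unstructured residuals
results = moran_interval_test(R, W, rng=rng)
for obs, lo, hi, inside in results:
    print(f"I = {obs:.3f}, interval = ({lo:.3f}, {hi:.3f}), inside = {inside}")
```

A column whose observed statistic falls outside its interval, as the trended column does here, would be taken as evidence of residual spatial correlation for that year.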

2.8 Posterior predictive model checks

When a model is fitted, the pertinent question is whether the model fits the data, since in most cases none of the proposed models is strictly correct, although they can be useful in practice. The relevant question is whether the model deficiencies have an important impact on model inference.

One way to verify whether the model fits the observed data is to provide an external validation by using the model to make predictions. Posterior predictive simulations are plausible outcomes for the number of malaria cases, and they should be comparable with the observations. One option for making this comparison is the posterior Bayesian p-value [13], which is defined as:

$$p\text{-value}_B = p(y_{rep} \leq y_{obs}) \tag{12}$$

where $y_{rep}$ is the replicate vector for a given year and district.

A model will be considered doubtful if the p-value is close to 0 or 1 (less than 0.01 or greater than 0.99). In that case the observed value is highly unlikely to have come from the proposed model.
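Given posterior predictive replicates, the p-value in (12) can be estimated as the share of replicates that do not exceed the observation. The sketch below uses a tiny hand-made set of replicates (not model output), and the function names are hypothetical.

```python
import numpy as np

def bayesian_p_value(y_obs, y_rep):
    """Share of replicates not exceeding the observation, per observation."""
    y_obs = np.asarray(y_obs)
    y_rep = np.asarray(y_rep)              # first axis indexes the replicates
    return np.mean(y_rep <= y_obs, axis=0)

def flag_poor_fit(p, low=0.01, high=0.99):
    """Flag p-values below 0.01 or above 0.99, the thresholds used in the text."""
    return (p < low) | (p > high)

# Four hypothetical replicates for two district-year cells.
y_rep = np.array([[1, 5],
                  [2, 6],
                  [3, 7],
                  [4, 8]])
y_obs = np.array([2, 100])
p = bayesian_p_value(y_obs, y_rep)
flags = flag_poor_fit(p)
print(p, flags)
```

The second cell is flagged because every replicate lies below the observed count, which is the situation described as highly unlikely under the proposed model.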

3 Study Region

Sucre state, shown in Fig. 3, is one of the 23 states of Venezuela and is located at 10°02′38′′N − 10°45′30′′N, 61°50′48′′W − 61°31′47′′W, in the north-eastern region of the country. The state includes 15 municipalities: Andrés Eloy Blanco, Andrés Mata, Arismendi, Benítez, Bermúdez, Bolívar, Cajigal, Cruz Salmerón Acosta, Libertador, Mariño, Mejía, Montes, Ribero, Sucre and Valdez. The state has an area of 11,800 km², and a more detailed areal subdivision consists of 54 parishes. In this work municipalities were used as the basic areal unit.

Fig. 3 Spatial distribution of the 15 municipalities of Sucre state (map prepared by the Centro de Estadística y Software Matemático, CESMa, Universidad Simón Bolívar, Caracas, Venezuela; scale 1:1,150,000)

4 Data description

For this work several kinds of data sets were collected: social variables, obtained from the National Institute of Statistics (INE); the number of malaria cases in Sucre state, collected from the Ministry of Health and Social Development (MSDS); and climatic data from different sources, including the hydrometeorological data repository of the Centro de Estadística y Software Matemático (CESMa) at Simón Bolívar University and the global data sets available on the National Oceanic and Atmospheric Administration (NOAA) web page.

4.1 Malaria Data

Data were available for the period 1990-2002. Some districts had missing data for some years, which were estimated as the average of the neighboring districts. After these corrections, the total number of malaria cases was obtained for each year and each district.

Fig. 4 shows the temporal and spatial variability of the number of cases. The districts with the highest malaria incidence during the study years were Cajigal, Mariño and Ribero. During 2002 there was a marked increase in the number of cases in most districts, particularly in Cajigal, where 5559 cases were reported. This district has been considered endemic for the disease. Another district with a large number of cases is Mariño, as reported by [7].

Fig. 4 Number of malaria cases in the 15 districts of Sucre state during the period 1990-2002 (vertical axis: malaria cases, 0 to 6000; horizontal axis: municipalities; one series per year)

It is important to note that the Cajigal and Mariño districts are neighbors (see Fig. 3), which suggests that a model that captures information about neighboring areas might be more realistic in explaining the dynamics of the disease, given that the mosquito acts as a moving vector in the transmission scheme.


4.2 Social Variables

The database from the National Institute of Statistics (INE) was carefully reviewed before use for possible outliers and inconsistencies. Different exploratory techniques were used for data inspection, such as Principal Component Analysis (PCA), biplots and cluster analysis.

Social variables were available for each municipality for the base years 1990, 1995 and 2001, and they were classified as variables related to basic needs, employment characteristics and basic services. Not all variables were available for all base years.

Basic Needs: With respect to basic needs, data were available for years 1990 and 2001. The following variables were considered: percentage of poor households, percentage of people living in poor households, percentage of houses with bad-quality building materials, percentage of houses with fair-quality building materials, percentage of households with critical crowding, percentage of households with inadequate housing and percentage of households lacking basic services. Many of these variables are highly correlated, and PCA was applied to reduce them to a few uncorrelated principal components (PCs).

When PCA was applied to the 1990 data, the first two PCs explained 85.84% of the total variance; the first PC ($I_1$) is highly associated with the percentage of houses with bad-quality building materials, and the second ($I_2$) is more related to the percentage of people living in poor households. For 2001, the first two PCs accounted for 87.42% of the total variance; the first component ($I_3$) was highly correlated with the percentage of households lacking basic services and the second ($I_4$) with the percentage of houses with fair-quality building materials. Fig. 5 shows boxplots of the basic needs variables comparing 1990 and 2001.
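The reduction of correlated indicators to a few uncorrelated components can be sketched with a singular value decomposition. This illustration assumes that the variables are standardized before the PCA (the text does not say whether the covariance or the correlation matrix was used), the data are random placeholders for 15 municipalities and 7 indicators, and the function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA of the columns of X via SVD of the standardised data."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)                # share of variance per PC
    scores = Z @ Vt[:n_components].T                   # one score per municipality
    return scores, explained[:n_components], Vt[:n_components]

rng = np.random.default_rng(0)
common = rng.normal(size=(15, 1))                      # shared factor (placeholder)
X = common + 0.3 * rng.normal(size=(15, 7))            # 7 correlated indicators
scores, explained, loadings = pca_scores(X)
print(explained)            # variance share of the first two components
print(scores.shape)         # (15, 2): candidate covariates such as I1 and I2
```

The loadings indicate which original indicator each component is most associated with, which is how the components are interpreted in the text.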

Fig. 5 Boxplots of basic needs variables for years 1990 and 2001 for the 15 municipalities of Sucre state (vertical axis: percentage, 0 to 80; variables: %Poor Hh, %People Poor Hh, %Bad Quality Hh, %Fair Quality Hh, %Critical Crowd, %Hh Inad. H and %Lack Basic Serv, each for 1990 and 2001)

In general, Sucre state improved these social indicators in 2001 (red boxplots) compared with 1990 (blue boxplots). The percentage of poor households (% Poor Hh) and the percentage of people living in poor households (% People Poor Hh) decreased from 1990 to 2001, followed by the percentage of houses with bad-quality building materials (% Bad Quality Hh), the percentage of households with inadequate housing (% Hh Inad. H) and critical crowding (% Critical Crowd). However, there is a marked increase in the lack of basic services (% Lack Basic Serv) and a slight increase in the percentage of houses with fair-quality building materials (% Fair Quality Hh). This trend makes sense, since an improvement in the quality of life of the population requires more basic services (not necessarily provided) and reflects better housing quality (from bad to fair). The principal components were included as covariates in the matrix of explanatory variables.

To include the new variables $I_1$, $I_2$, $I_3$ and $I_4$ in the model, two poverty indicator matrices were built, named $X_1$ and $X_2$, with the following structure.

Matrix $X_1$, of dimension $15 \times 13$ (municipalities × years), has repeated columns containing component $I_1$ for the period 1990-1995 and repeated columns containing component $I_3$ for the years 1996-2002. The same procedure was used for matrix $X_2$, using components $I_2$ and $I_4$ respectively.
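The construction of $X_1$ can be sketched as below. The component scores are random placeholders rather than the components obtained in the study, and the function name `build_indicator_matrix` is introduced only for this example.

```python
import numpy as np

years = np.arange(1990, 2003)              # the 13 study years

def build_indicator_matrix(pc_early, pc_late, years, switch_year=1996):
    """Repeat pc_early in the columns before switch_year and pc_late afterwards."""
    pc_early = np.asarray(pc_early, dtype=float)
    pc_late = np.asarray(pc_late, dtype=float)
    use_late = years >= switch_year        # one flag per year (column)
    return np.where(use_late[None, :], pc_late[:, None], pc_early[:, None])

rng = np.random.default_rng(0)
I1 = rng.normal(size=15)                   # placeholder scores for 15 municipalities
I3 = rng.normal(size=15)
X1 = build_indicator_matrix(I1, I3, years)
print(X1.shape)                            # (15, 13)
```

Calling the same function with the second components would give $X_2$, and passing the same vector twice reproduces a matrix with one component repeated for all years.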

Percentage of employed population: For the percentage of employed population, data from base year 1999 were used and the following employment classes were considered: agriculture, mining, manufacturing, electricity and gas, construction, commerce, transport, communication and others. Applying PCA again, the first PC (named $I_5$) explained 81.38% of the data variance and is highly associated with the percentage of the population employed in agriculture. To build matrix $X_3$, the same component $I_5$ is repeated for all years.

Basic Services: Finally, for the set of variables related to basic services, information was available for 1995 only. The following variables were used: drinking water and sewerage availability, a categorical degree of urban conditioning efforts, and solid waste collection and disposal. The first PC for these variables, named $I_6$, explained 79.67% of the data variance and is highly correlated with drinking water and sewerage availability. Matrix $X_4$ was built in the same way as matrix $X_3$.

4.3 Climatic variables

To obtain precipitation data for the whole of Sucre state, the data available from 23 climatic stations across the state were spatially interpolated. Fig. 6 shows the geographic location of the 23 stations.

Fig. 6 Spatial locations of the 23 climatic stations in Sucre state (stations shown: Algarrobito, Bajo Negro, Cancamure, Cangrejal, Cariaco, Cariaco-Muelle, Carupano, Casanay, Catuaro, Chacaracual, Cocollar, Cumana-Aeropuerto, Cumana-UDO, Cumanacoa-La Granja, Guiria-Aeropuerto, La Hacienda, Irapa, Las Palomas, Las Clavellinas, Nurucual, Tunapuy, Rio Caribe, Salsipuedes)

A space-time Bayesian kriging model, as described in [15], was fitted to the monthly time series for all locations. Before applying this methodology, missing data had to be imputed. Average monthly rainfall was estimated for each municipality, and the matrix variable $X_5$, of dimension $15 \times 13$, was set to the annual maximum of monthly precipitation. Precipitation is an important variable for this study, since it significantly affects the development of the mosquito and the spread of the disease. This was also shown in previous studies, for example [19], [20], [21].


5 Results

5.1 Comparison of fitted models

The covariates described in Sections 4.2 and 4.3 are labeled as follows:

– X1: First PC of basic needs variables, associated with percentage of houses with bad quality building materials (1990−1995) and lack of basic services (1996−2002).

– X2: Second PC of basic needs variables, associated with percentage of people living in poor households (1990−1995) and percentage of houses with fair quality building materials.

– X3: First PC of percentage of employed population, associated with percentage of population employed in agriculture.

– X4: First PC of basic service variables, associated with drinking water and sewerage availability.

– X5: Annual monthly maximum precipitation.

All variables were centered to reduce convergence problems in the MCMC method. The models shown in Table 1 are the relative risks following equations (6) and (7). In both cases the same groups of covariates were used. The first 4000 iterations were discarded as burn-in, 16000 additional values were simulated, and convergence of the resulting chains was monitored.

To improve convergence, the initial values of parameters αt and βt were set to the maximum likelihood estimates of the log-linear Poisson model.
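A minimal sketch of how such starting values could be obtained is given below. It fits a log-linear Poisson model with the expected counts as an offset by Newton-Raphson, using simulated data for a single year. The function and variable names are hypothetical and this is not the authors' code; any standard GLM routine would serve the same purpose.

```python
import numpy as np

def poisson_mle(y, X, expected, n_iter=50, tol=1e-10):
    """Newton-Raphson MLE for y_i ~ Poisson(expected_i * exp(alpha + x_i' beta))."""
    y = np.asarray(y, dtype=float)
    expected = np.asarray(expected, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])       # intercept plus covariates
    offset = np.log(expected)
    theta = np.zeros(Z.shape[1])
    theta[0] = np.log(y.sum() / expected.sum())     # start from the overall log relative risk
    for _ in range(n_iter):
        mu = np.exp(offset + Z @ theta)             # fitted means
        score = Z.T @ (y - mu)                      # gradient of the log-likelihood
        info = Z.T @ (Z * mu[:, None])              # Fisher information
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Simulated data for one year: 15 municipalities, one centered covariate
rng = np.random.default_rng(2)
x = rng.normal(size=15)
x = x - x.mean()
E = rng.uniform(20.0, 200.0, size=15)               # expected cases
y = rng.poisson(E * np.exp(0.2 + 0.5 * x))          # observed cases

theta_hat = poisson_mle(y, x, E)                    # [alpha_hat, beta_hat], usable as initial values
```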

Table 1 shows the fitted models for the relative risk. The DIC [24] and D [12] criteria were calculated for model selection, both for simple models without the CAR component and for more complex models that include it. In the simple models the term (+bit) is omitted and equation (6) applies; in the complex models (with CAR) the term (+bit) is added and equation (7) applies.

Table 1 Model comparisons without CAR and with CAR with criteria DIC and D

                                                                    Without CAR          With CAR
Model                                                               DIC      D           DIC      D
exp(αt + β2t X2it + β5t X5it + vit (+bit))                          1689.13  517864552   1702.60  175950.2
exp(αt + β1t X1it + β5t X5it + vit (+bit))                          1688.43  518837418   1708.50  181948.3
exp(αt + β3t X3it + β5t X5it + vit (+bit))                          1690.15  518740948   1709.33  193492.8
exp(αt + β4t X4it + β5t X5it + vit (+bit))                          1689.31  519983456   1707.53  178300.8
exp(αt + β2t X2it + β3t X3it + β5t X5it + vit (+bit))               1690.91  520433771   1704.99  177759.8
exp(αt + β1t X1it + β2t X2it + β3t X3it + β5t X5it + vit (+bit)) *  1690.16  518936237   1701.94  174430.4
exp(αt + β1t X1it + β2t X2it + β5t X5it + vit (+bit))               1688.63  837149929   1706.28  179240.2
exp(αt + β1t X1it + β3t X3it + β5t X5it + vit (+bit))               1691.31  518684859   1718.20  183262.4
exp(αt + β4t X4it + β3t X3it + β5t X5it + vit (+bit))               1691.76  5178585516  1716.52  184213.4

* Selected model (lowest D among the models with CAR).


Table 1 shows that the DIC criterion is not very informative for model selection, since it is practically constant across models. The D [12] criterion discriminates better between models.
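For reference, a minimal sketch of a minimum posterior predictive loss computation in the spirit of [12] is given below. It assumes squared-error loss, for which the criterion is the sum of a goodness-of-fit term G and a predictive variance penalty P, with G weighted by k/(k+1). The value of k used for Table 1 is not stated, so the default k → ∞ (D = G + P) is an assumption, and the replicated data here are simulated.

```python
import numpy as np

def predictive_loss_D(y_obs, y_rep, k=np.inf):
    """Return (D, G, P) from observed counts and posterior predictive replicates.

    y_rep has shape (n_draws, n_obs): one replicated data set per posterior draw.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_rep = np.asarray(y_rep, dtype=float)
    mean_rep = y_rep.mean(axis=0)
    G = np.sum((y_obs - mean_rep) ** 2)            # goodness-of-fit term
    P = np.sum(y_rep.var(axis=0, ddof=1))          # penalty: predictive variances
    weight = 1.0 if np.isinf(k) else k / (k + 1.0)
    return P + weight * G, G, P

# Simulated stand-in: 15 observations and 2000 replicated data sets
rng = np.random.default_rng(3)
y_obs = rng.poisson(50.0, size=15)
y_rep = rng.poisson(50.0, size=(2000, 15))

D, G, P = predictive_loss_D(y_obs, y_rep)
```

Smaller values of D indicate a better compromise between fit and predictive variability, which is how the criterion is read in Table 1.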

The selected model (the one with the lowest D in Table 1) includes the CAR term and covariates X1, X2, X3 and X5, that is:

$$\Psi_{it} = \exp\left(\alpha_t + \beta'_{1t} \cdot X_{1it} + \beta'_{2t} \cdot X_{2it} + \beta'_{3t} \cdot X_{3it} + \beta'_{5t} \cdot X_{5it} + v_{it} + b_{it}\right) \qquad (13)$$

All models were implemented in WinBUGS version 1.4.3, and chain convergence was monitored using different initial values for the model parameters. Convergence was slow and became apparent only after 18000 samples had been simulated.

Since the selected model (13) includes the CAR component, the posterior median of this component was calculated to estimate its effect on the relative risk for each municipality.

Fig. 7 shows the CAR component effect for all years. Municipalities with a high CAR effect (greater than 0) correspond to municipalities with higher relative risk (see Fig. 1). This indicates that, beyond the covariate effects, there is a random effect contribution to the relative risk associated with the spatial dynamics of the disease.


[Figure 7: maps of Sucre state, one panel per year from 1990 to 2002, showing the median of the CAR component by municipality, with a national location inset and a graphic scale.]

Fig. 7 Median of the CAR component for years 1990−2002

5.2 Residual analysis

After model selection and inference, the model residuals are expected to be independent. To check independence, residuals for years 1990−2002 were estimated from the N samples of the posterior distribution of the parameters of the selected model (13), that is,

$$r_{it} = y_{it} - E_{it}\left(\frac{1}{N}\sum_{n=1}^{N} \exp\left(\alpha_t^{(n)} + \beta_{1t}^{(n)} \cdot X_{1it} + \beta_{2t}^{(n)} \cdot X_{2it} + \beta_{3t}^{(n)} \cdot X_{3it} + \beta_{5t}^{(n)} \cdot X_{5it} + v_{it}^{(n)} + b_{it}^{(n)}\right)\right) \qquad (14)$$

The resulting residual maps are presented in Fig. 8. Some dependence is observed for year 2002, with more clustering of residuals of the same sign, which might explain why the relative risk is overestimated for some municipalities. In particular, the relative risk for Mariño district is slightly overestimated for years 1990, 1992, 1993, 1994 and 2000. Moran's I statistic was used to test independence.
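The residual definition and Moran's I statistic described above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the weight matrix W is assumed to be a symmetric binary adjacency matrix, and the toy inputs are not the study data.

```python
import numpy as np

def posterior_mean_residuals(y, expected, log_rr_draws):
    """Residuals y - E * mean_n exp(linear predictor), for one year.

    log_rr_draws has shape (N, n_units): the log relative risk per posterior draw.
    """
    rr_mean = np.exp(np.asarray(log_rr_draws, dtype=float)).mean(axis=0)
    return np.asarray(y, dtype=float) - np.asarray(expected, dtype=float) * rr_mean

def morans_i(values, W):
    """Moran's I for one map of values and a spatial weight matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(W, dtype=float)
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Toy example: 4 units along a line, each adjacent to its immediate neighbours
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
I_trend = morans_i([1.0, 2.0, 3.0, 4.0], W)        # smooth trend: positive autocorrelation
I_altern = morans_i([1.0, -1.0, 1.0, -1.0], W)     # alternating signs: negative autocorrelation
```

Values of I near its expectation under independence, −1/(n−1), are compatible with spatially independent residuals; significance is commonly assessed with a permutation test or a normal approximation.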

[Figure 8: maps of Sucre state, one panel per year from 1990 to 2002, showing the estimated residuals by municipality in classes, with a national location inset and a graphic scale.]

Fig. 8 Estimated residuals for the selected model

5.3 Model checking for the selected model

A posterior predictive check verifies whether the proposed model is consistent with the observed data: samples from the posterior predictive distribution should be comparable with the observed data. Any systematic difference between simulations and the observed data might indicate a failure of the model to represent the observations.

Following this idea, 2000 replicates were simulated for the 15 municipalities of Sucre for each year. The Bayesian p-value was calculated from the simulations, and for most years the p-values were around 0.5. Fig. 9 shows an example of a good fit for Cajigal district in 1994: the histogram of the posterior predictive simulations, with the observed value for that year overlaid (dark vertical line). The Bayesian p-value is indicated at the bottom of the figure.

[Figure 9: histogram (vertical axis: Frequency) of the posterior predictive simulations for Cajigal municipality; p = 0.517.]

Fig. 9 Posterior predictive check for Cajigal district during year 1994 and Bayesian p-value
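A minimal sketch of this kind of check for a single municipality and year is given below. It assumes that the Bayesian p-value is computed as the proportion of replicated counts greater than or equal to the observed count, which is one common convention and may differ from the exact definition used in the paper. The posterior draws of the relative risk are simulated stand-ins and the function name is hypothetical.

```python
import numpy as np

def bayesian_p_value(y_obs, expected, rr_draws, rng):
    """Simulate one Poisson replicate per posterior draw and return (p, replicates)."""
    rr_draws = np.asarray(rr_draws, dtype=float)
    y_rep = rng.poisson(expected * rr_draws)    # posterior predictive counts
    p = float(np.mean(y_rep >= y_obs))          # share of replicates at or above the observation
    return p, y_rep

rng = np.random.default_rng(4)
# Stand-in for 2000 posterior draws of the relative risk of one district in one year
rr_draws = rng.lognormal(mean=np.log(1.5), sigma=0.1, size=2000)

p, y_rep = bayesian_p_value(y_obs=150, expected=100.0, rr_draws=rr_draws, rng=rng)
```

A p-value close to 0.5 means the observed count lies near the centre of the replicated distribution, while values close to 0 or 1 indicate that the model systematically under- or overpredicts that observation.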

5.3.1 Special case

For year 1997, a good fit was observed for only 8 of the 15 municipalities; the model underestimated the relative risk for 5 districts and overestimated it for 2 others. Fig. 10(a) shows a poor fit for the Cruz Salmerón Acosta district.

[Figure 10: two histograms (vertical axis: Frequency) of posterior predictive simulations: (a) p = 0; (b) p = 0.635.]

Fig. 10 Posterior predictive check for Cruz Salmerón Acosta district for year 1997 and Bayesian p-value in case of: (a) poor fit, and (b) good fit

Since the data for year 1997 were not well represented by the selected model (13) (see Fig. 10(a)), an alternative model was proposed in which the local heterogeneity parameter v follows a Student's t distribution with 2 degrees of freedom.


In this case, for each district i in year 1997 we have:

$$v_i \sim t\text{-Student}(1, \xi, 2), \qquad \xi \sim \mathrm{Gamma}(0.5, 0.005) \qquad (15)$$

Fig. 10(b) shows that the posterior predictive check for Cruz Salmerón Acosta district improves when a heavier-tailed distribution such as Student's t is used for the parameter v in year 1997.
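To give a sense of how much heavier these tails are, the following small sketch compares two-sided tail probabilities of a Student's t distribution with 2 degrees of freedom and a standard normal. Both are taken in standardized form (location 0, scale 1), which is a simplification of the parametrization in (15); the closed-form expression used for the t distribution holds only for 2 degrees of freedom.

```python
import math

def t2_two_sided_tail(c):
    """P(|T| > c) for Student's t with 2 degrees of freedom (closed form)."""
    return 1.0 - c / math.sqrt(2.0 + c * c)

def normal_two_sided_tail(c):
    """P(|Z| > c) for a standard normal variable."""
    return math.erfc(c / math.sqrt(2.0))

# Probability of a heterogeneity effect more than 3 scale units from the centre
tail_t2 = t2_two_sided_tail(3.0)
tail_normal = normal_two_sided_tail(3.0)
```

Under the t distribution with 2 degrees of freedom, deviations beyond 3 scale units are far more probable than under the normal, so an unusually extreme district in a given year can be accommodated by v instead of distorting the rest of the fit.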

5.4 Posterior probability distribution for some parameters of the selected model

Posterior distributions of parameters β3 and β5 for year 1998 are shown in Fig. 11. During this year the relative risk was highly influenced by variables X3 (associated with the percentage of population employed in agriculture) and X5 (annual monthly maximum precipitation). In general, malaria incidence increases in municipalities where living conditions are poor and basic services are lacking. These distributions are positive over most of their domain, indicating a positive relationship between these variables and the relative risk.

[Figure 11: posterior density plots of β3 (year 1998) and β5 (year 1998).]

Fig. 11 Posterior distributions of coefficients β3 and β5 for year 1998

6 Conclusions

Sucre state in general presents favorable conditions for the appearance and development of mosquitoes of the genus Anopheles, owing to factors such as environmental conditions, lack of strategies among the population to prevent the disease, and lack of governmental support for vector control.

Poverty conditions, agricultural activities (mostly of a subsistence nature) and climatic variability were considered as factors affecting malaria incidence during the period 1990−2002 in the 15 municipalities of Sucre state, as reflected in the final model given by equation (13).

Parameters v and b capture the spatial heterogeneity, and the b component (CAR term) plays an important role in representing the disease dynamics among neighboring districts. Model selection was carried out using the minimum posterior predictive loss (D criterion), which proved more informative than the DIC criterion.

Posterior predictive checks of the model using the Bayesian p-value suggested that the model is adequate to represent the relative risk at the district level for most years. In cases where the model fit was not adequate, as in year 1997, a heavier-tailed distribution for the v parameter, such as Student's t, improved the fit.

Of the 15 municipalities in Sucre state, 7 districts presented a relative risk greater than 1 for most years during the period 1990−2002. These districts, where a higher number of malaria cases was observed, are Ribero, Mariño, Libertador, Cajigal, Benítez, Arismendi and Andrés Mata. They are the worst in terms of quality of life, their population is mostly employed in agriculture, and they can be classified as highly malarious zones, with Cajigal district as the most critical case.

Posterior densities of the model parameters (equation 13) reflect the positive influence of agricultural activities and precipitation as covariates that increase the relative risk, followed by the poverty conditions of the population.

In this research, malaria cases are considered independent from year to year. A time-dependent model such as that used in [19] could be attempted to account for the potential influence of the previous year's cases on the temporal evolution of the disease. Other improvements could come from trying structures for the weight matrix W other than the binary form used in this research (see for example [11]). Other factors, such as malaria cases imported from outside Sucre state, might also be incorporated in the model structure as additional covariates or in the neighborhood conditions.

Data for this study were limited, since availability from governmental sources was very restricted. Updated information might contribute to a more current picture of the disease incidence in the state.

Acknowledgements The authors would like to acknowledge the National Fund of Scientific and Technological Research (FONACIT) for partially funding this research under project No. 2005-000184.

References

1. Banerjee, S., Carlin, B., P., Gelfand, A.: Hierarchical Modeling and Analysis for Spatial Data, Chapman & Hall / CRC, England (2003)

2. Barrera, R., Grillet, M., E., Rangel, Y., Berti, J., A, A.: Estudio eco-epidemiologico de la reintroduccion de la malaria en el nororiente de Venezuela mediante Sistemas de Informacion Geografica y Sensores Remotos, Bol. Mal. Sal. Amb, 38, pp.14-31 (1998)


3. Berti, M., J., Rivas, J., G.: Evaluacion de la Efectividad y Persistencia de una nueva formulacion de Bacillus sphaericus contra larvas de Anopheles aquasalis Curry (Diptera: Culicidae) en criaderos naturales del Estado Sucre, Venezuela, Bol. Mal. Sal. Amb, 44, pp.21-27 (2004)

4. Besag, J., Mollie, A.: Bayesian image restoration, with two applications in spatial statistics, Annals of the Institute of Statistical Mathematics, 43, pp.1-59 (1991)

5. Brown, M., Forsythe: Robust Tests for the Equality of Variances, Journal of the American Statistical Association, 69, pp.364-367 (1974)

6. Caceres, J., L., Vela, F., A.: Reporte Epidemiologico. Incidencia Malarica en Venezuela Durante el año 2002, Bol. Mal. Sal. Amb, pp.53-58 (2003)

7. Caceres, J., L.: Eficacia de la Cura Radical Masiva en la Incidencia Malarica del Municipio Marino, Estado Sucre, Journal of the American Statistical Association, 1, pp.45-49 (2004)

8. Caceres, J., L., Pizzo, N., Vela, F., Perez, W., Rojas, J., G., Mora, J., D., Sanchez, E., Paez, E., Botron, L., Rubio, N., Maldonado, A.: Impacto de la Cura Radical Masiva sobre la incidencia sobre la malarica del estado Sucre, Venezuela, Bol. Mal. Sal. Amb, 44, pp.51-55 (2005)

9. Clayton, D., G., Kaldor, J.: Empirical Bayes estimates of age-standardized relative risk for use in disease mapping, Biometrics, 43, pp.671-681 (1987)

10. De Smith, M., Longley, P., A.: Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools, Universidad Nacional de Colombia, Bogota (2006)

11. Ferreira, G., S., Schmidt, A., M.: Spatial modelling of the relative risk of dengue fever in Rio de Janeiro for the epidemic period between 2001 and 2002, Brazilian Journal of Probability and Statistics, 1, pp.29-47 (2005)

12. Gelfand, A., E., Ghosh, S., K.: Model choice: a minimum posterior predictive loss approach, Biometrika, 85, pp.1-11 (1998)

13. Gelman, A., Carlin, J., Stern, H., Rubin, D.: Bayesian Data Analysis, Chapman & Hall / CRC, London (2004)

14. La Malaria y su contexto Espacial. Caso de Estudio: El Estado Sucre en Venezuela, http://www.geogra.uah.es/inicio/web 11 confibsig/PONENCIAS/2-058%20Delgado-Ramos-Camardiel.pdf (2009). Accessed 26 June 2009

15. Le, N., D., Zidek, J.: Statistical Analysis of Environmental Space-Time Processes, Springer (2006)

16. Lawson, A., B., Browne, W., J., Rodeiro, C., L., V.: Disease Mapping with WinBUGS and MLwiN, Wiley, England (2003)

17. Mollie, A.: Bayesian mapping of disease. In: Markov Chain Monte Carlo in Practice, Chapman & Hall / CRC, England (2003)

18. Moran, P.: The interpretation of statistical maps, Journal of the Royal Statistical Society B, 10, pp.243-251 (1948)

19. Nobre, A., A., Schmidt, A., M., Lopes, H., F.: Spatio-temporal models for mapping the Incidence of malaria in Para, Environmetrics, 1, pp.291-304 (2005)

20. Poveda, G., Rojas, W., Quinones, M., L., Velez, I., D., Mantilla, R., I., Ruiz, D., Zuluaga, J., S., Rua, G., L.: Coupling between Annual and ENSO Timescales in the Malaria-Climate Association in Colombia, Environmental Health Perspectives, 5, pp.489-493 (2001)

21. Poveda, G., Rojas, W.: Evidencias de la Asociacion entre Brotes Epidemicos de la Malaria en Colombia y el Fenomeno El Nino-Oscilacion del Sur, Rev. Acad. Colomb. Cienc, 21, pp.421-427 (1997)

22. Ripley, B., D.: Spatial Statistics, Wiley & Sons, New York (1981)

23. Rodríguez, J.: Modelaje de la vulnerabilidad de la población Venezolana a eventos de lluvia extrema, Universidad Simón Bolívar, Tesis de Maestría, Departamento de Cómputo Científico y Estadística, Venezuela (2009)

24. Spiegelhalter, D., J., Best, N., Carlin, B., P., Van der Linde, A.: Bayesian measures of model complexity and fit (with discussion), J. Roy. Statist. Soc., Ser. B, 64, pp.583-639 (2002)

25. OMS. World malaria situation in 1993, part I. Weekly Epidemiological Record, 71, pp.17-22 (1996)

26. http://portal.unesco.org/education/es/ev.php-URL ID=36655&URL DO=DO TOPIC&URL SECTION=201.html (2008). Accessed 20 November 2008