Geostatistical Approaches to Change of Support Problems - Theoretical Framework

C. LAJAUNIE, H. WACKERNAGEL

Report No 19 of Contract IST-1999-11313

December 2000

Technical Report N–30/01/GENSMP - ARMINES, Centre de Géostatistique

35 rue Saint Honoré, F-77305 Fontainebleau, France

http://cg.ensmp.fr

Contents

Summary
1 Introduction
2 Geostatistical model-based approach
    2.1 Classical presentation of change-of-support models
    2.2 Basic principle of the geostatistical model
    2.3 Spatial models and support effect
3 Support effect models and data assimilation
    3.1 Gaussian model
    3.2 Lognormal model
4 Monte-Carlo approach
    4.1 Change of support using conditional simulations
    4.2 Conditioning upon the data
    4.3 Short scales behavior and truncations
    4.4 Monte-Carlo approximation
5 Application to AIRPARIF air quality data
    5.1 Presentation of 1999 data
    5.2 Future work
6 Conclusion
A An example: the Hermitian model
B Monte-Carlo approach - simulation of Gamma model
    B.1 Simulation by inverse Lévy measure
    B.2 Infinitely divisible process
    B.3 Simulation algorithm
    B.4 Conditioning to data
Bibliography

Summary¹

This report is a deliverable of the EC funded “IMPACT” IST project on “Estimation of Human Impact in Presence of Natural Fluctuations”².

Chapter 1 gives a general introduction.

Chapter 2 presents the geostatistical change of support model.

Chapter 3 discusses the importance of support effect models in data assimilation.

Chapter 4 presents a Monte Carlo simulation approach as an alternative to conventional change of support models.

Chapter 5 discusses first ideas in connection with the Paris area air pollution case study being set up within the IMPACT project.

Chapter 6 summarizes the conclusions of the report.

In the appendix, Chapter A gives details about the Hermitian change of support model and Chapter B discusses recent work by Wolpert and Ickstadt in the framework of a Gamma model.

¹ The report is available as a PDF file and is best viewed with Acrobat Reader, which makes it possible to take full advantage of the internal links.

² See the website: http://www.mai.liu.se/impact/ .

Chapter 1

Introduction

When dealing with spatial fields or time processes, it is necessary to assess the information carried

by the data in a precise way. It is acknowledged that data have a limited precision and therefore

measurement error models are frequently used to account for this. Another important aspect of data

which is less often considered is their support. It turns out that very often data can be considered as

spatial averages over fixed volumes, or time averages over intervals of fixed duration. This is obviously

the case for soil samples, at least as long as each sample, which has a well-defined volume, is properly mixed before analysis. But this is also true for air and water quality measurements, at least to a good approximation, because the analysis is based on the accumulation of material through filters during fixed time intervals and under specific flux conditions. This volume is termed the support of the data in the geostatistics literature.

Why should we be concerned about the support of data or, in other words, in which problems does the support play a significant role? Spatial fields generally have a large spectrum of variability,

including short scale components. This means that in spite of the fact that they have usually a long

range smooth tendency, they vary in a very erratic way at microscopic scale. The range of variability

of a spatial field or a time process can be analyzed for instance by Fourier spectrum, or equivalently

by looking at the spatial covariance. But this analysis is limited by the support of the data available.

Scales shorter than the data support are filtered out when only spatial averages at that volume are

considered. This means that a part of the natural dispersion is not reflected by the data. To sum up, while the estimation of the overall average is not biased by taking data at various supports, the dispersion is underestimated in a way which depends on the sample volume considered. This affects the variance in an easily predictable way, but also the extremes of the statistical distributions and, for instance, the proportions of values exceeding fixed thresholds. The change of distribution with the data volume (or with the averaging length in the case of time data) is termed the support effect in geostatistics.
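The support effect described above can be illustrated numerically. The following is only a minimal sketch and is not taken from the report: the skewed, short-range correlated series standing in for point-support data, the parameter values and the helper name `block_average` are all illustrative assumptions. The series is averaged over windows of increasing length, and the mean, the variance and the proportion above a fixed threshold are compared.

```python
import numpy as np

def block_average(series, length):
    """Average a 1-D series over consecutive, non-overlapping windows."""
    n = (len(series) // length) * length      # drop the incomplete last window
    return series[:n].reshape(-1, length).mean(axis=1)

rng = np.random.default_rng(0)

# Hypothetical "point-support" data: a lognormal transform of a
# short-range correlated Gaussian series (moving average of white noise).
noise = rng.standard_normal(20_000)
kernel = np.ones(5) / np.sqrt(5.0)            # keeps unit variance
gauss = np.convolve(noise, kernel, mode="valid")
point = np.exp(0.8 * gauss)

threshold = np.quantile(point, 0.95)          # fixed threshold, set at point support

summary = {}
for length in (1, 4, 24):
    avg = block_average(point, length)
    # (mean, variance, proportion of values above the threshold)
    summary[length] = (avg.mean(), avg.var(), (avg > threshold).mean())
    print(length, summary[length])
```

In such an experiment the mean should stay essentially unchanged, while the variance and the proportion of exceedances shrink as the averaging window grows.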

This effect has long been recognized in the literature on air quality data. In order to correct the distributions according to the integration time of the measurement devices, which varies from case to case, the well-known Larsen model was proposed [8]. It has been used as a standard to normalize air quality measurements relative to the integration time. However, this model is known

to rely on particular hypotheses, namely lognormality of the distributions and a de Wijs variogram

model for the logarithmic variance, which are by no means general, and since more general models

are available, the Larsen model can be considered as outdated (see [13], [18]).

The support effect is not only important when comparing statistical distributions of sample data,

but also when deriving estimates defined in terms of spatial averages. For instance, we might be

interested in estimating the conditional probability that the average over a specified area exceeds a

critical value. In agriculture, the probability that the average over a farming field is below a critical

threshold, given the sample data, might be a criterion to consider in order to decide if nutrients need

to be added to the soil. While the spatial averages can be estimated by classical kriging with no

need to model the support effect, and the kriging variance can be used to assess the precision of this

estimation, other quantities which depend on the conditional distribution, such as probabilities over

thresholds, require handling the support effect.

Another situation, of more direct concern for the objectives of the IMPACT project, is encountered in data assimilation. In this situation, a mechanistic model is available, which is supposed to be able to handle the main physical processes governing the dynamics of the natural system. As a rule, such models suffer from several shortcomings which limit their prediction capability. Among these are the requirements on initial and boundary conditions, which in practice are imprecisely known. The lack of knowledge of the physical parameters involved, and perhaps also model inadequacy in some respects, are others. Finally, sensitivity to initial conditions is known to be a more fundamental obstacle to such forecasts. On the other hand, as measurements become available with time, they can be used to improve the estimation of the state of the system provided by the model. This process of updating the state estimate is termed data assimilation in the literature.

The derivation of this estimate is based on the specification of a model linking the unknown state

to measurements. It can often reasonably be assumed that this link is adequately represented by a

Gaussian error model. It will be argued in this document that a much more appropriate representation

would be through a change of support model.

One obvious reason for this is that mechanistic models are aimed at representing the physical

process at the spatial resolution of the discretization grid. The shorter scales are simply not accounted

for, and the physical parameters are supposed to be known at the scale resolution required by the

model. On the other hand, the data are available at the measurement support, and this support is

usually considerably smaller. Moreover, the data are rarely at the same location as the nodes of the

discretization grid. This situation seems ideal for geostatistical change of support models.

Chapter 2

Geostatistical model-based approach

In environmental studies an important problem is to characterize the extent to which a prescribed environmental threshold is exceeded. This amount depends on the statistical distribution of the environmental variable at hand. The distribution will actually be significantly modified by a change of the support of the environmental variable. This phenomenon was at the heart of the development of geostatistics half a century ago. We begin by developing change-of-support models in their initial historical setting, which was in mining.

2.1 Classical presentation of change-of-support models

Geostatistical change of support models were originally set up to allow recoverable reserve estimation in mining applications. This problem arises in selective exploitation, in which an attempt is made to separate the economically valuable parts of the orebody from the rest in an effective way, at a fine level. This methodology is essentially the result of work done by Georges Matheron (see [9], [10] and [11] and the references mentioned there).

The problem can be described as follows. The orebody is first divided into panels. Each panel is then cut into blocks of equal size, this size being limited roughly by the dimension and maneuverability of the trucks used to move the minerals. Each block should be sent to the plant only if its grade is worth the treatment cost; otherwise it is left as waste. This leads to the definition of a cutoff grade $z_c$, and the recoverable tonnage of a panel is proportional to the percentage of blocks which have a grade above this cutoff. Furthermore, the selected quantity of material (metal) is proportional to the grade of the selected blocks. Hence, the following two quantities typically have to be estimated to obtain the economic estimation of the panel:

$$T(z_c) = \frac{1}{N} \sum_{i \in P} 1_{Z(V_i) > z_c} \tag{2.1}$$

$$Q(z_c) = \frac{1}{N} \sum_{i \in P} Z(V_i)\, 1_{Z(V_i) > z_c} \tag{2.2}$$

In these formulae, the $V_i$, $i \in P$, are the blocks within the panel $P$, and their number is $N = \mathrm{Card}(P)$. We denote by $1_A$ the indicator function, which is equal to 1 if the condition $A$ is satisfied and 0 otherwise. $Z(V_i)$ denotes the grade of the block $V_i$, which is the spatial average:

$$Z(V_i) = \frac{1}{|V_i|} \int 1_{x \in V_i}\, Z(dx)$$

We have written $Z(dx)$ instead of $Z(x)\,dx$ to emphasize the high spatial irregularity of grades, which implies that point values $Z(x)$ do not in general have a precise meaning.

Two points are worth mentioning concerning formulae (2.1) and (2.2):

1. The first is that the quantities involved in the formulae are at block support, $V$. This support usually has an order of magnitude of a few tens of m³, to be compared with the few liters of the core data support. In other words, a considerable change of support is involved in going from the data support to the support of interest for the estimation.

2. The second is that both quantities are non-linear transforms of the block grades. This means that we have to be concerned with the conditional distribution of the blocks, and not merely with their conditional expectation.
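The two recoverable quantities (2.1) and (2.2) are simple to evaluate once block grades are available. The sketch below is purely illustrative: the function name `recoverable` and the block grades are hypothetical, and in practice the block grades are unknown, which is precisely why a change of support model is needed.

```python
import numpy as np

def recoverable(block_grades, cutoff):
    """Return (T, Q) of equations (2.1)-(2.2) for the blocks of one panel."""
    grades = np.asarray(block_grades, dtype=float)
    selected = grades > cutoff              # indicator of "block grade above cutoff"
    tonnage = selected.mean()               # T(z_c): proportion of selected blocks
    metal = (grades * selected).mean()      # Q(z_c): grade carried by selected blocks
    return tonnage, metal

grades = [0.5, 1.2, 2.0, 0.8, 3.1]          # hypothetical block grades of a panel
T, Q = recoverable(grades, cutoff=1.0)
print(T, Q)
```

Because of the indicator, both quantities are non-linear in the block grades, so they cannot be obtained by plugging an estimated average grade into the formulae.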

2.2 Basic principle of the geostatistical model

The building block of the geostatistical approach to the handling of the support effect is a bivariate model aimed at describing pairs in which a sample is randomly located within a larger unit (block). We shall denote by $v$ and $V$ respectively the sample and the block volumes, and we shall assume that these two are compatible in the sense that $V$ can be partitioned into units equal to $v$ up to a translation. From this covering by a finite number of disjoint samples, a uniform random sample within $V$ can be considered. This can be done by labeling the samples and picking a label at random with equal probability. The random concentration of the block will be denoted by $Z(V)$, and that of the randomly located sample by $Z(v)$. The model we are looking for is thus the probability distribution of the pair $(Z(v), Z(V))$.

What can be assumed known about this distribution? Three guidelines can be used:

• The trend $\mu(x)$ and the covariance $C(h)$ can be estimated by standard methods. The first and second moments can be calculated from these, since:

$$\mu(V) = E[Z(V)] = \frac{1}{|V|} \int 1_{x \in V}\, \mu(x)\, dx \tag{2.3}$$

$$\sigma(V)^2 = E[(Z(V) - \mu(V))^2] = \frac{1}{|V|^2} \iint 1_{x \in V}\, 1_{y \in V}\, C(x, y)\, dx\, dy = C(V, V) \tag{2.4}$$

and the same holds for the mean and variance of $Z(v)$. Concerning the covariance we have:

$$\mathrm{Cov}\{Z(v), Z(V)\} = E[C(v, V)] = \frac{1}{N} \sum_i C(v_i, V) = \frac{1}{|V|^2} \int_V\!\int_V C(x, y)\, dx\, dy = \sigma(V)^2$$

Thus, the covariance between a randomly located sample and a block is equal to the block variance.

• Now, due to the additive nature of the variable, the concentration over $V$ is the arithmetic average of the sample concentrations. The consequence of this for the bivariate pair $(Z(v), Z(V))$ is the following:

$$E[Z(v) \mid Z(V)] = Z(V)$$

This relation, which is known as Cartier's relation, has profound consequences for the bivariate distribution model, despite its trivial appearance. In fact, it can be shown more generally that, for any pair of probability distributions $F_X$ and $F_Y$ defined on $\mathbb{R}^+$ (distributions of positive random variables), the following statements are equivalent:

1. One can find a pair of random variables $X$ and $Y$, having probability distributions $F_X$ and $F_Y$, which satisfy Cartier's relation $E[X \mid Y] = Y$.

2. For every convex function $\varphi$, we have:

$$E[\varphi(X)] = \int_0^\infty \varphi(x)\, F_X(dx) \;\ge\; \int_0^\infty \varphi(y)\, F_Y(dy) = E[\varphi(Y)]$$

3. For every cutoff $s \ge 0$, we have:

$$E[(X - s)^+] = \int_0^\infty (x - s)^+\, F_X(dx) \;\ge\; \int_0^\infty (y - s)^+\, F_Y(dy) = E[(Y - s)^+]$$

where we have used the standard notation $u^+ = u\, 1_{u > 0} = \max(u, 0)$.

4. If $x_c$ and $y_c$ give the same proportions for the distributions $F_X$ and $F_Y$ respectively, that is if $F_X(x_c) = F_Y(y_c)$, then we have:

$$E[X\, 1_{X \ge x_c}] = \int_{x_c}^\infty x\, F_X(dx) \;\ge\; \int_{y_c}^\infty y\, F_Y(dy) = E[Y\, 1_{Y \ge y_c}]$$

Note that if any of these statements is true then, by virtue of the first, we must have $E[X] = E[Y]$. Let $\mu$ be this common mean value. The second condition applied to the function $\varphi(x) = (x - \mu)^2$ implies that the variance of $X$ is not less than that of $Y$, and moreover that the same is true for any convex dispersion measure. The third means that the sum in excess of a cutoff always decreases as the support increases, and the last that, if a fixed proportion is selected, the recovered quantity is always smaller for the larger support. In other words, selectivity always decreases with increasing support.

• It can be expected that for very large supports, distributions tend to normality due to the central limit theorem. While the conditions that theoretically ensure this result do not necessarily hold in practice, this can be accepted as a heuristic and a guide in the design of the model.
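The first two guidelines can be checked by simulation. The following sketch rests on assumptions that are not part of the report: sample values are built as a hypothetical block-level lognormal factor times independent lognormal fluctuations, each block holds 16 samples, and the block value is their arithmetic average. By construction, a sample picked with a uniform random label then satisfies Cartier's relation, so the covariance identity and the ordering of the excess functions should appear empirically.

```python
import numpy as np

rng = np.random.default_rng(1)
n_blocks, n_per_block = 50_000, 16

# Hypothetical positive sample values: a shared block-level factor times
# independent within-block fluctuations (both lognormal).
common = rng.lognormal(mean=0.0, sigma=0.5, size=(n_blocks, 1))
local = rng.lognormal(mean=0.0, sigma=0.7, size=(n_blocks, n_per_block))
samples = common * local

block = samples.mean(axis=1)                       # block value: arithmetic average
pick = rng.integers(n_per_block, size=n_blocks)    # uniform random label per block
sample = samples[np.arange(n_blocks), pick]        # randomly located sample

cov_sample_block = np.cov(sample, block)[0, 1]     # should be close to var_block
var_block = block.var(ddof=1)
var_sample = sample.var(ddof=1)                    # should exceed var_block

def excess(values, s):
    """Empirical mean of (X - s)^+ = max(X - s, 0)."""
    return np.maximum(values - s, 0.0).mean()

cutoffs = [1.0, 2.0, 4.0]
excess_pairs = [(excess(sample, s), excess(block, s)) for s in cutoffs]
print(cov_sample_block, var_block, var_sample, excess_pairs)
```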

10 Geostatistical model-based approach

2.3 Spatial models and support effect

In spatial applications, estimation is often required at large support, and thus some form of change of support is needed. When only the average $Z(V)$ and the estimation variance are required, classical linear kriging is called for, and only the variogram is needed for this. But, as discussed in the introduction, when quantities which depend in a non-linear way on $Z(V)$ are to be estimated, such as $\varphi(Z(V))$ for a given function $\varphi$, change of support models are required. Two forms of estimators are classically used in geostatistics:

• Conditional expectation $E[\varphi(Z(V)) \mid Z(v_1), \ldots, Z(v_n)]$, which requires the specification of the multivariate distribution $\mathcal{L}(Z(V), Z(v_1), \ldots, Z(v_n))$.

• Disjunctive kriging, which has the following form:

$$\{\varphi(Z(V))\}^{dk} = \sum_i f_i(Z(v_i))$$

In this expression, the weighting functions $f_i$ have to be optimized in order to obtain the least estimation variance. The advantage of disjunctive kriging over conditional expectation is that it only requires the specification of the bivariate distributions:

$$\begin{cases} \mathcal{L}(Z(v_i), Z(v_j)) & i, j = 1, \ldots, n \\ \mathcal{L}(Z(V), Z(v_i)) & i = 1, \ldots, n \end{cases}$$

This in turn considerably enlarges the range of models which can be used. While the use of conditional expectation is in practice limited to models which are transforms of Gaussian ones, like the log-normal, disjunctive kriging can be used in much more general situations.

In either case, the whole domain considered, $D$, must be hierarchically subdivided into parts of equal shape and size, according to the following scheme:

1. Division of $D$:

$$D = \bigcup_k V_k \quad \text{where} \quad V_k \cap V_l = \emptyset \;\; (k \neq l)$$

the $V_k$ being translates of one another.

2. Each $V_k$ contains a variable number of samples $n_k \ge 0$, denoted here by $Z(v_i^k)$. These will be considered as randomly located in $V_k$ and independent of one another.

According to the model, the conditional distribution of the samples, given the blocks $Z(V_k)$, $k = 1, 2, \ldots$, is the product of the individual distributions $\mathcal{L}(Z(v_i^k) \mid Z(V_k))$, each conditioned only by the value of the block in which $v_i^k$ is located. In other words, the samples are conditionally independent, and each depends only on the associated block value. This applies as well to the conditional distribution of several samples in the same block $V_k$.

One limitation of this approach is that only a limited number of supports can be considered at the same time: $v$, $V$ and $D$ in the above description. A few more can be considered simultaneously if we are ready to include more levels in the hierarchy. In either case only a limited number of compatible supports can be considered at the same time. The Monte-Carlo approach to be described next makes it possible, to some extent, to overcome this limitation and to consider multiple supports simultaneously.

Chapter 3

Support effect models and data assimilation

Data assimilation is the process of updating the state estimation of a mechanistic model using measurements. When the model describes a time-evolving system, data assimilation is often performed sequentially. By this we mean that data taken at time $t$ are used to improve the state estimation at a given time step, but are not back-propagated (are not used to improve the estimation of the past states),

and will not be used any more in future time steps. Our description will be placed in this context, but

the ideas and the formalism apply as well to other assimilation schemes.

3.1 Gaussian model

We shall consider a discrete time Markovian system, which we shall write as:

$$Z_{n+1} = F(Z_n, \varepsilon_{n+1}) \tag{3.1}$$

where $\varepsilon_{n+1}$ is a random variable which cannot be predicted at time $t_n$, and is termed the innovation. Very often, it turns out that an additive form is appropriate, in which case we have:

$$Z_{n+1} = F(Z_n) + A\, \varepsilon_{n+1} \tag{3.2}$$

Moreover, if the innovation is assumed Gaussian, as is frequently done, the one-step-ahead conditional distribution is:

$$\mathcal{L}(Z_{n+1} \mid Z_n) = N[F(Z_n), A A^t] \tag{3.3}$$

The transformation $F$ usually represents a discretized form of a differential equation. It is not necessarily linear, and might depend on imprecisely known parameters, in which case we have an inverse problem to solve if we want to improve the parameter estimation.

Data taken at time $t_{n+1}$ usually depend in a linear way on the state at that time, and a frequently used error model is:

$$Y_{n+1} = H Z_{n+1} + B\, \eta_{n+1} \tag{3.4}$$

where $\eta_{n+1}$ is a standard Gaussian vector with uncorrelated components, independent of what happened until the measurement is taken.

Most often, measurements are assumed to be available at the nodes of the discretization grid of the model, and the matrix $H$ is accordingly just a selection matrix. Moreover, the error terms are assumed independent of one another, so that equation (3.4) can be rewritten as:

$$Y_{n+1}(x_i) = Z^i_{n+1} + \sigma_i\, \eta^i_{n+1}$$

The assumption of an unbiased measurement, in the sense that $E[\eta^i_{n+1}] = 0$, implies that:

$$E[Y_{n+1}(x_i) \mid Z^i_{n+1}] = Z^i_{n+1}$$

which is Cartier's relation. If we interpret $Z^i_{n+1}$ as a spatial average over the cell $V_i$ centered at the grid node, and the measurement $Y_{n+1}(x_i)$ as the point value at an unspecified random point within that cell, which is probably more realistic than assuming it located at a grid node, this can be written as:

$$Z^i_{n+1} = Z(t_{n+1}, V_i), \qquad Y_{n+1}(x_i) = Z(t_{n+1}, v_i)$$

This interpretation allows the specification of the error variances $\sigma_i^2 = \mathrm{Var}(\sigma_i\, \eta^i_{n+1})$, since:

$$\begin{cases} \mathrm{Cov}(Y_{n+1}(x_i), Z^i_{n+1}) = \mathrm{Cov}(Z(t_{n+1}, v_i), Z(t_{n+1}, V_i)) = \sigma_V^2 \\ \mathrm{Var}(Y_{n+1}(x_i)) = \sigma_V^2 + \sigma_i^2 \\ \sigma_i^2 = \sigma_v^2 - \sigma_V^2 = S^2[v \mid V] \end{cases}$$

This variance should thus be specified by the dispersion variance, which can be obtained from the variogram according to:

$$S^2[v \mid V] = \bar{\gamma}(V, V) - \bar{\gamma}(v, v)$$

where we used the following standard notation:

$$\bar{\gamma}(V, V) = \frac{1}{|V|^2} \int_V\!\int_V \gamma(x - y)\, dx\, dy$$

In practice, since the unconditional variances at the support $V$ are calculated by the filter, following in the linear case the equation:

$$\Sigma_{n+1} = F\, \Sigma_n\, F^t + A A^t$$

the model is non-stationary, and the variances $\mathrm{Var}(Z^i_{n+1}) = \Sigma_{n+1}(i, i)$ depend on the node $i$. If we apply a proportional effect model, which amounts to saying that the short scale spectrum is in every cell proportional to a reference spectrum, the dispersion variance has to be calculated from a reference variogram, and then corrected by the variance ratio to obtain the error variance.
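As a rough illustration of this recipe, the dispersion variance can be approximated numerically as the difference between the mean variogram values over the large and the small support. The sketch below is only indicative: the exponential reference variogram, the square supports, the discretization and all function names are assumptions made for the example, and `error_variance` encodes one possible reading of the variance-ratio correction (rescaling by the ratio of the node variance given by the filter to the reference block variance).

```python
import numpy as np

def exponential_variogram(h, sill=1.0, scale=10.0):
    """Hypothetical reference variogram: sill * (1 - exp(-h / scale))."""
    return sill * (1.0 - np.exp(-np.asarray(h) / scale))

def mean_variogram(side, variogram, n=12):
    """Approximate the mean variogram over a square of given side by an n x n point grid."""
    if side == 0.0:                       # point support: mean variogram is zero
        return 0.0
    centers = (np.arange(n) + 0.5) * side / n
    xx, yy = np.meshgrid(centers, centers)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return float(variogram(dist).mean())

def dispersion_variance(small_side, large_side, variogram):
    """Dispersion variance of the small support within the large one."""
    return mean_variogram(large_side, variogram) - mean_variogram(small_side, variogram)

def error_variance(s2_reference, node_variance, reference_variance):
    """Assumed proportional-effect correction: rescale by a variance ratio."""
    return s2_reference * node_variance / reference_variance

s2_point_in_cell = dispersion_variance(0.0, 6.0, exponential_variogram)
s2_small_in_cell = dispersion_variance(1.0, 6.0, exponential_variogram)
print(s2_point_in_cell, s2_small_in_cell)
```

The dispersion variance is largest for point measurements and decreases as the measurement support approaches the size of the grid cell.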

3.2 Lognormal model

The pure Gaussian case is relatively uninteresting, since the distributions at any support are obtained by merely applying a variance correction. More interesting models are obtained when the distributions are assumed log-normal:

$$Z(v_i) = m_i \exp\{ s_v^i\, Y_v^i - \tfrac{1}{2} (s_v^i)^2 \}$$

$$Z(V_i) = m_i \exp\{ s_V^i\, Y_V^i - \tfrac{1}{2} (s_V^i)^2 \}$$

The simpler log-normal model assumes that the pair $(Y_v^i, Y_V^i)$ is Gaussian with correlation coefficient $r$ and standard margins. A more general model, termed the Hermitian model, is outlined in the appendix. The variance relation $\mathrm{Cov}(Z(v_i), Z(V_i)) = \mathrm{Var}(Z(V_i))$ can be shown to impose the following equation:

$$s_v^i\, r = s_V^i$$

For data assimilation, the conditional distribution:

$$\mathcal{L}(Z(v_1), \ldots, Z(v_n) \mid Z(V_1), \ldots, Z(V_k))$$

must be specified. According to the model, these variables are conditionally independent and lognormal. The associated density is explicitly given by:

$$\mathcal{L}(Z(v) \mid Z(V)) \sim \prod_i f_i(z_i \mid Z(V)) = \left( \frac{1}{2\pi} \right)^{n/2} \prod_i \frac{1}{\sigma_i z_i} \exp\left\{ -\tfrac{1}{2}\, y_i(z_i)^2 \right\}$$

The log-normal parameters and transformation involved in this expression are given by:

$$\begin{cases} \sigma_i = s_v^i \sqrt{1 - r^2} \\ \mu_i = m_i \exp\{ s_v^i\, r\, Y_V^i - \tfrac{1}{2} (s_v^i)^2 r^2 \} \\ y_i(z) = \dfrac{\log(z / \mu_i) + \sigma_i^2 / 2}{\sigma_i} \end{cases}$$

and the conditioning variables are:

$$Y_V^i = \frac{\log(Z(V_i) / m_i) + (s_V^i)^2 / 2}{s_V^i}$$

These expressions make it possible to obtain the posterior distribution, and thus to do Bayesian data assimilation.
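A small numerical sketch may help to see how these pieces fit together for one sample and one block. It assumes the bivariate log-normal model with the constraint that the sample log-standard deviation times $r$ equals the block log-standard deviation; the function names and the parameter values are hypothetical. The check at the end integrates the conditional density numerically to confirm that it has unit mass and that its mean reproduces the block value, as Cartier's relation requires.

```python
import numpy as np

def block_gaussian_value(z_block, m, s_block):
    """Gaussian equivalent of the block value (inverse of the lognormal transform)."""
    return (np.log(z_block / m) + 0.5 * s_block**2) / s_block

def sample_log_density(z, z_block, m, s_sample, s_block):
    """Log density of a sample value z given the value of the block containing it."""
    r = s_block / s_sample                       # from the constraint s_sample * r = s_block
    y_block = block_gaussian_value(z_block, m, s_block)
    sigma = s_sample * np.sqrt(1.0 - r**2)       # conditional log-standard deviation
    mu = m * np.exp(s_sample * r * y_block - 0.5 * (s_sample * r) ** 2)  # conditional mean
    y = (np.log(z / mu) + 0.5 * sigma**2) / sigma
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma * z) - 0.5 * y**2

# Hypothetical parameter values, chosen only for the check below.
m, s_sample, s_block, z_block = 2.0, 0.8, 0.5, 3.0

# Integrate on a logarithmic grid: dz = z dt with t = log z.
t = np.linspace(np.log(z_block) - 6.0, np.log(z_block) + 6.0, 4001)
z, dt = np.exp(t), t[1] - t[0]
density = np.exp(sample_log_density(z, z_block, m, s_sample, s_block))
total_mass = np.sum(density * z) * dt            # integral of the density
conditional_mean = np.sum(density * z * z) * dt  # integral of z times the density
print(total_mass, conditional_mean)
```

For several samples, the joint conditional log density under the model is simply the sum of such terms, one per sample, each evaluated with the value of its own block.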

Chapter 4

Monte-Carlo approach

Monte-Carlo approaches for the change of support of Gaussian transformed random functions have

been considered for a long time (see [16]). Some twenty years ago computer cost was still a serious

limitation of the method, but the advent of fast and cheap computers makes this limitation less and

less relevant. Simulation based methods are now widely used, partly in the context of Bayesian geostatistics (see [3]). Recently, Monte-Carlo change of support was considered in [2], for a Gaussian transformed random function. More generally, the approach could benefit from the advances in conditional simulations of random fields, and other models can be considered. This chapter takes a closer

look at that perspective.

4.1 Change of support using conditional simulations

When conditional simulation algorithms capable of handling the data at hand are available, an obvious alternative to change of support models is simulation of the spatial field at fine scale. The desired quantities are then obtained by numerical integration, and conditional expectations can be approximated by repeated independent simulations and standard Monte-Carlo approximations.

Explicitly, if $Z_c^k(x)$, $k = 1, \ldots, m$, are independent conditional simulations of $Z(x)$, we have to:

1. approximate the spatial averages by numerical integration:

$$Z_c^k(V) = \frac{1}{|V|} \int_V Z_c^k(x)\, dx \approx \sum_i w_i\, Z_c^k(x_i)$$

2. and approximate the desired conditional expectations by averaging over the realizations:

$$E[h(Z(V)) \mid \text{data}] \approx \frac{1}{m} \sum_{k=1}^{m} h(Z_c^k(V))$$

The apparent simplicity of the method makes it very attractive, but a number of conditions have to be satisfied for it to succeed. Let us review these.
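The two steps can be sketched in a few lines, assuming that the conditional simulations are already available as an array with one row per realization and one column per grid node. The function names, the tiny hand-made array standing in for real conditional simulations, and the equal quadrature weights are assumptions made for the illustration.

```python
import numpy as np

def block_values(simulations, nodes, weights):
    """Step 1: weighted sums approximating the block average, one per realization."""
    return simulations[:, nodes] @ np.asarray(weights, dtype=float)

def conditional_expectation(simulations, nodes, weights, h):
    """Step 2: Monte-Carlo average of h(block value) over the realizations."""
    return float(np.mean(h(block_values(simulations, nodes, weights))))

# Hand-made stand-in for m = 3 conditional simulations on 4 grid nodes.
sims = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [0.0, 0.0, 0.0, 0.0]])
nodes = [1, 2]                # grid nodes falling inside the block
weights = [0.5, 0.5]          # equal quadrature weights summing to one

# Probability that the block average exceeds 2.0: h is an indicator function.
prob_exceed = conditional_expectation(sims, nodes, weights, lambda z: z > 2.0)
print(prob_exceed)
```

Any other function of the block value, for instance the excess over a threshold, can be passed as `h` in the same way.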

4.2 Conditioning upon the data

The first objective of the method is the generation of random fields which honor the available data. Pure Gaussian random functions can be elegantly conditioned on data which are linear transforms, like spatial averages or, more generally, convolutions. The idea underlying the algorithm is the following decomposition:

$$Y(x) = Y^k(x) + (Y(x) - Y^k(x))$$

where $Y^k(x) = \sum_i \lambda_i(x)\, Y(x_i)$ is the simple kriging estimate of $Y(x)$ and $Y(x) - Y^k(x)$ is the associated estimation error. We use the letter $Y$ here to denote a Gaussian random function, instead of the notation $Z$ used so far, for reasons that will become apparent later. The key property endowed by this Gaussian model is that the two terms in the above decomposition are independent random functions. Thus, if $Y^s(x)$ is another Gaussian function with the same distribution as $Y$ (same mean and same covariance), then the estimation error $Y^s(x) - \sum_i \lambda_i(x)\, Y^s(x_i)$ has the same distribution as the true error. This shows that a conditional simulation can be obtained using this simulated error, as in the following expression:

$$Y_c^s(x) = Y^k(x) + \Big( Y^s(x) - \sum_i \lambda_i(x)\, Y^s(x_i) \Big)$$

This procedure applies to any data which are linear transforms of $Y$, and not only to point data $Y(x_i)$ as considered above.
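The conditioning procedure for point data can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions: a zero-mean Gaussian function, a hypothetical exponential covariance, data located at grid nodes, and an unconditional simulation obtained by Cholesky factorization, which is only practical for small grids. The function names and the data values are invented for the example.

```python
import numpy as np

def exponential_covariance(a, b, scale=10.0):
    """Hypothetical covariance model C(h) = exp(-|h| / scale) in one dimension."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / scale)

def conditional_simulation(x_grid, data_idx, y_data, rng):
    """Condition a zero-mean Gaussian simulation on point data at grid nodes."""
    cov = exponential_covariance(x_grid, x_grid)
    # Unconditional simulation on the whole grid (Cholesky factorization).
    y_s = np.linalg.cholesky(cov) @ rng.standard_normal(len(x_grid))
    # Simple kriging weights: solve C_dd @ lam = C_dx for every grid node x.
    lam = np.linalg.solve(cov[np.ix_(data_idx, data_idx)], cov[data_idx, :])
    kriged_data = lam.T @ y_data             # kriging of the real data
    kriged_sim = lam.T @ y_s[data_idx]       # kriging of the simulated data
    return kriged_data + (y_s - kriged_sim)  # kriging estimate + simulated error

x_grid = np.arange(50, dtype=float)
data_idx = np.array([5, 20, 40])             # hypothetical data locations (grid nodes)
y_data = np.array([1.2, -0.4, 0.7])          # hypothetical Gaussian data values
y_c = conditional_simulation(x_grid, data_idx, y_data, np.random.default_rng(2))
print(y_c[data_idx])
```

At the data locations the kriging weights reduce to an indicator of the datum itself, so the simulated error vanishes there and the conditional simulation reproduces the data.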

Simulation of Gaussian anamorphosed spatial fields (this terminology is used to denote functions of the form $Z(x) = \varphi(Y(x))$ where $Y$ is, as above, a Gaussian spatial function) is standard when data are at point support and when $\varphi$ can be inverted. The values of $Y(x_i)$ are then known and we are back to the previous situation.

When data are still at point support but $\varphi$ cannot be inverted, Gibbs sampling can be used [4]. This situation is encountered for instance in the Gaussian representation of lithofacies, where $\varphi$ is the indicator function of a set, $Z(x) = 1_{Y(x) \in A}$. It should be noted that this solution comes at the price of an approximation, since Gibbs sampling is only an asymptotic method.

The case of data at non-point support is obviously even more complicated, since data now have the form:

$$Z(v) = \frac{1}{|v|} \int_v \varphi(Y(x))\, dx$$

Simulated annealing is a possibility for handling such complex constraints. It has been applied effectively to data which are quadratic means, a case which arises when dealing with seismic data ([6] and [7]). But again this is only an asymptotic method, and convergence is more difficult to assess than in the case of Gibbs sampling considered previously.

4.3 Short scales behavior and truncations

A second point to consider is truncation effects. Generally, truncation effects occur at high frequencies, because the simulation is calculated only on a finite grid, which by its discrete nature cannot account for short scales. In the case of the turning bands method, another asymptotic aspect to consider is the fact that isotropy and multinormality are only achieved for an infinite number of generated directions. Finally, in the case of the gamma random field considered in more detail in the appendix, the simulation requires an infinite number of components, and thus truncation occurs again.

The third point to discuss, which has obvious connections with the previous one, is the approximation of spatial integrals by finite sums, following a numerical integration procedure.

Because of these two approximations, a part of the support effect is not accounted for by Monte-Carlo simulations.

4.4 Monte-Carlo approximation

And finally, we have to consider the convergence of the Monte-Carlo procedure itself. The rate of convergence is known to be very poor, the error variance being proportional to the inverse of the number of simulations performed. The order of magnitude of the error is thus

$$\varepsilon \approx \sqrt{\mathrm{Var}(h(Z(V)))}\; \frac{1}{\sqrt{m}}$$

which is indeed very slow. A classical means of improving Monte-Carlo methods is importance sampling. This involves the use of an alternative simulation model in which the variance would be smaller. But importance sampling would require the density function at block support to be known. Another classical idea is to generate anti-correlated rather than independent samples to speed up the convergence. Neither of these techniques seems directly applicable to the generation of random fields.
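The slow convergence is easy to observe on a toy example. In the sketch below, the lognormal draws are only a hypothetical stand-in for the simulated values of the function of the block average, and `monte_carlo_estimate` is an illustrative helper returning the estimate together with its standard error (sample standard deviation divided by the square root of the number of draws).

```python
import numpy as np

def monte_carlo_estimate(values):
    """Return the Monte-Carlo mean and its standard error."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))

rng = np.random.default_rng(3)
draws = np.exp(rng.standard_normal(40_000))   # hypothetical simulated values

# Each fourfold increase of the number of draws roughly halves the standard error.
results = {m: monte_carlo_estimate(draws[:m]) for m in (100, 400, 1600, 6400)}
for m, (estimate, std_error) in results.items():
    print(m, estimate, std_error)
```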

A more realistic alternative to speed up convergence could be the addition of simulated conditioning points, for which the required densities are available. If these points are chosen in an appropriate way, for instance at the block centers, they can condition the simulations strongly enough to speed up convergence.

To conclude this section, we stress that Monte-Carlo methods offer advantages which can offset their computational complexity:

• They are asymptotically exact. By this we mean that once the spatial model is specified at point support, no modeling approximation is required to obtain the distributions at various supports. Only numerical approximations, which can to some extent be controlled, are necessary.

• They offer full flexibility concerning the geometry of the supports. No compatibility constraint exists, and various supports can be considered simultaneously.

Chapter 5

Application to AIRPARIF air quality data

Air quality data for the Paris region have been made available to the participants of the IMPACT project by the AIRPARIF association. Contact has been made with the Air Pollution group at Laboratoire de Météorologie Dynamique (LMD) so that the data can be supplemented with runs from the model

CHIMERE (see [14], [15] for a description and validation of the model and [12] for a use in sensitivity

analysis).

5.1 Presentation of 1999 data

This study considers only data from the year 1999, measured at 23 fixed stations for NO and NO2, as well as at 19 stations for O3. The location of the stations is shown in Figure 5.1.

The data are hourly measurements. Since our aim is only to illustrate the support and models, we have to select one variable to study. Which one to use? It is known that, due to the fast dynamics of the chemical reaction:

NO + O3 → NO2 + O2

ozone is consumed by this reaction as long as NO is available, and conversely. As a result, we expect these two to be highly anticorrelated, in the sense that they rarely coexist. Figure 5.2 shows that this is indeed the case. In this figure we have separated the plots by season, and also represented the days of the week by different characters, in order to explore these effects at the same time. This figure is limited to the data from the station labeled Paris 6, but the observed effects are typical.

[Figure 5.1: Analysers locations (overview panel and detail panel). Stations measuring both NOx and O3 are displayed in black, stations measuring NOx only are represented in red, and O3-only stations are in blue.]

It is interesting to analyze the periodicities in the data. Yearly and daily cycles are of course expected, and these will be referred to as seasonal and hourly effects. But in addition to this, we want to explore the day-of-week effect, since human activity depends on it and can be considered entirely responsible for this effect. To this end, we calculated the yearly averages for each specific Season × Day × Hour combination. The results for the Paris 6 station are displayed in Figures 5.3 to 5.5 for ozone, NO and NO2 respectively. On these graphs, presented in matrix form, each row represents a season and each column a day of the week. The seasonal effect appears clearly on these graphs, where higher O3 concentrations are observed in Summer. The maximum daily concentrations are observed around midday, an effect known to be due to thermal convection. As a consequence we observe a minimum of NO at the same time. The day-of-week effect is particularly evident in Winter and in Autumn: the increase in NO, as well as in NO2, from Sunday to Saturday is followed by a decrease, the minimum being observed on Sunday for this station. The accumulation effect of the air pollution is obvious if it is assumed that this decrease results from the reduced activity during the week-end.

To reduce the effect of this chemical reaction, we decided to study in future the sum NOx = NO + NO2, which can be expected to be more stable than either variable considered separately.

5.2 Future work

The planned work on these data is the following:

1. Estimate the three periodicities, that is, the season, day-of-week and hour effects. These will be treated subsequently as mean terms.

2. Tailor a change of support model in the time dimension. Empirical regularization of the data makes it possible to validate such a model.

3. Assume that spatial variability can be compared to time variation, and thus obtain a change of support model in the space domain.
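The first two steps of this plan can be sketched as follows. This is only an illustration on a synthetic hourly series (not the AIRPARIF data): the season convention, the function names and the block widths of 8 h and 24 h are assumptions made for the example.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pvariance

def season(month):
    """Meteorological season of a month (an assumed convention)."""
    return {12: "Winter", 1: "Winter", 2: "Winter",
            3: "Spring", 4: "Spring", 5: "Spring",
            6: "Summer", 7: "Summer", 8: "Summer"}.get(month, "Autumn")

def periodic_means(times, values):
    """Average the values within each (season, weekday, hour) cell."""
    cells = defaultdict(list)
    for t, z in zip(times, values):
        cells[(season(t.month), t.weekday(), t.hour)].append(z)
    return {key: mean(zs) for key, zs in cells.items()}

def regularize(values, width):
    """Empirical change of support in time: non-overlapping block averages."""
    n_blocks = len(values) // width
    return [mean(values[i * width:(i + 1) * width]) for i in range(n_blocks)]

# Synthetic hourly series for 1999, with hour and Sunday effects plus noise.
random.seed(0)
start = datetime(1999, 1, 1)
times = [start + timedelta(hours=h) for h in range(365 * 24)]
values = [40.0 + 20.0 * (8 <= t.hour <= 19) - 10.0 * (t.weekday() == 6)
          + random.gauss(0.0, 15.0) for t in times]

# Step 1: periodic effects, removed as mean terms.
means = periodic_means(times, values)
residuals = [z - means[(season(t.month), t.weekday(), t.hour)]
             for t, z in zip(times, values)]

# Step 2: variance of the residuals at supports of 1 h, 8 h and 24 h.
variances = {w: pvariance(regularize(residuals, w)) for w in (1, 8, 24)}
print(variances)
```

The decrease of the variance with the averaging window is the empirical quantity against which a change of support model in time would be validated.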

[Figure 5.2: four scatterplot panels, one per season (Winter, Spring, Summer, Autumn), for station Paris6; plotting symbols distinguish the days of the week (Sun-Mon, Tue-Wed, Thu-Fri, Sat); axis ticks from 0 to 200.]

Figure 5.2: Scatterplots of NO and O3 hourly concentrations at Paris 6, during the year 1999. As expected, when one element is present, the other one is almost absent.

[Figures 5.3 to 5.5: matrices of panels, one row per season and one column per day of the week (Sunday to Saturday); horizontal axis: hour of day.]

Figure 5.3: Hourly averages of O3 at the Paris 6 station.

Figure 5.4: Hourly averages of NO at the Paris 6 station.

Figure 5.5: Hourly averages of NO2 at the Paris 6 station.

Chapter 6

Conclusion

In this report, only a general theoretical review has been given. Change of support can be viewed as a statistical change of scale for additive variables, and there could be interesting connections with self-similar models. We did not attempt to explore these connections here. We preferred to review briefly the existing approaches, in order to identify the needs concerning support effect models in the course of the IMPACT project. We emphasized the interest of change of support models when a simulation based on a physical model is compared to data, as is the case in data assimilation. The situation is probably relevant to other problems in which such comparisons are attempted.

Appendix A

An example: the Hermitian model

In this section we describe a fairly general model for continuous distributions. It should be mentioned that alternative models exist: some have continuous marginals, others are discrete, and mixed models exist as well. This should therefore be considered as an illustrative example.

The starting point of this model is a Gaussian representation of the concentrations at sample support. It is assumed that a monotonic transform is known, termed anamorphosis, $\varphi : \mathbb{R} \to \mathbb{R}^+$, such that $Z(v) = \varphi(Y_v)$, where $Y_v$ is a standard Gaussian variable. This transform is determined from the distribution of $Z(v)$ by:

$$F_v(z) = P[Z(v) < z] = G[\varphi^{-1}(z)]$$

where $G$ is the standard Gaussian distribution function. Thus we have $\varphi = (F_v)^{-1} \circ G$, which is always possible when $F_v$ is continuous. The distribution at the larger support $V$ is then continuous as well, and we have the same representation $Z(V) = \varphi_V(Y_V)$ with a transform denoted $\varphi_V$. It should be understood that $Y_V$ is by no means the spatial average of $Y_v$, since the average operates on the $Z$ scale and not on the Gaussian transform. This is the reason why we use the notation $Y_V$ instead of $Y(V)$.

The model assumption concerns the bivariate pair $(Y_v, Y_V)$, where $Z(v) = \varphi(Y_v)$. It will be assumed that this pair follows a mixture of standard bivariate Gaussian distributions, termed the Hermitian model. Specifically, if $g_r(u, v)$ is the density of a standard Gaussian pair with correlation $r$, the Hermitian distribution can be expressed as:

$$g_h(u, v) = \int g_r(u, v)\, \omega(dr)$$

where $\omega$ is a distribution on $[-1, +1]$. In practice negative correlations are very unrealistic, and the domain of $\omega$ can be assumed restricted to $[0, 1]$. The introduction of the mixing offers a higher level of flexibility compared to the pure Gaussian model (obtained by taking $\omega = \delta_r$). For instance, if $\omega = p\,\delta_0 + (1 - p)\,\delta_1$, we obtain the mosaic distribution, for which $Y_v$ and $Y_V$ are independent with probability $p$, and equal with probability $1 - p$. The change of support generated by this bivariate model is in fact the affine change of support, according to which block and sample densities are the same up to an affine rescaling.¹ Models in which $\omega$ is a beta distribution have been considered by L. Hu [5].

¹ The terminology in use is unfortunately a bit confusing, for what is termed mosaic change of support model is in fact something different.

Calculations are greatly simplified by considering Hermite polynomials:

$$\eta_n(y) = \frac{1}{\sqrt{n!}}\, \frac{g^{(n)}(y)}{g(y)}$$

In this formula, $g$ is the standard Gaussian density, and $g^{(n)}$ its derivative of order $n$. It turns out that these expressions are indeed polynomials, which can be calculated from a three-term recurrence. Hermite polynomials are standardized and orthogonal relative to the Gaussian distribution:

$$\int \eta_n(y)\, g(y)\, dy = 0 \quad (n > 0), \qquad \int \eta_n(y)\, \eta_m(y)\, g(y)\, dy = \delta_{nm}$$

Moreover, the orthogonality also holds for the bivariate Gaussian distribution:

$$\int\!\!\int \eta_n(y)\, \eta_m(y')\, g_r(y, y')\, dy\, dy' = \delta_{nm}\, r^n$$

This result is a consequence of the following relation concerning the conditional expectation of a Hermite polynomial:

$$\text{if } (Y, Y') \sim g_r \text{ then } E[\eta_n(Y) \mid Y'] = r^n\, \eta_n(Y')$$

In other words, the Hermite polynomials are the factors of the Gaussian model. A consequence of this is the following expression of the bivariate density:

$$g_r(y, y') = g(y)\, g(y') \sum_{k \ge 0} r^k\, \eta_k(y)\, \eta_k(y')$$

This development holds for $|r| < 1$ only.
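As a numerical check of the definition and of the orthonormality relations, the sketch below builds the polynomials by a three-term recurrence and integrates their products against the Gaussian density. The recurrence is the one implied by the normalization $g^{(n)}/(\sqrt{n!}\, g)$, for which $\eta_0 = 1$ and $\eta_1(y) = -y$; the function names and the integration grid are assumptions made for the example.

```python
import math

def hermite(n_max, y):
    """Return [eta_0(y), ..., eta_n_max(y)] by the three-term recurrence.

    With eta_n = g^(n) / (sqrt(n!) g): eta_0 = 1, eta_1 = -y and
    eta_{n+1} = -(y eta_n + sqrt(n) eta_{n-1}) / sqrt(n + 1).
    """
    eta = [1.0, -y]
    for n in range(1, n_max):
        eta.append(-(y * eta[n] + math.sqrt(n) * eta[n - 1]) / math.sqrt(n + 1))
    return eta[:n_max + 1]

def gauss_density(y):
    """Standard Gaussian density g(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def inner_product(n, m, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal approximation of the integral of eta_n * eta_m * g."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        eta = hermite(max(n, m), y)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * eta[n] * eta[m] * gauss_density(y)
    return total * h

# Gram matrix of the first five polynomials: close to the identity matrix.
gram = [[inner_product(n, m) for m in range(5)] for n in range(5)]
for row in gram:
    print(["%.6f" % value for value in row])
```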

These expressions translate readily to the Hermitian model. If now the pair $(Y_v, Y_V)$ follows a Hermitian model, we have the following expressions, in which $E[R^n] = \int r^n\, \omega(dr)$:

$$E[\eta_n(Y_v) \mid Y_V] = E[R^n]\, \eta_n(Y_V) \qquad \text{(A.1)}$$

$$E[\eta_n(Y_V) \mid Y_v] = E[R^n]\, \eta_n(Y_v) \qquad \text{(A.2)}$$
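Relations (A.1) and (A.2) imply that the covariance of the n-th factors of the pair equals the n-th moment of the mixing distribution. The following Monte-Carlo sketch checks this for n = 1 and n = 2 with a beta mixing distribution; the parameters Beta(4, 2), the sample size and the function names are arbitrary assumptions made for the illustration.

```python
import math
import random

def sample_hermitian_pair(rng, a, b):
    """Draw one pair: correlation R ~ Beta(a, b), then a Gaussian pair given R."""
    r = rng.betavariate(a, b)
    y_block = rng.gauss(0.0, 1.0)
    y_point = r * y_block + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
    return y_point, y_block

def eta1(y):
    """First Hermite polynomial under the g'/g normalization."""
    return -y

def eta2(y):
    """Second Hermite polynomial, (y^2 - 1) / sqrt(2)."""
    return (y * y - 1.0) / math.sqrt(2.0)

rng = random.Random(1)
a, b = 4.0, 2.0
n_samples = 100_000
pairs = [sample_hermitian_pair(rng, a, b) for _ in range(n_samples)]

# Empirical covariances of the factors of order 1 and 2.
cov1 = sum(eta1(u) * eta1(v) for u, v in pairs) / n_samples
cov2 = sum(eta2(u) * eta2(v) for u, v in pairs) / n_samples

# Moments of the Beta(a, b) distribution, for comparison.
moment1 = a / (a + b)
moment2 = a * (a + 1.0) / ((a + b) * (a + b + 1.0))
print(cov1, moment1)
print(cov2, moment2)
```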

The interest of these expressions follows from the fact that Hermite polynomials allow the representation of functions having a finite second-order moment relative to the Gaussian density: if

$$\int \varphi(y)^k\, g(y)\, dy < \infty \quad \text{for } k = 1, 2$$

then, $Y$ being a standard Gaussian variable, $Y \sim N[0, 1]$:

$$\varphi(Y) = \sum_k \varphi_k\, \eta_k(Y) \quad \text{for } \varphi_k = E[\varphi(Y)\, \eta_k(Y)]$$

this development being convergent in quadratic mean. Thus, if $Z(v)$, $Z(V)$ satisfy Cartier's relation, and if they are transformed from $Y_v$, $Y_V$, which follow a Hermitian model with parameters $E[R^n]$, the transform $\varphi_V$ can be obtained:

$$E[Z(v) \mid Z(V)] = Z(V) = \varphi_V(Y_V) = \sum_k \varphi^V_k\, \eta_k(Y_V)$$

But, since $\varphi$ and $\varphi_V$ are monotonic, conditioning on $Z(V)$ is equivalent to conditioning on $Y_V$, and we have, using (A.1):

$$E[Z(v) \mid Z(V)] = \sum_k \varphi_k\, E[\eta_k(Y_v) \mid Z(V)] = \sum_k \varphi_k\, E[R^k]\, \eta_k(Y_V)$$

Uniqueness of the expansion on the factors legitimates the identification:

$$\varphi^V_k = \varphi_k\, E[R^k]$$

Assuming $\varphi$ and $\omega$ known, the distribution at the larger support can thus be calculated, since it is fully characterized by the coefficients $\varphi^V_k$. The coefficients $\varphi_k$ can be obtained from the distribution at sample support, provided a stationarity assumption is legitimate. As an alternative, drift estimation and removal could be carried out. But since it is known that the residual variance underestimates the underlying variance, this should be attempted only in case of strong evidence of drift, or of marked periodicity. Concerning the distribution $\omega$, the variance relation:

$$\mathrm{Var}\{Z(V)\} = \sum_{k \ge 1} \varphi_k^2\, \{E[R^k]\}^2$$

gives another constraint, which is enough to fully specify the model in the pure Gaussian case or in the mosaic case, for only one degree of freedom remains. In the case of the beta distribution, which has two parameters, one degree of freedom remains arbitrary.
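To make the use of the variance relation concrete, here is a minimal sketch for the pure Gaussian case, where $E[R^k] = r^k$. It assumes, purely for illustration, a lognormal anamorphosis $\varphi(y) = m \exp(\sigma y - \sigma^2/2)$, whose coefficients under the normalization used here are $\varphi_k = m (-\sigma)^k / \sqrt{k!}$; the values of $m$, $\sigma$ and the variance ratio of 0.5 are arbitrary assumptions.

```python
import math

def lognormal_coefficients(m, sigma, order):
    """phi_k = E[phi(Y) eta_k(Y)] for phi(y) = m * exp(sigma * y - sigma^2 / 2)."""
    return [m * (-sigma) ** k / math.sqrt(math.factorial(k))
            for k in range(order + 1)]

def block_variance(phi, r):
    """Variance of Z(V) in the pure Gaussian case: sum of (phi_k r^k)^2, k >= 1."""
    return sum((phi[k] * r ** k) ** 2 for k in range(1, len(phi)))

def solve_r(phi, target, tol=1e-12):
    """Bisection on [0, 1]; block_variance is increasing in r."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if block_variance(phi, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

m, sigma = 50.0, 0.8
phi = lognormal_coefficients(m, sigma, order=40)
point_variance = block_variance(phi, 1.0)       # variance at sample support
r = solve_r(phi, 0.5 * point_variance)          # assumed block/point variance ratio
phi_block = [c * r ** k for k, c in enumerate(phi)]  # coefficients at support V
print(r, point_variance, block_variance(phi, r))
```

In this particular case the block transform is again lognormal, with logarithmic standard deviation $r\sigma$, which gives a closed form against which the bisection can be checked.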

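Concerning convergence towards normality, let us first observe that in the pure Gaussian case, since $\sigma_V^2 = \sum_{k \ge 1} \varphi_k^2\, r^{2k}$ is monotonic in $r$ on $[0, 1]$, when $\sigma_V \to 0$ we have $r \to 0$ as well. Hence, since

$$\varphi_V(Y) = \varphi_0 + \varphi_1\, r\, \eta_1(Y) + O(r^2)$$

we can expect asymptotic normality of $Z(V)$ for large support (with the normalization used here, $\eta_1(Y) = -Y$, which is again a standard Gaussian variable). More precisely, we shall consider the distribution of the standardized variable:

$$\frac{Z(V) - \varphi_0}{r\, \varphi_1} = \eta_1(Y) + \sum_{k \ge 2} \frac{\varphi_k}{\varphi_1}\, r^{k-1}\, \eta_k(Y)$$

where $Y \sim N[0, 1]$. We shall see that the above sum tends in probability to zero. We thus have to prove that, for all $\varepsilon > 0$:

$$P\left[\, \left| \sum_{k \ge 2} \frac{\varphi_k}{\varphi_1}\, r^{k-1}\, \eta_k(Y) \right| > \varepsilon \right] \to 0$$

In fact, the variance is simply:

$$\mathrm{Var}\left\{ \sum_{k \ge 2} \frac{\varphi_k}{\varphi_1}\, r^{k-1}\, \eta_k(Y) \right\} = \sum_{k \ge 2} \left( \frac{\varphi_k}{\varphi_1} \right)^2 r^{2k-2} \le r^2 \sum_{k \ge 2} \left( \frac{\varphi_k}{\varphi_1} \right)^2$$

This quantity tends to zero, by virtue of the fact that the variance at small support is $\sigma_v^2 = \sum_{k \ge 1} \varphi_k^2$, and thus $\sum_{k \ge 2} (\varphi_k / \varphi_1)^2$ is bounded.

Convergence in probability follows by application of Chebyshev's inequality.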

On the other hand, let us now consider the mosaic distribution, $\omega(dr) = p\, \delta_0(dr) + (1 - p)\, \delta_1(dr)$. The variance condition imposes $p \to 1$ when $\sigma_V \to 0$, and since the coefficients are such that:

$$\varphi^V_0 = \varphi_0, \qquad \varphi^V_k = (1 - p)\, \varphi_k \quad (k > 0)$$

we have $\sigma_V^2 = (1 - p)^2 \sum_{k \ge 1} \varphi_k^2 = (1 - p)^2\, \sigma_v^2$, and the parameter $p$ is therefore determined by the variances:

$$(1 - p) = \frac{\sigma_V}{\sigma_v}$$

The transformation at the larger support is given by:

$$Z(V) = m + \frac{\sigma_V}{\sigma_v} \sum_{k \ge 1} \varphi_k\, \eta_k(Y_V) = m + \frac{\sigma_V}{\sigma_v}\, (\varphi(Y_V) - m)$$

and the distributions are such that:

$$Z(V) - m \sim \frac{\sigma_V}{\sigma_v}\, (Z(v) - m)$$

In other words, according to this model, the distribution at large support is obtained merely by shrinking the sample distribution around the mean. The algorithm resulting from the application of this model is termed affine correction, and it is known to perform poorly. In particular, convergence towards normality obviously cannot be satisfied by this model.

We have just seen that asymptotic normality cannot be obtained by arbitrary choices of Hermitian models. A sufficient condition is the convergence $E[R^k] / E[R] \to 0$, uniformly in $k \ge 2$.
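The lack of convergence towards normality under the affine correction can be seen on a small example: the correction rescales the variance but leaves shape statistics such as the skewness unchanged. The sketch below uses a synthetic lognormal sample and an assumed variance ratio of 0.25; both are arbitrary choices for the illustration.

```python
import random
from statistics import mean, pstdev

def affine_correction(samples, variance_ratio):
    """Shrink the samples around their mean; the variance is scaled by the ratio."""
    m = mean(samples)
    scale = variance_ratio ** 0.5          # sigma_V / sigma_v
    return [m + scale * (z - m) for z in samples]

def skewness(values):
    """Standardized third moment."""
    m, s = mean(values), pstdev(values)
    return mean(((z - m) / s) ** 3 for z in values)

random.seed(2)
point_values = [random.lognormvariate(3.0, 0.8) for _ in range(20_000)]
block_values = affine_correction(point_values, variance_ratio=0.25)

print(pstdev(block_values) / pstdev(point_values))     # ratio of standard deviations
print(skewness(point_values), skewness(block_values))  # identical shape
```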

Appendix B

Monte-Carlo approach - simulation of Gamma model

This section is based mainly on a collection of papers published by Robert Wolpert and Katja Ickstadt (see for instance [20], [19], and [1]). The applications considered by these authors concern principally epidemiology, where data can be considered as Poisson distributed, with an intensity determined by a latent spatial field to be estimated. The data augmentation algorithm they propose is particularly tailored to this situation. Conditional simulation, where the data are spatial averages at various supports, is a somewhat different problem.

B.1 Simulation by inverse Lévy measure

Let us recall that a random variable follows an infinitely divisible distribution if it can be broken into an arbitrary number of independent, identically distributed random variables. So for each integer $n > 0$, the variable $X$ can be decomposed into:

$$X = X_1 + X_2 + \dots + X_n$$

where the $X_i$ are independent and identically distributed. Applied to spatial fields, if we ignored spatial correlation, the distributions at every support should be infinitely divisible. In practice, correlation is more often the rule than the exception, and the argument does not rest on such firm foundations. Nevertheless, spatial fields with infinitely divisible distributions remain very interesting from the point of view of coherence.

It is known from Lévy's representation theorem that positive infinitely divisible distributions, if their mean is normalized in a suitable way, have a positive Laplace transform satisfying the formula:

$$-\log E[e^{-sX}] = \int_0^\infty (1 - e^{-su})\, \nu(du)$$

where $\nu$ is a positive measure, termed Lévy's measure, satisfying:

$$\int_0^\infty \min(1, u)\, \nu(du) < \infty.$$

This condition ensures the convergence of the above integral, but in general the measure of a neighborhood of the origin is not finite. Let $\nu_c(u) = \nu[u, \infty[$, which is finite for $u > 0$. Lévy's theorem being granted, $X$ has a representation in terms of a Poisson process, which we now describe.

Let $H$ be a Poisson process on $\mathbb{R} \times \mathbb{R}^+$ with intensity $dt \times \nu(du)$. By this we mean that for any disjoint measurable sets $A_i \subset \mathbb{R} \times \mathbb{R}^+$, the random variables $H(A_i)$ are independent Poisson variables with parameters $\lambda_i = \int_{A_i} dt \times \nu(du)$. Moreover, given the number $H(A)$, the points of the process in $A$ are distributed independently, following the probability

$$\frac{1_{(t,u) \in A}\; dt \times \nu(du)}{(dt \times \nu)(A)}$$

Then we shall see that the process defined for $t > 0$ by:

$$X_t = \int_0^t \!\! \int_0^\infty u\, H(dt', du) = \sum_i U_i\, 1_{T_i \in [0, t[}$$

is such that $X_1$ is infinitely divisible, with Lévy measure $\nu$. In the above formula, $(U_i, T_i)$ denote the points of the process $H$.

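Proof: Let $\varepsilon > 0$, and let $X_t^\varepsilon$ be the truncated integral:

$$X_t^\varepsilon = \int_0^t \!\! \int_\varepsilon^\infty u\, H(dt', du).$$

Then, since $\nu_c(\varepsilon)$ is finite, there is a finite number $N_t^\varepsilon$ of points of the process in the set $[0, t[ \times [\varepsilon, \infty[$. The Laplace transform:

$$E[e^{-s X_t^\varepsilon}] = E\Big[\exp\Big\{-s \sum_i U_i\, 1_{T_i < t}\, 1_{U_i > \varepsilon}\Big\}\Big]$$

can be calculated by conditioning on $N_t^\varepsilon$, since the points follow, independently of one another, the distribution:

$$(T, U) \sim P(dt', du) = \frac{dt'}{t}\, \frac{\nu(du)}{\nu_c(\varepsilon)}\; 1_{t' \in [0, t[}\; 1_{u \in [\varepsilon, \infty[}$$

We have thus:

$$\begin{aligned}
E\big[e^{-s X_t^\varepsilon}\big] &= E\left[ \left( \int_\varepsilon^\infty e^{-su}\, \frac{\nu(du)}{\nu_c(\varepsilon)} \right)^{N_t^\varepsilon} \right] \\
&= e^{-\nu_c(\varepsilon)\, t} \sum_k \frac{(\nu_c(\varepsilon)\, t)^k}{k!} \left( \int_\varepsilon^\infty e^{-su}\, \frac{\nu(du)}{\nu_c(\varepsilon)} \right)^k \\
&= \exp\left\{ -\nu_c(\varepsilon)\, t + t \int_\varepsilon^\infty e^{-su}\, \nu(du) \right\} \\
&= \exp\left\{ t \int_\varepsilon^\infty (e^{-su} - 1)\, \nu(du) \right\}
\end{aligned}$$

Hence, we obtain the log Laplace transform:

$$-\log\big(E\, e^{-s X_1^\varepsilon}\big) = \int_\varepsilon^\infty (1 - e^{-su})\, \nu(du)$$

and in the limit $\varepsilon \to 0$, we have the desired result.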

Simulation: A simple simulation method can be derived by observing that the projections on the $u$ axis of the process points in the band $t' \in [0, t[$ follow a Poisson process with intensity $t\, \nu(du)$:

$$U_i \text{ such that } 0 \le T_i < t \text{ have distribution } U_i \sim \mathrm{Po}(t\, \nu)$$

The associated $T_i$ are uniformly distributed in the interval $[0, t[$. The first step requires a method to simulate a Poisson process with intensity $t\, \nu$. The key to this is the following change of variable:

$$y = t\, \nu[u, \infty[ \;=\; t\, \nu_c(u), \qquad dy = -t\, \nu(du)$$

which transforms a process $\mathrm{Po}[t\, \nu]$ into a standard Poisson process (independence of counts over disjoint sets is ensured by the monotonicity of the transform, and if $N_Y(dy) \sim \mathrm{Po}[dy]$, then $N_U(du)$ is also Poisson distributed with parameter $t\, \nu(du)$). Hence the algorithm:

$$Y_i \sim \mathrm{Po}(1), \qquad U_i = \nu_c^{-1}\!\left( \frac{Y_i}{t} \right), \qquad T_i \sim U[0, t[$$

where $\nu_c^{-1}$ is the inverse of $\nu_c$.
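The algorithm can be sketched in a few lines. To keep the inverse tail function in closed form, the example assumes the Lévy measure $\nu(du) = a\, u^{-1-a}\, du$ with $a = 0.5$ (a positive stable law, not the gamma model of this appendix), for which $\nu_c(u) = u^{-a}$ and $-\log E[e^{-sX_1}] = \Gamma(1-a)\, s^a$. The truncation to 200 jumps, the number of replications and the function names are also assumptions made for the example.

```python
import math
import random

def simulate_jumps(inverse_tail, t, n_jumps, rng):
    """Inverse Levy measure algorithm: the n_jumps largest jumps on [0, t[.

    inverse_tail is the inverse of the tail function nu_c(u) = nu([u, inf[).
    Returns a list of (T_i, U_i), with the jump sizes U_i in decreasing order.
    """
    jumps, y = [], 0.0
    for _ in range(n_jumps):
        y += rng.expovariate(1.0)       # Y_i: points of a standard Poisson process
        u = inverse_tail(y / t)         # U_i = nu_c^{-1}(Y_i / t)
        time = rng.uniform(0.0, t)      # T_i uniform on [0, t[
        jumps.append((time, u))
    return jumps

A_STABLE = 0.5                          # assumed index of the illustrative measure

def stable_inverse_tail(y):
    """Inverse of nu_c(u) = u^(-a)."""
    return y ** (-1.0 / A_STABLE)

rng = random.Random(3)
s, n_rep = 1.0, 4000
laplace = 0.0
for _ in range(n_rep):
    x1 = sum(u for _, u in simulate_jumps(stable_inverse_tail, 1.0, 200, rng))
    laplace += math.exp(-s * x1) / n_rep

# Value given by the Levy representation for this measure.
expected = math.exp(-math.gamma(1.0 - A_STABLE) * s ** A_STABLE)
print(laplace, expected)
```

Since the jumps are generated in decreasing order of size, stopping after a fixed number of points discards only the smallest ones.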

B.2 Infinitely divisible process

We consider only positive spatial processes, and we denote them by $Z$. For test functions having the form $\phi = \sum_i \phi_i\, 1_{A_i}$, where the $A_i$ are disjoint measurable sets, we write $Z(\phi) = \sum_i \phi_i\, Z(A_i)$. This notation is extended to test functions which are limits of this form when possible. The Laplace functional, which is defined by:

$$L(\phi) = -\log E[e^{-Z(\phi)}]$$

can be used to characterize positive spatial fields, in the same way as the Laplace transform characterizes the distribution of positive random variables. It turns out that infinitely divisible positive spatial processes, with independence over disjoint sets, have a Laplace functional which can be written as:

$$L(\phi) = \int (1 - e^{-u \phi(x)})\, \nu(dx, du)$$

where $\nu$ satisfies some constraints, analogous to the ones specified in the previous section. Gamma processes are obtained when:

$$\nu(dx, du) = \alpha(dx)\, \frac{e^{-\beta u}}{u}\, du$$

The Laplace functional of the gamma process can be calculated by first considering test functions $\phi(x) = \sum_i \phi_i\, 1_{x \in A_i}$, for which we have:

$$\int (1 - e^{-u \sum_i \phi_i 1_{A_i}(x)})\, \alpha(dx) = \sum_i (1 - e^{-u \phi_i})\, \alpha(A_i)$$

Since:

$$\int_0^\infty (1 - e^{-u \phi_i})\, \frac{e^{-\beta u}}{u}\, du = \log\left( 1 + \frac{\phi_i}{\beta} \right)$$

we obtain, after taking the limit:

$$L(\phi) = \int \log\left( 1 + \frac{\phi(x)}{\beta} \right) \alpha(dx)$$

Remarks:

• The contributions from disjoint sets $A_i$ enter additively into the Laplace functional, which ensures the independence of the random variables $Z(A_i)$.

• For $\phi = s\, 1_A$, we obtain:

$$E[e^{-s Z(A)}] = \left( 1 + \frac{s}{\beta} \right)^{-\alpha(A)}$$

which shows that $Z(A)$ is gamma distributed, with shape parameter $\alpha(A)$ and scale $1/\beta$:

$$Z(A) \sim \mathrm{Ga}(\alpha(A), \beta^{-1})$$

• Wolpert and Ickstadt propose the following extension of the gamma random field:

$$\nu(dx, du) = \alpha(dx)\, \frac{e^{-\beta(x) u}}{u}\, du$$

in which the scale parameter varies through space. However, in this model, the variables $Z(A)$ are no longer gamma distributed. Only if $\beta$ is slowly varying through space, and if the dimension of $A$ is small, do we have approximately gamma distributions. The simulation algorithm has to be modified to deal with this generalization.
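The logarithmic integral used in the calculation of the gamma Laplace functional can be verified by differentiating under the integral sign. A short derivation, with $I(\phi)$ introduced here only as a shorthand for that integral:

```latex
I(\phi) = \int_0^\infty \left(1 - e^{-u\phi}\right) \frac{e^{-\beta u}}{u}\, du,
\qquad I(0) = 0,
\\[4pt]
I'(\phi) = \int_0^\infty e^{-(\beta + \phi) u}\, du = \frac{1}{\beta + \phi}
\quad\Longrightarrow\quad
I(\phi) = \int_0^\phi \frac{ds}{\beta + s} = \log\left(1 + \frac{\phi}{\beta}\right).
```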

B.3 Simulation algorithm.

The positive Lévy spatial process can be generated from a Poisson process $H$ on $\mathbb{R}^+ \times \mathbb{R}^n$ with intensity $\nu(dx, du)$ using the algorithm described in the previous paragraph:

$$Z(A) = \int_{x \in A} u\, H(dx, du) = \sum_i 1_{X_i \in A}\, U_i$$

and more generally:

$$Z(\phi) = \int \phi(x)\, u\, H(dx, du) = \sum_i \phi(X_i)\, U_i$$

The proof is the same.

The generation of $H$ is almost unmodified if $\nu$ is the product of two measures, such as $\nu(dx, du) = \alpha(dx)\, \nu(du)$. We now describe the algorithm proposed by the two authors, in a slightly simplified version, for the simulation of points in $\mathbb{R}^+ \times A$, in the extended gamma case (space-varying $\beta$):

• Let $Y_i$, $i = 1, 2, \dots$ be a standard Poisson process on $\mathbb{R}^+$.

• Let $X_i$ follow the distribution:

$$X_i \sim \frac{\alpha(dx)}{\alpha(A)}$$

• Determine $U_i$ from the following equation:

$$\int_{U_i}^\infty \frac{e^{-\beta(X_i) u}}{u}\, \alpha(A)\, du = Y_i$$

which is equivalent, if $E_1(x) = \int_x^\infty e^{-u}/u\; du$ is the exponential integral function, to:

$$U_i = \beta(X_i)^{-1}\, E_1^{-1}\!\left( \frac{Y_i}{\alpha(A)} \right)$$

In practice, only a finite number of points can be generated, and it is necessary to stop at some point (note that $U_i \to 0$).
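The three steps above can be sketched with the standard library only. The sketch assumes a domain $A = [0, 1[$ with $\alpha(dx) = \alpha(A)\, dx$, evaluates $E_1$ by its power series for small arguments and by a truncated continued fraction otherwise, and inverts it by bisection; the function names, the truncation to 20 jumps and the parameter values are illustrative assumptions, not the authors' implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp_integral(x):
    """E1(x) = integral of exp(-u)/u over [x, inf[, for x > 0."""
    if x <= 2.0:
        # Power series: -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 80):
            term *= -x / k
            total -= term / k
            if abs(term) < 1e-18:
                break
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction exp(-x) / (x+1 - 1/(x+3 - 4/(x+5 - ...))), bottom-up.
    depth = 60
    f = x + 2 * depth + 1
    for i in range(depth, 0, -1):
        f = x + 2 * i - 1 - i * i / f
    return math.exp(-x) / f

def inverse_exp_integral(y):
    """Solve E1(x) = y by bisection on log(x); E1 is decreasing."""
    lo, hi = math.log(1e-300), math.log(700.0)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if exp_integral(math.exp(mid)) > y:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def simulate_gamma_field(alpha_total, beta, n_jumps, rng):
    """Points (X_i, U_i) on A = [0, 1[, assuming alpha(dx) = alpha_total * dx."""
    points, y = [], 0.0
    for _ in range(n_jumps):
        y += rng.expovariate(1.0)                        # standard Poisson process Y_i
        x = rng.random()                                 # X_i ~ alpha(dx) / alpha(A)
        u = inverse_exp_integral(y / alpha_total) / beta(x)
        points.append((x, u))
    return points

def constant_beta(x):
    """Constant inverse scale, so that Z(A) ~ Ga(alpha(A), 1/beta)."""
    return 1.0

rng = random.Random(4)
alpha_total, n_rep = 2.0, 200
totals = [sum(u for _, u in simulate_gamma_field(alpha_total, constant_beta, 20, rng))
          for _ in range(n_rep)]
mean_total = sum(totals) / n_rep
print(mean_total)   # to be compared with alpha(A) / beta = 2
```

With a constant $\beta$ the total mass $Z(A)$ should be gamma distributed with mean $\alpha(A)/\beta$, which provides a simple sanity check; a space-varying `beta` function can be passed in the same way.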

B.4 Conditioning to data.

Spatial fields generated by the algorithm considered so far in this section do not have spatial correlation, and only the parameters $\alpha$ and $\beta$ can break the spatial homogeneity. It is very easy to obtain spatially correlated fields by moving averages. For example, $Y$ being a gamma spatial process generated according to the model described in the previous paragraph, $Y \sim \mathrm{Ga}(\alpha(dx), \beta(x)^{-1})$, we can consider:

$$Z(y) = \int k(y, x)\, Y(dx) = \sum_i k(y, X_i)\, U_i$$

Note that simulations at support $v$ are obtained simply by integrating the kernel with respect to $y$. If:

$$k(v, x) = \frac{1}{|v|} \int_v k(y, x)\, dy$$

then:

$$Z(v) = \sum_i k(v, X_i)\, U_i$$
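A minimal one-dimensional sketch of this moving-average construction follows. It assumes a Gaussian-shaped kernel and, instead of the jump representation, a gamma measure discretised into small cells with independent gamma masses placed at the cell centres (an approximation of the continuous-space process); all names and parameter values are hypothetical.

```python
import math
import random

def kernel(y, x, width=0.1):
    """Assumed Gaussian-shaped kernel k(y, x); any positive kernel would do."""
    return math.exp(-0.5 * ((y - x) / width) ** 2)

def block_kernel(block, x, n_nodes=50):
    """k(v, x): average of k(y, x) over the block v = (start, end), midpoint rule."""
    start, end = block
    h = (end - start) / n_nodes
    return sum(kernel(start + (j + 0.5) * h, x) for j in range(n_nodes)) / n_nodes

def field_value(y, masses):
    """Z(y) = sum_i k(y, X_i) U_i."""
    return sum(kernel(y, x) * u for x, u in masses)

def block_value(block, masses):
    """Z(v) = sum_i k(v, X_i) U_i."""
    return sum(block_kernel(block, x) * u for x, u in masses)

# Discretised gamma measure on [0, 1[: independent Ga(alpha * dx, 1 / beta) masses.
rng = random.Random(5)
alpha, beta, n_cells = 20.0, 0.5, 200
dx = 1.0 / n_cells
masses = [((c + 0.5) * dx, rng.gammavariate(alpha * dx, 1.0 / beta))
          for c in range(n_cells)]

block = (0.2, 0.4)
z_block = block_value(block, masses)
# Same quantity obtained by averaging the point-support field over the block.
nodes = [block[0] + (j + 0.5) * (block[1] - block[0]) / 50 for j in range(50)]
z_average = sum(field_value(y, masses) for y in nodes) / len(nodes)
print(z_block, z_average)
```

Because the construction is linear in the masses, integrating the kernel first or averaging the simulated field afterwards gives the same block value, so several supports can be handled from a single simulation.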

Monte-Carlo simulation at various supports thus seems easy to implement. The difficult remaining problem is the conditioning on the data. The authors have considered the case of conditionally Poisson distributed data, that is, conditionally on $Z$, the data are:

$$N_i \sim \mathrm{Poisson}(Z(v_i))$$

with independence over disjoint supports $v_i$. The ingredient used to simulate $Y$ conditioned on $N_1, \dots, N_k$ is data augmentation.

Bibliography

[1] BEST, N., ICKSTADT, K., AND WOLPERT, R. Spatial Poisson regression for health and exposure data measured at disparate spatial scales. Journal of the American Statistical Association 95 (2000), 1076–1088.

[2] CHRISTENSEN, O. F., DIGGLE, P. J., AND RIBEIRO, P. J. Analysing positive-valued spatial data: the transformed Gaussian model. In geoENV – Geostatistics for Environmental Applications (Amsterdam, 2001), P. Monestiez, D. Allard, and R. Froidevaux, Eds., Kluwer, pp. 287–298.

[3] DIGGLE, P., TAWN, J., AND MOYEED, R. Model-based geostatistics. Applied Statistics 47, 3 (1998), 299–350.

[4] FREULON, X. Conditionnement du Modèle Gaussien par des inégalités ou des randomisées. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 1992.

[5] HU, L. Y. Mise en oeuvre du modèle gamma pour l'estimation de distributions spatiales. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 1988.

[6] LAJAUNIE, C. Simulation of velocity fields under constraints on the stack values. Tech. Rep. N-8/98/G - confidential, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 1998.

[7] LAJAUNIE, C. Simulation of velocity constrained by stack analysis model, methods and a case study. Tech. Rep. N-20/99/G - confidential, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 1999.

[8] LARSEN, R. I. A new mathematical model of air pollutant concentration averaging time and frequency. Journal of the Air Pollution Control Association 19 (1969), 449–467.

[9] MATHERON, G. Isofactorial models and change of support. In Verly et al. [17], pp. 449–467.

[10] MATHERON, G. The selectivity of the distributions and the second principle of geostatistics. In Verly et al. [17], pp. 421–434.

[11] MATHERON, G. Change of support for diffusion type random functions. Mathematical Geology 17 (1985), 137–165.

[12] MENUT, L., VAUTARD, R., BEEKMANN, M., AND HONORÉ, C. Sensitivity of photochemical pollution using the adjoint of a simplified chemistry-transport model. Journal of Geophysical Research 105, D 12 (2000).

[13] ORFEUIL, J. P. Interprétation géostatistique du modèle de Larsen. Tech. Rep. N-413, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 1975.

[14] VAUTARD, R., BEEKMANN, M., DELEUZE, I., AND HONORÉ, C. La pollution photochimique en région parisienne simulée par le modèle Chimère et l'influence du transport régional d'ozone. Tech. rep., Laboratoire de Météorologie Dynamique, Ecole Polytechnique, 1997.

[15] VAUTARD, R., BEEKMANN, M., ROUX, J., AND GOMBERT, D. Validation of a deterministic forecasting system for the ozone concentrations over the Paris area. Atmospheric Environment 35 (2001), 2449–2461.

[16] VERLY, G. The block distribution given a point multivariate normal distribution. In Verly et al. [17], pp. 495–516.

[17] VERLY, G., DAVID, M., AND JOURNEL, A. G., Eds. Geostatistics for Natural Resources Characterization, vol. C-122 of NATO ASI Series C-122. Reidel, Dordrecht, 1984.

[18] WACKERNAGEL, H., THIÉRY, L., AND GRZEBYK, M. The Larsen model from a de Wijsian perspective. In geoENV II – Geostatistics for Environmental Applications (Amsterdam, 1999), J. Gomez-Hernandez, A. Soares, and R. Froidevaux, Eds., Kluwer, pp. 125–135.

[19] WOLPERT, R., AND ICKSTADT, K. Simulation of Lévy random fields. In Practical Nonparametric and Semiparametric Bayesian Statistics (Berlin), Müller, Dey, and Sinha, Eds., Springer-Verlag, pp. 227–242.

[20] WOLPERT, R., AND ICKSTADT, K. Poisson/gamma random field models for spatial statistics. Biometrika 85 (1998), 251–267.
