Geostatistical Approaches to
Change of Support Problems
– Theoretical Framework –
C. LAJAUNIE, H. WACKERNAGEL
Report No 19 of Contract IST-1999-11313
December 2000
Technical Report N–30/01/GENSMP - ARMINES, Centre de Géostatistique
35 rue Saint Honoré, F-77305 Fontainebleau, France
http://cg.ensmp.fr
Contents
Summary
1 Introduction
2 Geostatistical model-based approach
  2.1 Classical presentation of change-of-support models
  2.2 Basic principle of the geostatistical model
  2.3 Spatial models and support effect
3 Support effect models and data assimilation
  3.1 Gaussian model
  3.2 Lognormal model
4 Monte-Carlo approach
  4.1 Change of support using conditional simulations
  4.2 Conditioning upon the data
  4.3 Short scales behavior and truncations
  4.4 Monte-Carlo approximation
5 Application to AIRPARIF air quality data
  5.1 Presentation of 1999 data
  5.2 Future work
6 Conclusion
A An example: the Hermitian model
B Monte-Carlo approach - simulation of Gamma model
  B.1 Simulation by inverse Lévy measure
  B.2 Infinitely divisible process
  B.3 Simulation algorithm
  B.4 Conditioning to data
Summary¹
This report is a deliverable of the EC funded “IMPACT” IST project on “Estimation of Human Impact
in Presence of Natural Fluctuations”².
Chapter 1 gives a general introduction.
Chapter 2 presents the geostatistical change of support model.
Chapter 3 discusses the importance of support effect models in data assimilation.
Chapter 4 presents a Monte Carlo simulation approach as an alternative to conventional change of
support models.
Chapter 5 discusses first ideas in connection with the Paris area air pollution case study being set
up within the IMPACT project.
Chapter 6 summarizes the conclusions of the report.
In the appendix, Chapter A gives details about the Hermitian change of support model and
Chapter B discusses recent work by Wolpert and Ickstadt in the framework of a Gamma
model.
¹ The report is available as a PDF file and is best viewed with Acrobat Reader, which makes it possible to take full advantage of the internal links.
² See the website: http://www.mai.liu.se/impact/
Chapter 1
Introduction
When dealing with spatial fields or time processes, it is necessary to assess the information carried
by the data in a precise way. It is acknowledged that data have a limited precision and therefore
measurement error models are frequently used to account for this. Another important aspect of data,
which is less often considered, is their support. Very often data can be considered as
spatial averages over fixed volumes, or time averages over intervals of fixed duration. This is obviously
the case for soil samples, at least as long as each sample, which has a well-defined volume, is properly
mixed before analysis. But it is also true for air and water quality measurements, at least to a good
approximation, because the analysis is based on the accumulation of material through filters during fixed
time intervals and under specific flux conditions. This volume is termed in the geostatistics literature
the support of the data.
Why should we be concerned about the support of data, or in other words, in which problems
does the support play a significant role? Spatial fields generally have a large spectrum of variability,
including short scale components. This means that although they usually have a smooth long-range
tendency, they vary in a very erratic way at microscopic scale. The range of variability
of a spatial field or a time process can be analyzed for instance by its Fourier spectrum, or equivalently
by looking at the spatial covariance. But this analysis is limited by the support of the data available.
Scales shorter than the data support are filtered out when only spatial averages at that volume are
considered. This means that a part of the natural dispersion is not reflected by the data. To sum
up, while the estimation of the overall average is not biased by taking data at various supports, the dispersion
is underestimated in a way which depends on the sample volume considered. This affects the
variance in an easily predictable way, but also the extremes of the statistical distributions and, for
instance, the proportions of values exceeding fixed thresholds. The change of distribution with the data
volume (or the analyzed length in the case of time data) is termed the support effect in geostatistics.
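The support effect described above can be illustrated numerically. The following is a minimal sketch, not taken from the report: it assumes a hypothetical series of independent lognormal "point" values (real fields are correlated, which weakens but does not remove the effect) and compares them with non-overlapping averages of 24 values, as one would when passing from hourly to daily means. The function names and the threshold are illustrative assumptions.

```python
import random
import statistics

def block_averages(values, size):
    """Average consecutive, non-overlapping groups of `size` values."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]

def exceedance_fraction(values, threshold):
    """Proportion of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)

rng = random.Random(0)
# Hypothetical skewed "point support" series (assumption: independent lognormal values).
points = [rng.lognormvariate(0.0, 1.0) for _ in range(24 * 1000)]
blocks = block_averages(points, 24)   # e.g. daily means of hourly values

threshold = 3.0
# The overall mean is unchanged by the change of support ...
print(statistics.fmean(points), statistics.fmean(blocks))
# ... but the variance and the proportion above the threshold both decrease.
print(statistics.pvariance(points), statistics.pvariance(blocks))
print(exceedance_fraction(points, threshold), exceedance_fraction(blocks, threshold))
```

With correlated data the variance reduction would be smaller than in this independent case, which is exactly what a change of support model has to quantify.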
This effect has long been recognized in the literature on air quality data. In order to correct
the distributions according to the integration time of the measurement devices, which varies
from case to case, the well-known Larsen model was proposed [8]. It has been used as a standard
to normalize air quality measurements relative to the integration time. However, this model is known
to rely on particular hypotheses, namely lognormality of the distributions and a de Wijs variogram
model for the logarithmic variance, which are by no means general, and since more general models
are available, the Larsen model can be considered as outdated (see [13], [18]).
The support effect is not only important when comparing statistical distributions of sample data,
but also when deriving estimates defined in terms of spatial averages. For instance, we might be
interested in estimating the conditional probability that the average over a specified area exceeds a
critical value. In agriculture, the probability that the average over a farming field is below a critical
threshold, given the sample data, might be a criterion for deciding whether nutrients need
to be added to the soil. While the spatial averages can be estimated by classical kriging with no
need to model the support effect, and the kriging variance can be used to assess the precision of this
estimation, other quantities which depend on the conditional distribution, such as probabilities of
exceeding thresholds, require handling the support effect.
Another situation, of more concern for the objectives of the IMPACT project, is encountered
in data assimilation. In this situation, a mechanistic model is available, which is supposed
to be able to handle the main physical process governing the dynamics of the natural system. As a
rule, such models suffer from several shortcomings which limit their prediction capability. Among
these shortcomings are the requirements on initial and boundary conditions, which are in practice
imprecisely known. Others are the lack of knowledge of the physical parameters involved, and perhaps
also model inadequacy in some respects. Finally, sensitivity to initial conditions is known to
be a more fundamental obstacle to such time forecasts. On the other hand, as measurements become
available with time, they can be used to improve the estimation of the state of the system provided
by the model. This process of updating the state estimate is termed data assimilation in the literature.
The derivation of this estimate is based on the specification of a model linking the unknown state
to the measurements. It is often assumed that this link is adequately represented by a
Gaussian error model. It will be argued in this document that a much more appropriate representation
would be through a change of support model.
One obvious reason for this is that mechanistic models are aimed at representing the physical
process at the spatial resolution of the discretization grid. The shorter scales are simply not accounted
for, and the physical parameters are supposed to be known at the scale resolution required by the
model. On the other hand, the data are available at the measurement support, and this support is
usually considerably smaller. Moreover, the data are rarely at the same location as the nodes of the
discretization grid. This situation seems ideal for geostatistical change of support models.
Chapter 2
Geostatistical model-based approach
In environmental studies an important problem is to characterize the amount by which a
prescribed environmental threshold is exceeded. This amount depends on the statistical distribution of the
environmental variable at hand. The distribution will actually be significantly modified by a change of the
support of the environmental variable. This phenomenon was at the heart of the development of
geostatistics half a century ago. We begin developing change-of-support models in their initial historical
setting, which was in mining.
2.1 Classical presentation of change-of-support models
Geostatistical change of support models were originally set up to allow recoverable reserve
estimation in mining applications. This problem arises in selective exploitation, in which an attempt is
made to separate the economically valuable parts of the orebody from the rest in an effective way, at a
fine level. This methodology is essentially the result of work done by Georges Matheron (see [9], [10]
and [11] and references mentioned there).
The problem can be described as follows. The orebody is first divided into panels. Each panel is
then cut into blocks of equal size, this size being limited roughly by the dimension and maneuverability
of the trucks used to move the minerals. Each block should be sent to the plant only if its grade
is worth the treatment cost; otherwise it is left as waste. This leads to the definition of a cutoff
grade $z_c$, and the recoverable tonnage of a panel is proportional to the percentage of blocks which
have a grade above this cutoff. Furthermore, the selected quantity of material (metal) is proportional
to the grade of the selected blocks. Hence, the two following quantities typically have to be estimated
to obtain the economic estimate of the panel:
$$T(z_c) = \frac{1}{N} \sum_{i \in P} 1_{Z(\mathbf{v}_i) > z_c} \tag{2.1}$$
$$Q(z_c) = \frac{1}{N} \sum_{i \in P} Z(\mathbf{v}_i)\, 1_{Z(\mathbf{v}_i) > z_c} \tag{2.2}$$
In these formulae, $\mathbf{v}_i,\ i \in P$, are the blocks within the panel $P$, and their number is $N = \mathrm{Card}(P)$.
We denote by $1_A$ the indicator function, which is equal to 1 if the condition $A$ is satisfied and 0
otherwise. $Z(\mathbf{v}_i)$ denotes the grade of the block $\mathbf{v}_i$, which is the spatial average:
$$Z(\mathbf{v}_i) = \frac{1}{|\mathbf{v}_i|} \int 1_{x \in \mathbf{v}_i}\, Z(dx)$$
We have written $Z(dx)$ instead of $Z(x)\,dx$ to emphasize the high spatial irregularity of grades, which
implies that point values $Z(x)$ do not in general have a precise meaning.
Two points are worth mentioning concerning formulae (2.1) and (2.2):
1. The first one is that the quantities involved in the formulae are at block support, $\mathbf{v}$. This support
usually has an order of magnitude of a few tens of m³, to be compared to the few liters of
the core data support. In other words, a considerable change of support is involved in going from
the data support to the support of concern for the estimation.
2. The second one is that both quantities are nonlinear transforms of the block grades. This means
that we have to be concerned with the conditional distribution of the blocks, and not merely with
their conditional expectation.
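Formulae (2.1) and (2.2) can be sketched in a few lines once block grades are known. The block grades, the cutoff and the function name below are hypothetical and serve only to show the computation; in practice the block grades are unknown and these quantities have to be estimated.

```python
def recoverable_reserves(block_grades, cutoff):
    """Return (T, Q) of formulae (2.1) and (2.2) for the blocks of one panel.

    T is the proportion of blocks with grade above the cutoff,
    Q is the sum of the selected grades divided by the number of blocks.
    """
    n = len(block_grades)
    selected = [z for z in block_grades if z > cutoff]
    tonnage = len(selected) / n
    metal = sum(selected) / n
    return tonnage, metal

# Hypothetical block grades of a panel with N = 6 blocks.
grades = [0.4, 1.2, 0.9, 2.5, 0.1, 1.6]
t, q = recoverable_reserves(grades, cutoff=1.0)
print(t, q)
```

Both outputs are nonlinear in the block grades, so replacing each grade by its kriged estimate before applying the cutoff would not in general give the conditional expectation of $T$ or $Q$.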
2.2 Basic principle of the geostatistical model
The building block of the geostatistical approach to handling the support effect is a bivariate model
aimed at describing pairs in which a sample is randomly located within a larger unit (block). We shall
denote by $v$ and $\mathbf{v}$ respectively the sample and the block volumes, and we shall assume that these two
are compatible in the sense that $\mathbf{v}$ can be partitioned into units equal to $v$ up to a translation. From this
covering by a finite number of disjoint samples, a uniform random sample within $\mathbf{v}$ can be considered.
This can be done by labeling the samples and picking a label at random with equal probability. The
random concentration of the unit selected in this way will be denoted by $Z(v)$. The model we are looking for is thus the
probability distribution of the pair $(Z(v), Z(\mathbf{v}))$.

What can be assumed known about this distribution? Three guidelines can be used:
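• The trend $\mu(x)$ and the covariance $C(h)$ can be estimated by standard methods. The first and
second moments can be calculated from these, since:
$$\mu(\mathbf{v}) = E[Z(\mathbf{v})] = \frac{1}{|\mathbf{v}|} \int 1_{x \in \mathbf{v}}\, \mu(x)\, dx \tag{2.3}$$
$$\sigma(\mathbf{v})^2 = E[(Z(\mathbf{v}) - \mu(\mathbf{v}))^2] = \frac{1}{|\mathbf{v}|^2} \iint 1_{x \in \mathbf{v}}\, 1_{y \in \mathbf{v}}\, C(x, y)\, dx\, dy = C(\mathbf{v}, \mathbf{v}) \tag{2.4}$$
and the same holds for the mean and variance of $Z(v)$. Concerning the covariance we have:
$$\mathrm{Cov}\{Z(v), Z(\mathbf{v})\} = E[C(v, \mathbf{v})] = \frac{1}{N} \sum_i C(v_i, \mathbf{v}) = \frac{1}{N} \sum_i \frac{1}{|v_i|\,|\mathbf{v}|} \int_{v_i} \int_{\mathbf{v}} C(x, y)\, dx\, dy = \sigma(\mathbf{v})^2$$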
Thus, the covariance between a randomly located sample and a block is equal to the block
variance.
• Now, due to the additive nature of the variable, the concentration over $\mathbf{v}$ is the arithmetic average
of the sample concentrations. The consequence of this for the bivariate pair $(Z(v), Z(\mathbf{v}))$ is the
following:
$$E[Z(v) \mid Z(\mathbf{v})] = Z(\mathbf{v})$$
This relation, which is known as Cartier’s relation, has profound consequences for the bivariate
distribution model, despite its trivial appearance. In fact, it can be shown that more generally, for
any pair of probability distributions $F_X$ and $F_Y$ defined on $\mathbb{R}^+$ (distributions of positive random variables),
the following statements are equivalent:
1. One can find a pair of random variables $X$ and $Y$, having probability distributions $F_X$ and
$F_Y$, which satisfy Cartier’s relation $E[X \mid Y] = Y$.
2. For every convex function $\phi$, we have:
$$E[\phi(X)] = \int_0^\infty \phi(x)\, F_X(dx) \;\ge\; \int_0^\infty \phi(y)\, F_Y(dy) = E[\phi(Y)]$$
3. For every cutoff $s \ge 0$, we have:
$$E[(X - s)^+] = \int_0^\infty (x - s)^+\, F_X(dx) \;\ge\; \int_0^\infty (y - s)^+\, F_Y(dy) = E[(Y - s)^+]$$
where we have used the standard notation $u^+ = u\, 1_{u > 0} = \max(u, 0)$.
4. If $x_c$ and $y_c$ give the same proportions for the distributions $F_X$ and $F_Y$ respectively, that
is if $F_X(x_c) = F_Y(y_c)$, then we have:
$$E[X\, 1_{X \ge x_c}] = \int_{x_c}^\infty x\, F_X(dx) \;\ge\; \int_{y_c}^\infty y\, F_Y(dy) = E[Y\, 1_{Y \ge y_c}]$$
Note that if any of these statements is true, then by virtue of the first condition we must have $E[X] = E[Y]$. Let
$\mu$ be this common mean value. The second condition applied to the function $\phi(x) = (x - \mu)^2$
implies that the variance of $X$ is not less than that of $Y$, and moreover that the same is true
for any convex dispersion measure. The third means that the sum in excess of a cutoff always
decreases with the support, and the last that if a fixed proportion is selected, the recovered
quantity is always less for the larger support. In other words, selectivity always decreases
with support.
• It can be expected that for very large supports, distributions tend to normality due to the central
limit theorem. While the conditions that theoretically ensure this result do not necessarily hold in
practice, this can be accepted as a heuristic and a guide in the design of models.
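The first two guidelines can be checked on simulated data. The sketch below is an illustration under stated assumptions, not the model of the report: each block is made of 8 hypothetical samples sharing a common random level (so that samples are correlated), the block value is their arithmetic average, and one sample is drawn uniformly at random. It then compares the sample/block covariance with the block variance and checks the ordering of the mean excess over a cutoff.

```python
import random
import statistics

rng = random.Random(1)
N_SAMPLES = 8        # samples per block (assumption)
N_BLOCKS = 20000     # number of simulated blocks (assumption)

sample_vals, block_vals = [], []
for _ in range(N_BLOCKS):
    # Hypothetical block-level effect shared by all samples of the block.
    level = rng.lognormvariate(0.0, 0.5)
    samples = [level * rng.lognormvariate(0.0, 0.8) for _ in range(N_SAMPLES)]
    block_vals.append(sum(samples) / N_SAMPLES)   # block value: arithmetic average
    sample_vals.append(rng.choice(samples))       # uniformly located sample

def covariance(xs, ys):
    """Empirical covariance of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def mean_excess(values, s):
    """Empirical E[(Z - s)^+]."""
    return statistics.fmean([max(v - s, 0.0) for v in values])

cov_sample_block = covariance(sample_vals, block_vals)
var_block = statistics.pvariance(block_vals)
print(cov_sample_block, var_block)                # approximately equal
print(statistics.pvariance(sample_vals))          # larger than the block variance
print(mean_excess(sample_vals, 2.0), mean_excess(block_vals, 2.0))
```

The agreement is only approximate because of sampling fluctuations; the relations themselves hold exactly in expectation for any additive variable.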
2.3 Spatial models and support effect
In spatial applications, estimation is often required at large support, and thus some form of change
of support is needed. When only the average $Z(\mathbf{v})$ and the estimation variance are required, classical
linear kriging is called for, and only the variogram is necessary for this purpose. But, as discussed in the
introduction, when quantities which depend nonlinearly on $Z(\mathbf{v})$ are to be estimated, such as
$\phi(Z(\mathbf{v}))$ for a given function $\phi$, change of support models are required. Two forms of estimators are
classically used in geostatistics:
• Conditional expectation $E[\phi(Z(\mathbf{v})) \mid Z(v_1), \dots, Z(v_n)]$, which requires the specification of the
multivariate distribution $\mathcal{L}(Z(\mathbf{v}), Z(v_1), \dots, Z(v_n))$.
• Disjunctive kriging, which has the following form:
$$\{\phi(Z(\mathbf{v}))\}^{dk} = \sum_i f_i(Z(v_i))$$
In this expression, the weighting functions $f_i$ have to be optimized in order to obtain the least
estimation variance. The advantage of disjunctive kriging over conditional expectation is that
it only requires the specification of the bivariate distributions:
$$\begin{cases} \mathcal{L}(Z(v_i), Z(v_j)) & i, j = 1, \dots, n \\ \mathcal{L}(Z(\mathbf{v}), Z(v_i)) & i = 1, \dots, n \end{cases}$$
This in turn considerably enlarges the range of models which can be used. While the use of
conditional expectation is in practice limited to models which are transforms of Gaussian models, like
the log-normal, disjunctive kriging can be used in much more general situations.
In either case, the whole domain considered, $D$, must be hierarchically subdivided into parts of equal
shape and size, according to the following scheme:
1. Division of $D$:
$$D = \bigcup_k \mathbf{v}_k \quad \text{where} \quad \mathbf{v}_k \cap \mathbf{v}_l = \emptyset \quad (k \neq l)$$
the $\mathbf{v}_k$ being translates of one another.
2. Each $\mathbf{v}_k$ contains a variable number of samples $n_k \ge 0$, denoted here by $Z(v_i^k)$. These will be
considered as randomly located in $\mathbf{v}_k$ and independent of one another.
According to the model, the conditional distribution of the samples, given the block values $Z(\mathbf{v}_i),\ i = 1, \dots$,
is the product of the individual $\mathcal{L}(Z(v_i^k) \mid Z(\mathbf{v}_k))$, each conditioned only by the value of the block in which
the sample $v_i^k$ is located. In other words, the samples are conditionally independent, and each depends only on the
associated block value. This applies as well to the conditional distribution of several samples in the
same block $\mathbf{v}_i$.

One limitation of this approach is that only a limited number of supports can be considered at the
same time: $v$, $\mathbf{v}$ and $D$ in the above description. A few more can be considered simultaneously if we
are ready to include more levels in the hierarchy. In either case only a limited number of compatible
supports can be considered at the same time. The Monte-Carlo approach to be described next makes it
possible, to some extent, to overcome this limitation and to consider multiple supports simultaneously.
Chapter 3
Support effect models and data assimilation
Data assimilation is the process of updating the state estimation of a mechanistic model using
measurements. When the model describes a time evolving system, data assimilation is often performed
sequentially. By this we mean that data taken at time $t$ are used to improve the state estimation at a
given time step, but are not back-propagated (they are not used to improve the estimation of past states),
and will not be used any more in future time steps. Our description will be placed in this context, but
the ideas and the formalism apply as well to other assimilation schemes.
3.1 Gaussian model
We shall consider a discrete time Markovian system, which we shall write as:
$$Z_{n+1} = F(Z_n, \varepsilon_{n+1}) \tag{3.1}$$
where $\varepsilon_{n+1}$ is a random variable which cannot be predicted at time $t_n$, and is termed the innovation. Very
often, it turns out that an additive form is appropriate, in which case we have:
$$Z_{n+1} = F(Z_n) + A\, \varepsilon_{n+1} \tag{3.2}$$
Moreover, if the innovation is assumed Gaussian, an assumption frequently made, the one step ahead
conditional distribution is:
$$\mathcal{L}(Z_{n+1} \mid Z_n) = N[F(Z_n), A A^t] \tag{3.3}$$
The transformation $F$ usually represents a discretized form of a differential equation. It is not
necessarily linear, and might depend on imprecisely known parameters, in which case we have an inverse
problem to solve if we want to improve the parameter estimation.
Data taken at time $t_{n+1}$ usually depend in a linear way on the state at that time, and an error
model frequently used is:
$$Y_{n+1} = H Z_{n+1} + B\, \eta_{n+1} \tag{3.4}$$
where $\eta_{n+1}$ is a standard Gaussian vector with uncorrelated components, independent of what
happened until the measurement is taken.
Most often, measurements are assumed available at the nodes of the discretization grid of the
model, and the matrix $H$ is accordingly just a selection matrix. Moreover, the error terms are assumed
independent of one another, so that equation (3.4) can be rewritten as:
$$Y_{n+1}(x_i) = Z^i_{n+1} + \sigma_i\, \eta^i_{n+1}$$
The assumption of an unbiased measurement, in the sense that $E[\eta^i_{n+1}] = 0$, implies that:
$$E[Y_{n+1}(x_i) \mid Z^i_{n+1}] = Z^i_{n+1}$$
which is Cartier’s relation. We can interpret $Z^i_{n+1}$ as a spatial average over the cell centered at the grid
node, and the measurement $Y_{n+1}(x_i)$ as the point value at an unspecified random point within that
cell, which is probably more realistic than assuming it located at a grid node. This can be written as:
$$Z^i_{n+1} = Z(t_{n+1}, \mathbf{v}_i), \qquad Y_{n+1}(x_i) = Z(t_{n+1}, v_i)$$
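This interpretation allows the specification of the error variances $\sigma_\eta^2 = \mathrm{Var}(\sigma_i\, \eta^i_{n+1})$, since:
$$\begin{cases} \mathrm{Cov}(Y_{n+1}(x_i), Z^i_{n+1}) = \mathrm{Cov}(Z(t_{n+1}, v_i), Z(t_{n+1}, \mathbf{v}_i)) = \sigma_{\mathbf{v}}^2 \\ \mathrm{Var}(Y_{n+1}(x_i)) = \sigma_{\mathbf{v}}^2 + \sigma_\eta^2 \\ \sigma_\eta^2 = \sigma_v^2 - \sigma_{\mathbf{v}}^2 = S^2[v \mid \mathbf{v}] \end{cases}$$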
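This variance should be specified by the dispersion variance, which can be obtained from the
variogram, according to:
$$S^2[v \mid \mathbf{v}] = \bar\gamma(\mathbf{v}, \mathbf{v}) - \bar\gamma(v, v)$$
where we used the following standard notation:
$$\bar\gamma(\mathbf{v}, \mathbf{v}) = \frac{1}{|\mathbf{v}|^2} \int_{\mathbf{v}} \int_{\mathbf{v}} \gamma(x - y)\, dx\, dy$$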
In practice, since the unconditional variances at the support $\mathbf{v}$ are calculated by the filter, following in
the linear case the equation:
$$\Sigma_{n+1} = F\, \Sigma_n\, F^t + A A^t$$
the model is non-stationary, and the variances $\mathrm{Var}(Z^i_{n+1}) = \Sigma_{n+1}(i, i)$ depend on the node $i$. If we
apply a proportional effect model, which amounts to saying that the short scale spectrum is in every cell
proportional to a reference spectrum, the dispersion variance has to be calculated from a reference
variogram, and then corrected by the variance ratio to obtain the error variance.
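The dispersion variance can be approximated numerically by discretizing the supports. The following sketch assumes, purely for illustration, an isotropic exponential variogram with unit sill and square supports in two dimensions; neither the variogram model, its parameters, nor the function names come from the report.

```python
import math

def exponential_variogram(h, sill=1.0, scale=10.0):
    """Exponential variogram (an assumed reference model)."""
    return sill * (1.0 - math.exp(-h / scale))

def mean_variogram(side, n=10, variogram=exponential_variogram):
    """Approximate gamma_bar(V, V) for a square of the given side.

    The double integral is replaced by an average over all pairs of
    points of an n x n grid of cell centers.
    """
    step = side / n
    pts = [((i + 0.5) * step, (j + 0.5) * step)
           for i in range(n) for j in range(n)]
    total = 0.0
    for x1, y1 in pts:
        for x2, y2 in pts:
            total += variogram(math.hypot(x1 - x2, y1 - y2))
    return total / len(pts) ** 2

def dispersion_variance(sample_side, block_side):
    """S^2[v | V] = gamma_bar(V, V) - gamma_bar(v, v) for square supports."""
    return mean_variogram(block_side) - mean_variogram(sample_side)

# Point-support measurement (side 0) within a grid cell of side 5 (hypothetical units).
print(dispersion_variance(0.0, 5.0))
# A larger measurement support leaves less unexplained short scale variance.
print(dispersion_variance(1.0, 5.0))
```

Under the proportional effect assumption mentioned above, the value obtained from the reference variogram would then be multiplied by the ratio between the filter variance at the node and the reference variance.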
3.2 Lognormal model
The pure Gaussian case is relatively uninteresting, since the distributions at any support are obtained
by merely applying a variance correction. More interesting models are obtained when the distributions
are assumed log-normal:
$$Z(\mathbf{v}_i) = m_i \exp\{ s^i_{\mathbf{v}}\, Y^i_{\mathbf{v}} - \tfrac{1}{2} (s^i_{\mathbf{v}})^2 \}$$
$$Z(v_i) = m_i \exp\{ s^i_v\, Y^i_v - \tfrac{1}{2} (s^i_v)^2 \}$$
The simpler log-normal model assumes that the pair $(Y^i_v, Y^i_{\mathbf{v}})$ is Gaussian with correlation coefficient
$r$ and standard margins. A more general model, termed the Hermitian model, is outlined in the appendix.
The variance relation $\mathrm{Cov}(Z(v_i), Z(\mathbf{v}_i)) = \mathrm{Var}(Z(\mathbf{v}_i))$ can be shown to impose the following
equation:
$$s^i_v\, r = s^i_{\mathbf{v}}$$
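A short derivation of this relation can be sketched as follows, using only the bivariate Gaussian assumption stated above and the identity $E[\exp\{aY_1 + bY_2\}] = \exp\{\tfrac{1}{2}(a^2 + 2rab + b^2)\}$ for a standard Gaussian pair with correlation $r$:

```latex
\begin{aligned}
E[Z(v_i)\, Z(\mathbf{v}_i)]
  &= m_i^2 \exp\{-\tfrac{1}{2}(s^i_v)^2 - \tfrac{1}{2}(s^i_{\mathbf{v}})^2\}\,
     E[\exp\{ s^i_v Y^i_v + s^i_{\mathbf{v}} Y^i_{\mathbf{v}} \}]
   = m_i^2 \exp\{ r\, s^i_v\, s^i_{\mathbf{v}} \} \\
\mathrm{Cov}(Z(v_i), Z(\mathbf{v}_i))
  &= m_i^2 \left( \exp\{ r\, s^i_v\, s^i_{\mathbf{v}} \} - 1 \right) \\
\mathrm{Var}(Z(\mathbf{v}_i))
  &= m_i^2 \left( \exp\{ (s^i_{\mathbf{v}})^2 \} - 1 \right)
\end{aligned}
```

Equating the last two lines gives $r\, s^i_v\, s^i_{\mathbf{v}} = (s^i_{\mathbf{v}})^2$, that is $s^i_v\, r = s^i_{\mathbf{v}}$ whenever $s^i_{\mathbf{v}} > 0$.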
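For data assimilation, the conditional distribution:
$$\mathcal{L}(Z(v_1), \dots, Z(v_n) \mid Z(\mathbf{v}_1), \dots, Z(\mathbf{v}_k))$$
must be specified. According to the model, these variables are conditionally independent and lognormal. The associated
density is explicitly given by:
$$\mathcal{L}(Z(v) \mid Z(\mathbf{v})) \sim \prod_i f_i(z_i \mid Z(\mathbf{v})) = \left( \frac{1}{2\pi} \right)^{n/2} \prod_i \frac{1}{\sigma_i\, z_i} \exp\left\{ -\frac{1}{2}\, y_i(z_i)^2 \right\}$$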
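The log-normal parameters and the transformation involved in this expression are given by:
$$\begin{cases} \sigma_i = s^i_v \sqrt{1 - r^2} \\ \mu_i = m_i \exp\{ s^i_v\, r\, Y^i_{\mathbf{v}} - \tfrac{1}{2} (s^i_v)^2 r^2 \} \\ y_i(z) = \dfrac{1}{\sigma_i} \log\left( \dfrac{z}{\mu_i} \right) + \dfrac{\sigma_i}{2} \end{cases}$$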
and the conditioning variables are:
$$Y^i_{\mathbf{v}} = \frac{1}{s^i_{\mathbf{v}}} \log\left( \frac{Z(\mathbf{v}_i)}{m_i} \right) + \frac{1}{2}\, s^i_{\mathbf{v}}$$
These expressions make it possible to obtain the posterior distribution, and thus to do Bayesian data
assimilation.
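The conditional density of a single measurement given the value of its block can be sketched as below. The function and parameter names are illustrative assumptions; the formulas are those given above, with the correlation obtained from the relation between the two logarithmic standard deviations.

```python
import math

def conditioning_variable(block_value, mean, s_block):
    """Gaussian variable associated with a block value Z(V)."""
    return math.log(block_value / mean) / s_block + 0.5 * s_block

def conditional_log_density(z, block_value, mean, s_sample, s_block):
    """Log of f(z | Z(V)) for one sample under the bivariate lognormal model.

    Requires 0 < s_block < s_sample so that the correlation r is below 1.
    """
    r = s_block / s_sample                 # from the relation s_v r = s_V
    y_block = conditioning_variable(block_value, mean, s_block)
    sigma = s_sample * math.sqrt(1.0 - r * r)
    mu = mean * math.exp(s_sample * r * y_block - 0.5 * (s_sample * r) ** 2)
    y = math.log(z / mu) / sigma + 0.5 * sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma * z) - 0.5 * y * y

# Hypothetical values: block value 2.0, mean 1.5, log-standard deviations 1.0 and 0.6.
print(conditional_log_density(1.8, 2.0, 1.5, 1.0, 0.6))
```

For several conditionally independent measurements, the log-likelihood is the sum of such terms. Note that with these formulas the parameter $\mu_i$ reduces to the block value itself, which is Cartier’s relation again.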
Chapter 4
Monte-Carlo approach
Monte-Carlo approaches for the change of support of Gaussian transformed random functions have
been considered for a long time (see [16]). Some twenty years ago computer cost was still a serious
limitation of the method, but the advent of fast and cheap computers makes this limitation less and
less relevant. Simulation based methods are now widely used, partly in the context of Bayesian
geostatistics (see [3]). Recently, Monte-Carlo change of support was considered in [2] for a Gaussian
transformed random function. More generally, the approach could benefit from the advances in
conditional simulation of random fields, and other models can be considered. This chapter takes a closer
look at that perspective.
4.1 Change of support using conditional simulations
When conditional simulation algorithms capable of handling the data at hand are available, an obvious
alternative to change of support models is simulation of the spatial field at fine scale. The desired
quantities are then obtained by numerical integration, and conditional expectations can be
approximated by repeated independent simulations and standard Monte-Carlo approximations.
Explicitly, if $Z_c^k(x),\ k = 1, \dots, m$ are independent conditional simulations of $Z(x)$, we have to:
1. approximate the spatial averages by numerical integration:
$$Z_c^k(\mathbf{v}) = \frac{1}{|\mathbf{v}|} \int_{\mathbf{v}} Z_c^k(x)\, dx \approx \sum_i w_i\, Z_c^k(x_i)$$
2. and approximate the desired conditional expectations by averaging over the realizations:
$$E[h(Z(\mathbf{v})) \mid \text{data}] \approx \frac{1}{m} \sum_{k=1}^m h(Z_c^k(\mathbf{v}))$$
The apparent simplicity of the method makes it very attractive, but a number of conditions have to be
satisfied for it to succeed. Let us review these.
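The two steps above can be sketched as follows, assuming that conditional simulations are already available as lists of values at the discretization points of the block. The realizations, weights and function names are hypothetical; generating the conditional simulations themselves is the difficult part and is not shown here.

```python
def block_average(values, weights=None):
    """Step 1: approximate the spatial average by a weighted sum."""
    if weights is None:
        weights = [1.0 / len(values)] * len(values)   # equal weights by default
    return sum(w * z for w, z in zip(weights, values))

def monte_carlo_expectation(h, realizations, weights=None):
    """Step 2: average h(block value) over the realizations."""
    block_values = [block_average(r, weights) for r in realizations]
    return sum(h(zb) for zb in block_values) / len(block_values)

# Three hypothetical conditional simulations at two points inside the block.
realizations = [[1.0, 3.0], [2.0, 6.0], [0.0, 2.0]]
threshold = 1.5
prob_exceed = monte_carlo_expectation(lambda zb: float(zb > threshold), realizations)
print(prob_exceed)   # estimated probability that the block average exceeds the threshold
```

Because each realization is stored at point support, the same set of simulations can be reused for any other support or any other function $h$.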
4.2 Conditioning upon the data
The first objective of the method is the generation of random fields which respect the available data.
Pure Gaussian random functions can be elegantly conditioned on data which are linear transforms, like
spatial averages, or more generally, convolutions. The idea underlying the algorithm is the following
decomposition:
$$Y(x) = Y^k(x) + (Y(x) - Y^k(x))$$
where $Y^k(x) = \sum_i \lambda_i(x)\, Y(x_i)$ is the simple kriging estimate of $Y(x)$ and $Y(x) - Y^k(x)$ is the
associated estimation error. We use the letter $Y$ here to denote a Gaussian random function instead of
the notation $Z$ used so far, for reasons that will become apparent later. The key property of the
Gaussian model is that the two terms in the above decomposition are independent random functions.
Thus, if $Y^s(x)$ is another Gaussian function with the same distribution as $Y$ (same mean and same
covariance), then the estimation error $Y^s(x) - \sum_i \lambda_i(x)\, Y^s(x_i)$ has the same distribution as the true
error. This shows that a conditional simulation can be obtained using this simulated error, as in the
following expression:
$$Y^s_c(x) = Y^k(x) + \Big( Y^s(x) - \sum_i \lambda_i(x)\, Y^s(x_i) \Big)$$
This procedure applies to any data which are linear transforms of $Y$, and not only to point data $Y(x_i)$
as considered above.
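The conditioning step can be sketched in one dimension with NumPy. This is an illustration under assumptions that are not in the report: a zero-mean field, an exponential covariance with an arbitrary scale, point data, and an unconditional simulation obtained by Cholesky factorization (any other unconditional simulation method could be substituted).

```python
import numpy as np

def covariance(a, b, scale=10.0):
    """Exponential covariance (assumed model) between 1-D location arrays."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / scale)

def conditional_simulation(x_grid, x_data, y_data, rng):
    """One conditional simulation of a zero-mean Gaussian field.

    The simple kriging estimate from the data is corrected by the kriging
    error of an independent unconditional simulation.
    """
    x_all = np.concatenate([x_data, x_grid])
    # Unconditional simulation at data and grid points; the small diagonal
    # term keeps the factorization stable when points coincide.
    cov_all = covariance(x_all, x_all) + 1e-8 * np.eye(x_all.size)
    y_s = np.linalg.cholesky(cov_all) @ rng.standard_normal(x_all.size)
    y_s_data, y_s_grid = y_s[:x_data.size], y_s[x_data.size:]
    # Simple kriging weights lambda_i(x), one column per grid point.
    weights = np.linalg.solve(covariance(x_data, x_data),
                              covariance(x_data, x_grid))
    kriged_data = weights.T @ y_data      # kriging of the actual data
    kriged_sim = weights.T @ y_s_data     # kriging of the simulated data
    return kriged_data + (y_s_grid - kriged_sim)

rng = np.random.default_rng(0)
x_data = np.array([2.0, 7.0, 15.0])       # hypothetical data locations
y_data = np.array([0.5, -1.0, 0.3])       # hypothetical Gaussian data values
x_grid = np.linspace(0.0, 20.0, 41)
print(conditional_simulation(x_grid, x_data, y_data, rng)[:5])
```

At a grid node that coincides with a data location, the kriging weights reduce to a single unit weight and the simulated value reproduces the datum, up to the small stabilizing term.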
Simulation of Gaussian anamorphosed spatial fields (this terminology is used to denote functions
of the form $Z(x) = \phi(Y(x))$, where $Y$ is as above a Gaussian spatial function) is standard when data
are at point support and when $\phi$ can be inverted. The values of $Y(x_i)$ are then known and we are back
to the previous situation.
When data are still at point support, but $\phi$ cannot be inverted, Gibbs sampling can be used [4]. This
situation is encountered for instance in the Gaussian representation of lithofacies, where $\phi$ is the
indicator function of a set, $Z(x) = 1_{Y(x) \in A}$. It should be noted that this solution comes at the price of an
approximation, since Gibbs sampling is only an asymptotic method.
The case of data at non-point support is even more complicated, since
data now have the form:
$$Z(v) = \frac{1}{|v|} \int_v \phi(Y_x)\, dx$$
Simulated annealing is a possibility for handling such complex constraints. It has been applied effectively to
data which are quadratic means, a case which arises in dealing with seismic data ([6], [7]). But
again this is only an asymptotic method, and convergence is more difficult to assess than in the case
of Gibbs sampling considered previously.
4.3 Short scales behavior and truncations
A second point to consider is truncation effects. Generally, truncation occurs at high frequencies,
because the simulation is calculated only on a finite grid, which by its discrete nature cannot account
for short scales. In the case of the turning bands method, another asymptotic property to consider is the fact that
isotropy and multinormality are only achieved for an infinite number of generated directions. Finally,
in the case of the gamma random field considered in more detail in the appendix, the simulation
requires an infinite number of components, and thus truncation occurs again.
The third point to discuss, which has obvious connections with the previous one, is the
approximation of spatial integrals by finite sums, following a numerical integration procedure.
Because of these two approximations, a part of the support effect is not accounted for by
Monte-Carlo simulations.
4.4 Monte-Carlo approximation
Finally, we have to consider the convergence of the Monte-Carlo procedure itself. The rate of
convergence is known to be very poor, the error variance being proportional to the inverse of the
number of simulations performed. The order of magnitude of the error is thus
$$\varepsilon \sim \sqrt{\mathrm{Var}(h(Z(\mathbf{v})))}\; \frac{1}{\sqrt{m}}$$
which is indeed very slow. A classical means of improving Monte-Carlo methods is importance
sampling. This involves the use of an alternative simulation model in which the variance would be smaller.
But importance sampling would require the density function at block support to be known. Another
classical idea is to generate anti-correlated rather than independent samples to speed up the
convergence. Neither of these techniques seems directly applicable to the generation of random fields.
A more realistic alternative to speed up convergence could be the addition of simulated
conditioning points, for which the required densities are available. If these points are chosen in an
appropriate way, for instance at the block centers, they can condition the simulations strongly enough
to speed up convergence.
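The slow convergence discussed in this section can be made concrete with a small experiment. The sketch below assumes independent draws of a hypothetical indicator $h$ (exceedance of the level 1 by a standard Gaussian variable), standing in for $h(Z^k_c(\mathbf{v}))$; it only illustrates how the standard error behaves with the number of simulations.

```python
import math
import random
import statistics

def mc_estimate(h_values):
    """Monte-Carlo mean and its standard error sqrt(Var(h) / m)."""
    m = len(h_values)
    mean = statistics.fmean(h_values)
    std_error = math.sqrt(statistics.variance(h_values) / m)
    return mean, std_error

rng = random.Random(2)
# Hypothetical stand-in for h(Z_c^k(V)): indicator of exceeding the level 1.
draws = [float(rng.gauss(0.0, 1.0) > 1.0) for _ in range(40000)]

for m in (100, 400, 1600, 6400, 25600):
    mean, std_error = mc_estimate(draws[:m])
    print(m, round(mean, 4), round(std_error, 4))
```

Multiplying the number of simulations by four only halves the standard error, so each additional digit of precision costs a hundredfold increase in the number of simulations.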
To conclude this section, we stress that Monte-Carlo methods offer advantages which can offset their
computational complexity:
• They are asymptotically exact. By this we mean that once the spatial model is specified at point
support, no modeling approximation is required to obtain the distributions at various supports.
Only numerical approximations, which can to some extent be controlled, are necessary.
• They offer full flexibility concerning the geometry of the supports. No compatibility constraint
exists, and various supports can be considered simultaneously.
Chapter 5
Application to AIRPARIF air quality data
Air quality data for the Paris region have been made available to the participants of the IMPACT project
by the AIRPARIF association. Contact has been made with the Air Pollution group at the Laboratoire
de Météorologie Dynamique (LMD) so that the data can be supplemented with runs from the
CHIMERE model (see [14], [15] for a description and validation of the model and [12] for its use in sensitivity
analysis).
5.1 Presentation of 1999 data
This study considers only data from the year 1999, measured at 23 fixed stations for NO and NO2,
as well as at 19 stations for O3. The location of the stations is shown in Figure 5.1.
Data are hourly measurements. Since our aim is only to illustrate the support effect and the models, we
have to select one variable to study. Which one should be used? It is known that, due to the fast dynamics of the
chemical reaction:
NO + O3 → NO2 + O2
ozone is consumed by this reaction as long as there is NO available, and conversely. As a result, we
expect these two to be highly anticorrelated, in the sense that they rarely coexist. Figure 5.2 shows
that this is indeed the case. In this figure we have separated the plots by season, and
also represented the days of the week by different characters, in order to explore these effects at the same time.
This figure is limited to the data from the station labeled Paris 6, but the observed effects are typical.
It is interesting to analyze the periodicities in the data. Yearly and daily cycles are of course
expected, and these will be referred to as seasonal and hourly effects. In addition, we want
to explore the day-of-week effect, since human activity depends on it and can be considered
entirely responsible for this effect. To this aim, we calculated the yearly averages for each specific
Season × Day × Hour combination. The results for the Paris 6 station are displayed in Figures 5.3
to 5.5 for ozone, NO and NO2 respectively. On these graphs, presented in matrix form, each row
represents a season, and each column a day of the week. The seasonal effect appears clearly on these
graphs, where higher O3 concentrations are observed in Summer. The maximum daily concentrations
are observed around noon, an effect known to be due to thermal convection. As a consequence we
observe a minimum of NO at the same time. The day-of-week effect is particularly evident in Winter
and in Autumn: the increase of NO, as well as NO2, from Sunday to Saturday is followed
by a decrease, the minimum being observed on Sunday for this station. The accumulation effect of the
air pollution is obvious if it is assumed that this results from the decrease of activity during the weekend.

[Figure 5.1: two maps of station locations with axes X and Y, titled "Analysers locations" and "Analysers locations (details)". Legend: NOx and O3 measured; NOx only; O3 only.]
Figure 5.1: Locations of the stations. Stations measuring both NOx and O3 are displayed in black, stations measuring NOx only are
represented in red, and O3 stations are in blue.

To reduce the effect of this chemical reaction we decided to study in future work the sum NOx = NO + NO2,
which can be expected to be more stable than either variable considered separately.
5.2 Future work
The planned work on these data is the following:
1. Estimate the three periodicities, that is the season, day-of-week and hour effects. These will be considered further as mean terms.
2. Tailor a change of support model in the time dimension. Empirical regularization of the data makes it possible to validate a model.
3. Make the assumption that spatial variability can be compared to time variation, and thus obtain a change of support model in the space domain.
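The empirical regularization mentioned in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the hourly series is a synthetic AR(1) Gaussian sequence standing in for de-trended residuals (it is not AIRPARIF data), and the helper names `regularize` and `simulate_ar1` are ours. The sketch averages the series over non-overlapping time blocks and prints the variance at each support, which is the quantity a change of support model in time would have to reproduce.

```python
import math
import random
from statistics import fmean, pvariance

def regularize(series, block):
    """Average a series over consecutive non-overlapping blocks of `block` values."""
    n_blocks = len(series) // block
    return [fmean(series[i * block:(i + 1) * block]) for i in range(n_blocks)]

def simulate_ar1(n, rho, seed=0):
    """Synthetic stand-in for hourly residuals: stationary AR(1) Gaussian series."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return out

# One year of hypothetical hourly values, regularized at 1 h, 3 h, 8 h and 24 h.
hourly = simulate_ar1(24 * 365, rho=0.9)
for block in (1, 3, 8, 24):
    averaged = regularize(hourly, block)
    print(block, round(pvariance(averaged), 3))
```

The printed variances decrease as the averaging period grows; comparing this empirical decrease with the one predicted by a candidate model is one possible way to carry out the validation.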
Figure 5.2: Scatterplots of NO and O3 hourly concentrations at Paris 6, during the year 1999 (one panel per season: Winter, Spring, Summer, Autumn; symbols distinguish the days of the week: o Sun–Mon, + Tue–Wed, x Thu–Fri, * Sat). As expected, when one element is present, the other one is almost absent.
Figure 5.3: Hourly averages of O3 at Paris 6 station in 1999 (rows: Winter, Spring, Summer, Autumn; columns: Sunday to Saturday; horizontal axis: hour).
Figure 5.4: Hourly averages of NO at Paris 6 station in 1999 (rows: Winter, Spring, Summer, Autumn; columns: Sunday to Saturday; horizontal axis: hour).
Figure 5.5: Hourly averages of NO2 at Paris 6 station in 1999 (rows: Winter, Spring, Summer, Autumn; columns: Sunday to Saturday; horizontal axis: hour).
Chapter 6
Conclusion
In this report, only a general theoretical review has been given. Change of support can be viewed as a statistical change of scale for additive variables, and there could be interesting connections with self-similar models. We did not attempt to explore these connections here. We preferred to review briefly the existing approaches, in order to identify the needs concerning support effect models in the course of the IMPACT project. We emphasized the interest of change of support models when a simulation based on a physical model is compared to data, as is the case in data assimilation. The situation is probably relevant to other problems in which such comparisons are attempted.
Appendix A
An example: the Hermitian model
In this section we describe a fairly general model for continuous distributions. It should be mentioned that alternative models exist: some of them have continuous marginals, others are discrete, and mixed models exist as well. So this should really be considered as an illustrative example.
The starting point of this model is a Gaussian representation of the concentrations at sample support $v$. It is assumed that a monotonic transform is known, termed anamorphosis, $\Phi : \mathbb{R} \to \mathbb{R}_+$, such that $Z(v) = \Phi(Y_v)$ where $Y_v$ is a standard Gaussian variable. This transform is determined from the distribution of $Z(v)$ by:
$$F_v(z) = P[Z(v) < z] = G[\Phi^{-1}(z)]$$
where $G$ is the standard Gaussian distribution function. Thus we have $\Phi = (F_v)^{-1} \circ G$, which is always possible when $F_v$ is continuous. The distribution at the larger support $V$ is then continuous as well, and we have the same representation $Z(V) = \Phi_V(Y_V)$ with a transform noted $\Phi_V$. It should be understood that $Y_V$ is by no means the spatial average of $Y_v$, since the average operates on the $Z$ scale and not on the Gaussian transform. This is the reason why we used the notation $Y_V$ instead of $Y(V)$.
The model assumption concerns the bivariate pair $(Y_v, Y_V)$ where $Z(v) = \Phi(Y_v)$. It will be assumed that this pair follows a mixture of standard bivariate Gaussian distributions, termed Hermitian model. Specifically, if $g_r(u, v)$ is the density of a standard Gaussian pair with correlation $r$, the Hermitian distribution can be expressed as:
$$g_h(u, v) = \int g_r(u, v)\, \omega(dr)$$
where $\omega$ is a distribution on $[-1, +1]$. In practice negative correlations are very unrealistic and the domain of $\omega$ can be assumed restricted to $[0, 1]$. The introduction of the mixing offers a higher level of flexibility to the model, compared to the pure Gaussian model (obtained by taking $\omega = \delta_r$). For instance, if $\omega = p\,\delta_0 + (1 - p)\,\delta_1$, we obtain the mosaic distribution, for which $Y_v$ and $Y_V$ are independent with probability $p$, and equal with probability $1 - p$. The change of support generated by this bivariate model is in fact the affine change of support, according to which block and sample densities are the same up to an affine rescaling (1). Models in which $\omega$ is a beta distribution have been considered by L. Hu [5].
(1) The terminology in use is unfortunately a bit confusing, for what is termed mosaic change of support model is in fact something different.
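As an illustration of the relation $\Phi = F^{-1} \circ G$ used to define the anamorphosis at sample support, the following sketch builds an empirical, piecewise-linear estimate from a sample. It is a minimal sketch under assumptions of ours: the ten concentration values are hypothetical, the plotting positions $(i + 0.5)/n$ are one common convention among several, and the function names are not taken from any library.

```python
from statistics import NormalDist

def empirical_anamorphosis(sample):
    """Pair each sorted datum with the Gaussian score G^{-1}((i + 0.5) / n)."""
    gauss = NormalDist()
    n = len(sample)
    z_sorted = sorted(sample)
    y_scores = [gauss.inv_cdf((i + 0.5) / n) for i in range(n)]
    return y_scores, z_sorted

def anamorphosis(y, y_scores, z_sorted):
    """Non-decreasing piecewise-linear estimate of Phi(y), constant in the tails."""
    if y <= y_scores[0]:
        return z_sorted[0]
    if y >= y_scores[-1]:
        return z_sorted[-1]
    lo, hi = 0, len(y_scores) - 1
    while hi - lo > 1:                      # binary search for the bracketing scores
        mid = (lo + hi) // 2
        if y_scores[mid] <= y:
            lo = mid
        else:
            hi = mid
    w = (y - y_scores[lo]) / (y_scores[hi] - y_scores[lo])
    return (1.0 - w) * z_sorted[lo] + w * z_sorted[hi]

# Hypothetical concentrations (arbitrary units), for illustration only.
concentrations = [12.0, 3.5, 7.1, 40.2, 5.0, 18.3, 9.9, 2.2, 25.0, 6.4]
y_scores, z_sorted = empirical_anamorphosis(concentrations)
print(anamorphosis(0.0, y_scores, z_sorted))   # estimate of the median of Z
```

By construction the estimate maps each Gaussian score back to the corresponding datum, so that transforming standard Gaussian values through it reproduces the sample histogram.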
Calculations are greatly simplified by considering Hermite polynomials:
$$\eta_n(y) = \frac{1}{\sqrt{n!}}\, \frac{g^{(n)}(y)}{g(y)}$$
In this formula, $g$ is the standard Gaussian density, and $g^{(n)}$ its $n$-th derivative. It turns out that these expressions are indeed polynomials, which can be calculated from a three-term recurrence. Hermite polynomials are standardized and orthogonal relative to the Gaussian distribution:
$$\int \eta_n(y)\, g(y)\, dy = 0 \quad (n \ge 1), \qquad \int \eta_n(y)\, \eta_m(y)\, g(y)\, dy = \delta_{nm}$$
Moreover, the orthogonality holds also for the bivariate Gaussian distribution:
$$\int \eta_n(y)\, \eta_m(y')\, g_r(y, y')\, dy\, dy' = \delta_{nm}\, r^n$$
This result is a consequence of the following relation concerning the conditional expectation of a Hermite polynomial:
$$\text{if } (Y, Y') \sim g_r \text{ then } E[\eta_n(Y) \mid Y'] = r^n\, \eta_n(Y')$$
In other words, the Hermite polynomials are the factors of the Gaussian model. A consequence of this is the following expression of the bivariate density:
$$g_r(y, y') = g(y)\, g(y') \sum_{k \ge 0} r^k\, \eta_k(y)\, \eta_k(y')$$
This development holds for $|r| < 1$ only.
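The properties stated above can be checked numerically. The sketch below is ours and rests on two assumptions that should be read as such: the three-term recurrence is written for the normalization $\eta_n = g^{(n)} / (\sqrt{n!}\, g)$, which gives $\eta_0 = 1$, $\eta_1(y) = -y$ and $\eta_{n+1}(y) = -\,[\,y\,\eta_n(y) + \sqrt{n}\,\eta_{n-1}(y)\,]/\sqrt{n+1}$; and integrals are approximated by a plain trapezoidal rule on $[-12, 12]$. It verifies orthonormality and compares the truncated series with the closed-form bivariate Gaussian density.

```python
import math

def hermite_normalized(n_max, y):
    """Return [eta_0(y), ..., eta_n_max(y)] from the three-term recurrence."""
    eta = [1.0]
    if n_max >= 1:
        eta.append(-y)
    for n in range(1, n_max):
        eta.append(-(y * eta[n] + math.sqrt(n) * eta[n - 1]) / math.sqrt(n + 1))
    return eta

def gauss_density(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def gauss_inner_product(n, m, half_width=12.0, steps=4800):
    """Trapezoidal approximation of the integral of eta_n * eta_m * g."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        eta = hermite_normalized(max(n, m), y)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * eta[n] * eta[m] * gauss_density(y)
    return total * h

def bivariate_density(y1, y2, r):
    """Closed-form standard bivariate Gaussian density with correlation r."""
    q = (y1 * y1 - 2.0 * r * y1 * y2 + y2 * y2) / (1.0 - r * r)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def bivariate_expansion(y1, y2, r, k_max=80):
    """Truncated series g(y1) g(y2) sum_k r^k eta_k(y1) eta_k(y2)."""
    e1 = hermite_normalized(k_max, y1)
    e2 = hermite_normalized(k_max, y2)
    series = sum(r ** k * e1[k] * e2[k] for k in range(k_max + 1))
    return gauss_density(y1) * gauss_density(y2) * series

print(gauss_inner_product(2, 2), gauss_inner_product(1, 3))
print(bivariate_density(0.3, -0.7, 0.5), bivariate_expansion(0.3, -0.7, 0.5))
```

The sign $(-1)^n$ carried by this normalization cancels in every product $\eta_k(y)\,\eta_k(y')$, so the expansion of the bivariate density does not depend on the sign convention.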
These expressions translate readily to the Hermitian model. If now the pair $(Y_v, Y_V)$ follows a Hermitian model, we have the following expressions, in which $E[R^n] = \int r^n\, \omega(dr)$:
$$E[\eta_n(Y_v) \mid Y_V] = E[R^n]\, \eta_n(Y_V) \qquad \text{(A.1)}$$
$$E[\eta_n(Y_V) \mid Y_v] = E[R^n]\, \eta_n(Y_v) \qquad \text{(A.2)}$$
The interest of these expressions follows from the fact that Hermite polynomials allow the representation of functions having finite moments up to order two relative to the Gaussian density: if
$$\int \Phi(y)^k\, g(y)\, dy < \infty \quad \text{for } k = 1, 2$$
then, $Y$ being a standard Gaussian variable, $Y \sim N[0, 1]$:
$$\Phi(Y) = \sum_k \varphi_k\, \eta_k(Y) \quad \text{for } \varphi_k = E[\Phi(Y)\, \eta_k(Y)]$$
this development being convergent in the quadratic mean. Thus, if $Z(v)$ and $Z(V)$ satisfy Cartier's relation, and if they are transformed from $Y_v$ and $Y_V$, which follow a Hermitian model with parameters $E[R^n]$, the transform $\Phi_V$ can be obtained:
$$E[Z(v) \mid Z(V)] = Z(V) = \Phi_V(Y_V) = \sum_k \varphi^V_k\, \eta_k(Y_V)$$
But, since $\Phi$ and $\Phi_V$ are monotonic, we have, using (A.1):
$$E[Z(v) \mid Z(V)] = \sum_k \varphi_k\, E[\eta_k(Y_v) \mid Z(V)] = \sum_k \varphi_k\, E[R^k]\, \eta_k(Y_V)$$
Uniqueness of the expansion on the factors justifies the identification:
$$\varphi^V_k = \varphi_k\, E[R^k]$$
Assuming $\Phi$ and $\omega$ known, the distribution at the larger support can thus be calculated, since it is fully characterized by the coefficients $\varphi^V_k$. The coefficients $\varphi_k$ can be obtained from the distribution at sample support provided a stationarity assumption is legitimate. As an alternative, drift estimation and removal could be done. But since it is known that the underlying variance is underestimated by the residual variance, this should be attempted only in case of strong evidence of drift, or in case of marked periodicity. Concerning the distribution $\omega$, the variance relation:
$$\mathrm{Var}\{Z(V)\} = \sum_{k \ge 1} \varphi_k^2\, \{E[R^k]\}^2$$
gives another constraint, which is enough to fully specify the model in the pure Gaussian or in the mosaic case, since each of them has only one free parameter. In the case of the beta distribution, which has two parameters, one degree of freedom remains.
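To make the use of the variance relation concrete, consider the pure Gaussian case $\omega = \delta_r$ with a lognormal anamorphosis $\Phi(y) = m \exp(\sigma y - \sigma^2/2)$. This choice, as well as the numerical values below, is an assumption made for illustration only. With the normalization $\eta_k = g^{(k)} / (\sqrt{k!}\, g)$, the coefficients are $\varphi_k = m\,(-\sigma)^k / \sqrt{k!}$, so that the variance relation becomes $\sum_{k \ge 1} \varphi_k^2 r^{2k} = m^2 (e^{\sigma^2 r^2} - 1)$. The sketch solves this relation for $r$ by bisection, given a target variance at the larger support, and then forms the coefficients $\varphi_k r^k$ of the transform at that support.

```python
import math

def lognormal_coefficients(m, sigma, k_max):
    """phi_k = m * (-sigma)**k / sqrt(k!) for Phi(y) = m * exp(sigma*y - sigma**2/2)."""
    return [m * (-sigma) ** k / math.sqrt(math.factorial(k)) for k in range(k_max + 1)]

def block_variance(phi, r):
    """Variance relation in the pure Gaussian case: sum over k >= 1 of phi_k^2 r^(2k)."""
    return sum(phi[k] ** 2 * r ** (2 * k) for k in range(1, len(phi)))

def solve_r(phi, target_variance, tol=1e-12):
    """Bisection on [0, 1]; valid because block_variance is increasing in r."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if block_variance(phi, mid) < target_variance:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: mean 50, log-scale standard deviation 0.8,
# and a variance at the larger support equal to half the sample variance.
m, sigma = 50.0, 0.8
phi = lognormal_coefficients(m, sigma, k_max=40)
point_variance = block_variance(phi, 1.0)
r = solve_r(phi, 0.5 * point_variance)
phi_block = [c * r ** k for k, c in enumerate(phi)]
print(round(point_variance, 3), round(r, 6))
```

In this particular case the coefficients $\varphi_k r^k$ are those of $m \exp(\sigma r y - \sigma^2 r^2 / 2)$, so the distribution at the larger support is again lognormal with the same mean and a reduced log-scale parameter $\sigma r$.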
Concerning convergence towards normality, let us first observe that in the pure Gaussian case, since $\sigma_V^2 = \sum_{k \ge 1} \varphi_k^2\, r^{2k}$ is monotonic in $r$ on $[0, 1]$, when $\sigma_V \to 0$ we have as well $r \to 0$. Hence, since
$$\Phi_V(Y) \approx \varphi_0 + \varphi_1\, r\, Y + O(r^2)$$
we can expect asymptotic normality of $Z(V)$ for large support. More precisely, we shall consider the distribution of the standardized variable:
$$\frac{Z(V) - \varphi_0}{r\, \varphi_1} = Y + \sum_{k \ge 2} \frac{\varphi_k}{\varphi_1}\, r^{k-1}\, \eta_k(Y)$$
where $Y \sim N[0, 1]$. We shall see that the above sum tends in probability to zero. We have thus to prove that, for all $\varepsilon > 0$:
$$P\left[\, \left| \sum_{k \ge 2} \frac{\varphi_k}{\varphi_1}\, r^{k-1}\, \eta_k(Y) \right| > \varepsilon \right] \to 0$$
In fact, the variance is simply:
$$\mathrm{Var}\left\{ \sum_{k \ge 2} \frac{\varphi_k}{\varphi_1}\, r^{k-1}\, \eta_k(Y) \right\} = \sum_{k \ge 2} \left( \frac{\varphi_k}{\varphi_1} \right)^2 r^{2k-2} \le r^2 \sum_{k \ge 2} \left( \frac{\varphi_k}{\varphi_1} \right)^2$$
This quantity tends to zero, by virtue of the fact that the variance at small support is $\sigma_v^2 = \sum_{k \ge 1} \varphi_k^2$, and thus $\sum_{k \ge 2} (\varphi_k / \varphi_1)^2$ is bounded. Convergence in probability follows by application of the Chebyshev inequality.
On the other hand, let us consider now the mosaic distribution, $\omega(dr) = p\, \delta_0(dr) + (1 - p)\, \delta_1(dr)$. The variance condition imposes $p \to 1$, and since the coefficients are such that:
$$\varphi^V_0 = \varphi_0, \qquad \varphi^V_k = (1 - p)\, \varphi_k \quad (k > 0)$$
we have $\sigma_V^2 = (1 - p)^2 \sum_{k \ge 1} \varphi_k^2 = (1 - p)^2\, \sigma_v^2$, and the parameter $p$ is therefore determined by the variances:
$$(1 - p) = \frac{\sigma_V}{\sigma_v}$$
The transformation at the larger support is given by:
$$Z(V) = m + \frac{\sigma_V}{\sigma_v} \sum_{k \ge 1} \varphi_k\, \eta_k(Y_V) = m + \frac{\sigma_V}{\sigma_v}\, (\Phi(Y_V) - m)$$
and the distributions are such that:
$$Z(V) - m \;\overset{d}{=}\; \frac{\sigma_V}{\sigma_v}\, (Z(v) - m)$$
In other words, according to this model, the distribution at large support is obtained merely by sharpening around the mean. The algorithm resulting from the application of this model is termed affine correction, and it is known to have poor quality. In particular, convergence towards normality obviously cannot be satisfied by this model.
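The affine correction amounts to a single rescaling of the sample values around their mean. The sketch below applies it to a small hypothetical sample; the function name and the variance ratio of 0.25 are our assumptions. It shows that the mean is preserved and the variance reduced, while the shape of the histogram is left unchanged, which is exactly why the model cannot converge towards normality.

```python
from statistics import fmean, pstdev

def affine_correction(values, variance_ratio):
    """Shrink values around their mean so that the variance is multiplied by variance_ratio."""
    m = fmean(values)
    shrink = variance_ratio ** 0.5          # sigma_V / sigma_v
    return [m + shrink * (z - m) for z in values]

# Hypothetical sample-support values (mean 5, standard deviation 2).
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
blocks = affine_correction(samples, variance_ratio=0.25)
print(blocks)
print(fmean(blocks), pstdev(blocks))
```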
We have just seen that asymptotic normality cannot be obtained by arbitrary choices of Hermitian models. A sufficient condition is the convergence $E[R^k] / E[R] \to 0$, uniformly in $k \ge 2$.
Appendix B
Monte-Carlo approach - simulation of Gamma model
This section is based mainly on a collection of papers published by Robert Wolpert and Katja Ickstadt (see for instance [20], [19], and [1]). The applications considered by the authors concern principally epidemiology, where data can be considered as Poisson distributed, with intensity determined by a latent spatial field to be estimated. The data augmentation algorithm proposed by the authors is particularly tailored to this situation. Conditional simulation where the data are spatial averages at various supports is a somewhat different problem.
B.1 Simulation by inverse Lévy measure
Let us recall that a random variable follows an infinitely divisible distribution if it can be broken into an arbitrary number of independent identically distributed random variables. So for each integer $n > 0$, the variable $X$ can be decomposed into:
$$X = X_1 + X_2 + \dots + X_n$$
where the $X_i$ are independent and identically distributed. Applied to spatial fields, if we were ignoring the spatial correlation, the distributions at every support should be infinitely divisible. In practice, correlation is more often the rule than the exception, and the argument does not have such firm foundations. Nevertheless, spatial fields with infinitely divisible distributions remain very interesting from the point of view of coherence.
It is known from Lévy's representation theorem that positive infinitely divisible distributions, if their mean is normalized in a suitable way, have a positive Laplace transform satisfying the formula:
$$-\log E[e^{-sX}] = \int_0^\infty (1 - e^{-su})\, \nu(du)$$
where $\nu$ is a positive measure, termed Lévy's measure, satisfying:
$$\int_0^\infty \min(1, u)\, \nu(du) < \infty.$$
This condition ensures the convergence of the above integral, but in general the measure of a neighborhood of the origin is not finite. Let $\nu^c(u) = \nu[u, \infty[$, which is finite for $u > 0$. Lévy's theorem being granted, $X$ has a representation in terms of a Poisson process, which we describe now.
Let $H$ be a Poisson process on $\mathbb{R} \times \mathbb{R}_+$, having intensity $dt \times \nu(du)$. By this we mean that for any disjoint measurable sets $A_i \subset \mathbb{R} \times \mathbb{R}_+$, the random variables $H(A_i)$ are independent Poisson variables with parameters $\lambda_i = \int_{A_i} dt \times \nu(du)$. Moreover, given the number $H(A)$, the points of the process in $A$ are distributed independently, following the probability
$$\frac{1_{(t,u) \in A}\; dt \times \nu(du)}{(dt \times \nu)(A)}$$
Then we shall see that the process defined for $t > 0$ by:
$$X_t = \int_0^t \int_0^\infty u\, H(dt', du) = \sum_i U_i\, 1_{T_i \in [0, t[}$$
is such that $X_1$ is infinitely divisible, with Lévy measure $\nu$. In the above formula, we have denoted by $(U_i, T_i)$ the points of the process $H$.
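Proof: Let $\varepsilon > 0$, and $X_t^\varepsilon$ be the truncated integral:
$$X_t^\varepsilon = \int_0^t \int_\varepsilon^\infty u\, H(dt', du).$$
Then, due to the fact that $\nu^c(\varepsilon)$ is finite, there is a finite number of points $N_t^\varepsilon$ of the process in the set $[0, t[ \times [\varepsilon, \infty[$. The Laplace transform:
$$E[e^{-s X_t^\varepsilon}] = E\Big[ \exp\Big\{ -s \sum_i U_i\, 1_{T_i < t}\, 1_{U_i > \varepsilon} \Big\} \Big]$$
can be calculated by conditioning on $N_t^\varepsilon$, since the points follow, independently of one another, the distribution:
$$(T, U) \sim P(dt', du) = \frac{dt'}{t}\, \frac{\nu(du)}{\nu^c(\varepsilon)}\, 1_{t' \in [0, t[}\, 1_{u \in [\varepsilon, \infty[}$$
We have thus:
$$\begin{aligned}
E\big[e^{-s X_t^\varepsilon}\big] &= E\left[ \left( \int_\varepsilon^\infty e^{-su}\, \frac{\nu(du)}{\nu^c(\varepsilon)} \right)^{N_t^\varepsilon} \right] \\
&= e^{-\nu^c(\varepsilon)\, t} \sum_k \frac{(\nu^c(\varepsilon)\, t)^k}{k!} \left( \int_\varepsilon^\infty e^{-su}\, \frac{\nu(du)}{\nu^c(\varepsilon)} \right)^k \\
&= \exp\left\{ -\nu^c(\varepsilon)\, t + t \int_\varepsilon^\infty e^{-su}\, \nu(du) \right\} \\
&= \exp\left\{ t \int_\varepsilon^\infty (e^{-su} - 1)\, \nu(du) \right\}
\end{aligned}$$
Hence, we obtain the log Laplace transform:
$$-\log E\big[e^{-s X_1^\varepsilon}\big] = \int_\varepsilon^\infty (1 - e^{-su})\, \nu(du)$$
and in the limit $\varepsilon \to 0$, we have the desired result.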
Simulation: A simple simulation method can be derived by observing that the projections on the $u$ axis of the process points in the band $t' \in [0, t[$ follow a Poisson process with intensity $t\,\nu(du)$:
$$U_i \text{ such that } 0 \le T_i < t \text{ have distribution } U_i \sim \mathrm{Po}(t\,\nu)$$
The associated $T_i$ are uniformly distributed in the interval $[0, t[$. The first step requires a method to simulate a Poisson process with intensity $t\,\nu$. The key to this is the following change of variable:
$$y = t\, \nu[u, \infty[ \;=\; t\, \nu^c(u), \qquad dy = -t\, \nu(du)$$
which transforms a process $\mathrm{Po}[t\,\nu]$ into a standard Poisson process (independence of counts over disjoint sets is ensured by monotonicity of the transform, and if $N_Y(dy) \sim \mathrm{Po}[dy]$, then $N_U(du)$ is also Poisson distributed with parameter $t\,\nu(du)$). Hence the algorithm:
$$\begin{cases} Y_i \sim \mathrm{Po}(1) \\ U_i = \nu^{-c}(Y_i / t) \\ T_i \sim U[0, t[ \end{cases}$$
where $\nu^{-c}$ is the inverse of $\nu^c$.
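The three steps of this algorithm can be sketched as follows. The sketch is ours and makes one loud assumption: to keep the inverse of the tail in closed form, it uses the Lévy measure $\nu(du) = a\, u^{-1-a}\, du$ with $0 < a < 1$, whose tail is $\nu^c(u) = u^{-a}$ (a positive stable process), rather than the gamma case treated later in this appendix. The points of the standard Poisson process are generated as cumulative sums of unit exponential variables, so the jumps come out in decreasing order and the series can be truncated after a fixed number of terms.

```python
import random

def inverse_levy_jumps(inverse_tail, t, n_jumps, rng):
    """Return the n_jumps largest points (T_i, U_i) of the process on [0, t[."""
    y = 0.0
    points = []
    for _ in range(n_jumps):
        y += rng.expovariate(1.0)            # Y_i: points of a standard Poisson process
        u = inverse_tail(y / t)              # U_i: inverse tail evaluated at Y_i / t
        points.append((t * rng.random(), u)) # T_i uniform on [0, t[
    return points

def stable_inverse_tail(a):
    """Inverse of the tail nu^c(u) = u**(-a), 0 < a < 1 (illustrative choice)."""
    return lambda y: y ** (-1.0 / a)

rng = random.Random(1)
points = inverse_levy_jumps(stable_inverse_tail(0.5), t=2.0, n_jumps=200, rng=rng)

# X_t restricted to the simulated points; the neglected jumps are the smallest ones.
x_t = sum(u for _, u in points)
print(x_t, points[0][1], points[-1][1])
```

Because the jumps are produced from the largest to the smallest, the truncation error is controlled by the size of the last simulated jump, which is the practical advantage of this construction.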
B.2 Infinitely divisible process
We consider only positive spatial processes, and we denote them by $Z$. For test functions having the form $\varphi = \sum_i \varphi_i\, 1_{A_i}$, where the $A_i$ are disjoint measurable sets, we write $Z(\varphi) = \sum_i \varphi_i\, Z(A_i)$. This notation is extended to test functions which are limits of this form when possible. The Laplace functional, which is defined by:
$$L(\varphi) = -\log E[e^{-Z(\varphi)}]$$
can be used to characterize positive spatial fields, in the same way as the Laplace transform characterizes the distribution of positive random variables. It turns out that infinitely divisible positive spatial processes, with independence over disjoint sets, have a Laplace functional which can be written as:
$$L(\varphi) = \int (1 - e^{-u \varphi(x)})\, \nu(dx, du)$$
where $\nu$ satisfies some constraints, analogous to the ones specified in the previous section. Gamma processes are obtained when:
$$\nu(dx, du) = \alpha(dx)\, \frac{e^{-\beta u}}{u}\, du$$
The Laplace functional of the gamma process can be calculated by first considering test functions $\varphi(x) = \sum_i \varphi_i\, 1_{x \in A_i}$, for which we have:
$$\int \big(1 - e^{-u \sum_i \varphi_i 1_{A_i}(x)}\big)\, \alpha(dx) = \sum_i (1 - e^{-u \varphi_i})\, \alpha(A_i)$$
Since:
$$\int_0^\infty (1 - e^{-u \varphi_i})\, \frac{e^{-\beta u}}{u}\, du = \log\left( 1 + \frac{\varphi_i}{\beta} \right)$$
we obtain, after taking the limit:
$$L(\varphi) = \int \log\left( 1 + \frac{\varphi(x)}{\beta} \right) \alpha(dx)$$
Remarks:
• The contributions from disjoint sets $A_i$ enter additively into the Laplace functional, which ensures the independence of the random variables $Z(A_i)$.
• For $\varphi = s\, 1_A$, we obtain:
$$E[e^{-s Z(A)}] = \left( 1 + \frac{s}{\beta} \right)^{-\alpha(A)}$$
which shows that $Z(A)$ is gamma distributed, with shape parameter $\alpha(A)$ and scale $1/\beta$:
$$Z(A) \sim \mathrm{Ga}(\alpha(A), \beta^{-1})$$
• Wolpert and Ickstadt propose the following extension of the gamma random field:
$$\nu(dx, du) = \alpha(dx)\, \frac{e^{-\beta(x) u}}{u}\, du$$
in which the scale parameter varies through space. However, in this model, the variables $Z(A)$ are no longer gamma distributed. Only if $\beta$ is slowly varying through space, and if the dimension of $A$ is small, do we have approximate gamma distributions. The simulation algorithm has to be modified to deal with this generalization.
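The integral identity behind the gamma Laplace functional, namely that the integral over $u > 0$ of $(1 - e^{-u\varphi})\, e^{-\beta u} / u$ equals $\log(1 + \varphi/\beta)$, can be checked numerically. The following sketch is ours: it uses a plain midpoint rule (which avoids the removable singularity at $u = 0$) truncated at $u = 80/\beta$, and the function names are hypothetical. It also evaluates the resulting Laplace transform of $Z(A)$ for a set of measure $\alpha(A)$.

```python
import math

def levy_integral(phi, beta, upper=80.0, steps=200000):
    """Midpoint-rule value of the integral of (1 - exp(-u*phi)) * exp(-beta*u) / u over u > 0."""
    h = upper / (beta * steps)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += -math.expm1(-u * phi) * math.exp(-beta * u) / u
    return total * h

def gamma_laplace_transform(s, alpha_A, beta):
    """E[exp(-s Z(A))] obtained from the numerical integral, for constant beta."""
    return math.exp(-alpha_A * levy_integral(s, beta))

print(levy_integral(2.0, 1.0), math.log(3.0))
print(gamma_laplace_transform(2.0, 1.5, 1.0), 3.0 ** -1.5)
```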
B.3 Simulation algorithm.
The positive Lévy spatial process can be generated from a Poisson process $H$ on $\mathbb{R}_+ \times \mathbb{R}^n$ with intensity $\nu(dx, du)$ using the algorithm described in the previous paragraph:
$$Z(A) = \int_{x \in A} u\, H(dx, du) = \sum_i 1_{X_i \in A}\, U_i$$
and more generally:
$$Z(\varphi) = \int \varphi(x)\, u\, H(dx, du) = \sum_i \varphi(X_i)\, U_i$$
The proof is the same.
The generation of $H$ is almost unmodified if $\nu$ is the product of two measures, such as $\nu(dx, du) = \alpha(dx)\, \nu(du)$. We describe now the algorithm proposed by the two authors, in a slightly simplified version, for simulation of points in $\mathbb{R}_+ \times A$, in the extended gamma case (space-varying $\beta$):
• Let $Y_i$, $i = 1, 2, \dots$ be a standard Poisson process on $\mathbb{R}_+$.
• Let $X_i$ follow the distribution:
$$X_i \sim \frac{\alpha(dx)}{\alpha(A)}\, 1_{x \in A}$$
• Determine $U_i$ from the following equation:
$$\int_{U_i}^\infty \frac{e^{-\beta(X_i) u}}{u}\, \alpha(A)\, du = Y_i$$
which is equivalent, if $E_1(x) = \int_x^\infty e^{-u}/u\, du$ is the exponential integral function, to:
$$U_i = \frac{1}{\beta(X_i)}\, E_1^{-1}\left( \frac{Y_i}{\alpha(A)} \right)$$
In practice, only a finite number of points can be generated, and it is necessary to stop at some point (note that $U_i \to 0$).
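The algorithm above can be sketched in a self-contained way. Everything specific in this sketch is an assumption of ours, not the authors' implementation: the exponential integral is computed by its power series for arguments up to 1 and by a continued fraction beyond, its inverse is obtained by bisection on the logarithm of the argument, the domain is $A = [0, 1]$ with a uniform measure of total mass 2, and the scale parameter is taken constant equal to 1 although the code accepts a function of position.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x):
    """Integral of exp(-u)/u from x to infinity, for x > 0."""
    if x <= 1.0:                                   # power series
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k                         # term = (-x)^k / k!
            total -= term / k
            if abs(term) < 1e-17 * abs(total):
                break
        return -EULER_GAMMA - math.log(x) + total
    b = x + 1.0                                    # continued fraction (modified Lentz)
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def inverse_e1(y):
    """Solve exp_integral_e1(x) = y for x > 0 by bisection on log(x)."""
    lo, hi = -690.0, 0.0
    while exp_integral_e1(math.exp(hi)) > y:
        hi += 1.0
    if exp_integral_e1(math.exp(lo)) < y:          # jump too small to be represented
        return 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if exp_integral_e1(math.exp(mid)) > y:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def simulate_gamma_points(alpha_A, beta, sample_x, n_points, rng):
    """Points (X_i, U_i) of the gamma random field on A, largest jumps first."""
    y = 0.0
    points = []
    for _ in range(n_points):
        y += rng.expovariate(1.0)                  # standard Poisson process on R+
        x = sample_x(rng)                          # X_i drawn from alpha(dx) / alpha(A)
        u = inverse_e1(y / alpha_A) / beta(x)      # solves the equation defining U_i
        points.append((x, u))
    return points

rng = random.Random(7)
points = simulate_gamma_points(2.0, lambda x: 1.0, lambda r: r.random(), 30, rng)
print(sum(u for _, u in points))                   # one realization of Z(A)
```

With a constant scale parameter, repeated realizations of the sum of the jumps should have a mean close to the shape parameter divided by the scale parameter (here 2), which gives a simple check of the truncation level.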
B.4 Conditioning to data.
Spatial fields generated by the algorithm considered so far in this section do not have spatial correlation, and only the parameters $\alpha$ and $\beta$ can break the spatial homogeneity. It is very easy to obtain spatially correlated fields by moving averages. For example, $Y$ being a gamma spatial process generated according to the model described in the previous paragraph, $Y \sim \mathrm{Ga}(\alpha(dx), \beta(x)^{-1})$, we can consider:
$$Z(y) = \int \kappa(y, x)\, Y(dx) = \sum_i \kappa(y, X_i)\, U_i$$
Note that simulations at support $v$ are obtained simply by integrating the kernel with respect to $y$. If:
$$\kappa(v, x) = \frac{1}{|v|} \int_v \kappa(y, x)\, dy$$
then:
$$Z(v) = \sum_i \kappa(v, X_i)\, U_i$$
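The change of support by kernel integration can be illustrated in one dimension. In the sketch below, which is ours, the four points $(X_i, U_i)$ are hypothetical values standing in for a simulated gamma process, the kernel is a Gaussian bump chosen arbitrarily, and the block kernel is approximated by a midpoint rule. The point of the sketch is that the value at support $v$ computed from the block kernel coincides with the average over $v$ of the point-support field.

```python
import math

def gaussian_kernel(y, x, scale=0.1):
    """Illustrative kernel kappa(y, x); any positive kernel could be used."""
    return math.exp(-0.5 * ((y - x) / scale) ** 2)

def field_at_point(y, points, kernel):
    """Z(y) as the sum over i of kappa(y, X_i) * U_i."""
    return sum(kernel(y, x) * u for x, u in points)

def block_kernel(a, b, x, kernel, n=200):
    """kappa(v, x): midpoint-rule average of kappa(y, x) over the block v = [a, b]."""
    h = (b - a) / n
    return sum(kernel(a + (j + 0.5) * h, x) for j in range(n)) / n

def field_at_block(a, b, points, kernel, n=200):
    """Z(v) as the sum over i of kappa(v, X_i) * U_i."""
    return sum(block_kernel(a, b, x, kernel, n) * u for x, u in points)

# Hypothetical points (X_i, U_i) of a simulated process on [0, 1].
points = [(0.12, 0.8), (0.47, 0.05), (0.55, 1.6), (0.90, 0.3)]
z_block = field_at_block(0.4, 0.6, points, gaussian_kernel)
print(field_at_point(0.5, points, gaussian_kernel), z_block)
```

Since the same set of points serves for every support, simulations at point and block supports obtained in this way are automatically consistent with one another.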
Monte-Carlo simulations at various supports thus seem easy to implement. The difficult problem remaining is the conditioning to data. The authors have considered the case of conditionally Poisson distributed data, that is, conditional on $Z$, the data are:
$$N_i \sim \mathrm{Poisson}(Z(v_i))$$
with independence over disjoint supports $v_i$. The ingredient used to simulate $Y$ conditioned on $N_1, \dots, N_k$ is data augmentation.
Bibliography
[1] BEST, N., ICKSTADT, K., AND WOLPERT, R. Spatial Poisson regression for health and exposure data measured at disparate spatial scales. Journal of the American Statistical Association 95 (2000), 1076–1088.
[2] CHRISTENSEN, O. F., DIGGLE, P. J., AND RIBEIRO, P. J. Analysing positive-valued spatial data: the transformed Gaussian model. In geoENV – Geostatistics for Environmental Applications (Amsterdam, 2001), P. Monestiez, D. Allard, and R. Froidevaux, Eds., Kluwer, pp. 287–298.
[3] DIGGLE, P., TAWN, J., AND MOYEED, R. Model-based geostatistics. Applied Statistics 47, 3 (1998), 299–350.
[4] FREULON, X. Conditionnement du Modèle Gaussien par des inégalités ou des randomisées. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 1992.
[5] HU, L. Y. Mise en oeuvre du modèle gamma pour l'estimation de distributions spatiales. Doctoral thesis, Ecole des Mines de Paris, Fontainebleau, 1988.
[6] LAJAUNIE, C. Simulation of velocity fields under constraints on the stack values. Tech. Rep. N-8/98/G - confidential, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 1998.
[7] LAJAUNIE, C. Simulation of velocity constrained by stack analysis model, methods and a case study. Tech. Rep. N-20/99/G - confidential, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 1999.
[8] LARSEN, R. I. A new mathematical model of air pollutant concentration averaging time and frequency. Journal of the Air Pollution Control Association 19 (1969), 449–467.
[9] MATHERON, G. Isofactorial models and change of support. In Verly et al. [17], pp. 449–467.
[10] MATHERON, G. The selectivity of the distributions and the second principle of geostatistics. In Verly et al. [17], pp. 421–434.
[11] MATHERON, G. Change of support for diffusion type random functions. Mathematical Geology 17 (1985), 137–165.
[12] MENUT, L., VAUTARD, R., BEEKMANN, M., AND HONORÉ, C. Sensitivity of photochemical pollution using the adjoint of a simplified chemistry-transport model. Journal of Geophysical Research 105, D 12 (2000).
[13] ORFEUIL, J. P. Interprétation géostatistique du modèle de Larsen. Tech. Rep. N-413, Centre de Géostatistique, Ecole des Mines de Paris, Fontainebleau, 1975.
[14] VAUTARD, R., BEEKMANN, M., DELEUZE, I., AND HONORÉ, C. La pollution photochimique en région parisienne simulée par le modèle Chimère et l'influence du transport régional d'ozone. Tech. rep., Laboratoire de Météorologie Dynamique, Ecole Polytechnique, 1997.
[15] VAUTARD, R., BEEKMANN, M., ROUX, J., AND GOMBERT, D. Validation of a deterministic forecasting system for the ozone concentrations over the Paris area. Atmospheric Environment 35 (2001), 2449–2461.
[16] VERLY, G. The block distribution given a point multivariate normal distribution. In Verly et al. [17], pp. 495–516.
[17] VERLY, G., DAVID, M., AND JOURNEL, A. G., Eds. Geostatistics for Natural Resources Characterization, vol. C-122 of NATO ASI Series. Reidel, Dordrecht, 1984.
[18] WACKERNAGEL, H., THIÉRY, L., AND GRZEBYK, M. The Larsen model from a de Wijsian perspective. In geoENV II – Geostatistics for Environmental Applications (Amsterdam, 1999), J. Gomez-Hernandez, A. Soares, and R. Froidevaux, Eds., Kluwer, pp. 125–135.
[19] WOLPERT, R., AND ICKSTADT, K. Simulation of Lévy random fields. In Practical Nonparametric and Semiparametric Bayesian Statistics (Berlin), Müller, Dey, and Sinha, Eds., Springer-Verlag, pp. 227–242.
[20] WOLPERT, R., AND ICKSTADT, K. Poisson/gamma random field models for spatial statistics. Biometrika 85 (1998), 251–267.