Dynamic models for space-time prediction via Karhunen-Loève expansion



Statistical Methods & Applications (2003) 12:61-78, DOI: 10.1007/s10260-002-0046-7

© Springer-Verlag 2003

Dynamic models for space-time prediction via Karhunen-Loève expansion

Lara Fontanella, Luigi Ippoliti

Dipartimento di Metodi Quantitativi e Teoria Economica, Università degli Studi "G. d'Annunzio", Viale Pindaro 42, 65127 Pescara, Italy (e-mail: {lfontan,ippoliti}@dmqte.unich.it)

Abstract. The paper is concerned with the spatio-temporal prediction of space-time processes. By combining the state-space model with the kriging predictor and the Karhunen-Loève expansion, we present a parsimonious space-time model which is spatially descriptive and temporally dynamic. We consider the difficulties of applying principal component analysis of stochastic processes observed on an irregular network. Using the Voronoi tessellation we make adjustments to the Fredholm integral equation to avoid distorted loading patterns and derive an "adjusted" kriging spatial predictor. This allows for the specification of a space-time model which achieves dimension reduction in the analysis of large spatial and spatio-temporal data sets. As a practical example, the model is applied to study the evolution of the nitrogen dioxide (NO2) measurements recorded in the Milan district.

Key words: Kalman filter, ARIMA models, Karhunen-Loève expansion, Dynamic linear model, Kriging

1. Introduction

All data have location in time and space. Sometimes it is necessary to take these locations explicitly into account in the analysis. Spatio-temporal models have gained widespread popularity in recent years. One reason for this is an abundance of important and challenging new applications arising in the environmental and health sciences. Some space-time models are based on the assumption of temporal stationarity. In this context STARMA (Pfeifer and Deutsch 1980), STARMAX (Stoffer 1986) and STARMAG (Di Giacinto 1994; Terzi 1995; Ippoliti 2001) models add spatial covariance matrices to standard vector ARIMA (Lütkepohl 1993) models. More recently, models for spatio-temporal data have been constructed by combining dynamic linear models (DLMs), or state-space models, with variogram-based models from spatial statistics. Many of the authors have considered a Bayesian


approach, often relying on Markov chain Monte Carlo simulation for posterior inference (Tonellato 1997). Starting from earlier work of Huang and Cressie (1996), state-space approaches in a non-Bayesian framework include, for example, Mardia et al. (1998) and Wikle and Cressie (1999), where principal component analysis of stochastic processes and kriging are combined with a state-space formulation to model the space-time dependence. In this paper, in the interest of model parsimony and of identifying the sources of spatial variation, we also consider a state-space model (West and Harrison 1997) together with principal component analysis of stochastic processes, or Karhunen-Loève expansion (KLE) (Obled and Creutin 1986; Jona-Lasinio 2001). The difficulties of this approach are considerable for a continuous domain when data are collected only from a sparse and irregular network. In fact, as pointed out by Karl et al. (1982), the KLE can produce distorted loading patterns. To overcome this problem we consider here the application of the Voronoi polygons method (Okabe et al. 1992) as a quadrature technique to obtain an "adjusted" kriging spatial predictor. Combining the kriging approach with the state-space model, it is also shown that the adjusted kriging predictor characterises the simultaneous spatio-temporal prediction at the locations and time points of interest. The final crucial point concerns parameter estimation. Mardia et al. (1998) proposed a two-stage iterative estimation process where, given values of the spatial parameters, the likelihood is maximised with respect to the temporal parameters and, given values of the temporal parameters, the likelihood is maximised with respect to the spatial parameters. To avoid this iterative estimation process, we adopt a "full" EM algorithm (Dempster et al. 1977) that, starting from an initial estimate of the spatial parameters, leads to a simultaneous estimation of all model parameters within the recursive Kalman filter algorithm.

The paper is organised as follows. Section 2 provides the basic formulation for the space-time model. The decomposition of the spatial process is presented in Sect. 3, where the quadrature factors are introduced to obtain a coherent finite formulation of the Fredholm integral. The spatial and spatio-temporal prediction problems are then discussed in Sect. 4. Finally, in Sect. 5, the model is applied to a data set of Nitrogen Dioxide (NO2) levels.

2. Model formulation

Consider a univariate spatio-temporal stochastic process Z(s, t), where s is the generic location within the geographical domain of interest D, and t a discrete time index. That is, we consider Z(s, t) at (s_i, t) for i = 1, 2, ..., n and t = 1, 2, ..., T. For notational convenience we assume that the data are sampled from a fixed station network, i.e. the same locations (s_1, s_2, ..., s_n) are used at every observation time t. For the analysis of monitoring data in real time, the spatio-temporal process is interpreted as a temporal sequence of spatial processes. This sequential approach focuses attention on statements about the future development of a spatial series conditional on existing information. As time evolves, information relevant to forecasting the future is received and should be used to revise the forecaster's views. In this case, the class of dynamic models represents a convenient framework for analysing the temporal evolution of spatial series. The most widely


known applied subclass is that of normal dynamic linear models, referred to simply as dynamic linear models, or DLMs, when the normality is implicit (West and Harrison 1997). For an observation vector Z(·, t) the DLM is characterised by the following two equations

\alpha(t) = \Phi\,\alpha(t-1) + \eta(t), \qquad \eta(t) \sim N(0, \Sigma_\eta) \qquad (1)

Z(\cdot, t) = H\,\alpha(t) + \varepsilon(\cdot, t), \qquad \varepsilon(\cdot, t) \sim N(0, \Sigma_\varepsilon) \qquad (2)

The error terms η(t) and ε(·, t) are internally and mutually independent. Equation (2) is the observation (measurement) equation defining the sampling distribution for Z(·, t) conditional on the state vector α(t). Equation (1) is the state equation defining, by means of the transition matrix Φ, the temporal evolution of the state vector α(t).

With respect to the generic site s, we assume that the spatio-temporal field is decomposed into mean and error components

Z(s, t) = \mu(s, t) + \varepsilon(s, t) \qquad (3)

The mean component is expressed as a time-varying linear combination α(t) of p spatial common fields h(s) (Mardia et al. 1998)

\mu(s, t) = h_1(s)\alpha_1(t) + h_2(s)\alpha_2(t) + \cdots + h_p(s)\alpha_p(t) = h(s)'\alpha(t) \qquad (4)

The substitution of (4) into (3) gives the observation equation of the dynamic model. The model is completed with a Gaussian prior on the initial state: α(t_0) ~ N(m_0, P_0), with α(t_0) independent of η(t) and ε(·, t) for all t. While Eq. (1) gives information about the temporal evolution of the process, Eq. (2) captures its spatial structure by means of the measurement matrix H and the measurement noise ε(·, t). If ε(·, t) is considered spatially independent, all the spatial structure is contained in the measurement matrix H. A decomposition such as (4) always exists, and the KLE is an example.
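As an illustration of how Eqs. (1)-(2) operate, the following minimal sketch simulates one realisation of the DLM in Python/NumPy. The dimensions, transition matrix, measurement matrix and noise covariances are purely hypothetical placeholders, not values used in the paper.

import numpy as np

rng = np.random.default_rng(0)

n, p, T = 41, 6, 100            # hypothetical: n sites, p common fields, T times
Phi = 0.8 * np.eye(p)           # transition matrix (assumed diagonal here)
H = rng.normal(size=(n, p))     # measurement matrix of common fields h(s)'
Sigma_eta = 0.1 * np.eye(p)     # state noise covariance
Sigma_eps = 0.5 * np.eye(n)     # measurement noise covariance (spatially independent)

alpha = rng.multivariate_normal(np.zeros(p), np.eye(p))   # alpha(t0) ~ N(m0, P0)
Z = np.empty((T, n))
for t in range(T):
    # state equation (1): alpha(t) = Phi alpha(t-1) + eta(t)
    alpha = Phi @ alpha + rng.multivariate_normal(np.zeros(p), Sigma_eta)
    # observation equation (2): Z(., t) = H alpha(t) + eps(., t)
    Z[t] = H @ alpha + rng.multivariate_normal(np.zeros(n), Sigma_eps)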

3. Spatial process decomposition and Karhunen-Loève expansion

In this section we focus our attention on the definition of the observation equation for a fixed time t. A zero-mean stochastic spatial process {δ(s) : s = (x, y) ∈ D ⊆ ℝ²}, defined in L²(ℝ²), can be expanded in a series of orthogonal functions. The orthogonality of the basis functions makes the representation efficient and mathematically convenient. In signal processing applications (Mallat 1998) it also guarantees that the components of the signal with respect to the basis functions do not interfere with each other. The most familiar set of orthogonal basis functions are those used in the Fourier transform. However, other orthogonal basis functions commonly appear in the statistical literature, and orthogonal wavelets (Mallat 1998) provide a well-known example. In this paper we deal with the Karhunen-Loève expansion (KLE), which allows for the following orthogonal expansion of the process

\delta(s) = \sum_{v=1}^{\infty} g_v\, \phi_v(s) \qquad (5)


where the φ_v(s) are deterministic functions depending on the location s, the covariance structure of the process and the spatial domain D, and the g_v are uncorrelated random coefficients. In (5) it is assumed that the process can be expanded in any set of orthonormal basis functions φ_v(·) which are the eigenfunctions of the covariance function. These are obtained by solving the Fredholm integral equation

\int_D C(s, s')\, \phi_v(s')\, ds' = \lambda_v\, \phi_v(s) \qquad (6)

where C(s, s') is the covariance between site s and site s', and λ_v the variance of the coefficient g_v. Equation (6) is the continuous analog of a matrix eigenvector equation, and for distinct eigenvalues the representation is unique. The coefficients g_v, or principal components, obtained as the projection of δ on the v-th eigenfunction, are given by

g_v = \int_D \delta(s)\, \phi_v(s)\, ds \qquad (7)

Equation (7) is known as the Karhunen-Loève transform (KLT). However, notice that we consider the process observed at a collection of n sites I = {s_1, ..., s_n}, so in practice only a finite linear approximation of (5), (6) and (7) is possible. In this case, within the framework of linear approximations, the KLE is the most efficient representation of the random process if the expansion is truncated to use fewer than n orthonormal basis functions (Freiberger and Grenander 1965). In other words, if we approximate the random process in terms of some number m < n of basis functions, the optimal basis functions for the truncated expansion correspond to the eigenvectors of C with the m largest eigenvalues. Note that one reason for truncating the expansion occurs if the random process consists of a signal in additive noise. In this case, it can turn out that by using a truncated expansion a significant part of the noise is eliminated while most of the signal is kept intact (Preisendorfer 1988). However, the difficulties of the approach are considerable for a continuous domain when data are collected only from a sparse and irregular network. The fact that we are considering a process observed at discrete points is a practical limitation to the numerical solution of (6). Accordingly, if there are n sample points in the domain, only n eigenfunctions can be estimated while, indeed, there is a denumerable infinity of them for a continuous process. Thus, the geometrical relations involving the domain of integration and the relations between the sites s_i (i = 1, ..., n) are completely ignored in a "discrete" matrix formulation of (6). However, this limitation should be recognised as a restriction on the accuracy of the solution, but not as a part of the problem formulation. Hence, the numerical problem encountered in practice is to estimate C(s, s') and attempt to solve equations (6) and (7). Obled and Creutin (1986) proposed a general approach based on a set of functions {e_1(s), e_2(s), ..., e_n(s)} having a vector space structure over D. This approach leads to the following finite formulation of the Fredholm integral

\sum_{j=1}^{n} C(s_i, s_j) \sum_{m=1}^{n} E_{jm}\, \phi_v(s_m) = \lambda_v\, \phi_v(s_i), \qquad i = 1, \ldots, n \qquad (8)


where E_{jm} = ∫_D e_j(s) e_m(s) ds is the quadrature factor. In matrix formulation Eq. (8) can be written as C E φ_v = λ_v φ_v (v = 1, ..., n). Accordingly, a finite solution of Eq. (7) is

g_v = \sum_{i=1}^{n} \sum_{j=1}^{n} \delta(s_i)\, E_{ij}\, \phi_v(s_j) \qquad (9)

The major difference between Eq. (8) and the usual eigenvector equation used in principal component analysis lies in the fact that in Eq. (8) we have to solve the problem of choosing a set of appropriate generating functions. From a practical point of view, the problem is limited to the evaluation of the integral in the E_{jm} term. In the two-dimensional case, Cohen and Jones (1969) and Buell (1972) suggested using piecewise constant functions. Following this approach, a set of areas of influence {w(s_i), i = 1, ..., n} is defined, one for each sampled site s_i, and e_i(·) is assumed to be constant and equal to one over the area and zero elsewhere. In this paper, the areas of influence are obtained by applying a Voronoi tessellation (Okabe et al. 1992), and each area can be taken to approximate the integral in the E_{jm} term. In such a simple case, the quadrature matrix E is diagonal with nonzero entries E_{ii} = ∫_D e_i(s) e_i(s) ds = w_i. The method only requires computing the influence areas w_i, which directly compensate for the effects due to the variable density of the network. As a consequence, the numerical approximation of the Fredholm integral is

\sum_{j=1}^{n} C(s_i, s_j)\, w_j\, \phi_v(s_j) = \lambda_v\, \phi_v(s_i) \qquad (10)

which can be rewritten in its symmetric form as

\sum_{j=1}^{n} C^*(s_i, s_j)\, \theta_v(s_j) = \lambda_v\, \theta_v(s_i) \qquad (11)

where θ_v(s_j) = φ_v(s_j) √w_j and C*(s_i, s_j) = C(s_i, s_j) √(w_i w_j). When regular gridded fields are considered, the quadrature factors are not needed. In this case, the simple application of the spectral decomposition theorem (Mardia et al. 1979) to the C matrix provides the estimates f_v and l_v, respectively, of the eigenfunctions φ_v and eigenvalues λ_v of the covariance function.

If only m terms, corresponding to the m largest eigenvalues, are taken in a truncated representation for δ(s), then (5) can be replaced by

\hat{\delta}(s) = \sum_{v=1}^{m} g_v\, \phi_v(s) \qquad (12)

Note that Eq. (12) can also be used to predict the value of the process at an unmonitored site s_0. In this case, given the coefficients g_v (v = 1, ..., m), we deal with an eigenfunction forecasting problem (Obled and Braud 1989).
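A minimal numerical sketch of Eqs. (8)-(12) with a diagonal quadrature matrix is given below (Python/NumPy). The covariance matrix C at the n sites and the influence areas w are assumed to be available, for instance from a fitted variogram and a Voronoi tessellation as in Sect. 5; the function names are ours, not the paper's.

import numpy as np

def adjusted_kle(C, w, m):
    """Quadrature-adjusted KLE: C is the n x n covariance matrix at the sites,
    w the vector of Voronoi influence areas, m the truncation parameter."""
    W = np.diag(np.sqrt(w))                  # W_ii = sqrt(w_i)
    C_star = W @ C @ W                       # C*(si,sj) = C(si,sj) sqrt(wi wj)
    lam, theta = np.linalg.eigh(C_star)      # symmetric eigenproblem (11)
    order = np.argsort(lam)[::-1]            # sort eigenvalues decreasingly
    lam, theta = lam[order], theta[:, order]
    phi = np.linalg.solve(W, theta)          # phi_v(sj) = theta_v(sj) / sqrt(wj)
    return lam[:m], theta[:, :m], phi[:, :m]

def kl_coefficients(delta, w, phi):
    """Finite KLT (7)/(9) with diagonal quadrature: g_v = sum_i w_i delta(si) phi_v(si)."""
    return phi.T @ (w * delta)

def truncated_reconstruction(g, phi):
    """Truncated expansion (12): delta_hat(si) = sum_v g_v phi_v(si)."""
    return phi @ g

With m = n the reconstruction is exact; truncating to m < n keeps only the components associated with the largest eigenvalues, as discussed above.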


4. Spatial and spatio-temporal prediction

In what follows, the prediction problem is considered. In particular, we first take into account the spatial prediction of Z(s_0, t) at the unmonitored location s_0 and a given time t < T. In this context, the kriging technique is combined with the KLE and an "adjusted" predictor is derived. The prediction of the process Z(s_0, t) is then considered at s_0 and time t > T.

4.1. Spatial prediction

Let us consider the spatial process Z(s) decomposed as Z(s) = m(s) + δ(s) (Cressie 1993), where m(s) is the deterministic mean structure and δ(s) a second-order stationary correlated error process. When estimates of unknown values are needed at specific locations, different solutions are possible. For such a point estimation problem, weighted linear combination methods are usually used, and some of them represent an extreme version of the family of inverse distance methods. Frequently the spatial prediction at an unmonitored site s_0 is obtained by applying the universal kriging predictor (Cressie 1993)

\hat{Z}(s_0) = f(s_0)'\beta + c_0' C^{-1}(Z - F\beta) \qquad (13)

where f(s_0)'β and Fβ are linear combinations of polynomials in the spatial coordinates s = (x, y), and c_0 is the covariance vector between s_0 and the sites s_i (i = 1, ..., n). The corresponding prediction variance is (Cressie 1993)

\sigma^2_{UK} = \sigma^2 - c_0' C^{-1} c_0 + \left(f(s_0) - F' C^{-1} c_0\right)' \left(F' C^{-1} F\right)^{-1} \left(f(s_0) - F' C^{-1} c_0\right) \qquad (14)
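Equations (13)-(14) can be transcribed directly. In the sketch below (Python/NumPy) β is taken as the generalised least squares estimate, which is the usual choice in universal kriging, and the quantities C, c0, F, f0 and the process variance sigma2 are assumed to be supplied by a fitted covariance model; the routine is only an illustrative transcription.

import numpy as np

def universal_kriging(Z, F, f0, C, c0, sigma2):
    """Universal kriging predictor (13) and its prediction variance (14)."""
    Cinv_c0 = np.linalg.solve(C, c0)
    Cinv_F = np.linalg.solve(C, F)
    # GLS estimate of the trend coefficients beta
    beta = np.linalg.solve(F.T @ Cinv_F, F.T @ np.linalg.solve(C, Z))
    z_hat = f0 @ beta + c0 @ np.linalg.solve(C, Z - F @ beta)            # Eq. (13)
    r = f0 - F.T @ Cinv_c0
    var = sigma2 - c0 @ Cinv_c0 + r @ np.linalg.solve(F.T @ Cinv_F, r)   # Eq. (14)
    return z_hat, var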

Since the dimension reduction of the state vector represents the critical feature of our approach, we solve the local estimation problem by combining this kriging technique with the KLE. As noted previously, Eq. (12) can be used to predict the zero-mean process at an unmonitored site s_0, and such a prediction can be obtained using all or just m < n eigenvectors. However, Eq. (12) highlights the need for a local prediction of the eigenfunctions φ_v at site s_0. The simple kriging (Cressie 1993) technique constitutes a useful tool but, in its original form, it does not take into account the role of the quadrature factors. To overcome this problem, we make adjustments to the predictor by considering the influence areas w_i.

Result 1. The simple kriging prediction of φ_v(s_0) is c_0' W θ̃_v, where θ̃_v = λ_v^{-1} θ_v is the normalised eigenvector and W is a diagonal weighting matrix with entries equal to the square root of the influence area associated with site s_i (i.e. W_ii = √w_i).

Proof. The simple kriging prediction of the residual process δ(s) at s_0 is δ̂(s_0) = c_0' C^{-1} δ = Σ_{i=1}^n π_i δ(s_i), where c_0' C^{-1} = π' = (π_1, π_2, ..., π_n). From (5) it follows that

\hat{\delta}(s_0) = \sum_{i=1}^{n} \pi_i \delta(s_i) = \sum_{i=1}^{n} \pi_i \sum_{v=1}^{n} g_v \phi_v(s_i) = \sum_{v=1}^{n} g_v \sum_{i=1}^{n} \pi_i \phi_v(s_i) = \sum_{v=1}^{n} g_v\, c_0' C^{-1} \phi_v = \sum_{v=1}^{n} g_v\, \hat{\phi}_v(s_0)

as in (12). Finally, considering the influence areas and Eq. (11) we can write

\hat{\phi}_v(s_0) = c_0' C^{-1} \phi_v = c_0' W W^{-1} C^{-1} W^{-1} \theta_v = c_0' W C^{*-1} \theta_v = \lambda_v^{-1} c_0' W \theta_v = c_0' W \tilde{\theta}_v \qquad (15)

Result 2. The spatial predictor of the process at the unmonitored site s_0 is

\hat{Z}(s_0) = f(s_0)'\beta + \sum_{v=1}^{n} g_v\, c_0' W \tilde{\theta}_v \qquad (16)
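Results 1 and 2 amount to a weighted combination of simple kriging predictions of the eigenvectors. A sketch, reusing the quantities of the earlier code fragments (W, theta and lam from the quadrature-adjusted eigenproblem, c0 the covariance vector between s_0 and the monitored sites, beta and g as before; all names are ours), is:

import numpy as np

def kriged_eigenfunctions(c0, W, theta, lam):
    """Result 1 / Eq. (15): phi_hat_v(s0) = lam_v^{-1} c0' W theta_v = c0' W theta_tilde_v."""
    return (c0 @ W @ theta) / lam

def adjusted_spatial_predictor(f0, beta, g, c0, W, theta, lam):
    """Result 2 / Eq. (16): trend at s0 plus the kriged Karhunen-Loeve expansion."""
    phi0 = kriged_eigenfunctions(c0, W, theta, lam)
    return f0 @ beta + g @ phi0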

Of course, this approach does not prove satisfactory for space-time prediction. One of the principal drawbacks is that it does not take advantage of the temporal variation of the phenomenon. For example, at the second time frame, we also have data from the first time frame; at the third time frame, we also have data from the first two time frames, and so on. The method should use this past data and the temporal smoothness to produce a better estimate. In the next section, we use a spatio-temporal model and the Kalman filter to incorporate past data as well as current data into the prediction of the observed phenomenon. In this context, for the analysis of monitoring data in real time, the spatio-temporal process is interpreted as a temporal sequence of spatial processes.

4.2. Space-time Kalman filter

When the process presents a spatial drift, the first q common fields constitute a q-vector of spatial trend fields f(s) = {f_1(s), ..., f_q(s)}; the remaining p − q = n common fields are known as principal fields (Mardia et al. 1998). According to Eq. (16), we obtain the measurement matrix considering the q spatial trend fields and the n principal fields c_i' W θ̃_v (i = 1, ..., n; v = 1, ..., n). For example, assuming that the process presents a linear trend (f_1(s_i) = 1, f_2(s_i) = x_i, f_3(s_i) = y_i), we have the following (n × p) matrix

H = \begin{pmatrix}
1 & x_1 & y_1 & c_1' W \tilde{\theta}_1 & \cdots & c_1' W \tilde{\theta}_n \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & x_n & y_n & c_n' W \tilde{\theta}_1 & \cdots & c_n' W \tilde{\theta}_n
\end{pmatrix}

If we approximate the random process in terms of some number m < n of basis functions, the truncated expansion limits the number of principal fields to the eigenvectors of C corresponding to the m largest eigenvalues. This leads to a dimension reduction of the state vector α(t). Furthermore, since these first m eigenvectors capture


both large and small scale spatial variation, the truncated expansion represents the unobserved signal process. In this case, the last n − m eigenvectors will be spatially uncorrelated (especially if a nugget effect is estimated) and, consequently, the measurement error ε(t) will be both temporally and spatially uncorrelated.

At this point our space-time model can be cast in the DLM framework and the Kalman filter (Meinhold and Singpurwalla 1983) can be exploited to predict the unobservable state vector on the basis of the sample information. Furthermore, the recursive structure of the filter also allows for estimating the temporal parameters. In fact, through the forward and backward recursions the EM algorithm (Dempster et al. 1977; Shumway and Stoffer 1982) can be used to maximise the log-likelihood function. In this context, following Ghahramani and Hinton (1996), the EM algorithm also provides the possibility of temporally updating the initial "spatial" estimate of the measurement matrix H. This procedure is proposed here as an alternative to the two-stage iterative estimation process followed by Mardia et al. (1998); thus it constitutes a "full" EM algorithm that captures, as much as possible, the spatial variation at each occasion and across occasions.
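The forward recursions at the heart of this estimation scheme are standard. The following sketch (Python/NumPy, hypothetical argument names) computes the filtered state and the log-likelihood that each EM iteration maximises; the corresponding backward smoother and the M-step updates of Φ, Σ_η, Σ_ε and H (Shumway and Stoffer 1982; Ghahramani and Hinton 1996) are omitted for brevity.

import numpy as np

def kalman_filter(Z, H, Phi, Sigma_eta, Sigma_eps, m0, P0):
    """Forward Kalman recursions for the DLM (1)-(2).
    Z has shape (T, n); returns filtered means and variances and the log-likelihood."""
    T, n = Z.shape
    p = len(m0)
    a, P = m0, P0
    a_filt = np.empty((T, p))
    P_filt = np.empty((T, p, p))
    loglik = 0.0
    for t in range(T):
        # prediction step
        a_pred = Phi @ a
        P_pred = Phi @ P @ Phi.T + Sigma_eta
        # update step
        v = Z[t] - H @ a_pred                    # innovation
        S = H @ P_pred @ H.T + Sigma_eps         # innovation variance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        a = a_pred + K @ v
        P = P_pred - K @ H @ P_pred
        a_filt[t], P_filt[t] = a, P
        sign, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (logdet + v @ np.linalg.solve(S, v) + n * np.log(2 * np.pi))
    return a_filt, P_filt, loglik

Each EM iteration would run these recursions (together with the backward smoother), then update the temporal parameters and, following Ghahramani and Hinton (1996), the measurement matrix H, from the smoothed moments.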

The Kalman filter algorithm is also used to obtain temporal and space-time predictions. In particular, if we are interested in obtaining a temporal prediction for the sampled sites, the forecast of the spatial series for time T + k is

\hat{Z}(\cdot, T + k) = H\, \hat{\alpha}(T + k \mid T) \qquad (17)

where α̂(T + k | T) = Φ^k α̂(T | T), while the corresponding prediction variance is

\mathrm{Var}[\hat{Z}(\cdot, T + k)] = H\left(\Phi^{k} P_{T|T} \Phi^{k\prime} + \Sigma_\eta\right) H' + \Sigma_\varepsilon \qquad (18)

where P_{T|T} is the prediction error variance of the state vector, computed by the Kalman filter (West and Harrison 1997).
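A sketch of (17)-(18), as written above, only requires the filtered mean and variance at time T returned by the recursion of the previous fragment (argument names are ours):

import numpy as np
from numpy.linalg import matrix_power

def k_step_forecast(H, Phi, Sigma_eta, Sigma_eps, a_T, P_T, k):
    """k-step ahead forecast (17) and its prediction variance (18)."""
    Phi_k = matrix_power(Phi, k)
    z_hat = H @ (Phi_k @ a_T)                                          # Eq. (17)
    var = H @ (Phi_k @ P_T @ Phi_k.T + Sigma_eta) @ H.T + Sigma_eps    # Eq. (18)
    return z_hat, var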

4.3. Spatio-temporal prediction

In this section the spatio-temporal prediction of the process at n* unobserved locations is of main interest. Linear Bayes inferences within the spatio-temporal Kalman filter (Meinhold and Singpurwalla 1983; West and Harrison 1997) and standard results for the conditional multivariate normal distribution (Mardia et al. 1979) can be exploited to obtain such predictions. According to Eq. (2) we can write

\begin{pmatrix} Z(\cdot, t) \\ Z^*(\cdot, t) \end{pmatrix} = \begin{pmatrix} H \\ H^* \end{pmatrix} \alpha(t) + \begin{pmatrix} \varepsilon(\cdot, t) \\ \varepsilon^*(\cdot, t) \end{pmatrix} \qquad (19)

where Z*(·, t), ε*(·, t) and H* denote respectively the random vectors and the matrix of common fields at the n* unmonitored locations. Assuming multivariate normality, for a given α(t) it follows that

\begin{pmatrix} Z(\cdot, t) \\ Z^*(\cdot, t) \end{pmatrix} \sim N\left( \begin{pmatrix} H \\ H^* \end{pmatrix} \alpha(t),\; \begin{pmatrix} \Sigma_\varepsilon & \Sigma^{*\prime} \\ \Sigma^{*} & \Sigma^{**} \end{pmatrix} \right) \qquad (20)


where Σ* = Cov[ε*(·, t), ε(·, t)] and Σ** = Cov[ε*(·, t), ε*(·, t)]. Consequently, the optimal predictor is derived as the conditional expectation (Mardia et al. 1979)

\hat{Z}^*(\cdot, t) = E[Z^*(\cdot, t) \mid Z(\cdot, t)] = H^* \hat{\alpha}(t) + \Sigma^{*} \Sigma_\varepsilon^{-1}\left(Z(\cdot, t) - H \hat{\alpha}(t)\right) \qquad (21)

and the corresponding prediction error variance is (Kitanidis 1986; Mardia et al. 1998)

\mathrm{Var}[\hat{Z}^*(\cdot, t)] = \left(\Sigma^{**} - \Sigma^{*} \Sigma_\varepsilon^{-1} \Sigma^{*\prime}\right) + \left(H^* - \Sigma^{*} \Sigma_\varepsilon^{-1} H\right) P_{t|t} \left(H^* - \Sigma^{*} \Sigma_\varepsilon^{-1} H\right)' \qquad (22)

At this stage, some considerations may be useful to better understand the space-time prediction approach. The second term in (21) is a type of kriging prediction applied to the spatial error term ε(·, t). Its contribution to the prediction depends on the truncation parameter m, which allows for a dimension reduction (e.g., small values of m imply a spatially correlated ε(·, t) term and the contribution of the kriging predictor will be significant; on the other hand, large values of m are useful to solve only a "denoising" problem and the resulting contribution of the kriging predictor will be relatively small). In particular, when ε(·, t) is a "pure" measurement error, the optimal predictor is only based on the Kalman filter prediction of the state vector. Thus, for small values of the truncation parameter m, the error term ε(·, t) might be decomposed as ε(·, t) = ω(·, t) + ν(·, t), where ω(·, t) represents the small scale spatial variation that does not have a temporally dynamic structure, and ν(·, t) a "pure" measurement error. As shown in Wikle and Cressie (1999, Sect. 2.3), this decomposition leads to a more detailed form of the optimal predictor where it becomes natural to plug in our adjusted kriging predictor given in (16). Furthermore, as can be seen from (21) and (22), a further crucial point is related to the definition of the measurement matrix H* at locations for which we do not have data. Since the common fields are non-stochastic, we could apply some relatively simple interpolation scheme to obtain such basis functions. This approach is justified especially for small values of the truncation parameter m, for which it is known (Mardia et al. 1998) that the eigenfunctions show a simple smooth pattern. However, as shown in (15), one can obtain such predictions by weighting the simple kriging predictions by means of the numerical quadrature approach. Finally, if we are interested in obtaining a spatio-temporal prediction, the k-step ahead prediction is

\hat{Z}^*(\cdot, T + k) = H^*\, \hat{\alpha}(T + k \mid T) \qquad (23)

Note that a further possible approach to space-time prediction is associated with Markov chain Monte Carlo simulation (West and Harrison 1997; Mantovan et al. 2000). In fact, following Tonellato (1997, 1998), the Gibbs sampler could also be used to obtain predictions at unmonitored sites. Notwithstanding the computational effort, this last approach is useful particularly when the EM estimate appears to be unstable as the complexity of the trend surface increases.
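A schematic transcription of the conditional predictor (21) and of the variance (22), as reconstructed above, is given below (Python/NumPy). Here P_t denotes the filtered state variance and the cross-covariance matrices are those defined after (20); the argument and function names are ours.

import numpy as np

def predict_unmonitored(Z_t, H, H_star, alpha_hat, P_t, Sigma_eps, Sigma_star, Sigma_star_star):
    """Conditional prediction at the n* unmonitored sites.
    Sigma_star = Cov[eps*(.,t), eps(.,t)] (n* x n); Sigma_star_star = Cov[eps*, eps*]."""
    A = Sigma_star @ np.linalg.inv(Sigma_eps)                          # n* x n
    z_star = H_star @ alpha_hat + A @ (Z_t - H @ alpha_hat)            # Eq. (21)
    B = H_star - A @ H
    var = (Sigma_star_star - A @ Sigma_star.T) + B @ P_t @ B.T         # Eq. (22)
    return z_star, var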


5. Modelling nitrogen dioxide levels in Milan

In this section, the results of fitting our spatio-temporal model to the data are illustrated. The data set, provided by the Environmental Agency of the Lombardy Region (ARPA), consists of the 365 daily maximum observations (from 1st January to 31st December 2001) of nitrogen dioxide (NO2) levels at n = 41 sites in Milan and its neighbourhood. To obtain the spatio-temporal predictions (t > T) at the unmonitored sites we have deliberately excluded the last five days (27-31 December) from the global analysis, so that the available data set consists of 360 days. The coordinate system of the monitoring stations is the Italian national grid system (Gauss-Boaga), which is based on the Universal Transverse Mercator (UTM) projection. The map of the locations used in this study is shown in Fig. 1. The study of the temporal pattern of the data set highlights the highest values of NO2 in the autumn and winter months. Each of the 41 time series shows a similar temporal pattern, with the highest values on the first and last days considered. Furthermore, for each of the 360 spatial series, locations in the city of Milan show a higher daily average with respect to the other sites. This is not surprising and may be attributed to a variety of factors such as emissions from vehicles, manufacturing and heating systems.

To analyse the common spatial structure, the type of trend and the residual spatial correlation must be identified. Following Christakos and Hristopulos (1998), the empirical variogram was obtained by averaging the spatial variograms over time. The common spatial trend is recognised as a six-parameter quadratic surface, representing six trend fields in the matrix H: h_1(s_i) = 1, h_2(s_i) = x_i, h_3(s_i) = y_i, h_4(s_i) = x_i y_i, h_5(s_i) = x_i², h_6(s_i) = y_i². Accordingly, as shown in Fig. 2, the omnidirectional empirical variograms for the observed data and for the residuals of a linear trend clearly exhibit a residual trend structure. To assess the best fit for the empirical variogram computed on the residuals, we investigated a variety of variogram transition models (Cressie 1993). Using the weighted least squares procedure, the IGF index (Indicative Goodness of Fit; Cressie 1993, Eq. 2.6.12) was calculated as a metric for selecting the best fitting variogram model. According to the IGF statistic, Table 1 shows (for the rescaled coordinates) that the hole effect variogram, with range 0.0758, partial sill 23.3679 and nugget 79.5994, produced the best fit.

Table 1. Parameter values for omnidirectional variogram models

Model          Range     Partial sill   Nugget     IGF
Spherical      0.0735    33.2306        70.0253    8.8095
Exponential    0.0459    35.3301        67.7109    9.0565
Gaussian       0.0595    30.0597        73.1313    8.8664
Hole effect    0.0758    23.3679        79.5994    6.3771



Fig. 1. Network of the Milan district. The names of the sites are: (1) Marche, (2) Juvara, (3) Zavattari, (4) Verziere, (5) Cinisello B., (6) Agrate Brianza, (7) Sesto S. Giovanni, (8) Monza, (9) Villasanta, (10) Limito, (11) Cassano, (12) Melegnano, (13) Tribiano, (14) Corsico, (15) Rho Centro, (16) Pero, (17) Legnano S. Magno, (18) Turbigo, (19) Robecchetto, (20) Castano Primo, (21) Cuggiono, (22) Meda, (23) Limbiate, (24) Vimercate, (25) Inzago, (26) Arese, (27) Garbagnate, (28) Abbiategrasso, (29) Lainate, (30) Settimo, (31) Lacchiarella, (32) Liguria Romolo, (33) S. Giuliano, (34) Cormano, (35) Magenta VF, (36) P.co Lambro, (37) Senato Marina, (38) V.le Abbiategrasso, (39) Via Messina, (40) Motta Visconti, (41) Arconate

Thus, for the residuals of a quadratic trend the theoretical variogram model is defined as

\gamma(|h|) = \begin{cases} 0 & |h| = 0 \\ 79.5994 + 23.3679\left[1 - \dfrac{\sin(\pi |h| / 0.0758)}{\pi |h| / 0.0758}\right] & |h| \neq 0 \end{cases}
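The fitted hole effect model, and the weighted least squares criterion used to compare the candidate models of Table 1, can be sketched as follows (Python/NumPy). The parameter values are those reported above, while wls_loss is only a schematic version of the criterion, not the exact IGF computation of Cressie (1993).

import numpy as np

def hole_effect(h, nugget=79.5994, psill=23.3679, range_par=0.0758):
    """Fitted omnidirectional hole-effect variogram for the quadratic-trend residuals."""
    h = np.asarray(h, dtype=float)
    x = np.pi * h / range_par
    gamma = nugget + psill * (1.0 - np.sin(x) / np.where(x == 0, 1.0, x))
    return np.where(h == 0, 0.0, gamma)        # gamma(0) = 0 by definition

def wls_loss(gamma_emp, gamma_mod, n_pairs):
    """Weighted least squares criterion comparing empirical and model variograms."""
    return np.sum(n_pairs * (gamma_emp - gamma_mod) ** 2 / gamma_mod ** 2)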

With the aim of investigating the existence of anisotropy, directional variograms were produced for four angles: 0°, 45°, 90° and 135° (note that 0° represents the E-W direction and 90° the N-S direction).

The common behaviour of the directional variograms shown in Fig. 3 suggests a similar spatial continuity for all the considered directions; this is also confirmed in Table 2 by the similarity of the ranges and total sills of the hole effect variograms.

The decomposition of the covariance matrix according to Eq. (11) was then applied considering the influence areas w_i obtained by means of the Voronoi tessellation shown in Fig. 4.
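The influence areas w_i need not be computed from the polygons explicitly. A simple alternative, sketched below under the assumption of a rectangular study region (Python/SciPy, hypothetical function name), approximates each Voronoi cell area by assigning uniformly sampled points to their nearest station and scaling the counts by the area of the region.

import numpy as np
from scipy.spatial import cKDTree

def voronoi_areas_mc(sites, bbox, n_samples=200_000, seed=0):
    """Monte Carlo approximation of the Voronoi cell area of each site inside
    the rectangle bbox = (xmin, xmax, ymin, ymax)."""
    rng = np.random.default_rng(seed)
    xmin, xmax, ymin, ymax = bbox
    pts = np.column_stack([rng.uniform(xmin, xmax, n_samples),
                           rng.uniform(ymin, ymax, n_samples)])
    _, idx = cKDTree(sites).query(pts)              # nearest station for each point
    counts = np.bincount(idx, minlength=len(sites))
    total_area = (xmax - xmin) * (ymax - ymin)
    return counts / n_samples * total_area          # approximate influence areas w_i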

Fig. 2. Omnidirectional empirical variograms (semivariance against lag) for the observed data, the linear-trend residuals and the quadratic-trend residuals

Fig. 3. Omnidirectional and directional (0°, 45°, 90°, 135°) empirical variograms (semivariance against lag)


Table 2. Parameter values for omnidirectional and directional hole effect variograms


Hole effect model    Range     Partial sill   Nugget     IGF
Omnidirectional      0.0758    23.3679        79.5994    6.3771
0° direction         0.0629    33.2902        58.5015    5.6551
45° direction        0.0778    13.1880        88.8853    4.0868
90° direction        0.0770    27.2730        83.8281    1.6950
135° direction       0.0734    34.4180        69.8183    1.0345


Fig. 4. Voronoi tessellation of the observed region

To achieve a dimension reduction, the choice of a truncation parameter m is essential. In this study the parameter m was arbitrarily chosen to be 15: based on an examination of the eigenvalues, this value accounts for 68% of the variance of the process. Figure 5 shows the level of the explained variance with and without the inclusion of the influence areas. As can be seen, the introduction of the weights w_i allows a more parsimonious model. Accordingly, the measurement matrix H has dimension (41 × 21) for the monitored sites. With the identified matrix H, we then applied the EM procedure to estimate the temporal parameters. Following Ghahramani and Hinton (1996), the EM procedure was also used to temporally update the spatial information contained in the matrix H. To examine the effectiveness of our model we applied the Kalman filter and compared the predicted spatial series with the observed data. For a quantitative measure of the model's precision and accuracy, we performed a cross-validation exercise (i.e. the NO2 values at site s_i are predicted using data from all the other sites).
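The cross-validation exercise is a leave-one-site-out loop around a spatial predictor such as (16). A schematic version follows (Python/NumPy); the predict_at argument stands in for a user-supplied routine, e.g. the adjusted kriging predictor refitted without the deleted site, and is a hypothetical name.

import numpy as np

def loo_spatial_cv(Z_t, sites, predict_at):
    """Leave-one-site-out cross-validation of a spatial predictor for one time frame.
    predict_at(s0, Z_obs, sites_obs) returns the prediction at s0 from the remaining sites."""
    n = len(sites)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = predict_at(sites[i], Z_t[keep], sites[keep])
    return preds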


Fig. 5. Explained variance against eigenvalue rank, with and without the influence areas

Table 3 shows that the summary statistics for the real and fitted data are clearly comparable. The goodness of fit is also confirmed by the box-plots in Fig. 6.

Table 3. Global and local statistics for spatial predictions

Descriptive statistic      Real data               Spatial prediction
                           1/1/2001-26/12/2001     1/1/2001-26/12/2001
Mean                       48.129                  48.416
Median                     46.300                  46.785
Std. deviation             18.381                  16.239
Minimum                    0.200                   4.972
Maximum                    177.300                 145.749
Interquartile range        23.000                  20.780
Skewness                   0.857                   0.797
Kurtosis                   1.905                   1.632
Pearson correlation                                0.914
Residual sum of squares                            909.740



Fig. 6. Box-plots of the observed NO2 data and of the spatial predictions

Furthermore, to test the model's ability to perform spatio-temporal predictions, Table 4 compares the summary statistics of the two distributions of the real and predicted values, for the period 27-31 December 2001.

As expected, the combination of the Kalman filter with the kriging predictor highlights the difficulties of the model in predicting extreme values (peaks and valleys). This difficulty is evident especially for the second-period forecast, where the real series shows some extreme values. This is confirmed by Fig. 7, which exhibits the box-plots of real and predicted values. In any case, considering that only 68% of the variance is explained, it seems that the model is able to capture the essence of the space-time pattern. This is also confirmed by the last box-plot (bottom-right), which compares the distribution of the real data for the period 27-31 December 2001 with the respective predicted values.

Acknowledgements. We would like to thank the referees for valuable comments on earlier drafts. Part of the work was funded by the grant 'Ex-MURST 40%'.


Table 4. Summary statistics of the real and predicted values for the period 27-31 December 2001


Fig. 7. Box-plots of the observed and predicted values for each day from 27 to 31 December 2001 and for the whole period (spatio-temporal predictions)

References

Buell CE (1972) Integral equation representation for factor analysis. J. Atmosph. Sci. 28, 1502-1505
Cohen A, Jones RH (1969) Regression on a random field. JASA 64, 1172-1182
Cressie N (1993) Statistics for spatial data. Wiley, New York
Christakos G, Hristopulos D (1998) Spatiotemporal environmental health modelling: a tractatus stochasticus. Kluwer Academic Publishers, Boston
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. JRSS Series B 39, 1-38
Di Giacinto V (1995) Sulla costruzione di modelli lineari per l'analisi di dati economici a struttura spazio-temporale. Ph.D. Thesis
Freiberger W, Grenander U (1965) On the formulation of statistical meteorology. Rev. Intern. Statist. Inst. 33, 59-86
Ghahramani Z, Hinton GE (1996) Parameter estimation for linear dynamical systems. Tech. Report CRG-TR-96-2
Huang H-C, Cressie N (1996) Spatio-temporal prediction of snow water equivalent using the Kalman filter. Computational Statistics and Data Analysis 22, 159-175
Karl T, Koscielny A, Diaz H (1982) Potential errors in the application of principal component (eigenvector) analysis to geophysical data. Journal of Applied Meteorology 21, 1183-1186
Kitanidis P (1986) Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resources Research 22, 499-507
Ippoliti L (2001) On-line spatio-temporal prediction by a state-space representation of the generalised space time autoregressive model. Metron LIX, 1-2
Jona-Lasinio G (2001) Modelling and exploring multivariate spatial variation: a test procedure for isotropy of multivariate data. Journal of Multivariate Analysis 27(2), 295-317
Lütkepohl H (1993) Introduction to multiple time series analysis. Springer-Verlag
Mantovan P, Pastore A, Tonellato S (2000) Apprendimento e previsione con modelli lineari dinamici. Materiale didattico per il corso "Nuove metodologie per la previsione". Venezia, 4-9 Settembre 2000
Mardia K, Redfern E, Goodall C, Alonso F (1998) The kriged Kalman filter. TEST 7(2), 217-285
Mardia K, Kent J, Bibby J (1979) Multivariate analysis. Academic Press, London
Mallat S (1998) A wavelet tour of signal processing. Academic Press, New York
Meinhold RJ, Singpurwalla ND (1983) Understanding the Kalman filter. Am. Stat. 37, 123-127
Obled C, Creutin JD (1986) Some developments in the use of empirical orthogonal functions for mapping meteorological fields. Journal of Climate and Applied Meteorology 25(9), 1189-1204
Obled C, Braud I (1989) Analogies entre géostatistique et analyse en composantes principales de processus ou analyse EOFs. In: Armstrong M (ed) Geostatistics, Vol. I. Kluwer Academic Publishers
Okabe A, Boots B, Sugihara K (1992) Spatial tessellations. Concepts and applications of Voronoi diagrams. Wiley, Chichester
Pfeifer PE, Deutsch SJ (1980) A three-stage iterative procedure for space-time modelling. Technometrics 22, 35-47
Preisendorfer RW (1988) Principal component analysis in meteorology and oceanography. Elsevier, Amsterdam
Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3, 253-264
Stoffer DS (1986) Estimation and identification of space-time ARMAX models in the presence of missing data. JASA 81(395)
Terzi S (1995) Maximum likelihood estimation of a generalised STAR(p,1p) model. Journal of the Italian Statistical Society 4(3)
Tonellato S (1997) Bayesian dynamic linear models for spatial time series. Tech. Report 5/1997, Dipartimento di Statistica, Università Ca' Foscari di Venezia
Tonellato S (1998) Spatial prediction with space-time models. Tech. Report 2/1998, Dipartimento di Statistica, Università Ca' Foscari di Venezia
West M, Harrison J (1997) Bayesian forecasting and dynamic models, 2nd edn. Springer-Verlag, New York
Wikle CK, Cressie N (1999) A dimension-reduced approach to space-time Kalman filtering. Biometrika 86(4), 815-829