WEIGHTED COMPLEX ORTHOGONAL ESTIMATOR FOR IDENTIFYING LINEAR AND NON-LINEAR CONTINUOUS TIME MODELS...

Post on 26-Jan-2023

1 views 0 download

Transcript of WEIGHTED COMPLEX ORTHOGONAL ESTIMATOR FOR IDENTIFYING LINEAR AND NON-LINEAR CONTINUOUS TIME MODELS...

Mechanical Systems and Signal Processing (1998) 12(2), 269–292

WEIGHTED COMPLEX ORTHOGONALESTIMATOR FOR IDENTIFYING LINEAR ANDNON-LINEAR CONTINUOUS TIME MODELS

FROM GENERALISED FREQUENCYRESPONSE FUNCTIONS

A. K. S S. A. B

Department of Automatic Control and Systems Engineering, University of Sheffield,P.O. Box 600, Mappin Street, S1, 3JD, U.K.

A new weighted orthogonal least squares algorithm is derived to estimate linear andnon-linear continuous time differential equation models from complex frequency responsedata. The algorithm combines the properties and advantages of both weighted andorthogonal least squares algorithms. A weighted complex orthogonal estimator, obtainedby combining the proposed algorithm with the modified error reduction ratio test providesan effective and robust way of detecting the correct model structure or determining whichterms to include in the model and identifying the unknown parameters. Since the estimationprocedure does not involve any numerical differentiation of the noisy data, the performanceof the estimator under the influence of significant noise is quite satisfactory. The proposedestimator has been applied successfully to a variety of linear and non-linear systems.

7 1998 Academic Press Limited

1. INTRODUCTION

The literature on system identification is dominated by algorithms which fit discreteparametric models from real-time domain measurements but little work has been reportedwhen the data are complex. Complex data, though rare in the real world, are commonlyfound in the fields of communication and signal processing, and the frequency domainanalysis of a system often involves the manipulation of complex data. System identificationin the frequency domain based on complex-valued frequency response data has not beenwidely studied. The current emphasis on control synthesis and design tools hinge uponparametric models which in turn have tended to inhibit parallel developments ofparametric frequency domain identification techniques which come under thenon-parametric system identification framework.

However, frequency and time domain methods give complimentary perspectives of manyimportant problems in linear system and control theory [1]. Occasionally the two methodshave been viewed as rivals, particularly on issues of implementation and application to realsystems. It is known that the frequency response function and the power spectral densitygive good insight into properties of the data and provide information about the systembehaviour and properties.

With the advent of modern spectral analysers and high-performance data-acquisitionequipment, it is of practical interest to find techniques which identify linear parametricmodels from frequency response data (i.e. to synthesise the transfer function). However,the determination of a linear transfer function as the ratio of two complex polynomialsfrom the discrete frequency response data is in essence a rational approximation problemand the direct solution requires non-linear least squares based techniques. Several

0888–3270/98/020269+24 $25.00/0/pg970146 7 1998 Academic Press Limited

. . . . 270

approaches have been suggested based around a linear least squares framework to try toavoid the non-linear least squares problem [2–6]. These techniques have been put into aunified frame work in [7], where it was recommended that a constrained optimisationapproach was preferred to the least squares based approaches when the data are corruptedby noise.

When the system is non-linear, higher-order or generalised frequency responsefunctions have to be considered. These can be estimated by extending the traditionalfast Fourier transform (FFT), windowing and smoothing concepts to multidimensions [8]or by fitting a non-linear parametric model and then mapping this into frequencydomain. The latter approach developed by Billings and Tsang [9], Peyton Jones andBillings [10] is used in the present study. Whatever method of estimating the higher-orderfrequency response functions is used, parameterising these estimates is much moredifficult than in the linear case and very few authors have considered this problem [11, 12].The main difficulty in the non-linear case is the enormous combinatorial possibilities ofterms which could be included in the model. This is not a major problem if thesystem is linear because the total number of possible terms is relatively small and anacceptable combination of terms can usually be found without much difficulty. In thenon-linear case, the number of possible terms increases rapidly as the degree ofnon-linearity and the order of the input output dynamics are increased. Estimatingnonlinear differential equation models based on the higher order frequencyresponse functions, however, offers a powerful method of reconstructing continuoustime models without computing derivatives and this warrants a detailed study of thisapproach.

In the present study a new algorithm is derived that combines the features of classicalweighted least squares with orthogonal least squares and this is extended to the case whenthe data are complex. The algorithm is used to estimate non-linear continuous time modelsfrom frequency response data without computing derivatives. With no a prioriinformation, the reformulated error reduction ratio provides a measure of the contributionof each candidate model term to the variance (energy) of the dependent variable. This isused to construct non-linear differential equation models term by term. The mostsignificant term is selectd initially and included in the model, then the next most significantterm, and so on. The orthogonality property of the estimator coupled with the modifiederror reduction ratio test therefore provides vital information regarding which terms toinclude in the model. The weighting property of the estimator reduces the sensitivity ofthe estimates to the choice of frequency range, the number of data samples and the effectsof noise. Example showing how the new procedure can be used to construct both linearand non-linear differential equation models from estimated frequency response functionsare included.

2. WEIGHTED COMPLEX ORTHOGONAL ESTIMATOR

Consider a system which can be modeled as

z( jv)= sM

i=1

ui pi ( jv)+ j( jv) (1)

where ui , i=1, . . . , M are the real, unknown deterministic parameters of the systemassociated with the complex regressors pi ( jv), i=1, . . . , M. z( jv) is a complex dependentvariable or the term to regress upon and j( jv) represents the modeling error.

271

If N measurements of z( jv) and pi ( jv) are available at vi , i=1, . . . , N the complexsystem of equation (1) can be represented in matrix form as

Z=Pu+J (2)

where

Z= &z( jv1)···

z( jvN )', u= &u1···

uM', J= &j1···jN' (3)

and

p1 ( jv1) p2 ( jv1) · · · pM ( jv1)

p1 ( jv2) p2 ( jv2) · · · pM ( jv2)P=G

G

G

K

k

······

······

GG

G

L

l

(4)

p1 ( jvN ) p2 ( jvN ) · · · pM ( jvN )

is an N×M complex matrix of known regressors. The matrix P is referred to as theinformation matrix.

However, before any attempt is made to obtain an estimate of u, the complex variablesinvolved in equation (2) should be partitioned into real and imaginary parts; otherwise u

could be complex. Thus equation (2) is represented as

Z= &ZR

––ZI'= &PR

––PI'u+ &JR

––JI' (5)

where the subscript R denotes the real part and I the imaginary part of the component,Z=2N×1 the real vector, P=2N×M the real constant information matrix, u=M×1the real parameter vector and J=2N×1 the real vector.

In a general algebraic context equation (5) is an overdetermined set of equations forthe unknowns u1, u2, . . . , uM and thus has a unique least squares solution provided therank of the information matrix P is M. The weighted least squares solution is given by[13]:

u =(PTQP)−1PTQZ (6)

When the weighting matrix Q is a unit diagonal, equation (6) gives the standard leastsquares solution of the parameter vector u.

By reformulating the problem such that each parameter in u can be estimatedindependently using an orthogonal algorithm, considerable advantage can be achieved.This algorithm is derived in Appendix A.

The implementation procedure for the weighted complex orthogonal estimator usingforward regression is summarised below.

Step 1, Define the model size M and consider all the regressors pi (v), i=1, . . . , M as possible

candidates to qualify for w1 (v); the first term of the auxiliary model.

. . . . 272

, For i=1, . . . , M, calculate

w(i)1 (v)= pi (v)

g(i)1 =

�Z(v), v(i)1 �

�v(i)1 , w(i)

1 � , where v(i)1 =wT(i)

1 Q,

E(i)1 =

(g(i)1 )2�v(i)

1 (v), w(i)1 (v)�

�(Z(v)Q), Z(v)� , (7)

where E is the error resuction ratio., Find the maximum of E(i)

1 , for instance if E(i)1 =max{E(i)

1 , 1E iEM}, then the first termselected should be the jth term of equation (1) i.e. w1 (v)= pj (v)

Step 2, Consider all the pi (v), i=1, . . . , M, i$ j as possible candidates for w2 (v). For

i=1, . . . , M and i$ j, calculate

w(i)2 = pi (v)− a(i)

12 w1 (v),

g(i)2 =

�Z(v), v(i)2 �

�v(i)2 , w(i)

2 � , where v(i)2 =wT(i)

2 Q,

E(i)2 =

(g(i)2 )2�v(i)

2 (v), w(i)2 (v)�

�(Z(v)Q), Z(v)� , (8)

where

a(i)12 =

�v1 (v), pi (v)��v1 (v), w1 (v)�.

, Find the maximum of E(i)2 , say if E(k)

2 =max{E(i)2 , 1E iEM, i$ j}, then the second term

becomes w2 (v)=w(k)2 = pk (v)− a12 w1 (v)

, Continue the process of selecting the terms until the sum E reaches a value close to 100%, Recover the unknown parameters of the original system using equation (A20).

3. MODEL VALIDATION

Model validation is a crucial issue in black box identification where model validity testsare generally used to attach a quality tag to a model. Note, however, that the correctnessof a model can only be definitely proved if the real process is known, in which caseidentification is not needed. Hence it follows that one can only try to show that the modelis invalid. Failure to invalidate the model is then used to infer that the model is correct.From this perspective, the term model validation can be interpreted therefore as modelinvalidation.

In time domain identification, the measured data are asssumed to be generated by somefinite dimensional system corrupted by noise. If the model order is increased above thetrue order, the model will start to fit to the noise. Several techniques are available tovalidate the model and the correlation-based techniques initially proposed by Billings andVoon [14] and improved by Billings and Zhu [15] provide simple and effective ways oftesting the validity of the model.

Cross-validation is another way of validating the model, where a given data set is dividedinto two disjoint sets; one set is used for identification and the other for model validation.The model validation tests based on correlation techniques are applied to time domainmodels and can not be used in frequency domain identification. In the frequency domain,

273

cross-validation is an efficient means of validating the model. This is easily performed bydividing the frequency response measurements into two disjoint sets; an estimation set anda test or validation set. A natural division is to take every other frequency point as theestimation data and the rest as the test data. A model is then estimated using only theestimation data and the quality of the model is assessed by comparing the estimated modelwith the test data set.

4. IDENTIFICATION OF CONTINUOUS TIME MODELS FROM FREQUENCYRESPONSE DATA

Before the problem of estimating the parameters of a continuous time model of anon-linear system can be formulated, the concept of generalised frequency responsefunctions (GFRF) must be introduced.

It is well known that the input–output behaviour of a wide class of non-linear systemscan be represented by the Volterra series

y(t)= sN

n=1

yn (t) (9)

where yn (t), the nth order output of the system is given by

yn (t)=ga

−a

· · · ga

−a

hn (t1, . . . , tn ) tn

i=1

u(t− ti ) dti , nq 0; (10)

and hn (t1, . . . , tn ) is known as the nth order Volterra kernel or generalised impulseresponse function of order n. The multidimensional (nq 1) Fourier transform of the nthorder impulse response yields the nth order transfer function or the GFRF.

Hn ( jv1, . . . , jvn )=ga

−a

· · · ga

−a

hn (t1, . . . , tn ) exp(−j(v1 t1 + . . . vn tn )) dt1 . . . dtn .

(11)

Note that when n=1, the above expression gives the linear transfer function. Since theobjective is to determine the structure and estimate the parameters of a non-lineardifferential equation model from GFRFs, the first step is to compoute the GFRF and showhow this problem can be solved using the weighted orthogonal estimator.

4.1.

Consider a system whose dynamics are described by the non-linear differential equation

sNl

m=1

sm

p=0

sL

l1,lp+ q =0

cp,q (l1, . . . , lp+ q ) tp

i=1

Dliy(t) tp+ q

i= p+1

Dliu(t)=0 (12)

where L is the maximum order of differential; and p+ q=m, and m=1, . . . , Nl

correspond to various orders of non-linearity. The operator D is the differential operator.Once m takes a specific number, all the mth-order terms, each of which contain a pth-orderfactor in Dliy(t) and qth-order factor in Dliu(t) subject to p+ q=m, are included and eachterm is multiplied by a coefficient cp,q (l1, . . . , lp+ q ) while the multiple summation over the

. . . . 274

li (li =0 . . . L) generates all the possible permutations of differentiations. For example, thegeneral linear differential equation is included in the above form by setting the order ofnon-linearity m as 1 to give:

sL

l1 =0

c1,0 (l1)Dl1y(t)+ sL

l1 =0

c0,1 (l1)Dl1u(t)=0. (13)

As an example the differential equation

d2ydt2 + a1

dydt

+ a2 y+ b1dudt

+ b2 u+ c1 y2 + d1 u2 + c2 y3 +d20d2ydt21

2

u=0 (14)

would be represented by the model of equation (12) with the following definitions:c1,0 (0)= a2, c1,0 (1)= a1, c1,0 (2)=1.0, c0,1 (0)= b2, c0,1 (1)= b1, c2,0 (0, 0)= c1,c0,2 (0, 0)= d1, c3,0 (0, 0, 0)= c2, c2,1 (2, 2, 0)= d2.

The frequency domain equivalent of equation (12) is based on the generalised orhigher-order FRFs which are given by mapping the time domain representation into thefrequency domain [16]:

−$ sL

l1 =0

c1,0 (l1) ( jv1 + . . . . jvn )l1%Hasymn ( jv1, . . . , jvn )

= sL

l1,ln =0

c0,n (l1, . . . , ln ) ( jv1)l1 . . . ( jvn )ln

+sn−1

q=1

sn−q

p=1

sL

l1,ln=0

cp,q (l1, . . . , lp+q ) ( jvn− q+1)lp+1 . . . ( jvn )lp+ qHn− q,p ( jv1, . . . , jvn− q )

+ sn

p=2

sL

l1,lp =0

cp,0 (l1, . . . , lp )Hn,p ( jv1, . . . , jvn ). (15)

where the recursive relation is given by

Hasymn,p ( · )= s

n− p+1

i=1

Hi ( jv1, . . . , jvi )Hn− i,p−1 ( jvi+1, . . . , jvn ) ( jv1 + · · ·+ jvi )lp . (16)

The recursion finishes with p=1 and Hn,1 ( jv1, . . . , jvn ) has the property

Hn,1 ( jv1, . . . , jvn )=Hn ( jv1, . . . , jvn ) ( jv1 + · · ·+ jvn )l1. (17)

Equations (15)–(17) for the GFRF is used to derive the regression equations forestimating the unknown parameters cp,q ( · ) in the continuous time model of equation (12).

Note that, the computation of the nth-order FRFs is a recursive procedure where eachlower order of GFRF contains no effects from higher-order terms. This offers a distinctadvantage since the parameters corresponding to different non-linearities or terms can beestimated independently, beginning with the first order and then building up to includethe non-linear terms.

275

4.2. —

The first-order frequency response is only related to the linear input–output terms.Setting n=1 in equation (15) gives

−$ sL

l1 =0

c1,0 (l1) ( jv1)l1%H1 ( jv1)= sL

l1 =0

c0,1 (l1) ( jv1)l1. (18)

Notice that the only effect of multiplying by a constant on both sides has on the formof the equation is to change the parameters. This suggests that all the parameters cannot be estimated uniquely. Hence it is necessary to fix a parameter before the estimationbegins.Without loss of generality it is assumed that the parameter corresponding to the linearoutput term c1,0 (0) is unity. Moving all other terms to the right-hand side gives

−H1 ( jv1)= sL

l1 =1,l1 $ 0

c1,0 (l1) ( jv1)l1H1 ( jv1)+ sL

l1 =0

c0,1 (l1) ( jv1)l1. (19)

which is a linear in the parameters expression. For each value of frequency v, twoequations are obtained, corresponding to the real and imaginary parts. However,for clarity, compare equations (19) and (1) and relate the parameters of both theequations.

So that from equation (1)

z( jv)= u1 p1 ( jv)+ u2 p2 ( jv)+ · · ·+ uM pM ( jv) (20)

where M=2L+1 and

z( jv)=−H1 ( jv1)

u1 = c1,0 (1), p1 ( jv)= ( jv1)H1 ( jv1)

u2 = c1,0 (2), p2 ( jv)= ( jv1)2H1 ( jv1)

······

uL = c1,0 (L), pL ( jv)= ( jv1)LH1 ( jv1)

uL+1 = c0,1 (0), pL+1 ( jv)= pM ( jv)= ( jv1)0 =1

uL+2 = c0,1 (1), pL+2 ( jv)= ( jv1)

uL+3 = c0,1 (2), pL+3 ( jv)= ( jv1)2

······

u2L+1 = c0,1 (L), p2L+1 ( jv)= pM ( jv)= ( jv1)L

The proposed estimation algorithm can now be applied to equation (19) to identify theunknown parameters by replacing the frequency response function H1 ( jv1) by estimatesof this function.

. . . . 276

4.3. - -

To estimate the parameters associated with second-order non-linear terms, set n=2 inequation (15) so that

−$ sL

l1 =0

c1,0 (l1) ( jv1 + jv2)l2%Hasym2 ( jv1, jv2)= s

L

l1,l2 =0

c0,2 (l1, l2) ( jv1)l1 ( jv2)l2

+ sL

l1,l2 =0

c1,1 (l1, l2) ( jv2)l2H1,1 ( jv1)

+ sL

l1,l2 =0

c2,0 (l1, l2)H2,2 ( jv1, jv2).

(21)

By the recursive relation

H1,1 ( jv1)=H1 ( jv1) ( jv1)l1, (22)

and

Hasym2,2 ( jv1, jv2)=H1 ( jv1)H1,1 ( jv2) ( jv1)l2

=H1 ( jv1)H1 ( jv2) ( jv2)l1( jv1)l2. (23)

Finally a linear regression equation is obtained as

−$ sL

l1 =0

c1,0 (l1) ( jv1 + jv2)l1%Hasym2 ( jv1, jv2)

= sL

l1,l2 =0

c0,2 (l1, l2) ( jv1)l1( jv2)l2 + sL

l1,l2 =0

c1,1 (l1, l2) ( jv2)l2H1 ( jv1) ( jv1)l1

+ sL

l1,l2 =0

c2,0 (l1, l2)H1 ( jv1)H1 ( jv2) ( jv1)l2( jv2)l1. (24)

Notice that the parameters c1,0 (l1), l1 =0, . . . , L on the left-hand side have been estimatedas linear terms initially (Section 4.2) while these on the right-hand side of the equationcan be estimated from equation (24) by replacing the first- and second-order FRFs by theirestimates and applying the new estimator.

4.4. - -

When dealing with higher-order terms (n=3, 4, . . . ), some parameters which have beenobtained in previous stages (for orders less than the nth) appear on the right-hand sideof the equation. These are associated with lower-order pure output and cross-productterms but not pure input terms. Moving these terms to the left-hand side gives

277

−$ sL

l1 =0

c1,0 (l1) ( jv1 + · · ·+ jvn )l1%Hasymn ( jv1, . . . , jvn )

− sn−2

q=1

sn− q−1

p=1

sL

l1,ln =0

cp,q (l1, . . . , lp+ q ) ( jvn− q+1)lp+1 . . . ( jvn )lp+ q

×Hn− q,p ( jv1, . . . , jvn− q )

− sn−1

p=2

sL

l1,lp =0

cp,0 (l1, . . . , lp )Hn,p ( jv1, . . . , jvn )

= sL

l1,ln =0

c0,n (l1, . . . , ln ) ( jv1)l1 . . . ( jvn )ln

+ sn−1

q=1

sL

l1, ln =0

cp,q (l1, . . . , ln ) ( jvn− q+1)lp+1 . . . ( jvn )lp+ qHn− q, p ( jv1, . . . , jvn− q )

+ sL

l1,ln =0

cn,0 (l1, . . . , lp )Hn,n ( jv1, . . . , jvn ). (25)

and the recursive relation

Hasymn,p ( · )= s

n− p+1

i=1

Hi ( jv1, . . . , jvi )Hn− i,p−1 ( jvi+1, . . . , jvn ) ( jv1 + · · ·+ jvi )lp . (26)

Equations (21)–(26) hold for asymmetric frequency response data with a particularfrequency permutation and are not necessarily unique. Hence it is essential to usesymmetric frequency response data during estimation. For a given asymmetric GFRFHn ( jv1, . . . , jvn ), the symmetrised GFRF is obtained by summing the asymmetricfunction over all possible permutations of the arguments and divided by the number togive

Hsymn ( jv1, . . . , jvn )=

1n!

sall permutations of fv1, . . . ,vn

Hasymn ( jv1, . . . , jvn ) (27)

For example, in the second-order case, the symmetric version of equations (24) can bewritten as

−$ sL

l1 =0

c1,0 (l1) ( jv1 + jv2)l1%Hsym2 ( jv1, jv2)

= sL

l1,l2 =0

c0,2 (l1, l2) [( jv1)l1( jv2)l2 + ( jv2)l1( jv1)l2]/2

+ sL

l1,l2 =0

c1,1 (l1, l2) [( jv2)l2H1 ( jv1) ( jv1)l1 + ( jv1)l2H1 ( jv2) ( jv2)l1]/2

+ sL

l1,l2 =0

c2,0 (l1, l2)H1 ( jv1)H1 ( jv2) [( jv1)l2( jv2)l1 + ( jv2)l1( jv1)l2]/2 (28)

. . . . 278

For higher-order non-linearities, the symmetric version of equations (25) and (26) can beimplemented accordingly and is used in the estimation of continuous time models.

Note that since the higher-order FRFs do not affect the lower-order FRFs, all theparameters in the left-hand side of equations (25) that correspond to the coefficients ofnon-linear differential terms of order less than n, have been estimated in previous stages(Sections 4.2 and 4.3), so that the unknown parameters associated with the nth-ordernon-linearity can be estimated. The whole estimation procedure is therefore progressivestarting from first-order terms and building in higher-order terms. The estimation resultsobtained in the previous stages must be used in later estimation but obviously not viceversa.

Before the complex orthogonal estimator is applied it is appropriate to address certainissues associated with the least squares estimator and provide a set of guidelines that shouldbe followed during the implementation phase.

4.5.

In this section, an intuitive approach is followed to explain the factors that affect theperformance of the estimator by drawing analogies with the time domain algorithm.

Note that the proposed weighted complex orthogonal estimator is derived by combiningthe conventional weighted least squares algorithm with the orthogonal least square (OLS)estimator of Billings et al. [17] and extending this to the complex case. In the past few years,properties of the original OLS estimator and later versions such as the fast orthogonalestimator [17–19] have been studied extensively and applied to a variety of situations byBillings and his coworkers. It is known that the properties of the OLS estimator areidentical to those of conventional least squares with the difference that the former isequipped with a powerful structure selection criteria to aid in obtaining a parsimoniousmodel.

During the implementation of the estimator to the problems posed in the present sectionthe complex data contained in the regressors are partitioned into real and imaginarycomponents. The estimator cannot distinguish whether the data in the information matrixP are obtained from input–output records of a persistently excited system or if they arethe real and imaginary components of complex frequency response data. But irrespectiveof the nature or source of the data, it is well known that to obtain an accurate estimateof the system parameters, the data must be representative of the underlying system. In timedomain identification this is achieved by exciting the system by a persistently exciting inputand sampling the signals at an appropriate rate.

Similarly, the performance of the complex orthogonal estimator depends critically onthe choice of the frequency range and the number of data points. Therefore, the frequencyrange should be selected with great care. It has been observed that when the errors areweighted uniformly over the frequency range of interest, it becomes extremely difficult tofind the correct frequency span that will give acceptable estimates. The weighted featureof the new algorithm offers a possible solution to overcome the sensitivity of the accuracyof the estimator to the choice of frequency range. This can be achieved by a suitable choiceof the weighting matrix Q. One of the possible choices is to set Q, in equation (6), to be[20]

Q=diag[q1 (v), q2 (v), . . . , qN (v)] (29)

where qi (v) is a monotonically decreasing exponential of the form

qi =exp(−v2i /l) (30)

5.0

1.0

0.0Frequency (Hz)

Wei

ght

vect

or (

q i)

2.5

0.5

0.9

0.8

0.7

0.6

0.4

0.3

0.2

0.1

0.5 1.0 1.5 2.0 3.0 3.5 4.0 4.5

279

The weighting parameter l controls the distribution of the weights for each frequency v

(Fig. 1).The basis of selecting the tuning parameter l is intuitive. Note that the terms qi are

positive weights which can be chosen in order to reflect information known about theaccuracy of the measurements of the dependent variable at a particular frequency. Sinceit is known that the measurements of a signal in the higher frequency range are usuallymore prone to the effects of measurement noise, the errors in this range should be givencomparatively less weight than the lower frequency ranges. This is achieved by properchoice of the parameter l and is discussed in more detail in the simulated results. Figure 1illustrates how the influence of different frequency ranges is affected by different choicesof l.

Under ideal conditions, the total number of noise free data points needed to identifythe parameters associated with the Lth-order transfer of equation (19) is 2L+1. Howeverin practice, the data are rarely noise-free, and the exact model structure is not known apriori. Extensive simulations on a large number of systems lead to the conjecture that theperformance of the estimator does not depend critically on the number of data pointsprovided it is within an optimum range of around 30–100. If the errors are uniformlyweighted, the accuracy of the estimates strongly depends on the choice of the frequencyrange. Since it is difficult even-for an experienced user to know precisely the exactfrequency range that is suitable for a particular system, it is advisable to specify thefrequency range relatively arbitrarily and vary the weighting parameter l interactively toobtain the final estimates. This is illustrated in the simulated examples.

4.6.

In this section, the problem of possible numerical ill-conditioning of the algorithm ishighlighted and a procedure is adopted to overcome this. It is known that when estimatingthe parameters of discrete time linear or non-linear models using either of the least squaresbased algorithms, the information matrix P consists of lagged input–output data. Ingeneral, if the input is persistently exciting the matrix, PTP is well-conditioned (i.e. thecondition number of this matrix which measures the sensitivity or variability of a leastsquare solution is small).

Figure 1. Variation of weighting vector with l. · · · ·, l=5; –––, l=10; ——, l=20.

kk k k

mmmc

y1 (t) y2 (t) y3 (t)

u(t)

. . . . 280

However, the situation is not the same when the information matrix P formed by takingdiscrete frequency response data is analysed. For example, the information matrix for thesystem of equation (19) is given as

( jv1)LH1 ( jv1) ( jv1)L−1H1 ( jv1) · · · ( jv1)L 1

( jv2)LH1 ( jv2) ( jv2)L−1H1 ( jv2) · · · ( jv2)L 1

( jv3)LH1 ( jv3) ( jv3)L−1H1 ( jv3) · · · ( jv3)L 1P=G

G

G

G

G

K

k

······

······ G

G

G

G

G

L

l

(31)

( jvN−1)LH1 ( jvN−1) ( jvN−1)L−1H1 ( jvN−1) · · · ( jvN−1)L 1

( jvN )LH1 ( jvN ) ( jvN )L−1H1 ( jvN ) · · · ( jvN )L 1

Note that, the last column vector of the information matrix corresponding to the pureinput term contains a 1, whereas the first column vector of the matrix related to theLth-order derivative of the output contains the terms ( jvk )LH( jvk ), for k=1, . . . , N. Theratio of the norms of these two vectors are relatively high values of L is quite large whichat the outset indicates that the condition number of the matrix PTP will be very largemaking the matrix highly ill-conditioned. It has been found in several cases that thecondition number of the matrix PTP can be in the range of 1027. Although the presentestimator does not estimate the parameters by explicitly computing the inverse of thematrix PTP, numerical problems of this nature need to be avoided.

Therefore, it is recommended that the information matrix should be normalised priorto applying the weighted complex orthogonal estimator. For example, consider the modelof equation (1)

z( jv)= sM

i=1

ui pi ( jv)

= u1 p1 ( jv)+ u2 p2 ( jv)+ · · ·+ uM pM ( jv)

= u1 >p1 ( jv)> p1 ( jv)>p1 ( jv)> + u2 >p2 ( jv)> p2 ( jv)

>p2 ( jv)> +· · ·+ uM >pM ( jv)> pM ( jv)>pM ( jv)>

= unr1 pnr

1 + unr2 pnr

2 + · · ·+ unrM pnr

M (32)

Figure 2. A three degree of freedom mechanical oscillator

0.5

0

0Frequency (Hz)

Ph

ase

angl

e (d

egre

es)

0.25

–50

–100

–150

0.05 0.1 0.15 0.2 0.3 0.35 0.4 0.45–200

0.5

60

0

Mag

nit

ude

(db

)

0.25

40

20

0

0.05 0.1 0.15 0.2 0.3 0.35 0.4 0.45–20

281

where the superscript nr denotes a normalised term. Thus one obtains the orthogonalversion of the normalised equation and can estimate the normalised parameters unr

1 ,unr

2 , . . . , unrM . The original parameters of the model are then recovered directly from the

normalised parameters.

5. SIMULATED EXAMPLES

In the present section, the effectiveness of the new estimator is demonstrated using threedifferent types of systems. The first example represents a linear mechanical oscillatorpossessing three dominant modes with varying degrees of damping. The second examplerefers to the modified Van der Pol equation, and the third example represents the dynamicsof a non-linear system exhibiting chaotic behaviour.

5.1. 1Consider a mechanical oscillator with three degrees of freedom (Fig. 2). The equations

of motion for the three masses are given by

my1 = k(y2 −2y1)+ u,

my2 = k(y3 −2y2 + y1)− cy2,

my3 = k(−2y3 + y2), (33)

where y1 (t), y2 (t) and y3 (t) represent the displacements of the three masses from staticequilibrium as a result of the force u(t) acting on m. The constants c and k represent theviscous damping and stiffness of the system.

Figure 3. The gain and phase response of the system in example 1.

. . . . 282

T 1

Parameters of the auxiliary model for example 1, l=5.0

Iteration Selected terms (pi ) E Parameter of auxiliary model

1 d2y/dt2 89.6036 2.01912 u 5.1892 −2.11383 dy/dt4 0.0976 0.07744 d2u/dt2 2.8879 −1.47545 du/dt 0.1495 0.23186 d3u/dt3 0.1127 0.03267 d4u/dt4 0.0838 −0.00888 d6y/dt6 1.7397 0.98309 dy/dt 0.0054 0.0228

10 d3y/dt3 0.0113 0.111311 d5y/dt5 0.1190 0.3018

For a steady state response to harmonic excitation defined by u(t)= ejvt, the lineartransfer function between y1 and u is given as

H1 ( jv)=m2s4 +mcs3 +4mks2 +2cks+3k2

m3s6 +m2cs5 +6m2ks4 +4mcks3 +10mk2s2 +4ck2 +4k3. (34)

When the parameters of the system are m=1 kg, c=0.3 N/sm, k=1 N/m, the transferfunction is given by [21]

H1 ( jv)=s4 +0.3s3 +4s2 +0.6s+3

s6 +0.3s5 +6s4 +1.2s3 +10s2 +1.2s+4. (35)

The magnitude and phase response of the system are shown in the Fig. 3.Initially, a set of 100 equally spaced frequency response data was generated from

equation (35) in the frequency range 0–2 Hz and these were used to estimate the parametersof the transfer function. An overparameterised (eighth order) transfer function was initiallyspecified for the estimation to test the capability of the proposed algorithm to detect thecorrect model structure. It was found that when the weighting matrix Q was specified to

T 2

Parameters for example 1 at different levels of noise

Estimated Estimated EstimatedEstimated values values values

Candidate True values (35.38 db) (25.84 db) (21.40 db)terms values l=5.0 l=1.0 l=0·9 l=0·7

d6y/dt6 1.00 1.0033 0.9973 0.9920 1.0004d5y/dt5 0.3 0.3018 0.2984 0.2940 0.2916d4y/dt4 6.0 6.0158 5.9920 5.9756 5.9950d3y/dt3 1.2 1.1995 1.2022 1.2024 1.1997d2y/dt2 10.0 10.0185 9.9952 9.9851 9.9915dy/dt 1.2 1.2028 1.2095 1.2244 1.2268d4u/dt4 1.0 1.0032 0.9935 0.9814 0.9899d3u/dt3 0.3 0.3084 0.2969 0.2853 0.2539d2u/dt2 4.0 4.0084 3.988 3.9663 3.9762du/dt 0.6 0.6218 0.5916 0.5668 0.5154

u 3.0 3.0007 2.9987 2.9962 2.9954

283

be a unity diagonal matrix, implying that the errors are weighted uniformly over the entirefrequency range, the estimator failed to detect the correct structure from the specifiedeighth-order model. However, when the range of frequency was fixed at 0–0·5 Hz, theestimator identified both the correct model terms and parameters. When the errors areweighted uniformly the performance of the estimator strongly depends on the choice ofthe frequency range and the number of data points. However, the sensitivity of theestimator to the above factors was considerably reduced when the errors were weightednonuniformly using the parameter l. Parameter l was chosen interactively and fixed ata value of 5.0. Table 1 shows the terms that were selected as the iteration proceededtogether with the error reduction ratios. The parameters of the model are shown in Table 2.

Since in practice the frequency response data could be corrupted by noise, theperformance of the estimator was studied by adding various levels of noise to the frequencyresponse data. At any frequency response data point H'( jvk ), the noise corrupted datawere taken as

H( jvk )=H'( jvk )+ j( jvk ) (36)

where each component of the noise term is assumed to have the form

j( jvk )= jR (vk )+ jjI(vk ) (37)

jR (vk ) and jI (vk ) are zero-mean, independent noise sequences with equal variance, and

E(j( jvm )j( jvl )T)=0 [m$ l (38)

It is evident that as the noise level in the data increased, the weighting parameter l wasdecreased to obtain the correct model structure and parameters (Table 2). This is inaccordance with the fact that since the accuracy of the frequency response estimates in thehigher frequency range is poorer compared to the lower frequency range under the effectof significant measurement noise, the errors in high frequency ranges were given less weightto provide better estimates. Note that although the results given here are for specific valuesof l, simulations show that the exact value of l used is not crucial. The determination ofthe model structure and the parameter estimates are much the same for the range of valuesaround that given for l (Table 2). Thus the algorithm does not seem to be sensitive toa reasonable range of values for l.

Rather than synthesising the transfer function by generating data from the exact systemtransfer function [equation (35)], it is possible to obtain data from an estimate of the systemtransfer function. To illustrate further the effectiveness of the algorithm, the transferfunction was estimated directly from input–output data using the functions available inMATLAB. The plot of the estimated transfer function is shown in Fig. 4.

It is evident that the transfer function estimate is not accurate in the entire frequencyrange (compare Figs 3 and 4). However, 100 equally spaced frequency response data weretaken in the frequency range 0–0.37 Hz to estimate the parameters of the transfer function.Table 3 shows the terms that were selected from an eighth-order linear model withl=2, as the iteration proceeds, together with the associated error reduction ratios.Note that the estimator is capable of capturing the correct model structure. From this,it is found that the estimated parameters compare well with the true parameters(Table 4).

6. SYNTHESIS OF NON-LINEAR TRANSFER FUNCTIONS

When the system is non-linear, the traditional method of computing GFRFs involvesextending the classical cross- and power spectral density concepts to multidimensions and

0.5

200

0Frequency (Hz)

Ph

ase

angl

e (d

egre

es)

0.25

100

0

–100

0.05 0.1 0.15 0.2 0.3 0.35 0.4 0.45–200

0.5

40

0

Mag

nit

ude

(db

)

0.25

30

20

0

0.05 0.1 0.15 0.2 0.3 0.35 0.4 0.45–20

–10

10

. . . . 284

Figure 4. The gain and phase of the estimated transfer function for example 1.

using correlation or FFT-based techniques coupled with multidimensional windowing andsmoothing [8].

An alternative indirect approach is to estimate a non-linear time domain model (usuallya polynomial or rational NARMAX model) [22] from the sampled input–output data andthen to compute the GFRF of the system from the unbiased process model using therecursive algorithms proposed in [9, 10]. The advantages of the latter approach are thatonly small data sets are required and the complexities of multidimensional windowing and

T 3

Parameters of the auxiliary model of example 1 based on estimated data

Iteration Selected terms (pi ) E Parameter of auxiliary model

1 d2y/dt2 96.8824 2.00982 u 1.5117 −2.29683 d4y/dt4 0.2536 0.71854 d2u/dt2 0.8258 −2.03395 d6y/dt6 0.2137 0.45216 d4u/dt4 0.2725 −1.00097 d3u/dt3 0.0009 0.02628 du/dt 0.0013 0.15249 d5y/dt5 0.0001 −0.0009

10 dy/dt 0.0072 0.229311 d3y/dt3 0.0269 1.2091

285

T 4

Parameters of example 1 based on estimated data

Candidate terms Estimated parameters True parameters

d6y/dt6 0.9975 1.00d5y/dt5 0.2983 0.3d4y/dt4 5.9788 6.0d3y/dt3 1.2091 1.2d2y/dt2 9.9588 10.0dy/dt 1.2206 1.2d4u/dt4 0.9759 1.0d3u/dt3 0.2783 0.30d2u/dt2 3.9525 4.0du/dt 0.5717 0.6

u 3.0204 3.0

smoothing are avoided. The procedure for implementing the above procedure issummarised below.

, Fit a discrete model (often a polynomial or rational NARMAX model) to theinput–output data.

, Compute the frequency response functions, i.e. H1 ( jv1), H2 ( jv1, jv2), . . . ,Hn ( jv1, . . . , jvn ) from the estimated model.

, Estimate the parameters of the non-linear continuous time model using the newestimator replacing the frequency response functions by their estimates in equations(19)–(26).

The effectiveness of the new estimator in estimating the parameters of non-linear systemsusing the above procedure is demonstrated in the following two examples.

6.1. 2Consider the following modified Van der Pol equation:

d2ydt2 +2jvn (1− y2(t))

dydt

+v2n y(t)= u(t). (39)

The system possesses a non-linear damping behaviour and has a stable node at the originwith a domain of attraction which lies within an unstable limit cycle. For j=0.001 andvn =45p rad/s, the linear frequency response exhibits a resonant peak at a frequency of22.5 Hz. In order to estimate a discrete model of the system, the system was excited bya uniformly distributed white noise sequence with a root mean square value of 10, andinput–output data were collected by sampling at 200 Hz. A discrete NARMAX model wasestimated using structure detection, parameter estimation and model validation to yieldthe model

y(k)=1.4858y(k−1)−0.9859y(k−2)

+0.000025u(k−1)+0.0141y(k−1)y(k−1)y(k−1)

−0.0141y(k−1)y(k−1)y(k−2)+Cj (k−1)U j + j(k). (40)

where the last two terms of equation (40) represent the noise model which was includedto ensure unbiased process model coefficients. This model can now be used to computeGFRFs using the procedure of [10]. The procedure consists of computing a set of explicitalgebraic equations in a recursive framework.

. . . . 286

T 5

Parameters of the linear auxiliary model for example 2

Iteration Selected terms (pi ) E Parameter of auxiliary model

1 u 99.9788 −0.00050482 d2y/dt2 0.0208 0.0004663 dy/dt 0.0004 0.000019

To estimate the coefficients of the linear terms, 100 equally spaced frequency responsedata were generated in the frequency range 0–20.0 Hz. To demonstrate further theeffectiveness of the proposed estimator in detecting correct model structure, initially anoverparameterised model (linear continuous time model having fifth-order dynamics ofinput and output) was specified. The weighted parameter l was chosen to be 10. Theselected terms together with the associated error reduction ratios are shown in Table 5.It was found that after three iterations the sum of the error reduction ratios equaled100% indicating that these three terms captured the linear dynamics associated with thesystem.

It is evident from the model of equation (40) that the system does not possesssecond-order non-linear terms implying that the system does not have a second-ordertransfer function. Hence the continuous time model of the system will not have any seconddegree non-linear terms. To estimate the terms corresponding to third-order non-linearity,27 equally spaced frequency response data were generated in the frequency range 0–5 Hz.Initially an overparameterised model having third-order non-linear terms was specified. Itwas found that with the inclusion of a term y2 dy/dt, which contributes 99.923% in thefirst iteration; the contribution of other terms in the next iterations are insignificant.The final estimated parameters of the system are shown in Table 6 which indicates thatthe estimates are in good agreement with the true parameters of the system. Notice thatin all the examples no prior knowledge of the model structure or terms to include in themodel was assumed. The algorithm selects appropriate model terms and parameterestimates automatically from the data samples provided.

6.2. 3Consider the well-known Duffing–Holmes equation [23] given by

d2ydt2 + d

dydt

− by+ y3 =A cos (v(t)) (41)

T 6

Final parameter for example 2

Candidate terms True value Estimated value

d2y/dt2 1.0 0.9904dy/dt 2.8286 2.8213

u 1.0 0.99999y2 dy/dt 2.8286 2.8113

287

T 7

Parameters of the linear auxiliary model for example 3

Iteration Selected terms (pi ) E Parameter of auxiliary model

1 u 94.0888 0.80032 d2y/dt2 5.5058 −0.96583 dy/dt 0.4052 −0.1530

For d=0.15, b=1.0, A=0.3 and v=1.0 rad/s the system exhibits chaotic behaviourand settles to a strange attractor. The estimated discrete NARMAX model of this systemat a sampling frequency of 4.4446 Hz is given by [23].

y(k)=0.84725y(k−1)+0.35713y(k−3)−0.069431y(k−1)3

+0.01278y(k−4)+0.061319u(k−1)+0.40325y(k−2)

−0.00024349y(k−1)y(k−2)y(k−5)−0.46215y(k−5)

+0.096339u(k−3)−0.15316y(k−2)y(k−3)y(k−4)

−0.0073618y(k−1)2y(k−5)+0.071822y(k−1)y(k−3)y(k−4)

+Uj + j(t)

From the model, it is evident that the system has linear and third-order transfer functions.To estimate the parameters of the linear part of the continuous time model of the system,100 equally spaced frequency response data were generated from equation (42) in thefrequency range 0–2.0 Hz with the weighting parameter l=0.8. To show further thecapability of the proposed estimator in selecting the correct model structure, anoverparameterised linear model of fifth order was specified. It was found that after threeiterations, the sum of error reduction ratios of the first three terms selected equaled99.9998% which implies that these three terms are adequate to represent the underlyinglinear dynamics of the system (Table 7).

For the estimation of the third-order non-linearities, 27 frequency response data wereused in the frequency range of 0–0.2 Hz. An overparameterised model of third-ordernon-linearity was initially specified and it was found that in the first iteration with theselection of the term y3, the total E becomes 99.99%. The contribution of other terms insubsequent iterations were insignificant; thus indicating that the term y3 captured thenon-linear dynamics of the system. The final estimated parameters of the system comparedwell with the original system parameters (Table 8).

T 8

Final parameters for example 3

Candidate terms True value Estimated value

d2y/dt2 0.15 0.15297dy/dt 1.0 0.97946

u 1.0 0.99560y3 1.0 0.9942

. . . . 288

7. CONCLUSIONS

A new complex orthogonal estimation algorithm has been derived to estimatecontinuous time non-linear differential equation models from frequency response data. Thealgorithm includes structure detection procedures which determine the linear andnon-linear terms to be included in the estimated continuous time model. This means thatgiven frequency response estimates and no other a priori information about a system,parsimonious differential equation models can be determined without the need to computederivatives. The frequency response estimates can be determined by any procedure whichyields unbiased estimates. The effects of weighting were investigated and shown to havea significant effect on the quality of the estimates obtained. Several simulated examplesincluding both linear and non-linear systems were included to demonstrate the effectivenessof the new procedure.

ACKNOWLEDGEMENTS

AKS gratefully acknowledges the financial support provided by the CommonwealthScholarship Commission of the United Kingdom and is thankful to Board of Governors,Regional Engineering College, Rourkela, India for granting study leave. SAB gratefullyacknowledges that part of this work was supported by EPSRC grant ref. GR/J05149.

REFERENCES

1. L. L and K. G 1981 Automatica 17, 71–86. Frequency domain versus time domainmethods in system identification.

2. E. C. L 1959 IRE Transactions on Automatic Control 4, 37–43. Complex curve fitting.3. P. A. P 1970 IEEE Transactions on Automatic Control, 480–483. An improved technique

for transfer function synthesis from frequency repsonse data.4. P. J. L and G. J. R 1969 Proceedings of the IEE 126, 104–106. Sequential transfer

function synthesis from measured data.5. H. S 1984 International Journal of Control 39, 541–550. Transfer function synthesis using

frequency response data.6. A. H. Wfi 1986 International Journal of Control 43, 1413–1426. Transfer function

synthesis using frequency response data.7. A. H. Wfi 1987 International Journal of Control 43, 1413–1426. Asymptotic behaviour of

transfer function synthesis methods.8. K. I. K and E. J. P 1988 IEEE Transactions ASSP 36, 1758–1769. A digital method

of modelling quadratically nonlinear systems with a general random input.9. S. A. B and K. M. T 1989 Mechanical Systems and Signal Processing 3, 319–339.

Spectral analysis for nonlinear systems, part I. Parametric nonlinear analysis.10. J. C. P J and S. A. B 1989 International Journal of Control 50, 1925–1940. A

recursive algorithm for computing the frequency response of a class of nonlinear differenceequation models.

11. K. M. T and S. A. B 1992 International Journal of Science Systems 23, 1011–1018.An orthogonal estimation algorithm for complex number systems.

12. K. M. T and S. A. B 1992 Mechanical Systems and Signal Processing 6, 69–84.Reconstruction of linear and nonlinear continuous time models from discrete time sampled datasystems.

13. R. D 1965 Estimation Theory. Englewood Cliffs, NJ. Prentice Hall.14. S. A. B and W. S. F. V 1986 International Journal of Control 44, 235–244. Correlation

based model validity tests for nonlinear models.15. S. A. B and Q. Z 1993 International Journal of Control 59, 1439–1463. A structure

detection algorithm for nonlinear dynamic rational model.16. S. A. B and J. C. P J 1990 International Journal of Control 52, 863–879.

Mapping nonlinear integro-differential equations into the frequency domain.

289

17. S. A. B, M. J. K and S. C 1998 International Journal of System Science 9,1559–1568. Identification of nonlinear output affine systems using an orthogonal least squarealgorithm.

18. S. A. B, S. C and M. J. K 1989 International Journal of Control 49,2157–2189. Identification of MIMO nonlinear systems using a forward regressor orthogonalestimator.

19. Q. Z and S. A. B 1996 International Journal of Control 64, 871–886. Fast orthogonalidentification of nonlinear stochastic models and radial basis function neural network.

20. P. L and S. K 1986 Curve and Surface Fitting: An Introduction. New York:Academic Press.

21. D. E. N 1989 Mechanical Vibration Analysis and Computation. Longman Scientific &Technical Publications.

22. I. J. L and S. A. B 1985 International Journal of Control 41, 303–344.Input–output parametric models for non-linear systems, part I. Deterministic nonlinear systems;and part II: Stochastic nonlinear systems.

23. F. C. M 1987 Chaotic Vibrations—An Introduction for Applied Scientists and Engineers. NewYork: John Wiley.

24. L.A.A and S.A.B 1995 International Journal of Bifurcation and Chaos 5, 449–474.Retrieving dynamical invariants from chaotic data using NARMAX models.

APPENDIX A

The weighted complex orthogonal estimator is derived under the assumption that themodeling error

J(jv)=JR (v)+ jJI (v) (A1)

in equation (2) is a complex white noise sequence of zero mean and variance s2 and thematrix PTQP is positive definite.

Consider the complex system of equation (2) which after partitioning, as shown inequation (5), is represented as

Z=Pu+J (A2)

Introducing a weighting matrix Q (normally a symmetric positive definite matrix or adiagonal matrix) the matrix PTQP can be shown to be symmetric and positive definite, andhence can be expressed as

PTQP=TTDT (A3)

where D is a diagonal matrix and T is an upper triangular matrix with unity diagonalelements such that

T=GG

G

K

k

10···

0

a12

1...

a13

a23...

. . .

. . ....1

a1M

a2M···

aM−1M

1

GG

G

L

l. (A4)

Since TTT= I the model of equation (A2) can be transformed to an auxiliary modelparameterised with respect to vector g as

Z=Pu+J

=P(T−1T )u+J

=Wg+J (A5)

. . . . 290

where

W=PT−1, and g=Tu. (A6)

The properties of the matrix W are such that WTQW is orthogonal because

WTQW=(PT−1)TQ(PT−1)= (T−1)TPTQPT−1

= (TT)−1TTDTT−1 =D. (A7)

Further let

V=WTQ. (A8)

Therefore,

VW=D. (A9)

Now premultiplying WTQ and post-multiplying by T in equation (A6), leads to

WTQWT=WTQPT−1T

=WTQP. (A10)

Hence

T=(WTQW)−1WTQP

=(VW)−1VP

=D−1VP. (A11)

The regressors of the auxiliary model of equation (A5) can be obtained recursively from

W=P−W(T− I) (A12)

that is

w1 (v)= p1 (v)

wi (v)= pi (v)− si−1

k=1

aki wk (v) for kQ i (A13)

where from equations (A4) and (A11)

aki =�(wT

k (v)Q), pi (v)��(wT

k (v)Q), wk (v)�

=�vk (v), pi (v)��vk (v), wk (v)� for k=1, . . . , i−1 (A14)

and �· , ·� denotes the dot product of the vectors. The auxiliary parameter vector

g= &g1···

gM' (A15)

satisfies the equation

g=D−1WTQZ−D−1WTQJ (A16)

291

so that the estimated g is given by

g=D−1WTQZ (A17)

or

gi =�Z(v), vi (v)��vi (v), wi (v)� for i=1, . . . , M (A18)

Inspection of equation (A18) indicates that if vi (v) and hence wi (v) is not related to theoutput Z(v), the estimated parameter gi will be small, since the average value�Z(v), vi (v)� will not be significant. Hence the orthogonal parameter gi could be usedas an indicator of the significance of the terms wi (v). If the magnitude of the estimatedorthogonal parameter gi is less than a certain threshold, the associated term wi (v) shouldbe regarded as insignificant. However, the value of the threshold depends critically on theproblem under consideration. Hence it is desirable to have a structure selection criteriabased on the error reduction ratio which is derived below.

Once the parameters gi , i=1, . . . , M are estimated, the original system parameters ui ,i=1, . . . , M can easily be recovered according to the formula

u = g−(T− I)u (A19)

that is

u M = gM

u k = gk − sM

i= k+1

aki u i , for k=1, . . . , M−1. (A20)

Therefore, by using the above equations, the unknown parameters ui , i=1, . . . , M canbe estimated step by step.

The proportion of the variance of the dependent variable explained by wi (v) is

g2i �vi (v), wi (v)�

�(zT(v)Q), z(v)� . (A21)

This is because from equation (A5)

ZTQZ= gTWTQWg+JTQJ+ gTWTQJ+JTQWg

= gTVWg+JTQJ+ gTVJ+JTQWg. (A22)

The assumption on noise properties ensure that E[VJ]=0, and E[JTQW]=0; hence

�(ZT(v)Q), Z(v)�= sM

i=1

g2i �vi (v), wi (v)�+ �(J(v)Q), J(v)� (A23)

When the model is empty, i.e. no term is selected (M=0), the weighted mean squareof the magnitude of the prediction errors will achieve a maximum value of

�(J(v)Q), J(v)�=M=0 = �(ZT(v)Q), Z(v)� (A24)

. . . . 292

Once significant terms have been included, the mean square value of the prediction errorJ(v) will be reduced accordingly. The reduction resulting by including the term gi wi (v)can be expressed as a percentage reduction in the total mean square of the prediction errorsby defining

Ei =g2

i �vi (v), wi (v)��(zT(v)Q), z(v)� ×100%, i=1, . . . M. (A25)

This can be used as a selection tool to determine which terms are to be included in thefinal model. However the value of the Ei may depend upon the order in which the termpi (v) [equation (1)] enter into the equation. This problem can be overcome by using theforward regression procedure of [17] (Section 2).