Download - Matrix rational H/sup 2/ approximation: a state-space approach using Schur parameters

Matrix Rational H2 Approximation:

a state-space approach using Schur parameters

Jean-Paul Marmorat∗, Martine Olivi†, Bernard Hanzon‡, Ralf L.M. Peeters §

August 27, 2002

Abstract. This paper deals with the problem of computing a best rational L2 approximation of specifiedorder to a given multivariable transfer function. The problem is equivalently formulated as a minimizationproblem over the manifold of lossless transfer functions of fixed order. To describe this manifold, we usesome Schur parameters which allows for a state-space representation. Such a description presents a lot ofadvantages. It takes into account the stability constraint, possesses a good numerical behavior and provides amodel in state-space form, which is very useful in practice. A rigorous and convergent algorithm is proposedto compute local minima, and demonstrated through several examples, including real-data simulations.

1 Introduction

The identification of linear time-invariant systems can be formulated as a rational approximation problem inwhich some cost function is optimized over a set of systems. In this paper, we shall consider discrete-time,causal systems represented by their proper transfer functions. The cost function is the L2 norm, so that ourapproximation problem states in the space Hp×m

2 , where H2 denotes the Hardy space of the complement ofthe unit disk (see [11]):(RAP) given a transfer function F ∈ Hp×m

2 , minimize

‖F −H‖22 =

12π

Tr∫ 2π

0

[F −H](eit)[F −H](eit)∗dt ,

as H ranges over the set of rational stable (i.e. analytic for |z| > 1) functions of McMillan degree n.It was proved [2] that the global minimum of the L2 criterion exists.

First remark that if H is a solution to (RAP), we must have H(∞) = F (∞). Thus, we may restrict our studyto the case of strictly proper transfer functions. We shall denote by H2,0 the orthogonal complement of the theHardy space H2 of the unit disk, or equivalently the subspace of H2 consisting with strictly proper transfer

∗CMA, BP 93, 06902 Sophia-Antipolis Cedex, FRANCE, {marmorat}@sophia.inria.fr, phone: 33 4 92 38 79 56, fax: 33 492 38 79 98

†INRIA, BP 93, 06902 Sophia-Antipolis Cedex, FRANCE, {olivi}@sophia.inria.fr, phone: 33 4 92 38 78 77, fax: 33 4 92 3878 58

‡Dept. Econometrics, FEWEC, Vrije Universiteit, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands, Tel.: +31-20-446017, Fax: +31-20-44446020, Email: [email protected]

§Dept. Mathematics, Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands, Tel.: +31-43-3883365,Fax: +31-43-3884910, Email: [email protected]

1

functions. A strictly proper transfer function can be represented by means of the Douglas–Shapiro–Shieldsfactorization (see. [5]):

H = G P, (1)

where G is a (p × p)-rational lossless (or stable allpass) function of same McMillan degree than H and Pis a (p ×m)-rational unstable matrix. The lossless factor is unique up to an right unitary factor. The setof Cp×p-valued rational lossless functions of degree n will be denoted by Lp

n, and the set of unitary p× pconstant matrices by Up. We shall say that two members G1 and G2 of Lp

n are equivalent if G1 = G2 U ,for some U ∈ Up. The associated quotient space will be denoted by Lp

n/Up. Note that the function H in (1)belongs to the set H(G), the orthogonal complement of GHp×m

2 into Hp×m2,0 . Since H(G) is a vector space,

(RAP) is equivalent to the minimization of

Ψn : Lpn/Up → R

G → ‖F − πn(G)‖22,

(2)

where πn(G) denotes the projection of F onto H(G). Of course, a Douglas-Shapiro-Shields factorizationwith the lossless factor on the right and an associated quotient space also exist. Factorization (1) will beused when p ≤ m, while when p > m we preferably use its right analogous. It was proved in [1] that Lp

n andits quotient Lp

n/Upn are smooth manifolds of dimension 2np + p2 and 2np respectively, so that we now deal

with a minimization problem over a manifold.

In the scalar case, the manifold Lpn/Up is trivial (it is an open subset of a Euclidean space) and an algorithm

to find the local minima of (2) is described in [3]. In [7], these results are extended to the multivariablecase. The main difficulty was to find a nice parametrization for the manifold Lp

n/Upn. It was done by

means of Schur parameters, provided by a tangential Schur algorithm[1], and used as local coordinates. Theoptimization problem is then tackled by using a gradient algorithm through the manifold as a whole, usingthe coordinate maps to describe the manifold locally and changing from one coordinate map to another whenrequired. It must be noted that in this approach, the stability constraint is directly taken into account by theparametrization.

In this paper, we follow the same approach but we make a different choice in the tangential Schur algorithmwhich provides a state-space representation for the lossless functions [14]. This parametrization allows toexpress the criterion directly in state-space form and presents a very nice numerical behavior. It combinesthe technical advantages of the Schur parametrization to the practical ones of the state-space description. Agradient type algorithm was implemented in matlab and some results are presented here, including real-datasimulations coming from an hyperfrequency filter.

The natural framework for our study is the complex case, that is the case of functions whose Fouriercoefficients are complex. The real case, which is relevant in most applications, can be handle by this method(see Examples 1 and 2). However a specific treatment will be relevant and is under study.

2 Parametrization of lossless functions

Let G be a lossless function of degree n. Such a function admits a minimal realization

G(z) = C(zIn −A)−1B +D,

such that the associated realization matrix

R =[D CB A

]

2

is unitary (such a realization is balanced with both Gramians equal to identity).

The following points (see [7] and [14]) are essential for our purpose. Let

G(1/w)u = v, (3)

w ∈ C, |w| < 1, u ∈ Cp, ‖u‖ = 1 and v ∈ Cp, ‖v‖ ≤ 1, be some interpolation condition:

(i) G(z) and w being given, we can always find some direction u such that v given by (3) has norm strictlyless than 1.

(ii) in that case (‖v‖ < 1), a p× p block-matrix function

Θ(z) =(

Θ11(z) Θ12(z)Θ21(z) Θ22(z)

)(4)

can be defined, depending on w, u, v, such that the function G(z) can be represented by the linear fractionaltransformation

G = (Θ4Gn−1 + Θ3) (Θ2Gn−1 + Θ1)−1, (5)

for some lossless function Gn−1(z) of degree n − 1. The (J-unitary) matrix function Θ(z) is uniquelydetermined up to a constant (J-unitary) right factor H.

(iii) for a particular choice of the factor H in Θ(z), a very simple transformation on realizations correspondsto the linear fractional transformation (5): let Rn−1 be a unitary realization matrix of Gn−1(z), then aunitary realization matrix R of G(z) is given by

Rn =[V 00 In

] [1 00 Rn−1

] [U∗ 00 In

], (6)

where U and V are unitary (p+ 1)× (p+ 1) complex matrices depending on u, v, w as follows

U =[ξu Ip − (1 + wη)uu∗

wη ξu∗

], V =

[ξv Ip − (1− η) vv∗

‖v‖2

η −ξv∗

],

with

ξ =

√1− |w|2√

1− |w|2‖v‖2, η =

√1− ‖v‖2√

1− |w|2‖v‖2.

The tangential Schur algorithm consists in repeating this process, and thus provides a sequence of losslessfunctions Gk(z) of degree k , satisfying the interpolation condition

Gk(1/wk)uk = vk, ‖vk‖ < 1,

untilG0 which is a constant unitary matrix. As in the scalar case, the interpolation values vn = v, vn−1, . . . , v1,can be taken as parameters, together with the unitary matrix G0. But in the matrix case, they only describean open subset of the manifold, to be precise, associated with the sequences w = (wn = w,wn−1, . . . , w1),u = (un = u, un−1, . . . , u1), of interpolation points and interpolation directions, the set

V(w,u) = {G ∈ Lpn / ‖Gk(1/wk)uk‖ < 1, },

endowed with the coordinate map

ϕ(w,u) : G→ (v1, v2, . . . , vn, G0).

3

The family (V(w,u), ϕ(w,u)) defines a C∞ atlas on Lpn. An atlas of the quotient space is obtained by fixing

the last unitary value G0 in each chart (V(w,u,D), ϕ(w,u,D)), where

V(w,u,D) = {G ∈ Lpn / ‖Gk(1/wk)uk‖ < 1, G0 = D},

ϕ(w,u,D) : G→ v = (v1, v2, . . . , vn).

In such a chart, a state-space representation of the current matrix G can be computed by iterating formula(6) which presents a very nice numerical behavior since it only involves multiplications by unitary matrices.

3 State-space formulas for the criterion.

Let F be the transfer function of a given p×m system of order N . It is well-known (see [1]) that the manifoldLp

n/Up is diffeomorphic to the set Obs−p (n) of observable pairs (C,A) for which the spectrum of A is in theopen unit disk, quotiented by the usual equivalence relation.

We shall compute the L2 criterion Ψn(G), where

G(z) = C(zIn −A)−1B +D,

from the observable pair (C,A). First remark that the projection H = πn(G) must have realization of theform

H(z) = C(zIN −A)−1β. (7)

(1) The function F is given by a realization

F (z) = C(zIN −A)−1B. (8)

Then, F −H has realization

Ag =[A 00 A

],

Bg =[

B−β

], Cg =

[C C

].

Let Wg be the solution to the Lyapunov equation

A∗gWgAg + C∗gCg = Wg.

It is well-known that the L2 norm is given by

‖F −H‖22 = Tr

(B∗gWgBg

).

Let Wg be partitioned accordingly to the partitioning of Ag:

Wg =[W11 W12

W21 In

](the observability gramian W2,2 of the pair (C,A) being the identity matrix). It can be block-diagonalizedas follows

Wg =[IN W12

0 In

]∆

[IN 0W21 In

],

with

∆ =[W11 −W12W21 0

0 In

]

4

The pair (C,A) being given, the error norm is thus minimal for

β = W21B (9)

and the criterion given byΨn(C,A) = Tr (B∗W11B)− Tr (β∗β) . (10)

(2) The function F is given by its Fourier coefficients up to some order K:

F (z) =K∑

k=1

Fkz−k.

Assume that F has finite order and realization (A,B, C), then the computation in (1) is valid and Fk =CAk−1B. Moreover W21 is given by

W21 =∑k≥0

(A∗)kC∗CAk,

and (9) can be rewrittenB =

∑k≥0

(A∗)kC∗Fk+1, (11)

so that

Ψn(C,A) =K∑

k=1

Tr(F ∗kFk)− Tr (β∗β) . (12)

These expressions will be used when p ≤ m. If p > m, as previously mentioned, we shall replace the Douglas-Shapiro-Shields factorization (1) by its right analogous and work with the reachable pair (A,B) instead ofthe observable pair (C,A).

4 The algorithm

The algorithm receives as inputs

• The unknown transfer function F which can be given in one of the forms

(i) a realization (A,B, C,D) at order N .

(ii) the K + 1 first Fourier coefficients F0, F1, F2, . . . , FK .

• the degree n of the approximant

• an initial lossless function of degree n given by some unitary realization (may be supplied by a randomprocess).

It proceeds as follows: an adapted chart for the given initial lossless functions is constructed (see A.). Aminimization process is started which minimizes the local criterion ψn(v1, . . . , vn), computed as explained insection 3 and 2, and subject to the constraints ‖vi‖ < 1, i = 1 . . . n. If the norm of some Schur parameterstends to 1 (see B.), either the chart is no more available and we must find a new adapted chart, or we havereached a boundary point of the manifold, of degree less that the current degree,say k. This case is unusual,and if it does not happen, at the end a local minima is reached.

A. Choice of an adapted chart.The point is how, a lossless function being given by some unitary realization, could we find an adapted

5

chart? The present version of our algorithm uses a selection strategy which only works in the complex case.Observe that if we choose a Schur parameter equal to zero, the recursion (6) becomes D

√1− |w|2Du C√

1− |w|2u∗ w 0B(Ip − (1 + w)uu∗)

√1− |w|2Bu A

where D = D(Ip − (1 + w)uu∗). This observation suggests the following strategy which leads to a chart inwhich all the Schur parameters are all equal to the null vector and corresponds to a Potapov factorization[15]. Starting from any balanced realization of G(z) ∈ Lp

n, compute a new realization (An, Bn, Cn, Dn), stillbalanced but in which An is lower triangular. Let

An =[wn 0 · · · 0βn An−1

], Bn =

[bn...

],

Cn =[cn Cn−1

],

where wn is a complex number, bn a row vector of size p, and cn a column vector of size p. Put

un = b∗n/‖bn‖,

and

Bn−1 = Bn +1 + wn

1− |wn|2βnb

∗n,

Dn−1 = Dn +1 + wn

1− |wn|2cnb

∗n.

Repeating this process yields a sequence of complex points of the open unit disk (wn, . . . , w1), a sequenceof unit vectors (un, . . . , u1) and a unitary matrix D0 that index a chart in which G has Schur parametersvn = . . . = v1 = 0.

B. The norm of some Schur parameters tends to 1.First remark that if the vector v as norm 1, formula (6) keeps sense, and applied to some balanced realization(A,B,C,D) gives the realization matrix D (Ip − vv∗)Du (Ip − vv∗)C

−v∗D(Ip − uu∗) −v∗Du −v∗CB(Ip − uu∗) Bu A

where D = vu∗ + (Ip − vv∗)D(Ip − uu∗). Assume that ‖vi‖ = 1, then,- either Ai+1 has some eigenvalue of norm one, the realization is not minimal and the lossless functionG(v1, . . . , vn) has degree less than the current degree, say degree k. This point G(v1, . . . , vn) belongs to theboundary of the manifold. In that case, we stop the procedure.- either all the eigenvalues of Ai+1 have norm less than one and we must change chart before restarting theminimization procedure.

The main drawback of the L2 norm is that it possesses many local minima while our final goal is to findthe global one. It may be necessary to run the procedure many times before we reach it. Some strategiesare under study that could warrant a rapid convergence to the global minimum. Some iterative search onthe degree is being investigated, as in (see [7]) where a minimum at order k provides a starting point forthe minimization procedure at degree k + 1, which ensure that the criterion will be improved as the degreeincrease. This also allows to restart the minimization process when a boundary point of degree less than thecurrent degree is reached.

6

5 Application to continuous-time systems

This approximation procedure applies to continuous-time systems by means of an isometry between themand discrete-time systems. Continuous-time systems that we consider are those whose transfer functionsbelong to the Hardy space H2

p×m of the right-half plane endowed with the L2-norm

‖F‖c =12π

Tr∫ ∞

−∞F (iy)F (iy)∗ dy. (13)

Then

F (z) =1

z − 1F

(z + 1z − 1

), (14)

belongs to Hp×m2,0 and we have

‖F‖2c = 2‖F‖2

2.

If F (s) = C(sIn − A)−1B is a realization of F , then F (z) = C(zIn −A)−1B, where

C = −C(A− In)−1,

A = (A+ In)(A− In)−1,

B = B,

(15)

is a realization of F (z). This process can be inverted proving that the isometry also preserves McMillandegree.

6 Examples

The software matlab was used for the implementation. The function fmincon is used in the optimizationstage and the function schur in the construction of an adapted chart. The two first examples are classicalexamples from the literature. They are in continuous-time and have real coefficients. Even if our procedureis ignorant of this information, starting from a real initial system we could found almost real (imaginarypart less than 10−3) approximants and thus compare our results with previous ones. Example 4 is a realdiscrete-time system whose best approximant is complex. Real-data simulations are presented in example 3.

Example 1. This is an example in scalar case and continuous time. It was studied, among others, in [13].It is given by a state-space representation

a =

0 0 0 −1501 0 0 −2450 1 0 −1130 0 1 −19

, b =

4100

,c =

[0 0 0 1

], d = 0.

The results are summarized in the following table and are in accordance with that of [13].

Deg Relative Error1 4.268250 10−1

2 3.929044 10−2

3 1.304723 10−3

7

Example 2. We consider a MIMO example, the automobile gas turbine model with 2 inputs, 2 outputs,and 12 states from [12, p.168], [8] and [16].

Figure 1: Nyquist diagram of the automobile gas turbine model and its order 6 approximant.

The results are summarized in the following table and they slightly improve that of [16] until degree 6 andprovides a very good approximant at degree 8. The order 6 approximant is plotted on figure 1.

Deg Relative Error Relative Error in [16]4 0.13481 0.13545 0.07799 0.07956 0.05248 0.05418 8.64 10−4 -

Example 3. These data are obtained by completion by the software Hyperion, also developed at INRIA,

Figure 2: Nyquist diagram: CNES data and approximant at order 8.

Figure 3: Bode diagram: CNES data and approximant at order 8.

from experimental pointwise frequency values of an hyperfrequency filter, provided by the CNES (frenchspace agency )[4]. The data consist in the 800 first Fourier coefficients of the matrix transfer function whichrepresents the filter. The filter is 2-inputs and 2-outputs, and is relatively close to an ideal filter of order 8.The results are plotted on figures 2 and 3.

8

Example 4. This example shows that a ”real” system may have a complex global minimum. It is thediscrete-time system whose transfer function

f(z) =1− z2

z3,

has realization

a =

0 1 00 0 10 0 0

, b =

−101

,c =

[1 0 0

], d = 0.

Figure 4: Nyquist diagram of example 4, one of its complex approximants (at right) and its real approximant(at left).

The L2 criterion at order 1 admits tree minima: a real and two complex ones symmetric with respect to thereal axis (see figure 4). It happens that the relative error at the real minimum is 0.7071068 while at bothcomplex ones it is 0.6382847, so that there are two complex best approximants in that case.

References

[1] D. Alpay, L. Baratchart, and A. Gombani, On the differential structure of matrix-valued rationalinner functions, Operator Theory : Advances and Applications, 73 (1994), pp. 30–66.

[2] L. Baratchart, Existence and generic properties for L2 approximants of linear systems, I.M.A. Journalof Math. Control and Identification, 3 (1986), pp. 89–101.

[3] L. Baratchart, M. Cardelli, and M. Olivi, Identification and rational L2 approximation : agradient algorithm, Automatica, 27, No.2 (1991), pp. 413–418.

[4] L. Baratchart, J. Grimm, J. Leblond, M. Olivi, F. Seyfert, and F. Wielonsky, Identificationd’un filtre hyperfrequence. Rapport Technique INRIA No 219.

[5] R. Douglas, H. Shapiro, and A. Shields, Cyclic vectors and invariant subspaces for the backwardshift operator, Annales de l’Institut Fourier (Grenoble), 20 (1970), pp. 37–76.

[6] H. Dym, J-contractive matrix functions, reproducing kernel spaces and interpolation, CBMS lecturenotes, vol. 71, American mathematical society, Rhodes island, 1989.

[7] P. Fulcheri and M. Olivi, Matrix Rational H2-Approximation: a Gradient Algorithm Based onSchur Analysis, SIAM J. Contr. and Opt., 36 (1998), pp. 2103–2127.

[8] K. Glover, All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds,Int. J. Control 39, 1115-1193 (1984).

[9] B. Hanzon and R.L.M. Peeters, Balanced Parametrizations of Stable SISO All-Pass Systems inDiscrete-Time, MCSS, 13, pp. 240–276, (2000).

[10] B. Hanzon and R.J. Ober, Overlapping block-balanced canonical forms for various classes of linearsystems, Lin. Alg. and its Appl., 281 (1998), pp. 171–225.

9

[11] K. Hoffman, Banach spaces of analytic functions, Dover, 1988.

[12] Y.S. Hung and A.G.J. MacFarlane, Multivariable feedback: a quasi-classical approach Lect. Notesin Control Sci. 40, Springer- Verlag (1982).

[13] D. Hyland and D. Bernstein, The optimal projection equations for model reduction and the re-lationships among the methods of Wilson, Skelton, and Moore, IEEE Trans. Autom. Control AC-30,1201-1211 (1985).

[14] R.L.M. Peeters, B. Hanzon and M. Olivi, Balanced realizations of discrete-time stable all-passsystems and the tangential Schur algorithm, in Proceedings of the ECC99, Karlsruhe, Germany, August31-September 3.

[15] V.P. Potapov, The multiplicative structure of J-contractive matrix functions, Amer. Math. Soc. Transl.(2), 15, 131–243, 1960.

[16] W.-Y. Yan and J. Lam, An approximate approach to H2 optimal model reduction, IEEE Trans.Autom. Control 44, No.7, 1341-1358 (1999).

10