A self-validating numerical method for the matrix exponential



Computing 43, 59-72 (1989). © by Springer-Verlag 1989

A Self-Validating Numerical Method for the Matrix Exponential*

Pavel Bochev and Svetoslav Markov, Sofia

Received October 12, 1988; revised April 17, 1989

Abstract -- Zusammenfassung

A Self-Validating Numerical Method for the Matrix Exponential. An algorithm is presented, which produces highly accurate and automatically verified bounds for the matrix exponential function. Our computational approach involves iterative defect correction, interval analysis and advanced computer arithmetic. The algorithm presented is based on the "scaling and squaring" scheme, utilizing Padé approximations and safe error monitoring. A PASCAL-SC program is reported and numerical results are discussed.

AMS Subject Classification: 65G10, 65L05, 65F30, 41A21

Key words: matrix exponential, Padé approximations, iterative defect correction

A Self-Verifying Numerical Method for the Exponential Function of a Matrix. An algorithm is presented which delivers highly accurate and automatically verified bounds for the exponential function of a matrix. Our method uses iterative defect correction, interval analysis and an advanced computer arithmetic. The algorithm presented is based on the "scaling and squaring" scheme and uses Padé approximations and safe error monitoring. A PASCAL-SC program is presented and numerical results are discussed.

1. Introduction

In this paper we consider the problem of the numerical computation of the matrix function

$$\exp(A) = \sum_{k=0}^{\infty} A^{k}/k!,$$

where A is a real n × n matrix. A comprehensive review of the existing methods is given by Moler and Van Loan in [7]. However, the computer implementation of these methods with the usual floating-point arithmetic can be unreliable and may lead to erroneous results. We propose a numerical method built upon the new computational methodology [5, 6] involving a strictly defined (referred to as "advanced") computer arithmetic [5], interval analysis techniques [8] and the iterative defect correction principle [3]. This methodology allows us to formulate a reliable and highly accurate numerical algorithm for the computation of exp(A), delivering self-validated results. This means that the computer automatically proves the inclusion of the true result of the specified problem within the computed bounds.

* The present research is partially supported by the Committee of Science, Sofia, Bulgaria, according to contract No. 755/87


Our algorithm can be realized in any environment supporting advanced computer arithmetic, e.g. PASCAL-SC [9], HIFICOMP [4], ACRITH [1].

The paper is organised as follows. The rest of this section concentrates on some necessary notations from interval analysis and advanced computer arithmetic. In Section 2 we give an insight into how Padé approximations work in our algorithm. Then the algorithm, consisting of six basic steps, is outlined and the first two of them are discussed. In Section 3 we develop and present a procedure for the verified computation of the value of a matrix polynomial, which corresponds to the fourth step of the algorithm. The next section covers the fifth and sixth steps of the algorithm. We consider the solving of a block linear system with coefficients represented in "staggered correction format" [11] and the multiplication of matrices with such elements. In Section 5 we present estimates for the rate of convergence of the procedure corresponding to step 4 and for the bounds widening by interval matrix multiplications. The section concludes with a discussion of the third step of the algorithm based upon the last estimate. The last, sixth section is devoted to the numerical results obtained by the PASCAL-SC implementation of our algorithm.

We denote the spaces of all real numbers, n-dimensional vectors and n × n matrices by R, VR and MR respectively, and let T ∈ {R, VR, MR}. Operations in T will be denoted by Ω, e.g. Ω = {+, −, ×, /} for R and Ω = {+, −, ·} for VR and MR. We consider also the power set PT with operations A ∗ B := {a ∗ b | a ∈ A, b ∈ B} for A, B ∈ PT, ∗ ∈ Ω, where Ω corresponds to T. If ≤ is an order relation in T, then IT = {A = [$\underline{A}$, $\overline{A}$] | $\underline{A}$, $\overline{A}$ ∈ T; $\underline{A}$ ≤ $\overline{A}$} is the set of all intervals over T. For every A ∈ IT we define the elements of T, w(A) := ($\overline{A}$ − $\underline{A}$)/2 and m(A) := ($\underline{A}$ + $\overline{A}$)/2, called respectively the width and the midpoint of A.

If S is the floating-point system [5, p. 150] with base d and mantissa length l, i.e.

$$S = \{0\} \cup \Big\{\, x = \ast\, m \cdot d^{e} \;\Big|\; \ast \in \{+,-\},\ d \in \mathbb{N},\ d > 1,\ e \in \mathbb{Z},\ m = \sum_{i=1}^{l} x_i d^{-i},\ x_i \in \{0, \dots, d-1\},\ x_1 \neq 0 \,\Big\},$$

we shall consider the sets S, VS and MS as the corresponding floating-point analogues of R, VR and MR. Similarly, if U ∈ {S, VS, MS} corresponds to T ∈ {R, VR, MR}, we shall consider the set IU of all intervals over U as a floating-point analogue of IT.
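As a small illustration (in Python rather than PASCAL-SC, and with an artificially restricted exponent range so that the set can be printed), the nonnegative elements of such a screen S for base d = 2 and mantissa length l = 3 can be enumerated directly from the definition above:

d, l = 2, 3
S = {0.0}
for sign in (+1, -1):
    for e in range(-1, 3):                         # exponent range restricted for printing only
        for digits in range(d ** (l - 1), d ** l): # leading mantissa digit x_1 != 0
            m = digits / d ** l                    # normalised mantissa in [1/d, 1)
            S.add(sign * m * d ** e)
print(sorted(x for x in S if x >= 0))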

In order to equip IU with arithmetic operations we first define arithmetic operations in the set IT. Let ◊ : PT → IT be a rounding operator onto IT, i.e. ◊ is a monotone projection w.r.t. the inclusion relation ⊆ (this means ◊(◊(X)) = ◊(X) and X ⊆ Y implies ◊(X) ⊆ ◊(Y) for X, Y ∈ PT). Then, according to [5], interval arithmetic operations in IT are defined as follows:

∀X, Y ∈ IT:   X ◊∗ Y := ◊(X ∗ Y)   for ∗ ∈ Ω.   (1)

Consider now the rounding operator ♦ : IT → IU defined on IT. Computer arithmetic operations in IU are defined by


∀X, Y ∈ IU:   X ♦∗ Y := ♦(X ◊∗ Y)   for ∗ ∈ Ω.   (2)

Definition (2) provides the best possible inclusion for the result of all operations in Ω, e.g. for dot products in VS or IVS etc.

In this work we shall use mainly the rounding ♦ and the operations (2), and occasionally the rounding to the nearest □ (see [5]).

It is sound to consider best possible inclusions not only for binary operations as in (2) but in the following situation as well: let X_1, ..., X_n be some floating-point data set and F ∈ T (F ∈ IT) be the true result of a well-defined function (operation) F(X_1, ..., X_n) of this data set. Then the best possible inclusion of F in the corresponding set U (or IU) is ♦F, which, following [11], we shall call the ideal result in U (or IU). The ideal result always refers to a particular floating-point system S. The purpose of this work is to formulate an algorithm which will provide this best possible inclusion for the matrix function exp(A).
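To make the roundings and the notion of an enclosing result concrete, the following Python fragment (an illustration of ours, not part of the PASCAL-SC environment) emulates outwardly rounded interval operations on IEEE doubles; widening the nearest-rounded result by one unit in the last place in each direction guarantees that the exact result of the operation is enclosed.

import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

def _down(x):                      # next machine number towards -infinity
    return math.nextafter(x, -math.inf)

def _up(x):                        # next machine number towards +infinity
    return math.nextafter(x, math.inf)

def iadd(a, b):                    # enclosure of {x + y : x in a, y in b}
    return Interval(_down(a.lo + b.lo), _up(a.hi + b.hi))

def imul(a, b):                    # enclosure of {x * y : x in a, y in b}
    p = [a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi]
    return Interval(_down(min(p)), _up(max(p)))

x, y = Interval(0.1, 0.1), Interval(0.2, 0.2)
print(iadd(x, y))                  # a narrow machine interval that provably
                                   # contains the exact sum of the two doubles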

2. A General Description of the Algorithm

Our algorithm for the self-validating and highly accurate numerical computation of the matrix exponential is based on the "scaling and squaring" approach [7], realized with Padé approximations and safe error monitoring using interval analysis. This choice is motivated by the fact that "scaling and squaring" is a well established computational scheme [7, 12], and by employing the advanced computer arithmetic we can overcome the usual obstacles arising at the "squaring" step. Padé approximations are much more competitive in accuracy than the power series method [7] and in addition allow effective treatment within the frames of the new computational approach.

To begin with, let us recall that the (p, q)-Padé approximation to exp(A) is defined by [7]:

$$R_{pq}(A) := [D_{pq}(A)]^{-1} \cdot N_{pq}(A), \qquad (3)$$

$$D_{pq}(A) = \sum_{j=0}^{q} \frac{(p+q-j)!\, q!}{(p+q)!\, j!\, (q-j)!}\,(-A)^{j}, \qquad (4)$$

$$N_{pq}(A) = \sum_{j=0}^{p} \frac{(p+q-j)!\, p!}{(p+q)!\, j!\, (p-j)!}\, A^{j}. \qquad (5)$$
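For orientation, the defining formulas (3)-(5) can be evaluated directly in ordinary floating point; the following Python sketch (ours, with no scaling and no verified bounds) does just that and serves only as a cross-check against the verified algorithm developed below.

import math
import numpy as np

def pade_exp(A: np.ndarray, p: int, q: int) -> np.ndarray:
    n = A.shape[0]
    D = np.zeros((n, n))
    N = np.zeros((n, n))
    Apow = np.eye(n)                       # A**j, built up incrementally
    for j in range(max(p, q) + 1):
        if j <= q:
            cD = (math.factorial(p + q - j) * math.factorial(q)) / (
                 math.factorial(p + q) * math.factorial(j) * math.factorial(q - j))
            D += cD * ((-1) ** j) * Apow   # coefficient of (-A)**j, eq. (4)
        if j <= p:
            cN = (math.factorial(p + q - j) * math.factorial(p)) / (
                 math.factorial(p + q) * math.factorial(j) * math.factorial(p - j))
            N += cN * Apow                 # coefficient of A**j, eq. (5)
        Apow = Apow @ A
    return np.linalg.solve(D, N)           # R_pq(A) = D^{-1} N, eq. (3)

A = np.array([[0.1, 0.2], [0.0, 0.3]])
print(pade_exp(A, 4, 4))                   # close to exp(A) for this small-norm A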

We assume that A ∈ MS. If ‖A‖ > 1 then the input matrix is multiplied ("scaled") by a factor d^{-k} such that ‖A·d^{-k}‖ < 1, where d is the base of the floating-point system S chosen for the computations. From the remainder theorem for Padé approximants [7] we have

$$R_{pq}(A) = \exp(A) - D_{pq}(A)^{-1} \cdot S_{pq}, \qquad (6)$$

$$S_{pq} = (-1)^{q}\,\frac{A^{p+q+1}}{(p+q)!} \int_{0}^{1} e^{(1-u)A}\, u^{p}(1-u)^{q}\, du.$$


From (3) and (6) the equality

$$D_{pq}(A)^{-1}\cdot\big(N_{pq}(A) + S_{pq}\big) = \exp(A)$$

follows, and thus

$$D_{pq}(A)\cdot\exp(A) = N_{pq}(A) + S_{pq}. \qquad (7)$$

Considering (7) as a linear system with the exact solution exp(A), we can compute the desired bounds for the scaled exponential using iterative defect correction. The matrices D_pq(A), N_pq(A) and S_pq from (7) cannot be computed exactly in general. In practice, in order to obtain safe bounds for the exponential, we have to solve an interval version of (7) where D_pq(A), N_pq(A) and S_pq are replaced by their inclusions. When the inclusion for the scaled exponential is found, it has to be "squared", i.e. raised to the d^k-th power. Since we deal with an interval matrix, this process is accompanied by a widening of the result at each multiplication (cf. the estimates in Section 5). Hence, in order to retrieve the ideal result for the original matrix exponential, we should require more accurate bounds, depending on d^k, for the scaled one. This keeps the propagation of the blow-up effect within the necessary limits.

The accuracy of the bounds for the scaled exponential is controlled by means of the bounds for the remainder term S_pq. Thus, we must first choose appropriate values for p and q, that is, bound S_pq within the accuracy required by the particular scaling factor d^k. Then the bounds for D_pq(A) and N_pq(A) should be computed with an accuracy consistent with that of the bounds for S_pq.
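A crude way to preselect a diagonal order p = q is to use the scalar bound on ‖S_pq‖ that follows from the remainder formula (with ‖A‖ ≤ 1, ‖e^{(1−u)A}‖ ≤ e and ∫_0^1 u^p(1−u)^q du = p! q!/(p+q+1)!). The following Python lines are an illustration of ours; the target accuracy is only an example.

import math

def remainder_bound(p: int, q: int, normA: float = 1.0) -> float:
    beta = math.factorial(p) * math.factorial(q) / (
           math.factorial(p + q) * math.factorial(p + q + 1))
    return normA ** (p + q + 1) * math.e * beta

def smallest_diagonal_order(target: float) -> int:
    p = 1
    while remainder_bound(p, p) > target:
        p += 1
    return p

# e.g. ask for a remainder below 10**(-24), roughly the d**(-l*r) accuracy needed
# for an STC-format of length r = 2 in the PASCAL-SC setting (d = 10, l = 12)
print(smallest_diagonal_order(10.0 ** (-24)))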

The main steps of the algorithm are as follows:

1. if ‖A‖ < 1 then k := 0 else determine k > 0 such that ‖d^{-k}·A‖ < 1; A := d^{-k}·A

2. compute a rough inclusion U ∈ IMS for the set

   ℰ = {exp(At) | t ∈ [0, 1]}

3. select values for p and q depending on d^k;

   compute $(-1)^{q}\,\dfrac{p!\,q!}{(p+q+1)!\,(p+q)!}\,A^{p+q+1}\cdot U$

   in interval arithmetic; set the value to S ∈ IMS

4. compute the sets of matrices

   D = (D_0, ..., D_{r-1}, D_r);   D_i ∈ MS for 0 ≤ i < r;   D_r ∈ IMS

   N = (N_0, ..., N_{t-1}, N_t);   N_i ∈ MS for 0 ≤ i < t;   N_t ∈ IMS

   with the property: D_pq(A) ∈ Σ_{i=0}^{r} D_i;   N_pq(A) ∈ Σ_{i=0}^{t} N_i

5. compute the set of matrices

   E = (E_0, ..., E_{s-1}, E_s);   E_i ∈ MS for 0 ≤ i < s;   E_s ∈ IMS


   with the property: exp(A) ∈ Σ_{i=0}^{s} E_i

6. if k > 0 then EXP_BOUNDS := (Σ_{i=0}^{s} E_i)^{d^k}

   else EXP_BOUNDS := Σ_{i=0}^{s} E_i

The self-validating property of the algorithm is a result of the verified computation of the bounds for (4) and (5), the self-validated solving of (7) and the bounding of the remainder term. The realisation of the first step is obvious, so we begin the discussion of the algorithm with the second step, assuming that ‖A‖ < 1.

Recall that exp(At) is a fixed point of the operator

$$P(X)(t) = I + \int_{0}^{t} A\cdot X(s)\, ds.$$

The computation of the matrix U in step 2 with the aid of interval analysis techniques is based upon this fact. We define the interval extension P̄ of P for continuous interval-valued matrix functions X : [0, 1] → IMR, X(t) = [$\underline{X}(t)$, $\overline{X}(t)$], $\underline{X}(t) \le \overline{X}(t)$ for all t ∈ [0, 1], as follows:

$$\bar{P}(X)(t) := I \;\lozenge\!+ \int_{0}^{t} A \;\lozenge\!\cdot\; X(s)\, ds.$$

Assume now that we have found an interval matrix U ∈ IMS with the properties P̄(U)(t) ⊆ U for all t ∈ [0, 1] and sup_t w(P̄(X)(t)) ≤ γ·sup_t w(X(t)), γ < 1, for X(t) ⊆ U. Then, according to Theorem 5.7 in [8], U will contain the set ℰ. It is convenient to take U in the form U := I ◊+ [−δ, δ] ◊· Ĩ, where Ĩ_ij = 1, since if δ > (max_{i,j} |A_ij|)/(1 − ‖A‖) then P̄(U) ⊆ U, provided ‖A‖ < 1. Let now X(t) ⊆ U and denote by |A| the matrix |A|_ij = |A_ij|. Then

$$\sup_t w(\bar{P}(X)(t)) = \sup_t \int_{0}^{t} |A|\cdot w(X(s))\, ds \;\le\; \|A\|\cdot \sup_t w(X(t)), \quad \text{where } \|A\| < 1.$$

The explicit bound for δ will in practice provide a rather rough bound for the set ℰ, and in step 2 we proceed in a different manner. First, note that if I ♦+ [0, 1] ♦· A ♦· U ⊆ U, then also P̄(U)(t) ⊆ U for all t ∈ [0, 1]. The latter follows from the inclusion relation P̄(U)(t) ⊆ I ◊+ [0, 1] ◊· A ◊· U ⊆ I ♦+ [0, 1] ♦· A ♦· U ⊆ U. Then a narrower matrix U can be found by the following iteration scheme:

1. initialisation: set values for EPS > 0, H > 0

2. δ := EPS

3. repeat

     δ := δ □· H

   until I ♦+ [0, 1] ♦· A ♦· U ⊆ U


This procedure is finite due to the existence of an a priori value for δ for which the inclusion relation holds true. The parameters EPS and H can be used to control the overestimation/time ratio. In practice, a good choice is EPS = 0.1 and 1 < H < 2.
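The iteration can be imitated in ordinary floating point as follows (a Python sketch of ours; the interval bounds are stored as pairs of NumPy arrays and the mandatory outward rounding of the verified version is omitted here):

import numpy as np

def point_times_interval(A, lo, hi):
    # product of a point matrix A with the interval matrix [lo, hi]
    Ap, Am = np.maximum(A, 0.0), np.minimum(A, 0.0)
    return Ap @ lo + Am @ hi, Ap @ hi + Am @ lo

def encloses_fixed_point(A, delta):
    # checks the test  I + [0,1]*A*U inside U,  with  U = I + [-delta, delta]*ones
    n = A.shape[0]
    I, ones = np.eye(n), np.ones((n, n))
    Ulo, Uhi = I - delta * ones, I + delta * ones
    Plo, Phi = point_times_interval(A, Ulo, Uhi)
    Plo, Phi = np.minimum(Plo, 0.0), np.maximum(Phi, 0.0)   # multiply by [0, 1]
    return np.all(Ulo <= I + Plo) and np.all(I + Phi <= Uhi)

A = np.array([[0.4, 0.2, 0.0], [0.1, 0.4, 0.1], [0.1, 0.1, 0.4]])  # test case 1 of Section 6, scaled by 1/10
delta, H = 0.1, 1.5                     # EPS = 0.1 and 1 < H < 2, as recommended above
while not encloses_fixed_point(A, delta):
    delta *= H
print(delta)                            # first delta for which the inclusion test succeeds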

The second part of step 3 is trivial. However, the selection of p and q which will guarantee the ideal result is by no means an easy task. We shall return to this problem in Section 5.

3. Computation of Dpq(A) and Npq(A)

In this section the fourth step of the algorithm, that is, the computation of self-validated, highly accurate bounds for D_pq(A) and N_pq(A), is considered in some detail. Denote

$$G_{nr}(H) = \sum_{j=0}^{n} \sigma_j\, H^{j}; \qquad \sigma_0 := 1, \quad \sigma_j := \sigma_{j-1}\,\frac{(n-j+1)}{(n+r-j+1)\,j}, \qquad (8)$$

then D_pq(A) = G_{qp}(−A) and N_pq(A) = G_{pq}(A). Hence, it suffices to consider only the computation of accurate bounds for G_{nr}(H). Generalising the ideas from [10] for matrix polynomials, we shall evaluate (8) by solving the block linear system

$$\mathbb{H}\,\mathbb{X} = \mathbb{B}, \qquad \mathbb{H} = \begin{pmatrix} I & 0 & \cdots & 0 \\ -H\omega_1 & I & \cdots & 0 \\ & \ddots & \ddots & \\ 0 & \cdots & -H\omega_n & I \end{pmatrix},$$

$$\omega_j = \begin{cases} \dfrac{(j+1)}{(r+j+1)(n-j)}, & j = 0, \dots, n-1, \\[2mm] 1, & j = n, \end{cases} \qquad (9)$$

$$\mathbb{B} = (I\,\omega_0,\, I\,\omega_1,\, \dots,\, I\,\omega_n)^{T}; \qquad \mathbb{X} = (X_0, \dots, X_n)^{T}, \quad X_i \in MR.$$

If 𝕏* is the exact solution of (9), then its last component X_n* is the value of (8) we seek. This exact solution is given by

$$\mathbb{X}^{*} = \mathbb{H}^{-1}\cdot\mathbb{B} \qquad (10)$$

or explicitly: X_0* = I·ω_0; X_j* = I·ω_j + X_{j-1}*·H·ω_j, j = 1, ..., n. Although ℍ^{-1} can be easily written down explicitly, the computer arithmetic realisation of (10) will not produce 𝕏*, but a certain approximation 𝕏^0 to 𝕏*. In order to improve 𝕏^0 we shall apply the iterative defect correction principle [3] as follows:

𝕏^0 := R·𝔹,  R ≈ ℍ^{-1}          (initial approximation);

d(𝕏^j) := 𝔹 − ℍ·𝕏^j              (defect of the current approximation);

Δ(𝕏^j) := R·d(𝕏^j)               (correction to the current approximation);

𝕏^{j+1} := 𝕏^j + Δ(𝕏^j)          (new approximation).                    (11)
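The effect of (11) can be seen already on an ordinary dense system; in the Python fragment below (ours, plain NumPy, no enclosures) R is a deliberately degraded inverse and the defect is accumulated in full double precision, so every sweep adds a roughly constant number of correct digits, which is the behaviour quantified in Section 5.

import numpy as np

rng = np.random.default_rng(0)
H = np.eye(6) + 0.3 * rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
R = np.linalg.inv(H.astype(np.float32)).astype(np.float64)  # low-precision approximate inverse

X = R @ B                                  # initial approximation
for _ in range(5):
    defect = B - H @ X                     # defect of the current approximation
    X = X + R @ defect                     # corrected approximation
    print(np.linalg.norm(B - H @ X))       # residual shrinks geometrically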


The approximate inverse R, which appears in (11), will be considered in the sequel as the result of roundoff error accumulation during the computation of the actual inverse ℍ^{-1}. Below we present the procedure for evaluating the bounds for G_{nr}(H) according to (11), realized with advanced computer arithmetic:

Accurate Matrix Polynomial Evaluation (AMPE)

(1). k := 0;  X_0^0 := I ♦· ω_0;
     for j := 1 to n do
       X_j^0 := ♦((I + X_{j-1}^0 · H) · ω_j);

(2). repeat
       X^k := □m(X^k);  G_k := X_n^k;  k := k + 1;
       X_0^k := ♦( I·ω_0 − Σ_{l=0}^{k-1} X_0^l );
       for j := 1 to n do
         if j < n then d1 := (j + 1); d2 := (r + j + 1)·(n − j)
         else d1 := 1; d2 := 1;
         X_j^k := ♦( ( I·d1 + (Σ_{l=0}^{k} X_{j-1}^l)·H·d1 − (Σ_{l=0}^{k-1} X_j^l)·d2 ) / d2 );
       P := ♦( Σ_{l=0}^{k-1} G_l + X_n^k );  G_k := X_n^k;
       OK := w(X_n^k) ≤ w(S)
     until OK or k > LIMITER

(3). if OK then (G_{nr}(H) ∈ P and w(X_n^k) ≤ w(S)) else (G_{nr}(H) ∈ P)

For convenience, the correction Δ(𝕏^k) is denoted above by X^k = (X_0^k, X_1^k, ..., X_n^k) and the interval bounds for G_{nr}(H) by P. We note that the computation of the defect d(X^k) and of the correction X^k is combined in one long scalar product in order to avoid the intermediate rounding of d(X^k).

Let us now turn for a moment to the algorithm we have outlined in Section 2. The verified bounds for exp(A) appear as a result of the following sequence of related problems:

- computation of bounds for D_pq(A) and N_pq(A),
- computation of bounds for the exponential of the scaled matrix,
- computation of bounds for the exponential of the original matrix

(steps 4, 5 and 6 of the algorithm).

But even if the ideal result for each of these individual problems is feasible, this does not automatically provide the ideal final result. This effect, pointed out by Stetter in [11], is unavoidable if "the information about the dependence on the primary data is lost on the way" [11]. An effective counteraction we shall undertake is the appropriate coupling of the problems in the sequence, by organising the data types and their transfer in such a manner that producing the ideal final result becomes possible. Generally speaking, we shall treat the sequence as a single task, not as a sequence of separate tasks.


Consider now again the AMPE procedure. By using the rounding operator ♦ on each iteration we get safe bounds for the exact solution 𝕏* of the system (9), including its last, n-th component, which corresponds to the value of the polynomial (8): G_{nr}(H) = X_n* ∈ P, P ∈ IMS. From the error estimate presented in Section 5 we can expect that on each iteration we gain not less than 3 correct decimal digits for the value of G_{nr}(H). Assume for convenience that d = 10; thus, after approximately ⌈l/3⌉ iterations P will represent an ideal result for G_{nr}(H). As we have noted above, this does not in general assure the same accuracy for the final result. On the other hand, continuing the iterations we obtain new corrections X_n^k which do not contribute any more to the evaluation of P, but still contain information about the value of G_{nr}(H). In order to make use of all this information, the corrections obtained on each iteration are stored separately. More precisely, the output of the AMPE procedure (i.e. the input for the next step) consists not only of P, but also of the set (G_0, G_1, ..., G_m), G_i ∈ MS, 0 ≤ i < m, G_m ∈ IMS, with the property G_{nr}(H) ∈ Σ_{i=0}^{m} G_i, where m is the number of iterations performed. Thus, first we profit by using all collected data for the computation of the new correction and, second, by removing the limitations on the accuracy caused by the finite mantissa length. We shall call this way of representing the bounds for G_{nr}(H) the STC-format (STaggered Correction representation), proposed in [11]. The number of the STC-format components will be called the length of the STC-format. This representation can be introduced also in other floating-point sets and is equivalent to a dynamic precision within the fixed floating-point system S [11].
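The idea behind the STC-format can be seen on a single scalar; the following Python toy (ours) carries a value as a short list of machine numbers whose exact sum is far more accurate than any individual component, which is precisely the "dynamic precision" effect mentioned above.

from fractions import Fraction

def to_stc(x_exact, components=3):
    # split an exactly known value into a staggered list of doubles
    parts, rest = [], x_exact
    for _ in range(components):
        p = float(rest)                 # nearest double to what is still missing
        parts.append(p)
        rest -= Fraction(p)             # exact residual, the "defect" of the list so far
    return parts, rest

exact = Fraction(1, 3) + Fraction(1, 10**20)     # not representable in one double
stc, residual = to_stc(exact)
print(stc)                              # leading part plus two tiny corrections
print(float(residual))                  # what a length-3 format still misses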

To conclude with AMPE, let us discuss in brief its termination criteria. In order to make the accuracies of the bounds for D_pq(A) and N_pq(A) consistent with the accuracy of the bound S, iterations are carried out until the width of the new iterate X_n^k becomes not greater than the width of S. That means we wish to bound the value of (8) with a relative accuracy of δ = ‖w(S)‖/‖X_n*‖. In the worst case we can expect to achieve this accuracy after |log δ|/3 iterations (cf. Section 5) and therefore AMPE is a finite procedure. Since δ̃ = ‖w(S)‖/‖X_n^0‖ ≤ δ, |log δ̃|/3 can be used as a limiter for the number of iterations. Also, the length of the STC-format for the value of (8) will be not greater than ⌈|log δ̃|/l⌉ + 1.

4. Evaluation of Safe and Accurate Bounds for the Matrix Exponential

This task is a sequence of problems corresponding to steps 5 and 6 of the algorithm. In step 5 we implement again the idea of representing the output in STC-format. This is achieved by solving an interval version of (7) using iterative defect correction. Let, as in Section 2, D = (D_0, D_1, ..., D_{r-1}, D_r) and N = (N_0, N_1, ..., N_{t-1}, N_t) denote the STC-formats for the values of D_pq(A) and N_pq(A) obtained at step 4, and let S ∈ IMS be the interval matrix computed at step 3. Then the inclusion relation


$$D_{pq}(A)\cdot\exp(A) \;\in\; \sum_{i=0}^{t} N_i + S$$

holds true. Note that originally S has been delivered independently from N, but the above expression suggests very naturally to consider it as a part of N by setting N_{t+1} := S. This, however, changes in principle the type of information stored in D and N: from data providing highly accurate bounds for a rational approximation to exp(A) it becomes information about a highly accurate and safe inclusion for exp(A). We obtain this inclusion in the form of an STC-format by solving the following linear system:

$$\Big(\sum_{i=0}^{r} D_i\Big)\cdot X = \sum_{i=0}^{t+1} N_i. \qquad (12)$$

System (12) is a linear system with matrix and right hand side coefficients represented in STC-format. An algorithm for solving such a system has been proposed by Auzinger and Stetter in [2]. Below we formulate another procedure for the same purpose.

Exponent Safe Bound Evaluation (ESBE)

solve D" X = N using high accuracy routine; Eo := computed solution; WIDTH := (w(Np)[] w(Np+l) } [] exp(1); k = - 1 ;

(2) repeat k := k + 1; E k := [[]m(Ek)

/ e t+ l

solve D. X = d(Ek) using high accuracy routine Ek§ 1 := computed solution;

R := ~ E s + Ek+ 1 ; OK := W(Ek+l) < W ID TH

until k > LIMITER or OK

(3) if OK then (R ~ exp(A) with expected WIDTH) else R ~ exp(A).

If the original matrix has been scaled, then after computing the STC-format E = (E_0, E_1, ..., E_{s-1}, E_s) for exp(A) the "squaring" (step 6) follows. Taking sufficiently long formats for the scaled exponential, we are able to keep the propagation of the bounds widening beyond the first mantissa of the final result, which means that it will be the ideal one. The length of E depends on the WIDTH parameter, which itself depends on the bound for S_pq, i.e. on the scaling factor d^k. Using the whole E in the "squaring" requires a special operation for the product A·B, where A and B are interval matrices represented in STC-format. This operation should deliver the result in the same form: (C_0, ..., C_{l-1}, C_l); C_i ∈ MS, 0 ≤ i < l, C_l ∈ IMS; A·B ∈ Σ_{i=0}^{l} C_i. This STC-format is then fed to the next squaring cycle, providing optimal information about the computed bounds. In the PASCAL-SC realisation of our algorithm this operation is based on a routine for matrix dot products of the form Σ_i A_i·B_i together with some means for manipulating the content of the long accumulator.

If the binary representation of the scaling factor d^k is b_s b_{s-1} ... b_0, i.e. d^k = b_s·2^s + ... + b_1·2^1 + b_0·2^0 with b_i ∈ {0, 1}, then the number of matrix dot products to be evaluated is s + g̃, where g̃ is the number of all nonzero digits in b_s b_{s-1} ... b_0, i.e. g̃ = Σ_{i=0}^{s} b_i.
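The squaring step itself is ordinary binary exponentiation; the Python sketch below (ours, plain floating point, with no interval bounds and no long accumulator) counts the matrix products it performs. Its bookkeeping gives s + g̃ − 1 because the first multiplication by the accumulator is a mere assignment, and the matrix E used here is only a stand-in, not an enclosure produced by the algorithm.

import numpy as np

def power_by_squaring(E, M):
    # E**M for a positive integer M by "square and multiply", counting products
    result, base, mults = None, E, 0
    while M > 0:
        if M & 1:                       # nonzero binary digit: fold base into the result
            new = base if result is None else result @ base
            mults += 0 if result is None else 1
            result = new
        M >>= 1
        if M > 0:                       # square for the next binary digit
            base = base @ base
            mults += 1
    return result, mults

A = np.array([[4., 2., 0.], [1., 4., 1.], [1., 1., 4.]])   # test case 1 from Section 6
E = np.eye(3) + A / 1000.0                                 # stand-in for the scaled exponential
P, count = power_by_squaring(E, 10 ** 3)                   # d**k = 10**3, as for test case 2
print(count)                            # 14 products for M = 1000 (s = 9, 6 nonzero digits)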

Remarks. 1. We have outlined an algorithm which can compute the ideal result for the exponential of a real matrix. It is quite possible that this task itself will appear in another sequence of problems (e.g. the solving of a linear initial value problem). To this end the algorithm should be modified in order to accept interval input data (and in particular STC-format data). Such an algorithm will be reported elsewhere. We would like to mention that the algorithm from Section 2 provides, together with the ideal result for exp(A), also an inclusion for exp(A) in the form of an STC-format. This makes our algorithm compatible with algorithms which require the matrix exponential as input data and can accept it in the form of an STC-format.

2. The algorithm is formulated for arbitrary Padé approximations. However, the optimal results with respect to accuracy and computational time can be achieved by using diagonal approximations (p = q) (cf. [7], [12]).

5. Error Estimates

In this section we give estimates which confirm the finiteness of the AMPE procedure. This error analysis is performed for the case of diagonal approximation (p = q, corresponding to n = r in AMPE). Then we estimate the bounds widening caused by the interval matrix multiplications at step 6 of the algorithm and discuss on this basis the choice of p and q. In what follows, the bracket notation [ · ] will stand for block interval matrices and vectors.

For the rounding operator ♦ we have (cf. [5], Theorem 5.10)

$$\Diamond X = [\,\underline{x}_{ij}(1-\varepsilon_1),\ \overline{x}_{ij}(1-\varepsilon_2)\,]; \quad |\varepsilon_1|, |\varepsilon_2| \le \varepsilon = d^{\,1-l}; \quad X \in IMR,$$

$$X \,\Diamond\!*\, Y = [\,\underline{z}_{ij}(1-\varepsilon_1),\ \overline{z}_{ij}(1-\varepsilon_2)\,]; \quad |\varepsilon_1|, |\varepsilon_2| \le \varepsilon; \quad X, Y \in IMS; \quad Z := X \,\lozenge\!*\, Y. \qquad (13)$$

Denote W_ε := diag([−ε, ε]) ∈ IMS. From (13) we find that

$$\Diamond X \subseteq (I + W_\varepsilon)\cdot X; \qquad X \,\Diamond\!*\, Y \subseteq (I + W_\varepsilon)\cdot(X \,\lozenge\!*\, Y). \qquad (14)$$

On the p-th iteration step of AMPE we obtain the block vector

$$[\mathbb{X}^{p}] = [R]\,\Diamond\Big(\mathbb{B} - \mathbb{H}\sum_{s=0}^{p-1}\mathbb{X}^{s}\Big), \qquad (15)$$

where all roundoff errors are assumed to be included in [R].

In order to derive our estimate we proceed as follows. First we find an inclusion [G] for [R]. Since ℍ^{-1} ◊· [G] ⊇ [R], the last matrix can be represented as [R] = ℍ^{-1}·[G]. Substituting this expression for [R] in (15), we get for [𝕏^p]


the inclusion

$$[\mathbb{X}^{p}] \subseteq (\mathbb{H}^{-1}\,\lozenge\,[G])\,\lozenge\Big(\mathbb{B} - \sum_{s=0}^{p-1}\mathbb{H}\,\mathbb{X}^{s}\Big)$$

and thus:

$$\sum_{s=0}^{p}\mathbb{X}^{s} - \mathbb{X}^{*} \;\subseteq\; [G]\,\lozenge\,\mathbb{H}\,\lozenge\Big(\mathbb{X}^{*} - \sum_{s=0}^{p-1}\mathbb{X}^{s}\Big), \qquad (16)$$

where 𝕏* denotes, as before, the exact solution of the system (9). Taking norms in (16) we obtain the estimate

$$\Big\|\sum_{s=0}^{p}\mathbb{X}^{s} - \mathbb{X}^{*}\Big\| \;\le\; \big\|[G]\,\lozenge\,\mathbb{H}\big\|\cdot\Big\|\mathbb{X}^{*} - \sum_{s=0}^{p-1}\mathbb{X}^{s}\Big\|. \qquad (17)$$

In order to get an upper bound for the norm ‖[G] ◊ ℍ‖ we find an inclusion [E] for [G] ◊ ℍ. Then ‖[E]‖ ≥ ‖[G] ◊ ℍ‖. If we set formally [𝕏^{-1}] := 0, by induction from (17) it follows that

$$\Big\|\sum_{s=0}^{p-1}\mathbb{X}^{s} - \mathbb{X}^{*}\Big\| \;\le\; \big\|[E]\big\|^{p}\cdot\|\mathbb{X}^{*}\| \qquad \text{for } p > 0. \qquad (18)$$

The explicit form for [G] and [E] can be extracted by tracing the roundoff error accumulation in the components of [𝕏^p] in AMPE; e.g. starting from X_0^0 one finds by means of (14) that

X_0^0 := ♦(I·ω_0) ⊆ (I + W_ε) ◊· B_1 ⊆ (I + W_ε)^2 ◊· B_1,

X_1^0 := ♦((B_2·d2 + X_0^0·H·d1) / d2) ⊆ B_2 ◊+ (I + W_ε)^2 ◊· B_1 ◊· (I + W_ε)^4 ◊· H·ω_1,

and so on. We omit the technical details of the whole process of estimating and conclude with an estimate for ‖[E]‖:

$$\|[E]\| \;\le\; \beta\cdot\sum_{j=1}^{n}\frac{\|H\|^{\,n-j+1}}{2^{\,n+1-j}\,(n-j)!} \;\le\; \beta\cdot\exp(\|H\|/2),$$

where $\beta = \varepsilon\cdot\dfrac{2 + 4n - 16\varepsilon}{1 - 2(2n+1)\varepsilon + 8\varepsilon^{2}}$. Recall that H = A or H = −A, where A was the scaled input matrix (cf. Section 2), i.e. ‖H‖ < 1, and thus ‖[E]‖ ≤ β·exp(1/2). Now by means of (18) we obtain

$$\Big\|\sum_{s=0}^{p-1}\mathbb{X}^{s} - \mathbb{X}^{*}\Big\| \;\le\; \big(\beta\cdot\exp(1/2)\big)^{p}\cdot\|\mathbb{X}^{*}\|. \qquad (19)$$

In practice, for actually existing computers ε ≤ 10^{-6}, and taking into account that we shall most probably use Padé approximations with p, q ≤ 10, the "worst case" estimate for ‖[E]‖ is ‖[E]‖ ≤ 5·10^{-4}, i.e. each iteration gives an improvement of about 3 decimal digits in the approximation of the value of G_{nr}(H). For the PASCAL-SC (ε = 10^{-12}) implementation of AMPE this number is about 8-9 digits.

Next we turn to the estimate of the bounds widening in step 6. This will provide some reasoning for the selection of p and q required for obtaining the ideal final result. For convenience we shall consider the implementation of the STC-formats as performing calculations in an "extended" floating-point system S' with a mantissa length l' > l.

Let now E (the same symbol will be used for the corresponding element from IMS') be the STC-format for the scaled matrix exponential with a relative accuracy of ‖w(E)‖/‖E‖ ≤ d^{-rl}, where r depends on p and q. Since the ideal result corresponds to a relative accuracy of d^{-l}, requiring such an accuracy for the product E ♦· ... ♦· E ∈ IMS' (M = d^k factors) we can derive a condition on r which permits the choice of p and q. The width of E² := E ◊· E is estimated by ‖w(E²)‖ ≤ ‖w(E)‖·(‖E‖ + ‖E‖). Applying (13) with ε corresponding to S', we get for the width of E ♦· E the bound

$$\|w(E \,\Diamond\, E)\| \;\le\; (1+\varepsilon)\,\|w(E \,\lozenge\, E)\| \;\le\; (1+\varepsilon)\,\|w(E)\|\,\big(\|E\| + \|E\|\big). \qquad (20)$$

For the matrix w(E ♦· ... ♦· E) ∈ MS' we get by induction from (20):

$$\|w(E \,\Diamond\, E \,\Diamond \cdots \Diamond\, E)\| \;\le\; M(1+\varepsilon)^{M-1}\,\|E\|^{M-1}\,\|w(E)\|, \qquad (21)$$

since ‖E ♦· E ♦· ... ♦· E‖ ≤ (1 + ε)^{M−1}·‖E‖^M. The value of ‖w(E ♦· ... ♦· E)‖ is a measure for the absolute error of the computed bounds for the exponential. Finally, using (21), we get for the relative error δ = ‖w(E ♦· ... ♦· E)‖/‖E·E·...·E‖ of E ♦· ... ♦· E the estimate

$$\delta \;\le\; M(1+\varepsilon)^{M-1}\,\big(\|E\|\cdot\|E^{-1}\|\big)^{M}\,\frac{\|w(E)\|}{\|E\|}. \qquad (22)$$

The norms of E and E^{-1} can be estimated by θ = 1 + (n − 1)·(max_{i,j} |A_ij|)/(1 − ‖A‖) (cf. Section 2), but this bound is in no way the best one. Better bounds can be obtained if some information about the spectral radius of A is available. Unfortunately, such information is not effectively computable in general, and the problem of adequately estimating the bounds widening is still open. However, for moderate norms of A (‖A‖ ≤ 10^3), (22) can provide some hints about the order of the remainder term required for the ideal result. To guarantee the ideal result for the exponential it is sufficient to require

$$\delta \;\le\; d^{\,k}(1+\varepsilon)^{M-1}\,\theta^{\,2M-2}\,\frac{\|w(E)\|}{\|E\|} \;\le\; d^{-l},$$

or, equivalently, k + (2M − 2)·log_d θ + (M − 1)·log_d(1 + ε) ≤ l(r − 1). In particular, within the PASCAL-SC system (l = 12 and d = 10) for ‖A‖ = 10 and ‖A‖ = 100 we get r > 1.7 and r > 2.5 respectively. This corresponds approximately to STC-formats of length 2 and 3 for the scaled matrix exponential E and requires a diagonal approximation with p = q ≤ 10. In practice, if the computed result is not the ideal one, it will still represent a verified inclusion of exp(A).
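The inequality can also be read as a rough recipe for the STC length r; the Python lines below (ours, with placeholder norm values and the crude θ bound from Section 2) only indicate the order of magnitude and do not reproduce the verified computation.

import math

def required_stc_length(normA, maxA, n, d=10, l=12, eps=1e-12):
    k, scaled = 0, normA
    while scaled >= 1.0:                  # step 1: scaling exponent k
        scaled, k = scaled / d, k + 1
    M = d ** k                            # number of interval factors in step 6
    theta = 1 + (n - 1) * (maxA / d ** k) / (1 - scaled)
    lhs = k + (2 * M - 2) * math.log(theta, d) + (M - 1) * math.log(1 + eps, d)
    return 1 + lhs / l                    # smallest admissible r

print(required_stc_length(normA=10.0, maxA=4.0, n=3))  # about 1.8 with these placeholder
                                                       # inputs; the text above gives r > 1.7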

6. Numerical Results

The algorithm proposed in Section 2 has been coded in the PASCAL-SC system for the IBM PC [9]. In order to compare the numerical results by our algorithm with results by a conventional one, we have also prepared a program for Padé approximations to exp(A) which employs standard floating-point arithmetic with the same mantissa length (12 decimal digits).


Fig. 1. (Minimal number of correct digits versus the order p = q of the diagonal Padé approximation, for test cases 1, 2 and 3.)

In this program standard floating-point computations are used to evaluate approximations to D_pq(A) and N_pq(A) and to square the approximation to R_pq(A). The floating-point approximation to D_pq(A) is inverted by means of the high accuracy routine INV15, since the conventional Gaussian elimination routine used first for this task gives rather poor results for some of the test cases. We ran the following test cases (see [7], [12]):

$$1.\ A = \begin{pmatrix} 4 & 2 & 0 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{pmatrix} \qquad 2.\ A = \begin{pmatrix} -131 & 19 & 18 \\ -390 & 56 & 54 \\ -387 & 57 & 52 \end{pmatrix} \qquad 3.\ A = \begin{pmatrix} 16 + 10^{-7} & 10^{-7} \\ 0 & 16 \end{pmatrix}$$

The results by standard floating-point computations (dotted lines) and those by our method (solid lines) for various diagonal approximations are presented in Fig. 1. Compared is the minimal number of correct digits of the components in the computed floating-point approximation and in the computed inclusion for exp(A), respectively. Ideal results for the particular floating-point system were obtained by our algorithm for all test cases, in contrast to the best standard result of 10 correct digits for the easy test case 1.

The comparison of the results also illustrates a typical pitfall of the standard floating-point implementations of the algorithms: by increasing the order of the Padé approximation, which theoretically has to improve the accuracy of the result, we do not necessarily obtain better numerical results. In practice, the behaviour of the accuracy is unpredictable. It can be relatively stable as in test case 3, or fluctuating as in test cases 1 and 2, and there is no way to recognise this unless the exact result for exp(A) is available.

Actually, the accuracy achieved by our algorithm is much higher, which is demonstrated by evaluating the sum of the STC-format for exp(A) in a LONGREAL matrix (20 decimal digits). In practice, even higher accuracy can be obtained by taking longer STC-formats. This, of course, requires more computational time.


The experiments with the program show that for the worst test case 2 (cf. [12]) it suffices to take p = q = 4, STC-formats of length 3 for the bounds of D_pq(A) and N_pq(A), and of length 2 for the scaled matrix exponential, in order to deliver the ideal result in S (d = 10, l = 12). Due to the large scaling factor (d^k = 10^3) the inclusion computed with small (less than 3) p and q is rather wide. However, using the 4th diagonal approximation we get 20 correct digits. A possible explanation for achieving this very high accuracy can be the fact that the eigenvalues of the matrix from test case 2 are −1, −2, −20. For the other test cases the same accuracy has been attained with the 7th or 8th diagonal approximation.

Acknowledgements

The authors are grateful to the referee for the suggestions and improvements of the paper.

References

[1] ACRITH General Information Manual. IBM Publication No. GC33-6163-01 (Version 1, Release 2), 1984.

[2] Auzinger, W., Stetter, H. J.: Accurate arithmetic results for decimal data on non-decimal computers. Computing 35, 141-151 (1985).

[3] Böhmer, K., Hemker, P., Stetter, H.: The defect correction approach. Computing Suppl. 5, 1-32 (1984).

[4] HIFICOMP Subroutine Library for Highly Efficient Numerical Computations. Methodological Guide (S. Markov, ed.). Bulgarian Academy of Sciences, Sofia 1987.

[5] Kulisch, U., Miranker, W.: Computer Arithmetic in Theory and Practice. Academic Press, 1981.

[6] Markov, S.: Mathematical fundamentals of numerical computation. Proc. 17th Spring Conference of the UMB "Mathematics and Education in Mathematics". Publishing House of the Bulgarian Academy of Sciences, Sofia 1988.

[7] Moler, C., Van Loan, C.: Nineteen dubious ways to compute the exponential of a matrix. SIAM Review 20(4), 801-836 (1978).

[8] Moore, R. E.: Methods and Applications of Interval Analysis. SIAM, Philadelphia, 1979.

[9] PASCAL-SC Information Manual and Floppy Disks (U. Kulisch, ed.). John Wiley & Sons, 1987.

[10] Rump, S., Böhm, H.: Least significant bit evaluation of arithmetic expressions in single precision. Computing 30, 189-199 (1983).

[11] Stetter, H.: Sequential defect correction for high-accuracy floating-point algorithms. Lecture Notes in Mathematics 1006, 186-202 (1984).

[12] Ward, R.: Numerical computation of the matrix exponential with accuracy estimate. SIAM J. Numer. Anal. 14(4), 600-610 (1977).

Pavel Bochev, Centre for Informatics and Computer Technology, Bulgarian Academy of Sciences, "Acad. G. Bonchev" str., Block 25/A, 1113 Sofia, Bulgaria

Svetoslav Markov, Institute of Mathematics with Computing Center, Bulgarian Academy of Sciences, P.O. Box 373, 1090 Sofia, Bulgaria