GEOMETRIC INTEGRATION ON MANIFOLD OF SQUARE OBLIQUE ROTATION MATRICES∗

N. DEL BUONO† AND L. LOPEZ†

SIAM J. MATRIX ANAL. APPL. © 2002 Society for Industrial and Applied Mathematics
Vol. 23, No. 4, pp. 974–989

Abstract. In recent years there has been a growing interest in the dynamics of matrix differential systems on a smooth manifold. Research effort extends to both theory and numerical methods, particularly on the manifolds of orthogonal and symplectic matrices. This paper concerns dynamical systems on the manifold OB(n) of square oblique rotation matrices, a constraint appearing in some minimization problems and in multivariate data analysis. Background and theoretical results on differential equations on OB(n) are provided. Moreover, numerical procedures preserving the structure of the solution are found among known quadratic invariant preserving methods. Numerical tests and simulations on the oblique Procrustes problem are also reported.

Key words. manifolds, square oblique rotation matrices, condition number function, Runge–Kutta methods, oblique Procrustes problems

AMS subject classifications. 65F, 65L

PII. S089547980037768X

1. Introduction. In recent years there has been a growing interest in studying matrix differential systems whose solutions evolve on a smooth manifold such as the manifold of orthogonal or symplectic matrices (see, for instance, [3], [4], [8], [9], [19]). In this paper we consider, both theoretically and numerically, matrix differential systems on the manifold of square oblique rotation matrices

$$\mathrm{OB}(n) = \{\, Y \in \mathbb{R}^{n\times n} \mid \operatorname{diag}(Y^TY) = I_n \text{ and } \det(Y) \neq 0 \,\},$$

i.e., the open set of nonsingular matrices $Y$ satisfying the constraint $\operatorname{diag}(Y^TY) = I_n$, where $I_n$ is the $n\times n$ identity matrix. An example of a matrix differential system on OB(n) is that associated with the minimization problem

$$\min\; \alpha\|AY - B\|_F + \beta\|XY^{-T} - Z\|_F \quad \text{subject to } Y \in \mathrm{OB}(n), \tag{1.1}$$

where $A, B, X, Z$ are given matrices of dimension $n$, $\alpha$ and $\beta$ are known nonnegative weights, and $\|\cdot\|_F$ denotes the Frobenius norm on matrices (see [17], [21]). This problem is known as the oblique Procrustes problem (hereafter ObPP(α, β)). If α = 1 and β = 0, we obtain the so-called classical ObPP(1, 0), which arises frequently in multivariate data analysis, for example, in factor analysis for common-factor extraction and in multidimensional scaling techniques (see [7], [12], [18]). An advantageous feature of ObPP(1, 0) is that it is equivalent to $n$ independent minimization problems on the unit sphere $S^{n-1}$ in $\mathbb{R}^n$, i.e., it can be transformed into $n$ separate problems, one for each column of $Y$. Similarly, ObPP(0, 1) is known as the oblique Procrustes rotation problem to the specified factor-pattern matrix (see [1], [18]). With α and β both different from zero, ObPP(α, β) is a generalization of

∗Received by the editors September 4, 2000; accepted for publication (in revised form) by M. Chu October 11, 2001; published electronically March 27, 2002.

http://www.siam.org/journals/simax/23-4/37768.html

†Dipartimento Interuniversitario di Matematica, Via E. Orabona 4, I-70125 Bari, Italy (delbuono@dm.uniba.it, [email protected]). This research was supported in part by CNR contract 98.01013.CT01.

974

ODEs ON MANIFOLD OF SQUARE OBLIQUE ROTATION MATRICES 975

ObPP(1, 0) and ObPP(0, 1) that does not possess an explicit solution. Because of the inverse in the second term of (1.1), separation into $n$ independent minimization problems is not possible. Therefore, as pointed out in [21] and [22], a matrix approach is suggested for solving ObPP(α, β).
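As a small illustration of the definition of OB(n) and of problem (1.1), the following Python/NumPy sketch tests membership in OB(n) and evaluates the ObPP(α, β) objective exactly as written in (1.1). The function names `in_ob` and `obpp_objective` and the numerical tolerance used for the constraint and for nonsingularity are illustrative assumptions, not part of the original text.

```python
import numpy as np

def in_ob(Y, tol=1e-12):
    """Return True if Y lies (numerically) on OB(n).

    Checks diag(Y^T Y) = I_n (unit-norm columns) and det(Y) != 0.
    The tolerance `tol` is an illustrative choice.
    """
    Y = np.asarray(Y, dtype=float)
    col_sq_norms = np.einsum("ij,ij->j", Y, Y)   # diag(Y^T Y) without forming Y^T Y
    unit_columns = np.allclose(col_sq_norms, 1.0, rtol=0.0, atol=tol)
    nonsingular = abs(np.linalg.det(Y)) > tol
    return bool(unit_columns and nonsingular)

def obpp_objective(Y, A, B, X, Z, alpha, beta):
    """Objective of (1.1): alpha*||AY - B||_F + beta*||X Y^{-T} - Z||_F."""
    Y_inv_T = np.linalg.inv(Y).T
    return (alpha * np.linalg.norm(A @ Y - B, "fro")
            + beta * np.linalg.norm(X @ Y_inv_T - Z, "fro"))
```

For example, if $B = AY_*$ and $Z = XY_*^{-T}$ for some $Y_* \in \mathrm{OB}(n)$, then $Y_*$ gives objective value zero, which is how the random test problems of section 4.1 are built.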

In this paper we shall consider the general case where problems on OB(n) cannot be separated into a set of $n$ independent problems on the unit sphere. We propose a flow approach to tackle the problems.

The remainder of this paper is organized as follows. In section 2, we provide some background information on differential systems on OB(n), emphasizing the existence of the solution and its conditioning. Note that the solution of a flow on OB(n) may converge to a singular matrix. We define the conditioning of a problem on OB(n) and derive an upper bound on the associated condition number as a function of time. In section 3, we focus our attention on the numerical solution of problems on OB(n). The constraint $\operatorname{diag}(Y^TY) = I_n$ is equivalent to a set of $n$ quadratic conservation laws on the columns of $Y$; hence any numerical method that preserves obliqueness must preserve these quadratic invariants. For this reason we look for obliqueness preserving methods among known quadratic integrators. We point out that quadratic preserving methods such as the Lie group schemes in [9] and [16] do not preserve the constraint when applied to equations on OB(n). However, Gauss–Legendre Runge–Kutta methods, as well as explicit one-step or multistep methods followed by projection onto OB(n), have good preservation properties. In the last section, we present several numerical tests together with numerical results for problems in ObPP(1, 1).

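2. Background. Assume $Y \in \mathrm{OB}(n)$ and denote by $T_Y\mathrm{OB}(n)$ the tangent space at $Y$. Clearly

$$T_Y\mathrm{OB}(n) = \{\, H \in \mathbb{R}^{n\times n} \mid \operatorname{diag}(Y^TH) = 0 \,\} \subset \mathbb{R}^{n\times n}.$$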

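Observe that the linear space

$$\mathrm{SK}(n) = \{\, F \in \mathbb{R}^{n\times n} \mid \operatorname{diag}(F) = 0 \,\}$$

is the tangent space at the identity matrix.

Let $G : \mathbb{R}\times\mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ be a continuous and locally Lipschitz matrix function on the set $D = (\gamma, \omega)\times W$, where $(\gamma, \omega)$ is an open interval in $\mathbb{R}$ and $W$ is a subdomain of OB(n), and let $(t_0, Y_0) \in D$. Then the differential system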

$$Y'(t) = G(t, Y(t)), \quad Y(t_0) = Y_0 \in \mathrm{OB}(n), \tag{2.1}$$

has a unique solution $Y(t)$ defined in a neighborhood $(\tau^-, \tau^+)$ of $t_0$. It is known that $Y(t)$, $\tau^-$, and $\tau^+$ depend on $(t_0, Y_0)$ (see [23]). In the following we assume that $(\tau^-, \tau^+)$ denotes the maximal interval of existence of the solution $Y(t)$.

Theorem 2.1. Let $Y(t)$ be a solution of the system (2.1). Suppose that $Y(t) \in \mathrm{OB}(n)$ for all $t \in (\tau^-, \tau^+)$. Then the matrix function $G(t, Y(t))$ may be written as

$$G(t, Y(t)) = H(t, Y(t)) - Y(t)\operatorname{diag}[Y^T(t)H(t, Y(t))], \tag{2.2}$$

where $H : \mathbb{R}\times\mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ is a suitable matrix function.

Proof. To prove the relation (2.2) we essentially use a result derived in [5]. The constraint $\operatorname{diag}(Y^TY) = I$ is equivalent to the $n$ constraints $y_i^Ty_i = 1$, $i = 1, \ldots, n$, on the columns of $Y$. This means that, if $Y(t) \in \mathrm{OB}(n)$, then each column $G_i(t, Y(t))$

976 N. DEL BUONO AND L. LOPEZ

of the matrix function $G(t, Y(t))$ is necessarily tangent to the unit sphere $S^{n-1}$, that is, $\langle G_i(t, Y(t)), y_i(t)\rangle = 0$. This feature may be exploited in order to transform a flow on OB(n) into a sequence of flows on $S^{n-1}$. In particular, the $i$th column $y_i(t)$ of the matrix solution $Y(t)$ must satisfy the differential equation

$$y_i'(t) = H_i(t, Y(t)) - \langle H_i(t, Y(t)), y_i(t)\rangle\, y_i(t), \quad i = 1, \ldots, n, \tag{2.3}$$

where the vector function $H_i(t, Y(t))$ is given by

$$H_i(t, Y(t)) = G_i(t, Y(t)) + \alpha_i y_i(t) \tag{2.4}$$

with $\alpha_i \in \mathbb{R}$. Since $\langle G_i(t, Y(t)), y_i(t)\rangle = 0$ and $\langle y_i(t), y_i(t)\rangle = 1$, taking the inner product of (2.4) with $y_i(t)$ gives $\alpha_i = \langle H_i(t, Y(t)), y_i(t)\rangle$ (see [5]). Hence $G_i = H_i - \langle H_i, y_i\rangle y_i$, which is (2.3); collecting the columns gives (2.2), and this proves the theorem.
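The column-wise construction in the proof can be written compactly in matrix form. The NumPy sketch below (the name `tangent_from_h` is our own) builds $G = H - Y\operatorname{diag}(Y^TH)$ from an arbitrary $H$, as in (2.2); when the columns of $Y$ have unit norm, $\operatorname{diag}(Y^TG) = 0$, so $G$ is tangent to OB(n) at $Y$, and applying the map twice changes nothing.

```python
import numpy as np

def tangent_from_h(Y, H):
    """Form G = H - Y diag(Y^T H) as in (2.2).

    Column i is g_i = h_i - <h_i, y_i> y_i, the projection (2.3) of h_i onto
    the tangent space of the unit sphere at y_i (assumes ||y_i|| = 1).
    """
    d = np.einsum("ij,ij->j", Y, H)   # d_i = y_i^T h_i, the diagonal of Y^T H
    return H - Y * d                  # Y * d scales column i of Y by d_i, i.e. Y diag(d)

rng = np.random.default_rng(1)
Q = rng.standard_normal((4, 4))
Y = Q / np.linalg.norm(Q, axis=0)     # a point with unit-norm columns
G = tangent_from_h(Y, rng.standard_normal((4, 4)))
print(np.abs(np.diag(Y.T @ G)).max())  # of the order of machine precision
```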

If the $i$th column of the matrix $H(t, Y(t))$ depends only on the vector $y_i(t)$, i.e., $H_i(t, Y(t)) = H_i(t, y_i(t))$, then (2.3) becomes a set of $n$ independent ODEs on $S^{n-1}$. When (2.3) is not separable, one can approximate $H_i(t, Y(t))$ with a vector function depending on $y_i(t)$ only, but this approximation yields a differential problem with a larger condition number function.

Observe that since $Y \in \mathrm{OB}(n)$ is a nonsingular matrix, $G \in T_Y\mathrm{OB}(n)$ if and only if $G = Y^{-T}F$ with $F \in \mathrm{SK}(n)$. Then the following characterization of matrix differential systems on the manifold of square oblique rotation matrices may be derived.

Theorem 2.2. Let $Y(t)$ be the solution of (2.1) on the existence interval $(\tau^-, \tau^+)$. Then $Y(t)$ belongs to OB(n) for all $t \in (\tau^-, \tau^+)$ if and only if

$$F(t, Y(t)) = Y^T(t)G(t, Y(t)), \quad t \in (\tau^-, \tau^+), \tag{2.5}$$

is a continuous and locally Lipschitz matrix function mapping OB(n) into SK(n).

Proof. Let $Y(t)$ be the solution of (2.1); then

$$\frac{d}{dt}[Y^T(t)Y(t)] = \Big(\frac{d}{dt}Y^T(t)\Big)Y(t) + Y^T(t)\Big(\frac{d}{dt}Y(t)\Big) = G^T(t, Y(t))Y(t) + Y^T(t)G(t, Y(t)),$$

and, therefore,

$$\frac{d}{dt}\operatorname{diag}(Y^T(t)Y(t)) = \operatorname{diag}[G^T(t, Y(t))Y(t) + Y^T(t)G(t, Y(t))], \quad t \in (\tau^-, \tau^+).$$

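Thus, if $Y(t) \in \mathrm{OB}(n)$, then

$$\operatorname{diag}[G^T(t, Y(t))Y(t) + Y^T(t)G(t, Y(t))] = 0, \quad t \in (\tau^-, \tau^+).$$

Since a matrix and its transpose have the same diagonal, this means $\operatorname{diag}[Y^T(t)G(t, Y(t))] = 0$; that is, $G(t, Y(t))$ belongs to $T_{Y(t)}\mathrm{OB}(n)$ and the matrix function $F(t, Y(t)) = Y^T(t)G(t, Y(t))$ satisfies $\operatorname{diag}(F(t, Y(t))) = 0$. It follows that $F(t, Y(t)) \in \mathrm{SK}(n)$ for all $t \in (\tau^-, \tau^+)$.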

Conversely, consider the differential system

$$Y'(t) = G(t, Y(t)), \quad Y(t_0) = Y_0 \in \mathrm{OB}(n), \tag{2.6}$$

and the associated differential system

$$Y'(t) = G(t, Q(Y(t))), \quad Y(t_0) = Y_0 \in \mathrm{OB}(n), \tag{2.7}$$


where $F(t, Y(t)) = Y^T(t)G(t, Y(t))$ is a continuous, locally Lipschitz matrix function mapping elements of OB(n) into SK(n) and $Q : \mathbb{R}^{n\times n} \to \mathrm{OB}(n)$ is a continuous projection of $Y$ on OB(n). Let $Y(t)$ be the solution of (2.7); then

$$\frac{d}{dt}[Y^T(t)Y(t)] = G^T(t, Q(Y(t)))Y(t) + Y^T(t)G(t, Q(Y(t))).$$

Since $F(t, \cdot)$ maps elements of OB(n) into SK(n), the solution of (2.7) belongs to OB(n) for all $t$ in the existence interval. Therefore $Q(Y(t)) = Y(t)$ and the differential system (2.7) is equivalent to (2.6).

A dynamical system on OB(n) can therefore be written in the following form:

$$Y'(t) = Y^{-T}(t)F(t, Y(t)), \quad Y(t_0) = Y_0 \in \mathrm{OB}(n), \quad t \in (\tau^-, \tau^+), \tag{2.8}$$

where $F$ is a continuous and locally Lipschitz matrix function such that

$$F : \mathbb{R}\times\mathrm{OB}(n) \to \mathrm{SK}(n). \tag{2.9}$$
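To make the form (2.8)–(2.9) concrete, the following sketch (the helper names `zero_diag` and `rhs_from_f` are hypothetical) evaluates $G = Y^{-T}F$ by solving $Y^TG = F$ rather than forming the inverse, and checks the characterization used above: $Y^TG = F$, so $\operatorname{diag}(Y^TG) = 0$ exactly when $F \in \mathrm{SK}(n)$.

```python
import numpy as np

def zero_diag(M):
    """Projection onto SK(n): M - diag(M)."""
    return M - np.diag(np.diag(M))

def rhs_from_f(Y, F):
    """Right-hand side G = Y^{-T} F of (2.8), computed by solving Y^T G = F."""
    return np.linalg.solve(Y.T, F)

rng = np.random.default_rng(3)
Q = rng.standard_normal((4, 4))
Y = Q / np.linalg.norm(Q, axis=0)            # a point on OB(4) (nonsingular almost surely)
F = zero_diag(rng.standard_normal((4, 4)))   # an element of SK(4)
G = rhs_from_f(Y, F)
print(np.abs(np.diag(Y.T @ G)).max())        # tangency: essentially zero
```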

Remark. We mention that an example of a dynamical system on OB(n) arises from the oblique Procrustes problem. It has been proven in [21], using projected gradient flow theory, that the solution of ObPP(α, β) can be computed as the limit point of a matrix differential system on OB(n) of the form (2.1) with

$$G(Y) = Y^{-T}\operatorname{off}\big[\beta(XY^{-T} - Z)^TXY^{-T} - \alpha Y^TA^T(AY - B)\big], \tag{2.10}$$

where $\operatorname{off}(\cdot)$ is the matrix operator defined as $\operatorname{off}(A) = A - \operatorname{diag}(A)$.

Note that when the matrix solution $Y(t)$ of (2.1) evolves on OB(n), it is a bounded matrix function for all $t$ in the maximal existence interval $(\tau^-, \tau^+)$, since its columns have unit norm and hence $\|Y(t)\|_F = \sqrt{n}$. Thus, if the escape point $\tau^+$ is finite, then $Y(t)$ tends to the boundary of the manifold as $t \to \tau^+$, i.e., $Y(t)$ approaches the set of singular matrices as $t \to \tau^+$.
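The vector field (2.10) of the Remark above can be evaluated directly. The sketch below is a plain transcription of (2.10) as written (the names `off` and `obpp_field` are illustrative); it returns $G(Y) = Y^{-T}F(Y)$ with $F(Y) = \operatorname{off}[\cdot] \in \mathrm{SK}(n)$, so $\operatorname{diag}(Y^TG) = 0$ by construction, and $G$ vanishes at a matrix $Y_*$ for which $B = AY_*$ and $Z = XY_*^{-T}$.

```python
import numpy as np

def off(M):
    """off(M) = M - diag(M)."""
    return M - np.diag(np.diag(M))

def obpp_field(Y, A, B, X, Z, alpha, beta):
    """G(Y) of (2.10): Y^{-T} off[beta (X Y^{-T} - Z)^T X Y^{-T} - alpha Y^T A^T (A Y - B)]."""
    Y_inv_T = np.linalg.inv(Y).T
    R1 = A @ Y - B               # residual of the first term of (1.1)
    R2 = X @ Y_inv_T - Z         # residual of the second term of (1.1)
    F = off(beta * R2.T @ X @ Y_inv_T - alpha * Y.T @ A.T @ R1)
    return Y_inv_T @ F
```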

The value of the escape point $\tau^+$ depends on the matrix function $G(t, Y) = Y^{-T}F(t, Y)$. If $G(t, Y)$ is well defined at all matrices $Y$ of $\mathbb{R}^{n\times n}$, then $\tau^+ = \infty$; otherwise $\tau^+$ may be finite. For instance, if $F(t, Y)$ is a nonzero constant for all $(t, Y)$, then $G$ is not defined at any singular matrix. In this case $G$ is a continuous and locally Lipschitz matrix function only in neighborhoods of nonsingular matrices, and when $F$ is not skew-symmetric the solution $Y(t)$ exists only on a finite neighborhood of $t_0$ and approaches a singular matrix as $t \to \tau^+$.

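Example 2.3. The differential system

$$Y' = Y^{-T}\begin{pmatrix} 0 & -\delta/2 \\ -\delta/2 & 0 \end{pmatrix}, \quad Y(0) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \quad \delta \neq 0, \tag{2.11}$$

has a solution given by

$$Y(t) = \frac{1}{\sqrt{2}}\begin{pmatrix} \sqrt{1+\delta t} & -\sqrt{1+\delta t} \\ \sqrt{1-\delta t} & \sqrt{1-\delta t} \end{pmatrix},$$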

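which exists and belongs to OB(n) in the neighborhood $(-1/|\delta|, 1/|\delta|)$ of $t_0 = 0$. Since $\det Y(t) = \sqrt{1 - \delta^2t^2}$, the solution becomes singular as $|t| \to 1/|\delta|$. In this case the matrix function $G(Y) = Y^{-T}F$ is defined, continuous, and locally Lipschitz only at nonsingular matrices, not on all of $\mathbb{R}^{2\times 2}$.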
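The closed-form solution of Example 2.3 can be checked numerically. The sketch below (helper names are our own, and the central-difference step `eps` is an arbitrary choice) verifies that $Y(t)$ has unit-norm columns, satisfies $Y' = Y^{-T}F$ up to finite-difference error, and has $\det Y(t) = \sqrt{1 - \delta^2t^2}$.

```python
import numpy as np

def example_2_3_solution(t, delta):
    """Closed-form solution of (2.11), valid for |delta * t| < 1."""
    a, b = np.sqrt(1.0 + delta * t), np.sqrt(1.0 - delta * t)
    return np.array([[a, -a], [b, b]]) / np.sqrt(2.0)

def example_2_3_residual(t, delta, eps=1e-6):
    """Max-abs residual of Y' - Y^{-T} F, with Y' from a central difference."""
    F = np.array([[0.0, -delta / 2.0], [-delta / 2.0, 0.0]])
    Y = example_2_3_solution(t, delta)
    dY = (example_2_3_solution(t + eps, delta)
          - example_2_3_solution(t - eps, delta)) / (2.0 * eps)
    return np.max(np.abs(dY - np.linalg.solve(Y.T, F)))

delta = 0.8
for t in (-1.0, 0.0, 0.5, 1.0):
    Y = example_2_3_solution(t, delta)
    print(f"t={t:+.1f}  residual={example_2_3_residual(t, delta):.1e}  "
          f"det={np.linalg.det(Y):.4f}  diag(Y^T Y)={np.diag(Y.T @ Y)}")
```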


As Example 2.3 shows, the condition number of the solution of a differential equation on OB(n) may become unbounded. For this reason, the following result may be very useful.

Theorem 2.4. Let $Y(t)$ be the solution of the differential system (2.8) and suppose that

$$\Big\| \int_{t_0}^{t} [F^T(s, Y(s)) + F(s, Y(s))]\,ds \Big\| \le g(t), \quad t \in [t_0, \tau^+), \tag{2.12}$$

where $g$ is a continuous nonnegative function such that $g(t_0) = 0$ and $\|\cdot\|$ is the 2-norm on matrices. Let $\lambda_{\max}(t_0)$ and $\lambda_{\min}(t_0)$ be, respectively, the largest and smallest strictly positive eigenvalues of $Y_0^TY_0$. Let $\sigma$ be the largest value of the interval $[t_0, \tau^+)$ such that

$$\lambda_{\min}(t_0) - g(t) > 0, \quad t \in [t_0, \sigma). \tag{2.13}$$

Then the condition number function $\mu(t)$ in the 2-norm of the solution $Y(t)$ satisfies

$$\mu(t) \le \sqrt{\frac{\lambda_{\max}(t_0) + g(t)}{\lambda_{\min}(t_0) - g(t)}}, \quad t \in [t_0, \sigma). \tag{2.14}$$

Proof. The condition number function $\mu(t)$ in the 2-norm of $Y(t)$ is the square root of the ratio between the largest and smallest eigenvalues of the symmetric positive definite matrix $Y^T(t)Y(t)$. Since $Y^T(t)Y'(t) = F(t, Y(t))$ by (2.8), this matrix function satisfies the differential equation

$$\frac{d}{dt}[Y^T(t)Y(t)] = F^T(t, Y(t)) + F(t, Y(t)), \quad Y^T(t_0)Y(t_0) = Y_0^TY_0, \tag{2.15}$$

whose solution may be written as

$$Y^T(t)Y(t) = Y_0^TY_0 + \int_{t_0}^{t} [F^T(s, Y(s)) + F(s, Y(s))]\,ds, \quad t \in (\tau^-, \tau^+). \tag{2.16}$$

From (2.16) and the Bauer–Fike theorem (see [11]), it follows that

$$\min_{1\le i\le n} |\lambda_i(t_0) - \lambda(t)| \le \Big\| \int_{t_0}^{t} [F^T(s, Y(s)) + F(s, Y(s))]\,ds \Big\|, \quad t \in [t_0, \tau^+), \tag{2.17}$$

where $\lambda_i(t_0)$, $i = 1, \ldots, n$, are the eigenvalues of $Y_0^TY_0$ and $\lambda(t)$ is a generic eigenvalue of $Y^T(t)Y(t)$. Hence, there exists $j \in \{1, \ldots, n\}$ such that

$$|\lambda_j(t_0) - \lambda(t)| = \min_{1\le i\le n} |\lambda_i(t_0) - \lambda(t)|, \quad t \in [t_0, \tau^+),$$

and consequently

$$\lambda_{\min}(t_0) - \lambda(t) \le \lambda_j(t_0) - \lambda(t) \le |\lambda_j(t_0) - \lambda(t)|.$$

Therefore, from (2.12) and (2.17) we have

$$\lambda_{\min}(t_0) - \lambda(t) \le g(t), \quad t \in [t_0, \tau^+).$$

In particular, for $\lambda(t) = \lambda_{\min}(t)$, the smallest eigenvalue of $Y^T(t)Y(t)$, we have

$$\lambda_{\min}(t_0) \le \lambda_{\min}(t) + g(t), \quad t \in [t_0, \tau^+), \tag{2.18}$$


and hence

$$\lambda_{\min}(t_0) - g(t) \le \lambda_{\min}(t), \quad t \in [t_0, \tau^+). \tag{2.19}$$

In the same manner, if $\lambda(t)$ is replaced by $\lambda_{\max}(t)$, the largest eigenvalue of $Y^T(t)Y(t)$, we obtain

$$\lambda_{\max}(t) \le \lambda_{\max}(t_0) + g(t), \quad t \in [t_0, \tau^+). \tag{2.20}$$

Thus, if $t \in [t_0, \sigma)$, then the lower bound on $\lambda_{\min}(t)$ in (2.19) is strictly positive. Finally, since $\mu(t) = \sqrt{\lambda_{\max}(t)/\lambda_{\min}(t)}$, (2.14) follows from (2.19) and (2.20).

Observe that, when $F(t, Y)$ is a skew-symmetric matrix function, $F^T + F = 0$, so (2.16) gives $Y^T(t)Y(t) = Y_0^TY_0$ and the condition number function $\mu(t)$ is constant on the time interval. The differential problem (2.8) is said to be ill-conditioned when $\mu(t)$ is unbounded on the interval $[t_0, \tau^+)$; otherwise, it is said to be well-conditioned. The value $\sigma$, given by (2.13), provides a lower bound of the escape point $\tau^+$.

The conditioning of a differential problem on OB(n) depends on the function $g(t)$. In particular, the bound in (2.14) grows without limit, and the problem may be ill-conditioned, when $g(t)$ approaches $\lambda_{\min}(t_0)$ as $t \to \sigma$.

Example 2.3 has a general implication: differential systems where $F$ is constant are typically ill-conditioned. In fact, in this case we may choose $g(t) = (t - t_0)\|F + F^T\|$ for all $t \ge t_0$, and so $g(\sigma) = \lambda_{\min}(t_0)$ for $\sigma = t_0 + \lambda_{\min}(t_0)/\|F + F^T\|$. Thus the escape point $\tau^+$ may be estimated by $\sigma$. By contrast, examples of well-conditioned problems are those satisfying (2.13) for all $t$ in $[t_0, +\infty)$. These remarks can be summarized in the following result.

Proposition 2.5. If $F(t, Y)$ is a constant matrix function that is not skew-symmetric, then (2.8) will be ill-conditioned on $[t_0, \tau^+)$. Furthermore, if the differential system (2.8) is well-conditioned on $[t_0, +\infty)$, then $F(t, Y)$ is a skew-symmetric or a nonconstant matrix function.
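For a constant $F$ the quantities in Theorem 2.4 are explicit: $g(t) = (t - t_0)\|F + F^T\|_2$ satisfies (2.12) with equality, $\sigma = t_0 + \lambda_{\min}(t_0)/\|F + F^T\|_2$, and (2.14) can be evaluated in closed form. The sketch below (the name `constant_f_bound` is ours) computes these. Applied to Example 2.3, where $Y_0^TY_0 = I$ and $\|F + F^T\|_2 = |\delta|$, it gives $\sigma = 1/|\delta|$, the actual escape point, and for $t \ge 0$ the bound coincides with the exact condition number $\sqrt{(1+|\delta|t)/(1-|\delta|t)}$ of $Y(t)$.

```python
import numpy as np

def constant_f_bound(F, Y0, t0=0.0):
    """Quantities of Theorem 2.4 for a constant F in (2.8).

    Returns (sigma, mu_bound), where sigma solves g(sigma) = lambda_min(t0)
    for g(t) = (t - t0) * ||F + F^T||_2, and mu_bound(t) evaluates (2.14).
    """
    lam = np.linalg.eigvalsh(Y0.T @ Y0)       # eigenvalues of Y0^T Y0, ascending
    lam_min, lam_max = lam[0], lam[-1]
    growth = np.linalg.norm(F + F.T, 2)       # spectral norm of F + F^T
    sigma = t0 + lam_min / growth if growth > 0 else np.inf

    def mu_bound(t):
        g = (t - t0) * growth
        return np.sqrt((lam_max + g) / (lam_min - g))   # valid for t in [t0, sigma)

    return sigma, mu_bound

# Example 2.3 with delta = 0.5: the escape point is 1/delta = 2.
delta = 0.5
F = np.array([[0.0, -delta / 2], [-delta / 2, 0.0]])
Y0 = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
sigma, mu_bound = constant_f_bound(F, Y0)
print(sigma, mu_bound(1.0))   # approximately 2.0 and sqrt(3)
```

A skew-symmetric $F$ gives $\|F + F^T\|_2 = 0$; the sketch then returns $\sigma = \infty$ and a constant bound, matching the observation before Proposition 2.5.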

3. Numerical methods. For the sake of simplicity, we concentrate our attention on autonomous differential systems, even though the main results derived in this paper may also be applied to the nonautonomous case. Let $h > 0$ be the time step. Consider a partition of the time interval given by $t_{k+1} = t_k + h$, and denote by $Y_k$ an approximation of $Y(t_k)$ at $t_k$, for $k \ge 0$.

As pointed out previously, the constraint $\operatorname{diag}(Y^TY) = I_n$ is equivalent to a set of $n$ quadratic conservation laws on the columns of $Y$. We thus look for obliqueness preserving schemes among integrators on $S^{n-1}$. A first negative result is that the quadratic preserving methods based on Lie group theory (see [9] and [16]) do not preserve the constraint $\operatorname{diag}(Y^TY) = I_n$. In contrast, positive results may be obtained by using Gauss–Legendre Runge–Kutta methods and projection techniques. We also show that Gauss–Legendre Runge–Kutta schemes preserve the condition number function of the solution, i.e., they satisfy a discrete analogue of (2.16) at each time step $t_k$. However, they require very small time steps when $Y(t)$ approaches a singularity or when $F(t, Y)$ depends on the inverse of $Y(t)$.

3.1. Runge–Kutta methods. Consider the $v$-stage Runge–Kutta method defined by the Butcher array

$$\begin{array}{c|c} c & A \\ \hline & b \end{array}, \tag{3.1}$$


where $c^T = (c_1, \ldots, c_v)$, $A = (a_{ij})$, $b = (b_1, \ldots, b_v)$ (see [14]). Applying (3.1) to (2.8) we get

$$Y_{k+1} = Y_k + h\sum_{i=1}^{v} b_iY_{ki}^{-T}F(Y_{ki}), \tag{3.2}$$

with

$$Y_{ki} = Y_k + h\sum_{j=1}^{v} a_{ij}Y_{kj}^{-T}F(Y_{kj}), \quad i = 1, \ldots, v, \quad k \ge 0. \tag{3.3}$$

Definition 3.1. A Runge–Kutta method (3.1) applied to (2.8) is said to be diagonal preserving if and only if it provides a sequence $Y_k$, $k \ge 0$, such that

$$\operatorname{diag}(Y_{k+1}^TY_{k+1}) = \operatorname{diag}(Y_k^TY_k), \quad k \ge 0. \tag{3.4}$$

Furthermore, it is said to be condition number function preserving if and only if

$$Y_{k+1}^TY_{k+1} = Y_k^TY_k + h\sum_{i=1}^{v} b_i[F(Y_{ki}) + F^T(Y_{ki})], \quad k \ge 0.$$

Finally, it is said to be obliqueness preserving when it is both diagonal and condition number function preserving.

Concerning the diagonal and condition number preserving properties of Runge–Kutta methods, the following result is easily derived.

Lemma 3.2. For the differential system (2.8) where (2.9) is replaced by

$$F : \mathbb{R}^{n\times n} \to \mathrm{SK}(n), \tag{3.5}$$

the Runge–Kutta scheme (3.2)–(3.3) whose coefficient matrix

$$M = (b_ib_j - b_ia_{ij} - b_ja_{ji})_{i,j=1}^{v} \tag{3.6}$$

is identically zero preserves both the diagonal and the condition number function along the solution.

Proof. Consider the numerical solution $Y_{k+1}$ given by (3.2)–(3.3). By using techniques similar to those used for quadratic or symplectic methods (see [2], [6], [8], [10], [20]), we obtain

$$Y_{k+1}^TY_{k+1} = Y_k^TY_k + h\sum_{i=1}^{v} b_i[F^T(Y_{ki}) + F(Y_{ki})] + h^2\sum_{i=1}^{v}\sum_{j=1}^{v} (b_ib_j - b_ia_{ij} - b_ja_{ji})F^T(Y_{ki})Y_{ki}^{-1}Y_{kj}^{-T}F(Y_{kj}). \tag{3.7}$$

If the coefficient matrix $M$ is equal to zero, then the Runge–Kutta scheme preserves the condition number function. Furthermore, since $F$ maps all matrices of $\mathbb{R}^{n\times n}$ into SK(n), $\operatorname{diag}[F^T(Y_{ki}) + F(Y_{ki})] = 0$, and hence

$$\operatorname{diag}(Y_{k+1}^TY_{k+1}) = \operatorname{diag}(Y_k^TY_k), \quad k \ge 0;$$

that is, the Runge–Kutta method is diagonal preserving.

No explicit Runge–Kutta method satisfies the hypothesis of Lemma 3.2: for an explicit method $a_{ii} = 0$, so the diagonal entries of $M$ are $b_i^2$, and consistency ($\sum_i b_i = 1$) forces some $b_i \neq 0$. Hence explicit methods are not diagonal preserving in general, while all $v$-stage Gauss–Legendre Runge–Kutta methods (denoted by GLv) satisfy $M = 0$ and possess this property (see [14]).
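The hypothesis of Lemma 3.2 is easy to check for a given Butcher tableau. The sketch below (the name `m_matrix` is ours) forms $M$ from (3.6) and evaluates it for the implicit midpoint rule (GL1), the two-stage Gauss–Legendre method (GL2), and, for contrast, the classical explicit fourth-order Runge–Kutta method, whose standard coefficients are not given in the text.

```python
import numpy as np

def m_matrix(A, b):
    """M = (b_i b_j - b_i a_ij - b_j a_ji) from (3.6)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    Bd = np.diag(b)
    # (Bd @ A)_ij = b_i a_ij and (A.T @ Bd)_ij = a_ji b_j
    return np.outer(b, b) - Bd @ A - A.T @ Bd

s = np.sqrt(3.0) / 6.0
tableaux = {
    "GL1": ([[0.5]], [1.0]),
    "GL2": ([[0.25, 0.25 - s], [0.25 + s, 0.25]], [0.5, 0.5]),
    "RK4": ([[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]],
            [1 / 6, 1 / 3, 1 / 3, 1 / 6]),
}
for name, (A, b) in tableaux.items():
    print(name, np.abs(m_matrix(A, b)).max())
```

For RK4 the diagonal of $M$ is $(b_1^2, \ldots, b_4^2)$, as in the argument above.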


Remark. Note that the intermediate values $Y_{ki}$ need not lie on OB(n). Thus, if in Lemma 3.2 the condition on the function $F$ is replaced with (2.9), then $F(Y_{ki})$ need not belong to SK(n), and hence GLv are not diagonal preserving. In this case we can apply the following projected version of GLv (denoted by PGLv):

$$Y_{k+1} = Y_k + h\sum_{i=1}^{v} b_iY_{ki}^{-T}\mathcal{P}[F(Y_{ki})], \tag{3.8}$$

$$Y_{ki} = Y_k + h\sum_{j=1}^{v} a_{ij}Y_{kj}^{-T}\mathcal{P}[F(Y_{kj})], \quad i = 1, \ldots, v, \tag{3.9}$$

where the projection of $F$ onto SK(n) is given by

$$\mathcal{P}(F) = F - \operatorname{diag}(F).$$

Clearly the PGLv method is diagonal preserving and has the same order of accuracy as GLv. In fact, PGLv is equivalent to GLv applied to the differential problem

$$Z' = Z^{-T}\mathcal{P}[F(Z)], \quad Z(t_0) = Y_0 \in \mathrm{OB}(n),$$

which is equivalent to (2.8). A cheaper procedure may be derived by substituting (3.9) into (3.3). This method will be denoted by WPGLv.
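The difference between GL1 and its projected version PGL1 can be seen on the data of Example 4.3 in section 4. In the sketch below (names are ours), one implicit-midpoint step solves the stage equation by the fixed-point iteration (3.12), using an increment-based stopping test instead of the residual test given later in this section; setting `project=True` replaces F by its projection P(F) = F − diag(F) onto SK(n), as in (3.8)–(3.9). Because the printed $Y_0$ of Example 4.3 has only four digits, we normalize its columns first, and we integrate only on $[0, 0.5]$ with $h = 0.005$. These choices are assumptions made for a quick check, not the settings behind Table 4.2.

```python
import numpy as np

def sk_proj(M):
    """P(M) = M - diag(M), the projection onto SK(n)."""
    return M - np.diag(np.diag(M))

def midpoint_step(Y, h, F, project=False, tol=1e-13, max_iter=200):
    """One GL1 step for Y' = Y^{-T} F(Y); one PGL1 step if project=True."""
    Fs = (lambda M: sk_proj(F(M))) if project else F
    Z = Y.copy()
    for _ in range(max_iter):            # fixed-point iteration (3.12) for the stage Z
        Z_new = Y + 0.5 * h * np.linalg.solve(Z.T, Fs(Z))
        done = np.max(np.abs(Z_new - Z)) < tol
        Z = Z_new
        if done:
            return Y + h * np.linalg.solve(Z.T, Fs(Z))
    raise RuntimeError("fixed-point iteration did not converge; reduce h")

def omega(Y):
    """Deviation from OB(n): ||I - diag(Y^T Y)||_F."""
    return np.linalg.norm(1.0 - np.einsum("ij,ij->j", Y, Y))

A = np.array([[0.0, -2.28, -0.74], [2.1, 0.0, 1.3], [0.8, -1.5, 0.0]])
Y0 = np.array([[0.9487, 0.5392, 0.7071], [0.0, 0.5392, 0.0], [0.3162, 0.6470, 0.7071]])
Y0 = Y0 / np.linalg.norm(Y0, axis=0)     # put the 4-digit Y0 exactly on OB(3)

def F(Y):
    """F(Y) = A + (I - diag(Y^T Y)) of Example 4.3."""
    return A + np.eye(3) - np.diag(np.einsum("ij,ij->j", Y, Y))

h, n_steps = 0.005, 100
Y_gl, Y_pgl = Y0.copy(), Y0.copy()
for _ in range(n_steps):
    Y_gl = midpoint_step(Y_gl, h, F)
    Y_pgl = midpoint_step(Y_pgl, h, F, project=True)
print(f"Omega GL1 = {omega(Y_gl):.2e}, Omega PGL1 = {omega(Y_pgl):.2e}")
```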

It is known that a cheap way to solve the nonlinear system associated with (3.3) is the functional fixed-point iteration scheme, yielding

$$Y_{kl}^{(m+1)} = Y_k + h\sum_{j=1}^{v} a_{lj}\big(Y_{kj}^{(m)}\big)^{-T}F\big(Y_{kj}^{(m)}\big), \quad l = 1, \ldots, v, \tag{3.10}$$

with initial guess $Y_{kl}^{(0)} = Y_k$ for $l = 1, \ldots, v$ and stopping criterion

$$\Big\| Y_{kl}^{(m+1)} - Y_k - h\sum_{j=1}^{v} a_{lj}\big[Y_{kj}^{(m+1)}\big]^{-T}F\big(Y_{kj}^{(m+1)}\big) \Big\| < \mathrm{tol}, \quad l = 1, \ldots, v.$$

We illustrate the scheme by considering the implicit midpoint rule (GL1), where at every step $k$ the nonlinear system to be solved is

$$H(Z) = Z - Y_k - \frac{h}{2}Z^{-T}F(Z) = 0, \tag{3.11}$$

while the functional iteration may be written as

$$Z^{(m+1)} = Y_k + \frac{h}{2}[Z^{(m)}]^{-T}F(Z^{(m)}), \quad m \ge 0, \tag{3.12}$$

with $Z^{(0)} = Y_k$.

Theorem 3.3. Let $L_1$ be the Lipschitz constant of $F$ with respect to $Y$ on a domain $D$ and assume that there exists a constant $L_2$ such that

$$\|F(Y)\| \le L_2 \quad \text{for all } Y \in D. \tag{3.13}$$


Let $Z$ be the solution of the nonlinear system (3.11) and suppose that the functional iteration (3.12) provides approximations $Z^{(m)}$ of $Z$ such that

$$\|[Z^{(m)}]^{-1}\| \le L_3\|Z^{-1}\|, \quad m \ge 0, \tag{3.14}$$

where $L_3$ is a positive constant independent of $m$. Then the functional iteration (3.12) converges if the time step $h$ is such that

$$h < \frac{2}{L_3\|Z^{-1}\|\,(L_1 + L_2\|Z^{-1}\|)}. \tag{3.15}$$

Proof. From (3.11) and (3.12) we have

$$\begin{aligned} Z^{(m+1)} - Z &= \frac{h}{2}\big([Z^{(m)}]^{-T}F(Z^{(m)}) - Z^{-T}F(Z)\big) \\ &= \frac{h}{2}\big([Z^{(m)}]^{-T}[F(Z^{(m)}) - F(Z)] + ([Z^{(m)}]^{-T} - Z^{-T})F(Z)\big) \\ &= \frac{h}{2}\big([Z^{(m)}]^{-T}[F(Z^{(m)}) - F(Z)] + [Z^{(m)}]^{-T}(Z^T - [Z^{(m)}]^T)Z^{-T}F(Z)\big). \end{aligned}$$

Since the 2-norm of a matrix is equal to the 2-norm of its transpose, it follows that

$$\begin{aligned} \|Z^{(m+1)} - Z\| &\le \frac{h}{2}\|Z^{(m)} - Z\|\,\|[Z^{(m)}]^{-1}\|\,(L_1 + L_2\|Z^{-1}\|) \\ &\le \frac{h}{2}\|Z^{(m)} - Z\|\,L_3\|Z^{-1}\|\,(L_1 + L_2\|Z^{-1}\|), \end{aligned}$$

so the iteration is a contraction, and convergence follows, if (3.15) is satisfied.

From (3.15), it follows that the functional iteration needs very small time steps of integration near a singularity of the solution $Y$. Furthermore, even if the matrix solution $Y$ does not approach a singularity, if $F(Y)$ depends on the inverse of $Y$, then the constants $L_1$ and $L_2$ may grow with $\|Y^{-1}\|$. This means that $h$ may have to be very small even for problems with $\|Y^{-1}\|$ of moderate size. For instance, in the case of ObPP(1, 1), the denominator in (3.15) grows like $O(\|Z^{-1}\|^4)$.

Remark. A step-size selection strategy may be considered where the value of $h$ is reduced until the functional iteration converges, that is, until the new value $h_{\mathrm{new}}$ satisfies (3.15). This reduction may be stopped when $h_{\mathrm{new}}$ becomes smaller than a prefixed lower bound.
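The step-reduction strategy of the Remark can be sketched as follows for the implicit midpoint rule. This is an illustrative implementation under our own choices (halving factor 1/2, iteration cap, tolerance, lower bound `h_min`, and function names): each step first tries the nominal $h$ and halves it whenever (3.12) fails to converge, produces non-finite iterates, or hits a singular iterate. It is applied to Example 2.3 with $\delta = 1$ up to $t = 0.99$, close to the singularity at $t = 1$. Because GL1 satisfies $M = 0$ and $F$ is constant with zero diagonal, Lemma 3.2 gives $Y_k^TY_k = Y_0^TY_0 + t_k(F + F^T)$, so the computed condition number should match the exact value $\sqrt{(1+t)/(1-t)}$ whatever step sizes are used.

```python
import numpy as np

def gl1_try(Y, h, F, tol=1e-13, max_iter=100):
    """Attempt one implicit-midpoint step; return None if (3.12) does not converge."""
    Z = Y.copy()
    for _ in range(max_iter):
        try:
            Z_new = Y + 0.5 * h * np.linalg.solve(Z.T, F(Z))
        except np.linalg.LinAlgError:       # an iterate became singular
            return None
        if not np.all(np.isfinite(Z_new)):
            return None
        if np.max(np.abs(Z_new - Z)) < tol:
            return 2.0 * Z_new - Y          # = Y + h Z^{-T} F(Z) with Z the last iterate
        Z = Z_new
    return None

def integrate_gl1(Y0, F, t_end, h, h_min=1e-10):
    """GL1 on [0, t_end]; each step tries h and halves it until (3.12) converges."""
    t, Y, n_halvings = 0.0, Y0.copy(), 0
    while t_end - t > 1e-14:
        step = min(h, t_end - t)
        Y_new = gl1_try(Y, step, F)
        while Y_new is None:
            step *= 0.5
            n_halvings += 1
            if step < h_min:
                raise RuntimeError(f"step size below h_min at t = {t:.6f}")
            Y_new = gl1_try(Y, step, F)
        t, Y = t + step, Y_new
    return Y, n_halvings

delta = 1.0
F_const = np.array([[0.0, -delta / 2], [-delta / 2, 0.0]])
Y0 = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
t_end = 0.99
Y_end, n_halvings = integrate_gl1(Y0, lambda Y: F_const, t_end, h=0.05)
print("halvings:", n_halvings)
print("cond:", np.linalg.cond(Y_end),
      "exact:", np.sqrt((1 + delta * t_end) / (1 - delta * t_end)))
```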

A more accurate reformulation of the fixed-point iterations in implicit Runge–Kutta methods may be used in order to reduce the influence of round-off errors and improve convergence (see [15] for details).

3.2. Projected methods. In the same spirit as the work in [8], where projected methods on the orthogonal manifold have been proposed, we can consider numerical procedures that project onto OB(n) the numerical solution of (2.8) obtained by any explicit Runge–Kutta or multistep method. A projection $Y$ of a matrix $Q$ onto OB(n) is given by the closest oblique rotation in the least squares sense, i.e.,

$$Y = Q\operatorname{diag}(Q^TQ)^{-1/2}, \tag{3.16}$$


with

$$\|Q - Y\| \le \|Y\|\,\|I - \operatorname{diag}(Q^TQ)^{1/2}\|.$$
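The projection (3.16) simply rescales each column of $Q$ to unit length; since the squared Frobenius norm splits over columns, this is the closest matrix with unit-norm columns in the least squares sense. A minimal NumPy sketch (the name `project_ob` is ours; it assumes $Q$ has no zero column and does not check $\det Q \neq 0$):

```python
import numpy as np

def project_ob(Q):
    """Projection (3.16): Q diag(Q^T Q)^{-1/2}, i.e. each column of Q divided by its norm."""
    col_norms = np.sqrt(np.einsum("ij,ij->j", Q, Q))   # diagonal of diag(Q^T Q)^{1/2}
    return Q / col_norms                                # broadcasting divides column j by col_norms[j]
```

Since $Q - Y = Y(\operatorname{diag}(Q^TQ)^{1/2} - I)$, the error bound stated after (3.16) follows directly from submultiplicativity of the norm.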

This projection may be used in conjunction with one-step or multistep explicit schemes in order to obtain a semi-implicit procedure where no iteration is required. In particular, given $Y_k \in \mathrm{OB}(n)$, we first compute $\tilde{Y}_{k+1}$ by the explicit Runge–Kutta method

$$\tilde{Y}_{k+1} = Y_k + h\sum_{i=1}^{v} b_iY_{ki}^{-T}F(Y_{ki}), \qquad Y_{ki} = Y_k + h\sum_{j=1}^{i-1} a_{ij}Y_{kj}^{-T}F(Y_{kj}), \quad i = 1, \ldots, v. \tag{3.17}$$

Then we project $\tilde{Y}_{k+1}$ onto OB(n); that is, we compute

$$Y_{k+1} = \tilde{Y}_{k+1}\operatorname{diag}(\tilde{Y}_{k+1}^T\tilde{Y}_{k+1})^{-1/2}. \tag{3.18}$$

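If the basic Runge–Kutta method is of order $p$, its local error is $O(h^{p+1})$ and the exact solution lies on OB(n), so that

$$\operatorname{diag}(\tilde{Y}_{k+1}^T\tilde{Y}_{k+1}) = I + D(h^{p+1}), \tag{3.19}$$

where $D(h^{p+1})$ is a diagonal matrix with elements that are $O(h^{p+1})$. Therefore, we have

$$Y_{k+1} - \tilde{Y}_{k+1} = Y_{k+1}[I - \operatorname{diag}(\tilde{Y}_{k+1}^T\tilde{Y}_{k+1})^{1/2}] = Y_{k+1}[I - (I + D(h^{p+1}))^{1/2}].$$

Hence $\|Y_{k+1} - \tilde{Y}_{k+1}\| = O(h^{p+1})$, which is of the same size as the local error, so the projected method is of the same order as the basic Runge–Kutta scheme. Furthermore, from (3.18) it follows that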

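$$Y_{k+1}^TY_{k+1} = [\operatorname{diag}(\tilde{Y}_{k+1}^T\tilde{Y}_{k+1})^{-1/2}]^T\,\tilde{Y}_{k+1}^T\tilde{Y}_{k+1}\,\operatorname{diag}(\tilde{Y}_{k+1}^T\tilde{Y}_{k+1})^{-1/2}.$$

By (3.19), we have

$$Y_{k+1}^TY_{k+1} = \tilde{Y}_{k+1}^T\tilde{Y}_{k+1} + O(h^p),$$

with

$$\tilde{Y}_{k+1}^T\tilde{Y}_{k+1} = Y_k^TY_k + h\sum_{i=1}^{v} b_i[F^T(Y_{ki}) + F(Y_{ki})] + h^2\sum_{i=1}^{v}\sum_{j=1}^{v} (b_ib_j - b_ia_{ij} - b_ja_{ji})F^T(Y_{ki})Y_{ki}^{-1}Y_{kj}^{-T}F(Y_{kj}).$$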

Since the matrix $M$ of an explicit Runge–Kutta method cannot be the zero matrix, the last term in the previous equality does not vanish in general. Therefore a projected Runge–Kutta method is not condition number function preserving.
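A projected explicit scheme of the form (3.17)–(3.18) needs no iteration. The sketch below uses Heun's second-order method as the basic explicit Runge–Kutta scheme; this choice and the function names are ours, since the text does not specify which second-order method PRK2 uses. On Example 2.3 with $\delta = 0.5$, halving the step should reduce the error at $t = 1$ by roughly a factor of four, while the columns stay normalized after every projection.

```python
import numpy as np

def project_ob(Q):
    """Projection (3.16): normalize each column of Q."""
    return Q / np.sqrt(np.einsum("ij,ij->j", Q, Q))

def prk2_step(Y, h, F):
    """Heun's method, i.e. (3.17) with v = 2, followed by the projection (3.18)."""
    K1 = np.linalg.solve(Y.T, F(Y))         # Y_{k1}^{-T} F(Y_{k1}) with Y_{k1} = Y_k
    Y2 = Y + h * K1                         # second stage Y_{k2} (a_21 = 1)
    K2 = np.linalg.solve(Y2.T, F(Y2))
    Y_tilde = Y + 0.5 * h * (K1 + K2)       # explicit result, slightly off OB(n)
    return project_ob(Y_tilde)

def integrate(step, Y0, F, t_end, n_steps):
    h = t_end / n_steps
    Y = Y0.copy()
    for _ in range(n_steps):
        Y = step(Y, h, F)
    return Y

delta = 0.5
F_const = np.array([[0.0, -delta / 2], [-delta / 2, 0.0]])
Y0 = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
a, b = np.sqrt(1 + delta), np.sqrt(1 - delta)
Y_exact = np.array([[a, -a], [b, b]]) / np.sqrt(2.0)   # Example 2.3 at t = 1
errors = [np.abs(integrate(prk2_step, Y0, lambda Y: F_const, 1.0, n) - Y_exact).max()
          for n in (20, 40, 80)]
print(errors, errors[0] / errors[1], errors[1] / errors[2])
```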

Observe that in a step-size selection strategy to control the local truncation error, the new time step $h_{\mathrm{new}}$ must satisfy the relation

$$h_{\mathrm{new}} \le \left(\frac{\theta\epsilon}{h\|r(t_k, Y_k)\|}\right)^{\frac{1}{p+1}}, \tag{3.20}$$


where $\theta$ is a safety factor, $\epsilon$ is the local truncation error bound, and

$$r(t_k, Y_k) = \sum_{i=1}^{v} (b_i - \hat{b}_i)Y_{ki}^{-T}F(Y_{ki}),$$

where the $\hat{b}_i$ denote the coefficients of a Runge–Kutta method of order $p + 1$. From (3.20) it follows that very small time steps will be required for solving differential problems which are ill-conditioned or where $F(Y)$ depends on $Y^{-1}$.

4. Numerical tests. All the numerical tests have been obtained by Matlab codes implemented on a scalar computer Alpha 200 5/433 with 512 MB RAM. We compare obliqueness preserving methods on different problems. Comparisons have been performed in terms of accuracy (measured by $\|Y(t_k) - Y_k\|_\infty$, with $\|\cdot\|_\infty$ the infinity norm on matrices), deviation from the manifold OB(n) (measured by $\Omega(Y_k) = \|I - \operatorname{diag}(Y_k^TY_k)\|_F$), and CPU time. The theoretical solution $Y(t_k)$, if unknown, has been estimated by applying the numerical method with half the step size. We denote by PRKv and PABv the projected methods of section 3.2 based on explicit Runge–Kutta and Adams–Bashforth methods of order $v$, respectively. The starting approximations for PABv have been obtained by a Runge–Kutta method of the same order. GLv have been implemented by solving the nonlinear system (3.3) by functional iteration with tolerance $\mathrm{tol} = 10^{-15}$.

Example 4.1. First, we consider the differential problem (2.8) with constant matrix function

$$F = \begin{pmatrix} 0 & 2/3 & 1 \\ 1 & 0 & 8/5 \\ -3 & 5/4 & 0 \end{pmatrix}$$

and initial condition $Y_0$ given by the identity matrix. Figure 4.1 (a) plots the behavior of the function $g(t) = \|F^T + F\|\,t$ in the interval $[0, \sigma)$, where $\sigma \approx 0.2280$ is the intersection point of $g(t)$ and $\lambda_{\min}(0)$.
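The value $\sigma \approx 0.2280$ quoted for Example 4.1 follows from the constant-$F$ formula $\sigma = t_0 + \lambda_{\min}(t_0)/\|F + F^T\|_2$ with $t_0 = 0$ and $Y_0 = I$, so that $\lambda_{\min}(0) = 1$. A short NumPy check (variable names are ours):

```python
import numpy as np

# Constant F and initial condition Y0 = I of Example 4.1.
F = np.array([[0.0, 2 / 3, 1.0],
              [1.0, 0.0, 8 / 5],
              [-3.0, 5 / 4, 0.0]])
Y0 = np.eye(3)

lam_min = np.linalg.eigvalsh(Y0.T @ Y0)[0]   # lambda_min(0) = 1 for Y0 = I
growth = np.linalg.norm(F + F.T, 2)          # ||F + F^T||_2, the slope of g(t)
sigma = lam_min / growth                     # g(sigma) = lambda_min(0) with t0 = 0
print(f"||F + F^T||_2 = {growth:.4f}, sigma = {sigma:.4f}")
```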

[Figure 4.1 (a): g(t) and λmin(t0) plotted against t on [0, 0.25], with σ at their intersection. Figure 4.1 (b): µ(Yk) plotted against time on a logarithmic scale.]

Fig. 4.1. Conditioning of Example 4.1.

Figure 4.1 (b) plots the behavior of the condition number function obtained by GL1 applied on $[0, \sigma)$ with the variable step-size strategy described at the end of


Table 4.1

Method   CPU time   Ω(Yk)        Global error   µ(t)
GL1      0.5000     1.1957e-15   0.0016         48.4875
PRK2     0.1167     2.2204e-16   0.0031         50.4054
PAB2     0.0667     2.2402e-16   0.0080         52.4978
GL2      0.5500     1.4603e-15   5.4116e-08     49.1144
PRK4     0.2000     2.2204e-16   1.0151e-07     49.1144
PAB4     0.1060     2.4825e-16   4.1564e-06     49.1161

[Figure 4.2: log µ(Y(tk)) plotted against time on [0, 3.5].]

Fig. 4.2. Condition number function of second order methods.

subsection 3.1, where the value of $h$ is the largest value for which the functional iteration converges. It can be observed that the condition number function increases near the singularity of the solution.

Example 4.2. We consider the nonautonomous differential system $Y'(t) = Y^{-T}(t)F(t)$ for $t \in [0.1, 3]$, where the matrix function $F(t)$ is given by

$$F(t) = \begin{pmatrix} 0 & \dfrac{t^2 - t\sqrt{t^2+3}}{(t^2+4)^{3/2}\sqrt{t^4+4t^2+3}} \\[2ex] \dfrac{\sqrt{t^2+3} - t}{(1+t^2)^{3/2}\sqrt{t^2+4}} & 0 \end{pmatrix}$$

and the theoretical solution is the matrix function

$$Y(t) = \begin{pmatrix} (t^2+1)^{-1/2} & (t^2+4)^{-1/2} \\ t(t^2+1)^{-1/2} & (t^2+3)^{1/2}(t^2+4)^{-1/2} \end{pmatrix}.$$

Table 4.1 reports the performance of the methods at $t = 3$ for $h = 0.25$. The GLv methods seem to be more expensive but more accurate than the projected procedures.
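The data of Example 4.2 can be checked for consistency: since $Y' = Y^{-T}F$ is equivalent to $Y^TY' = F$, the off-diagonal entries of $F(t)$ must equal $f_{12} = y_1^Ty_2'$ and $f_{21} = y_2^Ty_1'$ for the columns $y_1, y_2$ of the theoretical solution. The sketch below (function names are ours) evaluates closed-form expressions for these entries, obtained by differentiating the columns of $Y(t)$, and compares them with a central-difference approximation of $Y^T(t)Y'(t)$ on $[0.1, 3]$; it also confirms that the columns of $Y(t)$ have unit norm.

```python
import numpy as np

def y_exact(t):
    """Theoretical solution of Example 4.2."""
    return np.array([[1 / np.sqrt(t**2 + 1), 1 / np.sqrt(t**2 + 4)],
                     [t / np.sqrt(t**2 + 1), np.sqrt(t**2 + 3) / np.sqrt(t**2 + 4)]])

def f_example(t):
    """F(t) = Y^T(t) Y'(t) in closed form (zero diagonal)."""
    f12 = (t**2 - t * np.sqrt(t**2 + 3)) / ((t**2 + 4)**1.5 * np.sqrt(t**4 + 4 * t**2 + 3))
    f21 = (np.sqrt(t**2 + 3) - t) / ((1 + t**2)**1.5 * np.sqrt(t**2 + 4))
    return np.array([[0.0, f12], [f21, 0.0]])

def residual(t, eps=1e-6):
    """Max-abs entry of Y^T Y' - F(t), with Y' from a central difference."""
    dY = (y_exact(t + eps) - y_exact(t - eps)) / (2 * eps)
    return np.max(np.abs(y_exact(t).T @ dY - f_example(t)))

for t in np.linspace(0.1, 3.0, 6):
    Y = y_exact(t)
    print(f"t = {t:.2f}: residual = {residual(t):.1e}, diag(Y^T Y) = {np.diag(Y.T @ Y)}")
```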

Figure 4.2 plots the logarithm of the condition number function of the theoretical solution (solid line) and of the numerical solutions given by GL1 (dash-dotted line) and PRK2 (dotted line) with step size $h = 0.1$. The GL1 method reproduces the exact behavior of the condition number function, while PRK2 needs smaller step sizes $h$ to correctly integrate the problem.

Example 4.3. We consider the system (2.8), where $F(Y) = A + (I - \operatorname{diag}(Y^TY))$


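maps only matrices of OB(n) into SK(n), with $A$ and $Y_0$ given by

$$A = \begin{pmatrix} 0 & -2.28 & -0.74 \\ 2.1 & 0 & 1.3 \\ 0.8 & -1.5 & 0 \end{pmatrix}, \quad Y_0 = \begin{pmatrix} 0.9487 & 0.5392 & 0.7071 \\ 0 & 0.5392 & 0 \\ 0.3162 & 0.6470 & 0.7071 \end{pmatrix}.$$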

Table 4.2 summarizes the performance at $t = 1$ of the obliqueness preserving methods integrating on $[0, 1]$ with $h = 0.000625$. The GLv methods do not preserve the diagonal structure of the matrix $Y_k^TY_k$ because the matrix function $F$ satisfies only (2.9), while WPGLv and PGLv preserve both the diagonal and the condition number function better than GLv.

Table 4.2
Performance of the methods at t = 1.

Method   CPU time   Ω(Yk)        Global error   µ(t)
GL1      4.3500     8.9457e-06   5.4614e-05     8.1319
PGL1     4.9833     6.6986e-15   2.7383e-06     8.1314
WPGL1    4.7333     2.5762e-09   2.7508e-06     8.1314
GL2      7.7167     9.8231e-13   7.2228e-12     8.1314
PGL2     8.8333     3.1480e-15   3.9928e-13     8.1314
WPGL2    8.5000     1.4358e-14   3.2199e-13     8.1314

4.1. Numerical tests for ObPP(1, 1). We now approximate the numerical solution of ObPP(1, 1) with randomly generated data sets. In particular, we generate random matrices $A$, $X$, and $Y_{in}$, where $Y_{in}$ is a projection onto OB(n) of a random matrix $Q$. We then define $B = AY_{in}$ and $Z = XY_{in}^{-T}$ so that the underlying problem has a global solution at $Y_{in}$. The initial condition $Y_0$ is a projection onto OB(n) of a perturbation of $Y_{in}$ by a random matrix generated by the Matlab function rand, that is, $Q = Y_{in} + \mathrm{rand}(n)$. In our numerical simulations we observe how frequently the numerical methods reconstruct the matrix $Y_{in}$, either with different data sets $A$, $Y_0$, $Y_{in}$ or with different initial values $Y_0$.
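The construction of the random test problems can be sketched as follows. NumPy's uniform generator stands in for Matlab's `rand`, the norm in the reconstruction test is assumed to be the Frobenius norm, and the function names are ours; the flow integration itself (for example, GL1 applied to the field (2.10)) is not included.

```python
import numpy as np

def project_ob(Q):
    """Projection (3.16) onto OB(n): normalize each column."""
    return Q / np.sqrt(np.einsum("ij,ij->j", Q, Q))

def make_obpp_problem(n, rng):
    """Random ObPP(1, 1) instance whose global solution Y_in has objective value 0."""
    A = rng.random((n, n))
    X = rng.random((n, n))
    Y_in = project_ob(rng.random((n, n)))
    B = A @ Y_in
    Z = X @ np.linalg.inv(Y_in).T
    Y0 = project_ob(Y_in + rng.random((n, n)))   # perturbed start, Q = Y_in + rand(n)
    return A, B, X, Z, Y_in, Y0

def objective_11(Y, A, B, X, Z):
    """Objective of ObPP(1, 1) as written in (1.1)."""
    return (np.linalg.norm(A @ Y - B, "fro")
            + np.linalg.norm(X @ np.linalg.inv(Y).T - Z, "fro"))

def is_reconstruction(Yk, Y_in, tol):
    """Relative-error test ||Y_k - Y_in|| / ||Y_k|| < tol (Frobenius norm assumed)."""
    return np.linalg.norm(Yk - Y_in) / np.linalg.norm(Yk) < tol

rng = np.random.default_rng(0)
A, B, X, Z, Y_in, Y0 = make_obpp_problem(3, rng)
print(objective_11(Y_in, A, B, X, Z), objective_11(Y0, A, B, X, Z))
```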

The tables show the number of cases in which $Y_{\mathrm{in}}$ is reconstructed (reconstructions); the number of cases in which the objective function is minimized but the numerical solution differs from $Y_{\mathrm{in}}$ (deviations); the number of cases in which it is not minimized (failures); and, finally, the number of cases in which the functional iteration does not converge (divergences). The solution $Y_{\mathrm{in}}$ is considered faithfully reconstructed if the relative error $\|Y_k - Y_{\mathrm{in}}\|/\|Y_k\|$ is less than $10^{-3}$ for second-order methods and $10^{-6}$ for fourth-order methods. The time step used is h = 0.001. When the functional iteration diverges, a smaller step size should be used.
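The classification used in the tables can be written as a small decision function. This is a sketch under assumptions: the text specifies neither the norm (Frobenius is used here) nor the test for "the objective function is minimized", so `f_tol`, a threshold on the final objective value (whose global minimum is 0 by construction), is hypothetical, as are the names `classify_run` and `RECON_TOL`.

```python
import numpy as np

# Reconstruction tolerances from the text, keyed by the order of the method.
RECON_TOL = {2: 1e-3, 4: 1e-6}


def classify_run(Y_k, Y_in, f_final, converged, order, f_tol=1e-8):
    """Classify one simulation as a reconstruction, deviation, failure or divergence.

    Y_k       -- numerical solution at the final time
    Y_in      -- expected global minimizer
    f_final   -- objective value of (1.1) at Y_k
    converged -- False if the functional iteration diverged
    order     -- order of the method (2 or 4)
    f_tol     -- hypothetical threshold for "objective minimized"
    """
    if not converged:
        return "divergence"
    rel_err = np.linalg.norm(Y_k - Y_in) / np.linalg.norm(Y_k)  # Frobenius norm
    if rel_err < RECON_TOL[order]:
        return "reconstruction"
    if f_final < f_tol:
        return "deviation"
    return "failure"


Y_in = np.eye(3)
print(classify_run(Y_in + 1e-5, Y_in, 0.0, True, order=2))  # -> reconstruction
print(classify_run(Y_in + 1e-5, Y_in, 0.0, True, order=4))  # -> deviation
```

The same perturbation (relative error about $1.7 \times 10^{-5}$) counts as a reconstruction for a second-order method but only as a deviation for a fourth-order one, because of the tighter $10^{-6}$ tolerance.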

Table 4.3 reports the results of 100 simulations with different data, while Table 4.4 gives the results of 100 solutions of the problem in which only the starting matrix $Y_0$ changes.

It appears that the GLv methods reach the global solution of ObPP(1, 1) with different data more often than the projected procedures.

Table 4.3

Method   Reconstructions   Deviations   Failures   Divergences
GL1      66                13           12         9
PRK2     67                23           10         -
PAB2     65                21           14         -
GL2      66                13           12         9
PRK4     67                23           10         -
PAB4     65                21           14         -

Table 4.4

Solutions of 100 different ObPP with α = 1 and β = 1.

Method   Reconstructions   Deviations   Failures   Divergences
GL1      63                14           13         10
PRK2     55                23           22         -
PAB2     57                23           20         -
GL2      63                14           13         10
PRK4     55                23           22         -
PAB4     57                23           20         -

We analyze in more detail the problem with the following data:

$$
A = \begin{pmatrix} 0.9688 & 0.7553 & 0.2512 \\ 0.3557 & 0.8948 & 0.9327 \\ 0.0490 & 0.2861 & 0.1310 \end{pmatrix}, \qquad
X = \begin{pmatrix} 0.1171 & 0.8234 & 0.9492 \\ 0.7699 & 0.0466 & 0.2888 \\ 0.3751 & 0.5979 & 0.8888 \end{pmatrix},
$$

and

$$
Y_{\mathrm{in}} = \begin{pmatrix} 0.6498 & 0.4124 & 0.7964 \\ 0.4848 & 0.8969 & 0.5259 \\ 0.5855 & 0.1598 & 0.2988 \end{pmatrix}, \qquad
Y_0 = \begin{pmatrix} 0.5259 & 0.3992 & 0.7934 \\ 0.6942 & 0.9006 & 0.3771 \\ 0.4915 & 0.1719 & 0.4778 \end{pmatrix}.
$$

Figure 4.3(a), (b), and (c) show, respectively, semilog plots of the deviation from the diagonal structure, the deviation from the expected matrix $Y_{\mathrm{in}}$, and the value of the objective function in (1.1) for the numerical solution given by GL1 with h = 0.001.

[Figure 4.3: three semilog panels plotted against Time (t from 0 to 10): (a) Ω(Y_k), (b) Global Error, (c) Objective Value.]

Fig. 4.3. Performance of GL1 method.
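The three quantities plotted in Figure 4.3 can be evaluated for the printed data with the Python/NumPy sketch below. The exact definition of Ω is not repeated in this section, so `ob_residual` (the largest deviation of $\mathrm{diag}(Y^T Y)$ from 1) is a hypothetical stand-in; B and Z are rebuilt as $B = AY_{\mathrm{in}}$ and $Z = XY_{\mathrm{in}}^{-T}$, as in the test setup above. Because the matrices are printed to four digits, the residuals of $Y_{\mathrm{in}}$ and $Y_0$ are at most about $10^{-4}$ rather than machine precision.

```python
import numpy as np

# Data of the ObPP(1, 1) test problem of section 4.1 (printed to four digits).
A = np.array([[0.9688, 0.7553, 0.2512],
              [0.3557, 0.8948, 0.9327],
              [0.0490, 0.2861, 0.1310]])
X = np.array([[0.1171, 0.8234, 0.9492],
              [0.7699, 0.0466, 0.2888],
              [0.3751, 0.5979, 0.8888]])
Y_in = np.array([[0.6498, 0.4124, 0.7964],
                 [0.4848, 0.8969, 0.5259],
                 [0.5855, 0.1598, 0.2988]])
Y0 = np.array([[0.5259, 0.3992, 0.7934],
               [0.6942, 0.9006, 0.3771],
               [0.4915, 0.1719, 0.4778]])
B = A @ Y_in                    # so that ||A Y_in - B||_F = 0
Z = X @ np.linalg.inv(Y_in).T   # so that ||X Y_in^{-T} - Z||_F = 0


def ob_residual(Y):
    """Hypothetical stand-in for Omega(Y): max |diag(Y^T Y) - 1|."""
    return np.max(np.abs(np.diag(Y.T @ Y) - 1.0))


def rel_deviation(Y):
    """Relative distance ||Y - Y_in||_F / ||Y||_F, as in the reconstruction test."""
    return np.linalg.norm(Y - Y_in) / np.linalg.norm(Y)


def objective(Y):
    """Objective of (1.1) with alpha = beta = 1."""
    return np.linalg.norm(A @ Y - B) + np.linalg.norm(X @ np.linalg.inv(Y).T - Z)


for name, Y in [("Y_in", Y_in), ("Y0", Y0)]:
    print(name, ob_residual(Y), rel_deviation(Y), objective(Y))
```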

Tables 4.5 and 4.6 summarize the results obtained by integrating 50 ObPP(1, 1) problems with the variable step-size version of GL1 (denoted by GL1(vs)) and a projected version of the Matlab function ode23 (denoted by PODE23). In particular, Table 4.5 reports the results of 50 numerical simulations of different ObPP(1, 1) problems, while Table 4.6 shows the results obtained by solving the same problem from 50 different initial matrices $Y_0$. Table 4.5 seems to indicate that GL1(vs) gives better results than the variable step-size PODE23 method (28 versus 15 reconstructions), while in Table 4.6 the two methods perform similarly (39 versus 37).

Table 4.5

Method    Reconstructions   Deviations   Failures
GL1(vs)   28                15           7
PODE23    15                26           9

Table 4.6

Method    Reconstructions   Deviations   Failures
GL1(vs)   39                10           1
PODE23    37                12           1
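Matlab's ode23 implements the Bogacki–Shampine 3(2) Runge–Kutta pair. A projected variable step-size integrator in the spirit of PODE23 can be sketched in Python/NumPy as below. The paper does not state how PODE23 projects or which tolerances it uses, so the column-normalization projection after each accepted step, the tolerances, and the step-size rule (a standard controller with safety factor 0.9) are assumptions, and `pode23_sketch` is a hypothetical name.

```python
import numpy as np


def project_ob(Y):
    """Assumed projection onto OB(n): scale columns to unit 2-norm."""
    return Y / np.linalg.norm(Y, axis=0)


def pode23_sketch(G, Y0, t_end, h0=0.01, rtol=1e-4, atol=1e-7, max_steps=100000):
    """Bogacki-Shampine 3(2) pair for the autonomous system Y' = G(Y),
    projecting onto OB(n) after every accepted step (hypothetical PODE23)."""
    t, Y, h = 0.0, project_ob(Y0), h0
    n_accepted = 0
    for _ in range(max_steps):
        if t >= t_end:
            break
        h = min(h, t_end - t)  # do not step past t_end
        k1 = G(Y)
        k2 = G(Y + 0.5 * h * k1)
        k3 = G(Y + 0.75 * h * k2)
        Y_new = Y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)  # third-order solution
        k4 = G(Y_new)
        # difference between the third- and second-order solutions
        err = h * (-5 / 72 * k1 + 1 / 12 * k2 + 1 / 9 * k3 - 1 / 8 * k4)
        scale = atol + rtol * np.maximum(np.abs(Y), np.abs(Y_new))
        err_norm = np.max(np.abs(err) / scale)
        if err_norm <= 1.0:  # accept the step, then project onto OB(n)
            t += h
            Y = project_ob(Y_new)
            n_accepted += 1
        # error estimate is O(h^3): new h from exponent 1/3, clamped to [0.2, 5] times h
        h *= min(5.0, max(0.2, 0.9 * (1.0 / max(err_norm, 1e-10)) ** (1 / 3)))
    return Y, t, n_accepted


# Usage: Y' = S Y with Y(0) = I has solution [[cos t, sin t], [-sin t, cos t]].
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
Y, t, n_acc = pode23_sketch(lambda M: S @ M, np.eye(2), 1.0, rtol=1e-6, atol=1e-9)
exact = np.array([[np.cos(1.0), np.sin(1.0)], [-np.sin(1.0), np.cos(1.0)]])
print(n_acc, np.max(np.abs(Y - exact)))
```

Projecting only accepted steps keeps the error estimate consistent with the unprojected Runge–Kutta update; projecting inside the stages would be an alternative design.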

5. Conclusions. Differential systems on the manifold OB(n) arise in several important applications. In this paper we have provided some theoretical results on these differential systems and studied the conditioning of their solutions. We have also considered different numerical methods for the integration of problems on OB(n). Requiring the integrator to preserve both the oblique structure and the conditioning of the theoretical solution, we have found that Gauss–Legendre Runge–Kutta schemes preserve both of these properties. Numerical tests, in particular for the solution of the oblique Procrustes problem, have highlighted the necessity of using very small integration steps.

Acknowledgments. The authors wish to thank the anonymous referees for their many helpful suggestions.

REFERENCES

[1] M.W. Browne, Oblique rotation to a partially specified target, British J. Math. Statist. Psych., 25 (1972), pp. 207–212.

[2] M.P. Calvo, A. Iserles, and A. Zanna, Numerical solution of isospectral flows, Math. Comp., 66 (1997), pp. 1461–1486.

[3] M.T. Chu, A list of matrix flows with applications, in Hamiltonian and Gradient Flows, Algorithms and Control, Fields Inst. Commun. 3, AMS, Providence, RI, 1994, pp. 87–97.

[4] M.T. Chu, Scaled Toda-like flows, Linear Algebra Appl., 215 (1995), pp. 261–273.

[5] M.T. Chu, Curves on $S^{n-1}$ that lead to eigenvalues or their means of a matrix, SIAM J. Alg. Disc. Meth., 7 (1986), pp. 425–432.

[6] G. Cooper, Stability of Runge–Kutta methods for trajectory problems, IMA J. Numer. Anal., 7 (1987), pp. 1–13.

[7] T.F. Cox and M.A.A. Cox, Multidimensional Scaling, Chapman & Hall, London, 1995.

[8] L. Dieci, D. Russell, and E. Van Vleck, Unitary integrators and applications to continuous orthonormalization techniques, SIAM J. Numer. Anal., 31 (1994), pp. 261–281.

[9] F. Diele, L. Lopez, and R. Peluso, The Cayley transform in the numerical solution of unitary differential systems, Adv. Comput. Math., 8 (1998), pp. 317–334.

[10] T. Eirola and J.M. Sanz-Serna, Conservation of integrals and symplectic structure in the integration of differential equations by multistep methods, Numer. Math., 61 (1992), pp. 281–290.

[11] G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1983.

[12] J.C. Gower, Multivariate analysis: Ordination, multidimensional scaling and allied topics, in Handbook of Applicable Mathematics, Vol. IV: Statistics, Part B, E. Lloyd, ed., John Wiley & Sons, New York, 1984, pp. 727–781.

[13] G.T. Gruvaeus, A general approach to Procrustes pattern rotation, Psychometrika, 35 (1970), pp. 493–505.

[14] E. Hairer, S.P. Nørsett, and G. Wanner, Solving Ordinary Differential Equations, Vol. I: Nonstiff Problems, 2nd ed., Springer-Verlag, Berlin, 1991.

[15] E. Hairer and G. Wanner, Solving Ordinary Differential Equations, Vol. II, Springer-Verlag, Berlin, 1991.

[16] A. Iserles, H. Munthe-Kaas, S.P. Nørsett, and A. Zanna, Lie group methods, in Acta Numerica 9, Cambridge University Press, Cambridge, UK, 2000, pp. 215–365.

[17] H.A.L. Kiers, Joint orthomax rotation of the core and component matrices resulting from a three-mode factor analysis, J. Classification, 15 (1998), pp. 245–263.

[18] S.A. Mulaik, The Foundations of Factor Analysis, McGraw-Hill, New York, 1972.

[19] J.M. Sanz-Serna and M.P. Calvo, Numerical Hamiltonian Problems, Chapman & Hall, London, 1994.

[20] A.M. Stuart and A.R. Humphries, Dynamical Systems and Numerical Analysis, Cambridge University Press, Cambridge, UK, 1996.

[21] N.T. Trendafilov, A continuous-time approach to the oblique Procrustes problem, Behaviormetrika, 26 (1999), pp. 167–181.

[22] N.T. Trendafilov and R.A. Lippert, The multimode Procrustes problem, Linear Algebra Appl., to appear.

[23] H.K. Wilson, Ordinary Differential Equations, Addison-Wesley, London, 1971.