Some Theorems on Matrix Differentiation with Special Reference to Kronecker Matrix Products

Author: H. Neudecker
Source: Journal of the American Statistical Association, Vol. 64, No. 327 (Sep., 1969), pp. 953-963
Stable URL: http://www.jstor.org/stable/2283476


SOME THEOREMS ON MATRIX DIFFERENTIATION WITH SPECIAL REFERENCE TO KRONECKER MATRIX PRODUCTS

H. NEUDECKER* University of Birmingham, U. K.

Middle East Technical University, Ankara

Matrix differentiation, the procedure of finding partial derivatives of the elements of a matrix function with respect to the elements of the argument matrix, is the subject matter of this paper. The method followed here amounts to linking the differentials of the matrix function and the argument matrix, and then identifying matrices of partial derivatives. The basic assumption made is mathematical independence of the elements of the argument matrix. Matrix functions are divided into two categories: Kronecker matrix products and non-Kronecker (=ordinary) matrix products. Three definitions of partial derivative matrices are used, to be denoted by D1, D2 and D3 in this abstract. D2 and D3 are applied to Kronecker products, D1 is applied to ordinary products. This is a matter of efficiency only. D1 is developed first. Transforming matrices into column vectors turns out to be very convenient. D2 and D3 are developed then. D3 is a generalisation of D1.

1. INTRODUCTION

TURNBULL, Dwyer and MacPhail, Deemer and Olkin, Olkin, Bodewig, Roy, Dwyer and others have presented a series of theorems for solving problems in the field of matrix differentiation. However, there is still a need for developing others, primarily since the theorems available at the moment are not adequate for all the matrix derivatives currently needed. In particular, matrix differentiation for Kronecker matrix products is needed in many applications. This category comprises matrix functions whose differentials can be written as A ⊗ B(dX)C or B(dX)C ⊗ A (or their transposes). The present matrix calculus permits such calculations in a relatively simple manner.

2. SOME MATRIX ALGEBRA THEOREMS

(2.1) Let A = [aij] be an (m, n) matrix and B an (s, t) matrix; then the Kronecker product A ⊗ B is defined as the (ms, nt) matrix

$$A \otimes B = [a_{ij}B].$$

We state the following properties of Kronecker matrix products, all of which may be proved in an elementary fashion.

The matrices involved are assumed to be conformable. In (2.7) it is further assumed that A and B are square of order m and s respectively.

(2.2) $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$

(2.3) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$

(2.4) $(A \otimes B)' = A' \otimes B'$

(2.5) $(A + B) \otimes (C + D) = (A \otimes C) + (A \otimes D) + (B \otimes C) + (B \otimes D)$

* The author is indebted to editors and referees of J.A.S.A. for their most constructive criticism.


(2.6) $A \otimes (B \otimes C) = (A \otimes B) \otimes C$

(2.7) $|A \otimes B| = |A|^{s}\,|B|^{m}$

(2.8) $\operatorname{tr}(A \otimes B) = (\operatorname{tr} A)(\operatorname{tr} B)$.
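These properties are easily confirmed numerically. The following sketch (Python with numpy; not part of the original paper) checks (2.2)-(2.4) and (2.6)-(2.8) for randomly generated matrices of conformable orders.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s, t = 3, 4, 2, 5
A = rng.standard_normal((m, n)); B = rng.standard_normal((s, t))
C = rng.standard_normal((n, 4)); D = rng.standard_normal((t, 3))

assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))        # (2.2)
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))                          # (2.4)
assert np.allclose(np.kron(A, np.kron(B, C)), np.kron(np.kron(A, B), C))        # (2.6)

A2 = rng.standard_normal((m, m)); B2 = rng.standard_normal((s, s))              # square, orders m and s
assert np.allclose(np.linalg.inv(np.kron(A2, B2)),
                   np.kron(np.linalg.inv(A2), np.linalg.inv(B2)))               # (2.3)
assert np.isclose(np.linalg.det(np.kron(A2, B2)),
                  np.linalg.det(A2) ** s * np.linalg.det(B2) ** m)              # (2.7)
assert np.isclose(np.trace(np.kron(A2, B2)), np.trace(A2) * np.trace(B2))       # (2.8)
```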

In the following aij will denote the ijth element of A. Ai. and A.j will denote the ith row and jth column of A, respectively. If A is of order (m, n), we define the (mn) column vector vec A:

$$(2.9)\qquad \operatorname{vec} A = \begin{bmatrix} A_{.1} \\ \vdots \\ A_{.n} \end{bmatrix}.$$

In particular, if y is a column vector, then vec y = vec y'= y. We now give five theorems connecting vec A and the Kronecker product.

The proofs are straightforward. It is assumed that the matrices involved are conformable. The unit matrices have appropriate orders.

(2.10) $\operatorname{vec} ABC = (C' \otimes A)\operatorname{vec} B$.

Proof: The jth subvector of vec ABC equals

$$ABC_{.j} = \sum_i c_{ij}\,AB_{.i} = (C_{.j}' \otimes A)\operatorname{vec} B.$$

The conclusion follows.

(2.10.1) $\operatorname{vec} AB = (I \otimes A)\operatorname{vec} B = (B' \otimes I)\operatorname{vec} A = (B' \otimes A)\operatorname{vec} I = \sum_i B'_{.i} \otimes A_{.i}$

Proof: By substituting C = I; A = I, B = A, C = B; B = I, C = B, respectively, in (2.10) the first three results follow easily. The fourth result follows from the third.

(2.11) $\operatorname{tr} ABC = (\operatorname{vec} A')'(I \otimes B)\operatorname{vec} C$

(2.11.1) $\operatorname{tr} AB = (\operatorname{vec} A')'\operatorname{vec} B$

(2.12) $\operatorname{tr} AZ'BZC = (\operatorname{vec} Z)'(A'C' \otimes B)\operatorname{vec} Z = (\operatorname{vec} Z)'(CA \otimes B')\operatorname{vec} Z$.

Proof: Application of (2.10) and (2.11.1) gives tr AZ'BZC = tr Z'BZCA = (vec Z)' vec BZCA = (vec Z)'(A'C' ⊗ B) vec Z. Transposition obviously does not affect this result; it leads to (vec Z)'(CA ⊗ B') vec Z by (2.4).
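As a numerical illustration of (2.10)-(2.12) (again a Python/numpy sketch, not from the paper), vec is implemented as column-stacking, i.e. flattening in column-major order, to match definition (2.9):

```python
import numpy as np

rng = np.random.default_rng(1)
vec = lambda M: M.flatten(order="F")          # stack the columns of M, as in (2.9)

A = rng.standard_normal((3, 4)); B = rng.standard_normal((4, 5)); C = rng.standard_normal((5, 2))
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))          # (2.10)

B2 = rng.standard_normal((4, 3))
assert np.isclose(np.trace(A @ B2), vec(A.T) @ vec(B2))               # (2.11.1)

n = 4
A3, B3, C3, Z = (rng.standard_normal((n, n)) for _ in range(4))
lhs = np.trace(A3 @ Z.T @ B3 @ Z @ C3)
assert np.isclose(lhs, vec(Z) @ np.kron(A3.T @ C3.T, B3) @ vec(Z))    # (2.12), first form
assert np.isclose(lhs, vec(Z) @ np.kron(C3 @ A3, B3.T) @ vec(Z))      # (2.12), second form
```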

3. DEFINITIONS ON MATRIX DIFFERENTIAL CALCULUS

(3.1) Let

$$dX = [dx_{ij}], \qquad \frac{dY}{d\xi} = \left[\frac{dy_{kl}}{d\xi}\right], \qquad \frac{\partial f}{\partial X} = \left[\frac{\partial f}{\partial x_{ij}}\right], \qquad \frac{\partial y}{\partial x} = \left[\frac{\partial y_j}{\partial x_i}\right],$$

$\partial y_j/\partial x_i$ being the ijth element of $\partial y/\partial x$,


where Y is a matrix function of the scalar ξ, f is a scalar function of the matrix X, and y is a column vector function of the column vector x. The third and the fourth definitions agree if X is a column vector and y is a scalar.

(3.2) If Y is a matrix function of the matrix X, ∂ vec Y/∂ vec X is a matrix of partial derivatives ∂ykl/∂xij, uniquely ordered. Especially for the treatment of Kronecker matrix products, we define two more partial derivative matrices:

$$(3.3)\qquad \left[\frac{\partial Y}{\partial x_{ij}}\right] = \begin{bmatrix} \dfrac{\partial Y}{\partial x_{11}} & \cdots & \dfrac{\partial Y}{\partial x_{1m}} \\ \vdots & & \vdots \\ \dfrac{\partial Y}{\partial x_{n1}} & \cdots & \dfrac{\partial Y}{\partial x_{nm}} \end{bmatrix}, \qquad X \text{ being of order } (n, m).$$

$$(3.4)\qquad \left[\frac{\partial \operatorname{vec} Y_{qr}}{\partial \operatorname{vec} X}\right] = \begin{bmatrix} \dfrac{\partial \operatorname{vec} Y_{11}}{\partial \operatorname{vec} X} & \cdots & \dfrac{\partial \operatorname{vec} Y_{1v}}{\partial \operatorname{vec} X} \\ \vdots & & \vdots \\ \dfrac{\partial \operatorname{vec} Y_{u1}}{\partial \operatorname{vec} X} & \cdots & \dfrac{\partial \operatorname{vec} Y_{uv}}{\partial \operatorname{vec} X} \end{bmatrix},$$

where Y has the following partitioning:

$$Y = [Y_{qr}], \qquad q = 1, \ldots, u;\; r = 1, \ldots, v.$$

The matrices [∂Y/∂xij] and [∂ vec Yqr/∂ vec X] were referred to as D2 and D3 in the abstract.

4. MATRIX DIFFERENTIATION THEOREMS. FIRST PART

In this section we shall list theorems on the differentiation of matrix functions that are not Kronecker matrix products. Kronecker matrix products will be considered in section 5.

Argument matrices (and vectors) can have any order except (1, 1). Matrix (and vector) functions can have any order. We shall make the assumption: ∂xkl/∂xij = 0 for i ≠ k or j ≠ l, both xkl and xij being elements of the argument matrix X. With this assumption we can easily identify partial derivatives from relations between differentials. The main reasoning will, therefore, be done in terms of differentials. The three following theorems will enable us to work with matrix differentials:

(4.1) For conformable matrices X and Y: d(XY) = (dX)Y + X dY.

(4.2) If l is a linear scalar function of the matrix X, then dl(X) = l(dX).

(4.3) d vec X = vec dX.

The following five theorems are straightforward. They involve: f, a scalar function of the matrix X; y, a column vector function of the column vector x; g, a scalar function of the column vector x; Y, a matrix function of the matrix X; the matrices P, M, N, H and the column vector m being of appropriate orders.

(4.4) ∂f/∂X = P implies ∂f/∂ vec X = vec P, and conversely.


(4.5) dy = M dx implies ∂y/∂x = M', and conversely.

(4.5.1) vec dY = N vec dX implies ∂ vec Y/∂ vec X = N', and conversely.

(4.5.2) dg = m'dx implies ∂g/∂x = m, and conversely.

(4.5.3) df = tr H'dX implies ∂f/∂X = H, and conversely.

With the help of the previous theorems we can now establish the main theorem of section 4:

(4.6) If the matrix Y is a function of the matrix X, X having t columns, the matrices Pi, Qi, Rj and Sj (i = 1, ..., f; j = 1, ..., g) being of appropriate orders, then

$$dY = \sum_i P_i(dX)Q_i + \sum_j R_j(dX)'S_j$$

implies

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = \sum_i Q_i \otimes P_i' + \sum_j \begin{bmatrix} S_j \otimes (R_j')_{1.} \\ \vdots \\ S_j \otimes (R_j')_{t.} \end{bmatrix},$$

and conversely.

Proof: We can without loss of generality confine ourselves to dY=P(dX)Q +R(dX)'S. From this we derive

$$\operatorname{vec} dY = \operatorname{vec} P(dX)Q + \operatorname{vec} R(dX)'S = (Q' \otimes P)\operatorname{vec} dX + (S' \otimes R)\operatorname{vec} dX',$$

by means of (2.10). According to (4.5.1) and (2.4) the first term contributes towards ∂ vec Y/∂ vec X: Q ⊗ P'; the second term contributes towards ∂ vec Y/∂ vec X': S ⊗ R'. Given the structural relationship between ∂ vec Y/∂ vec X' and ∂ vec Y/∂ vec X, the second term contributes towards ∂ vec Y/∂ vec X:

$$\begin{bmatrix} S \otimes (R')_{1.} \\ \vdots \\ S \otimes (R')_{t.} \end{bmatrix}.$$

This concludes the proof. We shall now quote some theorems about linking derivative matrices. They involve: scalar f; column vectors x, y and z; matrices X, Y and Z; constant matrix A:

(4.7) $\dfrac{\partial z}{\partial x} = \dfrac{\partial y}{\partial x}\,\dfrac{\partial z}{\partial y}$

(4.7.1) $\dfrac{\partial \operatorname{vec} Z}{\partial \operatorname{vec} X} = \dfrac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}\,\dfrac{\partial \operatorname{vec} Z}{\partial \operatorname{vec} Y}$

(4.7.2) $\dfrac{\partial f}{\partial x} = \dfrac{\partial y}{\partial x}\,\dfrac{\partial f}{\partial y}$

(4.8) $\dfrac{\partial y}{\partial x} = A'\,\dfrac{\partial y}{\partial Ax}$


Proof: According to (4.7): ∂y/∂x = (∂Ax/∂x)(∂y/∂Ax). According to (4.5): ∂Ax/∂x = A'. From this the theorem follows.

(4.8.1) $\dfrac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = (I \otimes A')\,\dfrac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} AX}$

(4.8.2) $\dfrac{\partial f}{\partial x} = A'\,\dfrac{\partial f}{\partial Ax}$

It goes without saying that the matrix of second-order partial derivatives of a scalar function can be derived by applying (4.6) to the matrix of first-order partial derivatives. Let us use the notation

$$\nabla^2_{\operatorname{vec} X} f = \frac{\partial}{\partial \operatorname{vec} X}\,\frac{\partial f}{\partial \operatorname{vec} X}.$$

We can then establish the two useful theorems for scalar f:

(4.9) $d^2 f = \operatorname{tr} A(dX)'B(dX)C$ implies

$$\nabla^2_{\operatorname{vec} X} f = \tfrac{1}{2}(A'C' \otimes B + CA \otimes B'),$$

and conversely.

Proof: By definition $d^2 f = (\operatorname{vec} dX)'\,\nabla^2_{\operatorname{vec} X} f\,(\operatorname{vec} dX)$. We rewrite:

$\operatorname{tr} A(dX)'B(dX)C = \tfrac{1}{2}(\operatorname{vec} dX)'(A'C' \otimes B + CA \otimes B')\operatorname{vec} dX$, using (2.12). From this the theorem follows.

(4.9.1) $d^2 f = (dx)'A\,dx$ implies $\nabla^2_x f = \tfrac{1}{2}(A + A')$.
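A finite-difference check of theorem (4.6) may clarify the orientation of ∂ vec Y/∂ vec X and the stacked-block contribution of the (dX)' term. The sketch below (Python/numpy, hypothetical dimensions, not from the paper) builds the derivative row by row, one element of vec X at a time, and compares it with Q ⊗ P' plus the stacked blocks S ⊗ (R')k., k = 1, ..., t:

```python
import numpy as np

rng = np.random.default_rng(2)
vec = lambda M: M.flatten(order="F")          # column-stacking, as in (2.9)

s, t = 3, 4                                   # X is (s, t); t = number of columns of X
p, q = 2, 5                                   # Y is (p, q)
P = rng.standard_normal((p, s)); Q = rng.standard_normal((t, q))
R = rng.standard_normal((p, t)); S = rng.standard_normal((s, q))
Y = lambda X: P @ X @ Q + R @ X.T @ S         # dY = P(dX)Q + R(dX)'S
X0 = rng.standard_normal((s, t))

# ∂ vec Y/∂ vec X in the paper's orientation: rows indexed by vec X, columns by vec Y
eps = 1e-6
num = np.empty((s * t, p * q))
for idx in range(s * t):
    dX = np.zeros(s * t); dX[idx] = eps
    num[idx] = (vec(Y(X0 + dX.reshape(s, t, order="F"))) - vec(Y(X0))) / eps

first = np.kron(Q, P.T)                                              # contribution of P(dX)Q
second = np.vstack([np.kron(S, R.T[k:k + 1, :]) for k in range(t)])  # contribution of R(dX)'S
assert np.allclose(num, first + second, atol=1e-5)
```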

5. MATRIX DIFFERENTIATION THEOREMS. SECOND PART

Obviously the theorems of the previous section, in particular theorem (4.6), cannot be applied to Kronecker matrix products. There is a need, however, for treatment of such products. Three obvious cases are: first-order partial derivatives of a Kronecker product involving a matrix X, second-order partial derivatives of a matrix Y with respect to a matrix X, and third-order partial derivatives of a scalar function of a matrix X with respect to X. We shall employ the two types of derivatives defined in section 3 under (3.3) and (3.4). Five theorems will be developed.

(5.1) dY = A ⊗ B(dX)C implies

$\left[\dfrac{\partial Y}{\partial x_{ij}}\right] = [A \otimes B_{.i}C_{j.}]$, and conversely.

Proof: Using elementary matrices Eij, having zeros everywhere except in the ijth position where they have unity, we rewrite:

$$dY = A \otimes B(dX)C = A \otimes B\Big(\sum_{ij} E_{ij}\,dx_{ij}\Big)C = A \otimes \sum_{ij}(BE_{ij}C)\,dx_{ij} = A \otimes \sum_{ij} B_{.i}C_{j.}\,dx_{ij} = \sum_{ij}(A \otimes B_{.i}C_{j.})\,dx_{ij}.$$


Thus

$$\frac{\partial Y}{\partial x_{ij}} = A \otimes B_{.i}C_{j.}$$

and

$$\left[\frac{\partial Y}{\partial x_{ij}}\right] = [A \otimes B_{.i}C_{j.}].$$

This proves the theorem.

(5.2) dZ = F(dX)G ⊗ H implies

$\left[\dfrac{\partial Z}{\partial x_{ij}}\right] = (\operatorname{vec} F)(\operatorname{vec} G')' \otimes H$, and conversely.

Proof: Obviously

$$dZ = F(dX)G \otimes H = \sum_{ij}(F_{.i}G_{j.} \otimes H)\,dx_{ij}.$$

Thus

$$\frac{\partial Z}{\partial x_{ij}} = F_{.i}G_{j.} \otimes H$$

and

$$\left[\frac{\partial Z}{\partial x_{ij}}\right] = [F_{.i}G_{j.} \otimes H] = (\operatorname{vec} F)(\operatorname{vec} G')' \otimes H.$$

(5.3) If dY = A ⊗ B(dX)C, where dY = [dYqr], dYqr = aqr B(dX)C, then

$$\left[\frac{\partial \operatorname{vec} Y_{qr}}{\partial \operatorname{vec} X}\right] = A \otimes C \otimes B'$$

and conversely.

Proof: dYqr = aqr B(dX)C, so

$$\frac{\partial \operatorname{vec} Y_{qr}}{\partial \operatorname{vec} X} = C \otimes (a_{qr}B)' = a_{qr}\,C \otimes B',$$

according to (4.6). Therefore

$$\left[\frac{\partial \operatorname{vec} Y_{qr}}{\partial \operatorname{vec} X}\right] = [a_{qr}\,C \otimes B'] = A \otimes C \otimes B'.$$

(5.4) If dZ = F(dX)G ⊗ H, where dZ = [dZqr], dZqr = {Fq.(dX)G.r}H, q = 1, ..., w, then

$$\left[\frac{\partial \operatorname{vec} Z_{qr}}{\partial \operatorname{vec} X}\right] = \begin{bmatrix} G \otimes (F_{1.})'(\operatorname{vec} H)' \\ \vdots \\ G \otimes (F_{w.})'(\operatorname{vec} H)' \end{bmatrix},$$


and conversely.

Proof: dZqr = {Fq.(dX)G.r}H. So

$$d\operatorname{vec} Z_{qr} = \{F_{q.}(dX)G_{.r}\}\operatorname{vec} H = (\operatorname{vec} H)F_{q.}(dX)G_{.r},$$

and consequently

$$\frac{\partial \operatorname{vec} Z_{qr}}{\partial \operatorname{vec} X} = G_{.r} \otimes \{(\operatorname{vec} H)F_{q.}\}' = G_{.r} \otimes (F_{q.})'(\operatorname{vec} H)', \quad \text{by (4.6).}$$

This leads to

$$\left[\frac{\partial \operatorname{vec} Z_{qr}}{\partial \operatorname{vec} X}\right] = \begin{bmatrix} G \otimes (F_{1.})'(\operatorname{vec} H)' \\ \vdots \\ G \otimes (F_{w.})'(\operatorname{vec} H)' \end{bmatrix}.$$

We can conclude from theorems (5.1)-(5.4) that the derivative [∂ vec Yqr/∂ vec X] notationally best suits differentials like dY = A ⊗ B(dX)C, and that the derivative [∂Y/∂xij] best suits differentials like dY = F(dX)G ⊗ H. In case of a mixed differential like dY = A ⊗ B(dX)C + F(dX)G ⊗ H, we obviously have to choose one of the two derivatives. We cannot have a mixture of definitions. [∂Y/∂xij] definitely has the edge on [∂ vec Yqr/∂ vec X] in general. The reason is that application of the second derivative presupposes one unique partition of dY. As can be seen from (5.3) and (5.4) this requirement cannot be met generally. In (5.3) it is the order of A that determines the partition; in (5.4) it is the order of F(dX)G that does so, and there is no reason why the two orders should be identical. A comparable situation arises when Y depends on X'. It is obvious that [∂Y/∂xji] can easily be derived from [∂Y/∂xij] by blockwise transposition. [∂ vec Yqr/∂ vec X'] cannot so easily be derived from [∂ vec Yqr/∂ vec X].
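A quick numerical check of (5.1) (a Python/numpy sketch, not part of the paper): for Y(X) = A ⊗ BXC, the partial derivative ∂Y/∂xij obtained by perturbing a single element of X should equal A ⊗ B.iCj.:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2                                   # X is (n, m)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, n))
C = rng.standard_normal((m, 2))
Y = lambda X: np.kron(A, B @ X @ C)           # dY = A ⊗ B(dX)C

X0 = rng.standard_normal((n, m))
eps = 1e-6
for i in range(n):
    for j in range(m):
        E = np.zeros((n, m)); E[i, j] = eps
        numeric = (Y(X0 + E) - Y(X0)) / eps
        # (5.1): ∂Y/∂x_ij = A ⊗ B_{.i}C_{j.}
        assert np.allclose(numeric, np.kron(A, np.outer(B[:, i], C[j, :])), atol=1e-5)
```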

6. JACOBIANS

If Y is a new set of m × n variables and X is the original set of also m × n variables, the Jacobian will be equal to (the absolute value of) the determinant of [∂ vec Y/∂ vec X]^{-1}. With the help of theorem (4.6) this expression can be evaluated, provided ∂ vec Y/∂ vec X is nonsingular. Complications arise if columns of ∂ vec Y/∂ vec X are linearly dependent. This may happen if, e.g., elements of Y are multiple, proportional, or if they have relationships of the type yij = ykl + c, c constant. This may happen even if the elements of X are still mathematically independent. If this assumption basic to our analysis has to be dropped, the position becomes even more complicated. Theorem (4.6) cannot then be applied any longer. A case of this type is the transformation Y = TXT' where X' = X, T being a constant matrix. Clearly Y' = Y, so ∂ vec Y/∂ vec X contains multiple columns. It also contains multiple rows because X' = X. Therefore, T' ⊗ T' ≠ ∂ vec Y/∂ vec X.

There are several approaches to this kind of problem but we shall not discuss them here.


7. APPLICATIONS

(7.1) Jacobians. In the following we shall evaluate the Jacobians of several transformations. All matrices will be assumed to be square of order (n, n), X and Y variable, A, B etc. constant.

(7.1.1) Y = AXB leads to dY = A(dX)B,

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = B \otimes A',$$

$$\left|\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}\right|^{-1} = |B \otimes A'|^{-1} = |A|^{-n}|B|^{-n}, \quad \text{by (2.7) and (4.6).}$$
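This determinant identity can be verified directly (Python/numpy sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
d = np.linalg.det(np.kron(B, A.T))            # |∂ vec Y/∂ vec X| for Y = AXB
assert np.isclose(d, np.linalg.det(A) ** n * np.linalg.det(B) ** n)
# the Jacobian of the transformation is therefore |A|^{-n}|B|^{-n} in absolute value
```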

(7.1.2) Y = X² leads to dY = X dX + (dX)X,

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = I \otimes X' + X \otimes I,$$

$$\left|\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}\right| = |I \otimes X' + X \otimes I| = \prod_{k,l=1}^{n}(\lambda_k + \lambda_l),$$

the λi being the characteristic roots of X. For n = 3 the Jacobian becomes

$$\frac{9}{8[(\operatorname{tr} X)^3 - \operatorname{tr} X^3]^2\,|X|}.$$

For n = 2 the Jacobian reduces to

$$\frac{1}{4(\operatorname{tr} X)^2\,|X|}.$$
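The eigenvalue-product expression and the closed form for n = 3 can likewise be checked numerically (Python/numpy sketch, not from the paper; X is shifted so that no eigenvalue sum is close to zero):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
X = rng.standard_normal((n, n)) + 2.0 * np.eye(n)
lam = np.linalg.eigvals(X)
D = np.kron(np.eye(n), X.T) + np.kron(X, np.eye(n))    # ∂ vec Y/∂ vec X for Y = X²
det_D = np.linalg.det(D)
prod = np.prod(np.add.outer(lam, lam))                 # product of λ_k + λ_l over all pairs
assert np.isclose(det_D, prod.real)
trX, trX3, detX = np.trace(X), np.trace(X @ X @ X), np.linalg.det(X)
assert np.isclose(1.0 / det_D, 9.0 / (8.0 * (trX**3 - trX3) ** 2 * detX))   # n = 3 closed form
```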

(7.2) Miscellaneous

(7.2.1) Y = X² has ∂ vec Y/∂ vec X = I ⊗ X' + X ⊗ I. Further, dZ = d(∂ vec Y/∂ vec X) = I ⊗ (dX)' + (dX) ⊗ I. From this we derive the matrix of second-order partial derivatives:

$$\left[\frac{\partial Z}{\partial x_{ij}}\right] = [I \otimes E_{ij}'] + (\operatorname{vec} I)(\operatorname{vec} I)' \otimes I.$$

(7.2.2) Y = X ⊗ X, X being a matrix of order (m, n), has first-order differential dY = (dX) ⊗ X + X ⊗ dX. This implies

$$\left[\frac{\partial Y}{\partial x_{ij}}\right] = (\operatorname{vec} I_m)(\operatorname{vec} I_n)' \otimes X + [X \otimes E_{ij}]$$

and

$$\left[\frac{\partial \operatorname{vec} Y_{ij}}{\partial \operatorname{vec} X}\right] = \begin{bmatrix} I_n \otimes (I_{1.})'(\operatorname{vec} X)' \\ \vdots \\ I_n \otimes (I_{m.})'(\operatorname{vec} X)' \end{bmatrix} + X \otimes I_n \otimes I_m,$$

where

$$dY_{ij} = dx_{ij}X + x_{ij}\,dX \qquad (i = 1, \ldots, m;\; j = 1, \ldots, n).$$


(7.3) Maximum Likelihood Estimation. Let U = [uij] be an (N, T) matrix of normally distributed random variables with Euij = 0 and

$$Eu_{ij}u_{kl} = \begin{cases} 0 & \text{if } j \neq l \\ \sigma_{ik} & \text{if } j = l. \end{cases}$$

Consider the model

$$BY + \Gamma Z = U,$$

where B and Γ are (N, N) and (N, Λ) matrices of parameters. The problem is to obtain maximum likelihood estimators of B and Γ. Define Σ = [σik] (i, k = 1, ..., N), so that EU = 0 and E(1/T)UU' = Σ. The log likelihood is given by

$$\log p(U) = \text{constant} + T\log|B| + \frac{T}{2}\log|\Sigma^{-1}| - \frac{T}{2}\operatorname{tr}\Sigma^{-1}AMA',$$

where

$$M = \frac{1}{T}\begin{pmatrix} Y \\ Z \end{pmatrix}\begin{pmatrix} Y' & Z' \end{pmatrix}, \qquad A = (B : \Gamma).$$

Differentiation of log p(U) gives:

$$0 = d\log p(U) = T\operatorname{tr} B^{-1}dB + \frac{T}{2}\operatorname{tr}\Sigma\,d\Sigma^{-1} - \frac{T}{2}\operatorname{tr} AMA'\,d\Sigma^{-1} - T\operatorname{tr} MA'\Sigma^{-1}dA$$

$$= T\operatorname{tr}\left\{\begin{pmatrix} B^{-1} \\ 0 \end{pmatrix} - MA'\Sigma^{-1}\right\}dA + \frac{T}{2}\operatorname{tr}(\Sigma - AMA')\,d\Sigma^{-1},$$

From which follow the likelihood equations:

(1) $\begin{pmatrix} B^{-1} \\ 0 \end{pmatrix} = MA'\Sigma^{-1}$, and

(2) $\Sigma = AMA'$.

After substitution we get:

(3) $\begin{pmatrix} B^{-1} \\ 0 \end{pmatrix} = MA'(AMA')^{-1}$.

This equation is nonlinear in the unknown parameters. For dealing with the nonlinearity several procedures have been devised.¹ They involve ∇²_{vec A}λ*, where λ* is the concentrated likelihood function:

$$\lambda^* = \text{constant} + T\log|B| - \frac{T}{2}\log|AMA'|.$$

¹ See Fisk (pp. 6 et seq., 22 et seq.).


The Newton-Raphson method is one of these methods. (If all equations are just-identified, solving equation (3) is straightforward.)

We shall now derive ∇²_{vec A}λ*. Obviously

$$\frac{\partial\lambda^*}{\partial A} = T[(B')^{-1} : 0] - T(AMA')^{-1}AM.$$

(By (4.5.3).) Then

$$d\,\frac{\partial\lambda^*}{\partial A} = -T[(B')^{-1}:0](dA)'[(B')^{-1}:0] + T(AMA')^{-1}(dA)MA'(AMA')^{-1}AM + T(AMA')^{-1}AM(dA)'(AMA')^{-1}AM - T(AMA')^{-1}(dA)M,$$

from which follows (by virtue of (4.6)):

$$\nabla^2_{\operatorname{vec} A}\lambda^* = -T\begin{bmatrix} [(B')^{-1}:0] \otimes (B^{-1})_{1.} \\ \vdots \\ [(B')^{-1}:0] \otimes (B^{-1})_{p.} \\ 0 \end{bmatrix} + T\,MA'(AMA')^{-1}AM \otimes (AMA')^{-1} + T\begin{bmatrix} (AMA')^{-1}AM \otimes \{MA'(AMA')^{-1}\}_{1.} \\ \vdots \\ (AMA')^{-1}AM \otimes \{MA'(AMA')^{-1}\}_{q.} \end{bmatrix} - T\,M \otimes (AMA')^{-1}.$$

(We put q = N + Λ, p = N.)
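As a sanity check on the gradient ∂λ*/∂A of the concentrated likelihood (the Hessian could be checked the same way), the following sketch compares the analytic expression with forward differences. It is not part of the paper; B, Γ and the moment matrix M are arbitrary stand-ins (M is built as a random symmetric positive definite matrix), and the dimension names N, Λ follow the text.

```python
import numpy as np

rng = np.random.default_rng(6)
N, Lam = 3, 2                                        # N equations, Λ = Lam further variables (illustrative sizes)
q = N + Lam
T = 50.0
B = rng.standard_normal((N, N)) + 3.0 * np.eye(N)    # keep B comfortably nonsingular
G = rng.standard_normal((N, Lam))                    # Γ
A = np.hstack([B, G])                                # A = (B : Γ)
W = rng.standard_normal((q, 200))
M = W @ W.T / 200.0                                  # stand-in symmetric positive definite moment matrix

def lam_star(Am):
    """Concentrated log likelihood T log|B| - (T/2) log|A M A'| (constant dropped)."""
    Bm = Am[:, :N]
    return T * np.log(abs(np.linalg.det(Bm))) - 0.5 * T * np.log(np.linalg.det(Am @ M @ Am.T))

# analytic gradient: ∂λ*/∂A = T[(B')^{-1} : 0] − T(AMA')^{-1}AM
grad = (T * np.hstack([np.linalg.inv(B.T), np.zeros((N, Lam))])
        - T * np.linalg.inv(A @ M @ A.T) @ A @ M)

eps = 1e-6
num = np.zeros((N, q))
base = lam_star(A)
for i in range(N):
    for j in range(q):
        Ap = A.copy(); Ap[i, j] += eps
        num[i, j] = (lam_star(Ap) - base) / eps
assert np.allclose(num, grad, atol=1e-3)
```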

REFERENCES

[1] Bellman, R. Introduction to Matrix Analysis. New York: McGraw-Hill Book Company, 1960.
[2] Bodewig, E. Matrix Calculus. Amsterdam: North Holland Publishing Company, 1956.
[3] Deemer, W. L. and Olkin, I., "Jacobians of matrix transformations useful in multivariate analysis," Biometrika, 38 (1951), 345-67.
[4] Dwyer, P. S., "Some applications of matrix derivatives in multivariate analysis," Journal of the American Statistical Association, 62 (1967), 607-25.
[5] Dwyer, P. S. and MacPhail, M. S., "Symbolic matrix derivatives," Annals of Mathematical Statistics, 19 (1948), 517-34.
[6] Fisk, P. R. Stochastically Dependent Equations. London: Charles Griffin and Company, 1967.
[7] Jack, H., "Jacobians of transformations involving orthogonal matrices," Proceedings of the Royal Society of Edinburgh, Section A, LXVII (1964-5), 81-103.
[8] Olkin, I., "Note on 'Jacobians of matrix transformations useful in multivariate analysis'," Biometrika, 40 (1953), 43-46.
[9] Rothenberg, T. J. and Leenders, C. T., "Efficient estimation of simultaneous equation systems," Econometrica, 32 (1964), 57-76.
[10] Roy, S. N. Some Aspects of Multivariate Analysis. New York: John Wiley and Sons, 1957.
[11] Turnbull, H. W., "On differentiating a matrix," Proceedings of the Edinburgh Mathematical Society, Second Series, I (1927-29), 111-28.
[12] Turnbull, H. W., "A matrix form of Taylor's theorem," Proceedings of the Edinburgh Mathematical Society, Second Series, II (1930-31), 33-54.
[13] Turnbull, H. W., "Matrix differentiation of the characteristic function," Proceedings of the Edinburgh Mathematical Society, Second Series, II (1930-31), 256-64.


POSTSCRIPT

When reading the proofs, I noticed a paper by Conlisk. This provides some interesting applications, from which I take the following: To find

$$\frac{\partial \operatorname{vec} S}{\partial \operatorname{vec} A}$$

for S = ASA' + V, where S' = S is an m × m matrix. Differentiation leads to:

dS = (dA)SA' + A(dS)A' + AS(dA)' + dV

or

dT = (dA)SA' + AS(dA)' + dV,

where

dT = dS - A(dS)A'.

Using (4.6) and (4.7.1) we get:

$$\frac{\partial \operatorname{vec} S}{\partial \operatorname{vec} A} = \frac{\partial \operatorname{vec} T}{\partial \operatorname{vec} A}\,\frac{\partial \operatorname{vec} S}{\partial \operatorname{vec} T} = [SA' \otimes I + (I \otimes SA')_{+}](I \otimes I - A' \otimes A')^{-1},$$

where $(I \otimes SA')_{+}$ is shorthand for

$$\begin{bmatrix} I \otimes (SA')_{1.} \\ \vdots \\ I \otimes (SA')_{m.} \end{bmatrix}.$$
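Conlisk's derivative can be checked numerically by solving S = ASA' + V (scipy.linalg.solve_discrete_lyapunov is assumed available for this) and differentiating the solution by forward differences; the transpose of the ordinary numerical Jacobian should then match the expression above. A sketch, not part of the paper:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov   # solves S = A S A' + V

rng = np.random.default_rng(7)
m = 3
A = rng.standard_normal((m, m))
A = 0.5 * A / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius 0.5, so a unique S exists
Vh = rng.standard_normal((m, m)); V = Vh @ Vh.T     # symmetric V
vec = lambda M: M.flatten(order="F")

S_of = lambda Am: solve_discrete_lyapunov(Am, V)
S = S_of(A)

# the derivative quoted above: [SA' ⊗ I + (I ⊗ SA')_+](I ⊗ I − A' ⊗ A')^{-1}
SA = S @ A.T
plus = np.vstack([np.kron(np.eye(m), SA[k:k + 1, :]) for k in range(m)])   # (I ⊗ SA')_+
analytic = (np.kron(SA, np.eye(m)) + plus) @ np.linalg.inv(np.eye(m * m) - np.kron(A.T, A.T))

# ordinary numerical Jacobian of vec S with respect to vec A; its transpose is ∂ vec S/∂ vec A
eps = 1e-6
J = np.zeros((m * m, m * m))
for idx in range(m * m):
    dA = np.zeros(m * m); dA[idx] = eps
    J[:, idx] = (vec(S_of(A + dA.reshape(m, m, order="F"))) - vec(S)) / eps
assert np.allclose(J.T, analytic, atol=1e-3)
```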

REFERENCE

[1] Conlisk, J., "The equilibrium covariance matrix of dynamic econometric models," Journal of the American Statistical Association, 64 (1969), 277-279.
