Smooth Monotone Covariance for Elliptical Distributions and Applications in Finance

This article was downloaded by: [Xiaoping Zhou]
On: 08 May 2014, At: 14:31
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Quantitative Finance
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/rquf20

To cite this article: Xiaoping Zhou, Dmitry Malioutov, Frank J. Fabozzi & Svetlozar T. Rachev (2014): Smooth monotone covariance for elliptical distributions and applications in finance, Quantitative Finance, DOI: 10.1080/14697688.2014.911949

To link to this article: http://dx.doi.org/10.1080/14697688.2014.911949

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Quantitative Finance, 2014
http://dx.doi.org/10.1080/14697688.2014.911949

Smooth monotone covariance for elliptical distributions and applications in finance

XIAOPING ZHOU∗†, DMITRY MALIOUTOV‡, FRANK J. FABOZZI§ and SVETLOZAR T. RACHEV†

†Stony Brook University, Stony Brook, NY, USA
‡IBM Research, Yorktown Heights, NY, USA

§EDHEC Business School, Nice, France

(Received 22 October 2013; accepted 6 March 2014)

Sample covariance is known to be a poor estimate when the data are scarce compared with the dimension. To reduce the estimation error, various structures are usually imposed on the covariance such as low-rank plus diagonal (factor models), banded models and sparse inverse covariances. We investigate a different non-parametric regularization method which assumes that the covariance is monotone and smooth. We study the smooth monotone covariance by analysing its performance in reducing various statistical distances and improving optimal portfolio selection. We also extend its use in non-Gaussian cases by incorporating various robust covariance estimates for elliptical distributions. Finally, we provide two empirical examples using Eurodollar futures and corporate bonds where the smooth monotone covariance improves the out-of-sample covariance prediction and portfolio optimization.

Keywords: Smooth monotone covariance; Regularization; Elliptical distributions

JEL Classification: C10, C16, C19

1. Introduction

Covariance estimation is an important problem in risk management, portfolio selection and asset pricing. Sample covariance has been widely known as a poor estimate when the sample size n is close to or less than the dimension p. This problem is frequently faced by investors because the investment universe is very large relative to the historical return data available to estimate the covariance. Therefore, in the asset management community, the covariance estimation problem for 'large p, small n' situations has attracted considerable attention.

The classical results in random matrix theory have shown that the eigenvalue spectrum is biased with p/n fixed as n → ∞, asymptotically following the Marcenko–Pastur law (Marcenko and Pastur 1967). The theory states that the eigenvalues of the sample covariance are more dispersed than those of the true covariance. In the portfolio selection problem, researchers have dealt with this by adjusting the eigenvalues using different methods such as truncating the smallest eigenvalues (Elton and Gruber 1973), adjusting the principal components in some interval (Laloux et al. 1999, Plerou et al. 1999, Kwapien and Drozdz 2006, Conlon et al. 2007) or

∗Corresponding author. Email: [email protected]

imposing constraints on the condition number of the covariance matrix such that it is well-conditioned (Won et al. 2009). Park and O'Leary (2010) apply Tikhonov regularization to filter the eigenvalues and place various estimators for adjusting the eigenvalues in a common framework which differs only in the choice of the filtering function. Shrinkage estimates are another way to adjust the eigenvalues, such as weighting the sample covariance matrix with a structured covariance matrix (Ledoit and Wolf 2003, 2004) such as the identity matrix or constant correlation model, or shrinking toward a truncated covariance matrix (Bengtsson and Holst 2002). Because these shrinkage estimates adjust the eigenvalues at a linear rate, Karoui (2008) and Ledoit and Wolf (2012) propose non-linear shrinkage estimators by estimating the spectral distribution from the sample spectral distribution, which is an inverse problem to the Marcenko–Pastur law.

However, estimators that only adjust the eigenvalues still fail to provide consistent covariance estimates because the eigenvectors are also inconsistent estimates in 'large p, large n' asymptotics (Silverstein 1995). To reduce the large number of parameters relative to the sample size, there are many solutions proposed by imposing a sparse or a low-dimensional structure on the covariance such as factor models (Rubin and Thayer

© 2014 Taylor & Francis



1982, Fan et al. 2008), sparse inverse covariance (Yuan and Lin 2007, d'Aspremont et al. 2008, Friedman et al. 2008, Rothman et al. 2008) and thresholding methods (Bickel and Levina 2008a).

Another class of regularization methods relies on a natural ordering among the random variables. This is common in finance, examples being autocorrelated time series, bonds with different maturities, bonds with different ratings, and options with different strikes or maturities. Furrer and Bengtsson (2007) consider 'tapering' the sample covariance matrix by gradually shrinking off-diagonal elements toward zero. Wu and Pourahmadi (2003) and Huang et al. (2006) use the Cholesky decomposition of the covariance matrix to 'band' the inverse covariance matrix, which assumes conditional independence between faraway components. Bickel and Levina (2008b) consider banding the sample covariance matrix or estimating a banded version of the inverse population covariance matrix. The models for banding the inverse covariance matrix are also known in the field of statistics as covariance selection and in the field of machine learning as Gaussian graphical models or Gaussian Markov Random Fields (MRFs). Parametric models assume a functional form for the covariance such as an exponential or power-law decay; Rasmussen and Williams (2006) provide a general framework for such models. However, despite the successful applications of these banded or parametric models, in general they still impose very strict structures on the covariance. Malioutov (2011) proposes a non-parametric regularization method by imposing monotone constraints and a smoothness penalty when estimating the covariance matrix, which he calls a smooth monotone covariance. He shows that this method outperforms the sample covariance and principal component analysis in terms of Frobenius norm error for term-rate covariances, using an interpolated smooth and monotone variant of the sample covariance as the true covariance.

In this paper, we first discuss the smooth monotone covariance using various covariance distance measures under Gaussian assumptions, as is usual in the vast literature on covariance estimation. In that literature, the Frobenius norm error (or mean-squared error (MSE)) has mainly been used both as the objective function in covariance estimation and for evaluating the performance against the true covariance. However, Fan et al. (2008) showed that the Frobenius norm error may not be a good measure to tell the difference between two covariance estimates. They also used a scaled quadratic loss and the Frobenius norm error of the inverse covariance to measure distances. We will study various loss functions, including the Frobenius norm error, quadratic loss, entropy loss (or, Stein's loss), Kullback–Leibler divergence and Hellinger distance, for evaluating the performance of our smooth monotone covariance estimates. We will also explore whether improved estimates can be obtained by replacing the Frobenius norm error in the objective function with a more statistically meaningful distance measure.

In finance, non-Gaussian distributions are more interesting because of the heavy-tailedness of financial returns. We will review the usual solutions for estimating the covariance matrix under elliptical distribution assumptions; the elliptical family is broad and includes particular distributions such as the Student t distribution and the normal inverse Gaussian distribution.

The typical estimation methods include the sample covariance, the Kendall's tau transformation method, maximum likelihood estimation and Tyler's method (Tyler 1987). For heavy-tailed elliptical distributions, the sample covariance and linear correlation become very sensitive to the presence of outliers, and rank correlations such as Kendall's tau are more robust estimates. Therefore, an intuitive way is to decompose the covariance matrix into variance components and a correlation matrix, and estimate them separately, where the latter can be obtained by transforming from rank correlations. Lindskog et al. (2003) show that the well-known relation between the linear correlation and Kendall's tau under Gaussian assumptions holds for all elliptical distributions as well. However, the problem of using the Kendall's tau transformation to form a correlation matrix is that the resulting matrix might not be positive definite. A more classical way of estimating the covariance matrix is to use the maximum likelihood estimation (MLE) method. Unlike Gaussian distributions, for elliptical distributions the ML estimates are usually different from the sample covariance. For a broad class of elliptical distributions (the normal variance mixture distributions), the MLE can be obtained readily using an expectation-maximization (EM) algorithm. However, the MLE method requires an assumption about the type of distribution and the existence of its probability density function. An easier but also robust way to estimate the covariance is Tyler's method (Tyler 1987, Chen et al. 2011), which works on the normalized samples. We will investigate these four types of covariance estimates using simulation studies, and incorporate them into the smooth monotone covariance estimation by using them as initial estimates in the optimization.

The paper is structured as follows. Section 2 reviews a variety of covariance distances, and investigates the performance of the smooth monotone covariance for the Gaussian distribution under various distances using simulation studies. In addition, the improvement of the smooth monotone covariance in portfolio risk measurement and the generalization of the smooth monotone covariance by minimizing alternative distances are studied. In section 3, we review and compare various covariance estimates for elliptical distributions and study the performance of their corresponding smooth monotone covariance estimates. In section 4, we present two empirical application examples (one in Eurodollar futures portfolio selection and one in correlation forecasting for corporate bonds) to demonstrate the performance of the smooth monotone covariance over the sample covariance.

2. Smooth monotone covariance

Malioutov (2011) introduced a non-parametric approach to regularize the covariance by imposing a monotone and smooth structure. The motivation for this approach is to model the covariance structure among the interest rate changes along an interest rate term structure curve, whose correlations exhibit a monotone and smooth effect. However, this approach has a variety of potential applications in empirical finance, such as time series autocorrelation and joint movements of bond futures prices and credit default rates.



Compared to the banded or tapered covariances, the smooth monotone covariance imposes a less strict structure. Under the assumption that the random variables are indexed by some natural ordering, one can assume that the covariances (or correlations) between nearby random variables are higher than those between distant random variables. Furthermore, a smoothness structure is assumed to avoid the stairs-effect in the covariance, which is rarely reasonable in practice.

The mathematical formulation of smooth monotone covariance is as follows. Without loss of generality, suppose we have a p-dimensional zero-mean random vector x(t) = (x_1(t), ..., x_p(t))^T. Here we ignore the temporal dependence but assume the samples are independent and identically distributed, and we consider the cross-sectional covariance matrix P∗ = E[xx^T]. Assuming that the available length of sample data is n, and that the sample data are X_{p×n} = (x(t_1), ..., x(t_n)), then the sample covariance matrix is:

Q = (1/n) X X^T

However, when n is comparable to or even smaller than p, this estimate is highly inaccurate. We want to find an estimate P with minimal distance under a prespecified error metric D(P, Q) under monotonicity constraints on the covariance such that P ∈ M, where

M = {P | P ⪰ 0, P_ij ≥ P_ik for i < j < k}.

In Malioutov (2011), the Frobenius norm error ‖P − Q‖²_F was used for the distance measure. To avoid the 'stairs-effect' of P, a smoothness penalty on the curvature is added to the objective function, and a discrete version of the Laplacian operator is used:

L(P) = Σ_v ∇²_v(P),

where

∇²_v(f) = Σ_{u ∈ N(v)} ( f(x_u) − f(x_v) ).

Here, N(v) is the set of neighbours of point v. For covariance P_ij the neighbours of vertex v = (i, j) can be set to (i ± 1, j) and (i, j ± 1). By penalizing the square of the Laplacian operator, we formulate the optimization problem as follows:

min_P  D(P, Q) + λ Σ_v (∇²_v(P))²,  such that P ∈ M,

where λ is a parameter adjusting the smoothness. For simplicity, in the simulation experiments in this paper, we set λ equal to 1, which implies equal weight between accuracy and smoothness. In practice, the parameter is often chosen by cross-validation techniques.

When the distance D(P, Q) is convex with respect to P, the problem can be represented as a semidefinite optimization problem:

min_P  D(P, Q) + λ ‖D2 vec(P)‖²_F,  such that P ⪰ 0, D vec(P) ≥ 0,

where the operation vec(P) denotes stacking the columns of P into a vector, and the matrices D2 and D compute the differences of the relevant entries of P encoding the smoothness and monotonicity constraints, respectively.

For example, for a four-dimensional covariance matrix

P = [ P11 P12 P13 P14
      P21 P22 P23 P24
      P31 P32 P33 P34
      P41 P42 P43 P44 ],

the monotone constraint and smoothness penalty can be represented as follows:

D vec(P) = vec( [ P11 − P21   P22 − P12   P23 − P13   P24 − P14
                  P21 − P31   P22 − P32   P33 − P23   P34 − P24
                  P31 − P41   P32 − P42   P33 − P43   P44 − P34 ] ) ≥ 0

D2 vec(P) = vec( [ P11 − 2P21 + P31   0                  P13 − 2P23 + P33   P14 − 2P24 + P34
                   P21 − 2P31 + P41   P22 − 2P32 + P42   0                  P24 − 2P34 + P44 ] )

where the zero elements in D2 vec(P) mean not imposing smoothness across the diagonal elements.
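To make the monotone constraint concrete, here is a pure-Python sketch (an illustrative helper of ours, not code from the paper) that generates the column-wise, toward-the-diagonal first differences behind D vec(P) ≥ 0 and checks them on an AR(1) correlation matrix, which is smooth and monotone:

```python
# Sketch of the monotonicity differences D vec(P) >= 0 for p = 4.
p = 4

def monotone_differences(P):
    """Differences of adjacent entries in each column, oriented toward
    the diagonal, matching the D vec(P) example in the text."""
    diffs = []
    for j in range(p):
        for i in range(p - 1):
            if i >= j:   # on/below the diagonal: require P[i][j] >= P[i+1][j]
                diffs.append(P[i][j] - P[i + 1][j])
            else:        # above the diagonal: require P[i+1][j] >= P[i][j]
                diffs.append(P[i + 1][j] - P[i][j])
    return diffs

# An AR(1) correlation matrix is smooth and monotone, so every
# difference should be non-negative.
rho = 0.9
P = [[rho ** abs(i - j) for j in range(p)] for i in range(p)]
print(all(d >= 0 for d in monotone_differences(P)))  # True
```

Stacking these p(p − 1) linear differences row by row gives exactly the matrix D acting on vec(P).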

The resulting problem for small p can be readily solved via an interior point method using one of the convex optimization toolboxes for MATLAB such as cvx (Grant and Boyd 2012). For large p, Malioutov et al. (2012) studied fast first-order methods for smooth monotone covariances.

Note that it is straightforward to add additional constraints such as positivity of correlations or Pii = 1 for correlation matrices. Without loss of generality, in this paper we focus on the correlation matrix

R = D^{−1} Σ D^{−1},

where D = diag{σ_i} and the σ_i² are the variances (the diagonal elements of the covariance matrix Σ). In practice, the covariance can be decomposed into a correlation matrix and the diagonal elements. This relieves the difficulty of estimating the covariance matrix when it is highly ill-conditioned because of heteroskedasticity on the diagonals. In addition, to better forecast time-varying variances, GARCH models are often used to model the volatilities. In this paper, we do not step into those issues but focus only on the correlation matrix.
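As a minimal numerical sketch of this decomposition (the matrix values here are hypothetical, for illustration only):

```python
import math

# Extract volatilities and the correlation matrix from a covariance
# matrix: R = D^{-1} Sigma D^{-1} with D = diag{sigma_i}.
def cov_to_corr(sigma):
    p = len(sigma)
    s = [math.sqrt(sigma[i][i]) for i in range(p)]  # volatilities sigma_i
    return [[sigma[i][j] / (s[i] * s[j]) for j in range(p)]
            for i in range(p)]

Sigma = [[4.0, 1.2], [1.2, 9.0]]   # hypothetical 2x2 covariance
R = cov_to_corr(Sigma)
print(R[0][0], R[1][1])   # 1.0 1.0  (unit diagonal)
print(round(R[0][1], 4))  # 0.2
```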

3. Covariance estimation for Gaussian distributions

3.1. Distance measures

In the objective function of the smooth monotone covariance, the distance between the covariance estimate and the sample covariance (or any other initial covariance) is needed. When evaluating the performance of the covariance estimate, its distance to the true covariance is also needed. In the following, we denote the estimated covariance by P, the sample covariance by Q and the true covariance by P∗. Note that the sample covariance Q may not be full rank (e.g. p > n), so some



well-known distances such as the Kullback–Leibler divergence from an estimate to the sample covariance might not be well-defined. In addition, we would like D(P, Q) to be a convex function with respect to P so that we can formulate the problem as a convex optimization problem, which is much easier to solve. So special attention is needed when selecting a good measure for D(P, Q). The simplest and easiest distance is the Frobenius norm error (or, MSE):

||P − Q||²_F = tr((P − Q)^T (P − Q))

This measure has mainly been used as an objective function for covariance estimation and also for evaluating the performance of a covariance estimate in the literature (see Ledoit and Wolf 2003, 2004, 2012, Chen et al. 2009, Chen et al. 2011). Despite the intuition of minimizing mean-squared error, to the best of our knowledge, there is no direct connection to classical likelihood-based statistical inference, and no justification in the literature for preferring this error metric over alternative ones in applications such as portfolio risk measurement.

In portfolio selection, the variance of the portfolio returns is frequently used as a proxy for portfolio risk. For elliptical distributions, the value-at-risk (VaR) can also be represented as a function of the variance. Therefore, the error of the portfolio variance estimate is more interesting than the Frobenius norm error of the covariance estimate. Fan et al. (2008) reported that their proposed covariance estimate performs almost as well as the sample covariance when computing the portfolio risk for an equal-weighted portfolio, but greatly outperforms the sample covariance for the risk of a Markowitz optimal weighted portfolio.

Under Gaussian assumptions, the maximum absolute error of the variance between two portfolios with the two covariances P and Q (normalized by ||w||² = w^T w) is:

max_w | (w^T P w − w^T Q w) / (w^T w) |
  = max_w max( w^T (P − Q) w / (w^T w), w^T (Q − P) w / (w^T w) )
  = max( λ_max(P − Q), λ_max(Q − P) )
  = max( λ_max(P − Q), −λ_min(P − Q) )
  = max_i |λ_i(P − Q)| = σ_1(P − Q)
  = ||P − Q||_2,

where ||·||_2 is the matrix 2-norm. The maximum is reached at w = v_1(P − Q) or w = v_p(P − Q). Note the connection between the squared eigenvalues of (P − Q) and the squared Frobenius norm of (P − Q):

‖P − Q‖²_F = tr((P − Q)²) = Σ_i λ_i((P − Q)²) = Σ_i λ_i²(P − Q)

Therefore, minimizing the Frobenius norm of P − Q constrains the squared eigenvalues of (P − Q), thereby constraining the maximal absolute value of the eigenvalues of (P − Q). Assuming P is the estimated covariance and Q is the true covariance, it then constrains the error between the estimated portfolio risk and the true portfolio risk. Therefore, the Frobenius norm error is a reasonable measure for covariance estimation when one needs to estimate the portfolio risk.
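This eigenvalue identity is easy to verify numerically; the following pure-Python sketch uses a hand-computable symmetric 2 × 2 difference matrix (the values are hypothetical) and the closed-form 2 × 2 eigenvalues:

```python
import math

# Check ||P - Q||_F^2 = sum_i lambda_i(P - Q)^2 for a symmetric 2x2
# difference A = P - Q.
A = [[0.3, -0.1], [-0.1, 0.2]]
frob2 = sum(A[i][j] ** 2 for i in range(2) for j in range(2))

# Closed-form eigenvalues of a symmetric 2x2 matrix.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

print(abs(frob2 - (lam1 ** 2 + lam2 ** 2)) < 1e-12)  # True
```

Here frob2 = 0.09 + 0.01 + 0.01 + 0.04 = 0.15, and λ₁² + λ₂² = tr² − 2·det = 0.25 − 0.10 = 0.15, as the identity predicts.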

Apart from the Frobenius norm error, there are also several other measures related to statistical inference. To simplify the discussion, we first focus on Gaussian distributions. Without loss of generality, we assume that the mean is known to be zero. Suppose x is a p-dimensional zero-mean random vector from N(0, P):

f(x; P) = (2π)^{−p/2} |P|^{−1/2} exp{ −(1/2) x^T P^{−1} x }

If X = (x_1, ..., x_n) are n identical and independently distributed (i.i.d.) samples from this distribution, the log-likelihood is (omitting the constants):

L(X; P) = −(n/2) log|P| − (1/2) Σ_{i=1}^{n} x_i^T P^{−1} x_i
        = −(n/2) log|P| − (n/2) tr(P^{−1} Q),

where Q = (1/n) X X^T is the sample covariance matrix.

The maximum likelihood estimate can also be understood as minimizing the Kullback–Leibler divergence, which is a non-symmetric information-theoretic measure of the difference between two probability distributions. Assuming the density functions of two distributions F and G are f(x) and g(x), the Kullback–Leibler divergence from F to G is defined as:

D_KL(F||G) = ∫ f(x) log( f(x)/g(x) ) dx

Let N(0, Q) be the empirical distribution and N(0, P) be the estimated distribution; to simplify the notation, we write the Kullback–Leibler divergence between two zero-mean Gaussian distributions in terms of their covariances. Then, the Kullback–Leibler divergence from P to Q is:

D_KL(P||Q) = (1/2) ( tr(P Q^{−1}) − log|P Q^{−1}| − p ),

which is also called Stein's loss (Stein 1956, Yang and Berger 1994) or entropy loss (omitting the constant 1/2), and which is convex with respect to P.

However, the maximum likelihood estimate of P is obtained by minimizing the Kullback–Leibler (KL) divergence from Q to P:

D_KL(Q||P) = (1/2) ( tr(P^{−1} Q) − log|P^{−1} Q| − p ),

which is convex with respect to P^{−1} but not with respect to P, and is therefore not appropriate for our smooth monotone regularization because the objective function is not convex.
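As a sanity check of the signs and of the behaviour at P = Q, here is a self-contained 2 × 2 sketch of Stein's loss (the matrix values and helper names are ours, for illustration):

```python
import math

# Stein's loss / KL divergence from P to Q for zero-mean Gaussians,
# D_KL(P||Q) = (tr(P Q^{-1}) - log|P Q^{-1}| - p) / 2, with p = 2.
def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def kl_gauss(P, Q):
    M = matmul2(P, inv2(Q))  # M = P Q^{-1}
    return 0.5 * (M[0][0] + M[1][1] - math.log(det2(M)) - 2)

P = [[1.0, 0.5], [0.5, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
print(abs(kl_gauss(P, P)) < 1e-9)  # True: the divergence vanishes at P = Q
print(kl_gauss(P, I) > 0)          # True: positive otherwise
```

Note the asymmetry: kl_gauss(P, I) and kl_gauss(I, P) are both positive but generally differ, which is exactly why the text distinguishes D_KL(P||Q) from D_KL(Q||P).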

Another loss function that has appeared in the literature (Haff 1980, Yang and Berger 1994, Fan et al. 2008) is the so-called quadratic loss (QL):

D_QL(P||Q) = tr( (P Q^{−1} − I)² )

By some matrix calculation, we can show that the QL is proportional to a second-order approximation of the KL in a neighbourhood of the true covariance (see Haff 1980, Appendix A):

D_KL(P||Q) ≈ c · D_QL(P||Q)

The above distances are in fact not metrics, which need to be symmetric. Therefore, we also consider another important statistical distance: the Hellinger distance. The Hellinger distance is



used to quantify the similarity between two probability distributions. The Hellinger distance between two probability distributions F and G with densities f(x) and g(x) is defined as

H²(F, G) = (1/2) ∫ ( f(x)^{1/2} − g(x)^{1/2} )² dx = 1 − ∫ f(x)^{1/2} g(x)^{1/2} dx.

The Hellinger distance has several nice statistical properties. First, it is a true metric satisfying symmetry and the triangle inequality. Second, its value is bounded between 0 and 1. This provides a more intuitive meaning of how close two distributions are when one looks only at the value of the distance. The closer the Hellinger distance is to one, the less likely it is that two samples come from the same distribution. In contrast, the Kullback–Leibler divergence is not symmetric, and has only relative meaning when comparing two estimates.

By some calculations (Appendix B), the Hellinger distance between two Gaussian distributions with zero mean and covariances P and Q is (similarly, we use the covariances P and Q to denote the distributions):

H²(P, Q) = 1 − |P|^{1/4} |Q|^{1/4} | (P + Q)/2 |^{−1/2}
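For illustration, a pure-Python 2 × 2 sketch of this closed form (hypothetical values), checking that it vanishes at P = Q, is symmetric and stays within [0, 1]:

```python
import math

# Squared Hellinger distance between zero-mean Gaussians with
# covariances P and Q: H^2 = 1 - |P|^{1/4} |Q|^{1/4} |(P + Q)/2|^{-1/2}.
def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def hellinger2(P, Q):
    M = [[(P[i][j] + Q[i][j]) / 2 for j in range(2)] for i in range(2)]
    return 1.0 - (det2(P) * det2(Q)) ** 0.25 / math.sqrt(det2(M))

P = [[1.0, 0.9], [0.9, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
print(abs(hellinger2(P, P)) < 1e-9)  # True: zero at P = Q
print(0.0 < hellinger2(P, Q) < 1.0)  # True: bounded in (0, 1)
```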

In most of our experiments later in this paper, we will use the Frobenius norm error (MSE), the Kullback–Leibler divergence from P to Q (KL), the quadratic loss (QL) and the Hellinger distance (Hel) to evaluate the performance of the smooth monotone covariance over the sample covariance against the true covariance.

3.2. Simulation study

3.2.1. Comparing sample and smooth monotone covariance using various distances. In this subsection, we compare the estimation errors of the sample covariance and the smooth monotone covariance against the true covariance using various distances, including the MSE, QL, KL and Hellinger distance. Without loss of generality, for all the simulations we focus on correlation estimation, for which the diagonal elements are normalized to one. For our simulated data experiments, we focus on the autocovariance matrix of an autoregressive process of order 1 (an AR(1) process):

P∗(i, j) = ρ^{|i − j|}.

As this covariance satisfies the smooth and monotone assumption and is frequently used as a simple example in the literature (e.g. Bickel and Levina 2008b, Chen et al. 2011), we use it as our example throughout the simulation studies in this paper to demonstrate the benefit of the smooth monotone covariance. However, we do not aim to use this estimate for estimating the covariance of an AR(1) process, because with a known parametric structure of the covariance one can use other methods to estimate ρ more accurately. In general, the true structure of a covariance is often unknown and it might be too strict to impose an assumed parametric structure on the covariance.

In the following experiment, we set the dimension p = 30, and let the true covariance be the autocovariance matrix of an AR(1) process with autocorrelation ρ = 0.9, which is a Toeplitz matrix with first column (1, ρ, ρ², ..., ρ^{p−1})^T. We conduct M = 200 trials at each sample size n varying from 40 to 300. Using the sample correlation as an input for the smooth and monotone estimator, we obtain the smooth monotone correlation. Then, we compute the average distance between the estimated covariance P and the true covariance P∗ using the four types of distances defined earlier: MSE, QL, KL and Hellinger distance.
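The sampling step of this experiment can be sketched in pure Python (reduced to p = 4 with a fixed seed of our choosing for illustration; the paper's experiments use p = 30 and M = 200 trials):

```python
import math
import random

# Draw n zero-mean Gaussian samples with AR(1) correlation P* via a
# Cholesky factor: x = L z with z ~ N(0, I) has covariance L L^T = P*.
random.seed(0)
p, n, rho = 4, 5000, 0.9
P = [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

# Cholesky factorization P = L L^T (Cholesky-Banachiewicz scheme).
L = [[0.0] * p for _ in range(p)]
for i in range(p):
    for j in range(i + 1):
        s = sum(L[i][k] * L[j][k] for k in range(j))
        L[i][j] = math.sqrt(P[i][i] - s) if i == j else (P[i][j] - s) / L[j][j]

X = [[sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
     for z in ([random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n))]

# The sample correlation of the first two coordinates should be near rho.
num = sum(x[0] * x[1] for x in X) / n
den = math.sqrt(sum(x[0] ** 2 for x in X) / n) \
    * math.sqrt(sum(x[1] ** 2 for x in X) / n)
print(abs(num / den - rho) < 0.05)  # True
```

The sample correlation matrix built this way is the input to the smooth monotone estimator in each trial.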

The results are shown in figure 1. The smooth monotone covariance uniformly outperforms the sample covariance for all four measures. This is important because the usual performance measure used in the literature for covariance comparison is mostly the MSE, while the other distances, which connect directly to basic statistical estimation principles, are not investigated. From this experiment, we see that the smooth monotone covariance not only reduces the MSE, but also reduces the other errors. In fact, the reduction in error is more significant under the other measures than under the MSE.

Fan et al. (2008) point out that the MSE may not be a good measure to evaluate the performance of their factor-model-based estimation of the covariance matrix. They show that under the QL and the MSE of the inverse covariance, the performance is much better than that of the sample covariance. Here we see similar results for our smooth monotone covariance: the error reductions are much more significant when using the QL, KL and Hellinger distances. However, we do not claim that the MSE is a poor measure for our smooth monotone covariance, but recommend using various distance measures to evaluate the covariance when the errors under the MSE are not significant. When ρ is smaller (e.g. ρ = 0.7), we find that our smooth monotone covariance greatly outperforms the sample covariance under the MSE as well.

3.2.2. Comparison of sample and smooth monotone covariance on portfolio risk measurement. Although there are numerous distance measures for comparing covariances, in practice what matters more is whether a covariance estimate leads to a more accurate estimate of portfolio risk, or to better performance in portfolio selection. The importance of covariance estimation also varies under different objectives. For an equally weighted or random portfolio, the sample covariance or a factor risk model often provides good estimates of portfolio risk (Saxena and Stubbs 2010). Fan et al. (2008) show that the factor model and the sample covariance have almost the same performance for portfolio risk when applied to an equally weighted portfolio, but the factor model has much less error for an optimally weighted portfolio. Pafka and Kondor (2003) also found that the effect of noise depends on whether the problem arises in a risk measurement or asset allocation context.

In the following, we conduct an experiment similar to that of Fan et al. (2008). We simulate M = 500 samples, with sample size varying from n = 40 to n = 200, from a p = 30 dimensional Gaussian distribution whose covariance is an AR(1) autocovariance matrix with ρ = 0.7. We note that for ρ = 0.7 the sample covariance performs even worse than with ρ = 0.9, and we choose 0.7 to show that even in such a difficult case, with equally weighted portfolios we do not suffer much when estimating portfolio variances. However, for optimized portfolios, the sample covariance performs poorly even for ρ = 0.9. We estimate the portfolio risk by the sample covariance and the smooth monotone covariance, and compare the estimated


[Figure: four panels (MSE, KL, QL, Hel) of estimation error versus sample size; legend: Samp, SM.]

Figure 1. Estimation errors of sample and smooth monotone covariance.

risk and the true risk, where we use the portfolio standard deviation as a proxy for portfolio risk.

Figure 2 presents the mean relative estimation error of portfolio risk as a function of sample size. In the left plot, we calculate the error for an equally weighted portfolio. It shows that the sample covariance and the smooth monotone covariance have roughly the same performance. Both estimate the true portfolio risk very well (the relative error is less than 8% for all sample sizes from 40 to 200). In the right plot, we calculate the error for an optimally weighted portfolio. The weights are determined by w = P^{-1}μ, where we take μ to be a vector with all elements equal to 0.05. We find that the smooth monotone covariance leads to much smaller error than the sample covariance.

In fact, it is known that the sample covariance underestimates the risk of optimal portfolios and is a poor input for Markowitz portfolio selection (see Muller 1993, Karoui 2008). Michaud (1989) argues that optimization in these cases is in fact an 'error-maximization' process that overestimates the expected returns and underestimates the variances. In the following experiment, we artificially create a portfolio consisting of p = 30 financial instruments with expected returns ranging from −0.05 to 0.05. The covariance is an AR(1) autocovariance with ρ = 0.7. We generate the returns from the joint Gaussian distribution with sample size n = 50, and calculate the mean-variance optimal portfolio with target portfolio return μ* ranging from 0 to 0.05.
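A compact NumPy sketch of this 'error-maximization' effect, under the stated settings but using, for simplicity, the unconstrained plug-in portfolio w ∝ S^{-1}μ rather than the constrained frontier (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed
p, n, rho = 30, 50, 0.7
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # true AR(1) covariance
mu = np.linspace(-0.05, 0.05, p)                     # expected returns

est_risk, true_risk = [], []
for _ in range(200):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)
    w = np.linalg.solve(S, mu)           # plug-in 'optimal' weights
    w /= np.abs(w).sum()                 # normalize the total absolute position
    est_risk.append(np.sqrt(w @ S @ w))        # risk the optimizer reports
    true_risk.append(np.sqrt(w @ Sigma @ w))   # risk actually taken

# On average the reported risk falls below the true risk: the sample
# covariance underestimates the risk of the optimized portfolio.
```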

The mean-variance optimal portfolio problem in the example is:

min_w  w^T Σ w
s.t.  w^T μ = μ*,  w^T 1 = 1,  w ≥ 0

For each simulated sample, we calculate the mean-variance efficient frontier using the sample covariance and the smooth monotone covariance, and repeat the experiment M = 200 times. We then compare the difference σ − σ* between the estimated optimal portfolio risk and the true portfolio risk for the two covariances. In figure 3, we plot the 10, 25, 50, 75 and 90% quantiles of the difference between the estimated portfolio risk and the true portfolio risk for 20 target portfolio return levels ranging from 0 to 0.05. As can be seen from figure 3, the sample covariance underestimates the portfolio risk, while the smooth monotone covariance has a much smaller error.

In the literature, there is also a variety of portfolio optimization methods using alternative risk measures or utilities, such as mean-VaR and mean-CVaR. VaR is defined as the possible loss of a trading position or portfolio at a given confidence level α, and has become a popular risk measure for setting bank capital requirements. CVaR (conditional value-at-risk), also called expected tail loss (ETL), average value-at-risk (AVaR) or expected shortfall (ES), is the expectation of losses given that the losses exceed the VaR: CVaR_α = E[L | L > VaR_α]. Kondor et al. (2007) studied the noise sensitivity of portfolio optimization under various risk measures and found similar, but even stronger, sensitivity for alternative measures such as CVaR than for variance. For elliptical distributions, since the smooth monotone covariance has smaller estimation error for the portfolio variance, it naturally leads to smaller estimation error for these alternative risk measures as well, because VaR and CVaR are in fact functions of the variance for elliptical distributions.
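For the Gaussian member of the elliptical family this dependence on the variance is explicit: for a zero-mean loss L ∼ N(0, σ²), VaR_α = σ Φ^{-1}(α) and CVaR_α = σ φ(Φ^{-1}(α))/(1 − α). A small stdlib-only sketch of our own:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def gaussian_var_cvar(sigma, alpha=0.99):
    """VaR and CVaR of a N(0, sigma^2) loss; both scale linearly in sigma,
    so a better variance estimate directly improves both risk measures."""
    z = NormalDist().inv_cdf(alpha)           # standard normal quantile
    phi = exp(-z * z / 2) / sqrt(2 * pi)      # standard normal density at z
    return sigma * z, sigma * phi / (1 - alpha)

var99, cvar99 = gaussian_var_cvar(0.02)       # e.g. portfolio sigma = 2%
```

Other elliptical families only change the quantile and density used; the linear scaling in σ is what carries the variance-estimation gains over to VaR and CVaR.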

3.3. Generalization of smooth monotone covariance using alternative distances

In this subsection, we consider generalizing the smooth monotone covariance by using various alternative distances as


[Figure: two panels, 'Equally-weighted portfolio' and 'Optimal portfolio', plotting the relative error of portfolio risk against sample size; legend: Samp, SM.]

Figure 2. Estimation error of sample and smooth monotone covariance for equally weighted portfolio risk and optimal portfolio risk.

[Figure: two panels, 'sample covariance' and 'SM covariance', plotting the error of portfolio risk against portfolio returns.]

Figure 3. Estimation errors of portfolio risk using sample and smooth monotone covariance in Markowitz portfolio selection.

the objective function, such as the Kullback–Leibler divergence. However, the Hellinger distance H(P, Q) and the Kullback–Leibler divergence from Q to P (i.e. D_KL(Q||P)) are not convex with respect to P. The quadratic loss D_QL(P||Q) and the Kullback–Leibler divergence from P to Q (i.e. D_KL(P||Q), or Stein's loss) are convex with respect to P, and can be used as alternative distances in our smooth monotone estimates.

In section 3.1, we showed that the matrix 2-norm ||P − Q||_2 is the maximum absolute error of the normalized portfolio variances. Minimizing the 2-norm of P − Q can also be transformed into a semidefinite programming problem as follows:

min_{P,s}  s
s.t.  −sI ⪯ P − Q ⪯ sI
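For a fixed P, the smallest feasible s in this constraint is exactly the largest absolute eigenvalue of P − Q, i.e. the matrix 2-norm; a quick NumPy check of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2                             # a symmetric stand-in for P - Q
s_min = np.abs(np.linalg.eigvalsh(M)).max()   # smallest s with -sI <= M <= sI
assert abs(s_min - np.linalg.norm(M, 2)) < 1e-10   # equals the matrix 2-norm
```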

When there are no monotone or smoothness constraints, we know that the estimates under all norms recover the sample covariance exactly. By imposing the monotone constraints and the smoothness penalty, we can generalize the smooth monotone covariance by minimizing the alternative distances. However, the smoothness penalty parameter may be difficult to choose when minimizing KL or QL. From optimization theory, instead of minimizing the joint objective, we can consider the constrained form, where we move the smoothness term from the objective function into a constraint. The smoothness parameter in the joint form then serves as a Lagrange multiplier for the smoothness constraint in the constrained form. To compare the performance of these different estimates, we formulate the smoothness penalty as a constraint in the problems that follow.

Let Q be the sample covariance and P the covariance estimate; the optimization problems for the smooth monotone correlation under the Frobenius norm error, the 2-norm error, KL and QL are, respectively, formulated as follows:

• Minimizing the Frobenius norm:

min_P  ||P − Q||_F^2
s.t.  P ⪰ 0,  diag(P) = 1,  P ∈ M,  Σ(∇²P)² ≤ α


• Minimizing the 2-norm:

min_{P,s}  s
s.t.  −sI ⪯ P − Q ⪯ sI,
      P ⪰ 0,  diag(P) = 1,  P ∈ M,  Σ(∇²P)² ≤ α

• Minimizing KL:

min_P  D_KL(P||Q) = tr(PQ^{-1}) − log|PQ^{-1}| − p
s.t.  P ⪰ 0,  diag(P) = 1,  P ∈ M,  Σ(∇²P)² ≤ α

• Minimizing QL:

min_P  D_QL(P||Q) = tr((PQ^{-1} − I)²)
s.t.  P ⪰ 0,  diag(P) = 1,  P ∈ M,  Σ(∇²P)² ≤ α

Minimizing KL and QL requires the inverse of the sample covariance matrix, which is undefined when p > n. Therefore, in the following simulations we focus on the well-defined case n > p.
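The four loss functions, written out as NumPy helpers matching the formulas above (our own sketch; P and Q are symmetric positive definite arrays):

```python
import numpy as np

def frob2(P, Q):
    """Squared Frobenius norm ||P - Q||_F^2 (the MSE criterion up to scaling)."""
    return np.sum((P - Q) ** 2)

def norm2(P, Q):
    """Matrix 2-norm: largest absolute eigenvalue of the symmetric P - Q."""
    return np.abs(np.linalg.eigvalsh(P - Q)).max()

def kl(P, Q):
    """Stein-type loss D_KL(P||Q) = tr(P Q^{-1}) - log|P Q^{-1}| - p."""
    M = P @ np.linalg.inv(Q)
    return np.trace(M) - np.log(np.linalg.det(M)) - P.shape[0]

def ql(P, Q):
    """Quadratic loss D_QL(P||Q) = tr((P Q^{-1} - I)^2)."""
    M = P @ np.linalg.inv(Q) - np.eye(P.shape[0])
    return np.trace(M @ M)
```

Each loss is zero when P = Q and positive otherwise; kl and ql require an invertible Q, which is why they are restricted to n > p.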

In the following, for all cases, we set the smoothness constraint parameter α = 1 so that the constraints are the same. Note that α and λ are different, and it is not trivial to find a direct relation between them. The true covariance in this experiment is the AR(1) covariance with ρ = 0.9 and dimension p = 30. The sample size varies from 40 to 200. Figure 4 shows the distances from the optimal covariance estimates to the sample covariance, using MSE, 2-norm, QL and KL as the objective function, respectively. This confirms that the optimization is correct: on the training data, when the distance measure used for optimization is the same as the one used for evaluation, it should achieve the minimal distance.

However, when comparing to the true covariance, this need not hold because of finite-sample effects. Figure 5 shows the distances of the optimal covariance estimates from the true covariance. We can see that, when comparing the obtained covariance estimates with the true covariance, minimizing KL or QL does not lead to the minimal distance even when evaluated by KL or QL. Instead, when minimizing MSE or the 2-norm, the errors against the true covariance under any of the four distances are much smaller.

The results in figures 4 and 5 also show that minimizing QL or KL against the sample covariance under the smooth and monotone constraints leads to much larger MSE and 2-norm errors. In fact, the performance can be even worse than that of the sample covariance. This is likely caused by the fact that computing KL and QL requires inverting the sample covariance matrix, which is very sensitive for small samples. In addition, estimates that are close under KL and QL might be very far apart under MSE and the 2-norm (and vice versa). Therefore, minimizing KL or QL against the sample covariance matrix is not a good choice.

Comparing the results of minimizing MSE and the 2-norm, we see that both are very close to the true covariance under any of the four distances, suggesting that both are more robust estimates. The similarity between minimizing MSE and the 2-norm is natural because the squared Frobenius norm is the sum of the singular values of (P − Q)², while the squared 2-norm is the largest singular value of (P − Q)². In some extreme cases (e.g. AR(1) with ρ = 0.9999), the difference between minimizing the Frobenius norm and the 2-norm may become substantial, but in typical cases they lead to very similar answers. Therefore, we use the Frobenius norm in the studies performed in the rest of this paper.

4. Covariance estimation for elliptical distributions

4.1. Elliptical distributions

Elliptical distributions are an important extension of the Gaussian distribution because, according to many empirical studies, they provide a better fit to observed financial returns (see Rachev et al. 2005). They are more flexible than the Gaussian distribution in allowing for heavier tails and higher peaks.

If the characteristic function of a random vector X has the representation φ_X(t) = e^{it^T μ} ψ(t^T Σ t), where μ is the location parameter, Σ is the dispersion matrix and ψ is the characteristic generator, we say that X follows the elliptical distribution E_d(μ, Σ, ψ). Once the characteristic generator is specified, the type of distribution is determined, and the relation between the dispersion matrix and the covariance matrix is known whenever the covariance exists.

Under the classical Markowitz portfolio selection framework, a representation of the portfolio returns and portfolio variances is needed. Elliptical distributions fit well within this framework because portfolio risk can be represented in terms of the weights in each asset and their covariance matrix. Mathematically speaking, if X ∼ E_d(μ, Σ, ψ) is a d-dimensional random vector, and Y = AX + b ∈ R^k is a linear transformation of X, where A ∈ R^{k×d} and b ∈ R^k, then Y ∼ E_k(Aμ + b, AΣA^T, ψ). For elliptically distributed returns X with dispersion matrix Σ, the dispersion of a portfolio with weights w = (w_1, ..., w_d)^T is then represented by w^T Σ w.

In financial markets, in addition to the observed heavy-tailedness and leptokurtosis, the distribution of financial returns has also often been observed to exhibit negative skewness. The normal variance-mean mixture distributions include many such examples, such as the skewed Student t, generalized hyperbolic and normal tempered stable distributions, which provide much more realistic fits to observed financial returns than the Gaussian distribution. However, to focus on covariance estimation, in this paper we ignore the skewness part, considering only the normal variance mixture case, which has zero skewness and includes typical elliptical distributions such as the multivariate Student t distribution, the normal inverse Gaussian distribution and the normal tempered stable distribution (with zero skewness).

Normal variance mixture distributed random variables can be represented as

X = μ + √W AZ    (1)

where Z ∼ N(0, I), W > 0 is a scalar random variable independent of Z, μ is the location vector and A is a lower triangular matrix that determines the dispersion matrix Σ = AA^T.

The typical elliptical distributions used in empirical finance can be obtained by choosing different distributions for the mixture random variable W. For example:


[Figure: four panels (MSE, 2 Norm, QL, KL) of distances versus sample size; legend: Min FnormSM, Min KLSM, Min 2normSM, Min QLSM.]

Figure 4. Distances of various smooth monotone covariance estimates against the sample covariance.

[Figure: four panels (MSE, 2 Norm, QL, KL) of distances versus sample size; legend: Samp, Min FnormSM, Min KLSM, Min 2normSM, Min QLSM.]

Figure 5. Distances of various smooth monotone covariance estimates against the true covariance.

(1) when W ∼ Inverse Gamma(v/2, v/2), X is said to follow the multivariate Student t distribution t(μ, Σ, v);

(2) when W ∼ Inverse Gaussian(χ, ψ), X is said to follow the normal inverse Gaussian distribution NIG(μ, Σ, χ, ψ); and


(3) when W follows a classical tempered stable distribution CTS(α, λ), X is said to have a normal tempered stable distribution NTS(μ, Σ, α, λ) (Rachev et al. 2011).
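The Student t case in item (1) can be simulated directly from representation (1), using the fact that Inverse Gamma(v/2, v/2) is the distribution of v/χ²_v. A NumPy sketch of our own:

```python
import numpy as np

def mv_t(mu, Sigma, v, n, rng):
    """Draw n samples from t(mu, Sigma, v) via the normal variance mixture (1)."""
    p = len(mu)
    A = np.linalg.cholesky(Sigma)            # lower triangular, Sigma = A A^T
    W = v / rng.chisquare(v, size=n)         # W ~ Inverse Gamma(v/2, v/2)
    Z = rng.standard_normal((n, p))          # Z ~ N(0, I)
    return mu + np.sqrt(W)[:, None] * (Z @ A.T)

rng = np.random.default_rng(3)
X = mv_t(np.zeros(2), np.eye(2), 5, 10000, rng)
# For v > 2 the covariance of X is (v / (v - 2)) * Sigma.
```

The NIG and NTS cases follow the same pattern, only with W drawn from an inverse Gaussian or tempered stable law instead.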

Therefore, we discuss general methods of estimating the covariance for elliptical distributions, which can then be applied to fitting the above distributions that are useful in finance.

4.2. Covariance estimation for elliptical distributions

Since the sample covariance (or Pearson's correlation) is not robust to outliers, it is generally even worse for heavy-tailed elliptical distributions. Therefore, alternative estimators are needed. In this subsection, we review some alternative estimators of the covariance (or dispersion) matrix for elliptical distributions.

Under the Gaussian assumption, the sample covariance coincides with the ML estimator. However, for elliptical distributions, these two estimates are generally different. For normal variance mixture distributions, the EM algorithm is very useful. Note that when W is known, X | W = w follows a Gaussian distribution N(μ, wΣ), and the density of X can be written as p_N(X | W; μ, wΣ) p(W; θ_W). In each iteration, the EM algorithm first computes the expectation of the complete-data log-likelihood over the hidden variable W given the current parameter estimates, and then updates the parameters by maximizing this expected log-likelihood.†

Another solution is to separate the covariance matrix into the variance components and the correlation matrix. The marginal variances are obtained by the sample variance or by time-varying volatility models in the non-i.i.d. case. For the normal distribution, there is a well-known relation between the linear correlation and Kendall's tau: ρ = sin(τπ/2). Lindskog et al. (2003) show that this relation holds for all elliptical distributions. Given that Kendall's tau, as a rank correlation coefficient, is less sensitive to outliers, we can use it to obtain the linear correlation and thus the covariance.
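The transform is straightforward to apply pairwise; a stdlib-only sketch of our own (O(n²) in the number of observations, and ignoring ties):

```python
from math import pi, sin

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[i] - x[j]) * (y[i] - y[j])
            s += (d > 0) - (d < 0)       # +1 concordant, -1 discordant
    return 2 * s / (n * (n - 1))

def tau_to_linear(tau):
    """Elliptical relation between rank and linear correlation."""
    return sin(pi * tau / 2)
```

Applying `tau_to_linear` entrywise to the matrix of pairwise taus gives the raw correlation estimate, which may fail to be positive semidefinite and then needs adjustment.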

Another covariance estimation method for elliptical distributions is Tyler's method (1987), whose idea is to estimate from the normalized samples instead of the original samples. Let {x_1, ..., x_n} be n random vectors drawn from a p-dimensional elliptical distribution and, without loss of generality, assume the mean is zero. Then x_i has the representation x_i = √W_i u_i, where √W_i is the mixture random variable and u_i is a p × 1 zero-mean jointly Gaussian random vector with covariance Σ = AA^T.

Denote the normalized sample by s_i = x_i / ||x_i||_2; the density of s_i is then given by (Frahm 2004):

f(s_i; Σ) = [Γ(p/2) / (2π^{p/2})] √det(Σ^{-1}) (s_i^T Σ^{-1} s_i)^{-p/2},

†For the estimation of the multivariate Student t distribution, see Liu and Rubin (1995) and Kotz and Nadarajah (2004). For the estimation of the generalized hyperbolic distribution, see McNeil et al. (2005), Protassov (2004), Hu (2005), and Breymann and Luthi (2008). In particular, for the NIG distribution, see Karlis (2002). For the normal tempered stable distribution, the density function is analytically intractable, and the estimates are usually obtained through the characteristic function.

and a fixed-point representation of the maximum likelihood solution for Σ when n > p is (Tyler 1987):

Σ = (p/n) Σ_{i=1}^n [s_i s_i^T / (s_i^T Σ^{-1} s_i)].

The solution can be found through fixed-point iterations (Chen et al. 2011):

Σ_{j+1} = (p/n) Σ_{i=1}^n [s_i s_i^T / (s_i^T Σ_j^{-1} s_i)].
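A direct NumPy implementation of this iteration (our own sketch; since Tyler's equation determines Σ only up to a scalar, we rescale each iterate to have trace p):

```python
import numpy as np

def tyler(X, iters=200, tol=1e-9):
    """Tyler's M-estimator of the dispersion matrix from zero-mean rows of X."""
    n, p = X.shape
    S = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalized samples s_i
    Sig = np.eye(p)
    for _ in range(iters):
        inv = np.linalg.inv(Sig)
        q = np.einsum('ij,jk,ik->i', S, inv, S)        # s_i^T Sig^{-1} s_i
        new = (p / n) * (S / q[:, None]).T @ S         # fixed-point update
        new *= p / np.trace(new)                       # fix the scale ambiguity
        if np.abs(new - Sig).max() < tol:
            return new
        Sig = new
    return Sig
```

The iteration typically converges in a few dozen steps when n > p.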

4.3. Simulation study

4.3.1. Comparison of various covariance estimates for elliptical distributions. In this subsection, we compare four covariance estimators: the sample covariance, the Kendall's tau transformation, the EM algorithm and Tyler's method. As an example, we choose the most typical elliptical distribution, the Student t distribution, in our simulations. We focus on comparing the correlation estimates, converting the covariances to correlations. For our experiment, we focus on the cases where p < n, so that the MLE and Tyler's estimate both exist. For the Kendall's tau transform method, the obtained correlation matrix may not be positive definite, so we adjust it by setting the non-positive eigenvalues to a small number ε (here ε = 10^{-6}). Suppose the raw correlation transformed from Kendall's tau is P with eigendecomposition P = VDV^T; we then let D̃ be the diagonal matrix with D̃_ii = D_ii 1{D_ii > 0} + ε 1{D_ii ≤ 0}, and take the adjusted estimate P̃ = VD̃V^T.
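The eigenvalue adjustment takes a few NumPy lines (our own illustration, with D̃ built exactly as described):

```python
import numpy as np

def adjust_psd(P, eps=1e-6):
    """Replace non-positive eigenvalues of a symmetric P by eps and rebuild."""
    d, V = np.linalg.eigh(P)
    d_adj = np.where(d > 0, d, eps)
    return V @ np.diag(d_adj) @ V.T

P_bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1: not PSD
P_ok = adjust_psd(P_bad)
```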

In the following simulation, we compare the correlation estimates obtained by the above four methods with the prespecified true correlation under four types of distances: MSE, KL, QL and Hellinger. The settings for the experiment are as follows: dimension p = 30; sample size n ranging from 40 to 200 with a step size of 10; true correlation given by the autocovariance of an AR(1) process with ρ = 0.9; and M = 1000 trials at each sample size. The estimation errors of the four covariance estimates against the true covariance under the four distance measures are shown in figure 6. Note that the KL and Hellinger distances are derived from Gaussian distributions, and we use them here as two derived loss functions between covariance matrices rather than the true KL and Hellinger distances for Student t distributions.

We see that the EM and Tyler's methods lead to very close estimates and have the smallest errors under any of the four distances. The Kendall's tau method leads to smaller MSE and QL than the sample covariance, suggesting it provides a more robust estimate under these two distances. However, because of the non-positive definiteness of the correlation transformed from Kendall's tau, we have to adjust the non-positive eigenvalues to a small positive value. As a result, the errors of the Kendall's tau covariance estimates evaluated under the KL and Hellinger distances are very large. Reducing the error under the KL and Hellinger distances as well would require a better adjustment method, most likely a shrinkage method that makes the eigenvalues less dispersed; we do not discuss those issues here. The most interesting fact is that Tyler's method provides very robust estimates for heavy-tailed elliptical distributions, and the result


[Figure: four panels (MSE, KL, QL, Hel) of errors versus sample size; legend: Samp, Kendall, EM, Tyler.]

Figure 6. Errors of various covariance estimates for t(5) distribution.

is very close to that of the EM method. However, the EM method is often more complex to implement and, furthermore, may not even be available when there is no explicit analytical form for the density function.

Given that the EM and Tyler's methods provide more robust covariance estimates, using these estimates as inputs to the smooth and monotone regularization should yield more robust smooth monotone covariance estimates as well. In the following, we use the four correlation estimates (sample covariance, Kendall's tau, EM and Tyler's estimate) as inputs to obtain a smooth monotone covariance for each. The settings of the experiment are the same as above, and the results are shown in figure 7.

Comparing the estimation errors of the smooth monotone covariances with those of the raw estimates, we see that the smooth monotone covariance greatly reduces the error under any of the four distances. As with the raw estimates, under smooth monotone regularization the EM and Tyler's methods also provide the estimates closest to the true covariance. Note that the EM and Tyler's methods cannot be applied when the covariance estimate is not invertible, as happens when p > n. In that case, one may use the Kendall's tau transformation method, or modify the initial covariance and the iteration steps of the EM and Tyler's methods so that the covariance obtained in each iteration is positive definite.

4.3.2. Smooth monotone covariance estimates of portfolio risk for elliptical distributions. We repeat the portfolio optimization experiment for elliptical distributions by calculating the estimation error of portfolio risk for Markowitz mean-variance portfolios. Because the results for mean-CVaR portfolios are very similar, we do not report them here.

The settings of the experiment are exactly the same as for the Gaussian distribution, except that the distribution is changed to the multivariate t distribution with five degrees of freedom. Figure 8 shows the estimation error of the optimal portfolio risk using mean-variance optimization, where the distribution is t(5), the sample size n is 50, the dimension p is 30 and the covariance is the AR(1) covariance with ρ = 0.7. We see that for the Student t distribution the sample covariance underestimates the risk even more than for the Gaussian distribution, while the smooth monotone covariance has a much smaller error.

In our simulation study for elliptical distributions, we used the multivariate t distribution as our example. However, the Kendall's tau transformation method and Tyler's method for covariance estimation apply to all elliptical distributions, such as the normal inverse Gaussian, generalized hyperbolic and normal tempered stable distributions. Therefore, smooth monotone covariances using various inputs can easily be obtained for these distributions as well.

5. Empirical applications

In this section, we apply the smooth and monotone covariance estimation to real problems in finance. In practice, the entities or instruments we consider may be indexed by some natural order. For example, in yield curve modelling, the yields of different bonds can be ordered by their maturities, such as one month, three months, six months, one year, two years and so on. When studying the joint movement over time of the yields at different maturities, an accurate estimate of their covariance plays a central role. A monotone assumption is natural, such that the covariance between adjacent yields is


[Figure: four panels (MSE, KL, QL, Hel) of errors versus sample size; legend: Samp, Kendall, EM, Tyler.]

Figure 7. Errors of various smooth monotone covariance estimates for t(5) distribution.

[Figure: two panels, 'sample covariance' and 'SM covariance', plotting the error of portfolio risk against portfolio returns.]

Figure 8. Estimation errors of optimal portfolio risk for t(5) distribution using sample and smooth monotone covariance.

greater than that between distant yields; a smoothness penalty is applied to prevent the staircase effect.

5.1. Eurodollar futures portfolio optimization

To make the empirical study transparent and easily replicable, we use the most liquid futures contract traded in the financial market: the Eurodollar (ED) futures. ED futures are frequently used by asset managers to control interest rate risk (i.e. level risk and yield curve risk), by trading desks to hedge interest rate exposure and by corporate treasurers to hedge funding costs. A Eurodollar future is similar to a forward rate agreement to borrow or lend US$1,000,000 for three months starting on the contract settlement date. For historical reasons, ED futures are priced as 100 − x, where x is the forward rate. A combination of ED rates with different expiries on a specific date reflects a snapshot of the US forward curve. The covariance of yield changes is often well approximated by the sample covariance or by principal component analysis because of the well-known three dominant factors (level, slope and curvature). However, if a portfolio has significant exposure to these futures, the first principal component is so dominant that it is hard to measure the relative merit of


[Figure: two panels, 'ED prices' and 'ED spread changes', plotted over 1995–2010.]

Figure 9. The daily close prices and spread changes of Eurodollar futures.

different covariance models. Also, in practice, the exposure to level shifts is usually tightly controlled, so the portfolios can be viewed as mostly spread portfolios. Therefore, we focus on the covariance of spread changes rather than yield changes: if y_i(t) is the price of the ith ED contract at time t, the ith spread is x_i(t) = y_i(t) − y_{i+1}(t). We use the spreads as our synthetic instruments, for which better covariance modelling can provide dramatic improvements over simple models.

The data used for our study are the historical monthly last prices of 32 generic ED tickers, namely the first contract, the second contract and so on. The data were downloaded from a Bloomberg terminal, and the generic tickers were constructed by rolling each contract to the next using the Bloomberg rolling method called 'with active futures'. The period chosen is from 1 January 1995 to 31 December 2012, providing 216 months in total, and there are 32 tickers with full data over our study period. Figure 9 shows the daily closing values of the 32 ED yields and the daily changes of the 31 ED spreads.

5.1.1. Comparison of portfolio selection using sample and smooth monotone covariance. We compare the performance of portfolio selection based on the spreads using the sample covariance and the smooth monotone covariance. The portfolio dimension is 31, and the sample size used for estimating the covariance is 40. To focus the comparison on the two covariances, we assume the variances equal the sample variances, and use the sample correlation and the smooth monotone correlation to obtain the two covariances. Note that better modelling of the variances can be achieved by applying GARCH models, but that is beyond the scope of this paper.

We use the following mean-variance portfolio selection, without considering transaction costs:

max_w  w^T μ − (λ/2) w^T Σ w

The explicit solution of this problem is:

w = (1/λ) Σ^{-1} μ.

Note that for the ED spread portfolio selection here, the 'returns' of the traditional portfolio selection process are the spread changes. The weight w in fact represents positions rather than amounts of invested capital. In our simulation, we specify a constant total absolute position at the beginning of each period. To simplify the calculation, we use the normalized weights:

w* = w / (|w|^T 1).
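Putting the closed form and the normalization together (a NumPy sketch with toy inputs of our own; note that λ cancels in the normalized weights):

```python
import numpy as np

rng = np.random.default_rng(5)
idx = np.arange(5)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # toy spread covariance
mu = 0.01 * rng.standard_normal(5)                   # toy spread-change forecasts

w = np.linalg.solve(Sigma, mu)    # proportional to (1/lambda) Sigma^{-1} mu
w_star = w / np.abs(w).sum()      # total absolute position normalized to 1
```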

Assuming a total absolute position of 1 and an initial wealth of $1, we conduct the single-period portfolio optimization according to the above selection rule, using the sample correlation and the smooth monotone correlation, from the 101st month (June 2003) to December 2012. We examine the portfolio's performance when the covariances are calculated from different sample sizes ranging from 30 to 100. To facilitate analysing the impact of different sample sizes on the covariance estimation, we fix the forecasts of the spread changes so that they do not interfere with our judgment about the covariance. Therefore, we first forecast the spread changes using an expanding window approach and use these forecasts in all the cases discussed below. Note that one could also select a window size and use a moving window approach for the spread changes model, but that is beyond the focus of our discussion of covariance modelling.

For the prediction model of the spread changes, we apply a pth-order forward linear predictor, which predicts the current value from past samples:

r(t) = −a(2)r(t−1) − a(3)r(t−2) − ... − a(p+1)r(t−p)

where the coefficients a = (1, a(2), ..., a(p+1)) are obtained by the Levinson-Durbin recursion (the 'lpc' function in the Matlab Signal Processing Toolbox). We use this model to keep the spread changes prediction simple enough that one can replicate the results easily. Moreover, we find that it provides a satisfactory prediction of the returns, leading to positive portfolio returns in our case. Here we choose p = 5, and calibrate the coefficients in each period using all available historical data.
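The Levinson-Durbin recursion behind Matlab's `lpc` can be sketched as follows (our own stdlib-only implementation; `r` is the sample autocorrelation sequence r[0..order]):

```python
def levinson_durbin(r, order):
    """Forward linear prediction coefficients a = (1, a(2), ..., a(order+1))
    solving the Yule-Walker equations for autocorrelations r[0..order]."""
    a = [1.0] + [0.0] * order
    e = r[0]                                             # prediction error power
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / e  # reflection coefficient
        a_prev = a[:]
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        e *= 1.0 - k * k                                 # update error power
    return a

# For an AR(1) series with autocorrelation r[k] = 0.5**k, the order-2
# predictor recovers a(2) = -0.5 and a(3) = 0.
a = levinson_durbin([1.0, 0.5, 0.25], 2)
```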

The portfolio weights are rebalanced every month, and we calculate the realized gains (i.e. the increase in wealth) of the two portfolios each month. Figure 10 shows the cumulative gains of the two portfolios (which also equal the cumulative returns, since the initial wealth is $1) using the sample covariance and the smooth monotone covariance with sample size n equal to 40. Over the almost ten years of back-testing, we can see that the smooth monotone covariance improves the gross gains significantly, as well as improving the


[Figure: cumulative gross gains over 2003–2013; legend: Samp, SM.]

Figure 10. Realized portfolio return using sample and smooth monotone covariance.

Sharpe ratio. Here, the Sharpe ratio is annualized by multiplying by √12.

5.1.2. Sensitivity of portfolio performance to the length of the sample data. To investigate the performance of the smooth monotone covariance and the sample covariance with varying sample size, we repeat the above back-test for sample sizes from 30 to 100 with a step size of 5, and plot the cumulative returns and Sharpe ratio at each sample size for the two covariances in figure 11.

From figure 11, we see that the portfolio returns and Sharpe ratio are more sensitive to changes in the sample size when using the sample covariance. Overall, the smooth monotone covariance greatly outperforms the sample covariance for both small and large sample sizes, as measured by both portfolio returns and Sharpe ratio.

5.2. Corporate bonds application

5.2.1. Block smooth monotone covariance. In this subsection, we provide another example of potential applications of the smooth monotone covariance for fixed income securities: corporate bonds. Since corporate bonds of different companies usually have different coupon rates and maturities, and some have call options, it would be difficult to use individual corporate bond data for our example. To simplify the illustration, we use the corporate bond indices constructed by Bloomberg Financial Markets, referred to as the Bloomberg BVal Indices. There are indices for each credit rating, and we use the Bloomberg BVal Index for three investment-grade credit ratings: AA, A and BBB.† In addition, for each credit rating, Bloomberg provides an index by maturity (i.e. a yield curve by index). There are nine maturity indices for each credit rating: 1 year, 2 years, 3 years, 5 years, 7

†The tickers for these three indices are, respectively, IGUUDC, IGUUAC and IGUUBC. These indices are populated with US dollar-denominated fixed-rate bonds issued by domestic companies in the United States.

years, 10 years, 15 years, 20 years and 25 years. Consequently, there are in total 27 corporate indices (three credit ratings multiplied by nine maturities). Similar to Eurodollar futures, the first principal component is dominant in the correlation of yield changes. Therefore, instead of forecasting the correlation of the yield changes, we focus on forecasting the correlation of spread changes x_{i,t} − x_{i,t−1}, where x_{i,t} = y_{i,t} − y_{i−1,t} and y_{i,t} is the value of the ith maturity index for each credit rating at time t. In total, we have 24 corporate bond spread indices (three credit ratings multiplied by eight spreads). Figure 12 shows the historical daily yields for the 27 corporate bond indices and the daily changes of the 24 corporate bond spreads from 1 January 2010 to 31 December 2012.
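The construction of the spread change series can be sketched as follows; synthetic random-walk yields stand in for the Bloomberg BVal maturity indices:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_mat, n_rat = 250, 9, 3          # days, maturities per rating, ratings

# yields[r] is a T x 9 matrix of daily yields for rating r (a synthetic
# stand-in for the Bloomberg BVal maturity indices)
yields = [np.cumsum(rng.normal(0, 0.01, (T, n_mat)), axis=0) + 3.0
          for _ in range(n_rat)]

# x_{i,t} = y_{i,t} - y_{i-1,t}: consecutive-maturity spreads (8 per rating)
spreads = [y[:, 1:] - y[:, :-1] for y in yields]

# daily spread changes x_{i,t} - x_{i,t-1}, stacked across ratings
spread_changes = np.hstack([np.diff(x, axis=0) for x in spreads])
print(spread_changes.shape)   # (249, 24)
```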

Given that for each rating the correlation of the spread changes exhibits a smooth and monotone effect, the correlation of spread changes for all ratings together is a block matrix with a smooth and monotone effect on the diagonal blocks if we arrange the spreads of the same rating together. For the off-diagonal blocks, we can also add a smoothness penalty.

Suppose the diagonal blocks are P_{11}, P_{22}, …, P_{BB}; the total covariance is

\[
P = \begin{bmatrix}
P_{11} & P_{12} & \dots & P_{1B} \\
P_{12}^T & P_{22} & \dots & P_{2B} \\
\vdots & \vdots & \ddots & \vdots \\
P_{1B}^T & P_{2B}^T & \dots & P_{BB}
\end{bmatrix}.
\]

We impose monotone constraints on the diagonal blocks P_{ii}, and a smoothness penalty on both the diagonal and off-diagonal blocks:

\[
\min_P \; D(P, Q) + \lambda_1 \sum_{i} \left\| D_2^{(ii)} \operatorname{vec}(P_{ii}) \right\|^2 + \lambda_2 \sum_{i \neq j} \left\| D_2^{(ij)} \operatorname{vec}(P_{ij}) \right\|^2,
\]
\[
\text{such that } P \succeq 0, \quad D^{(i)} \operatorname{vec}(P_{ii}) \ge 0.
\]
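Solving this problem requires a semidefinite programming solver (the authors cite the CVX package in the references). The sketch below only illustrates two building blocks in numpy: a second-difference operator for the smoothness penalty, and a row-wise monotonicity check for a diagonal block. The function names and the row-wise form of the monotone constraint are our simplifications, not the paper's exact operators D_2 and D^{(i)}:

```python
import numpy as np

def second_difference(n):
    """(n-2) x n second-difference operator: (D2 v)_k = v_k - 2 v_{k+1} + v_{k+2}."""
    D2 = np.zeros((n - 2, n))
    for k in range(n - 2):
        D2[k, k:k + 3] = [1.0, -2.0, 1.0]
    return D2

def smoothness_penalty(P, lam):
    """lam * sum of squared second differences along each row of P
    (a simple stand-in for lam * ||D2 vec(P)||^2)."""
    D2 = second_difference(P.shape[1])
    return lam * np.sum((P @ D2.T) ** 2)

def is_row_monotone(P):
    """Check that correlations decay moving away from the diagonal in each row."""
    n = P.shape[0]
    return all(P[i, j] >= P[i, j + 1] for i in range(n) for j in range(i, n - 1))

# A smooth, monotone toy correlation block: rho^{|i-j|}
n, rho = 8, 0.7
block = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
print(is_row_monotone(block))   # True
```

The second-difference operator annihilates any linearly varying row, so the penalty only charges for curvature, which is what drives the estimate toward smooth correlation profiles.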

In figure 13, the left plot shows the sample correlation of the yield spread changes, and the right plot shows the smooth monotone correlation. By using the monotone and smooth regularization, the noise in the covariance has been reduced.

5.2.2. Correlation forecasting. To compare the sample covariance and the smooth monotone covariance, we conduct a study of forecasting future correlation coefficient matrices over several years of corporate bond index data. The approach is as follows: using the previous N days of data as a training window, we calculate the sample correlation and the block smooth monotone correlation, and compare each with the realized future correlation computed as the 50-day out-of-sample sample correlation. We calculate the daily spread changes and their sample correlation using historical data with a specified window length, and estimate the block smooth monotone correlations. To choose the smoothness penalty parameter λ, we use an approach similar to cross-validation.

Suppose we have p-dimensional sample data X_{p×M}, and we divide it into m groups so that each group contains n = M/m data points; denote the ith group as X^{(i)}. For each group X^{(i)} (i = 1, 2, …, m), we estimate the smooth monotone correlation using different values of λ, and calculate the Frobenius norm error against the sample correlation of the remaining data X \setminus X^{(i)}. For each group i, we find the optimal


Figure 11. Sensitivity of portfolio gross return (left) and Sharpe ratio (right) with respect to sample size, for sample (Samp) and smooth monotone (SM) covariance.

Figure 12. Daily yield and spread changes of corporate bond indices: corporate bond yields (left) and corporate bond spread changes (right), 2010–2012.

Figure 13. Sample correlation (left) and block smooth monotone (SM) correlation (right).

λ_i which leads to the smallest error. Then, we calculate the average optimal λ over all m groups: \(\lambda = \frac{1}{m}\sum_{i=1}^{m} \lambda_i\).

In the following, for the 769 data points in our sample, we use the first 300 data points as in-sample data and the last 469 data points for out-of-sample testing. We divide the 300 data points into 10 groups of 30 days each, and then, using the above cross-validation technique to choose the optimal λ, we find the approximate optimal λ for the spread change correlation to be 0.2. For simplicity, we let the smoothness penalty on the diagonal blocks and off-diagonal blocks be the same.
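The cross-validation scheme can be sketched as below. The actual smooth monotone estimator is replaced here by a hypothetical stand-in (`estimate_corr`, simple shrinkage toward the identity), purely to make the grouping and averaging mechanics runnable:

```python
import numpy as np

def estimate_corr(X, lam):
    """Hypothetical stand-in for the smooth monotone estimator:
    shrink the sample correlation toward the identity by lam."""
    C = np.corrcoef(X)
    return (1 - lam) * C + lam * np.eye(C.shape[0])

def cv_lambda(X, m, lambdas):
    """Per-group optimal lambda, then the average, as in the paper's scheme."""
    p, M = X.shape
    n = M // m
    best = []
    for i in range(m):
        train = X[:, i * n:(i + 1) * n]                    # group X^(i)
        rest = np.delete(X, np.s_[i * n:(i + 1) * n], axis=1)
        target = np.corrcoef(rest)                         # held-out sample corr
        errs = [np.linalg.norm(estimate_corr(train, lam) - target, 'fro')
                for lam in lambdas]
        best.append(lambdas[int(np.argmin(errs))])
    return float(np.mean(best))

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 300))
lam_bar = cv_lambda(X, m=10, lambdas=np.linspace(0, 1, 21))
print(lam_bar)
```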

Figure 14 shows the realized Frobenius norm error of the sample correlation and the block smooth monotone correlation with respect to training window sizes varying from 10 to 50. We use running windows shifted by five business days over


Figure 14. Correlation forecasting performance (MSE versus training sample size) using sample (Samp) correlation and SM correlation.

the 469 daily data points for the spread changes. The figure shows that the block smooth monotone correlation has better out-of-sample forecasting performance than the sample correlation. The smooth and monotone regularization appears especially valuable for small training window sizes, demonstrating robustness in forecasting risk in scenarios with severely limited data.
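The rolling evaluation can be sketched as follows, with synthetic data standing in for the spread changes; `estimator` would be replaced by the block smooth monotone estimator in the paper's experiment:

```python
import numpy as np

def rolling_forecast_error(X, N, horizon=50, shift=5, estimator=np.corrcoef):
    """Average Frobenius error between the correlation estimated on an
    N-day training window and the realized correlation over the next
    `horizon` days, rolling forward by `shift` days."""
    p, T = X.shape
    errs = []
    for t in range(0, T - N - horizon + 1, shift):
        est = estimator(X[:, t:t + N])
        realized = np.corrcoef(X[:, t + N:t + N + horizon])
        errs.append(np.linalg.norm(est - realized, 'fro'))
    return float(np.mean(errs))

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 469))   # stand-in for the spread change series
err = rolling_forecast_error(X, N=30)
print(err)
```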

6. Conclusion

In this paper, we study high-dimensional covariance estimation with smooth and monotone regularization. This methodology is useful when financial instruments have an ordering. Using a series of simulation studies, we first analyse its performance under Gaussian distributions, using various distances to measure the error relative to the true covariance. Our findings suggest that the smooth monotone covariance performs uniformly better than the sample covariance not only under the Frobenius norm error, but also under the KL, QL and Hellinger distances. We also find that it improves optimal portfolio risk measurement, as shown by comparing the Markowitz efficient frontiers. We further study a generalization of the smooth monotone covariance using alternative distances as objective functions, such as the 2-norm, KL and QL. However, we find that it is not advisable to minimize the KL or QL, even though they are good for evaluating covariance estimates.

To fit heavy-tailed distributions, which are more common in finance, we study the smooth monotone covariance for elliptical distributions using various inputs such as the sample covariance, Kendall's tau transformation method, ML estimates (EM algorithm) and Tyler's estimate. We find that by using alternative robust covariance inputs, the performance of the smooth monotone covariance can be improved. Finally, we provide two applications. Our first application is to Eurodollar futures portfolio optimization. We find that the smooth monotone covariance outperforms the sample covariance by generating a larger return and a higher Sharpe ratio. The other application is to corporate bond portfolios, where we extend the smooth monotone covariance to a block smooth monotone covariance and demonstrate that it is more robust than the sample covariance for out-of-sample covariance prediction.

This smooth monotone regularization can be applied generally to portfolio risk modelling involving term-rate instruments, such as credit derivatives based on bond yields, commodity futures or foreign exchange forwards. Potential applications can also be found in volatility surface modelling and time-series autocovariance estimation, where the smooth and monotone assumption is appropriate.

References

Bengtsson, C. and Holst, J., On portfolio selection: Improved covariance matrix estimation for Swedish asset returns. Paper presented at the 31st Meeting, Euro Working Group on Financial Modeling, Ayia Napa, Cyprus, 2002.

Bickel, P.J. and Levina, E., Covariance regularization by thresholding. Ann. Stat., 2008a, 36(6), 2577–2604. doi:10.1214/08-aos600.

Bickel, P.J. and Levina, E., Regularized estimation of large covariance matrices. Ann. Stat., 2008b, 36(1), 199–227. doi:10.1214/009053607000000758.

Breymann, W. and Luthi, D., ghyp: A package on generalized hyperbolic distributions. Tech. rep., Institute of Data Analysis and Process Design, 2008. Available online at: http://cran.r-project.org/.

Chen, Y., Wiesel, A. and Hero, A.O., Shrinkage estimation of high dimensional covariance matrices. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2937–2940, 2009 (IEEE Computer Society: Taipei).

Chen, Y., Wiesel, A. and Hero, A.O., Robust shrinkage estimation of high-dimensional covariance matrices. IEEE Trans. Signal Process., 2011, 59(9), 4097–4107.

Conlon, T., Ruskin, H.J. and Crane, M., Random matrix theory and fund of funds portfolio optimisation. Physica A, 2007, 382(2), 565–576.

d'Aspremont, A., Banerjee, O. and El Ghaoui, L., First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl., 2008, 30(1), 56–66.

Elton, E.J. and Gruber, M.J., Estimating the dependence structure of share prices: implications for portfolio selection. J. Finance, 1973, 28(5), 1203–1232.

Fan, J., Fan, Y. and Lv, J., High dimensional covariance matrix estimation using a factor model. J. Econom., 2008, 147(1), 186–197.

Frahm, G., Generalized elliptical distributions: Theory and applications. Dissertation, Universität zu Köln, 2004.

Friedman, J., Hastie, T. and Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008, 9(3), 432–441.

Furrer, R. and Bengtsson, T., Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal., 2007, 98(2), 227–255.

Grant, M.C. and Boyd, S.P., The CVX Users' Guide, 2012 (CVX Research: Cambridge, UK).

Haff, L., Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Stat., 1980, 8(3), 586–597.

Hu, W., Calibration of multivariate generalized hyperbolic distributions using the EM algorithm, with applications in risk management, portfolio optimization and portfolio credit risk. PhD Thesis, Florida State University, 2005.

Huang, J.Z., Liu, N., Pourahmadi, M. and Liu, L., Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 2006, 93(1), 85–98.

Karlis, D., An EM type algorithm for maximum likelihood estimation of the normal-inverse Gaussian distribution. Stat. Probab. Lett., 2002, 57(1), 43–52.

Karoui, N., Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Stat., 2008, 36(6), 2757–2790.


Kondor, I., Pafka, S. and Nagy, G., Noise sensitivity of portfolio selection under various risk measures. J. Bank. Finance, 2007, 31(5), 1545–1573.

Kotz, S. and Nadarajah, S., Multivariate t-Distributions and Their Applications, 2004 (Cambridge University Press: Cambridge, UK).

Kwapien, J. and Drozdz, S., The bulk of the stock market correlation matrix is not pure noise. Physica A, 2006, 359, 589–606.

Laloux, L., Cizeau, P., Bouchaud, J.P. and Potters, M., Noise dressing of financial correlation matrices. Phys. Rev. Lett., 1999, 83(7), 1467–1470.

Ledoit, O. and Wolf, M., Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Finance, 2003, 10(5), 603–621.

Ledoit, O. and Wolf, M., A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal., 2004, 88(2), 365–411.

Ledoit, O. and Wolf, M., Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Stat., 2012, 40(2), 1024–1060.

Lindskog, F., McNeil, A. and Schmock, U., Kendall's tau for elliptical distributions. In Credit Risk – Measurement, Evaluation and Management, edited by G. Bol, G. Nakhaeizadeh, S.T. Rachev, T. Ridder and K.-H. Vollmer, pp. 149–156, 2003 (Physica-Verlag: Heidelberg).

Liu, C. and Rubin, D.B., ML estimation of the t distribution using EM and its extensions, ECM and ECME. Stat. Sin., 1995, 5(1), 19–39.

Malioutov, D., Smooth isotonic covariances. Paper presented at the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, 2011.

Malioutov, D., Corum, A. and Cetin, M., Smooth and monotone covariance regularization: Fast first-order methods. Working Paper, IBM Research, 2012.

Marcenko, V.A. and Pastur, L.A., Distribution of eigenvalues for some sets of random matrices. Sb. Math., 1967, 1(4), 457–483.

McNeil, A.J., Frey, R. and Embrechts, P., Quantitative Risk Management: Concepts, Techniques, and Tools, 2005 (Princeton University Press: Princeton, NJ).

Michaud, R.O., The Markowitz optimization enigma: Is 'optimized' optimal? Financ. Anal. J., 1989, 45(1), 31–42.

Muller, P., Empirical Tests of Biases in Equity Portfolio Optimization, 1993 (Cambridge University Press: Cambridge).

Pafka, S. and Kondor, I., Noisy covariance matrices and portfolio optimization II. Physica A, 2003, 319, 487–494.

Park, S. and O'Leary, D.P., Portfolio selection using Tikhonov filtering to estimate the covariance matrix. SIAM J. Financ. Math., 2010, 1(1), 932–961.

Plerou, V., Gopikrishnan, P., Amaral, L.A.N., Meyer, M. and Stanley, H.E., Scaling of the distribution of price fluctuations of individual companies. Phys. Rev. E, 1999, 60(6), 6519–6529.

Protassov, R.S., EM-based maximum likelihood parameter estimation for multivariate generalized hyperbolic distributions with fixed λ. Stat. Comput., 2004, 14(1), 67–77.

Rachev, S.T., Kim, Y.S., Bianchi, M.L. and Fabozzi, F.J., Financial Models with Lévy Processes and Volatility Clustering, 2011 (Wiley: Hoboken, NJ).

Rachev, S.T., Menn, C. and Fabozzi, F.J., Fat-tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing, 2005 (Wiley: Hoboken, NJ).

Rasmussen, C.E. and Williams, C.K.I., Gaussian Processes for Machine Learning (Vol. 1), 2006 (MIT Press: Cambridge, MA).

Rothman, A.J., Bickel, P.J., Levina, E. and Zhu, J., Sparse permutation invariant covariance estimation. Electron. J. Stat., 2008, 2, 494–515.

Rubin, D.B. and Thayer, D.T., EM algorithms for ML factor analysis. Psychometrika, 1982, 47(1), 69–76.

Saxena, A. and Stubbs, R.A., Alpha alignment factor: A solution to the underestimation of risk for optimized active portfolios. Axioma, Inc., Research Report, 15, 2010.

Silverstein, J.W., Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivar. Anal., 1995, 55(2), 331–339.

Stein, C., Some problems in multivariate analysis, Part I. Technical Report, Department of Statistics, Stanford University, 1956.

Tyler, D.E., A distribution-free M-estimator of multivariate scatter. Ann. Stat., 1987, 15(1), 234–251.

Won, J.H., Lim, J., Kim, S.J. and Rajaratnam, B., Maximum likelihood covariance estimation with a condition number constraint. Technical Report 2009-10, Department of Statistics, Stanford University, 2009.

Wu, W.B. and Pourahmadi, M., Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 2003, 90(4), 831–844.

Yang, R. and Berger, J.O., Estimation of a covariance matrix using the reference prior. Ann. Stat., 1994, 22(3), 1195–1211.

Yuan, M. and Lin, Y., Model selection and estimation in the Gaussian graphical model. Biometrika, 2007, 94(1), 19–35.

Appendix A

Proof. Using the following formula

\[
\log \det(I + \delta V) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} \delta^n \operatorname{tr}(V^n),
\]

we have

\[
\log \det(P Q^{-1}) = \log \det(I + P Q^{-1} - I)
= \operatorname{tr}(P Q^{-1} - I) - \frac{1}{2} \operatorname{tr}\big((P Q^{-1} - I)^2\big) + \dots
\approx \operatorname{tr}(P Q^{-1}) - p - \frac{1}{2} \operatorname{tr}\big((P Q^{-1} - I)^2\big).
\]

Thus,

\[
2 D_{KL}(P \| Q) = \operatorname{tr}(P Q^{-1}) - \log \det(P Q^{-1}) - p
\approx \frac{1}{2} \operatorname{tr}\big((P Q^{-1} - I)^2\big) = \frac{1}{2} D_{QL}(P, Q).
\]
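This approximation can be checked numerically: for P close to Q, 2·D_KL and ½·D_QL agree up to higher-order terms. A sketch, using random well-conditioned stand-in matrices:

```python
import numpy as np

def kl_gauss(P, Q):
    """KL divergence between N(0, P) and N(0, Q)."""
    p = P.shape[0]
    M = P @ np.linalg.inv(Q)
    return 0.5 * (np.trace(M) - np.log(np.linalg.det(M)) - p)

def ql_dist(P, Q):
    """Quadratic-loss distance tr((P Q^{-1} - I)^2)."""
    M = P @ np.linalg.inv(Q) - np.eye(P.shape[0])
    return np.trace(M @ M)

rng = np.random.default_rng(3)
p = 4
A = rng.standard_normal((p, p))
Q = A @ A.T + p * np.eye(p)           # well-conditioned SPD matrix
E = rng.standard_normal((p, p)) * 0.01
P = Q + 0.5 * (E + E.T)               # small symmetric perturbation

# 2 D_KL ~ (1/2) D_QL for P near Q; the residual is third order
print(abs(2 * kl_gauss(P, Q) - 0.5 * ql_dist(P, Q)))
```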

Appendix B

Proof.

\[
H^2(P, Q) = 1 - \int \frac{1}{(2\pi)^{d/2} |P|^{1/4} |Q|^{1/4}} \exp\left\{ -\frac{x^T (P^{-1} + Q^{-1}) x}{4} \right\} dx
\]
\[
= 1 - \frac{\left| \left( \frac{P^{-1}+Q^{-1}}{2} \right)^{-1} \right|^{1/2}}{|P|^{1/4} |Q|^{1/4}} \int \frac{1}{(2\pi)^{d/2} \left| \left( \frac{P^{-1}+Q^{-1}}{2} \right)^{-1} \right|^{1/2}} \exp\left\{ -\frac{x^T (P^{-1} + Q^{-1}) x}{4} \right\} dx
\]
\[
= 1 - \frac{\left| \left( \frac{P^{-1}+Q^{-1}}{2} \right)^{-1} \right|^{1/2}}{|P|^{1/4} |Q|^{1/4}}
= 1 - \frac{\left| \frac{P^{-1}+Q^{-1}}{2} \right|^{-1/2}}{|P|^{1/4} |Q|^{1/4}}
\]
\[
= 1 - \left| \frac{P+Q}{2PQ} \right|^{-1/2} |P|^{-1/4} |Q|^{-1/4}
= 1 - |P|^{1/4} |Q|^{1/4} \left| \frac{P+Q}{2} \right|^{-1/2}.
\]
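The resulting closed form can be sanity-checked in one dimension, where the Hellinger affinity ∫√(fg) dx is easy to integrate numerically:

```python
import numpy as np

def hellinger_sq_closed(p_var, q_var):
    """Closed-form squared Hellinger distance between N(0, p_var) and N(0, q_var)."""
    return 1.0 - (p_var * q_var) ** 0.25 / np.sqrt((p_var + q_var) / 2.0)

def hellinger_sq_numeric(p_var, q_var):
    """H^2 = 1 - integral of sqrt(f g), by fine-grid Riemann summation."""
    x = np.linspace(-30, 30, 200001)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / (2 * p_var)) / np.sqrt(2 * np.pi * p_var)
    g = np.exp(-x**2 / (2 * q_var)) / np.sqrt(2 * np.pi * q_var)
    return 1.0 - np.sqrt(f * g).sum() * dx

print(hellinger_sq_closed(1.0, 2.5), hellinger_sq_numeric(1.0, 2.5))
```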
