Hedge Fund Replication: Putting the Pieces Together
Electronic copy available at: http://ssrn.com/abstract=2202270
Vincent Weber∗† and Florian Peres∗
Prime Capital AG
March 17, 2013
Abstract
In this paper, we aim to bring together into one common framework various advances in factor-
based hedge fund replication. Our replication methodology relies on a set of investable dynamic
risk factors extracted from futures contract prices and on an automatic variable and model
selection procedure. The methodology is then validated by creating out-of-sample replicating
portfolios for the monthly returns of more than 7,000 hedge funds ranging from 2006 to 2012
and under the assumption of transaction costs. Our results suggest that hedge fund replication
is on average possible and works best for liquid strategies.
∗ Address: Bockenheimer Landstr. 51-53, 60325, Frankfurt am Main, Germany.
† Corresponding author: [email protected].
1 Introduction
Whether the returns of hedge funds should be attributed to luck or skill has been widely debated
among both academics and practitioners. At the core of the literature related to this debate are
studies like Fung and Hsieh (2001, 2006) who suggest that hedge fund performance can be to a
significant extent attributed to alternative betas rather than skills. These findings led practitioners
to develop a number of investment strategies seeking to replicate hedge fund returns while avoiding
some of their common pitfalls such as high fees, a lack of transparency and restricted liquidity.
This paper focuses on factor-based replication which is to our knowledge the most widely applied
approach to hedge fund replication1. Factor-based replication expands on Sharpe’s return-based
style analysis2 by removing some of its constraints, such as having the style weights sum up to a
hundred percent and not allowing negative style weights.
We contribute to the field by putting together into one common framework various advances in
hedge fund replication:
1. like Hasanhodzic and Lo (2007), we estimate a large number of linear factor models using
returns of individual hedge funds and analyze the performance of those replicators;
2. we analyze hedge fund returns not only using asset class indices but also rule-based strategies
as proposed by Fung and Hsieh (2001);
3. we use model and variable selection techniques to find the best set of variables for a given
hedge fund as presented by Giamouridis and Paterlini (2009);
4. we employ only fully investable risk factors constructed from futures contract prices;
5. we attempt to incorporate transaction costs into the performance analysis and analyze metrics
of trading intensity;
6. we consider the time-varying nature of hedge fund risk exposures by using an Exponentially
Weighted Moving Average (EWMA) rolling window;
7. we use a large universe of more than 7,000 hedge funds ranging from 2006 to 2012, a period
covering both bull and bear markets.
To our knowledge, this is one of the most extensive hedge fund replication studies to date, due to
the large size of the analyzed dataset as well as due to the depth of the replication methodology
employed. Our results show that hedge fund replication is on average possible and works best
for liquid strategies. This study shows that hedge fund replication is a viable alternative to a
diversified portfolio of hedge funds and while it may not match the risk-adjusted returns of some
illiquid strategies, it comes with the additional benefits of transparency and liquidity.
The paper is organized as follows: in section 2 we introduce our approach to factor models, in
section 3 we present the design of our data analysis and in section 4 we discuss the results of that
data analysis. Finally, section 5 offers a brief summary with concluding remarks.
1Amenc et al. (2007) provide a comprehensive review of competing approaches such as the payoff distribution approach developed by Amin and Kat (2003).
2Sharpe (1992).
2 Methodology
2.1 Stylized facts
We provide some stylized facts about hedge fund returns: we show first that a linear factor model
can provide a good understanding of hedge fund returns and secondly that style classification labels
add little value in understanding the risk characteristics of hedge fund returns. The dataset of
hedge fund returns is extracted from a proprietary database provided by Prime Capital. A key
feature of this database is that it provides a consolidated representation of various commercial
(TASS, Hedgefund.net, Eurekahedge) and proprietary sources3. Additionally, a fund’s potential
various series and share classes are mapped into one unique leading series in order to avoid double
counting. Last but not least, we rely on Prime Capital’s proprietary style nomenclature as described
in Appendix A. The dataset is described in more detail later in section 3.
We apply Principal Component Analysis (PCA) to each style universe of hedge funds and look
at the dynamics of the top eigenvectors over time. Figure 1 shows over time and for each style,
the number of eigenvectors required to explain 2/3 of the cross-sectional variability of hedge fund
returns. It shows that a limited number of eigenvectors ranging between one and twelve is sufficient
to explain about 2/3 of the cross-sectional variability of hedge fund returns. Furthermore, these
numbers have been stable over time, despite the significant growth experienced by the hedge fund
industry. Another interesting feature which can be noted is that the number of eigenvectors required
to explain 2/3 of the cross-sectional variability tends to decrease during periods of financial stress
like 1998 and 2008: in other words, the returns of hedge funds become more correlated during
periods of financial stress.
Next, we would like to know to what extent we can rely on style classification labels to assign
risk factors to a given hedge fund. If style classification labels were informative, we could select
a specific subset of relevant risk factors for each style. This would allow us to optimize our
replication model at the style level instead of applying one set of factors to every hedge
fund, independently of its style. To assess this, we compare the style classification labels to the
correlation profiles of hedge funds and measure to what extent they match: for each month, we
cluster the hedge fund universe into five groups using the k-means4 clustering technique. This
algorithm partitions the funds into k = 5 clusters, corresponding to our number of styles. If
style labels were highly informative, then each style would likely be associated with one
particular cluster; if style labels were not informative at all, then each style would
likely be distributed evenly among the various clusters. In order to estimate the concentration of
the distribution of each style across the various clusters, we use the Shannon entropy
measure5.
3This includes hedge funds not reporting to commercial databases but to Prime Capital.
4A description of this clustering technique is provided in Appendix B.
5Shannon (1948).
Figure 1: A linear model can explain 2/3 of the variability of hedge fund returns with a limited set of factors.
Definition 1 (Shannon entropy) $N^i_{ent}$ is defined as the exponential of the Shannon entropy of style $i$, so that:
$$N^i_{ent} = \exp\left(-\sum_{n=1}^{5} p^i_n \ln p^i_n\right)$$
where $p^i_n$ is the proportion of hedge funds with style label $i$ assigned to cluster $n$ and:
• $1 \le N^i_{ent} \le 5$;
• $N^i_{ent} = 1 \Rightarrow$ very informative;
• $N^i_{ent} = 5 \Rightarrow$ no information at all.
Figure 2 shows the evolution over time of the entropy measure for the various hedge fund styles.
The value is consistently close to its maximum of five, showing that the styles are almost evenly
distributed among the various risk clusters, and hence that classification labels do not contain
meaningful information relative to the risk characteristics of hedge funds.
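For illustration, the entropy measure of Definition 1 can be sketched in a few lines of Python. This is a minimal sketch rather than the code used in the study: the function name `effective_clusters` and the cluster assignments are hypothetical, and the k-means step that would produce the assignments is not shown.

```python
import math
from collections import Counter


def effective_clusters(cluster_ids, k=5):
    """Exponential of the Shannon entropy of one style's spread over k clusters."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    entropy = 0.0
    for n in range(k):
        p = counts.get(n, 0) / total
        if p > 0:  # the term 0 * ln(0) is taken to be 0
            entropy -= p * math.log(p)
    return math.exp(entropy)


# Hypothetical cluster assignments for funds sharing one style label.
concentrated = [0] * 10           # every fund falls into the same cluster
spread_out = [0, 1, 2, 3, 4] * 2  # funds spread evenly over the five clusters

print(effective_clusters(concentrated))
print(effective_clusters(spread_out))
```

A style whose funds all fall into one cluster gives a value of one, while an even spread over the five clusters gives a value of five, matching the bounds stated in Definition 1.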
Figure 2: Entropy and clustering - Evolution over time.
To conclude, we can assert that hedge fund returns can be well captured by a linear factor
model, that a small number of risk factors can explain a large portion of the variability of hedge
fund returns and that style classification labels do not contain meaningful information relative to
the risk characteristics of hedge funds.
2.2 Factor models
Following Roncalli and Weisang (2011), we briefly review factor models and their applications
to hedge fund replication. Based on the work of Fung and Hsieh (2001) as an extension of Sharpe’s
style analysis, factor models were first introduced as a tool for performance analysis.
Assumption 1 The structure of all hedge fund returns can be summarized by a set of risk factors
ri, i = 1, ..., n.
While in Sharpe’s style analysis only asset class style indices are used as risk factors, factor-based
models for hedge fund analysis also make use of dynamic risk factors, which account for
the dynamic trading strategies employed by hedge funds, such as trend
following on various asset classes. Dynamic risk factors are simple rule-based trading strategies
which aim to capture some of the major styles to which hedge funds tend to be exposed. In
order to understand what replicators can and cannot capture, we first need to keep in mind the
decomposition of managerial skill into security selection and market timing. When hedge fund managers
trade, the correlation between their returns and those of our set of risk factors changes, which leads
to a periodic computation of new allocation weights. The idea of replicating a hedge fund is thus
to take long or short positions in a set of asset class indices and dynamic risk factors, so as
to minimize the tracking error between the replicator and its underlying hedge fund, which makes
hedge fund replication an iterative process. A typical replication procedure can thus be decomposed
into two steps:
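• step 1: factor model for hedge fund returns:
$$r^{HF}_{i,t} = \alpha_i + \sum_{k=1}^{n} \beta_{i,k}\, r_{k,t} + \varepsilon_{i,t}$$
where the beta exposures $\beta_{i,k}$ are multiplied by the risk premia $r_{k,t}$ associated with those exposures, $\alpha_i$ is the manager-specific alpha and $\varepsilon_{i,t}$ the statistical error.
• step 2: identification of the replicating portfolio strategy:
$$r^{replicator}_{i,t} = \sum_{k=1}^{n} \hat{\beta}_{i,k}\, r_{k,t}$$
where $\hat{\beta}_{i,k}$ are the values of $\beta_{i,k}$ which minimize the sum of squared errors $\sum_t \varepsilon_{i,t}^2$ of the previous model and are estimators of $\beta_{i,k}$.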
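The two steps above can be sketched as follows. This is a minimal illustration, assuming a plain OLS fit on an intercept and two made-up factor series; the helper names (`solve`, `fit_factor_model`, `replicator_return`) and all numbers are hypothetical and do not come from the paper, whose full procedure adds variable selection and weighting.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_factor_model(fund, factors):
    """Step 1: OLS of fund returns on an intercept and the factor returns.

    fund is a list of T returns; factors is a list of T rows of n factor returns.
    Returns (alpha, betas).
    """
    rows = [[1.0] + list(f) for f in factors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, fund)) for i in range(p)]
    coef = solve(xtx, xty)
    return coef[0], coef[1:]


def replicator_return(betas, factor_row):
    """Step 2: the replicator holds the estimated betas and leaves alpha out."""
    return sum(b * r for b, r in zip(betas, factor_row))


# Made-up monthly returns for two factors and a noiseless fund built from them.
factors = [[0.02, -0.01], [-0.01, 0.03], [0.03, 0.01], [0.00, -0.02], [0.01, 0.02]]
fund = [0.001 + 0.5 * f1 - 0.2 * f2 for f1, f2 in factors]

alpha, betas = fit_factor_model(fund, factors)
print(alpha, betas, replicator_return(betas, factors[0]))
```

Because the replicator drops the intercept, any manager-specific alpha in the fitted model is by construction not reproduced by the replicating portfolio.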
More specifically, our methodology is based on fully investable risk factors represented by listed
futures across asset classes. We consider asset class indices, such as stock market risk, duration
risk or USD risk, and rule-based strategies, such as momentum and carry across asset classes. We select the best subset
of risk factors using variable and model selection techniques: the Least Absolute Shrinkage and
Selection Operator (LASSO) computed with the Least Angle Regression (LAR), which is called LAR
LASSO, for variable selection, and the Bayesian Information Criterion (BIC) for model selection.
Furthermore, time-varying betas are taken into account by estimating a linear regression
with an Exponentially Weighted Moving Average (EWMA) weight vector. Finally, we
rigorously validate the model by replicating more than 25,000 years of hedge fund performance data.
This approach follows a five-step process which is described in the next subsections:
1. generation of risk factors;
2. adjustment of hedge fund returns for serial correlation;
3. variable selection;
4. estimation;
5. validation.
2.3 Risk factors
We create a number of asset class and dynamic risk factors out of a large universe of listed
futures6. Table 1 and Table 2 show the list of exchanges as well as the types of contracts used. The
risk factors are computed on a daily basis, with any potential rebalancing occurring at the market
closing price and with transaction costs as described in Appendix C. The rollover schedule used
to construct continuous futures price series is described in Appendix E. The sections below
describe the various types of risk factors employed. All futures data are obtained from Bloomberg.
6We aim to use an investment universe which is, based on our experience, similar to those used by institutional managed futures managers.
Table 1: Investment universe - Futures exchanges.
North America                | EMEA                          | Asia Pacific
CBOE Futures Exchange        | Eurex                         | Australian Securities Exchange
Chicago Board of Trade       | ICE Futures Europe            | Hong Kong Futures Exchange
Chicago Mercantile Exchange  | London Metal Exchange         | Korea Exchange
Commodity Exchange, Inc.     | NYSE LIFFE                    | Osaka Securities Exchange
ICE Futures US               | South African Futures Exchange | Singapore Exchange
Kansas City Board of Trade   |                               | Tokyo Financial Exchange
Montreal Exchange            |                               | Tokyo Stock Exchange
New York Mercantile Exchange |                               |
Table 2: Investment universe - Futures contracts.
Equity       | Bond        | Currency | Soft   | IR         | Metals    | Energy      | Volatility
CAC40        | Aust 3Yr    | AUD      | Cattle | AUD Bills  | Aluminium | Crude Oil   | VIX
DAX          | Aust 10Yr   | CAD      | Cocoa  | CAD Bills  | Copper    | Gasoline    |
DJIA         | Can 10Yr    | CHF      | Coffee | Euribor    | Gold      | Heating Oil |
EuroStoxx 50 | Euro Bobl   | EUR      | Corn   | Eurodollar | Lead      | Natural Gas |
FTSE 100     | Euro Bund   | GBP      | Cotton | Euroswiss  | Nickel    |             |
Hang Seng    | Euro Schatz | JPY      | Hogs   | Euroyen    | Silver    |             |
JSE Top 40   | JPY 10Yr    | KRW      | Soy    | Sterling   | Zinc      |             |
KOSPI2       | Long Gilt   | MXN      | Sugar  |            |           |             |
MSCI Taiwan  | US 2Yr      | NZD      | Wheat  |            |           |             |
NASDAQ 100   | US 5Yr      |          |        |            |           |             |
Nikkei 225   | US 10Yr     |          |        |            |           |             |
Russell 2000 | US Long     |          |        |            |           |             |
S&P500       |             |          |        |            |           |             |
TOPIX        |             |          |        |            |           |             |
2.3.1 Asset class factors
The technique employed to get an asset class factor portfolio is similar to the one presented by
Alexander and Dimitriu (2004). We first select a universe of markets (Equity, Currency, etc.),
then estimate the correlation matrix $\Omega$ of the assets’ returns and compute the eigenvector
of $\Omega$ associated with its largest eigenvalue. For each market, we compute volatility-adjusted weights $w^*_i = w_i/\sigma_i$,
where $w_i$ is the loading of asset $i$ on this eigenvector and $\sigma_i$ is the volatility of asset $i$.
Finally, we standardize the adjusted loadings such that $\sum_{i=1}^{n} w^*_i = 1$. The asset
class factors are summarized in Table 3.
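The construction of an asset class factor can be illustrated with the sketch below. It assumes a power iteration to obtain the leading eigenvector, which is a convenience of this example and not necessarily the numerical method used in the study; the correlation matrix, the volatilities and the function names are hypothetical.

```python
import math


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: eigenvector of the largest eigenvalue of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def asset_class_weights(corr, vols):
    """Volatility-adjusted loadings of the first principal component, summing to one."""
    loadings = leading_eigenvector(corr)
    adjusted = [w / s for w, s in zip(loadings, vols)]  # w*_i = w_i / sigma_i
    total = sum(adjusted)
    return [a / total for a in adjusted]


# Hypothetical correlation matrix and annualised volatilities for three bond futures.
corr = [[1.0, 0.8, 0.6],
        [0.8, 1.0, 0.7],
        [0.6, 0.7, 1.0]]
vols = [0.05, 0.06, 0.10]

weights = asset_class_weights(corr, vols)
print(weights)
```

With similar loadings on the leading eigenvector, the least volatile contract receives the largest weight, which is the intended effect of the volatility adjustment.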
Table 3: Asset classes.
Risk factor | Description                                                                 | Start date
DURATION    | Largest principal component of a basket of 10-year government bond futures | 31-Jan-2003
DXY         | Largest principal component of a basket of currency futures                | 31-Jan-2003
ES          | S&P500 futures, proxy for equity market risk                                | 31-Jan-2003
NATRES      | Largest principal component of a basket of base metals and energy futures  | 31-Jan-2003
PM          | Largest principal component of a basket of precious metals futures         | 31-Jan-2003
VIX         | 20% notional exposure to CBOE VIX futures                                   | 31-Mar-2007
Only the ES and VIX factors represent a direct static exposure to one underlying futures market.
In the case of the VIX factor, we use only 20% notional exposure to the underlying CBOE VIX futures
in order to scale the volatility of this factor to a magnitude similar to that of equity indices.
2.3.2 Market spreads
The methodology to construct market spread portfolios is based on dollar-neutral long/short
portfolios. We create regional and style spreads, for example the TOPIX vs. S&P500 spread
to represent the Japan country spread and the Russell 2000 vs. S&P500 spread to represent the
small cap stocks style spread. We summarize the market spread factors in Table 4.
Table 4: Market spreads.
Risk factor | Description                                                   | Start date
APAC        | Basket of Asia-Pacific index futures vs. S&P500 index futures | 31-Jan-2003
DM          | Dow Jones Industrial Average vs. S&P500 index futures         | 31-Jan-2003
EU          | Basket of European index futures vs. S&P500 index futures     | 31-Jan-2003
NQ          | NASDAQ vs. S&P500 index futures                               | 31-Jan-2003
RTA         | Russell 2000 vs. S&P500 index futures                         | 31-Jan-2003
TPX         | TOPIX vs. S&P500 index futures                                | 31-Jan-2003
2.3.3 Carry strategies
In order to construct portfolios of carry strategies, we choose to harvest roll yields across futures
markets by creating portfolios which are long markets in backwardation and short markets in contango.
We create carry risk factors for a global cross-asset set of instruments as well as for asset class specific
universes. The various carry factors are summarized in Table 5. Portfolio construction follows a
two-step process which is summarized below.
We initially compute for each market $i$ in the selected universe its expected carry-to-risk ratio $\kappa_i$
as shown in Figure 3, such that:
$$\kappa_i = \frac{dF_i/dt}{\sigma_i}$$
where $dF_i$ is the variation of the futures price of market $i$ and $\sigma_i$ the volatility of instrument $i$.
Next, we build a portfolio with a target volatility of 10% p.a., weighting each instrument according
to its carry-to-risk ratio and its volatility. Firstly, for each instrument $i$, we define the basic weights
$m_i$ such that:
$$m_i = W_i \times \frac{1}{\Theta_i}$$
where:
• $W_i$: weight of the futures category of instrument $i$ as shown in Appendix F;
• $\Theta_i$: total number of instruments in the same futures category as instrument $i$.
Secondly, we compute the vector $w$ of initial portfolio weights such that for each $i$:
$$w_i = m_i \times \frac{\kappa_i}{\sigma_i}$$
where $\sigma_i$ is the volatility of instrument $i$.
Finally, we compute the vector $f$ of target portfolio weights by rescaling the previously created
portfolio to the target volatility $\sigma_{target} = 10\%$ p.a., so that:
$$f = w\,\frac{\sigma_{target}}{\sigma_{init}}$$
where $\sigma_{init} = \sqrt{w^T \Omega\, w}$ and $\Omega$ is the expected covariance matrix7.
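The portfolio construction described above can be sketched as follows. All inputs are hypothetical: the category weights stand in for those of Appendix F, which is not reproduced here, and the covariance matrix is built from assumed volatilities and correlations rather than estimated by exponential smoothing.

```python
import math


def target_weights(category_weight, category_size, signal, vol, cov, target_vol=0.10):
    """Rescale signal-weighted positions to a target portfolio volatility.

    category_weight, category_size, signal and vol are per-instrument lists;
    cov is the covariance matrix of instrument returns.
    """
    n = len(signal)
    m = [category_weight[i] / category_size[i] for i in range(n)]  # m_i = W_i / Theta_i
    w = [m[i] * signal[i] / vol[i] for i in range(n)]              # w_i = m_i * kappa_i / sigma_i
    var_init = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    scale = target_vol / math.sqrt(var_init)                       # sigma_target / sigma_init
    return [scale * x for x in w]


# Hypothetical universe: two instruments sharing one category, one in its own category.
vol = [0.20, 0.10, 0.15]
corr = [[1.0, 0.3, 0.1],
        [0.3, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
cov = [[corr[i][j] * vol[i] * vol[j] for j in range(3)] for i in range(3)]
kappa = [0.5, -0.2, 0.1]  # assumed carry-to-risk ratios

f = target_weights([0.5, 0.5, 1.0], [2, 2, 1], kappa, vol, cov)
port_vol = math.sqrt(sum(f[i] * cov[i][j] * f[j] for i in range(3) for j in range(3)))
print(f, port_vol)
```

Instruments with a positive carry signal end up long and those with a negative signal short, while the rescaling brings the ex-ante portfolio volatility to the 10% target.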
For equity indices, instead of the approach presented above, we decide to use the risk parity
strategy EQRP, which creates an equal risk contribution portfolio of the various equity futures
coupled with an overall target volatility of 10% p.a. The rationale behind this decision is
twofold. Firstly, not all equity indices are net of dividends, making the futures term structure a
poor indicator of the actual carry that can be obtained by holding the underlying stocks. Secondly,
stocks’ earnings are not entirely paid as dividends but are also reinvested or used to buy back
outstanding shares, thus making the concept of carry more challenging to define for stocks than
for other markets such as Currency or Bond8. Therefore, for simplicity, we assume an
identical positive carry for each equity index, which leads us to the construction of the EQRP risk
factor.
Table 5: Carry strategies.
Risk factor | Description                                                  | Start date
COC         | Curve slide strategy applied to commodity futures           | 31-Jan-2003
FIC         | Curve slide strategy applied to fixed income futures        | 31-Jan-2003
FXC         | Curve slide strategy applied to currency futures            | 31-Jan-2003
GLC         | Curve slide strategy applied to a global universe of futures | 31-Jan-2003
VA          | Beta hedged curve slide strategy applied to VIX futures9    | 31-Mar-2007
EQRP        | Risk parity strategy applied to a set of equity futures     | 31-Jan-2003
7We estimate the covariance matrix Ω using the exponential smoothing approach presented by RiskMetrics (1996) and applying a decay factor of 0.97. Furthermore, we consider the fact that we are dealing with heterogeneous time zones by using the adjustment method suggested by Bouchaud and Potters (2003).
8We refer the reader to Ilmanen (2011) for an extensive review of carry strategies across asset classes.
9We refer the reader to Simon and Campasano (2012) who studied the VIX futures basis and present a similar strategy.
Figure 3: Roll yield measurement.
2.3.4 Momentum strategies
We proceed in two steps to generate momentum risk factors. We first measure the trend strength
of each market and then construct a portfolio using the same approach as the one used for the set
of carry risk factors. Instead of replicating positions in lookback straddles like Fung and Hsieh
(2001), we employ an approach more commonly used by practitioners as presented by Burghardt
et al. (2010).
Definition 2 (Trend strength measurement) For each continuous futures price series $i$, we
define the trend strength $TS$ such that:
$$TS = \mathrm{MovingAverage}_{10\,days}(P^*) - \mathrm{MovingAverage}_{250\,days}(P^*)$$
where $P^*$ is the standardized price vector such that $0 \le P^*_i \le 1$ for each $i$ in $P^*$.
We subsequently compute a portfolio with a target volatility of σp = 10% p.a. by using the same
portfolio construction methodology as the one previously described for the carry strategies10.
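As an illustration of Definition 2, the sketch below computes the trend strength of a single price series. The text does not specify how prices are standardized to the unit interval, so the min-max rescaling over the slow window used here is an assumption of this example, as are the function names and the synthetic price paths.

```python
def moving_average(values, window):
    """Average of the last `window` entries of a list."""
    return sum(values[-window:]) / window


def trend_strength(prices, fast=10, slow=250):
    """Fast minus slow moving average of prices rescaled to [0, 1].

    Assumption: P* is a min-max rescaling over the last `slow` observations.
    """
    window = prices[-slow:]
    if len(window) < slow:
        raise ValueError("need at least `slow` prices")
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.0  # flat series: no trend
    scaled = [(p - lo) / (hi - lo) for p in window]
    return moving_average(scaled, fast) - moving_average(scaled, slow)


# Synthetic price paths: a steady rise and its mirror image.
rising = [100.0 + i for i in range(250)]
falling = list(reversed(rising))

print(trend_strength(rising), trend_strength(falling))
```

A steadily rising series yields a positive signal and its mirror image an equal negative one, so the indicator can replace the carry signal in the portfolio construction described earlier.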
The choice of the parameters for the moving average crossover corresponds to a representative
parametrization of a medium-term trend following strategy and leads to risk characteristics similar
to those of a diversified index of trend followers such as the Newedge CTA Trend Sub-Index11. For
illustration purposes, Figure 4 shows the Sortino12 ratio of the global trend following strategy for
different values of its two parameters, the slow and the fast moving averages. One can see that the
region around the values (10, 250) is very stable, hence suggesting that we have not over-optimized
the strategy. From a practitioner’s point of view, the stability of the Sortino ratio around this
particular region might also indicate why hedge funds focusing on trend following tend to produce
similar risk-adjusted return profiles as shown by Burghardt et al. (2010).
10One only needs to replace the vector of carry signals κ with the vector of trend strength indicators TS.
11Burghardt et al. (2010).
12Sortino (1994).
Figure 4: Sortino Ratio.
We create momentum risk factors for a global set of asset classes as well as for asset class specific
universes. The momentum factors are summarized in Table 6.
Table 6: Momentum strategies.
Risk factor | Description                                                | Start date
COT         | Momentum strategy applied to commodity futures            | 31-Jan-2003
EQT         | Momentum strategy applied to equity index futures         | 31-Jan-2003
FIT         | Momentum strategy applied to fixed income futures         | 31-Jan-2003
FXT         | Momentum strategy applied to currency futures             | 31-Jan-2003
GLT         | Momentum strategy applied to a global universe of futures | 31-Jan-2003
2.3.5 Currency crosses
Currency crosses risk factors represent long positions in currency futures. The aim is to capture
hedge funds’ unhedged currency exposures as well as specific macro bets. We summarize the
currency crosses factors in Table 7.
Table 7: Currency crosses.
Risk factor | Description              | Start date
AUD/USD     | AUD/USD currency futures | 31-Jan-2003
CHF/USD     | CHF/USD currency futures | 31-Jan-2003
EUR/USD     | EUR/USD currency futures | 31-Jan-2003
GBP/USD     | GBP/USD currency futures | 31-Jan-2003
JPY/USD     | JPY/USD currency futures | 31-Jan-2003
2.3.6 High yield proxy
For high yield proxies, we attempt to replicate the risk-return profile of a high yield bond index
with liquid instruments. We record the duration as well as the equity beta component of a typical
high yield index (Duration ≈ 4 and $\beta_{Equity}$ ≈ 0.4) and create a portfolio of bond and stock index
futures with the same risk characteristics. The high yield proxy factors are summarized in Table 8.
Table 8: High yield proxies.
Risk factor | Description                                                            | Start date
HYEU        | Proxy European HY factors (European indices and German Govt. futures) | 31-Jan-2003
HYUS        | Proxy US HY factors (S&P500 and US Treasury futures)                   | 31-Jan-2003
Figure 5: Value Added Daily Index (VADI) of a diversified set of risk factors.
Finally, Figure 5 shows the Value Added Daily Index (VADI) of the generated risk factors.
2.4 Estimation method
2.4.1 Liquidity adjustment
The illiquidity of certain hedge funds’ holdings can artificially reduce the volatility of their
reported Net Asset Value (NAV). To correct this bias we use the method presented by Geltner
(1991) which was initially applied to correct the returns in real estate indices. The aim of the
Geltner method is to reduce the first-order correlation in a time series of returns.
Assumption 2 The return observed at time t is a linear combination of the true return at
t and the return observed at t − 1.
We then estimate the first-order autocorrelation $\alpha$ and recover the true series of returns:
$$R_t = \frac{R^*_t - \alpha R^*_{t-1}}{1 - \alpha}$$
where $R^*_t$ is the return observed at $t$.
In our subsequent analysis, for each month where we estimate a factor model for a particular
hedge fund, we estimate the first-order autocorrelation using a sample of 36 monthly observations and
restrict it to positive values.
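The liquidity adjustment can be sketched as below. This is a minimal illustration with hypothetical function names and made-up returns: the autocorrelation estimator is the usual sample estimator, which the text does not spell out, and negative estimates are floored at zero as described above.

```python
def estimate_alpha(observed):
    """Sample first-order autocorrelation, floored at zero."""
    mean = sum(observed) / len(observed)
    dev = [r - mean for r in observed]
    num = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    den = sum(d * d for d in dev)
    rho = num / den if den > 0 else 0.0
    return max(rho, 0.0)


def unsmooth(observed, alpha):
    """Apply R_t = (R*_t - alpha * R*_{t-1}) / (1 - alpha) from the second observation on."""
    return [
        (observed[t] - alpha * observed[t - 1]) / (1.0 - alpha)
        for t in range(1, len(observed))
    ]


# Made-up true returns, smoothed with a known coefficient to mimic stale pricing.
true_returns = [0.010, -0.020, 0.030, 0.000, 0.015]
a = 0.4
observed = [true_returns[0]]
for r in true_returns[1:]:
    observed.append((1.0 - a) * r + a * observed[-1])

recovered = unsmooth(observed, a)
print(recovered)
```

When the smoothing coefficient is known, the adjustment recovers the true returns exactly; in practice the coefficient is estimated, so the recovery is only approximate. The first observation is lost to the lag, which is why one additional return is needed.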
2.4.2 Variable and model selection
We now describe the variable and model selection employed in this analysis13. We also explain
the steps which guided our choice among several regularization methods to
estimate the most significant betas and decrease computational costs.
As previously mentioned, variable selection allows us to select for each fund to be replicated the
best subset of potential risk factors. Variable selection is relevant to hedge fund replication because
a fund’s return can be potentially affected by a very large universe of risk factors. Additionally,
because our estimation problem is often restricted to a limited number of observations14, we need
to restrict the number of risk factors which we can regress against the fund’s returns.
Our starting point is the Ordinary Least Squares (OLS) linear regression.
13The description of the various statistical methods presented in this paper is taken from Hastie et al. (2008).
14We work with monthly hedge fund returns; hence, for example, a fund with an already relatively long track record of 5 years will only provide us with 60 observations.
Definition 3 (Ordinary Least Squares (OLS)) The OLS linear regression is a popular estimation
method in which we pick the coefficients $\beta = (\beta_1, \beta_2, ..., \beta_p)^T$, the parameters to be estimated,
to minimize the Residual Sum of Squares (RSS):
$$RSS(\beta) = \sum_{i=1}^{n} \Big(y_i - \sum_{j=1}^{p} x_{i,j}\beta_j\Big)^2$$
so that:
$$\hat{\beta}^{OLS} = \arg\min_{\beta} \sum_{i=1}^{n} \Big(y_i - \sum_{j=1}^{p} x_{i,j}\beta_j\Big)^2$$
where:
• $y = (y_1, ..., y_n)^T$, $(n \times 1)$ vector of observations (dependent variable);
• $X = [x_{i,j}]$, $(n \times p)$ matrix of observations (independent variables).
However, the practical implementation of the linear regression remains challenging. For example,
what happens if the variables in X are highly correlated? The matrix $X^T X$ may be nearly singular.
Do the OLS estimates have good prediction accuracy? They have a low bias
but potentially a high variance. How many factors should be used for the regression? Which
factors should be chosen? How can we boost the out-of-sample results? Many issues are raised
but, at the same time, there exists a wide range of potential solutions. Our goal is to find the
best combination of methodologies to match hedge fund returns out-of-sample,
improve factor selection and estimate an accurate set of betas.
A possible solution is to use regularization methods such as $L_q$-norm regularization or
continuous shrinkage estimators. The main idea behind these methods is to control the variability
of the estimator by regularization: we shrink the betas, accepting a small increase in bias in exchange
for a reduction in variance, such that we
get a sparse model, i.e. a model with few independent variables. We achieve this objective by
adding an $L_1$ penalty to the OLS problem. This technique is called the Least Absolute Shrinkage and
Selection Operator (LASSO)15. The LASSO is a regularized version of the OLS problem which
uses the additional constraint that the $L_1$-norm of the parameter vector is no greater than a
given value.
Definition 4 (Least Absolute Shrinkage and Selection Operator (LASSO)) The estimation
of the betas using the LASSO is given by the following formula:
$$\hat{\beta}^{LASSO} = \arg\min_{\beta} \left\{ \frac{1}{2}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{i,j}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$
where $\lambda$ is a positive complexity parameter that controls the amount of shrinkage, which is equivalent
to:
$$\hat{\beta}^{LASSO} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} x_{i,j}\beta_j\Big)^2$$
subject to $\sum_{j=1}^{p} |\beta_j| \le t$. There is a one-to-one correspondence between both parameters $\lambda$ and $t$.
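To make the effect of the $L_1$ penalty concrete, the sketch below minimizes the penalized objective of Definition 4 by cyclic coordinate descent with soft-thresholding. Coordinate descent is used here only because it is short to write down; it is a different algorithm from the LAR procedure adopted in this paper, and the data and function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(x, y, lam, sweeps=200):
    """Minimise 0.5 * sum_i (y_i - sum_j x_ij b_j)^2 + lam * sum_j |b_j|.

    x is a list of n rows of p centred predictors; y is a list of n centred responses.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    col_ss = [sum(row[j] ** 2 for row in x) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of predictor j with the residual excluding its own contribution.
            rho = sum(
                x[i][j] * (y[i] - sum(x[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
    return beta


# Orthogonal toy design: the response loads strongly on factor 1 and weakly on factor 2.
x = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0 * a + 0.1 * b for a, b in x]

print(lasso_coordinate_descent(x, y, lam=0.0))  # no penalty: the OLS solution
print(lasso_coordinate_descent(x, y, lam=1.0))  # penalty shrinks and selects
```

Without a penalty the OLS coefficients are recovered, whereas with a penalty of one the weak factor is set exactly to zero and the strong factor is shrunk, which is the variable selection behaviour exploited in this paper.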
15We refer the reader to Pedregosa et al. (2011) for the LASSO implementation.
The intuition behind our choice of the LASSO is linked to the fact that we need automatic model
building while aiming to get good prediction accuracy with a sparse representation of the predictors
in the model. We need to control the complexity of the fitted model and the method we apply to
realize this strategy is based on the control of the variability of the estimator by regularization.
There are pros and cons attached to the LASSO methodology. On the one hand,
the choice of the regularization parameter λ can be difficult. The optimization
problem is also non-differentiable and in general there is no closed-form solution for the global
minimum. On the other hand, the LASSO performs both variable selection and model estimation in
a single step. It also yields sparse, stable and easily interpretable models with less extreme estimates.
Finally, the LASSO tends to improve the prediction accuracy by stabilizing the parameter
estimates.
The LASSO estimation problem may be solved using quadratic programming or more
general convex optimization methods, as well as by specific algorithms such as the Least Angle
Regression (LAR). We have chosen to implement the LAR algorithm because it is an extremely
efficient algorithm for computing the entire path of potential LASSO solutions: the LAR
yields the entire LASSO solution path at roughly the computational cost of a single OLS fit. The LAR
LASSO algorithm is described hereafter:
1. standardize the predictors to have mean zero and unit norm. Start with the residual $\varepsilon = y - \hat{y}$,
all betas being zero and $\hat{y}$ being the fitted values, or predicted values, from the regression;
2. find the predictor $x_j$ most correlated with $\varepsilon$. For the next step, we need to define the
notation $\langle \cdot , \cdot \rangle$. In convenient vector notation, we let $y = (y_1, ..., y_n)^T$, $x = (x_1, ..., x_n)^T$ and
define:
$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i = x^T y$$
3. move $\beta_j$ from 0 towards its least-squares coefficient $\langle x_j, \varepsilon \rangle$ until some other competitor $x_k$
has as much correlation with the current residual as does $x_j$;
4. move $\beta_j$ and $\beta_k$ in the direction defined by their joint least-squares coefficient of the current
residual on $(x_j, x_k)$ until some other competitor $x_l$ has as much correlation with the current
residual;
5. if a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute
the current joint least-squares direction;
6. continue in this way until all $p$ predictors have been entered. After $\min(n - 1, p)$ steps we
arrive at the full least-squares solution.
The final number of selected factors depends on the regularization parameter λ: the larger λ,
the fewer factors are selected, and vice versa. The optimal
λ may be chosen using the Bayesian Information Criterion (BIC), a criterion which balances
model fit against complexity. The BIC is particularly well-suited for linear regression
models, given their explicit number of free parameters. Therefore, we select λ so as to minimize
the BIC.
Definition 5 (Bayesian Information Criterion (BIC)) The BIC is a criterion for model selection
among a finite set of models given by the following formula:
$$BIC = \ln(\sigma_e^2) + \lambda \ln(n)$$
with:
• $\sigma_e^2$: error variance;
• $\lambda$: number of free parameters to be estimated;
• $n$: number of observations or sample size.
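Model selection along a LASSO path can be illustrated as follows. The sketch assumes the common Gaussian form of the criterion, $n \ln(RSS/n) + k \ln(n)$, whose first term is scaled differently from the expression in Definition 5; that choice, the function names and the candidate models are assumptions of this example. The model with the lowest value is retained.

```python
import math


def bic(rss, n_obs, n_params):
    """Gaussian BIC up to an additive constant: n * ln(RSS / n) + k * ln(n)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """candidates is a list of (label, rss, n_params); returns the label with the lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], n_obs, c[2]))[0]


# Hypothetical points on a LASSO path fitted to 36 monthly observations.
path = [
    ("1 factor", 0.0300, 1),
    ("3 factors", 0.0120, 3),
    ("8 factors", 0.0110, 8),
]

print(select_by_bic(path, n_obs=36))
```

In this made-up example, moving from one to three factors improves the fit enough to justify the extra parameters, while the marginal gain from eight factors does not offset the complexity penalty.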
2.4.3 Weighted Least Squares (WLS)
In the previous part, we explained our methodology for variable selection which is based on
LASSO, LAR and BIC. These methods enable us to select a set of factors r, while we rely on WLS
coupled with an EWMA weight vector to estimate the actual model parameters used for replication.
The reasons for this second estimation step are twofold. Firstly, the LASSO is not scale-invariant
and so requires us to standardize the risk factors, which are the regression’s explanatory variables.
As a consequence, the resulting estimated regression model cannot be translated into an investable
portfolio. Secondly, the WLS method coupled with the EWMA weight vector allows us to better
capture the time-varying nature of hedge fund risk exposures while still being straightforward and
fast to calibrate.
We implement the WLS by first transforming the dataset as shown in Definition 6 and then by
applying OLS to the transformed dataset.
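Definition 6 (Exponentially Weighted Moving Average (EWMA)) EWMA is computed
as described hereafter:
$$r_i,\ i = 1, \ldots, m \;\xrightarrow{\text{EWMA}}\; \sqrt{\frac{1-\tau}{1-\tau^{T+1}}} \begin{pmatrix} r_{i,t} \\ \sqrt{\tau}\, r_{i,t-1} \\ \vdots \\ \sqrt{\tau^{T}}\, r_{i,t-T} \end{pmatrix}$$
where τ is the decay constant.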
For the decay constant τ , we use a value of 0.97 as suggested by Fleming et al. (2001).
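The transformation of Definition 6 followed by OLS can be sketched as follows. This is a simplified illustration under stated assumptions: a single regressor without intercept, toy data, and a decay constant of 0.5 in the example (instead of 0.97) so that the effect is visible on six observations. All function names are ours.

```python
import math

def ewma_sqrt_weights(T: int, tau: float = 0.97):
    """Square-root EWMA weights for lags 0..T (most recent observation first)."""
    scale = (1.0 - tau) / (1.0 - tau ** (T + 1))
    return [math.sqrt(scale * tau ** k) for k in range(T + 1)]

def ewma_transform(series, tau: float = 0.97):
    """Scale a series ordered as r_t, r_{t-1}, ..., r_{t-T} by the weights."""
    w = ewma_sqrt_weights(len(series) - 1, tau)
    return [wk * rk for wk, rk in zip(w, series)]

def ols_slope_no_intercept(x, y):
    """OLS slope of y on a single regressor x, without intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Toy data, most recent month first: the fund's exposure to the factor
# is 0.8 over the three most recent months and 0.2 over the three oldest.
factor = [0.01, -0.02, 0.015, 0.03, -0.01, 0.02]
fund = [0.8 * f for f in factor[:3]] + [0.2 * f for f in factor[3:]]

# WLS = OLS on the EWMA-transformed data (tau = 0.5 for illustration only).
beta_wls = ols_slope_no_intercept(ewma_transform(factor, 0.5),
                                  ewma_transform(fund, 0.5))
beta_ols = ols_slope_no_intercept(factor, fund)
```

Because the squared weights sum to one and decay with the lag, the weighted estimate lies closer to the recent exposure of 0.8 than the unweighted one.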
2.4.4 Timeline for estimation and trading decisions
Figure 6 shows the sequence of actions performed to calibrate each factor model and calculate the
performance of the respective hedge fund replicator. While the OLS estimation requires 36 months
of observations, each step needs 37 observations because the estimation of liquidity-adjusted returns
relies on lagged returns and hence requires one additional observation.
Figure 6: Out-of-sample estimation.
2.5 Discussion
2.5.1 Comparison of other regularization and variable selection methods
In this section, we discuss other regularization and variable selection methods in order to justify
some of the methodological choices we made, following closely Hastie et al. (2008). Two other
popular regularization methods are the Ridge Regression and the Elastic Net methods. The first
one relies on an L2 penalty while the second one combines both an L1 and an L2 penalty. The
Ridge Regression reduces the variance of the estimates by shrinking the coefficients, which improves
prediction accuracy at the cost of some bias. The coefficients are driven toward zero but are never
exactly equal to zero, which makes this approach inappropriate for variable selection and hence for
our purposes. The Elastic Net method has the characteristics of both the LASSO and the Ridge
Regression and so enables us to select variables and to shrink the coefficients at the same time.
However, since we are mostly interested in variable selection, it is more efficient for us to use
the LASSO which relies only on the L1 penalty.
Another popular variable selection method is the Stepwise Regression, which is a combination
of forward selection and backward elimination. At each step, the algorithm adds the best remaining
variable according to a confidence criterion and then the variables previously added in the regression
are tested to check if they can be removed, always according to a defined confidence criterion. This
loop keeps running until no variables are added or removed. As described before, we notice that
this process depends on two pre-defined confidence thresholds. This method is most likely to select
a subset of variables close to the best one rather than the best subset of variables. Since our focus
is on automatic model building and less on hypothesis testing, we do not see any benefits in using
this method instead of the LASSO.
To put it in a nutshell, we choose the LAR LASSO method because of the sparsity of its results,
its appropriateness for automatic model building and its computational efficiency.
2.5.2 Comparison of other estimation methods
We discuss the choice of our final estimation method, the WLS regression, coupled with an
EWMA weight vector. As previously mentioned, the reasons for a second estimation step following
the LAR LASSO procedure coupled with a BIC model selection are twofold. Firstly, the LASSO
method is not scale-invariant and requires us to normalize the explanatory variables. For that
purpose, we normalize the explanatory variables using the in-sample variance. Hence, the resulting
regression model cannot be interpreted as a replication portfolio as it relies on non-investable
normalized variables. A proxy method which could have enabled us to directly use the regression
model provided by the LAR LASSO procedure would have required us to normalize all the risk
factors to a common volatility estimated on a rolling window basis. However, we did not want
to force our factor model to employ only variables with a constant target volatility: while this is
appropriate for exposure to certain risk factors like the trend following strategy, we find it less
pertinent to model the exposure to equity indices, as those tend to have time-varying volatilities.
Secondly, the final estimation method needs to capture the time-varying dynamics of hedge funds’
risk exposures. A natural approach for that purpose would be to employ state-space modeling
techniques like the Kalman Filter, as presented by Roncalli et al. (2011). However, we did not
find this approach practical for our purposes. On the one hand, since the LAR LASSO is regularly
changing the set of variables, we are not able to harvest the full computational efficiency of the
recursive Kalman algorithm, as we often need to re-initialize it. On the other hand and more
importantly, we found the calibration of the filter’s noise covariance matrices to be challenging. We
relied on a Maximum Likelihood (ML) estimation method as suggested by Roncalli et al. (2011)
and tried to estimate the parameters at each step. We found this approach not only to be very time-
consuming but also to be fraught with convergence issues, hence making it inappropriate for large-
scale automatic empirical testing. Finally, we decided to use the WLS procedure with an EWMA
weight vector. The OLS method is fast to implement and the rolling window implementation
coupled with exponential smoothing allows us to capture to a good extent the time-varying nature
of hedge funds’ risk exposures. Additionally, the reliance on the one decay constant τ allows us to
easily adapt the effective size of our estimation window.
3 Data and simulation
3.1 Data
We validate our model on a large dataset of monthly hedge fund returns. We use the proprietary
database provided by Prime Capital which we described previously. Figure 7 shows the number
of active hedge funds16 in the database for a given date. This number grows to above 10,000
before the 2008 crisis and subsequently decreases to about 6,000. The collapse of this number, as
we near the last recorded month of November 2012, simply reflects the fact that as of the end of
December 2012, when this figure was produced, not all active hedge funds had yet submitted
their performance data to the various databases. Appendix G describes the various technical tools
and data employed to perform this analysis.
Figure 7: Evolution of the number of active funds in Prime Capital’s hedge fund database.
Figure 8 shows the validation dataset, split into styles. We show the number of funds, the
average number of replicated months and the average Sharpe ratio17 per style. Our study covers
the period from April 2006 to November 2012, which allows us to gauge the robustness
of our replication process over a long period that includes several crises and other trends. We
also show the split per currency, as we limit our analysis to the following major currencies: the
Australian Dollar (AUD), Swiss Franc (CHF), Euro (EUR), British Pound (GBP) and the
United States Dollar (USD). We excluded from the analysis hedge funds denominated in Brazilian
Real (BRL) as we found the underlying data quality to be quite poor. We should also note that
each hedge fund replication is first performed on the fund’s excess returns above its money market
benchmark (e.g. 1M USD Libor for USD denominated funds) before we subsequently add back to
the performance of the replicator the performance of that benchmark less a spread.
16We define an active fund as a fund reporting a performance figure.
17Sharpe (1994).
Table 9 and Table 10 show the numbers corresponding to Figure 8.
Table 9: Model validation - Summary statistics (1).
Replication dataset
Strategy | Number of funds | Avg. number of replicated months | Avg. Sharpe Ratio
FoHF | 2104 | 49.68 | -0.18
Equity Hedge | 2178 | 46.99 | 0.12
Event Driven | 452 | 44.99 | 0.25
Multi-Strategy | 563 | 44.39 | 0.13
Relative Value | 704 | 44.99 | 0.48
Tactical Trading | 1194 | 50.51 | 0.16
Table 10: Model validation - Summary statistics (2).
Replication dataset
Currency Number of funds
AUD 120
CHF 78
EUR 1093
GBP 91
USD 6167
3.2 Settings
Table 11 introduces some of the assumptions made to calculate the performance of hedge fund
replicators as well as some of the filters applied to detect and eliminate outliers in our universe of
hedge funds.
Table 11: Simulation settings.
Settings | Parameters
Maximum number of risk factors | 5
Cash collateral return | max (Libor − 30bps, 0bps)
Implementation lag | 10 business days
Decay factor τ | 0.97 as suggested by Fleming et al. (2001)
Minimum start date for funds to be replicated | March 2003
Length of estimation sample | 36 months + 1 month for autocorrelation adjustment
First replicated month | April 2006
Last replicated month | November 2012
Outlier | Only zero returns
Outlier | Volatility p.a. > 50%
Outlier | Abs(Sharpe Ratio) > 4
Outlier | Corr(HF, Replicator) > 0.999
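The cash collateral rule and the outlier filters of Table 11 can be expressed in a few lines. The sketch below assumes that the fund statistics (annualized volatility, Sharpe ratio, correlation to the replicator) have already been computed elsewhere, and it reads "Only zero returns" as a fund whose reported returns are all zero; the function names are hypothetical.

```python
def cash_collateral_return(libor: float, spread: float = 0.0030) -> float:
    """Annualized collateral return: max(Libor - 30bps, 0bps), as in Table 11."""
    return max(libor - spread, 0.0)

def is_outlier(all_zero_returns: bool, vol_pa: float, sharpe: float,
               corr_to_replicator: float) -> bool:
    """Apply the four outlier rules of Table 11 to precomputed statistics."""
    return (all_zero_returns
            or vol_pa > 0.50
            or abs(sharpe) > 4.0
            or corr_to_replicator > 0.999)
```

For instance, with Libor at 0.20% the collateral earns nothing, and a fund with 60% annualized volatility is excluded from the universe.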
4 Simulation results
4.1 Statistics
Figure 9 provides summary statistics of the simulation results, with the associated values high-
lighted in Table 12.
Table 12: Simulation results - Summary statistics.
Summary statistics
Strategy | Avg. Sharpe Ratio (Funds) | Avg. Sharpe Ratio (Replicators) | Avg. Bias Ratio (Funds) | Avg. Bias Ratio (Replicators) | Avg. correlation (Replicator to fund) | Avg. beta (Replicator to fund)
FoHF | -0.18 | 0.23 | 0.74 | 1.17 | 0.56 | 0.62
Equity Hedge | 0.12 | 0.22 | 1.10 | 1.15 | 0.50 | 0.57
Event Driven | 0.25 | 0.14 | 1.27 | 1.04 | 0.38 | 0.47
Multi-Strategy | 0.13 | 0.25 | 1.15 | 1.19 | 0.44 | 0.53
Relative Value | 0.48 | 0.14 | 1.77 | 1.14 | 0.24 | 0.26
Tactical Trading | 0.16 | 0.32 | 1.18 | 1.31 | 0.32 | 0.39
4.2 Summary statistics
A summary plot of the statistics of the simulation results is shown in Figure 9, with the asso-
ciated values presented in Table 12. It compares the performance of funds and replicators
using the Sharpe ratio, which takes into account the volatilities of the different strategies.
In four out of six cases, the average Sharpe ratio of the replicators is higher than that of the
funds: Fund of Hedge Funds (0.23 vs. -0.18), Tactical Trading (0.32 vs. 0.16), Equity
Hedge (0.22 vs. 0.12) and Multi-Strategy (0.25 vs. 0.13). The gaps are most noticeable for
Fund of Hedge Funds and Tactical Trading. For the Event Driven and Relative Value
styles, the average Sharpe ratio is lower for the replicators than for the respective funds, with
respectively 0.14 vs. 0.25 and 0.14 vs. 0.48; the gap is particularly large for the Relative Value
style.
These results can be linked to some of the hypotheses raised by Hasanhodzic and Lo (2007): a
considerable part of the returns of the Event Driven style may be linked to the illiquidity premium
managers earn while providing capital in times of distress and which cannot be replicated with a
portfolio of liquid securities, explaining part of the underperformance we get in the measurement
of the average Sharpe ratio for this strategy. In the case of the Equity Hedge and Tactical Trading
styles, we can expect replicators to provide performance results closer to that of the underlying
hedge funds given the more liquid characteristics of their portfolios compared to the Event Driven
managers.
4.3 Factor exposures
For each style, we compare the risk and return characteristics of the replicators against those of
the replicated funds. Table 12 shows a comparison of the Sharpe ratios. We see that apart from
the Relative Value and Event Driven styles, the replicators tend to perform slightly better than
the actual funds. We also compare the Bias Ratio of the replicators against that of the funds:
apart from Relative Value, Event Driven and FoHF, hedge funds and replicators have similar Bias
Ratios. The Bias Ratio is an indicator used to detect valuation bias or price manipulation in a
portfolio of assets18. Relative Value and Event Driven hedge funds have a higher Bias Ratio than their
respective replicators, likely because of the illiquidity of some of their underlying assets, which are
hard to value and whose pricing might rely only on a limited number of brokers’ quotes.
Figures 10, 12, 14, 16, 18 and 20 show the cumulative performance over time of replicators vs.
actual hedge funds for a given style, as well as the number of replicated funds for any given month.
Again, the collapse in the number of replicated funds near November 2012 simply reflects the
amount of performance data available as of the writing of this paper. We can observe that, for each
style, the performance of replicators and actual funds share a strong common trend. For Tactical
Trading in particular, the most liquid strategy, the performance dynamic of replicators and hedge
funds is extremely close.
Figures 11, 13, 15, 17, 19 and 21 show for each strategy the evolution over time of some of
the most frequently selected risk factors in terms of frequency and average beta. These figures
also show the corresponding average values over the whole sample of replicated months. We can
notice on the one hand that, apart from the Tactical Trading style, EQRP (Equity Risk Parity)
happens to be the most frequent and significant factor. This factor describes a diversified long
portfolio of equity indices with a constant target volatility of 10% which introduces a pro-cyclical
money management mechanism, reducing risk exposure during poor performing periods for equity
markets when volatility tends to rise and increasing risk exposure during periods of low volatility
when equity markets tend to perform well. This behavior is similar to what can be
observed among hedge fund managers, who tend to act in a pro-cyclical fashion, often because of
risk management rules. On the other hand, the most frequent and significant factors for the
Tactical Trading replicators are momentum factors, starting with GLT, COT and FXT (Global,
Commodities and Currency trend following). Again, these results are not surprising since trend
following is a strategy very often employed by Tactical Trading managers. These results are also
summarized in Table 13 and Table 14.
18The Bias Ratio is described in detail in Appendix D.
Table 13: Summary statistics - Frequency.
Risk factors - frequency
Factors FoHF Equity Hedge Event Driven Multi-Strategy Relative Value Tactical Trading
EQRP 0.7602 0.6024 0.4578 0.5914 0.3388 0.2268
AUDUSD 0.2731 0.2090 0.1119 0.2466 0.1720 0.1478
GLT 0.2386 0.0875 0.0719 0.1775 0.0667 0.3768
NATRES 0.3373 0.1436 0.2112 0.2489 0.1253 0.1256
APAC 0.2051 0.1759 0.0763 0.1519 0.1439 0.0957
ES 0.1639 0.3959 0.4378 0.2369 0.2299 0.1239
HYEU 0.0650 0.1723 0.1175 0.1119 0.1238 0.0595
PM 0.2380 0.1128 0.0699 0.1619 0.0923 0.1946
TPX 0.1077 0.0658 0.0440 0.0782 0.1062 0.0600
COT 0.0943 0.0497 0.0274 0.0724 0.0318 0.2269
RTA 0.1247 0.2162 0.1820 0.1423 0.1008 0.0677
FXC 0.0751 0.0708 0.0925 0.0909 0.0901 0.1137
EU 0.0442 0.0619 0.0369 0.0533 0.0521 0.0431
DXY 0.0447 0.0852 0.0506 0.0582 0.0473 0.0563
FXT 0.0763 0.0528 0.0499 0.0722 0.0488 0.2527
NQ 0.0487 0.1064 0.0359 0.0629 0.0692 0.0544
HYUS 0.0200 0.0736 0.0570 0.0498 0.1058 0.0554
GBPUSD 0.0247 0.0220 0.0714 0.0390 0.0609 0.0373
EURUSD 0.0340 0.0277 0.0585 0.0422 0.0462 0.0403
FIT 0.0055 0.0066 0.0043 0.0146 0.0237 0.0874
VA 0.0035 0.0040 0.0022 0.0043 0.0104 0.0318
FIC 0.0042 0.0118 0.0130 0.0085 0.0229 0.0123
EQT 0.0020 0.0072 0.0140 0.0067 0.0211 0.0141
GLC 0.0015 0.0047 0.0096 0.0039 0.0104 0.0124
COC 0.0013 0.0039 0.0055 0.0042 0.0081 0.0145
JPYUSD 0.0208 0.0263 0.0376 0.0262 0.0494 0.1155
CHFUSD 0.0244 0.0309 0.0294 0.0406 0.0569 0.0580
VIX 0.1176 0.0746 0.1573 0.1201 0.1385 0.0485
DURATION 0.0600 0.0535 0.1155 0.0723 0.1246 0.0625
DM 0.1145 0.1253 0.1632 0.1340 0.1470 0.0405
Table 14: Summary statistics - Average beta.
Risk factors - average beta
Factors FoHF Equity Hedge Event Driven Multi-Strategy Relative Value Tactical Trading
EQRP 0.3115 0.4132 0.2011 0.2717 0.1298 0.1116
AUDUSD 0.0516 0.0831 0.0344 0.0611 0.0405 0.0493
GLT 0.0412 0.0256 0.0129 0.0437 0.0137 0.2432
NATRES 0.0389 0.0316 0.0389 0.0361 0.0211 0.0383
APAC 0.0387 0.0773 0.0228 0.0441 0.0302 0.0290
ES 0.0364 0.1846 0.1495 0.0850 0.0730 0.0448
HYEU 0.0286 0.1408 0.0518 0.0749 0.0531 0.0348
PM 0.0236 0.0263 0.0137 0.0212 0.0123 0.0421
TPX 0.0192 0.0218 0.0105 0.0138 0.0275 0.0040
COT 0.0177 0.0168 0.0080 0.0179 0.0083 0.1384
RTA 0.0172 0.0933 0.0507 0.0337 0.0198 0.0008
FXC 0.0138 0.0266 0.0296 0.0237 0.0134 0.0076
EU 0.0133 0.0319 0.0097 0.0205 0.0091 0.0061
DXY 0.0131 0.0627 0.0203 0.0269 0.0195 0.0270
FXT 0.0114 0.0064 0.0011 0.0103 0.0035 0.1225
NQ 0.0104 0.0409 0.0048 0.0179 0.0150 0.0102
HYUS 0.0083 0.0520 0.0312 0.0438 0.0556 0.0272
GBPUSD 0.0056 0.0071 0.0351 0.0110 0.0154 0.0013
EURUSD 0.0045 0.0095 0.0153 0.0072 0.0073 0.0087
FIT 0.0016 0.0023 0.0011 0.0058 0.0080 0.0587
VA 0.0013 0.0024 0.0025 0.0029 0.0056 0.0239
FIC 0.0010 0.0049 0.0041 0.0030 0.0046 0.0045
EQT 0.0004 0.0027 0.0031 0.0028 0.0053 0.0070
GLC 0.0003 0.0015 0.0018 0.0011 0.0027 0.0042
COC 0.0003 0.0011 0.0017 0.0012 0.0017 0.0055
JPYUSD -0.0014 -0.0009 -0.0048 -0.0013 0.0036 0.0110
CHFUSD -0.0030 0.0034 -0.0043 0.0015 -0.0008 0.0160
VIX -0.0179 -0.0221 -0.0453 -0.0259 -0.0363 -0.0062
DURATION -0.0234 -0.0483 -0.0764 -0.0414 -0.0221 -0.0086
DM -0.0653 -0.1477 -0.1681 -0.1223 -0.1319 -0.0180
4.4 Trading intensity
As previously mentioned, we also measure the trading intensity of the generated replicators.
Since we only use listed futures, our main metric of trading intensity is the Round Turns Per
Million per year (RTPM).
Definition 7 (Round turn) A round turn is a completed futures transaction involving both a
purchase and a liquidating sale or a sale followed by a covering purchase.
Definition 8 (Round Turns Per Million per year (RTPM)) The Round Turns Per Million
per year (RTPM) is a standardized measure of how many times a year a futures trader would trade
for a one million dollar size account.
The RTPM is used by both brokers and investors to gauge the trading intensity of a particular
strategy. For example, a typical low frequency Global Tactical Asset Allocation (GTAA) program
with 10% volatility p.a. will have a RTPM of about 1,000, a typical short term trading program
with 10% volatility p.a. will have a RTPM greater than 7,000 and a typical index tracking program
will have a RTPM around 200. A chart with the risk factors’ RTPM is provided on Figure 22.
Finally, Figure 23 displays the breakdown by style of the clones’ average RTPM. The numbers are
moderate and in line with the underlying strategies. This suggests that the methodology employed
is suitable for actual trading purposes.
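As a small worked example of Definition 8, the RTPM can be computed by scaling the number of round turns to a one million dollar account and to a one-year horizon. The function below is a sketch: the use of the average account equity as the scaling base and the example figures are assumptions made for illustration.

```python
def rtpm(round_turns: float, average_equity_usd: float, years: float) -> float:
    """Round turns per million dollars of account equity per year."""
    if average_equity_usd <= 0 or years <= 0:
        raise ValueError("equity and horizon must be positive")
    return round_turns / (average_equity_usd / 1_000_000) / years

# Hypothetical account: 4,500 round turns over 3 years on USD 10 million.
example = rtpm(4_500, 10_000_000, 3.0)  # 150 round turns per million per year
```

On the scale quoted above, such an account would trade less intensively than a typical index tracking program.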
Figure 22: Trading intensity - Risk factors’ RTPM.
Figure 23: Trading intensity - Proof of concept.
4.5 Examples
After presenting aggregated results only, we now show specific examples of replicated hedge
funds to provide a better intuition behind the potentials and pitfalls of the employed methodology.
We start with the replication of the HFRI Fund Weighted Index, which is a diversified hedge
fund index. We notice in Figure 24 a static exposure to equity market risk, and the
replication is accurate. In Figure 25, we apply the algorithm to the Dow Jones HFI Managed
Futures Index, an index of managed futures funds as its name indicates. We notice a stable exposure
to trend following factors, which is consistent with prior knowledge that a majority of managed
futures managers tend to be systematic trend followers. The replication is also accurate. Another
example of accurate replication is the Winton Futures Fund, a large well-known diversified trend
follower, whose risk profile is similar to that of the Dow Jones HFI
Managed Futures Index (Figure 26). Next, we apply the algorithm to Kingate Global Fund, which
used to be a feeder fund into the Ponzi scheme run by Bernard Madoff. No risk factor is
selected by the variable selection procedure and the replication does not work (Figure 27). Similarly,
we try to replicate CFM Stratus, a short term trading hedge fund with a focus on futures trading,
statistical arbitrage and options trading. The replication does not work (Figure 28),
as none of our relatively static or slowly trading factors manages to capture the idiosyncratic return
profile of this fund. By contrast, the algorithm manages to replicate a significant component of the
returns of the Blackstone Partners Fund, a large diversified fund of hedge funds (Figure 29). To
conclude with these examples, we apply the algorithm to the Barclay Aggregate Bond Index, which
is a diversified long only fixed income index. We can see in Figure 30 that this replication is
accurate.
Figure 24: Performance comparison - HFRI Fund Weighted Index.
Figure 25: Performance comparison - Dow Jones HFI Managed Futures Index.
Figure 26: Performance comparison - Winton Futures Fund.
Figure 27: Performance comparison - Kingate Global Fund.
Figure 28: Performance comparison - CFM Stratus Fund.
Figure 29: Performance comparison - Blackstone Partners Fund.
Figure 30: Performance comparison - Barclay Aggregate Bond Index.
5 Conclusion
Hedge fund replication does work on average. Hedge fund replication works best for liquid
strategies (e.g. Equity Hedge, Tactical Trading) but does not fully capture the returns of less
liquid strategies (e.g. Event Driven, Relative Value). We should also note that the true relative
performance of replicators is likely better than the generated figures because of survivorship
and backfill biases19. Additionally, although we did not include hedge fund indices in the model
validation dataset, those can very well be replicated.
This paper aims to contribute to the field of hedge fund replication by filling the gaps between
previously published studies. While Hasanhodzic and Lo (2007) conduct extensive empirical
testing like we do, their methodology does not employ fully investable risk factors and does not
address the issue of transaction costs. While Giamouridis and Paterlini (2009) apply regulariza-
tion methods to hedge fund replication, they limit their empirical analysis to hedge fund indices.
Furthermore, both Hasanhodzic and Lo (2007) and Giamouridis and Paterlini (2009) only consider
asset class risk factors. In contrast, like the seminal work of Fung and Hsieh (2001), we employ both
asset class and dynamic risk factors.
Motivated by recent advances in statistics, this paper gives consideration to the application of
norm-constrained regularization methods to optimize the construction of hedge fund replicators.
This methodology allows us to consider a relatively large universe of risk factors. We tested the
accuracy of this method on more than 25,000 years of hedge fund performance data, including
different financial market regimes, which enables us to gauge its robustness. Furthermore, by
focusing on investable instruments, considering transaction costs and analyzing trading intensity, we
seek to ensure that our methodology and results can be translated into actual investment strategies.
This methodology can be used by practitioners for performance evaluation, risk measurement as
well as for investment purposes.
Further research could focus on the development of additional risk factors, on the optimal
estimation of the EWMA decay constant τ as well as on the improvement of the transaction costs
model.
Appendix
A Strategy description
The following list describes the different hedge fund styles being analyzed in this study. These
descriptions reflect Prime Capital’s proprietary nomenclature.
Equity Hedge (EH): A hedge fund strategy that maintains both long and short positions in
primarily equity and equity derivative securities. The investment decision is mainly fundamentally
driven. The investment universe can be broadly diversified or narrowly focused on specific sectors
and regions and the portfolio can range broadly in terms of levels of net exposure, leverage employed,
holding period, concentrations of market capitalizations and valuation ranges.
19Hasanhodzic and Lo (2007) provide a short but comprehensive overview of these issues.
Event Driven (ED): A hedge fund strategy that invests in various asset classes and seeks to
profit from potential mispricing of securities related to a specific corporate or market event. Such
events can include: mergers, bankruptcies, financial or operational stress, restructurings, asset sales,
recapitalizations, spin-offs, litigation, regulatory and legislative changes as well as other types of
corporate events.
Fund of Hedge Funds (FoHF): A fund that invests solely in a number of hedge funds.
Multi-Strategy (MS): A hedge fund that utilizes a range of hedge fund strategies.
Relative Value (RV): A hedge fund strategy based on the expected realization of a valuation
discrepancy between multiple securities. Managers employ a variety of fundamental and quantita-
tive techniques to establish investment theses, and security types range broadly across equity, fixed
income, derivative or other security types.
Tactical Trading (TT): A hedge fund strategy based on managers who take positions in financial
derivatives and other securities and apply mostly directional strategies or hedged strategies across
asset classes and broad market segments. The instruments traded are predominantly equity and
bond market indices or currencies and commodities, as well as their derivatives (mostly futures and
options), but basically all instruments across all countries may be used to express predominantly
top-down views. The aim is to achieve a positive absolute return based on dynamic asset allocation
decisions. The techniques employed may be discretionary or systematic, technical (e.g. trading
oriented) or fundamental or a mix of them.
B K-means clustering
The k-means method classifies n observations into k clusters in which each observation belongs
to the cluster with the nearest mean.
Definition 9 (K-means clustering) Given a set of observations ($x_j$, j = 1, ..., n), where each
observation is a real vector, this clustering method splits the n observations into k subsets
S = {$S_i$, i = 1, ..., k}, with necessarily k ≤ n, so as to minimize the within-cluster sum of
squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - \mu_i \|^2$$
where $\mu_i$ is the mean of the points in $S_i$ and ‖ . ‖ is the L2-norm.
In our context, the solution of k-means clustering is given by the Principal Component Analysis
(PCA). Actually, the PCA subspaces defined by the directions of the principal components are the
same as the cluster subspaces surrounding the centroids. So we choose k = 5, k being the number
of hedge fund styles, for the number of principal components to get the corresponding centroids
and the further spanned subspaces.
The standard algorithm uses an iterative refinement technique by alternating between two steps:
• assignment step: given an initial set of k means $m_1^{(1)}, \ldots, m_k^{(1)}$, this step assigns each
observation to the cluster with the nearest mean:
$$S_i^{(t)} = \left\{ x_p : \| x_p - m_i^{(t)} \| \le \| x_p - m_j^{(t)} \|,\ \forall\, 1 \le j \le k \right\}$$
• update step: compute the new means which are the new centroids of the observations in
the clusters:
$$m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$$
The clustering is finished when no changes are done, so that convergence has been reached. The
global procedure proceeds as follows:
1. five initial means are randomly generated in the whole space of data;
2. five clusters are created and partition the whole set of observations, associating each obser-
vation with the nearest mean;
3. for each of the five clusters, the centroid is now the new mean;
4. iterative refinement of the two last steps until convergence has been reached.
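The procedure above can be sketched in plain Python as follows. This is an illustration rather than the implementation used in this study: the initial means are drawn among the observations themselves (an assumption that differs slightly from step 1), the number of clusters is left as a parameter, and the function names are ours.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Alternate assignment and update steps until assignments are stable."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # initialize with k observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest mean.
        new_labels = [min(range(k), key=lambda i: squared_distance(p, means[i]))
                      for p in points]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
        # Update step: each mean becomes the centroid of its cluster.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old mean if a cluster ends up empty
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, means

# Toy example with two well-separated groups of one-dimensional points.
points = [(0.0,), (0.2,), (0.5,), (10.0,), (10.3,), (10.4,)]
labels, means = kmeans(points, k=2)
```

On this toy dataset the two groups are recovered whatever the initial draw; on real data the result generally depends on the initialization, so several random starts are commonly compared.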
C Transaction costs analysis
Transaction costs are very hard to estimate especially when dealing with heterogeneous asset
classes. Both over- and understating transaction costs can have a negative impact on style selection.
As a rule of thumb, commissions are the most important costs for retail traders, bid-ask spreads
for medium size funds and price impact dominates for very large funds, as discussed by Bouchaud
and Potters (2003). We focus on commissions and spreads and ignore price impact. We consider
an exchange fee of $1.5 per lot, an execution and clearing commission of $1.5 per lot and a bid-ask
spread curve which we estimate by fitting empirical spreads against volatilities with a discretionary
upward adjustment. Hereafter, we show a plot of the model employed with associated spreads and
volatilities (Figure 31).
On the one hand, we might potentially underestimate the costs of large transactions associated
with full portfolio liquidations. On the other hand, we might potentially overestimate the costs of
rolling over futures contracts, as our simulator trades the outright market instead of the much more
liquid spread market. Overall, we think that our approach produces fair results for a medium size
fund. On Figure 32, we can see an example of the impact of transaction costs over time applied
to the Net Asset Value (NAV) of the Equity Timing factor (EQT). We can also see this evolution
gross of commissions and spreads. This remains a very crude model of transaction
costs, but we think that, for the purpose of this study, it provides better insights than no model at
all.
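To make the structure of this cost model concrete, the sketch below combines the per-lot fees quoted above with a spread curve that depends on volatility. The fees come from the text; everything else is an assumption made for illustration: the linear form of the spread curve, its coefficients, the convention that each trade pays half the bid-ask spread, and the function names.

```python
EXCHANGE_FEE_USD = 1.5   # per lot, from the text
COMMISSION_USD = 1.5     # execution and clearing, per lot, from the text

def spread_fraction(vol_pa: float, intercept: float = 0.0,
                    slope: float = 0.001) -> float:
    """Hypothetical linear spread curve: bid-ask spread as a fraction of price."""
    return intercept + slope * vol_pa

def trade_cost_usd(lots: int, contract_value_usd: float, vol_pa: float) -> float:
    """Cost of one trade: fees per lot plus half the bid-ask spread (assumed)."""
    fees = abs(lots) * (EXCHANGE_FEE_USD + COMMISSION_USD)
    half_spread = 0.5 * spread_fraction(vol_pa) * contract_value_usd * abs(lots)
    return fees + half_spread

# Hypothetical trade: 10 lots of a USD 100,000 contract with 20% volatility.
cost = trade_cost_usd(10, 100_000.0, 0.20)  # about USD 130 under these assumptions
```

Price impact is deliberately absent from this sketch, in line with the choice made above to ignore it.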
D Bias Ratio
We describe how to estimate the Bias Ratio, a metric used to detect valuation bias or deliberate
price manipulation and which was first presented by Abdulali (2006). This indicator measures how
far the returns from an investment portfolio are from an unbiased distribution. The Bias Ratio
stems from an insight into the pricing behavior of asset managers as they address the expectations
of investors.
Definition 10 (Bias Ratio) Let $r_i$ be the return in month i, 1 ≤ i ≤ n, n the number of monthly
returns and σ the standard deviation of returns. We define the Bias Ratio as described hereafter:
$$\mathrm{BR} = \frac{\mathrm{Count}(r_i \mid r_i \in [0, \sigma])}{1 + \mathrm{Count}(r_i \mid r_i \in [-\sigma, 0))}$$
where:
• 0 ≤ BR ≤ n;
• if $r_i \le 0$ for all i ∈ {1, ..., n}, then BR = 0;
• if $r_i \notin [0, \sigma]$ for all i ∈ {1, ..., n}, then BR = 0;
• if the distribution of $r_i$ is Normal with mean 0, then BR approaches 1 as n goes to infinity.
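Definition 10 translates directly into a short function. The sketch below counts the small gains and small losses within one standard deviation of zero; the use of the sample standard deviation is an assumption, since the definition does not specify the estimator, and the example returns are made up.

```python
import statistics

def bias_ratio(returns):
    """Bias Ratio of Definition 10 for a list of monthly returns."""
    sigma = statistics.stdev(returns)  # sample standard deviation (assumption)
    small_gains = sum(1 for r in returns if 0.0 <= r <= sigma)
    small_losses = sum(1 for r in returns if -sigma <= r < 0.0)
    return small_gains / (1 + small_losses)

# Hypothetical monthly returns: three small gains, one small loss and
# two returns beyond one standard deviation.
example = bias_ratio([0.01, 0.02, 0.005, -0.01, 0.03, -0.04])
```

In this example the ratio is 3 / (1 + 1) = 1.5: small gains are reported more often than small losses.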
E Rollover schedule of futures contracts
In Table 15 and Table 16, we specify the rollover schedule of the futures contracts used for
this analysis. The industry letter convention which is used in the tables to describe the contract
expirations is detailed in Table 17.
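As a reading aid for Table 15, the sketch below looks up the contract that is active in a given calendar month from the twelve letters of a table row. It relies on the standard futures month letters (F, G, H, J, K, M, N, Q, U, V, X, Z for January to December). The interpretation of BaseYear as a number of whole years added to the expiry year is our assumption, the RollDay column is ignored, and the function name is hypothetical.

```python
MONTH_CODES = "FGHJKMNQUVXZ"  # standard futures month letters, January..December

def active_contract(schedule: str, base_year: int, year: int, month: int):
    """Return (letter, expiry_month, expiry_year) of the contract active in a month.

    schedule: the twelve letters of a table row, January to December.
    """
    letter = schedule[month - 1]
    expiry_month = MONTH_CODES.index(letter) + 1
    # Assumption: a contract whose month letter lies before the current
    # calendar month expires in the following year; BaseYear adds whole years.
    expiry_year = year + base_year + (1 if expiry_month < month else 0)
    return letter, expiry_month, expiry_year

# Row "Bond TU CBT 1 0": in December the active contract is the March
# contract of the following year.
tu_december = active_contract("HHMMMUUUZZZH", 0, 2012, 12)
```

For the fifth generic Eurodollar contract (BaseYear 1), the same rule gives the December contract of the following year in October.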
Table 15: Rollover schedule of contracts (1).
Active contract for:
Category Code Exchange Generic # BaseYear Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec RollDay Currency
Bond TU CBT 1 0 H H M M M U U U Z Z Z H 23 USD
Bond FV CBT 1 0 H H M M M U U U Z Z Z H 23 USD
Bond TY CBT 1 0 H H M M M U U U Z Z Z H 23 USD
Bond US CBT 1 0 H H M M M U U U Z Z Z H 23 USD
Bond OE EUX 1 0 H H H M M M U U U Z Z Z 3 EUR
Bond DU EUX 1 0 H H H M M M U U U Z Z Z 3 EUR
Bond RX EUX 1 0 H H H M M M U U U Z Z Z 3 EUR
Bond JB TSE 1 0 H H H M M M U U U Z Z Z 7 JPY
Bond G LIF 1 0 H H M M M U U U Z Z Z H 22 GBP
Bond XM SFE 1 0 H H H M M M U U U Z Z Z 10 AUD
Bond YM SFE 1 0 H H H M M M U U U Z Z Z 10 AUD
Bond CN MSE 1 0 H H M M M U U U Z Z Z H 22 CAD
Currency AD CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency BP CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency CD CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency EC CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency JY CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency PE CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency NV CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency SF CME 1 0 H H H M M M U U U Z Z Z 9 USD
Currency KU KFE 1 0 F G H J K M N Q U V X Z 15 KRW
Equity Index ES CME 1 0 H H H M M M U U U Z Z Z 12 USD
Equity Index NQ CME 1 0 H H H M M M U U U Z Z Z 12 USD
Equity Index DM CBT 1 0 H H H M M M U U U Z Z Z 12 USD
Equity Index VG EUX 1 0 H H H M M M U U U Z Z Z 12 EUR
Equity Index CF EOP 1 0 H H H M M M U U U Z Z Z 12 EUR
Equity Index GX EUX 1 0 H H H M M M U U U Z Z Z 12 EUR
Equity Index Z LIF 1 0 H H H M M M U U U Z Z Z 12 GBP
Equity Index AI SAF 1 0 H H H M M M U U U Z Z Z 12 ZAR
Equity Index NK OSE 1 0 H H H M M M U U U Z Z Z 5 JPY
Equity Index RTA NYF 1 0 H H H M M M U U U Z Z Z 12 USD
Equity Index TP TSE 1 0 H H H M M M U U U Z Z Z 5 JPY
Equity Index KM KFE 1 0 H H H M M M U U U Z Z Z 5 KRW
Equity Index HI HKG 1 0 F G H J K M N Q U V X Z 22 HKD
Equity Index TW SGX 1 0 F G H J K M N Q U V X Z 22 USD
Interest Rate BA MSE 1 0 H H H M M M U U U Z Z Z 12 CAD
Interest Rate BA MSE 2 0 M M M U U U Z Z Z H H H 12 CAD
Interest Rate ED CME 1 0 H H H M M M U U U Z Z Z 12 USD
Interest Rate ED CME 2 0 M M M U U U Z Z Z H H H 12 USD
Interest Rate ED CME 3 0 U U U Z Z Z H H H M M M 12 USD
Interest Rate ED CME 4 0 Z Z Z H H H M M M U U U 12 USD
Interest Rate ED CME 5 1 H H H M M M U U U Z Z Z 12 USD
Interest Rate ED CME 6 1 M M M U U U Z Z Z H H H 12 USD
Interest Rate ED CME 7 1 U U U Z Z Z H H H M M M 12 USD
Interest Rate ED CME 8 1 Z Z Z H H H M M M U U U 12 USD
Interest Rate ED CME 9 2 H H H M M M U U U Z Z Z 12 USD
Interest Rate ER LIF 1 0 H H H M M M U U U Z Z Z 12 EUR
Interest Rate ER LIF 2 0 M M M U U U Z Z Z H H H 12 EUR
Interest Rate ER LIF 3 0 U U U Z Z Z H H H M M M 12 EUR
Interest Rate ER LIF 4 0 Z Z Z H H H M M M U U U 12 EUR
Interest Rate ER LIF 5 1 H H H M M M U U U Z Z Z 12 EUR
Interest Rate ER LIF 6 1 M M M U U U Z Z Z H H H 12 EUR
Interest Rate ER LIF 7 1 U U U Z Z Z H H H M M M 12 EUR
Interest Rate ER LIF 8 1 Z Z Z H H H M M M U U U 12 EUR
Interest Rate ER LIF 9 2 H H H M M M U U U Z Z Z 12 EUR
Interest Rate ES LIF 1 0 H H H M M M U U U Z Z Z 12 CHF
Interest Rate ES LIF 2 0 M M M U U U Z Z Z H H H 12 CHF
Interest Rate IR SFE 1 0 H H H M M M U U U Z Z Z 4 AUD
Interest Rate IR SFE 2 0 M M M U U U Z Z Z H H H 4 AUD
Interest Rate IR SFE 3 0 U U U Z Z Z H H H M M M 4 AUD
Interest Rate IR SFE 4 0 Z Z Z H H H M M M U U U 4 AUD
Interest Rate L LIF 1 0 H H H M M M U U U Z Z Z 12 GBP
Interest Rate L LIF 2 0 M M M U U U Z Z Z H H H 12 GBP
Interest Rate L LIF 3 0 U U U Z Z Z H H H M M M 12 GBP
Interest Rate L LIF 4 0 Z Z Z H H H M M M U U U 12 GBP
Interest Rate L LIF 5 1 H H H M M M U U U Z Z Z 12 GBP
Interest Rate L LIF 6 1 M M M U U U Z Z Z H H H 12 GBP
Interest Rate L LIF 7 1 U U U Z Z Z H H H M M M 12 GBP
Interest Rate YE TFX 1 0 U U U Z Z Z H H H M M M 6 JPY
Interest Rate YE TFX 2 0 Z Z Z H H H M M M U U U 6 JPY
Interest Rate YE TFX 3 1 H H H M M M U U U Z Z Z 6 JPY
Table 16: Rollover schedule of contracts (2).
Active contract for:
Category Code Exchange Generic # BaseYear Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec RollDay Currency
Crude Oil CL NYM 1 0 H H K K N N U U X X F F 9 USD
Crude Oil CL NYM 2 0 K N N U U X X F F H H K 9 USD
Crude Oil CO ICE 1 0 H H K K N N U U X X F F 9 USD
Crude Oil CO ICE 2 0 K N N U U X X F F H H K 9 USD
Refined Products HO NYM 1 0 H H K K N N U U X X F F 9 USD
Refined Products HO NYM 2 0 K N N U U X X F F H H K 9 USD
Natural Gas NG NYM 1 0 H H K K N N U U X X F F 9 USD
Natural Gas NG NYM 2 0 K N N U U X X F F H H K 9 USD
Refined Products QS ICE 1 0 H H K K N N U U X X F F 9 USD
Refined Products QS ICE 2 0 K N N U U X X F F H H K 9 USD
Refined Products XBW NYM 1 0 H H K K N N U U X X F F 9 USD
Refined Products XBW NYM 2 0 K N N U U X X F F H H K 9 USD
Precious Metal GC CMX 1 0 G J J M M Q Q Z Z Z Z G 9 USD
Precious Metal SI CMX 1 0 H H K K N N U U Z Z Z H 9 USD
Base Metal HG CMX 1 0 H H K K N N U U Z Z Z H 9 USD
Base Metal HG CMX 2 0 K N N U U Z Z Z H H H K 9 USD
Base Metal LA LME 1 0 H H K K N N U U X X F F 9 USD
Base Metal LA LME 2 0 K N N U U X X F F H H K 9 USD
Base Metal LL LME 1 0 H H K K N N U U X X F F 9 USD
Base Metal LL LME 2 0 K N N U U X X F F H H K 9 USD
Base Metal LN LME 1 0 H H K K N N U U X X F F 9 USD
Base Metal LN LME 2 0 K N N U U X X F F H H K 9 USD
Base Metal LX LME 1 0 H H K K N N U U X X F F 9 USD
Base Metal LX LME 2 0 K N N U U X X F F H H K 9 USD
Soy BO CBT 1 0 H H K K N N Z Z Z Z F F 9 USD
Soy BO CBT 2 0 K N N Z Z Z Z F F H H K 9 USD
Corn C CBT 1 0 H H K K N N U U Z Z Z H 9 USD
Corn C CBT 2 0 K N N U U Z Z Z H H H K 9 USD
Soy S CBT 1 0 H H K K N N X X X X F F 9 USD
Soy S CBT 2 0 K N N X X X X F F H H K 9 USD
Wheat W CBT 1 0 H H K K N N U U Z Z Z H 9 USD
Wheat W CBT 2 0 K N N U U Z Z Z H H H K 9 USD
Wheat KW KCB 1 0 H H K K N N U U Z Z Z H 9 USD
Fibers CT NYB 1 0 H H K K N N Z Z Z Z Z H 9 USD
Fibers CT NYB 2 0 K N N Z Z Z Z Z H H H K 9 USD
Foodstuff CC NYB 1 0 H H K K N N U U Z Z Z H 9 USD
Foodstuff KC NYB 1 0 H H K K N N U U Z Z Z H 9 USD
Foodstuff KC NYB 2 0 K N N U U Z Z Z H H H K 9 USD
Foodstuff SB NYB 1 0 H H K K N N V V V H H H 9 USD
Foodstuff SB NYB 2 0 K N N V V V H H H H H K 9 USD
Livestock FC CME 1 0 H H K K Q Q Q V V F F F 9 USD
Livestock LC CME 1 0 G J J M M Q Q V V Z Z G 9 USD
Livestock LC CME 2 0 M M Q Q V V Z Z G G J J 9 USD
Livestock LH CME 1 0 G J J M M N Q V V Z Z G 9 USD
Livestock LH CME 2 0 M M N Q V V Z Z G G J J 9 USD
Non-Equity Index UX CBF 1 0 F G H J K M N Q U V X Z 6 USD
Table 17: Futures month codes.
Letter Contract expiration
F Jan
G Feb
H Mar
J Apr
K May
M Jun
N Jul
Q Aug
U Sep
V Oct
X Nov
Z Dec
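As an illustration of how a row of Tables 15 and 16 can be read together with the month codes of Table 17, the following minimal sketch looks up the active contract for a given calendar month. Two points are assumptions on our part rather than statements from the tables: a letter whose expiry month precedes the calendar month is taken to refer to the following year, and BaseYear is read as an additional year offset. The RollDay column is not modelled, and the function name `active_contract` is hypothetical.

```python
# Month codes from Table 17.
MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}


def active_contract(schedule, base_year, year, month):
    """Return (letter, expiry_month, expiry_year) for a calendar month.

    schedule  : list of 12 month letters (the Jan..Dec columns of a table row)
    base_year : the BaseYear column, read here as a year offset (assumption)
    """
    letter = schedule[month - 1]
    expiry_month = MONTH_CODES[letter]
    # Assumption: an expiry month earlier than the calendar month
    # means that the contract expires in the following year.
    wraps = 1 if expiry_month < month else 0
    return letter, expiry_month, year + base_year + wraps


# Row "Bond TY CBT 1 0 ..." from Table 15.
ty_schedule = "H H M M M U U U Z Z Z H".split()
print(active_contract(ty_schedule, 0, 2012, 12))

# Row "Interest Rate ED CME 8 1 ..." from Table 15.
ed8_schedule = "Z Z Z H H H M M M U U U".split()
print(active_contract(ed8_schedule, 1, 2012, 4))
```

Under these assumptions the eighth generic Eurodollar contract in April 2012 resolves to the March 2014 expiry, which matches counting eight quarterly expirations forward from April 2012.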
F Category weights
Table 18 describes the fixed category weights used as inputs for the carry and momentum
strategies.
Table 18: Futures contracts’ weights for the carry and momentum strategies.
Category Weight
Bond 14.29%
Currency 14.29%
Equity Index 14.29%
Interest Rate 14.29%
Energy20 14.29%
Precious Metal 7.14%
Base Metal 7.14%
Other21 14.29%
Non-Equity Index 0.00%
20 Includes Crude Oil, Refined Products and Natural Gas.
21 Includes Soy, Corn, Wheat, Fibers, Foodstuff and Livestock.
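The percentages in Table 18 are consistent with seven equally weighted buckets (1/7, about 14.29%), one of which is split evenly between Precious Metal and Base Metal (1/14, about 7.14%). This reading is an inference from the rounded figures, not something the table states. The short sketch below reproduces the table entries under that assumption and checks that the weights sum to one.

```python
from fractions import Fraction

# Assumed exact weights behind the rounded percentages of Table 18.
weights = {
    "Bond": Fraction(1, 7),
    "Currency": Fraction(1, 7),
    "Equity Index": Fraction(1, 7),
    "Interest Rate": Fraction(1, 7),
    "Energy": Fraction(1, 7),
    "Precious Metal": Fraction(1, 14),
    "Base Metal": Fraction(1, 14),
    "Other": Fraction(1, 7),
    "Non-Equity Index": Fraction(0),
}

total = sum(weights.values())
rounded = {name: f"{float(w) * 100:.2f}%" for name, w in weights.items()}

print(total)  # exact sum of the assumed weights
for name, pct in rounded.items():
    print(f"{name:<18}{pct:>8}")
```

Working with exact fractions avoids the small discrepancy that arises when the rounded percentages themselves are added up.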
G Technical fact sheet
Table 19: Technical fact sheet - Data.
Data sources     Numerical Analysis   Machine Learning
Eurekahedge      NumPy                scikit-learn
Hedgefund.net    SciPy
TASS
Prime Capital
Bloomberg
References
Abdulali, A., The bias ratio: measuring the shape of fraud, Protege Partners Quarterly Letter,
Q3 2006.
Alexander, C., Dimitriu, A., Sources of over-performance in equity markets: mean reversion,
common trends and herding, EFMA Basel Meeting Paper, 2004.
Amenc, N., Géhin, W., Martellini, L., Meyfredi, J. C., The myths and limits of passive hedge fund
replication, EDHEC Risk and Asset Management Research Centre, June, 2007.
Amenc, N., Martellini, L., Meyfredi, J. C., Ziemann, V., Performance of passive hedge fund
replication strategies, EDHEC Risk and Asset Management Research Centre, September, 2009.
Amin, G., Kat, H., Hedge fund performance (1990-2000): do the money machines really add value?,
Journal of Financial and Quantitative Analysis 38, 251-274, 2003.
Bollen, N., Fisher, G., Send in the clones? Hedge fund replication using futures contracts, Barclay’s
Insider Report, August, 2012.
Bouchaud, J. P., Potters, M., Theory of financial risk and derivatives pricing: from statistical
physics to risk management, Cambridge University Press, 2003.
Burghardt, G., Duncan, R., Liu, L., Two benchmarks for momentum trading, Newedge
AlternativeEdge Research, August, 2010.
Fleming, J., Kirby, C., Ostdiek, B., The economic value of volatility timing, Journal of Finance,
56 (1), 2001.
Fung, W., Hsieh, D. A., The risk in hedge fund strategies: theory and evidence from trend followers,
The Review of Financial Studies, Vol. 14, No. 2, 2001.
Fung, W., Hsieh, D. A., Ramadorai, T., Naik, N., Hedge funds: performance, risk and capital
formation, SSRN Working Paper, 2006.
Geltner, D., Smoothing in appraisal-based returns, Journal of Real Estate Finance and Economics,
Vol. 4, 327-345, 1991.
Giamouridis, D., Paterlini, S., Regular(ized) hedge fund clones, Journal of Financial Research, 33
(3), 2010.
Hasanhodzic, J., Lo, A. W., Can hedge-fund returns be replicated?: The linear case, Journal of
Investment Management, 5(2):5-45, 2007.
Hastie, T., Tibshirani, R., Friedman, J., The elements of statistical learning (2nd edition),
Springer-Verlag, 2008.
Ilmanen, A., Expected returns: an investor’s guide to harvesting market rewards, Wiley & Sons,
2011.
Jones, C., Hedge funds of funds: a guide for investors, The Wiley Finance Series, July, 2008.
J.P. Morgan and Reuters, RiskMetrics - Technical document, 1996.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M.,
Perrot, M., Duchesnay, E., Scikit-learn: machine learning in Python, Journal of Machine Learning
Research 12, 2825-2830, 2011.
Roncalli, T., Weisang, G., Tracking problems, hedge fund replication and alternative beta, Journal
of Financial Transformation, Vol. 31, 19-29, 2011.
Shannon, C., A mathematical theory of communication, Bell System Technical Journal, 27 (3),
1948.
Sharpe, W., Asset allocation: management style and performance measurement, Journal of
Portfolio Management, 18 (2), 1992.
Sharpe, W., The Sharpe Ratio, Journal of Portfolio Management, 21 (1), 1994.
Simon, D., Campasano, J., The VIX futures basis: evidence and trading strategies, SSRN Working
Paper, June, 2012.
Sortino, F. and Price, L., Performance measurement in a downside risk framework, Journal of
Investing, 59-65, Fall 1994.