Trend Filtering Methods for Momentum Strategies
White Paper, Issue #8
December 2011
Benjamin Bruder
Research & Development
Lyxor Asset Management, Paris
Tung-Lam Dao
Jean-Charles Richard
Thierry Roncalli
Foreword
The widespread endeavor to identify trends in market prices has given rise to a significant amount of literature. Elliott Wave Principles, Dow Theory and business cycles, among many others, are common examples of attempts to better understand the nature of market price trends.
Unfortunately, this literature often proves frustrating. In their attempt to discover new rules, many authors lack precision and forget to apply basic research methodology. Results are often presented without any reference either to the necessary hypotheses or to confidence intervals. As a result, it is difficult for investors to find firm guidance there and to differentiate phonies from the real McCoy.
That said, attempts to differentiate meaningful information from exogenous noise lie at the core of modern Statistics and Time Series Analysis. Time Series Analysis pursues goals similar to those of the above-mentioned approaches, but in a manner which can be tested. Today more than ever, modern computing capacities allow anybody to implement quite powerful tools and to tackle trend estimation issues independently. The primary aim of this 8th White Paper is to act as a comprehensive and simple handbook to the most widespread trend measurement techniques.
Even when equipped with refined measurement tools, investors still have to remain wary about their representation of trends. Trends are sometimes thought of as some hidden force pushing markets up or down. In this deterministic view, trends should persist.
However, random walks also generate trends! Five reds drawn in a row from an unbiased roulette wheel do not give any clue about the next color drawn. It is just a past trend that has nothing to do with any underlying structure; it is a mere succession of independent events. And the bottom line is that neither of these two hypotheses can be confirmed or dismissed with certainty.
As a consequence, overfitting issues constitute one of the most serious pitfalls in applying trend filtering techniques in finance. Designing effective calibration procedures proves to be as important as the theoretical knowledge of trend measurement theories. The practical use of trend extraction techniques for investment purposes constitutes the other topic addressed in this 8th White Paper.
Nicolas Gaussel
Global Head of Quantitative Asset Management
Quant Research by Lyxor
Executive Summary
Introduction
The efficient market hypothesis implies that all available information is reflected in current prices, and thus that future returns are unpredictable. Nevertheless, this assumption has been rejected in a large number of academic studies. It is commonly accepted that financial assets may exhibit trends or cycles. Some studies cite slow-moving economic variables related to the business cycle as an explanation for these trends. Other research argues that investors are not fully rational, meaning that prices may underreact in the short run and overreact at long horizons.
Momentum strategies try to benefit from these trends. There are two opposing types: trend following and contrarian. Trend following strategies are momentum strategies in which an asset is purchased if the price is rising, while in the contrarian strategy assets are sold if the price is rising. The first step in both strategies is trend estimation, which is the focus of this paper. After a review of trend filtering techniques, we address practical issues, depending on whether trend detection is designed to explain the past or forecast the future.
The principles of trend filtering

In time series analysis, the trend is considered to be the component containing the global change, which contrasts with local changes due to noise. The separation between trend and noise has a long mathematical history, and continues to be of great interest to the scientific community. There is no precise definition of the trend, but it is generally accepted that it is a smooth function representing long-term movement. Thus, trends should exhibit slow change, while noise is assumed to be highly volatile.
The simplest trend filtering method is the moving average filter. On average, the noisy parts of observations tend to cancel each other out, while the trend has a cumulative nature. But observations can be averaged using many different types of weightings. More generally, the different averages obtained are referred to as linear filtering. Several examples representing trend filtering for various linear filters are shown in Figure 1. In this example, the averaging horizon (65 business days or one year) has much more influence than the type of averaging.
Other trend following methods, which are classified as nonlinear, use more complex calculations to obtain more specific results (such as filters based on wavelet analysis, support vector machines or singular spectrum analysis). For instance, the L1 filter is designed to obtain piecewise constant trends, which can be interpreted more easily.
Figure 1: Trend estimate of the S&P 500 index
Variations around a benchmark estimator

Trend filtering can be performed either to explain past behaviour of asset prices, or to forecast future returns. The choice of the estimator and its calibration primarily depend on that objective. If the goal is to explain past price behaviour, there are two possible approaches. The first is to select the model and parameters that minimise past prediction error. This can be performed using a cross-validation procedure, for example. The second option is to consider a benchmark estimator, such as the six-month moving average, and to calibrate another model to be as close to the benchmark as possible. For instance, the L1 filter of Figure 2 is calibrated to deliver a constant trend over an average six-month period. This type of filter is more easily interpreted than the original six-month moving average, with clearly delimited trend periods. This procedure can be performed on any time series.
From trend filtering to forecasting

Trend filtering may also be a predictive tool. This is a much more ambitious objective. It supposes that the last observed trend has an influence on future asset returns. More precisely, trend following predictions suppose that positive (or negative) trends are more likely to be followed by positive (or negative) returns. Any trend following method would be useless if this assumption did not hold.
Figure 3 illustrates that the distribution of the one-month GSCI index returns after a very positive three-month trend (i.e. above a threshold) clearly dominates the return distribution after a very negative trend (i.e. below the threshold).
Figure 2: L1 versus moving average filtering
Figure 3: Distribution of the conditional standardised monthly return
Furthermore, this persistence effect is also tested in Table 1 for a number of major financial indices. This table compares the average one-month return following a positive three-month trend period to the average one-month return following a negative three-month trend period.
Table 1: Average one-month conditional return based on past trends
Trend        Eurostoxx 50   S&P 500   MSCI World   MSCI EM   TOPIX    EUR/USD   USD/JPY   GSCI
Positive     1.1%           0.9%      0.6%         1.9%      0.4%     0.2%      0.2%      1.3%
Negative     0.2%           0.5%      -0.3%        -0.3%     -0.4%    -0.2%     -0.2%     -0.4%
Difference   0.9%           0.4%      1.0%         2.2%      0.9%     0.4%      0.4%      1.6%
On average, for all indices under consideration, returns are higher after a positive trend than after a negative one. Thus, the trends are persistent, and seem to have a predictive value. This makes the case for the study of trend following strategies, and highlights the appeal of trend filtering methods.
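The comparison reported in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conditional_mean_returns`, the 65-day and 22-day windows (stand-ins for three months and one month), the use of the past log return as the trend measure, and the simulated price path are all hypothetical choices, not the procedure or the data behind the table.

```python
import numpy as np

def conditional_mean_returns(log_prices, trend_window=65, horizon=22):
    """Average forward log return, split by the sign of the past log return."""
    y = np.asarray(log_prices, dtype=float)
    # Dates that have both a full look-back window and a full look-ahead window
    t = np.arange(trend_window, len(y) - horizon)
    past = y[t] - y[t - trend_window]    # past "three-month" return (trend proxy)
    future = y[t + horizon] - y[t]       # next "one-month" return
    return future[past > 0].mean(), future[past <= 0].mean()

rng = np.random.default_rng(0)
# Hypothetical data: a random walk whose drift switches sign slowly (not market data)
drift = 0.001 * np.sign(np.sin(np.arange(5000) / 200.0))
log_prices = np.cumsum(drift + 0.005 * rng.standard_normal(5000))

after_pos, after_neg = conditional_mean_returns(log_prices)
print(f"after positive trend: {after_pos:.4f}, after negative trend: {after_neg:.4f}")
```

On a series with persistent drift regimes such as this one, the first average should exceed the second; on a pure random walk the two would differ only by sampling noise.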
Conclusion
The ultimate goal of trend filtering in finance is to design portfolio strategies that may benefit from the identified trends. Such strategies must rely on appropriate trend estimators and time horizons. This paper highlights the variety of estimators available in the academic literature. But the choice of trend estimator is just one of the many questions that arise in the definition of those strategies. In particular, diversification and risk budgeting are key aspects of success.
Table of Contents
1 Introduction
2 A review of econometric estimators for trend filtering
2.1 The trend-cycle model
2.2 Linear filtering
2.3 Nonlinear filtering
2.4 Multivariate filtering
3 Trend filtering in practice
3.1 The calibration problem
3.2 What about the variance of the estimator?
3.3 From trend filtering to trend forecasting
4 Conclusion
A Statistical complements
A.1 State space model and Kalman filtering
A.2 L1 filtering
A.3 Wavelet analysis
A.4 Support vector machine
A.5 Singular spectrum analysis
Trend Filtering Methods for Momentum Strategies
Benjamin Bruder
Tung-Lam Dao
Jean-Charles Richard
Thierry Roncalli
December 2011
Abstract
This paper studies trend filtering methods. These methods are widely used in momentum strategies, which correspond to an investment style based only on the history of past prices. For example, the CTA strategy used by hedge funds is one of the best-known momentum strategies. In this paper, we review the different econometric estimators used to extract the trend of a time series. We distinguish between linear and nonlinear models as well as univariate and multivariate filtering. For each approach, we provide a comprehensive presentation, an overview of its advantages and disadvantages and an application to the S&P 500 index. We also consider the calibration problem of these filters. We illustrate the two main solutions, the first based on prediction error, and the second using a benchmark estimator. We conclude the paper by listing some issues to consider when implementing a momentum strategy.
Keywords: Momentum strategy, trend following, moving average, filtering, trend extraction.
JEL classification: G11, G17, C63.
We are grateful to Guillaume Jamet and Hoang-Phong Nguyen for their helpful comments.
1 Introduction
The efficient market hypothesis tells us that financial asset prices fully reflect all available information (Fama, 1970). One consequence of this theory is that future returns are not predictable. Nevertheless, since the beginning of the nineties, a large body of academic research has rejected this assumption. One of the arguments is that risk premiums are time varying and depend on the business cycle (Cochrane, 2001). In this framework, returns on financial assets are related to some slow-moving economic variables that exhibit cyclical patterns in accordance with the business cycle. Another argument is that some agents are not fully rational, meaning that prices may underreact in the short run but overreact at long horizons (Hong and Stein, 1997). This phenomenon may be easily explained by the theory of behavioural finance (Barberis and Thaler, 2002).
Based on these two arguments, it is now commonly accepted that prices may exhibit trends or cycles. In some sense, these arguments chime with the Dow theory (Brown et al., 1998), which is one of the first momentum strategies. A momentum strategy is an investment style based only on the history of past prices (Chan et al., 1996). We generally distinguish between two types of momentum strategy:
1. the trend following strategy, which consists of buying (or selling) an asset if the estimated price trend is positive (or negative);
2. the contrarian (or mean-reverting) strategy, which consists of selling (or buying) an asset if the estimated price trend is positive (or negative).
Contrarian strategies are clearly the opposite of trend following strategies. One of the tasks involved in these strategies is to estimate the trend, except when they are based on mean-reverting processes (see d'Aspremont, 2011). In this paper, we provide a survey of the different trend filtering methods. However, trend filtering is just one of the difficulties in building a momentum strategy. The complete process of constructing a momentum strategy is highly complex, especially as regards transforming past trends into exposures, an important factor that is beyond the scope of this paper.
The paper is organized as follows. Section two presents a survey of the different econometric trend estimators. In particular, we distinguish between methods based on linear filtering and nonlinear filtering. In section three, we consider some issues that arise when trend filtering is applied in practice. We also propose some methods for calibrating trend filtering models and highlight the problem of estimator variance. Section four offers some concluding remarks.
2 A review of econometric estimators for trend filtering
Trend filtering (or trend detection) is a major task of time series analysis from both a mathematical and financial viewpoint. The trend of a time series is considered to be the component containing the global change, which contrasts with local changes due to noise. The trend filtering procedure concerns not only the problem of denoising; it must also take into account the dynamics of the underlying process. This explains why mathematical approaches to trend extraction have a long history, and why this subject is still of great interest to the scientific community [1]. From an investment perspective, trend filtering is fundamental to most momentum strategies developed in the asset management and hedge fund sectors in order to improve performance and limit portfolio risks.
[1] See Alexandrov et al. (2008).
2.1 The trend-cycle model
In economics, trend-cycle decomposition plays an important role by identifying the permanent and transitory stochastic components in a non-stationary time series. Generally, the permanent component can be interpreted as a trend, whereas the transitory component may be a noise or a stochastic cycle. Let $y_t$ be a stochastic process. We assume that $y_t$ is the sum of two different unobservable parts:
$$y_t = x_t + \varepsilon_t$$
where $x_t$ represents the trend and $\varepsilon_t$ is a stochastic (or noise) process. There is no precise definition for trend, but it is generally accepted to be a smooth function representing long-term movements:
"[...] the essential idea of trend is that it shall be smooth." (Kendall, 1973).
This means that changes in the trend $x_t$ must be smaller than those of the process $y_t$. From a statistical standpoint, it implies that the volatility of $y_t - y_{t-1}$ is higher than the volatility of $x_t - x_{t-1}$:
$$\sigma(y_t - y_{t-1}) \gg \sigma(x_t - x_{t-1})$$
One of the major problems in financial econometrics is the estimation of $x_t$. This is the subject of signal extraction and filtering (Pollock, 2009).
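The volatility condition of the trend-cycle model can be illustrated with a small simulation. The sketch below assumes a hypothetical smooth trend (a line plus a slow sine wave) and Gaussian noise; the shapes and magnitudes are arbitrary choices made only to show the inequality, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
t = np.arange(n)

# Hypothetical smooth trend x_t and noise eps_t (illustrative magnitudes)
x = 0.001 * t + 0.05 * np.sin(2.0 * np.pi * t / 500.0)
eps = 0.01 * rng.standard_normal(n)
y = x + eps                      # observed process: trend plus noise

vol_dy = np.std(np.diff(y))      # volatility of y_t - y_{t-1}
vol_dx = np.std(np.diff(x))      # volatility of x_t - x_{t-1}
print(f"vol of changes in y: {vol_dy:.5f}, vol of changes in x: {vol_dx:.5f}")
```

With these settings the changes in the observed series are dominated by noise, which is exactly why the trend has to be estimated rather than read off the data.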
Finite moving average filtering for trend estimation has a long history. It has been used in actuarial science since the beginning of the twentieth century [2]. But the modern theory of signal filtering has its origins in the Second World War and was formulated independently by Norbert Wiener (1941) and Andrei Kolmogorov (1941) in two different ways. Wiener worked principally in the frequency domain whereas Kolmogorov considered a time-domain approach. This theory was extensively developed in the fifties and sixties by mathematicians and statisticians such as Hermann Wold, Peter Whittle, Rudolf Kalman, Maurice Priestley, George Box, etc. In economics, the problem of trend filtering is not a recent one, and may date back to the seminal article of Muth (1960). It was extensively studied in the eighties and nineties in the literature on business cycles, which led to a vast body of empirical research being carried out in this area [3]. However, it is in climatology that trend filtering is most extensively studied nowadays. Another important point is that the development of filtering techniques has evolved according to the development of computational power and the IT industry. The Savitzky-Golay smoothing procedure may appear very basic today though it was revolutionary [4] when it was published in 1964.
In what follows, we review the class of filtering techniques that is generally used to estimate a trend. Moving average filters play an important role in finance. As they are very intuitive and easy to implement, they undoubtedly represent the model most commonly used in trading strategies. The moving average technique belongs to the class of linear filters, which share a lot of common properties. After studying this class of filters, we consider some nonlinear filtering techniques, which may be well suited to solving financial problems.
[2] See, in particular, the works of Henderson (1916), Whittaker (1923) and Macaulay (1931).
[3] See for example Cleveland and Tiao (1976), Beveridge and Nelson (1981), Harvey (1991) or Hodrick and Prescott (1997).
[4] The paper of Savitzky and Golay (1964) is still considered by the Analytical Chemistry journal to be one of its 10 seminal papers.
2.2 Linear filtering
2.2.1 The convolution representation
We denote by $y = \{\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots\}$ the ordered sequence of observations of the process $y_t$. Let $\hat{x}_t$ be the estimator of the underlying trend $x_t$, which is by definition an unobservable process. A filtering procedure consists of applying a filter $\mathcal{L}$ to the data $y$:
$$\hat{x} = \mathcal{L}(y)$$
with $\hat{x} = \{\ldots, \hat{x}_{-2}, \hat{x}_{-1}, \hat{x}_0, \hat{x}_1, \hat{x}_2, \ldots\}$. When the filter is linear, we have $\hat{x} = \mathcal{L} y$ with the normalisation condition $\mathbf{1} = \mathcal{L} \mathbf{1}$. If we assume that the signal $y_t$ is observed at regular dates [5], we obtain:
$$\hat{x}_t = \sum_{i=-\infty}^{\infty} \mathcal{L}_{t,t-i} \, y_{t-i} \tag{1}$$
We deduce that linear filtering may be viewed as a convolution. The previous filter may not be of much use, however, because it uses future values of $y_t$. As a result, we generally impose some restriction on the coefficients $\mathcal{L}_{t,t-i}$ in order to use only past and present values of the signal. In this case, we say that the filter is causal. Moreover, if we restrict our study to time-invariant filters, equation (1) becomes a simple convolution of the observed signal $y_t$ with a window function $\mathcal{L}_i$:
$$\hat{x}_t = \sum_{i=0}^{n-1} \mathcal{L}_i \, y_{t-i} \tag{2}$$
With this notation, a linear filter is characterised by a window kernel $\mathcal{L}_i$ and its support. The kernel defines the type of filtering, whereas the support defines the range of the filter. For instance, if we take a square window on a compact support $[0, T]$ with $T = n\Delta$ the width of the averaging window, we obtain the well-known moving average filter:
$$\mathcal{L}_i = \frac{1}{n} \mathbf{1}\{i < n\}$$
We finish this description by considering the lag representation:
$$\hat{x}_t = \sum_{i=0}^{n-1} \mathcal{L}_i L^i y_t$$
with the lag operator $L$ satisfying $L y_t = y_{t-1}$.
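The causal, time-invariant filter of equation (2) maps directly onto a discrete convolution. The sketch below is a minimal illustration, assuming NumPy; the helper names `causal_filter` and `uniform_window` are hypothetical, and dates without a full window of past observations are simply left as NaN, which is one possible convention among several.

```python
import numpy as np

def causal_filter(y, weights):
    """Equation (2): x_hat[t] = sum_{i=0}^{n-1} weights[i] * y[t-i].

    The first n-1 dates do not have n past observations and are set to NaN.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(w)
    if len(y) < n:
        raise ValueError("the signal must contain at least n observations")
    x_hat = np.full(len(y), np.nan)
    # In 'valid' mode, np.convolve returns sum_i w[i] * y[t-i] for t = n-1, ..., len(y)-1
    x_hat[n - 1:] = np.convolve(y, w, mode="valid")
    return x_hat

def uniform_window(n):
    """Square window of the moving average filter: weight 1/n on each of the n lags."""
    return np.full(n, 1.0 / n)

y = np.arange(10.0)                      # toy signal 0, 1, ..., 9
print(causal_filter(y, uniform_window(3)))
```

Any other window (triangular, exponential, and so on) can be passed as `weights`, as long as the weights sum to one to respect the normalisation condition.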

[5] We have $t_{i+1} - t_i = \Delta$.
2.2.2 Measuring the trend and its derivative
We discuss here how to use linear filtering to measure the trend of an asset price and its derivative. Let $S_t$ be the asset price, which follows the dynamics of the Black-Scholes model:
$$\frac{dS_t}{S_t} = \mu_t \, dt + \sigma_t \, dW_t$$
where $\mu_t$ is the drift, $\sigma_t$ is the volatility and $W_t$ is a standard Brownian motion. The asset price $S_t$ is observed at a series of discrete dates $\{t_0, \ldots, t_n\}$. Within this model, the appropriate signal to be filtered is the logarithm of the price $y_t = \ln S_t$, not the price itself. Let $R_t = \ln S_t - \ln S_{t-1}$ represent the realised return at time $t$ over a unit period. If $\mu_t$ and $\sigma_t$ are known, we have:
$$R_t = \left(\mu_t - \frac{1}{2}\sigma_t^2\right)\Delta + \sigma_t \sqrt{\Delta} \, \eta_t$$
where $\eta_t$ is a standard Gaussian white noise. The filtered trend can be extracted using the following equation:
$$\hat{x}_t = \sum_{i=0}^{n-1} \mathcal{L}_i \, y_{t-i}$$
and the estimator of $\mu_t$ is [6]:
$$\hat{\mu}_t \simeq \frac{1}{\Delta} \sum_{i=0}^{n-1} \mathcal{L}_i \, R_{t-i}$$
We can also obtain the same result by applying the filter directly to the signal and defining the derivative of the window function as $\ell_i = \dot{\mathcal{L}}_i$:
$$\hat{\mu}_t \simeq \frac{1}{\Delta} \sum_{i=0}^{n} \ell_i \, y_{t-i}$$
We obtain the following correspondence:
$$\ell_i = \begin{cases} \mathcal{L}_0 & \text{if } i = 0 \\ \mathcal{L}_i - \mathcal{L}_{i-1} & \text{if } i = 1, \ldots, n-1 \\ -\mathcal{L}_{n-1} & \text{if } i = n \end{cases} \tag{3}$$
Remark 1 In some senses, $\hat{\mu}_t$ and $\hat{x}_t$ are related by the following expression:
$$\hat{\mu}_t = \frac{d}{dt} \hat{x}_t$$
Econometric methods principally involve $\hat{x}_t$, whereas $\hat{\mu}_t$ is more important for trading strategies.
Remark 2 $\hat{\mu}_t$ is a biased estimator of $\mu_t$ and the bias increases with the volatility of the process $\sigma_t$. The expression of the unbiased estimator is then:
$$\hat{\mu}_t = \frac{1}{2}\sigma_t^2 + \frac{1}{\Delta} \sum_{i=0}^{n-1} \mathcal{L}_i \, R_{t-i}$$
Remark 3 In the previous analysis, $\hat{x}_t$ and $\hat{\mu}_t$ are two estimators. We may also represent them by their corresponding probability density functions. It is therefore easy to derive estimates, but we should not forget that these estimators present some variance. In finance, and in particular in trading strategies, the question of statistical inference is generally not addressed. However, it is a crucial factor in designing a successful momentum strategy.
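The correspondence (3) between a window applied to returns and its derivative kernel applied to the signal can be checked numerically. The sketch below assumes a unit time step, a uniform window and a simulated log-price path; the function name `derivative_kernel` and the data are hypothetical and serve only to verify that both routes give the same number.

```python
import numpy as np

def derivative_kernel(window):
    """Equation (3): l_0 = L_0, l_i = L_i - L_{i-1} for 1 <= i <= n-1, l_n = -L_{n-1}."""
    w = np.asarray(window, dtype=float)
    # Differencing the window padded with a zero on each side yields all n+1 coefficients
    return np.diff(np.concatenate(([0.0], w, [0.0])))

rng = np.random.default_rng(1)
y = np.cumsum(0.01 * rng.standard_normal(50))   # hypothetical log-price path
n = 10
window = np.full(n, 1.0 / n)                    # uniform window L_i = 1/n
ell = derivative_kernel(window)
t = len(y) - 1                                  # evaluate both estimators at the last date

# Route 1: filter the returns R_{t-i} = y_{t-i} - y_{t-i-1} with the window L_i
mu_from_returns = sum(window[i] * (y[t - i] - y[t - i - 1]) for i in range(n))
# Route 2: filter the signal y_{t-i} with the derivative kernel l_i
mu_from_signal = sum(ell[i] * y[t - i] for i in range(n + 1))
print(mu_from_returns, mu_from_signal)
```

For the uniform window the kernel has only two non-zero entries, so both routes reduce to the difference between the last and first log prices of the window divided by n.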
[6] If we neglect the contribution from the term $\sigma_t^2$. Moreover, we consider $\Delta = 1$ to simplify the calculation.
2.2.3 Moving average filters
Average return over a given period. Here, we consider the simplest case corresponding to the moving average filter where the form of the window is:
$$\mathcal{L}_i = \frac{1}{n} \mathbf{1}\{i < n\}$$
In this case, the only calibration parameter is the window support, i.e. $T = n\Delta$. It characterises the smoothness of the filtered signal. For the limit $T \to 0$, the window becomes a Dirac distribution $\delta_t$ and the filtered signal is exactly the same as the observed signal: $\hat{x}_t = y_t$. For $T > 0$, if we assume that the noise $\varepsilon_t$ is independent from $x_t$ and is a centered process, the first contribution of the filtered signal is the average trend:
$$\hat{x}_t = \frac{1}{n} \sum_{i=0}^{n-1} x_{t-i}$$
If the trend is homogeneous, this average value is located at $t - (n-1)/2$ by construction. It means that the filtered signal lags the observed signal by a time period which is half the window. To extract the derivative of the trend, we compute the derivative kernel $\ell_i$, which is given by the following formula:
$$\ell_i = \frac{1}{n} \left(\delta_{i,0} - \delta_{i,n}\right)$$
where $\delta_{i,j}$ is the Kronecker delta [7]. The main advantage of using a moving average filter is the reduction of noise due to the central limit theorem. For the limit case $n \to \infty$, the signal is completely denoised but it corresponds to the average value of the trend. The estimator is also biased. In trend filtering, we therefore face a trade-off between denoising maximisation and bias minimisation. The problem is the calibration procedure for the lag window $T$. Another way to determine the optimal parameter $T$ is to take into account the dynamics of the trend.
The above moving average filter can be applied directly to the signal. However, $\hat{\mu}_t$ is simply the cumulative return over the window period. It needs only the first and last dates of the period under consideration.
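The half-window lag of the uniform moving average is easy to verify on a noise-free linear trend. The sketch below uses arbitrary illustrative values for the slope and the window length; it is a numerical check of the statement above, not part of the original analysis.

```python
import numpy as np

n, slope = 21, 0.5                       # illustrative window length and trend slope
t = np.arange(100)
x = slope * t                            # homogeneous (linear) trend, no noise

# Trailing moving average: entry k is the average of x over dates k, ..., k + n - 1,
# i.e. the filtered value at date k + n - 1
ma = np.convolve(x, np.full(n, 1.0 / n), mode="valid")

# The same trend evaluated (n - 1) / 2 periods earlier
expected = slope * (t[n - 1:] - (n - 1) / 2.0)
print(np.max(np.abs(ma - expected)))     # numerically zero
```

Doubling the window therefore doubles the delay with which a change in trend shows up in the filtered signal, which is the bias side of the trade-off.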
[7] $\delta_{i,j}$ is equal to 1 if $i = j$ and 0 otherwise.
Moving average crossovers. Many practitioners, and even individual investors, use the moving average of the price itself as a trend indication, instead of the moving average of returns. These moving averages are generally uniform moving averages of the price. Here we will consider an average of the logarithm of the price, in order to be consistent with the previous examples:
$$\hat{y}_t^n = \frac{1}{n} \sum_{i=0}^{n-1} y_{t-i}$$
Of course, an average price does not estimate the trend $\mu_t$. This trend is estimated from the difference between two moving averages over two different time horizons $n_1$ and $n_2$. Supposing that $n_1 > n_2$, the trend $\mu$ may be estimated from:
$$\hat{\mu}_t \simeq \frac{2}{(n_1 - n_2)\Delta} \left(\hat{y}_t^{n_2} - \hat{y}_t^{n_1}\right) \tag{4}$$
In particular, the estimated trend is positive if the short-term moving average is higher than the long-term moving average. Thus, the sign of the trend changes when the short-term moving average crosses the long-term moving average. Of course, when the short-term horizon $n_2$ is one, then the short-term moving average is just the current asset price. The scaling term $2(n_1 - n_2)^{-1}$ is explained below. It is derived from the interpretation of this estimator as a weighted moving average of asset returns. Indeed, this estimator can be interpreted in terms of asset returns by inverting the formula (3) with $\mathcal{L}_i$ being interpreted as the primitive of $\ell_i$:
$$\mathcal{L}_i = \begin{cases} \ell_0 & \text{if } i = 0 \\ \ell_i + \mathcal{L}_{i-1} & \text{if } i = 1, \ldots, n-1 \\ -\ell_{n+1} & \text{if } i = n \end{cases}$$
The weighting of each return in the estimator (4) is represented in Figure 1. It forms a triangle, and the biggest weighting is given at the horizon of the smallest moving average. Therefore, depending on the horizon $n_2$ of the shortest moving average, the indicator can be focused toward the current trend (if $n_2$ is small) or toward past trends (if $n_2$ is as large as $n_1/2$ for instance). From these weightings, in the case of a constant trend $\mu$, we can compute the expectation of the difference between the two moving averages:
$$E\left[\hat{y}_t^{n_2} - \hat{y}_t^{n_1}\right] = \frac{n_1 - n_2}{2} \left(\mu - \frac{1}{2}\sigma^2\right) \Delta$$
Therefore, the scaling factor in formula (4) appears naturally.
Figure 1: Window function $\mathcal{L}_i$ of moving average crossovers ($n_1 = 100$)
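The crossover estimator of formula (4) can be sketched as follows. The helper names `trailing_mean` and `crossover_trend` are hypothetical, the time step defaults to one, and the input is assumed to be a log-price series; on a noise-free linear log price the estimator should return the slope exactly, which is a convenient sanity check of the scaling factor.

```python
import numpy as np

def trailing_mean(y, n):
    """Uniform moving average over the last n observations (NaN until n are available)."""
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    c = np.cumsum(np.insert(y, 0, 0.0))      # c[k] = y[0] + ... + y[k-1]
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def crossover_trend(y, n1, n2, delta=1.0):
    """Formula (4): scaled gap between the short (n2) and long (n1) moving averages."""
    if n1 <= n2:
        raise ValueError("n1 must be the long horizon, so n1 > n2 is required")
    gap = trailing_mean(y, n2) - trailing_mean(y, n1)
    return 2.0 / ((n1 - n2) * delta) * gap

# Sanity check on a noise-free log price with slope 0.002 per period
y = 0.002 * np.arange(300.0)
mu_hat = crossover_trend(y, n1=100, n2=20)
print(mu_hat[-1])
```

The sign of `mu_hat` flips exactly when the short average crosses the long one, which is the trading signal most practitioners read from this indicator.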
Enhanced filters. To improve the uniform moving average estimator, we may take the following kernel function:
$$\ell_i = \frac{4}{n^2} \, \mathrm{sgn}\left(\frac{n}{2} - i\right)$$
We notice that the estimator $\hat{\mu}_t$ now takes into account all the dates of the window period. By taking the primitive of the function $\ell_i$, the trend filter is given as follows:
$$\mathcal{L}_i = \frac{4}{n^2} \left(\frac{n}{2} - \left|i - \frac{n}{2}\right|\right)$$
We now move to the second type of moving average filter, which is characterised by an asymmetric form of the convolution kernel. One possibility is to take an asymmetric window function with a triangular form:
$$\mathcal{L}_i = \frac{2}{n^2} (n - i) \, \mathbf{1}\{i < n\}$$
By computing the derivative of this window function, we obtain the following kernel:
$$\ell_i = \frac{2}{n} \left(\delta_{i,0} - \frac{1}{n} \mathbf{1}\{i < n\}\right)$$
The filtering equation of $\mu_t$ then becomes:
$$\hat{\mu}_t = \frac{2}{n} \left(x_t - \frac{1}{n} \sum_{i=0}^{n-1} x_{t-i}\right)$$
Remark 4 Another way to define $\hat{\mu}_t$ is to consider the Lanczos generalised derivative (Groetsch, 1998). Let $f(x)$ be a function. We define the Lanczos derivative of $f(x)$ in terms of the following relationship:
$$\frac{d_L}{dx} f(x) = \lim_{\varepsilon \to 0} \frac{3}{2\varepsilon^3} \int_{-\varepsilon}^{\varepsilon} t \, f(x + t) \, dt$$
In the discrete case, we have:
$$\frac{d_L}{dx} f(x) = \lim_{h \to 0} \frac{\sum_{k=-n}^{n} k \, f(x + kh)}{2 \sum_{k=1}^{n} k^2 h}$$
We first notice that the Lanczos derivative is more general than the traditional derivative. Although Lanczos' formula is a more onerous method for finding the derivative, it offers some advantages. This technique allows us to compute a pseudo-derivative at points where the function is not differentiable. For the observable signal $y_t$, the traditional derivative does not exist because of the noise $\varepsilon_t$, but the Lanczos derivative does. Let us apply the Lanczos formula to estimate the derivative of the trend at the point $t - T/2$. We obtain:
$$\frac{d_L}{dt} \hat{x}_t = \frac{12}{n^3} \sum_{i=0}^{n} \left(\frac{n}{2} - i\right) y_{t-i}$$
We deduce that the kernel is:
$$\ell_i = \frac{12}{n^3} \left(\frac{n}{2} - i\right) \mathbf{1}\{0 \le i \le n\}$$
By computing an integration by parts, we obtain the trend filter:
$$\mathcal{L}_i = \frac{6}{n^3} \, i \, (n - i) \, \mathbf{1}\{0 \le i \le n\}$$
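The Lanczos kernel and its primitive can be tabulated and checked directly. The sketch below assumes a unit time step; the function names `lanczos_kernels` and `lanczos_slope` are hypothetical. On a noise-free linear signal the discrete kernel recovers the slope up to a factor (n + 1)(n + 2) / n^2, which tends to one as the window grows.

```python
import numpy as np

def lanczos_kernels(n):
    """Trend window L_i = 6 i (n - i) / n^3 and derivative kernel l_i = 12 (n/2 - i) / n^3."""
    i = np.arange(n + 1)
    window = 6.0 / n**3 * i * (n - i)
    ell = 12.0 / n**3 * (n / 2.0 - i)
    return window, ell

def lanczos_slope(y, n):
    """Apply the derivative kernel to the last n + 1 observations y_t, y_{t-1}, ..., y_{t-n}."""
    _, ell = lanczos_kernels(n)
    recent = np.asarray(y, dtype=float)[::-1][:n + 1]
    return float(np.dot(ell, recent))

n = 100
window, ell = lanczos_kernels(n)
y = 0.01 * np.arange(500.0)              # noise-free linear signal with slope 0.01
print(window.sum(), ell.sum(), lanczos_slope(y, n))
```

The kernel weights sum to zero, so a constant signal has zero estimated slope, while the window weights sum to 1 - 1/n^2, i.e. the normalisation condition holds only approximately in the discrete case.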
In Figure 2, we have represented the different functions $\mathcal{L}_i$ given in this paragraph. We may extend these filters by computing the convolution of two or more filters. For example, the mixed filter in Figure 2 is the convolution of the asymmetric filter with the Lanczos filter. Let us apply these filters to the S&P 500 index. The results are given in Figure 3 for two values of the window length (n = 65 days and n = 260 days). We notice that the choice of n has a big impact on the filtered series. The choice of the window function seems to be less important at first sight. However, we should mention that traders are principally interested in the derivative of the trend, and not the absolute value of the trend itself. In this case, the window function may have a significant impact. Figure 4 is the scatterplot of the $\hat{\mu}_t$ statistic in the case of the S&P 500 index from January 2000 to July 2011 (we have considered the uniform and Lanczos filters using n = 260). We may also show that this impact increases when we reduce the length of the window, as illustrated in Table 1.
Figure 2: Window function $\mathcal{L}_i$ of moving average filters (n = 100)
Figure 3: Trend estimate for the S&P 500 index
Table 1: Correlation between the uniform and Lanczos derivatives (in %)
n      Pearson   Kendall   Spearman
5      84.67     65.69     83.15
10     87.86     68.92     86.09
22     90.14     70.94     88.17
65     90.52     71.63     88.92
130    92.57     73.63     90.18
260    94.03     76.17     92.19
Figure 4: Comparison of the derivative of the trend
2.2.4 Least squares filters
L2 filtering. The previous Lanczos filter may be viewed as a local linear regression (Burch et al., 2005). More generally, least squares methods are often used to define trend estimators:
$$\{\hat{x}_1, \ldots, \hat{x}_n\} = \arg\min \frac{1}{2} \sum_{t=1}^{n} (y_t - \hat{x}_t)^2$$
However, this problem is not well-defined. We therefore need to impose some restrictions on the underlying process $y_t$ or on the filtered trend $\hat{x}_t$ to obtain a solution. For example, we may consider a deterministic constant trend:
$$x_t = x_{t-1} + \mu$$
In this case, we have:
$$y_t = \mu t + \varepsilon_t \tag{5}$$
Estimating the filtered trend $\hat{x}_t$ is then equivalent to estimating the coefficient $\mu$:
$$\hat{\mu} = \frac{\sum_{t=1}^{n} t \, y_t}{\sum_{t=1}^{n} t^2}$$
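The least squares slope of model (5) is a one-line computation. The sketch below assumes observations indexed t = 1, ..., n with no intercept, as in equation (5); the function name `ls_trend_slope` and the simulated data are hypothetical.

```python
import numpy as np

def ls_trend_slope(y):
    """Least squares estimate of mu in y_t = mu * t + eps_t, for t = 1, ..., n."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    return float(np.dot(t, y) / np.dot(t, t))

rng = np.random.default_rng(3)
t = np.arange(1, 251)
y = 0.0004 * t + 0.01 * rng.standard_normal(250)   # hypothetical noisy linear log price
mu_hat = ls_trend_slope(y)
x_hat = mu_hat * t                                 # filtered trend implied by the estimate
print(mu_hat)
```

Because the regression has no intercept, the signal should be measured relative to its starting level before the estimator is applied.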
If we consider a trend that is not constant, we may define the following objective function:
$$\frac{1}{2} \sum_{t=1}^{n} (y_t - \hat{x}_t)^2 + \lambda \sum_{t=2}^{n-1} (\hat{x}_{t-1} - 2\hat{x}_t + \hat{x}_{t+1})^2$$
In this function, $\lambda$ is the regularisation parameter which controls the competition between the smoothness [8] of $\hat{x}_t$ and the noise $y_t - \hat{x}_t$. We may rewrite the objective function in the vectorial form:
$$\frac{1}{2} \left\| y - \hat{x} \right\|_2^2 + \lambda \left\| D \hat{x} \right\|_2^2$$
where $y = (y_1, \ldots, y_n)$, $\hat{x} = (\hat{x}_1, \ldots, \hat{x}_n)$ and the $D$ operator is the $(n-2) \times n$ matrix:
$$D = \begin{pmatrix} 1 & -2 & 1 & & & & \\ & 1 & -2 & 1 & & & \\ & & \ddots & \ddots & \ddots & & \\ & & & 1 & -2 & 1 & \\ & & & & 1 & -2 & 1 \end{pmatrix}$$
The estimator is then given by the following solution:
$$\hat{x} = \left(I + 2\lambda D^\top D\right)^{-1} y$$
It is known as the Hodrick-Prescott filter (or L2 filter). This filter plays an important role in calibrating the business cycle.
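The closed-form solution of the L2 filter translates into a single linear solve. The sketch below is a minimal dense implementation, assuming NumPy; the function name `hp_filter` is hypothetical, the value of the regularisation parameter is arbitrary, and for long series a sparse or banded solver would be the more natural choice.

```python
import numpy as np

def hp_filter(y, lam):
    """Solve (I + 2 * lam * D'D) x = y, where D is the (n-2) x n second-difference matrix."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]          # row k computes x_k - 2 x_{k+1} + x_{k+2}
    return np.linalg.solve(np.eye(n) + 2.0 * lam * D.T @ D, y)

rng = np.random.default_rng(0)
t = np.arange(200.0)
y = 0.001 * t + 0.02 * rng.standard_normal(200)   # hypothetical noisy log price
x_hat = hp_filter(y, lam=100.0)

# Roughness (sum of squared second differences) before and after filtering
print(np.sum(np.diff(y, 2) ** 2), np.sum(np.diff(x_hat, 2) ** 2))
```

A linear signal has zero second differences and passes through the filter unchanged, whereas larger values of the regularisation parameter pull the estimate towards a straight line.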
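[8] We notice that the second term is the discrete second derivative of the trend $\hat{x}_t$, which characterises the smoothness of the curve.
Kalman filtering. Another important trend estimation technique is the Kalman filter, which is described in Appendix A.1. In this case, the trend $\mu_t$ is a hidden process which follows a given dynamic. For example, we may assume that the model is [9]:
$$\begin{cases} R_t = \mu_t + \sigma_\zeta \zeta_t \\ \mu_t = \mu_{t-1} + \sigma_\eta \eta_t \end{cases} \tag{6}$$
Here, the equation of $R_t$ is the measurement equation and $R_t$ is the observable signal of realised returns. The hidden process $\mu_t$ is supposed to follow a random walk. We define $\hat{\mu}_{t|t-1} = E_{t-1}[\mu_t]$ and $P_{t|t-1} = E_{t-1}\left[(\hat{\mu}_{t|t-1} - \mu_t)^2\right]$. Using the results given in Appendix A.1, we have:
$$\hat{\mu}_{t+1|t} = (1 - K_t) \, \hat{\mu}_{t|t-1} + K_t R_t$$
where $K_t = P_{t|t-1} / \left(P_{t|t-1} + \sigma_\zeta^2\right)$ is the Kalman gain. The estimation error is determined by Riccati's equation:
$$P_{t+1|t} = P_{t|t-1} + \sigma_\eta^2 - P_{t|t-1} K_t$$
Riccati's equation gives us the stationary solution:
$$P^\star = \frac{\sigma_\eta}{2} \left(\sigma_\eta + \sqrt{\sigma_\eta^2 + 4\sigma_\zeta^2}\right)$$
The filter equation becomes:
$$\hat{\mu}_{t+1|t} = (1 - \kappa) \, \hat{\mu}_{t|t-1} + \kappa R_t$$
with:
$$\kappa = \frac{2\sigma_\eta}{\sigma_\eta + \sqrt{\sigma_\eta^2 + 4\sigma_\zeta^2}}$$
This Kalman filter can be considered as an exponential moving average filter with parameter [10] $\lambda = -\ln(1 - \kappa)$:
$$\hat{\mu}_t = \left(1 - e^{-\lambda}\right) \sum_{i=0}^{\infty} e^{-\lambda i} R_{t-i}$$
with [11] $\hat{\mu}_t = E_t[\mu_t]$. The filter of the trend $\hat{x}_t$ is therefore determined by the following equation:
$$\hat{x}_t = \left(1 - e^{-\lambda}\right) \sum_{i=0}^{\infty} e^{-\lambda i} y_{t-i}$$
while the derivative of the trend may be directly related to the observed signal $y_t$ as follows:
$$\hat{\mu}_t = \left(1 - e^{-\lambda}\right) y_t - \left(1 - e^{-\lambda}\right) \left(e^{\lambda} - 1\right) \sum_{i=1}^{\infty} e^{-\lambda i} y_{t-i}$$
In Figure 5, we have reported the window function of the Kalman filter for several values of $\lambda$. We notice that the cumulative weightings increase strongly with $\lambda$. The half-life of this filter is approximately equal to $\left(\lambda^{-1} - 2^{-1}\right) \ln 2$. For example, the half-life for $\lambda = 5\%$ is 14 days.
Figure 5: Window function $\mathcal{L}_i$ of the Kalman filter
[9] Equation (5) is a special case of this model if $\sigma_\eta = 0$.
[10] We have $0 < \kappa < 1$ and $\lambda > 0$.
[11] We notice that $\hat{\mu}_{t+1|t} = \hat{\mu}_t$.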
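The link between the Kalman recursion for model (6) and the exponential moving average can be checked numerically. The sketch below assumes illustrative volatility values, an arbitrary diffuse-looking initial variance and simulated returns; the function names `stationary_gain`, `kalman_trend` and `half_life` are hypothetical. It runs the time-varying recursion and compares the last gain with the stationary value.

```python
import numpy as np

def stationary_gain(sigma_eta, sigma_zeta):
    """Stationary Kalman gain kappa implied by Riccati's equation for model (6)."""
    return 2.0 * sigma_eta / (sigma_eta + np.sqrt(sigma_eta**2 + 4.0 * sigma_zeta**2))

def kalman_trend(returns, sigma_eta, sigma_zeta, mu0=0.0, p0=1.0):
    """Run the recursion for the hidden trend; returns the estimates and the gains K_t."""
    mu, p = mu0, p0
    mu_hat, gains = [], []
    for r in returns:
        k = p / (p + sigma_zeta**2)          # Kalman gain K_t
        mu = (1.0 - k) * mu + k * r          # update of the trend estimate
        p = p + sigma_eta**2 - p * k         # Riccati's equation
        mu_hat.append(mu)
        gains.append(k)
    return np.array(mu_hat), np.array(gains)

def half_life(lam):
    """Approximate half-life of the exponential window, (1/lam - 1/2) * ln 2."""
    return (1.0 / lam - 0.5) * np.log(2.0)

rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal(2000)   # hypothetical daily returns
mu_hat, gains = kalman_trend(returns, sigma_eta=0.001, sigma_zeta=0.01)
kappa = stationary_gain(0.001, 0.01)
lam = -np.log(1.0 - kappa)                   # equivalent exponential decay parameter
print(gains[-1], kappa, half_life(0.05))
```

After a short burn-in the gain settles at its stationary value, at which point the recursion is exactly an exponential moving average of returns with weight kappa on the latest observation.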
We may wonder what the link is between the regression model (5) and the Markov model (6). Equation (5) is equivalent to the following state space model [12]:
$$\begin{cases} y_t = x_t + \sigma_\varepsilon \varepsilon_t \\ x_t = x_{t-1} + \mu \end{cases}$$
If we now consider that the trend is stochastic, the model becomes:
$$\begin{cases} y_t = x_t + \sigma_\varepsilon \varepsilon_t \\ x_t = x_{t-1} + \mu + \sigma_\zeta \zeta_t \end{cases}$$
This model is called the local level model. We may also assume that the slope of the trend is stochastic, in which case we obtain the local linear trend model:
$$\begin{cases} y_t = x_t + \sigma_\varepsilon \varepsilon_t \\ x_t = x_{t-1} + \mu_{t-1} + \sigma_\zeta \zeta_t \\ \mu_t = \mu_{t-1} + \sigma_\eta \eta_t \end{cases}$$
These three models are special cases of structural models (Harvey, 1989) and may be easily solved by Kalman filtering. We also deduce that the Markov model (6) is a special case of the latter when $\sigma_\varepsilon = 0$.
Remark 5 We have shown that Kalman filtering may be viewed as an exponential moving average filter when we consider the Markov model (6). Nevertheless, we cannot regard the Kalman filter simply as a moving average filter. First, the Kalman filter is the optimal filter in the case of the linear Gaussian model described in Appendix A.1. Second, it could be regarded as an efficient computational solution of the least squares method (Sorensen, 1970). Third, we could use it to solve more sophisticated processes than the Markov model (6). However, some nonlinear or non-Gaussian models may be too complex for Kalman filtering. These nonlinear models can be solved by particle filters or sequential Monte Carlo methods (see Doucet et al., 1998).
Another important feature of the Kalman approach is the derivation of an optimal smoother (see Appendix A.1). At time $t$, we are interested in the numerical value of $x_t$, but also in the past values of $x_{t-i}$ because we would like to measure the slope of the trend. The Kalman smoother improves the estimate of $\hat{x}_{t-i}$ by using all the information between $t - i$ and $t$. Let us consider the previous example in relation to the S&P 500 index, using the local level model. Figure 6 gives the filtered and smoothed components $x_t$ and $\mu_t$ for two sets of parameters [13]. We verify that the Kalman smoother reduces the noise by incorporating more information. We also notice that the restriction $\sigma_\zeta = 0$ increases the variance of the trend and slope estimators.
2.3 Nonlinear filtering
In this section, we review other filtering approaches. They are generally classed as nonlinear filters, because it is not possible to express the trend as a linear convolution of the signal and a window function.
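[12] In what follows, the noise processes are white noise: $\varepsilon_t \sim N(0, 1)$, $\zeta_t \sim N(0, 1)$ and $\eta_t \sim N(0, 1)$.
[13] For the first set of parameters, we assume that $\sigma_\varepsilon = 100\,\sigma_\zeta$ and $\sigma_\eta = \sigma_\zeta / 100$. For the second set of parameters, we impose the restriction $\sigma_\zeta = 0$.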
Figure 6: Kalman filtered and smoothed components
2.3.1 Nonparametric regression
In the regression model (5), we assume that $x_t = f(t)$ while $f(t) = \mu t$. The model is said to be parametric because the estimation of the trend consists of estimating the parameter $\mu$. We then have $\hat{x}_t = \hat{\mu} t$. With nonparametric regression, we directly estimate the function $f$, obtaining $\hat{x}_t = \hat{f}(t)$. Some examples of nonparametric regression are kernel regression, loess regression and spline regression. A popular method for trend filtering is local polynomial regression:
$$y_t = f(t) + \varepsilon_t = \beta_0(\tau) + \sum_{j=1}^{p} \beta_j(\tau)\,(\tau - t)^j + \varepsilon_t$$
For a given value of $\tau$, we estimate the parameters $\hat{\beta}_j(\tau)$ using weighted least squares with the following weightings:
$$w_t = K\left(\frac{\tau - t}{h}\right)$$
where $K$ is the kernel function with a bandwidth $h$. We deduce that:
$$\hat{x}_t = E[y_t \mid \tau = t] = \hat{\beta}_0(t)$$
Cleveland (1979) proposed an improvement to the kernel regression through a two-stage procedure (loess regression). First, we fit a polynomial regression to estimate the residuals $\hat{\varepsilon}_t$. Then, we compute $\delta_t = (1 - u_t^2) \cdot \mathbb{1}\{|u_t| \le 1\}$ with $u_t = \hat{\varepsilon}_t / (6\,\mathrm{median}(|\hat{\varepsilon}|))$ and run a second kernel regression [14] with weightings $\delta_t w_t$.
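The local polynomial step and the second-stage weights can be sketched as follows for degree $p = 1$. This is only an illustration under stated assumptions: the Gaussian kernel, the bandwidth expressed in observation units, the function names and the optional `extra` argument are choices made for the example, and the robustness weights follow the formula for $\delta_t$ exactly as written above.

```python
import math
import statistics

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def local_linear(y, tau, h, extra=None):
    """Weighted least squares of y_t on (1, tau - t) with weights K((tau - t)/h).
    Returns beta_0(tau), i.e. the fitted trend at tau.
    `extra` (hypothetical argument) multiplies the kernel weights, e.g. by delta_t."""
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t, yt in enumerate(y):
        w = gaussian_kernel((tau - t) / h)
        if extra is not None:
            w *= extra[t]
        d = tau - t
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * yt
        r1 += w * d * yt
    det = s0 * s2 - s1 * s1
    # Intercept of the 2x2 normal equations.
    return (s2 * r0 - s1 * r1) / det

def robustness_weights(residuals):
    """delta_t = (1 - u_t^2) * 1{|u_t| <= 1} with u_t = e_t / (6 * median|e|)."""
    med = statistics.median(abs(e) for e in residuals)
    if med == 0.0:
        return [1.0] * len(residuals)
    out = []
    for e in residuals:
        u = e / (6.0 * med)
        out.append(1.0 - u * u if abs(u) <= 1.0 else 0.0)
    return out

y = [1.0, 3.0, 5.0, 7.0, 9.0]
first_pass = [local_linear(y, t, h=1.0) for t in range(len(y))]
delta = robustness_weights([yt - ft for yt, ft in zip(y, first_pass)])
second_pass = [local_linear(y, t, h=1.0, extra=delta) for t in range(len(y))]
```

On an exactly linear signal the first pass reproduces the data, so the second pass leaves the trend unchanged; the reweighting only matters when some residuals are large.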
[14] Cleveland (1979) suggests using the tricube kernel function to define $K$.
A spline function is a $C^2$ function $S(\tau)$ which corresponds to a cubic polynomial function on each interval $[t, t+1[$. Let $\mathcal{SP}$ be the set of spline functions. We then have to solve the following optimisation programme:
$$\min_{S \in \mathcal{SP}} \; (1 - h) \sum_{t=0}^{n} w_t\,(y_t - S(t))^2 + h \int_0^T w_\tau\, S''(\tau)^2\, d\tau$$
where $h$ is the smoothing parameter: $h = 0$ corresponds to the interpolation case [15] and $h = 1$ corresponds to the linear regression [16].
Figure 7: Illustration of the kernel, loess and spline filters
We illustrate these three nonparametric methods in Figure 7. The calibration of these filters is more complicated than for moving average filters, where the only parameter is the length $n$ of the window. With these methods, we have to decide the polynomial degree [17] $p$, the kernel function [18] $K$ and the smoothing parameter [19] $h$.
2.3.2 $L_1$ filtering
The idea of the Hodrick-Prescott filter can be generalised to a larger class of filters by using
the $L_p$ penalty condition instead of the $L_2$ penalty. This generalisation was previously
[15] We have $\hat{x}_t = S(t) = y_t$.
[16] We have $\hat{x}_t = S(t) = \hat{c} + \hat{\mu} t$ with $(\hat{c}, \hat{\mu})$ the OLS estimate of $y_t$ on a constant and time $t$, because the optimum is reached for $S''(\tau) = 0$.
[17] For the kernel regression, we use a Gaussian kernel with a bandwidth $h = 0.10$. We notice the impact of the degree of polynomial. The higher the degree, the smoother the trend (and the slope of the trend).
[18] For the loess regression, the degree of polynomial is set to 1 and the bandwidth $h$ is 0.02. We show the impact of the second step which modifies the kernel function.
[19] For the spline regression, we consider a uniform kernel function. We notice that the parameter $h$ has an impact on the smoothness of the trend.
discussed in the work of Daubechies et al. (2004) in relation to the linear inverse problem, while Tibshirani (1996) considers the Lasso regression problem. If we consider an $L_1$ filter, the objective function becomes:
$$\frac{1}{2} \sum_{t=1}^{n} (y_t - \hat{x}_t)^2 + \lambda \sum_{t=2}^{n-1} |\hat{x}_{t-1} - 2\hat{x}_t + \hat{x}_{t+1}|$$
which is equivalent to the following vectorial form:
$$\frac{1}{2} \|y - \hat{x}\|_2^2 + \lambda \|D\hat{x}\|_1$$
Kim et al. (2009) show that the dual problem of this $L_1$ filter scheme is a quadratic programme with some boundary constraints [20]. To find $\hat{x}$, we may also use the quadratic programming algorithm, but Kim et al. (2009) suggest using the primal-dual interior point method in order to optimise the numerical computation speed.
We have illustrated the $L_1$ filter in Figure 8. Contrary to all other previous methods, the filtered signal comprises a set of straight trends and breaks [21], because the $L_1$ norm imposes the condition that the second derivative of the filtered signal must be zero. The competition between the two terms in the objective function turns into a competition between the number of straight trends (or the number of breaks) and the closeness to the data. Thus, the smoothing parameter $\lambda$ plays an important role in detecting the number of breaks. This explains why $L_1$ filtering is radically different from $L_2$ (or Hodrick-Prescott) filtering. Moreover, it is easy to compute the slope of the trend $\hat{\mu}_t$ for the $L_1$ filter. It is a step function, indicating clearly if the trend is up or down, and when it changes (see Figure 8).
2.3.3 Wavelet filtering
Another way to estimate the trend $x_t$ is to denoise the signal $y_t$ by using spectral analysis. The Fourier transform is an alternative representation of the original signal $y_t$, which becomes a frequency function:
$$y(\omega) = \sum_{t=1}^{n} y_t e^{-i\omega t}$$
We note $y(\omega) = \mathcal{F}(y)$. By construction, we have $y = \mathcal{F}^{-1}(y(\omega))$ with $\mathcal{F}^{-1}$ the inverse Fourier transform. A simple idea for denoising in spectral analysis is to set some coefficients $y(\omega)$ to zero before reconstructing the signal. Figure 9 is an illustration of denoising using the thresholding rule. Selected parts of the frequency spectrum can easily be manipulated by filtering tools. For example, some can be attenuated, and others may be completely removed. Applying the inverse Fourier transform to this filtered spectrum leads to a filtered time series. Therefore, a signal can easily be smoothed by applying a low-pass filter, that is, by removing the higher frequencies. For example, we have represented two denoised signals of the S&P 500 index in Figure 9. For the first one, we use a 95% thresholding procedure whereas 99% of the Fourier coefficients are set to zero in the second case. One difficulty with this approach is the bad time location for low frequency signals and the bad frequency location for the high frequency signals. It is then difficult to localise when the trend (which is located in low frequencies) reverses. But the main drawback of spectral analysis is that it is not well suited to nonstationary processes (Martin and Flandrin, 1985, Fuentes, 2002, Oppenheim and Schafer, 2009).
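The thresholding idea can be illustrated with a naive discrete Fourier transform. This is a sketch under assumptions: the transform is indexed from $t = 0$ on the usual frequency grid $2\pi k / n$, the rule keeps the coefficients of largest modulus, and the function names and sample data are invented for the example. The $O(n^2)$ transform is for readability only.

```python
import cmath
import math

def dft(y):
    """Naive discrete Fourier transform on the grid omega_k = 2*pi*k/n."""
    n = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Inverse transform; only the real part is returned."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def threshold_denoise(y, keep):
    """Keep the `keep` Fourier coefficients of largest modulus, zero the others,
    then reconstruct the signal."""
    coeffs = dft(y)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:keep])
    filtered = [c if k in kept else 0.0 for k, c in enumerate(coeffs)]
    return idft(filtered)

signal = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7]
smooth = threshold_denoise(signal, keep=1)   # only the zero-frequency term survives
```

Because a real signal has conjugate-symmetric coefficients, keeping one member of a pair without the other leaves an imaginary part that the sketch simply discards; a more careful version would select coefficients in conjugate pairs.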
[20] The detail of this derivation is shown in Appendix A.2.
[21] A break is the position where the signal trend changes.
Figure 8: $L_1$ versus $L_2$ filtering
Figure 9: Spectral filtering
A solution consists of adopting a double dimension analysis, both in time and frequency. This approach corresponds to the wavelet analysis. The method of denoising is the same as described previously and the estimation of $x_t$ is done in three steps:
1. we compute the wavelet transform $\mathcal{W}$ of the original signal $y_t$ to obtain the wavelet coefficients $\omega = \mathcal{W}(y)$;
2. we modify the wavelet coefficients according to a denoising rule $D$: $\omega^\star = D(\omega)$;
3. we convert the modified wavelet coefficients into a new signal using the inverse wavelet transform $\mathcal{W}^{-1}$: $x = \mathcal{W}^{-1}(\omega^\star)$.
There are two principal choices in this approach. First, we have to specify which mother wavelet to use. Second, we have to define the denoising rule. Let $\omega^-$ and $\omega^+$ be two scalars with $0 < \omega^- < \omega^+$. Donoho and Johnstone (1995) define several shrinkage methods [22]:
Hard shrinkage
$$\omega_i^\star = \omega_i \cdot \mathbb{1}\{|\omega_i| > \omega^+\}$$
Soft shrinkage
$$\omega_i^\star = \mathrm{sgn}(\omega_i) \cdot (|\omega_i| - \omega^+)_+$$
Semi-soft shrinkage
$$
\omega_i^\star =
\begin{cases}
0 & \text{if } |\omega_i| \le \omega^- \\
\mathrm{sgn}(\omega_i)\,(\omega^+ - \omega^-)^{-1}\,\omega^+\,(|\omega_i| - \omega^-) & \text{if } \omega^- < |\omega_i| \le \omega^+ \\
\omega_i & \text{if } |\omega_i| > \omega^+
\end{cases}
$$
Quantile shrinkage is a hard shrinkage method where $\omega^+$ is the $q$th quantile of the coefficients $|\omega_i|$.
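The shrinkage rules listed above translate directly into code. The sketch below is illustrative only: the function names are invented, and the empirical quantile convention in `quantile_threshold` (zeroing the fraction $q$ of smallest coefficients, so that $q = 0$ keeps everything) is an assumption, since the text does not specify one. The wavelet transform itself is not implemented here.

```python
import math

def hard_shrink(w, w_plus):
    """omega* = omega * 1{|omega| > omega+}."""
    return w if abs(w) > w_plus else 0.0

def soft_shrink(w, w_plus):
    """omega* = sgn(omega) * (|omega| - omega+)_+."""
    return math.copysign(max(abs(w) - w_plus, 0.0), w)

def semi_soft_shrink(w, w_minus, w_plus):
    """Zero below omega-, identity above omega+, linear interpolation in between."""
    a = abs(w)
    if a <= w_minus:
        return 0.0
    if a <= w_plus:
        return math.copysign(w_plus * (a - w_minus) / (w_plus - w_minus), w)
    return w

def quantile_threshold(coeffs, q):
    """Threshold such that hard shrinkage zeroes a fraction q of the coefficients
    (assumed convention; ties are not treated specially)."""
    mags = sorted(abs(c) for c in coeffs)
    k = int(q * len(mags))
    return mags[k - 1] if k > 0 else 0.0

coeffs = [0.1, -0.2, 3.0, -4.0]
w_plus = quantile_threshold(coeffs, 0.5)
denoised = [hard_shrink(c, w_plus) for c in coeffs]
```

The semi-soft rule is continuous at both thresholds: it returns 0 at $|\omega_i| = \omega^-$ and $\pm\omega^+$ at $|\omega_i| = \omega^+$.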
Wavelet filtering is illustrated in Figure 10. We have computed the wavelet coefficients using the cascade algorithm of Mallat (1989) and the low-pass and high-pass filters of order 6 proposed by Daubechies (1992). The filtered trend is obtained using quantile shrinkage. In the first case, the noisy signal remains because we consider all the coefficients ($q = 0$). In the second and third cases, 95% and 99% of the wavelet coefficients are set to zero [23].
2.3.4 Other methods
Many other methods can be used to perform trend filtering. The most recent include, for example, singular spectrum analysis [24] (Vautard et al., 1992), support vector machines [25] and empirical mode decomposition (Flandrin et al., 2004). Moreover, we notice that traders sometimes use their own techniques (see, inter alia, Ehlers, 2001).
[22] In practice, the coefficients $\omega_i$ are standardised before being computed.
[23] It is interesting to note that the denoising procedure retains some wavelet coefficients corresponding to high and medium frequencies and located around the 2008 crisis.
[24] See Appendix A.5 for an illustration.
[25] A brief presentation is given in Appendix A.4.
Figure 10: Wavelet filtering
2.4 Multivariate filtering
Until now, we have assumed that the trend is specific to a financial asset. However, we may be interested in estimating the common trend of several financial assets. For example, if we wanted to estimate the trend of emerging markets equities, we could use a global index like the MSCI EM or extract the trend by considering several indices, e.g. the Bovespa index (Brazil), the RTS index (Russia), the Nifty index (India), the HSCEI index (China), etc. In this case, the trend-cycle model becomes:
$$
\begin{pmatrix} y_t^{(1)} \\ \vdots \\ y_t^{(m)} \end{pmatrix}
= x_t +
\begin{pmatrix} \varepsilon_t^{(1)} \\ \vdots \\ \varepsilon_t^{(m)} \end{pmatrix}
$$
where $y_t^{(j)}$ and $\varepsilon_t^{(j)}$ are respectively the signal and the noise of the financial asset $j$ and $x_t$ is the common trend. One idea for estimating the common trend is to obtain the mean of the specific trends:
$$\hat{x}_t = \frac{1}{m} \sum_{j=1}^{m} \hat{x}_t^{(j)}$$
If we consider moving average filtering, it is equivalent to applying the filter to the average signal [26] $\bar{y}_t = \frac{1}{m} \sum_{j=1}^{m} y_t^{(j)}$. This rule is also valid for some nonlinear filters such as $L_1$ filtering (see Appendix A.2). In what follows, we consider the two main alternative approaches developed in econometrics to estimate a (stochastic) common trend.
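The equivalence between averaging the filtered signals and filtering the averaged signal is a consequence of linearity, and can be checked numerically. The sketch below uses a uniform moving average with $\mathcal{L}_i = 1/n$; the function names and the three toy series are assumptions made for the illustration.

```python
def moving_average(y, n):
    """Uniform window L_i = 1/n applied to y_t, ..., y_{t-n+1}, for t >= n - 1."""
    return [sum(y[t - i] for i in range(n)) / n for t in range(n - 1, len(y))]

def cross_section_mean(series):
    """Average of m equally long series, date by date."""
    m = len(series)
    return [sum(column) / m for column in zip(*series)]

# Three made-up signals observed on the same dates.
signals = [
    [1.0, 2.0, 4.0, 3.0, 5.0],
    [2.0, 2.0, 1.0, 3.0, 4.0],
    [0.0, 1.0, 1.0, 2.0, 2.0],
]
mean_of_filters = cross_section_mean([moving_average(y, 3) for y in signals])
filter_of_mean = moving_average(cross_section_mean(signals), 3)
```

Both lists agree up to floating-point rounding, whatever the window weights, as long as the same linear filter is applied to every series.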
2.4.1 Error-correction model, common factors and the P-T decomposition
The econometrics of nonstationary time series may also help us to estimate a common trend. $y_t^{(j)}$ is said to be integrated of order 1 if the change $y_t^{(j)} - y_{t-1}^{(j)}$ is stationary. We will note $y_t^{(j)} \sim I(1)$ and $(1 - L)\,y_t^{(j)} \sim I(0)$. Let us now define $y_t = (y_t^{(1)}, \ldots, y_t^{(m)})$. The vector $y_t$ is cointegrated of rank $r$ if there exists a matrix $\beta$ of rank $r$ such that $z_t = \beta^\top y_t \sim I(0)$.
In this case, we show that $y_t$ may be specified by an error-correction model (Engle and Granger, 1987):
$$\Delta y_t = \gamma z_{t-1} + \sum_{i=1}^{\infty} \Phi_i \Delta y_{t-i} + \zeta_t \qquad (7)$$
where $\zeta_t$ is a $I(0)$ vector process. Stock and Watson (1988) propose another interesting representation of cointegration systems. Let $f_t$ be a vector of $r$ common factors which are $I(1)$. Therefore, we have:
$$y_t = A f_t + \eta_t \qquad (8)$$
where $\eta_t$ is a $I(0)$ vector process and $f_t$ is a $I(1)$ vector process. One of the difficulties with this type of model is the identification step (Peña and Box, 1987). Gonzalo and Granger (1995) suggest defining a permanent-transitory (P-T) decomposition:
$$y_t = P_t + T_t$$
such that the permanent component $P_t$ is difference stationary, the transitory component $T_t$ is covariance stationary and $(\Delta P_t, T_t)$ satisfies a constrained autoregressive representation. Using this framework and some other conditions, Gonzalo and Granger show that we may obtain the representation (8) by estimating the relationship (7):
$$f_t = \gamma_\perp^\top y_t \qquad (9)$$
where $\gamma_\perp^\top \gamma = 0$. They then follow the works of Johansen (1988, 1991) to derive the maximum likelihood estimator of $\gamma_\perp$. Once we have estimated the relationship (9), it is also easy to identify the common trend [27] $\hat{x}_t$.
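[26] We have:
$$\hat{x}_t = \frac{1}{m} \sum_{j=1}^{m} \sum_{i=0}^{n-1} \mathcal{L}_i\, y_{t-i}^{(j)} = \sum_{i=0}^{n-1} \mathcal{L}_i \left( \frac{1}{m} \sum_{j=1}^{m} y_{t-i}^{(j)} \right) = \sum_{i=0}^{n-1} \mathcal{L}_i\, \bar{y}_{t-i}$$
[27] If a common trend exists, it is necessarily one of the common factors.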
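2.4.2 Common stochastic trend model
Another idea is to consider an extension of the local linear trend model:
$$
\begin{cases}
y_t = \alpha x_t + \varepsilon_t \\
x_t = x_{t-1} + \mu_{t-1} + \sigma_\zeta \zeta_t \\
\mu_t = \mu_{t-1} + \sigma_\eta \eta_t
\end{cases}
$$
with $y_t = (y_t^{(1)}, \ldots, y_t^{(m)})$, $\varepsilon_t = (\varepsilon_t^{(1)}, \ldots, \varepsilon_t^{(m)}) \sim N(0, \Omega)$, $\zeta_t \sim N(0, 1)$ and $\eta_t \sim N(0, 1)$. Moreover, we assume that $\varepsilon_t$, $\zeta_t$ and $\eta_t$ are independent of each other. Given the parameters $(\alpha, \Omega, \sigma_\zeta, \sigma_\eta)$, we may run the Kalman filter to estimate the trend $x_t$ and the slope $\mu_t$ whereas the Kalman smoother allows us to estimate $x_{t-i}$ and $\mu_{t-i}$ at time $t$.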
Remark 6 The case $\sigma_\eta = 0$ has been extensively studied by Chang et al. (2009). In particular, they show that $y_t$ is cointegrated with $\beta = \Omega^{-1} \Gamma$ and $\Gamma$ a $m \times (m - 1)$ matrix such that $\Gamma^\top \Omega^{-1} \alpha = 0$ and $\Gamma^\top \Omega^{-1} \Gamma = I_{m-1}$. Using the P-T decomposition, they also found that the common stochastic trend is given by $\alpha^\top \Omega^{-1} y_t$, implying that the above averaging rule is not optimal.
We come back to the example given in Figure 6. Using the second set of parameters, we now consider three stock indices: the S&P 500 index, the Stoxx 600 index and the MSCI EM index. For each index, we estimate the filtered trend. Moreover, using the previous common stochastic trend model [28], we estimate the common trend for the bivariate signal (S&P 500, Stoxx 600) and the trivariate signal (S&P 500, Stoxx 600, MSCI EM).
Figure 11: Multivariate Kalman filtering
[28] We assume that $\alpha_j$ takes the value 1 for the three signals.
3 Trend filtering in practice
3.1 The calibration problem
For the practical use of the trend extraction techniques discussed above, the calibration of filtering parameters is crucial. These calibrated parameters must incorporate our prediction requirement or they can be mapped to a commonly-known benchmark estimator. These constraints offer us some criteria for determining the optimal parameters for our expected prediction horizon. Below, we consider two possible calibration schemes based on these criteria.
3.1.1 Calibration based on prediction error
One idea for estimating the parameters of a model is to use statistical inference tools. Let us consider the local linear trend model. We may estimate the set of parameters $(\sigma_\varepsilon, \sigma_\zeta, \sigma_\eta)$ by maximising the log-likelihood function [29]:
$$\ell = -\frac{1}{2} \sum_{t=1}^{n} \left( \ln 2\pi + \ln F_t + \frac{v_t^2}{F_t} \right)$$
where $v_t = y_t - E_{t-1}[y_t]$ is the innovation process and $F_t = E_{t-1}[v_t^2]$ is the variance of $v_t$. In Figure 12, we have reported the filtered and smoothed trend and slope estimated by the maximum likelihood method. We notice that the estimated components are noisier than those obtained in Figure 6. We can explain this easily because maximum likelihood is based on the one-day innovation process. If we want to look at a longer trend, we have to consider the innovation process $v_t = y_t - E_{t-h}[y_t]$ where $h$ is the horizon time. We have reported the slope for $h = 50$ days in Figure 12. It is very different from the slope corresponding to $h = 1$ day.
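Once the innovations $v_t$ and their variances $F_t$ are available (for instance from a Kalman filter run), the log-likelihood above is a one-line sum. The sketch below only evaluates that sum; the function name and the sample numbers are assumptions, and producing $v_t$ and $F_t$ for the local linear trend model is outside its scope.

```python
import math

def gaussian_loglik(innovations, variances):
    """Prediction error decomposition:
    l = -1/2 * sum( ln(2*pi) + ln(F_t) + v_t^2 / F_t )."""
    return -0.5 * sum(
        math.log(2.0 * math.pi) + math.log(f) + v * v / f
        for v, f in zip(innovations, variances)
    )

# Made-up innovations and variances, for illustration only.
v = [0.3, -0.1, 0.2]
F = [1.5, 1.2, 1.1]
loglik = gaussian_loglik(v, F)
```

In a calibration loop, this value would be recomputed for each candidate parameter set and the set with the highest value retained.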
Figure 12: Maximum likelihood of the trend and slope components
[29] Another way of estimating the parameters is to consider the log-likelihood function in the frequency domain analysis (Roncalli, 2010). In the case of the local linear trend model, the stationary form of $y_t$ is $S(y_t) = (1 - L)^2 y_t$. We deduce that the associated log-likelihood function is:
$$\ell = -\frac{n}{2} \ln 2\pi - \frac{1}{2} \sum_{j=0}^{n-1} \ln f(\lambda_j) - \frac{1}{2} \sum_{j=0}^{n-1} \frac{I(\lambda_j)}{f(\lambda_j)}$$
where $I(\lambda_j)$ is the periodogram of $S(y_t)$ and $f(\lambda)$ is the spectral density:
$$f(\lambda) = \frac{\sigma_\eta^2 + 2(1 - \cos\lambda)\,\sigma_\zeta^2 + 4(1 - \cos\lambda)^2\,\sigma_\varepsilon^2}{2\pi}$$
because we have:
$$S(y_t) = \sigma_\eta \eta_{t-1} + \sigma_\zeta (1 - L) \zeta_t + \sigma_\varepsilon (1 - L)^2 \varepsilon_t$$
The problem is that the computation of the log-likelihood for the innovation process $v_t = y_t - E_{t-h}[y_t]$ is trickier because there is generally no analytic expression. This is why we do not recommend this technique for trend filtering problems, because the trends estimated are generally very short-term. A better solution is to employ a cross-validation procedure to calibrate the parameters $\lambda$ of the filters discussed above. Let us consider the calibration scheme presented in Figure 13. We divide our historical data into a training set and a validation set, which are characterised by two time parameters $T_1$ and $T_2$. The size of the training set $T_1$ controls the precision of our calibration, for a fixed parameter $\lambda$. For this training set, the value of the expectation $E_{t-h}[y_t]$ is computed. The second parameter $T_2$ determines the size of the validation set, which is used to estimate the prediction error:
$$e(\lambda; h) = \sum_{t=1}^{n-h} (y_t - E_{t-h}[y_t])^2$$
This quantity is directly related to the prediction horizon $h = T_2$ for a given investment strategy. The minimisation of the prediction error leads to the optimal value $\lambda^\star$ of the filter parameters which will be used to predict the trend for the test set. For example, we apply this calibration scheme for $L_1$ filtering for $h$ equal to 50 days. Figure 14 illustrates the calibration procedure for the S&P 500 index with $T_1 = 400$ and $T_2 = 50$. Minimising the cumulative prediction error over the validation set gives the optimal value $\lambda^\star = 7.03$.
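Figure 13: Cross-validation procedure for determining optimal parameters
[Diagram: the historical data are split into a training set of length $T_1$ followed by a test set of length $T_2$ that ends today; the forecasting (prediction) window of length $T_2$ starts today.]
3.1.2 Calibration based on benchmark estimator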
Figure 14: Calibration procedure with the S&P 500 index for the $L_1$ filter
The trend filtering algorithm can be calibrated with a benchmark estimator. In order to illustrate this idea, we present in this discussion the calibration procedure for $L_2$ filters by using spectral analysis. Though the $L_2$ filter provides an explicit solution which is a great advantage for numerical implementation, the calibration of the smoothing parameter $\lambda$ is not straightforward. We propose to calibrate the $L_2$ filter by comparing the spectral density of this filter with that obtained using the uniform moving average filter with horizon $n$ for which the spectral density is:
$$f^{MA}(\omega) = \frac{1}{n^2} \left| \sum_{t=0}^{n-1} e^{-i\omega t} \right|^2$$
For the $L_2$ filter, the solution has the analytical form $\hat{x} = (I + 2\lambda D^\top D)^{-1} y$. Therefore, the spectral density can also be computed explicitly:
$$f^{HP}(\omega) = \left( \frac{1}{1 + 4\lambda (3 - 4\cos\omega + \cos 2\omega)} \right)^2$$
This spectral density can then be approximated by $1 / (1 + 2\lambda\omega^4)^2$. Hence, the spectral width is $(2\lambda)^{-1/4}$ for the $L_2$ filter whereas it is $2\pi n^{-1}$ for the uniform moving average filter. The calibration of the $L_2$ filter could be achieved by matching these two quantities. Finally, we obtain the following relationship:
$$\lambda = \frac{1}{2} \left( \frac{n}{2\pi} \right)^4$$
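The two spectral densities and the matching rule can be evaluated numerically. The following sketch simply codes the three formulas of this section; the function names are invented for the illustration and nothing beyond those formulas is assumed.

```python
import cmath
import math

def ma_spectral_density(omega, n):
    """f_MA(omega) = |sum_{t=0}^{n-1} exp(-i*omega*t)|^2 / n^2."""
    total = sum(cmath.exp(-1j * omega * t) for t in range(n))
    return abs(total) ** 2 / n ** 2

def hp_spectral_density(omega, lam):
    """f_HP(omega) = (1 / (1 + 4*lam*(3 - 4*cos(omega) + cos(2*omega))))^2."""
    return (1.0 / (1.0 + 4.0 * lam * (3.0 - 4.0 * math.cos(omega) + math.cos(2.0 * omega)))) ** 2

def lambda_from_window(n):
    """Matching the spectral widths (2*lam)^(-1/4) and 2*pi/n gives
    lam = (1/2) * (n / (2*pi))^4."""
    return 0.5 * (n / (2.0 * math.pi)) ** 4

n = 50
lam = lambda_from_window(n)
width_hp = (2.0 * lam) ** -0.25      # spectral width of the L2 filter
width_ma = 2.0 * math.pi / n         # spectral width of the moving average
```

Both densities equal 1 at $\omega = 0$, and for small $\omega$ the expansion $3 - 4\cos\omega + \cos 2\omega \approx \omega^4 / 2$ gives the approximation $1 / (1 + 2\lambda\omega^4)^2$ quoted above.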
In Figure 15, we represent the spectral density of the uniform moving average filter for different window sizes $n$. We also report the spectral density of the corresponding $L_2$ filters. To obtain this, we calibrated the optimal parameter $\lambda^\star$ by least squares minimisation. In Figure 16, we compare the optimal estimator $\lambda^\star$ with that corresponding to 10.27 times the value given by the relationship above. We notice that the approximation is very good [30].
Figure 15: Spectral density of moving average and $L_2$ filters
3.2 What about the variance of the estimator?
Let $\hat{\mu}_t$ be the estimator of the slope of the trend. There may be a confusion between the estimator of the slope and the estimated value of the slope (or the estimate). The estimator is a random variable and is defined by a probability distribution function. Based on the sample data, the estimator takes a value which is the estimate of the slope. Suppose that we obtain an estimate of 10%. It means that 10% is the most likely value of the slope given the data. But it does not mean that 10% is the true value of the slope.
3.2.1 Measuring the efficiency of trend filters
Let $\mu_t^0$ be the true value of the slope. In statistical inference, the quality of an estimator is defined by the mean squared error (or MSE):
$$\mathrm{MSE}(\hat{\mu}_t) = E\left[ (\hat{\mu}_t - \mu_t^0)^2 \right]$$
It indicates how far the estimates are from the true value. We say that the estimator $\hat{\mu}_t^{(1)}$ is more efficient than the estimator $\hat{\mu}_t^{(2)}$ if its MSE is lower:
$$\hat{\mu}_t^{(1)} \succ \hat{\mu}_t^{(2)} \Leftrightarrow \mathrm{MSE}(\hat{\mu}_t^{(1)}) \le \mathrm{MSE}(\hat{\mu}_t^{(2)})$$
[30] We estimated the figure 10.27 using least squares.
Figure 16: Relationship between the value of $\lambda$ and the length of the moving average filter
We may decompose the MSE statistic into two components:
$$\mathrm{MSE}(\hat{\mu}_t) = E\left[ (\hat{\mu}_t - E[\hat{\mu}_t])^2 \right] + E\left[ (E[\hat{\mu}_t] - \mu_t^0)^2 \right]$$
The first component is the variance of the estimator $\mathrm{var}(\hat{\mu}_t)$ whereas the second component is the square of the bias $B(\hat{\mu}_t)$. Generally, we are interested in estimators that are unbiased ($B(\hat{\mu}_t) = 0$). If this is the case, comparing two estimators is equivalent to comparing their variances.
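The decomposition can be verified on a sample of estimates, replacing expectations by sample averages. The helper below is a small illustration; its name and the sample values are assumptions, and the variance uses the divisor $k$ (not $k - 1$) so that the identity holds exactly.

```python
def mse_decomposition(estimates, true_value):
    """Return (mse, variance, bias) computed with sample averages,
    so that mse = variance + bias**2 up to rounding."""
    k = len(estimates)
    mean = sum(estimates) / k
    variance = sum((e - mean) ** 2 for e in estimates) / k
    bias = mean - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / k
    return mse, variance, bias

# Made-up slope estimates around a true slope of 10%.
mse, variance, bias = mse_decomposition([0.08, 0.12, 0.15, 0.05], 0.10)
# Same estimates against a true slope of 8%: the bias term is no longer zero.
mse_b, variance_b, bias_b = mse_decomposition([0.08, 0.12, 0.15, 0.05], 0.08)
```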
Let us assume that the price process is a geometric Brownian motion:
$$dS_t = \mu^0 S_t\, dt + \sigma^0 S_t\, dW_t$$
In this case, the slope of the trend is constant and is equal to $\mu^0$. In Figure 17, we have reported the probability density function of the estimator $\hat{\mu}_t$ when the true slope $\mu^0$ is 10%. We consider the estimator based on a uniform moving average filter of length $n$. First, we notice that using filters is better than using the noisy signal. We also observe that the variance of the estimators increases with the parameter $\sigma^0$ and decreases with the length $n$.
3.2.2 Trend detection versus trend filtering
In the previous paragraph, we saw that an estimate of the trend may not be significant if the variance of the estimator is too large. Before computing an estimate of the trend, we then have to decide if there is a trend or not. This process is called trend detection. Mann (1945) considers the following statistic:
$$S_t^{(n)} = \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} \mathrm{sgn}(y_{t-i} - y_{t-j})$$
Figure 17: Density of the estimator $\hat{\mu}_t$
Figure 18: Impact of $\mu^0$ on the estimator $\hat{\mu}_t$
with $\mathrm{sgn}(y_{t-i} - y_{t-j}) = 1$ if $y_{t-i} > y_{t-j}$ and $\mathrm{sgn}(y_{t-i} - y_{t-j}) = -1$ if $y_{t-i} < y_{t-j}$. We have [31]:
$$\mathrm{var}\left(S_t^{(n)}\right) = \frac{n (n - 1) (2n + 5)}{18}$$
We can show that:
$$-\frac{n (n + 1)}{2} \le S_t^{(n)} \le \frac{n (n + 1)}{2}$$
The bounds are reached if $y_t < y_{t-i}$ (negative trend) or $y_t > y_{t-i}$ (positive trend) for $i \in \mathbb{N}$. We can then normalise the score:
$$\mathcal{S}_t^{(n)} = \frac{2 S_t^{(n)}}{n (n + 1)}$$
$\mathcal{S}_t^{(n)}$ takes the value $+1$ (or $-1$) if we have a perfect positive (or negative) trend. If there is no trend, it is obvious that $\mathcal{S}_t^{(n)} \simeq 0$. Under this null hypothesis, we have:
$$Z_t^{(n)} \sim N(0, 1)$$
with:
$$Z_t^{(n)} = \frac{S_t^{(n)}}{\sqrt{\mathrm{var}\left(S_t^{(n)}\right)}}$$
In Figure 19, we reported the normalised score $\mathcal{S}_t^{(n)}$ for the S&P 500 index and different values of $n$. Statistics relating to the null hypothesis are given in Table 2 for the study period. We notice that we generally reject the hypothesis that there is no trend when we consider a period of one year. The number of cases in which we detect a trend decreases if we consider a shorter period. For example, if $n$ is equal to 10 days, we accept the hypothesis that there is no trend in 42% of cases when the confidence level is set to 90%.
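Mann's statistic and the associated test statistic are simple to compute. The sketch below is an illustration with invented function names; it takes a window ordered from the most recent observation $y_t$ backwards, ignores ties, and uses the variance formula without tie correction. Note that with $n$ observations the double sum contains $n(n-1)/2$ terms, so in this sketch the extreme values of the raw statistic are $\pm n(n-1)/2$.

```python
import math

def mann_statistic(window):
    """Sum of sgn(y_{t-i} - y_{t-j}) over i < j, where window[i] = y_{t-i}
    (window[0] is the most recent observation)."""
    n = len(window)
    total = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = window[i] - window[j]
            total += (diff > 0) - (diff < 0)
    return total

def mann_variance(n):
    """Variance under the null hypothesis, without tie correction."""
    return n * (n - 1) * (2 * n + 5) / 18.0

def mann_z(window):
    """Standardised statistic, approximately N(0, 1) when there is no trend."""
    return mann_statistic(window) / math.sqrt(mann_variance(len(window)))

rising = [5.0, 4.0, 3.0, 2.0, 1.0]    # most recent value first: prices went up
z = mann_z(rising)
```

A two-sided test at the 90% confidence level would reject the no-trend hypothesis when $|Z_t^{(n)}|$ exceeds roughly 1.645.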
Table 2: Frequencies of rejecting the null hypothesis with confidence level $\alpha$
Confidence level    90%       95%       99%
n = 10 days         58.06%    49.47%    29.37%
n = 3 months        85.77%    82.87%    76.68%
n = 1 year          97.17%    96.78%    95.33%
Remark 7 We have reported the statistic $\mathcal{S}_t^{(10)}$ against the trend estimate [32] $\hat{\mu}_t$ for the S&P 500 index since January 2000. We notice that $\hat{\mu}_t$ may be positive whereas $\mathcal{S}_t^{(10)}$ is negative. This illustrates that a trend measurement is just an estimate. It does not mean that a trend exists.
[31] If there are some tied sequences ($y_{t-i} = y_{t-i-1}$), the formula becomes:
$$\mathrm{var}\left(S_t^{(n)}\right) = \frac{1}{18} \left( n (n - 1) (2n + 5) - \sum_{k=1}^{g} n_k (n_k - 1) (2 n_k + 5) \right)$$
with $g$ the number of tied sequences and $n_k$ the number of data points in the $k$th tied sequence.
[32] It is computed with a uniform moving average of 10 days.
Figure 19: Trend detection for the S&P 500 index
Figure 20: Trend detection versus trend filtering
3.3 From trend filtering to trend forecasting
There are two possible applications for the trend following problem. First, trend filtering can analyse the past. A noisy signal can be transformed into a smoother signal, which can be interpreted more easily. An ex-post analysis of this kind can, for instance, clearly separate increasing price periods from decreasing price periods. This analysis can be performed on any time series, or even on a random walk. For example, we have reported four simulations of a geometric Brownian motion without drift and annual volatility of 20% in Figure 21. In this context, trend filtering could help us to estimate the different trends in the past.
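Paths of this kind are easy to generate. The sketch below simulates a geometric Brownian motion with the exact log-normal step; the function name, the seed handling and the convention of 260 trading days per year are assumptions made for the illustration, not details taken from the text.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, n_days, seed=None, days_per_year=260):
    """Daily path of dS = mu*S*dt + sigma*S*dW using the exact solution
    S_{t+dt} = S_t * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*z)."""
    rng = random.Random(seed)
    dt = 1.0 / days_per_year
    path = [s0]
    for _ in range(n_days):
        z = rng.gauss(0.0, 1.0)
        step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(step))
    return path

# Four driftless paths with 20% annual volatility, as in the experiment described above.
paths = [simulate_gbm(100.0, 0.0, 0.20, 260 * 4, seed=k) for k in range(4)]
```

Applying any of the filters discussed earlier to such paths produces apparent trends even though, by construction, the drift is zero.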
Figure 21: Four simulations of a geometric Brownian motion without drift
On the other hand, trend analysis may be used as a predictive tool. Prediction is a much more ambitious objective than analysing the past. It cannot be performed on any time series. For instance, trend following predictions suppose that the last observed trend influences future returns. More precisely, these predictors suppose that positive (or negative) trends are more likely to be followed by positive (or negative) returns. Such an assumption has to be tested empirically. For example, it is obvious that the time series in Figure 21 exhibit certain trends, whereas we know that there is no trend in a geometric Brownian motion without drift. Thus, we may still observe some trends in an ex-post analysis. It does not mean, however, that trends will persist in the future.
The persistence of trends is tested here in a simple framework for major financial indices [33]. For each of these indices, the one-month returns are separated into two sets. The first set includes one-month returns that immediately follow a positive three-month return, whereas the second set includes those that follow a negative three-month return. The average one-month return is computed for each of these two sets, and the results are given in Table 3. These results clearly show that, on average, higher returns can be expected after a positive three-month return than after a negative three-month period. Therefore, observation of the current trend may have a predictive value for the indices under consideration. Moreover, we consider the distribution of the one-month returns, based on past three-month returns. Figure 22 illustrates the case of the GSCI index. In the first quadrant, the one-month returns are divided into two sets, depending on whether the previous three-month return is positive or negative. The cumulative distributions of these two sets are shown. In the second quadrant, we consider, on the one hand, the distribution of one-month returns following a three-month return below $-5\%$ and, on the other hand, the distribution of returns following a three-month return exceeding $+5\%$. The same procedure is repeated in the other quadrants, for a 10% and a 15% threshold. This simple test illustrates the usefulness of trend following strategies. Here, trends seem persistent enough to study such strategies. Of course, on other time scales or for other assets, one may obtain opposite results that would support contrarian strategies.
[33] The study period begins in January 1995 (January 1999 for the MSCI EM) and finishes in October 2011.
Figure 22: Distribution of the conditional standardised monthly return
Table 3: Average one-month conditional return based on past trends
Trend           Positive   Negative   Difference
Eurostoxx 50    1.1%       0.2%       0.9%
S&P 500         0.9%       0.5%       0.4%
MSCI WORLD      0.6%       -0.3%      1.0%
MSCI EM         1.9%       -0.3%      2.2%
TOPIX           0.4%       -0.4%      0.9%
EUR/USD         0.2%       -0.2%      0.4%
USD/JPY         0.2%       -0.2%      0.4%
GSCI            1.3%       -0.4%      1.6%
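The persistence test behind Table 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the compounding of the three past monthly returns, the treatment of a zero past return as non-positive, and the sample returns are all assumptions.

```python
def _mean(values):
    return sum(values) / len(values) if values else float("nan")

def conditional_means(monthly_returns, lookback=3):
    """Average one-month return following a positive, respectively non-positive,
    compounded return over the previous `lookback` months."""
    after_positive, after_negative = [], []
    for t in range(lookback, len(monthly_returns)):
        past = 1.0
        for r in monthly_returns[t - lookback:t]:
            past *= 1.0 + r
        bucket = after_positive if past - 1.0 > 0.0 else after_negative
        bucket.append(monthly_returns[t])
    return _mean(after_positive), _mean(after_negative)

# Made-up monthly returns, for illustration only.
returns = [0.01, 0.01, 0.01, 0.02, -0.05, -0.05, -0.05, -0.01]
mean_pos, mean_neg = conditional_means(returns)
difference = mean_pos - mean_neg
```

A positive `difference`, as in every row of Table 3, is what a trend following strategy relies on; a negative one would favour a contrarian rule.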
Conclusion
The ultimate goal of trend filtering in finance is to design portfolio strategies that may benefit from these trends. But the path between trend measurement and portfolio allocation is not straightforward. It involves studies and explanations that would not fit in this paper. Nevertheless, let us point out some major issues. Of course, the first problem is the selection of the trend filtering method. This selection may lead to a single procedure or to a pool of methods. The selection of several methods raises the question of an aggregation procedure. This can be done through averaging or dynamic model selection, for instance. The resulting trend indicator is meant to forecast future asset returns at a given horizon.
Intuitively, an investor should buy assets with positive return forecasts and sell assets with negative forecasts. But the size of each long or short position is a quantitative problem that requires a clear investment process. This process should take into account the risk entailed by each position, compared with the expected return. Traditionally, individual risks can be calculated in relation to asset volatility. A correlation matrix can aggregate those individual risks into a global portfolio risk. But in the case of a multi-asset trend following strategy, should we consider the correlation of assets or the correlation of each individual strategy? These may be quite different, as the correlations between strategies are usually smaller than the correlations between assets in absolute terms. Even when the portfolio risks can be calculated, the distribution of those risks between assets or strategies remains an open problem. Clearly, this distribution should take into account the individual risks, their correlations and the expected return of each asset. But there are many competing allocation procedures, such as Markowitz portfolio theory or risk budgeting methods.
In addition, the total amount of risk in the portfolio must be decided. The average target volatility of the portfolio is closely related to the risk aversion of the final investor. But this total amount of risk may not be constant over time, as some periods could bring higher expected returns than others. For example, some funds do not change the average size of their positions during periods of high market volatility. This increases their risks, but they consider that their return opportunities, even when risk-adjusted, are greater during those periods. On the contrary, some investors reduce their exposure to markets during volatility peaks, in order to limit their potential drawdowns. In any case, any consistent investment process should measure and control the global risk of the portfolio.
These are just a few questions relating to trend following strategies. Many more arise in practical cases, such as execution policies and transaction cost management. Each of these issues must be studied in depth, and re-examined on a regular basis. This is the essence of quantitative management processes.
A
A.1
Statistical complements
State space model and Kalman ltering
A state space model is defined by a transition equation and a measurement equation. In
the measurement equation, we postulate the relationship between an observable vector and
a state vector, while the transition equation describes the generating process of the state
variables. The state vector $\alpha_t$ is generated by a first-order Markov process of the form:
$$\alpha_t = T_t \alpha_{t-1} + c_t + R_t \eta_t$$
where $\alpha_t$ is the vector of the $m$ state variables, $T_t$ is an $m \times m$ matrix, $c_t$ is an $m \times 1$ vector
and $R_t$ is an $m \times p$ matrix. The measurement equation of the state-space representation is:
$$y_t = Z_t \alpha_t + d_t + \epsilon_t$$
where $y_t$ is an $n$-dimensional time series, $Z_t$ is an $n \times m$ matrix and $d_t$ is an $n \times 1$ vector. $\eta_t$ and $\epsilon_t$
are assumed to be white noise processes of dimensions $p$ and $n$ respectively. These two
uncorrelated processes are Gaussian with zero mean and respective covariance matrices $Q_t$
and $H_t$. $\alpha_0 \sim \mathcal{N}(a_0, P_0)$ describes the initial position of the state vector. We define $a_t$ and
$a_{t|t-1}$ as the optimal estimators of $\alpha_t$ based on all the information available respectively at
time $t$ and $t-1$. Let $P_t$ and $P_{t|t-1}$ be the associated covariance matrices$^{34}$. The Kalman
filter consists of the following set of recursive equations (Harvey, 1990):
$$
\begin{aligned}
a_{t|t-1} &= T_t a_{t-1} + c_t \\
P_{t|t-1} &= T_t P_{t-1} T_t^\top + R_t Q_t R_t^\top \\
\hat{y}_{t|t-1} &= Z_t a_{t|t-1} + d_t \\
v_t &= y_t - \hat{y}_{t|t-1} \\
F_t &= Z_t P_{t|t-1} Z_t^\top + H_t \\
a_t &= a_{t|t-1} + P_{t|t-1} Z_t^\top F_t^{-1} v_t \\
P_t &= \left( I_m - P_{t|t-1} Z_t^\top F_t^{-1} Z_t \right) P_{t|t-1}
\end{aligned}
$$
where $v_t$ is the innovation process with covariance matrix $F_t$ and $\hat{y}_{t|t-1} = E_{t-1}[y_t]$. Harvey
(1989) shows that we can obtain $a_{t+1|t}$ directly from $a_{t|t-1}$:
$$a_{t+1|t} = (T_{t+1} - K_t Z_t)\, a_{t|t-1} + K_t y_t + (c_{t+1} - K_t d_t)$$
where $K_t = T_{t+1} P_{t|t-1} Z_t^\top F_t^{-1}$ is the gain matrix. We also have:
$$a_{t+1|t} = T_{t+1} a_{t|t-1} + c_{t+1} + K_t \left( y_t - Z_t a_{t|t-1} - d_t \right)$$
Finally, we obtain:
$$
\begin{aligned}
y_t &= Z_t a_{t|t-1} + d_t + v_t \\
a_{t+1|t} &= T_{t+1} a_{t|t-1} + c_{t+1} + K_t v_t
\end{aligned}
$$
This system is called the innovation representation.
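The filter recursions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes time-invariant system matrices, and the function name `kalman_filter` and its argument layout are hypothetical choices made for this example.

```python
import numpy as np

def kalman_filter(y, T, c, R, Q, Z, d, H, a0, P0):
    """Kalman filter with time-invariant system matrices (an assumption
    of this sketch). y has shape (N, n); returns the filtered states a_t
    with shape (N, m) and their covariances P_t with shape (N, m, m)."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a0, dtype=float)
    P = np.asarray(P0, dtype=float)
    m = a.shape[0]
    a_filt, P_filt = [], []
    for y_t in y:
        # Prediction step: a_{t|t-1} and P_{t|t-1}
        a_pred = T @ a + c
        P_pred = T @ P @ T.T + R @ Q @ R.T
        # Innovation v_t and its covariance F_t
        v = y_t - (Z @ a_pred + d)
        F = Z @ P_pred @ Z.T + H
        # Update step: G = P_{t|t-1} Z' F^{-1} (K_t in the text is T_{t+1} G)
        G = P_pred @ Z.T @ np.linalg.inv(F)
        a = a_pred + G @ v
        P = (np.eye(m) - G @ Z) @ P_pred
        a_filt.append(a)
        P_filt.append(P)
    return np.array(a_filt), np.array(P_filt)
```

For a scalar model with a constant state ($T = Z = 1$, $Q = 0$, $H = 1$) and prior $\mathcal{N}(0, 1)$, the filtered estimate reduces to a running average that includes the prior as one pseudo-observation.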

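Let $t^\star$ be a fixed given date. We define $a_{t|t^\star} = E_{t^\star}[\alpha_t]$ and $P_{t|t^\star} = E_{t^\star}\left[ (a_{t|t^\star} - \alpha_t)(a_{t|t^\star} - \alpha_t)^\top \right]$
with $t \le t^\star$. We have $a_{t^\star|t^\star} = a_{t^\star}$ and $P_{t^\star|t^\star} = P_{t^\star}$. The Kalman smoother is then defined
by the following set of recursive equations:
$$
\begin{aligned}
P_t^\star &= P_t T_{t+1}^\top P_{t+1|t}^{-1} \\
a_{t|t^\star} &= a_t + P_t^\star \left( a_{t+1|t^\star} - a_{t+1|t} \right) \\
P_{t|t^\star} &= P_t + P_t^\star \left( P_{t+1|t^\star} - P_{t+1|t} \right) P_t^{\star\top}
\end{aligned}
$$
34 We have $a_t = E_t[\alpha_t]$, $a_{t|t-1} = E_{t-1}[\alpha_t]$, $P_t = E_t\left[ (a_t - \alpha_t)(a_t - \alpha_t)^\top \right]$ and $P_{t|t-1} = E_{t-1}\left[ (a_{t|t-1} - \alpha_t)(a_{t|t-1} - \alpha_t)^\top \right]$, where $E_t$ indicates the conditional expectation operator.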
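As an illustration of the smoother, the backward recursion can be sketched for the simplest special case, a scalar local level model with $T_t = Z_t = R_t = 1$ and $c_t = d_t = 0$. This special case, the function name `local_level_smoother` and the parameter names `q` and `h` (the variances $Q$ and $H$) are assumptions of this example.

```python
import numpy as np

def local_level_smoother(y, q, h, a0, p0):
    """Filter, then smooth, the scalar model
    alpha_t = alpha_{t-1} + eta_t (variance q), y_t = alpha_t + eps_t (variance h).
    Returns filtered means, smoothed means, filtered and smoothed variances."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    a = np.empty(n)
    p = np.empty(n)            # a_t, P_t
    a_pred = np.empty(n)
    p_pred = np.empty(n)       # a_{t|t-1}, P_{t|t-1}
    a_prev, p_prev = a0, p0
    for t in range(n):         # forward pass: Kalman filter
        a_pred[t] = a_prev
        p_pred[t] = p_prev + q
        f = p_pred[t] + h
        a[t] = a_pred[t] + p_pred[t] / f * (y[t] - a_pred[t])
        p[t] = (1.0 - p_pred[t] / f) * p_pred[t]
        a_prev, p_prev = a[t], p[t]
    a_s, p_s = a.copy(), p.copy()   # at t = t*, smoothed equals filtered
    for t in range(n - 2, -1, -1):  # backward pass: Kalman smoother
        p_star = p[t] / p_pred[t + 1]          # P_t T' P_{t+1|t}^{-1} with T = 1
        a_s[t] = a[t] + p_star * (a_s[t + 1] - a_pred[t + 1])
        p_s[t] = p[t] + p_star * (p_s[t + 1] - p_pred[t + 1]) * p_star
    return a, a_s, p, p_s
```

When `q = 0` the state is constant, so every smoothed value coincides with the last filtered value; with `q > 0` the smoothed variances are never larger than the filtered ones.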
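A.2 $L_1$ filtering
A.2.1 The dual problem
The $L_1$ filtering problem can be solved by considering the dual problem, which is a QP
programme. We first rewrite the primal problem with a new variable $z = D\hat{x}$:
$$\min \; \frac{1}{2} \left\| y - \hat{x} \right\|_2^2 + \lambda \left\| z \right\|_1 \quad \text{u.c.} \quad z = D\hat{x}$$
We now construct the Lagrangian function with the dual variable $\nu \in \mathbb{R}^{n-2}$:
$$L(\hat{x}, z, \nu) = \frac{1}{2} \left\| y - \hat{x} \right\|_2^2 + \lambda \left\| z \right\|_1 + \nu^\top (D\hat{x} - z)$$
The dual objective function is obtained in the following way:
$$\inf_{\hat{x}, z} L(\hat{x}, z, \nu) = -\frac{1}{2} \nu^\top D D^\top \nu + y^\top D^\top \nu$$
for $-\lambda \mathbf{1} \le \nu \le \lambda \mathbf{1}$. According to the Kuhn-Tucker theorem, the initial problem is equivalent
to the dual problem:
$$\min \; \frac{1}{2} \nu^\top D D^\top \nu - y^\top D^\top \nu \quad \text{u.c.} \quad -\lambda \mathbf{1} \le \nu \le \lambda \mathbf{1}$$
This QP programme can be solved by a traditional Newton algorithm or by interior-point
methods, and finally, the solution of the trend is:
$$\hat{x}^\star = y - D^\top \nu$$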
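The box-constrained dual is simple enough to solve with a few lines of code. The sketch below deliberately swaps in projected gradient descent for the Newton and interior-point methods mentioned above, because it is short and needs only NumPy; it converges slowly on long series. The function names and the fixed iteration count are assumptions of this example, and $D$ is taken to be the second-difference operator.

```python
import numpy as np

def second_difference_matrix(n):
    """(n-2) x n matrix D whose rows are (1, -2, 1)."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def l1_trend_dual(y, lam, n_iter=5000):
    """Minimise 0.5 nu'DD'nu - y'D'nu subject to -lam <= nu <= lam by
    projected gradient, then recover the trend x = y - D'nu."""
    y = np.asarray(y, dtype=float)
    D = second_difference_matrix(len(y))
    A = D @ D.T
    b = D @ y
    step = 1.0 / np.linalg.eigvalsh(A)[-1]   # 1 / largest eigenvalue of DD'
    nu = np.zeros(len(b))
    for _ in range(n_iter):
        grad = A @ nu - b
        nu = np.clip(nu - step * grad, -lam, lam)   # projection onto the box
    return y - D.T @ nu, nu
```

With `lam = 0` the box collapses to $\nu = 0$ and the filter returns the data unchanged; larger values of `lam` give smoother, piecewise-linear trends.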
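A.2.2 Solving using interior-point algorithms
We briefly present the interior-point algorithm of Boyd and Vandenberghe (2009) in the case
of the following optimisation problem:
$$\min f_0(\theta) \quad \text{u.c.} \quad \begin{cases} A\theta = b \\ f_i(\theta) < 0 \quad \text{for } i = 1, \dots, m \end{cases}$$
where $f_0, \dots, f_m : \mathbb{R}^n \to \mathbb{R}$ are convex and twice continuously differentiable and $\operatorname{rank}(A) =
p < n$. The inequality constraints will become implicit if the problem is rewritten as:
$$\min f_0(\theta) + \sum_{i=1}^{m} I_-\left( f_i(\theta) \right) \quad \text{u.c.} \quad A\theta = b$$
where $I_-(u) : \mathbb{R} \to \mathbb{R}$ is the non-positive indicator function$^{35}$. This indicator function is
discontinuous, so the Newton method cannot be applied. In order to overcome this problem,
we approximate $I_-(u)$ using the logarithmic barrier function $\hat{I}_-(u) = -\tau^{-1} \ln(-u)$
with $\tau \to \infty$. Finally the Kuhn-Tucker condition for this approximation problem gives
$r_\tau(\theta, \lambda, \nu) = 0$ with:
$$r_\tau(\theta, \lambda, \nu) = \begin{pmatrix} \nabla f_0(\theta) + \nabla f(\theta)^\top \lambda + A^\top \nu \\ -\operatorname{diag}(\lambda) f(\theta) - \tau^{-1} \mathbf{1} \\ A\theta - b \end{pmatrix}$$
The solution of $r_\tau(\theta, \lambda, \nu) = 0$ can be obtained using Newton's iteration for the triple
$\pi = (\theta, \lambda, \nu)$:
$$r_\tau(\pi + \Delta\pi) \approx r_\tau(\pi) + \nabla r_\tau(\pi)\, \Delta\pi = 0$$
This equation gives the Newton step $\Delta\pi = -\nabla r_\tau(\pi)^{-1} r_\tau(\pi)$, which defines the search
direction.
35 We have: $I_-(u) = 0$ if $u \le 0$ and $I_-(u) = +\infty$ if $u > 0$.
A.2.3 The multivariate case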
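In the multivariate case, the primal problem is:
$$\min \; \frac{1}{2} \sum_{j=1}^{m} \left\| y^{(j)} - \hat{x} \right\|_2^2 + \lambda \left\| z \right\|_1 \quad \text{u.c.} \quad z = D\hat{x}$$
The dual objective function becomes:
$$\inf_{\hat{x}, z} L(\hat{x}, z, \nu) = -\frac{1}{2} \nu^\top D D^\top \nu + \bar{y}^\top D^\top \nu + \frac{1}{2} \sum_{j=1}^{m} \left( y^{(j)} - \bar{y} \right)^\top \left( y^{(j)} - \bar{y} \right)$$
for $-\lambda \mathbf{1} \le \nu \le \lambda \mathbf{1}$. According to the Kuhn-Tucker theorem, the initial problem is equivalent
to the dual problem:
$$\min \; \frac{1}{2} \nu^\top D D^\top \nu - \bar{y}^\top D^\top \nu \quad \text{u.c.} \quad -\lambda \mathbf{1} \le \nu \le \lambda \mathbf{1}$$
The solution is then $\hat{x}^\star = \bar{y} - D^\top \nu$.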
A.2.4 The scaling of the smoothing parameter
We can attempt to estimate the order of magnitude of the parameter $\lambda_{\max}$ by considering
the continuous case. We assume that the signal is a process $W_t$. The value of $\lambda_{\max}$ in the
discrete case is defined by:
$$\lambda_{\max} = \left\| \left( D D^\top \right)^{-1} D y \right\|_\infty$$
The quantity inside the norm can be considered as the first primitive $I_1(T) = \int_0^T W_t \, dt$ of the process $W_t$ if $D = D_1$
($L_1$-C filtering) or the second primitive $I_2(T) = \int_0^T \int_0^t W_s \, ds \, dt$ of $W_t$ if $D = D_2$ ($L_1$-T
filtering). We have:
$$
\begin{aligned}
I_1(T) &= \int_0^T W_t \, dt \\
&= W_T T - \int_0^T t \, dW_t \\
&= \int_0^T (T - t) \, dW_t
\end{aligned}
$$
The process $I_1(T)$ is a Wiener integral (or a Gaussian process) with variance:
$$E\left[ I_1^2(T) \right] = \int_0^T (T - t)^2 \, dt = \frac{T^3}{3}$$
In this case, we expect that $\lambda_{\max} \sim T^{3/2}$. The second order primitive can be calculated in
the following way:
$$
\begin{aligned}
I_2(T) &= \int_0^T I_1(t) \, dt \\
&= I_1(T)\, T - \int_0^T t \, dI_1(t) \\
&= I_1(T)\, T - \int_0^T t W_t \, dt \\
&= I_1(T)\, T - \frac{T^2}{2} W_T + \int_0^T \frac{t^2}{2} \, dW_t \\
&= -\frac{T^2}{2} W_T + \int_0^T \left( T^2 - Tt + \frac{t^2}{2} \right) dW_t \\
&= \frac{1}{2} \int_0^T (T - t)^2 \, dW_t
\end{aligned}
$$
This quantity is again a Gaussian process with variance:
$$E\left[ I_2^2(T) \right] = \frac{1}{4} \int_0^T (T - t)^4 \, dt = \frac{T^5}{20}$$
In this case, we expect that $\lambda_{\max} \sim T^{5/2}$.
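In the discrete case, $\lambda_{\max}$ can be evaluated directly from its definition. The following sketch assumes that $D_1$ and $D_2$ are the first- and second-difference matrices; the helper names `difference_matrix` and `lambda_max` are hypothetical.

```python
import numpy as np

def difference_matrix(n, order):
    """Difference operator of the given order acting on a series of
    length n: order=1 gives rows (-1, 1), order=2 gives rows (1, -2, 1)."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)
    return D

def lambda_max(y, order=2):
    """Return ||(DD')^{-1} D y||_inf together with the unconstrained dual
    solution nu = (DD')^{-1} D y."""
    y = np.asarray(y, dtype=float)
    D = difference_matrix(len(y), order)
    nu = np.linalg.solve(D @ D.T, D @ y)
    return np.max(np.abs(nu)), nu
```

For any $\lambda \ge \lambda_{\max}$ the box constraint of the dual problem is inactive, so the filtered series $y - D^\top \nu$ is the projection of $y$ onto the null space of $D$: the sample mean for `order=1` and the least-squares straight line for `order=2`.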
A.3 Wavelet analysis
The time analysis can detect anomalies in time series, such as a market crash on a specific
date. The frequency analysis detects repeated sequences in a signal. The double dimension
analysis makes it possible to coordinate time and frequency detection, as we use a larger
time window than a smaller frequency interval (see Figure 23). In this area, the uncertainty
of localisation is 1/dt, with dt the sampling step and f = 1/dt the sampling frequency. The
wavelet transform can be a solution to analysing time series in terms of the time-frequency
dimension.
The first wavelet approach appeared in the early eighties in seismic data analysis. The
term wavelet was introduced in the scientific community by Grossmann and Morlet (1984).
Since 1986, a great deal of theoretical research, including wavelets, has been developed.
The wavelet transform uses a basic function, called the mother wavelet, then dilates and
translates it to capture features that are local in time and frequency. The distribution of the
time-frequency domain with respect to the wavelet transform is long in time when capturing
low frequency events and long in frequency when capturing high frequency events. As an
example, we represent some mother wavelets in Figure 24.
The aim of wavelet analysis is to separate signal trends and details. These different
components can be distinguished by different levels of resolution or different sizes/scales
of detail. In this sense, it generates a phase space decomposition which is defined by two
parameters (scale and location) in opposition to a Fourier decomposition.
Figure 23: Time-frequency dimension
Figure 24: Some mother wavelets
A wavelet $\psi(t)$ is a function of time $t$ such that:
$$\int_{-\infty}^{+\infty} \psi(t) \, dt = 0$$
$$\int_{-\infty}^{+\infty} \left| \psi(t) \right|^2 dt = 1$$
The continuous wavelet transform is a function of two variables $W(u, s)$ and is given by
projecting the time series $x(t)$ onto a particular wavelet $\psi$ by:
$$W(u, s) = \int_{-\infty}^{+\infty} x(t)\, \psi_{u,s}(t) \, dt$$
with:
$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\left( \frac{t - u}{s} \right)$$
which corresponds to the mother wavelet translated by $u$ (location parameter) and dilated
by $s$ (scale parameter). If the wavelet satisfies the previous properties, the inverse operation
may be performed to produce the original signal from its wavelet coefficients:
$$x(t) = \int_{0}^{+\infty} \int_{-\infty}^{+\infty} W(u, s)\, \psi_{u,s}(t) \, du \, ds$$
The continuous wavelet transform of a time series signal $x(t)$ gives an infinite number
of coefficients $W(u, s)$ where $u \in \mathbb{R}$ and $s \in \mathbb{R}^+$, but many coefficients are close or equal to
zero. The discrete wavelet transform can be used to decompose a signal into a finite number
of coefficients where we use $s = 2^{-j}$ as the scale parameter and $u = k 2^{-j}$ as the location
parameter with $j \in \mathbb{Z}$ and $k \in \mathbb{Z}$. Therefore $\psi_{u,s}(t)$ becomes:
$$\psi_{j,k}(t) = 2^{j/2}\, \psi\left( 2^j t - k \right)$$
where $j = 1, 2, \dots, J$ in a $J$-level decomposition. The wavelet representation of a discrete
signal $x(t)$ is given by:
$$x(t) = s_{(0)}\, \phi(t) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} d_{(j),k}\, \psi_{j,k}(t)$$
where $\phi(t) = 1$ if $t \in [0, 1]$ and $J$ is the number of multi-resolution levels. Therefore,
computing the wavelet transform of the discrete signal is equivalent to computing the smooth
coefficient $s_{(0)}$ and the detail coefficients $d_{(j),k}$.
Introduced by Mallat (1989), the multi-scale analysis corresponds to the following iterative scheme:
$$x \to (s, d), \qquad s \to (ss, sd), \qquad ss \to (sss, ssd), \qquad sss \to (ssss, sssd)$$
where the high-pass filter defines the details of the data and the low-pass filter defines the
smoothing signal. In this example, we obtain these wavelet coefficients:
$$W = \begin{pmatrix} ssss \\ sssd \\ ssd \\ sd \\ d \end{pmatrix}$$
Applying this pyramidal algorithm to the time series signal up to the $J$ resolution level gives
us the wavelet coefficients:
$$W = \begin{pmatrix} s_{(0)} \\ d_{(0)} \\ d_{(1)} \\ \vdots \\ d_{(J-1)} \end{pmatrix}$$
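The pyramidal scheme can be illustrated with the Haar wavelet, the simplest choice of low-pass and high-pass filters. The text does not prescribe a particular wavelet, so the Haar filters, the function names and the requirement that the series length be divisible by $2^J$ are assumptions of this sketch.

```python
import numpy as np

def haar_pyramid(x, levels):
    """Mallat's pyramid with Haar filters. Returns the final smooth
    coefficients and the list of detail vectors, finest scale first
    (d, sd, ssd, ... in the notation of the text)."""
    s = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = s[0::2], s[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass: details
        s = (even + odd) / np.sqrt(2.0)              # low-pass: smooth part
    return s, details

def haar_reconstruct(s, details):
    """Invert haar_pyramid, going from the coarsest level back to the signal."""
    s = np.asarray(s, dtype=float)
    for d in reversed(details):
        up = np.empty(2 * len(s))
        up[0::2] = (s + d) / np.sqrt(2.0)
        up[1::2] = (s - d) / np.sqrt(2.0)
        s = up
    return s
```

Setting some of the detail vectors to zero before calling `haar_reconstruct` gives a denoised signal; zeroing all of them in a full decomposition returns the sample mean.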
A.4 Support vector machine
The support vector machine is an important part of statistical learning theory (Hastie et al.,
2009). It was first introduced by Boser et al. (1992) and has been used in various domains
such as pattern recognition, biometrics, etc. This technique can be employed in different
contexts such as classification, regression or density estimation (see Vapnik, 1998). Recently,
applications in finance have been developed in two main directions. The first employs the
SVM as a nonlinear estimator in order to forecast the trend or volatility of financial assets.
In this context, the SVM is used as a regression technique with the possibility for extension
to nonlinear cases thanks to the kernel approach. The second direction consists of using
the SVM as a classification technique which aims to define the stock selection in trading
strategies.
A.4.1 SVM in a nutshell
We illustrate here the basic idea of the SVM as a classification method. Let us define the
training data set consisting of $n$ pairs of input/output points $(x_i, y_i)$ where $x_i \in \mathcal{X}$ and
$y_i \in \{-1, 1\}$. The idea of linear classification is to look for a possible hyperplane that
can separate $\{x_i \in \mathcal{X}\}$ into two classes corresponding to the labels $y_i = \pm 1$. It consists of
constructing a linear discriminant function $h(x) = w^\top x + b$ where $w$ is the vector of weights
and $b$ is called the bias. The hyperplane is then defined by the following equation:
$$\mathcal{H} = \left\{ x : h(x) = w^\top x + b = 0 \right\}$$
The vector $w$ is interpreted as the normal vector to the hyperplane. We denote its norm
$\|w\|$ and its direction $\hat{w} = w / \|w\|$. In Figure 25, we give a geometric interpretation of the
margin in the linear case. Let $x_+$ and $x_-$ be the closest points to the hyperplane from the
positive side and negative side. These points determine the margin to the boundary from
which the two classes of points $\mathcal{D}$ are separated:
$$m_{\mathcal{D}}(h) = \frac{1}{2}\, \hat{w}^\top (x_+ - x_-) = \frac{1}{\|w\|}$$
Figure 25: Geometric interpretation of the margin in a linear SVM
The main idea of a maximum margin classifier is to determine the hyperplane that maximises
the margin. For a separable dataset, the margin SVM is defined by the following optimisation
problem:
$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{u.c.} \quad y_i \left( w^\top x_i + b \right) \ge 1 \quad \text{for } i = 1, \dots, n$$
The historical approach to solving this quadratic problem with nonlinear constraints is to
map the primal problem to the dual problem:
$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^\top x_j \quad \text{u.c.} \quad \alpha_i \ge 0 \quad \text{for } i = 1, \dots, n$$
Because of the Kuhn-Tucker conditions, the optimised solution $(w^\star, b^\star)$ of the primal problem
is given by $w^\star = \sum_{i=1}^{n} \alpha_i^\star y_i x_i$ where $\alpha^\star = (\alpha_1^\star, \dots, \alpha_n^\star)$ is the solution of the dual problem.
We notice that linear SVM depends on input data via the inner product. An intelligent
way to extend SVM formalism to the nonlinear case is then to replace the inner product
with a nonlinear kernel. Hence, the nonlinear SVM dual problem can be obtained by systematically
replacing the inner product $x_i^\top x_j$ by a general kernel $K(x_i, x_j)$. Some standard
kernels are widely used in pattern recognition, for example polynomial, radial basis or neural
network kernels$^{36}$. Finally, the decision/prediction function is then given by:
$$f(x) = \operatorname{sgn} h(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^\star y_i K(x, x_i) + b^\star \right)$$
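To make the decision function concrete, the sketch below evaluates it for given dual coefficients, using the polynomial and radial basis kernels. Solving the dual QP itself is left out; the function names, and the toy support vectors and coefficients in the usage note, are hypothetical.

```python
import numpy as np

def polynomial_kernel(xi, xj, p=2):
    """K(xi, xj) = (xi' xj + 1)^p."""
    return (np.dot(xi, xj) + 1.0) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def svm_decision(x, support_x, support_y, alpha, b, kernel=np.dot):
    """f(x) = sgn( sum_i alpha_i y_i K(x, x_i) + b ) for given dual
    coefficients alpha; the default kernel is the plain inner product."""
    h = sum(a * y * kernel(x, xs)
            for a, y, xs in zip(alpha, support_y, support_x)) + b
    return np.sign(h)
```

As a toy check, take the two training points $x_+ = (1, 0)$ with label $+1$ and $x_- = (-1, 0)$ with label $-1$. With $\alpha = (0.5, 0.5)$ and $b = 0$ the linear rule gives $w = (1, 0)$, so $h(x_+) = 1$ and the margin $1 / \|w\|$ equals $\frac{1}{2} \hat{w}^\top (x_+ - x_-) = 1$.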
A.4.2 SVM regression
In the last discussion, we presented the basic idea of the SVM in the classification context.
We now show how the regression problem can be interpreted as an SVM problem. In the
general framework of statistical learning, the SVM problem consists of minimising the risk
function $R(f)$ depending on the form of the prediction function $f(x)$. The risk function is
calculated via the loss function $L(f(x), y)$ which clearly defines our objective (classification
or regression):
$$R(f) = \int L(f(x), y) \, dP(x, y)$$
where the distribution $P(x, y)$ can be computed by the empirical distribution$^{37}$ or an approximated
distribution$^{38}$. For the regression problem, the loss function is simply defined as
$L(f(x), y) = (f(x) - y)^2$ or $L(f(x), y) = |f(x) - y|^p$ in the case of the $L_p$ norm.
We have seen that the linear SVM is a special case of nonlinear SVM within the kernel
approach. We therefore consider the nonlinear case directly where the approximate function
of the regression has the following form $f(x) = w^\top \phi(x) + b$. In the VRM framework, we
assume that $P(x, y)$ is a Gaussian noise with variance $\sigma^2$:
$$R(f) = \frac{1}{n} \sum_{i=1}^{n} \left| f(x_i) - y_i \right|^p + \sigma^2 \|w\|^2$$
We introduce the variable $\xi = (\xi_1, \dots, \xi_n)$ which satisfies $y_i = f(x_i) + \xi_i$. The optimisation
problem of the risk function can now be written as a QP programme with nonlinear
constraints:
$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + \left( 2 n \sigma^2 \right)^{-1} \sum_{i=1}^{n} |\xi_i|^p \quad \text{u.c.} \quad y_i = w^\top \phi(x_i) + b + \xi_i \quad \text{for } i = 1, \dots, n$$
In the present form, the regression looks very similar to the SVM classification problem and
can be solved in the same way by mapping to the dual problem. We notice that the SVM
regression can be easily generalised in two possible ways:
1. by introducing a more general loss function such as the $\varepsilon$-SV regression proposed by
Vapnik (1998);
2. by using a weighting distribution for the empirical distribution:
$$dP(x, y) = \sum_{i=1}^{n} \omega_i\, \delta_{x_i}(x)\, \delta_{y_i}(y)$$
As financial series have short memory and depend more on the recent past, an asymmetric weight distribution focusing on recent data would improve the prediction$^{39}$.
36 We have, respectively, $K(x_i, x_j) = \left( x_i^\top x_j + 1 \right)^p$, $K(x_i, x_j) = \exp\left( -\left\| x_i - x_j \right\|^2 / \left( 2 \sigma^2 \right) \right)$ or $K(x_i, x_j) = \tanh\left( a\, x_i^\top x_j - b \right)$.
37 This framework, called ERM, was first introduced by Vapnik and Chervonenskis (1991).
38 This framework is called VRM (Chapelle, 2002).
The dual problem in the case $p = 1$ is given by:
$$\max_{\alpha} \; \alpha^\top y - \frac{1}{2} \alpha^\top K \alpha \quad \text{u.c.} \quad \begin{cases} \alpha^\top \mathbf{1} = 0 \\ |\alpha| \le \left( 2 n \sigma^2 \right)^{-1} \mathbf{1} \end{cases}$$
As previously, the optimal vector $\alpha^\star$ is obtained by solving the QP programme. We then
deduce that $w^\star = \sum_{i=1}^{n} \alpha_i^\star \phi(x_i)$ and $b$ is computed using the Kuhn-Tucker condition:
$$w^{\star\top} \phi(x_i) + b - y_i = 0$$
for support vectors $(x_i, y_i)$. In order to achieve a good level of accuracy for the estimation
of $b$, we average out the set of support vectors and obtain $b^\star$. The SVM regressor is then
given by the following formula:
$$f(x) = \sum_{i=1}^{n} \alpha_i^\star K(x, x_i) + b^\star$$
with $K(x, x_i) = \phi(x)^\top \phi(x_i)$.
In Figure 26, we apply SVM regression with the Gaussian kernel to the S&P 500 index.
The kernel parameter $\sigma$ characterises the estimation horizon which is equivalent to period
$n$ in the moving average regression.
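The case $p = 1$ requires a QP solver, but for $p = 2$ the same primal problem has a closed-form solution, which makes for a compact illustration. The sketch below therefore handles only the quadratic loss (often called least-squares SVM) and is not the $p = 1$ regressor used for Figure 26. Writing $C = (2 n \sigma^2)^{-1}$, the Kuhn-Tucker conditions give $\xi_i = \alpha_i / (2C)$, $\sum_i \alpha_i = 0$ and $(K + I / (2C))\, \alpha + b \mathbf{1} = y$. The function names and the parameters `C` and `width` (the Gaussian kernel parameter) are hypothetical, and inputs are one-dimensional.

```python
import numpy as np

def gaussian_gram(xa, xb, width):
    """Matrix of Gaussian kernel values K(xa_i, xb_j) for scalar inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_ls_svm(x, y, width, C):
    """Solve min 0.5 ||w||^2 + C sum xi_i^2 u.c. y_i = w'phi(x_i) + b + xi_i
    through the linear system given by its Kuhn-Tucker conditions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                      # constraint sum(alpha) = 0
    M[1:, 0] = 1.0                      # bias term b
    M[1:, 1:] = gaussian_gram(x, x, width) + np.eye(n) / (2.0 * C)
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]              # b, alpha

def predict_ls_svm(x_new, x, alpha, b, width):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    x_new = np.asarray(x_new, dtype=float)
    x = np.asarray(x, dtype=float)
    return gaussian_gram(x_new, x, width) @ alpha + b
```

A large `C` (small noise variance) makes the regressor follow the data closely, while a larger `width` lengthens the effective estimation horizon, in line with the role of the kernel parameter described above.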
Figure 26: SVM filtering
A.5 Singular spectrum analysis
In recent years the singular spectrum analysis (SSA) technique has been developed as a
time-frequency domain method$^{40}$. It consists of decomposing a time series into a trend,
oscillatory components and a noise.
The method is based on the principal component analysis of the auto-covariance matrix
of the time series $y = (y_1, \dots, y_t)$. Let $n$ be the window length such that $n = t - m + 1$ with
$m < t/2$. We define the $n \times m$ Hankel matrix $H$ as the matrix of the $m$ concatenated lag
vectors of $y$:
$$H = \begin{pmatrix} y_1 & y_2 & y_3 & \cdots & y_m \\ y_2 & y_3 & y_4 & \cdots & y_{m+1} \\ y_3 & y_4 & y_5 & \cdots & \vdots \\ \vdots & \vdots & \vdots & \ddots & y_{t-1} \\ y_n & y_{n+1} & y_{n+2} & \cdots & y_t \end{pmatrix}$$
We recover the time series $y$ by diagonal averaging:
$$y_p = \frac{1}{\alpha_p} \sum_{j=1}^{m} H^{(i,j)} \qquad (10)$$
where $i = p - j + 1$, $0 < i < n + 1$ and:
$$\alpha_p = \begin{cases} p & \text{if } p < m \\ t - p + 1 & \text{if } p > t - m + 1 \\ m & \text{otherwise} \end{cases}$$
This relationship seems trivial because each $H^{(i,j)}$ is equal to $y_p$ with respect to the conditions
for $i$ and $j$. But this equality no longer holds if we apply factor analysis. Let $C = H^\top H$
be the covariance matrix of $H$. By performing the eigenvalue decomposition $C = V \Lambda V^\top$, we
can deduce the corresponding principal components:
$$P_k = H V_k$$
where $V_k$ is the matrix of the first $k$ eigenvectors of $C$.
Let us now define the $n \times m$ matrix $\hat{H}$ as follows:
$$\hat{H} = P_k V_k^\top$$
We have $\hat{H} = H$ if all the components are selected. If $k < m$, we have removed the noise and
the trend $\hat{x}$ is estimated by applying the diagonal averaging procedure (10) to the matrix
$\hat{H}$.
We have applied the singular spectrum decomposition to the S&P 500 index with different
lags $m$. For each lag, we compute the Hankel matrix $H$, then deduce the matrix $\hat{H}$ using
only the first eigenvector ($k = 1$) and estimate the corresponding trend. Results are given
in Figure 27. As for other methods, such as nonlinear filters, the calibration depends on the
parameter $m$, which controls the window length.
39 See Gestel et al. (2001) and Tay and Cao (2002).
40 Introduced by Broomhead and King (1986).
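The SSA steps described above (Hankel matrix, eigenvalue decomposition, rank-$k$ reconstruction and diagonal averaging) can be sketched as follows. The function name `ssa_trend` is hypothetical, and the code uses 0-based indices, so the entry $H^{(i,j)}$ of the text corresponds to `H[i - 1, j - 1]`.

```python
import numpy as np

def ssa_trend(y, m, k=1):
    """Estimate a trend by singular spectrum analysis with lag m,
    keeping the k leading eigenvectors of C = H'H."""
    y = np.asarray(y, dtype=float)
    t = len(y)
    n = t - m + 1
    # n x m Hankel matrix: row i holds y_i, ..., y_{i+m-1}
    H = np.array([y[i:i + m] for i in range(n)])
    eigval, V = np.linalg.eigh(H.T @ H)          # eigenvalues in ascending order
    Vk = V[:, np.argsort(eigval)[::-1][:k]]      # k leading eigenvectors
    H_hat = (H @ Vk) @ Vk.T                      # P_k V_k'
    # Diagonal averaging: average all entries of H_hat that map to y_p
    x = np.zeros(t)
    counts = np.zeros(t)                         # alpha_p in equation (10)
    for i in range(n):
        x[i:i + m] += H_hat[i]
        counts[i:i + m] += 1.0
    return x / counts
```

Selecting all components (`k = m`) returns the original series, which is a convenient sanity check; `k = 1` corresponds to the setting reported for Figure 27.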
Figure 27: SSA filtering
References
[1] Alexandrov T., Bianconcini S., Dagum E.B., Maass P. and McElroy T. (2008),
A Review of Some Modern Approaches to the Problem of Trend Extraction , US Census
Bureau, RRS #2008/03.
[2] Antoniadis A., Gregoire G. and McKeague I.W. (1994), Wavelet Methods for
Curve Estimation, Journal of the American Statistical Association, 89(428), pp. 1340-1353.
[3] Barberis N. and Thaler T. (2002), A Survey of Behavioral Finance, NBER Working
Paper, 9222.
[4] Beveridge S. and Nelson C.R. (1981), A New Approach to the Decomposition of
Economic Time Series into Permanent and Transitory Components with Particular
Attention to Measurement of the Business Cycle, Journal of Monetary Economics,
7(2), pp. 151-174.
[5] Boser B.E., Guyon I.M. and Vapnik V. (1992), A Training Algorithm for Optimal
Margin Classifier, Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 114-152.
[6] Boyd S. and Vandenberghe L. (2009), Convex Optimization, Cambridge University
Press.
[7] Brockwell P.J. and Davis R.A. (2003), Introduction to Time Series and Forecasting,
Springer.
[8] Broomhead D.S. and King G.P. (1986), On the Qualitative Analysis of Experimental
Dynamical Systems, in Sarkar S. (ed.), Nonlinear Phenomena and Chaos, Adam Hilger,
pp. 113-144.
[9] Brown S.J., Goetzmann W.N. and Kumar A. (1998), The Dow Theory: William
Peter Hamilton's Track Record Reconsidered, Journal of Finance, 53(4), pp. 1311-1333.
[10] Burch N., Fishback P.E. and Gordon R. (2005), The Least-Squares Property of the
Lanczos Derivative, Mathematics Magazine, 78(5), pp. 368-378.
[11] Carhart M.M. (1997), On Persistence in Mutual Fund Performance, Journal of Finance, 52(1), pp. 57-82.
[12] Chan L.K.C., Jegadeesh N. and Lakonishok J. (1996), Momentum Strategies, Journal of Finance, 51(5), pp. 1681-1713.
[13] Chang Y., Miller J.I. and Park J.Y. (2009), Extracting a Common Stochastic Trend:
Theory with Some Applications, Journal of Econometrics, 150(2), pp. 231-247.
[14] Chapelle O. (2002), Support Vector Machine: Induction Principles, Adaptive Tuning
and Prior Knowledge, PhD thesis, University of Paris 6.
[15] Cleveland W.P. and Tiao G.C. (1976), Decomposition of Seasonal Time Series: A
Model for the Census X-11 Program, Journal of the American Statistical Association,
71(355), pp. 581-587.
[16] Cleveland W.S. (1979), Robust Locally Regression and Smoothing Scatterplots, Journal of the American Statistical Association, 74(368), pp. 829-836.
[17] Cleveland W.S. and Devlin S.J. (1988), Locally Weighted Regression: An Approach
to Regression Analysis by Local Fitting, Journal of the American Statistical Association, 83(403), pp. 596-610.
[18] Cochrane J. (2001), Asset Pricing, Princeton University Press.
[19] Cortes C. and Vapnik V. (1995), Support-Vector Networks, Machine Learning, 20(3),
pp. 273-297.
[20] D'Aspremont A. (2011), Identifying Small Mean Reverting Portfolios, Quantitative
Finance, 11(3), pp. 351-364.
[21] Daubechies I. (1992), Ten Lectures on Wavelets, SIAM.
[22] Daubechies I., Defrise M. and De Mol C. (2004), An Iterative Thresholding Algorithm for Linear Inverse Problems with a Sparsity Constraint, Communications on
Pure and Applied Mathematics, 57(11), pp. 1413-1457.
[23] Donoho D.L. (1995), De-Noising by Soft-Thresholding, IEEE Transactions on Information Theory, 41(3), pp. 613-627.
[24] Donoho D.L. and Johnstone I.M. (1994), Ideal Spatial Adaptation via Wavelet
Shrinkage, Biometrika, 81(3), pp. 425-455.
[25] Donoho D.L. and Johnstone I.M. (1995), Adapting to Unknown Smoothness via
Wavelet Shrinkage, Journal of the American Statistical Association, 90(432), pp. 1200-1224.
[26] Doucet A., De Freitas N. and Gordon N. (2001), Sequential Monte Carlo in Practice, Springer.
[27] Ehlers J.F. (2001), Rocket Science for Traders: Digital Signal Processing Applications,
John Wiley & Sons.
[28] Elton E.J. and Gruber M.J. (1972), Earnings Estimates and the Accuracy of Expectational Data, Management Science, 18(8), pp. 409-424.
[29] Engle R.F. and Granger C.W.J. (1987), Co-Integration and Error Correction: Representation, Estimation, and Testing, Econometrica, 55(2), pp. 251-276.
[30] Fama E. (1970), Efficient Capital Markets: A Review of Theory and Empirical Work,
Journal of Finance, 25(2), pp. 383-417.
[31] Flandrin P., Rilling G. and Goncalves P. (2004), Empirical Mode Decomposition
as a Filter Bank, Signal Processing Letters, 11(2), pp. 112-114.
[32] Fliess M. and Join C. (2009), A Mathematical Proof of the Existence of Trends in
Financial Time Series, in El Jai A., Afifi L. and Zerrik E. (eds), Systems Theory:
Modeling, Analysis and Control, Presses Universitaires de Perpignan, pp. 43-62.
[33] Fuentes M. (2002), Spectral Methods for Nonstationary Spatial Processes, Biometrika,
89(1), pp. 197-210.
[34] Gençay R., Selçuk F. and Whitcher B. (2002), An Introduction to Wavelets and
Other Filtering Methods in Finance and Economics, Academic Press.
[35] Gestel T.V., Suykens J.A.K., Baestaens D., Lambrechts A., Lanckriet G.,
Vandaele B., De Moor B. and Vandewalle J. (2001), Financial Time Series Prediction Using Least Squares Support Vector Machines Within the Evidence Framework,
IEEE Transactions on Neural Networks, 12(4), pp. 809-821.
[36] Golyandina N., Nekrutkin V.V. and Zhigljavsky A.A. (2001), Analysis of Time
Series Structure: SSA and Related Techniques, Chapman & Hall, CRC.
[37] Gonzalo J. and Granger C.W.J. (1995), Estimation of Common Long-Memory Components in Cointegrated Systems, Journal of Business & Economic Statistics, 13(1), pp.
27-35.
[38] Grinblatt M., Titman S. and Wermers R. (1995), Momentum Investment Strategies, Portfolio Performance, and Herding: A Study of Mutual Fund Behavior, American
Economic Review, 85(5), pp. 1088-1105.
[39] Groetsch C.W. (1998), Lanczos Generalized Derivative, American Mathematical
Monthly, 105(4), pp. 320-326.
[40] Grossmann A. and Morlet J. (1984), Decomposition of Hardy Functions into Square
Integrable Wavelets of Constant Shape, SIAM Journal of Mathematical Analysis, 15,
pp. 723-736.
[41] Härdle W. (1992), Applied Nonparametric Regression, Cambridge University Press.
[42] Harvey A.C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.
[43] Harvey A.C. and Trimbur T.M. (2003), General Model-Based Filters for Extracting
Cycles and Trends in Economic Time Series, Review of Economics and Statistics, 85(2),
pp. 244-255.
[44] Hastie T., Tibshirani R. and Friedman R. (2009), The Elements of Statistical Learning, second edition, Springer.
[45] Henderson R. (1916), Note on Graduation by Adjusted Average, Transactions of the
Actuarial Society of America, 17, pp. 43-48.
[46] Hodrick R.J. and Prescott E.C. (1997), Postwar U.S. Business Cycles: An Empirical
Investigation, Journal of Money, Credit and Banking, 29(1), pp. 1-16.
[47] Holt C.C. (1959), Forecasting Seasonals and Trends by Exponentially Weighted Moving Averages, ONR Research Memorandum, 52, reprinted in International Journal of
Forecasting, 2004, 20(1), pp. 5-10.
[48] Hong H. and Stein J.C. (1977), A Unified Theory of Underreaction, Momentum Trading and Overreaction in Asset Markets, NBER Working Paper, 6324.
[49] Johansen S. (1988), Statistical Analysis of Cointegration Vectors, Journal of Economic
Dynamics and Control, 12(2-3), pp. 231-254.
[50] Johansen S. (1991), Estimation and Hypothesis Testing of Cointegration Vectors in
Gaussian Vector Autoregressive Models, Econometrica, 52(6), pp. 1551-1580.
[51] Kalaba R. and Tesfatsion L. (1989), Time-varying Linear Regression via Flexible
Least Squares, Computers & Mathematics with Applications, 17, pp. 1215-1245.
[52] Kalman R.E. (1960), A New Approach to Linear Filtering and Prediction Problems,
Transactions of the ASME Journal of Basic Engineering, 82(D), pp. 35-45.
[53] Kendall M.G. (1973), Time Series, Charles Griffin.
[54] Kim S-J., Koh K., Boyd S. and Gorinevsky D. (2009), ℓ1 Trend Filtering, SIAM
Review, 51(2), pp. 339-360.
[55] Kolmogorov A.N. (1941), Interpolation and Extrapolation of Random Sequences,
Izvestiya Akademii Nauk SSSR, Seriya Matematicheskaya, 5(1), pp. 3-14.
[56] Macaulay F. (1931), The Smoothing of Time Series, National Bureau of Economic
Research.
[57] Mallat S.G. (1989), A Theory for Multiresolution Signal Decomposition: The Wavelet
Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence,
11(7), pp. 674-693.
[58] Mann H.B. (1945), Nonparametric Tests against Trend, Econometrica, 13(3), pp. 245-259.
[59] Martin W. and Flandrin P. (1985), Wigner-Ville Spectral Analysis of Nonstationary
Processes, IEEE Transactions on Acoustics, Speech and Signal Processing, 33(6), pp.
1461-1470.
[60] Muth J.F. (1960), Optimal Properties of Exponentially Weighted Forecasts, Journal
of the American Statistical Association, 55(290), pp. 299-306.
[61] Oppenheim A.V. and Schafer R.W. (2009), Discrete-Time Signal Processing, third
edition, Prentice-Hall.
[62] Peña D. and Box G.E.P. (1987), Identifying a Simplifying Structure in Time Series,
Journal of the American Statistical Association, 82(399), pp. 836-843.
[63] Pollock, D.S.G. (2006), Wiener-Kolmogorov Filtering Frequency-Selective Filtering
and Polynomial Regression, Econometric Theory, 23, pp. 71-83.
[64] Pollock D.S.G. (2009), Statistical Signal Extraction: A Partial Survey, in Kontoghiorges E. and Belsley D.E. (eds.), Handbook of Empirical Econometrics, John Wiley
and Sons.
[65] Rao S.T. and Zurbenko I.G. (1994), Detecting and Tracking Changes in Ozone air
Quality, Journal of Air and Waste Management Association, 44(9), pp. 1089-1092.
[66] Roncalli T. (2010), La Gestion d'Actifs Quantitative, Economica.
[67] Savitzky A. and Golay M.J.E. (1964), Smoothing and Differentiation of Data by
Simplified Least Squares Procedures, Analytical Chemistry, 36(8), pp. 1627-1639.
[68] Silverman B.W. (1985), Some Aspects of the Spline Smoothing Approach to NonParametric Regression Curve Fitting, Journal of the Royal Statistical Society, B47(1),
pp. 1-52.
[69] Sorenson H.W. (1970), Least-Squares Estimation: From Gauss to Kalman, IEEE
Spectrum, 7, pp. 63-68.
[70] Stock J.H. and Watson M.W. (1988), Variable Trends in Economic Time Series,
Journal of Economic Perspectives, 2(3), pp. 147-174.
[71] Tay F.E.H. and Cao L.J. (2002), Modified Support Vector Machines in Financial Times
Series Forecasting, Neurocomputing, 48(1-4), pp. 847-861.
[72] Tibshirani R. (1996), Regression Shrinkage and Selection via the Lasso, Journal of
the Royal Statistical Society, B58(1), pp. 267-288.
[73] Vapnik V. (1998), Statistical Learning Theory, John Wiley and Sons, New York.
[74] Vapnik V. and Chervonenskis A. (1991), On the Uniform Convergence of Relative
Frequency of Events to their Probabilities, Theory of Probability and its Applications,
16(2), pp. 264-280.
[75] Vautard R., Yiou P., and Ghil M. (1992), Singular Spectrum Analysis: A Toolkit
for Short, Noisy Chaotic Signals, Physica D, 58(1-4), pp. 95-126.
[76] Wahba G. (1990), Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, SIAM.
[77] Wang Y. (1998), Change Curve Estimation via Wavelets, Journal of the American
Statistical Association, 93(441), pp. 163-172.
[78] Wiener N. (1949), Extrapolation, Interpolation and Smoothing of Stationary Time
Series with Engineering Applications, MIT Technology Press and John Wiley & Sons
(originally published in 1941 as a Report on the Services Research Project, DIC-6037).
[79] Whittaker E.T. (1923), On a New Method of Graduation, Proceedings of the Edinburgh Mathematical Society, 41, pp. 63-75.
[80] Winters P.R. (1960), Forecasting Sales by Exponentially Weighted Moving Averages,
Management Science, 6(3), 324-342.
[81] Yue S. and Pilon P. (2004), A Comparison of the Power of the t-test, Mann-Kendall
and Bootstrap Tests for Trend Detection, Hydrological Sciences Journal, 49(1), 21-37.
[82] Zurbenko I., Porter P.S., Rao S.T., Ku J.K., Gui R. and Eskridge R.E. (1996),
Detecting Discontinuities in Time Series of Upper-Air Data: Demonstration of an Adaptive Filter Technique, Journal of Climate, 9(12), pp. 3548-3560.
Lyxor White Paper Series

List of Issues
Issue #1 Risk-Based Indexation.
Paul Demey, Sébastien Maillard and Thierry Roncalli, March 2010.
Issue #2 Beyond Liability-Driven Investment: New Perspectives on
Defined Benefit Pension Fund Management.
Benjamin Bruder, Guillaume Jamet and Guillaume Lasserre, March 2010.
Issue #3 Mutual Fund Ratings and Performance Persistence.
Pierre Hereil, Philippe Mitaine, Nicolas Moussavi and Thierry Roncalli, June 2010.
Issue #4 Time Varying Risk Premiums & Business Cycles: A Survey.
Serge Darolles, Karl Eychenne and Stéphane Martinetti, September 2010.
Issue #5 Portfolio Allocation of Hedge Funds.
Benjamin Bruder, Serge Darolles, Abdul Koudiraty and Thierry Roncalli, January
2011.
Issue #6 Strategic Asset Allocation.
Karl Eychenne, Stéphane Martinetti and Thierry Roncalli, March 2011.
Issue #7 Risk-Return Analysis of Dynamic Investment Strategies.
Benjamin Bruder and Nicolas Gaussel, June 2011.
Disclaimer
Each of this material and its content is condential and may not be reproduced or provided
to others without the express written permission of Lyxor Asset Management (Lyxor AM).
This material has been prepared solely for informational purposes only and it is not intended
to be and should not be considered as an oer, or a solicitation of an oer, or an invitation
or a personal recommendation to buy or sell participating shares in any Lyxor Fund, or
any security or nancial instrument, or to participate in any investment strategy, directly
or indirectly.
It is intended for use only by those recipients to whom it is made directly available by Lyxor
AM. Lyxor AM will not treat recipients of this material as its clients by virtue of their
receiving this material.
This material reflects the views and opinions of the individual authors at this date and in
no way the official position or advice of any kind of these authors or of Lyxor AM and thus
does not engage the responsibility of Lyxor AM nor of any of its officers or employees. All
performance information set forth herein is based on historical data and, in some cases,
hypothetical data, and may reflect certain assumptions with respect to fees, expenses, taxes,
capital charges, allocations and other factors that affect the computation of the returns.
Past performance is not necessarily a guide to future performance. While the information
(including any historical or hypothetical returns) in this material has been obtained from
external sources deemed reliable, neither Société Générale (SG), Lyxor AM, nor their
affiliates, officers or employees guarantee its accuracy, timeliness or completeness. Any opinions
expressed herein are statements of our judgment on this date and are subject to change
without notice. SG, Lyxor AM and their affiliates assume no fiduciary responsibility or liability
for any consequences, financial or otherwise, arising from, an investment in any security or
financial instrument described herein or in any other security, or from the implementation
of any investment strategy.
Lyxor AM and its affiliates may from time to time deal in, profit from the trading of, hold,
have positions in, or act as market makers, advisers, brokers or otherwise in relation to the
securities and financial instruments described herein.
Service marks appearing herein are the exclusive property of SG and its affiliates, as the
case may be.
This material is communicated by Lyxor Asset Management, which is authorized and regulated
in France by the Autorité des Marchés Financiers (French Financial Markets Authority).
© 2011 LYXOR ASSET MANAGEMENT ALL RIGHTS RESERVED
The Lyxor White Paper Series is a quarterly publication providing our
clients access to intellectual capital, risk analytics and quantitative
research developed within Lyxor Asset Management. The Series
covers in-depth studies of investment strategies, asset allocation
methodologies and risk management techniques. We hope you will
find the Lyxor White Paper Series stimulating and interesting.
PUBLISHING DIRECTORS
Alain Dubois, Chairman of the Board
Laurent Seyer, Chief Executive Officer
EDITORIAL BOARD
Nicolas Gaussel, PhD, Managing Editor
Thierry Roncalli, PhD, Associate Editor
Benjamin Bruder, PhD, Associate Editor
Lyxor Asset Management
Tour Société Générale - 17 cours Valmy
92987 Paris La Défense Cedex - France
[email protected] - www.lyxor.com
Réf. 712100 - Studio Société Générale +33 (0)1 42 14 27 05 - 12/2011