Spectral
1 Periodic processes
• Consider a periodic process of the form
xt = A cos(2πωt + φ) (1)
• Importantly, the quantity ω in the above definition is called the frequency of the process; and the quantity 1/ω is called the period or cycle. As t varies from 0 to 1/ω, note that the process goes through one complete cycle (it ends up back where it started). See Figure 1
Figure 1: Two examples of cosine processes, the first (in red) having a frequency ω = 3/100 and amplitude √(2² + 3²) ≈ 3.6, and the second (in blue) having a frequency ω = 6/100 and amplitude √(4² + 5²) ≈ 6.4.
• The quantity A is called the amplitude and φ the phase of the process. The amplitude controls how high the peaks are, and the phase determines where (along the cosine cycle) the process starts at the origin t = 0
• We can introduce randomness into the process (1) by allowing A and φ to be random
• It will be useful to reparametrize. In general, recall the trigonometric identity (cosine compound
angle formula):
cos(a + b) = cos(a) cos(b) − sin(a) sin(b) (2)
Thus, starting with (1), we can rewrite this as xt = A cos(φ) cos(2πωt) − A sin(φ) sin(2πωt). Simply letting U1 = A cos(φ), U2 = −A sin(φ), we can therefore write

xt = U1 cos(2πωt) + U2 sin(2πωt) (3)

with U1, U2 our two random variables, determining the amplitude of the cosine and sine components separately
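• As a small illustration, here is an R sketch that simulates one realization of the randomized periodic process (3); the choices of n, ω, and the variance of U1, U2 below are arbitrary:

# Simulate x_t = U1 cos(2*pi*omega*t) + U2 sin(2*pi*omega*t) with random U1, U2
set.seed(1)
n <- 100; omega <- 3 / 100; sigma <- 2
U1 <- rnorm(1, 0, sigma)                     # random amplitude of the cosine part
U2 <- rnorm(1, 0, sigma)                     # random amplitude of the sine part
t <- 1:n
x <- U1 * cos(2 * pi * omega * t) + U2 * sin(2 * pi * omega * t)
plot(t, x, type = "l", xlab = "Time")        # compare with the curves in Figure 1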
• Note that another way of writing the relationship between A, φ and U1, U2 is (why?):

A = √(U1² + U2²), φ = tan⁻¹(−U2/U1)
1.1 Stationarity
• If U1, U2 are uncorrelated, each with mean zero and variance σ², then the periodic process xt, t = 1, 2, 3, . . . defined in (3) is stationary
• To check this: simply compute the mean function,

µt = E(xt) = 0,

and the auto-covariance function,

γ(s, t) = Cov(xs, xt)
= Cov(U1 cos(2πωs) + U2 sin(2πωs), U1 cos(2πωt) + U2 sin(2πωt))
= Cov(U1 cos(2πωs), U1 cos(2πωt)) + Cov(U2 sin(2πωs), U1 cos(2πωt))
  + Cov(U1 cos(2πωs), U2 sin(2πωt)) + Cov(U2 sin(2πωs), U2 sin(2πωt))
= σ² cos(2πωs) cos(2πωt) + 0 + 0 + σ² sin(2πωs) sin(2πωt)
= σ² cos(2πω(s − t))

which only depends on the lag s − t (where in the last line we used the identity (2) once again)
• More generally, consider a mixture of such periodic processes,

xt = Σ_{j=1}^p (Uj1 cos(2πωj t) + Uj2 sin(2πωj t)) (4)

for Uj1, Uj2, j = 1, . . . , p all uncorrelated random variables with mean zero, where Uj1, Uj2 have variance σj²

• As a generalization of the above calculation, you’ll show on your homework that the process xt, t = 1, 2, 3, . . . defined in (4) is stationary, with auto-covariance function

γ(h) = Σ_{j=1}^p σj² cos(2πωj h)
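• A quick Monte Carlo check of this formula in R, estimating Cov(x_{t+h}, x_t) across many independent realizations of a p = 2 mixture (the frequencies, variances, and time point below are arbitrary choices):

# Estimate the auto-covariance of the mixture process (4) by simulation
set.seed(1)
omega <- c(3 / 100, 14 / 100); sigma <- c(2, 1)    # frequencies omega_j and sd's sigma_j
R <- 1e5; t0 <- 17; lags <- 0:5
U1 <- sapply(sigma, function(s) rnorm(R, 0, s))    # R x 2 matrix of U_{j1} draws
U2 <- sapply(sigma, function(s) rnorm(R, 0, s))    # R x 2 matrix of U_{j2} draws
x_at <- function(t) as.vector(U1 %*% cos(2 * pi * omega * t) + U2 %*% sin(2 * pi * omega * t))
empirical <- sapply(lags, function(h) cov(x_at(t0 + h), x_at(t0)))
theoretical <- sapply(lags, function(h) sum(sigma^2 * cos(2 * pi * omega * h)))
rbind(empirical, theoretical)                      # the two rows should be close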
• Figure 2 displays a couple of mixture processes of the form (4) (with p = 2 and p = 3). Note the regular repeating nature of the mixture processes. One might wonder how we can decompose such a mixture into its frequency components (periodic processes, each of the form (3)). This is, in fact, one of the main objectives in spectral analysis
Figure 2: Two mixture processes of the form (4), with p = 2 (top) and p = 3 (bottom) frequency components, plotted over 100 time points.
• And the answer, as we’ll see next, is given by something you’re already quite familiar with ... regres-
sion!
2 Fourier decomposition
• Given a time series xt, t = 1, . . . , n, consider seeking a decomposition like (4). We could do this by regressing the time series onto cosine and sine features of different frequencies,

ctj = cos(2πj/n · t) and stj = sin(2πj/n · t), t = 1, . . . , n, j = 1, . . . , p

(which we call “basis functions” in the context of this particular regression problem), and so the regression model is

xt ≈ a0 + Σ_{j=1}^p (aj ctj + bj stj), t = 1, . . . , n (5)
• That is, with p = (n − 1)/2, there are coefficients âj, b̂j, j = 1, . . . , p that will give us equality in (5), for all t = 1, . . . , n
• (This assumes that n is odd; if n is even, then we need to add an additional component a_{n/2} cos(πt) = a_{n/2} (−1)^t, and the same claim holds: the representation is exact)
• To find the coefficients âj, b̂j, j = 1, . . . , p, we can simply perform regression (least squares). We let x ∈ R^n denote our time series represented as a vector, which serves as the response vector in our regression problem, and we assemble our cosine and sine basis functions into a feature matrix Z ∈ R^{n×n}, whose row t is

(1/√2, cos(2π · 1/n · t), sin(2π · 1/n · t), cos(2π · 2/n · t), . . . , sin(2π · (n−1)/(2n) · t)), t = 1, . . . , n

The least squares coefficients are then given by (Z^T Z)^{−1} Z^T x
• However, something is very special about our matrix Z: it satisfies Z^T Z = (n/2) I, where I is the n × n identity matrix. In other words, its columns are orthogonal, and each has squared ℓ2 norm equal to n/2. This is a very special property of the cosine and sine basis functions (and it is the foundation of the discrete Fourier transform, to be discussed shortly)
• Thus, writing z1, . . . , zn for the columns of Z, we have

(Z^T Z)^{−1} Z^T x = ( (2/n) z1^T x, (2/n) z2^T x, . . . , (2/n) zn^T x )^T,
so the multiple regression coefficients of x on Z are simply the marginal regression coefficients
• More explicitly, the coefficients are â0 = x̄, and for j = 1, . . . , p,

âj = (2/n) cj^T x = (2/n) Σ_{t=1}^n xt cos(2πj/n · t)
b̂j = (2/n) sj^T x = (2/n) Σ_{t=1}^n xt sin(2πj/n · t)      (6)

where cj and sj denote the cosine and sine columns of Z at frequency j/n
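• A short R sketch of this machinery, for an arbitrary series with n odd: it builds Z (with the constant column scaled by 1/√2 so that Z^T Z = (n/2) I), verifies that property, and checks that the least squares coefficients match the marginal formulas in (6):

set.seed(1)
n <- 99; t <- 1:n; x <- rnorm(n)
freqs <- (1:((n - 1) / 2)) / n                      # frequencies j/n, j = 1, ..., (n-1)/2
Z <- cbind(1 / sqrt(2),
           do.call(cbind, lapply(freqs, function(f)
             cbind(cos(2 * pi * f * t), sin(2 * pi * f * t)))))
max(abs(crossprod(Z) - (n / 2) * diag(n)))          # Z^T Z = (n/2) I, up to rounding error
coefs <- (2 / n) * crossprod(Z, x)                  # = (Z^T Z)^{-1} Z^T x
a1_direct <- (2 / n) * sum(x * cos(2 * pi * 1 / n * t))   # a_hat_1 from (6)
c(coefs[2], a1_direct)                              # the two agree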
• Plugging these coefficients back into the model (with p = (n − 1)/2) gives the exact representation of the original series,

xt = â0 + Σ_{j=1}^{(n−1)/2} (âj cos(2πj/n · t) + b̂j sin(2πj/n · t)), t = 1, . . . , n (7)
2.1 Periodogram
• Given a series xt, t = 1, . . . , n, we can define an object from the coefficients (6) in the decomposition (7) that is called the periodogram, denoted Px. This takes values at the frequencies j/n, for j = 1, . . . , (n − 1)/2, and is defined by

Px(j/n) = (n/4)(âj² + b̂j²) (8)
When the underlying series is clear from the context, we will drop the subscript and simply write P
• Large values of the periodogram indicate which frequencies are predominant in the given series. This is illustrated in Figure 3, which displays the periodograms for the two series in Figure 2. Another nice real data example (from a 1923 textbook on numerical analysis!) is given in Figure 4
Figure 3: Periodograms for the two mixture series displayed in Figure 2.
• If we think back to the mixture process (4) as a model for our data, then the periodogram gives us a breakdown of which frequencies are the largest sources of variance: recall that σj² = E(Uj1² + Uj2²)/2 is the variance at frequency ωj
2.2 Discrete Fourier transform

• Given a series xt, t = 1, . . . , n, its discrete Fourier transform (DFT), denoted dx, is defined over the frequencies j/n, j = 0, 1, . . . , n − 1, by

dx(j/n) = (1/√n) Σ_{t=1}^n xt exp(−2πi j/n · t) (9)

where i is the imaginary unit, which satisfies i² = −1. Thus the DFT is complex-valued. As before, when the underlying series is clear from the context, we will drop the subscript and simply write d
Figure 4: Periodogram for star magnitude data over 600 consecutive days (originally from “The Calculus
of Observations” by Whittaker and Robinson, adapted by SS). Note the large values of the periodogram at
0.35 and 0.41, which correspond to 1/0.35 ≈ 29 and 1/0.41 ≈ 24 day cycles. For more on the interpreta-
tion, see Example 4.3 of SS.
• Recalling Euler’s formula, e^{iθ} = cos(θ) + i sin(θ), we also have (using the fact that cosine is an even function and sine is odd):

d(j/n) = (1/√n) Σ_{t=1}^n xt cos(2πj/n · t) − (i/√n) Σ_{t=1}^n xt sin(2πj/n · t), j = 0, 1, . . . , n − 1
• Thus, from the DFT, we can compute each cosine and sine coefficient in (6) by
âj = (2/√n) Re{d(j/n)} and b̂j = −(2/√n) Im{d(j/n)}
where for a complex number z = a + bi, we use Re{z} = a and Im{z} = b to denote its real and
imaginary parts
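• As a quick sanity check, here is a small R sketch (for an arbitrary series x and frequency index j) verifying that the coefficients in (6) are recovered from the real and imaginary parts of the DFT:

set.seed(1)
n <- 101; t <- 1:n; x <- rnorm(n); j <- 7
d <- sum(x * exp(-2i * pi * j / n * t)) / sqrt(n)         # DFT at frequency j/n
a_hat <- (2 / n) * sum(x * cos(2 * pi * j / n * t))       # a_hat_j from (6)
b_hat <- (2 / n) * sum(x * sin(2 * pi * j / n * t))       # b_hat_j from (6)
c(a_hat - 2 / sqrt(n) * Re(d), b_hat + 2 / sqrt(n) * Im(d))   # both are (numerically) zero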
• Note the following interesting connection to the periodogram. Since the modulus of each entry of the DFT satisfies (by definition) |d(j/n)|² = Re{d(j/n)}² + Im{d(j/n)}², the periodogram in (8) is

P(j/n) = (n/4)(âj² + b̂j²)
       = (n/4)((4/n) Re{d(j/n)}² + (4/n) Im{d(j/n)}²)
       = |d(j/n)|² (10)
• In other words, the periodogram is simply the squared modulus of the DFT!
• Side note: the entire DFT can be computed rapidly using an algorithm called the fast Fourier transform (FFT), which takes O(n log n) operations (most efficient in practice when n is a highly composite integer, such as a power of 2). The modern generic FFT algorithm is credited to Cooley and Tukey in the 1960s, but similar ideas were around much earlier
• Side side note: different software implementations scale the FFT/DFT differently, so you have to be careful to consult the documentation. For example, the fft() function in R computes it without the leading factor of n^{−1/2}, and with an additional factor of exp(2πij/n), but this doesn’t matter since we’re only using it in our examples to compute the squared modulus, i.e., the periodogram, across frequencies
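• To make this concrete, here is a minimal R sketch that computes the periodogram as the squared modulus of the scaled DFT, as in (10); the example series and its frequency 6/100 are arbitrary choices:

periodogram <- function(x) {
  n <- length(x)
  d <- fft(x) / sqrt(n)                      # scaled DFT, d(j/n) for j = 0, ..., n-1
  keep <- 2:(floor((n - 1) / 2) + 1)         # frequencies j/n, j = 1, ..., (n-1)/2
  list(freq = (keep - 1) / n, P = Mod(d[keep])^2)
}
set.seed(1)
n <- 100; t <- 1:n
x <- 2 * cos(2 * pi * 6 / n * t) + rnorm(n)  # cosine at frequency 6/100, plus noise
out <- periodogram(x)
out$freq[which.max(out$P)]                   # the peak is at 6/100 = 0.06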
3 Interlude: ST decomposition
• As an interlude, we’ll demo how to use the periodogram, in combination with smoothing (as we
learned in the last lecture) to detect the presence of seasonality in time series, and fit a seasonal-
trend (ST) decomposition
• Warning: what we do here is very simple and may very well offend researchers well-versed in classical
time series decomposition and econometrics alike. It is only meant as a demo. We repeat: it is just a
demo!
• (You can learn more about decomposition methods in time series in Chapters 3.4–3.6 of the HA
book. It is worth mentioning that in statistics, the STL decomposition is pretty standard and pop-
ular, which stands for “seasonal-trend decomposition using LOESS”, with LOESS being a particular
type of smoother that we didn’t cover, but it acts like a kernel smoother with a varying bandwidth.
For an econometrics perspective, see the references from the last lecture: Hodrick-Prescott → Hamil-
ton → Hodrick again. The last reference is especially scholarly and reviews what has been done over
the years)
• Ok with all those caveats laid out, a pretty simple and generic method to perform an ST decomposi-
tion of a time series yt , t = 1, . . . , n is as follows:
– Use a smoother to estimate a trend θ̂t , t = 1, . . . , n, aiming to undersmooth somewhat so as to
ignore a (possible) cyclic or seasonal component
– Compute residuals rt = yt − θ̂t , t = 1, . . . , n
– Absent any prior knowledge about the periods of the seasonal component (i.e., without want-
ing to specify a priori that there might be weekly, monthly, quarterly, etc. components), just
compute the periodogram of the residuals
– Identify large peaks in the periodogram, and fit and add to the model seasonal components that correspond to the predominant periods (inverse frequencies) that are present; a rough sketch of this recipe in code is given below
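• Here is a rough sketch of the recipe above in R. To keep it self-contained it uses LOESS as the smoother (rather than the HP filter used for Figure 5), and the span and the number of retained frequencies are arbitrary illustrative choices:

st_decompose <- function(y, span = 0.25, n_freq = 2) {
  n <- length(y); t <- 1:n
  trend <- fitted(loess(y ~ t, span = span))        # step 1: (under)smooth to get the trend
  r <- y - trend                                    # step 2: residuals
  d <- fft(r) / sqrt(n)                             # step 3: periodogram of the residuals
  j <- 2:(floor((n - 1) / 2) + 1)
  P <- Mod(d[j])^2
  omega <- (j[order(P, decreasing = TRUE)[1:n_freq]] - 1) / n   # step 4: dominant frequencies
  X <- do.call(cbind, lapply(omega, function(w)
    cbind(cos(2 * pi * w * t), sin(2 * pi * w * t))))
  seasonal <- fitted(lm(r ~ X))                     # fit sinusoids at those frequencies
  list(trend = trend, seasonal = seasonal, remainder = y - trend - seasonal, freq = omega)
}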
• Of course the method presented above is limited in several ways (e.g., it assumes that the seasonal
components have constant amplitude over time), and other methods are more advanced in various
ways. You can read more in the aforementioned references if you are curious
• Figure 5 gives an example applied to US retail employment data. We can see very clear cycles at
about a year, half-year, third-year, and quarter-year
4 Spectral representation
• Now we turn to a general fact about stationary processes in time series, which is called the spectral
representation of the auto-covariance function
Figure 5: ST decomposition, using an HP filter for the smoother, and a periodogram on the residuals to detect predominant frequency components.

• In particular, if xt, t = 1, 2, 3, . . . is stationary with auto-covariance function γ(h) = Cov(xt+h, xt), then there exists a unique increasing function F, called the spectral distribution function corresponding to the process, with F(ω) = 0 for ω ≤ −1/2, and F(ω) = γ(0) for ω ≥ 1/2, such that for any h = 0, ±1, ±2, . . . ,

γ(h) = ∫_{−1/2}^{1/2} exp(2πiωh) dF(ω)
• Above, we can think of F as being analogous to a cumulative distribution function (CDF), and thus the integral as being analogous to an expectation defined with respect to the distribution F. The only difference is that the total mass here need not be 1, but is instead γ(0)
• We will generally ignore this distinction and call F a distribution anyway, and in the case F is differ-
entiable, we will denote its derivative by f and call this the spectral density
• When the spectral density exists,¹ note that we have the representation

γ(h) = ∫_{−1/2}^{1/2} exp(2πiωh) f(ω) dω, h = 0, ±1, ±2, . . .
In other words, the spectral density and auto-covariance function are Fourier transform pairs
¹It exists when the auto-covariance function is absolutely summable, which means that it satisfies Σ_{h=−∞}^∞ |γ(h)| < ∞. See Appendix C of SS for details.
• The important high-level perspective to remember here: the auto-covariance function and spectral
density contain the same information about a time series, but express it in different ways
• The auto-covariance function expresses the variation broken down by lags, whereas the spectral
density expresses variation broken down by frequencies—or by cycles (remembering that the inverse
of a frequency is a cycle)
• Next we compute the spectral density in a number of our favorite example stationary time series
models. It will be helpful to point out that γ(h) = γ(−h) implies f (ω) = f (−ω), so we only need to
keep track of f for ω ∈ [0, 1/2]
4.1 White noise

• Let’s start with white noise wt, with variance σ². Its auto-covariance function is γ(0) = σ² and γ(h) = 0 for h ≠ 0, and inverting the Fourier-pair relationship above (which we can do under the summability condition in the footnote), its spectral density is

f(ω) = Σ_{h=−∞}^∞ γ(h) exp(−2πiωh) = σ²

• So, white noise has the simplest spectral density there is: it is constant!
• For the moment you (may) have been waiting for: we can finally explain where a lot of the nomen-
clature is coming from ... if we think about a time series as being comprised of a mixture of colors—
which we can generally do since any time series has the periodic representation (7)—then spectral
analysis provides us a tool like a prism, for decomposing this series into its primary colors, or spectra.
And, just like the color white, a white noise series is an equal mix of all colors (frequencies)
4.2 MA model
• Moving on to a moving average: consider xt = wt + θwt−1 , t = 0, ±1, ±2, ±3, . . . , where wt , t =
0, ±1, ±2, ±3, . . . is a white noise sequence. The right-hand side here is not exactly the same as an
equal-weights moving average (as we’ve looked at many times before), but a linear filter of the past,
with weights a0 = 1 and a1 = θ. It is generally what we’ll call a moving average (MA) model in the
context of ARIMA, as we’ll learn soon
• By direct calculation, which you can check, the auto-covariance function is

γ(h) = (1 + θ²)σ² if h = 0,   θσ² if |h| = 1,   0 if |h| > 1

• Again using the inversion relationship, the spectral density is therefore

f(ω) = Σ_{h=−∞}^∞ γ(h) exp(−2πiωh)
     = (1 + θ²)σ² + θσ²(e^{−2πiω} + e^{2πiω})
     = σ²(1 + θ² + 2θ cos(2πω))

where in the last line we used cos(a) = (e^{ia} + e^{−ia})/2, which follows from Euler’s formula e^{ia} = cos(a) + i sin(a)
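• As a quick numerical sanity check of the Fourier-pair relationship, the following R sketch integrates the spectral density above against cosines and recovers the auto-covariances (the values of θ and σ² are arbitrary choices):

theta <- 0.7; sigma2 <- 1
f <- function(w) sigma2 * (1 + theta^2 + 2 * theta * cos(2 * pi * w))   # MA(1) spectral density
gamma_h <- function(h)                       # gamma(h) = integral of f(w) exp(2*pi*i*w*h) over [-1/2, 1/2];
  integrate(function(w) f(w) * cos(2 * pi * w * h), -0.5, 0.5)$value    # reduces to a cosine since f is even
c(gamma_h(0), gamma_h(1), gamma_h(2))        # equals (1 + theta^2) sigma2, theta * sigma2, 0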
• So an MA process (with θ > 0) has a spectral density that decays as we move away from frequency zero, and larger θ means a steeper decay from ω = 0 to ω = 1/2. See Figure 6 for an illustration
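• A minimal R sketch that evaluates this spectral density on a frequency grid, for the same θ values shown in Figure 6 (with σ² = 1 assumed):

ma1_spec <- function(omega, theta, sigma2 = 1)
  sigma2 * (1 + theta^2 + 2 * theta * cos(2 * pi * omega))
omega <- seq(0, 0.5, length.out = 200)
thetas <- seq(0, 0.9, length.out = 8)                      # 0, 0.13, ..., 0.9, as in Figure 6
matplot(omega, sapply(thetas, function(th) ma1_spec(omega, th)),
        type = "l", xlab = "Frequency", ylab = "Spectral density")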
Figure 6: Spectral densities of the MA model, for θ ranging from 0 to 0.9.
4.3 AR model
• Lastly, we’ll turn to an autoregressive model: consider xt = φ1 xt−1 +φ2 xt−2 +wt , t = 0, ±1, ±2, ±3, . . . ,
where wt , t = 0, ±1, ±2, ±3, . . . is a white noise sequence. This is a second-order autoregressive (AR)
model, which we’ll learn more about when we cover ARIMA soon
• Calculating the auto-covariance function is going to be a bit nasty for this process (as you’ll learn more about later when we cover ARIMA). However, for our purposes here, we can get away with a trick: finding an equation that relates the auto-covariance of white noise to that of the AR process
• Since wt = xt − φ1 xt−1 − φ2 xt−2 , t = 0, ±1, ±2, ±3, . . . , we have
γw(h) = Cov(wt+h, wt)
= Cov(xt+h − φ1 xt+h−1 − φ2 xt+h−2, xt − φ1 xt−1 − φ2 xt−2)
= γx(h) − φ1 γx(h + 1) − φ2 γx(h + 2) − φ1 γx(h − 1) + φ1² γx(h) + φ1φ2 γx(h + 1)
  − φ2 γx(h − 2) + φ1φ2 γx(h − 1) + φ2² γx(h)
= (1 + φ1² + φ2²) γx(h) − φ1(1 − φ2)(γx(h − 1) + γx(h + 1)) − φ2(γx(h − 2) + γx(h + 2))
• Now represent the auto-covariance function γx as an integral with respect to the spectral density fx :
γw(h) = ∫_{−1/2}^{1/2} ((1 + φ1² + φ2²) − φ1(1 − φ2)(e^{−2πiω} + e^{2πiω}) − φ2(e^{−4πiω} + e^{4πiω})) exp(2πiωh) fx(ω) dω
• However, note that the representation of γw in terms of its own spectral density fw is of course
γw(h) = ∫_{−1/2}^{1/2} exp(2πiωh) fw(ω) dω
• Since Fourier transforms are uniquely identifying, we infer that the integrands in the last two displays must match:

((1 + φ1² + φ2²) − φ1(1 − φ2)(e^{−2πiω} + e^{2πiω}) − φ2(e^{−4πiω} + e^{4πiω})) fx(ω) = fw(ω)
• And based on our earlier calculation for white noise, we know that fw(ω) = σ², the noise variance, so plugging that into the above, we learn

fx(ω) = σ² / ((1 + φ1² + φ2²) − φ1(1 − φ2)(e^{−2πiω} + e^{2πiω}) − φ2(e^{−4πiω} + e^{4πiω}))
      = σ² / ((1 + φ1² + φ2²) − 2φ1(1 − φ2) cos(2πω) − 2φ2 cos(4πω))

where in the second line we used the identity cos(θ) = (e^{iθ} + e^{−iθ})/2 once again
• Plotting this, as we do in Figure 7, we learn that a second-order AR process has a spectral density
that is concentrated around a particular frequency. For example, when φ1 = 1 and φ2 = −0.9, it is
concentrated around ω ≈ 0.16 (a cycle of 1/ω ≈ 6 time points)
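• A small R sketch that evaluates the AR(2) spectral density formula above and locates its peak, for the parameter choice φ1 = 1, φ2 = −0.9 mentioned above (σ² = 1 assumed):

ar2_spec <- function(omega, phi1, phi2, sigma2 = 1)
  sigma2 / (1 + phi1^2 + phi2^2 - 2 * phi1 * (1 - phi2) * cos(2 * pi * omega) -
            2 * phi2 * cos(4 * pi * omega))
omega <- seq(0, 0.5, length.out = 1000)
f <- ar2_spec(omega, phi1 = 1, phi2 = -0.9)
omega[which.max(f)]            # about 0.16, i.e., a cycle of roughly 6 time points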
Figure 7: Spectral density for the second-order AR model, for a few parameter choices φ1 , φ2 .
• Finally, let’s relate the periodogram to the sample auto-covariance function. Recall that the DFT of the series is d(ωj) = (1/√n) Σ_{t=1}^n xt exp(−2πiωj t), defined over the Fourier frequencies ωj = j/n, j = 0, 1, . . . , n − 1
• We claim that Σ_{t=1}^n exp(−2πiωj t) = 0 for any 0 < j < n. This can be checked by viewing it as a geometric series Σ_{t=1}^n z^t in z = e^{−2πij/n}, and hence we can use the formula

Σ_{t=1}^n z^t = z (1 − z^n)/(1 − z), for z ≠ 1

Since z^n = e^{−2πij} = 1, while z ≠ 1 for 0 < j < n, the sum is zero
• The periodogram, recalling (10), is given by the squared modulus of the DFT, P(ωj) = |d(ωj)|². By the claim above, subtracting x̄ from each xt does not change d(ωj) when j ≠ 0, so we may work with the centered series. Using the fact that the squared modulus of z = a + bi is |z|² = a² + b² = (a + ib)(a − ib), we get

P(ωj) = (1/n) Σ_{s=1}^n Σ_{t=1}^n (xs − x̄)(xt − x̄) exp(−2πiωj(t − s))
      = Σ_{h=−(n−1)}^{n−1} (1/n) Σ_{s=1}^{n−|h|} (x_{s+|h|} − x̄)(xs − x̄) exp(−2πiωj h)
      = Σ_{h=−(n−1)}^{n−1} γ̂(h) exp(−2πiωj h)

where in the second line we grouped terms by the lag h = t − s, and in the last line γ̂(h) = (1/n) Σ_{s=1}^{n−|h|} (x_{s+|h|} − x̄)(xs − x̄) is the sample auto-covariance function
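• A quick numerical check of this identity in R (a sketch; the simulated AR(1) series and the frequency index j = 5 are arbitrary choices):

set.seed(1)
n <- 64; x <- as.numeric(arima.sim(list(ar = 0.5), n))
j <- 5; w <- j / n                                             # a Fourier frequency omega_j
P <- Mod(sum((x - mean(x)) * exp(-2i * pi * w * (1:n))))^2 / n # periodogram |d(omega_j)|^2
gam <- acf(x, lag.max = n - 1, type = "covariance", plot = FALSE)$acf[, 1, 1]  # sample auto-covariances
h <- -(n - 1):(n - 1)
P2 <- Re(sum(gam[abs(h) + 1] * exp(-2i * pi * w * h)))         # DFT of the sample auto-covariance
c(P, P2)                                                       # the two values agree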