Te 1555
Thesis
August 16, 2016
To Israel M Hdz.
Acknowledgements
I would like to express my deepest gratitude to my advisor, Dr. Joaquín Ortega, for his patience, suggestions and guidance during the last four years. I thank CIMAT for the facilities to carry out this research and CONACYT for the PhD scholarship granted.
Contents
1 Introduction
  1.1 Previous Results
    1.1.1 Spectral theory for a stationary process
    1.1.2 Change point detection
    1.1.3 Spectral theory for a locally stationary process
    1.1.4 Time Series Clustering
2 Total Variation Distance
3 Clustering Methods
  3.1 TV distance in a clustering method
  3.2 Hierarchical spectral merger (HSM) method
  3.3 TV distance and other dissimilarity measures
    3.3.1 Simulation of a process based on the spectral density
    3.3.2 Comparative study
  3.4 Detection of transitions between spectra
    3.4.1 Simulation of transitions between two spectra
4 Applications to Data
  4.1 Ocean wave analysis
    4.1.1 Data description
    4.1.2 Results using the TV distance as a similarity measure
  4.2 Clustering of EEG data
    4.2.1 Data description
    4.2.2 Results using the HSM method
Appendices
A R Codes
  A.1 Computing the TV distance
  A.2 Methods
Introduction
Figure 1.1: Processes that change in (a) mean, (b) variance and (c) spectra.
Our interest lies in changes in spectra (see panel (c) in Figure 1.1). A change in spectra means a change in the waveforms of the signal, which is very important in many applications. For example, during a storm sea waves become higher and slower, which has many implications for the construction of maritime structures. Another example is the study of brain signals recorded by electroencephalograms: in this case, the activation of a brain region shows faster oscillations of the signal (i.e., more energy transferred in shorter periods of time). In both cases, understanding where and how these changes happen is relevant. A large part of the literature on spectral
these changes happen is relevant. A large part of the literature on spectral
analysis is based on the stationarity assumption for the process. However, in
some cases, we need to carry out spectral analysis for processes that are not
stationary. There are several points of view from which an analysis of this sort can be undertaken. The most frequently used is the detection of change points in the process. The main assumption of this approach is that a process
{Xt } changes in k specific time points, τ1 , τ2 , . . . , τk , where both the number
and location of the change points are unknown and the process is assumed
to be stationary between them. Another approach is to model the process
{Xt } as a locally stationary process.
This project considers, as an alternative to the change point approach,
the use of clustering methods for time series to identify periods or segments
that have similar spectra. Time series clustering has captured the attention
of many researchers in the past few years.
Figure 1.2: Estimation of the spectral density; the true density is the dashed red curve. (a) Parzen window used in the lag window estimator. (b) Estimator using the periodogram. (c) Estimator using the smoothed lag window.
the spectrum of the kth segment, i.e., the spectrum of {Xt } for τk ≤ t < τk+1 .
Davis et al. (2006) developed a methodology called Auto-PARM where the
main idea is to fit an AR model to each piece. The minimum description
length principle is applied to find the “best” combination of the number of
segments, the lengths of the segments, and the orders of the piecewise AR
processes. The estimates are strongly consistent; however, approximating a nonstationary time series by a parametric AR model may not be reasonable in some applications.
Another parametric framework is the Detection of Changes by Penalized
Contrasts (DCPC) developed by Lavielle (1999) and collaborators. They
considered a sequence of real random variables {Xt }t=1,...,n and assumed
that the distribution of the process depends on a parameter θ that changes
abruptly at some unknown instants {τi , 1 ≤ i ≤ K}, where K is also
unknown. To estimate both K and the change points {τi , 1 ≤ i ≤ K}
they used a penalized contrast function of the form J(t, y) + βpen(t), where the contrast function is defined as

J(t, y) = \sum_{k=1}^{K} C(X_{\tau_{k-1}}, \ldots, X_{\tau_k}),
Dette and Paparoditis (2009) and Dette and Hildebrandt (2012) studied
this hypothesis test. The test statistic proposed is a functional of the Euclidean norm of the difference between the spectra, integrated over [−π, π). To establish a rejection region, they proposed two options: 1) it can be proved that the test statistic is asymptotically normally distributed, or 2) a bootstrap procedure based on a Wishart distribution can be used. The power of the test is good in
the examples shown in the paper. The asymptotic distribution is dependent
on the smoothing of the periodogram.
Two other possible statistics are studied in Jentsch and Pauly (2012), with special interest in the case of time series of unequal lengths (n_1 < n_2). The statistics are based on the periodogram,

T_n^{(1)} = \frac{1}{n_2} \sum_{j=1}^{n_2} c_j \left( I_{n_1,X}(\omega_j) - I_{n_2,Y}(\omega_j) \right)^2

and

T_n^{(2)} = \frac{1}{n_2} \sum_{j=1}^{n_2} c_j \log^2 \left( \frac{I_{n_1,X}(\omega_j)}{I_{n_2,Y}(\omega_j)} \right) \mathbf{1}\{ I_{n_1,X}(\omega_j)\, I_{n_2,Y}(\omega_j) \neq 0 \}.
From the definition and other results (see Dahlhaus, 2011) it can be shown that for u_0 ∈ [0, 1] there exists a stationary process X̃_t(u_0) such that

|X_{t,T} - \tilde{X}_t(u_0)| = O_p\!\left( \left| \frac{t}{T} - u_0 \right| + \frac{1}{T} \right),
which justifies the name “locally stationary process”. Xt,T has a unique
time varying spectral density which is, locally, the same as the spectral
density of X̃t (u). Furthermore, it has, locally, the same auto-covariance
since cov(X[uT ],T , X[uT ]+k,T ) = c(u, k) + O(T −1 ) uniformly in u and k, where
c(u, k) is the covariance function of X̃t (u). This justifies taking c(u, k) as the
local covariance function of X_{t,T} at time u = t/T. This suggests estimating local spectra with rolling windows.
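As an illustration of this rolling-window idea, the sketch below computes a raw periodogram on each window of a series. This is a hypothetical Python fragment (the thesis's own implementations are the R codes of Appendix A); in practice each window's periodogram would be smoothed rather than used raw.

```python
import numpy as np

def rolling_periodograms(x, win, step):
    """Raw periodogram of each rolling window of x: a crude estimate of
    the local spectrum; in practice each one would be smoothed."""
    specs = []
    for s in range(0, len(x) - win + 1, step):
        seg = x[s:s + win] - np.mean(x[s:s + win])
        # periodogram at the Fourier frequencies j/win, j = 0, ..., win//2
        specs.append(np.abs(np.fft.rfft(seg)) ** 2 / win)
    return np.array(specs)          # shape: (number of windows, win//2 + 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)       # placeholder series
S = rolling_periodograms(x, win=256, step=128)
```

Comparing the rows of `S` over time is precisely what motivates treating local spectra as objects to be clustered.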
A more formal estimation method for the time varying spectral density was proposed by Dahlhaus (2000), using an approximation to the Gaussian likelihood.

Change point detection and the locally stationary approaches are not completely separated. Last and Shumway (2008) studied the problem of change point detection for locally stationary processes.
In our case, we will use the total variation distance on the real line, so X = ℝ and A = B(ℝ), where B(ℝ) is the class of Borel sets on the real line.
An important property of the TV distance is that it is bounded between
0 and 1. This property can be easily deduced from the definition. A value
of 1 for the distance can be attained if P and Q have disjoint support. This
property is very useful in order to interpret distances: values close to 1 mean
that the two measures are quite different while distance values close to 0
mean that they are very similar, almost equal. In terms of spectral densities, if the TVD is equal to 1 then the spectral contents of the two signals are completely different, i.e., they do not share a common frequency band.
If P and Q have density functions f and g (typically with respect to the Lebesgue measure, µ), the TV distance between them can be computed as

d_{TV}(P, Q) = \frac{1}{2} \int |f(x) - g(x)| \, d\mu(x).
Chapter 2. Total Variation Distance
Figure 2.1: The TV distance measures the similarity between two densities f and g. The blue shaded area, which is equal to the pink one, is the value of the TV distance.
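Given two densities evaluated on a common grid, this identity can be computed directly. The following Python fragment is an illustrative sketch (not the thesis's R code of Appendix A); for two unit-variance Gaussians with means 0 and 1 the exact value is 2Φ(1/2) − 1 ≈ 0.383.

```python
import numpy as np

def tv_distance(f, g, x):
    """d_TV(f, g) = (1/2) * integral of |f - g|, approximated on the
    grid x with the trapezoidal rule."""
    return 0.5 * np.trapz(np.abs(f - g), x)

x = np.linspace(-8, 8, 4001)
f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)          # N(0, 1) density
g = np.exp(-(x - 1) ** 2 / 2) / np.sqrt(2 * np.pi)    # N(1, 1) density
d = tv_distance(f, g, x)      # close to 2 * Phi(1/2) - 1, about 0.383
```

The same function applies verbatim to normalized spectral densities on a frequency grid.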
P = (1 − ε)P0 + εN.
2.1 TV distance and the Wasserstein distance
If one only considers two measures, the similarity between them is defined as
follows.
Definition 2.2. Two probability measures P and Q on the same sample
space are α-similar if there exist probability measures λ, P′, and Q′ such that

P = (1 - \varepsilon_1)\lambda + \varepsilon_1 P', \qquad
Q = (1 - \varepsilon_2)\lambda + \varepsilon_2 Q',   (2.1)

with 0 ≤ ε_i ≤ α, i = 1, 2.
Smaller values of α correspond to more similar probability measures.
Before we connect this concept with the total variation distance, we consider
the case of measures on the real line and the definition of the Wasserstein
distance.
Definition 2.3. Given α ∈ (0, 1), we define the set of α-trimmed versions of P by

R_\alpha(P) := \left\{ Q \in \mathcal{P} : Q \ll P, \ \frac{dQ}{dP} \leq \frac{1}{1-\alpha} \right\},

where \mathcal{P} denotes the set of Borel probability measures on ℝ. Equivalently,

R_\alpha(P) := \left\{ Q \in \mathcal{P} : Q \ll P, \ Q(A) \leq \frac{1}{1-\alpha} P(A) \text{ for all } A \in \mathcal{B}(\mathbb{R}) \right\}.
Definition 2.4. The Wasserstein distance between two probability measures P, Q ∈ \mathcal{P} is defined as

W_2(P, Q) = \left( \inf\left\{ \int \|x - y\|^2 \, \mu(dx, dy) : \mu \in M(P, Q) \right\} \right)^{1/2}.   (2.2)
In the case of spectral densities, the estimation of the spectrum is noisy. In terms of a model, the signals X₁(t) and X₂(t) are observed with uncorrelated noises N₁(t) and N₂(t), i.e., we observe the sum X_i(t) + N_i(t), where the unobserved process (the true signal) X_i(t) is contaminated by the noise N_i(t). This gives rise to the spectrum in the following sense: since the noises are uncorrelated with the signals, the spectral densities add, and the observed spectrum is f^{X_i}(ω) + f^{N_i}(ω). This causes the estimates to differ even in the case of two processes with the same spectral density, f^{X_1}(ω) = f^{X_2}(ω). So the total variation distance quantifies the level of similarity between two spectra, in the sense of (2.1).
if there exists a constant c > 0 such that f₁(ω) = cf₂(ω) for almost all
ω. The non-negativity, symmetry and subadditivity properties are obtained
by restricting the TV distance to this space. The identity of indiscernibles is
satisfied on M, since
\frac{d_H^2}{2} \leq d_{TV} \leq d_H,
that could produce similar results when one compares two density functions.
This distance is not considered in the rest of this thesis, however, it is still
an option to explore. A possible disadvantage of the Hellinger distance is
the lack of interpretation in terms of the spectral densities. In the frequency
domain approach, the spectral density and the log spectral density have a
physical interpretation while the interpretation of the square root of the
density is not clear.
Chapter 1 described some of the distances used to compare spectral
densities. Two of the most frequently employed are the L2 norm and the
Kullback-Leibler (KL) divergence. The KL divergence is not symmetric, but there exists a symmetric version (SKL). Recall that, if f and g are two density functions,

d_{L^2}(f, g) = \left( \int (f - g)^2 \right)^{1/2}, \qquad
d_{KL}(f, g) = \int f \log\frac{f}{g},

and d_{SKL}(f, g) = (d_{KL}(f, g) + d_{KL}(g, f))/2.
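To see concretely how these quantities behave when two spectra have disjoint supports (the situation of Case 2 in Figure 2.2 below), one can evaluate them on a grid. The following Python sketch uses two hypothetical triangular densities standing in for the spectra: the TV distance equals 1, while the KL (and hence SKL) divergence is undefined (infinite).

```python
import numpy as np

x = np.linspace(0, 50, 5001)

def bump(x, c, w):
    """Triangular density supported on [c - w, c + w], normalized on the grid."""
    p = np.clip(1 - np.abs(x - c) / w, 0, None)
    return p / np.trapz(p, x)

def d_l2(f, g):
    return np.sqrt(np.trapz((f - g) ** 2, x))

def d_kl(f, g):
    mask = f > 0
    if np.any(g[mask] == 0):      # f puts mass where g has none: KL diverges
        return np.inf
    out = np.zeros_like(f)
    out[mask] = f[mask] * np.log(f[mask] / g[mask])
    return np.trapz(out, x)

def d_skl(f, g):
    return 0.5 * (d_kl(f, g) + d_kl(g, f))

def d_tv(f, g):
    return 0.5 * np.trapz(np.abs(f - g), x)

f = bump(x, 5, 2)     # "spectrum" concentrated around 5 Hz
g = bump(x, 30, 2)    # "spectrum" around 30 Hz, disjoint support
```

Here `d_tv(f, g)` equals 1, `d_skl(f, g)` is infinite, and `d_l2(f, g)` is finite and so cannot by itself flag the disjointness.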
Figure 2.2: (a) Spectra with different peak frequency but close support. (b) Spectra with disjoint support. (c) Spectra with close peak frequency and similar support but different dispersion.
2.2. We look at three different cases. Case 1 - The first spectrum peaks at
10 Hz (black continuous curve) and the second peaks at 15 Hz. Case 2 - The
first spectrum peaks at 5 Hz, the second peaks at 30 Hz, and the supports
are disjoint. Case 3 - The first spectrum peaks at 15 Hz, the second peaks
at 16 Hz, and they have different dispersion.
For each case, we compute the TV distance, the L2 distance, and the SKL
divergence. Table 2.1 shows these values. When the spectra are different
as in Case 1, we would expect all distances to be “big”, and indeed, all
the considered distance values are big enough to distinguish between them.
Now, if we observe Case 2, the spectra are completely different, since they
have different supports. This example shows one of the disadvantages of
the SKL divergence, since we cannot compute the value in this case. Notice
that the TV distance has no problem, and the value is equal to one, which
indicates that the spectra are completely different. On the other hand, the
L2 distance has a value comparable to that of Case 1, even though in this case
the densities have disjoint support. Then, in Case 3 the spectra are different
but it could be difficult to conclude that from the L2 and SKL distances.
The difference would be clearer using the TV distance. A more exhaustive
simulation exercise to compare these distances is performed in Chapter 3.
The TV distance has a finite range; however, we need to establish a statistical notion of “big”. This is important because, even in the case of two samples with the same spectral representation, the estimated spectra have a TV distance value not equal to zero. We would like to choose a threshold for the TV distance between estimated spectra to decide whether the samples were generated from the same spectral density or not, so that the probability of a type I error is controlled at some level α. In addition, the procedure to choose this threshold must have enough power to detect when the true spectra are different. The next sections deal with the distribution of the TV distance between estimated spectral densities.

Table 2.1: Values of the TV distance, L2 norm and SKL divergence between the spectra plotted in Figure 2.2.

2.3 Distribution of the TV distance between estimated spectra
f_N^{X_i} = \frac{f^{X_i}}{\int_{-1/2}^{1/2} f^{X_i}(\omega)\, d\omega}, \qquad i = 1, 2.
2.3.1 Estimation of d_TV

At this point d_TV, defined in (2.3), is not a random quantity because it is based on the true (though unknown) spectral density. The next step is to approximate the integral numerically:

\int_b^c d(x)\,dx \approx \frac{c-b}{n}\left[ \frac{d(b)+d(c)}{2} + \sum_{k=1}^{n-1} d\!\left(b + \frac{k(c-b)}{n}\right) \right],   (2.4)

where n is the number of elements in the partition of the interval [b, c]. We could choose another numerical approximation, and the procedure to obtain the asymptotic distribution would be similar.
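Formula (2.4) is the composite trapezoidal rule. As a quick sanity check, the Python fragment below (an illustrative sketch, not the thesis's R code) applies it to a function with a known integral.

```python
import numpy as np

def composite_rule(d, b, c, n):
    """Numerical approximation (2.4): average of the endpoint values plus
    the interior nodes, scaled by the subinterval length (c - b)/n."""
    h = (c - b) / n
    interior = sum(d(b + k * h) for k in range(1, n))
    return h * ((d(b) + d(c)) / 2 + interior)

approx = composite_rule(np.cos, 0.0, np.pi / 2, 1000)   # exact integral is 1
```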
For real data sets, f^{X_1} and f^{X_2} are not known and have to be estimated. As mentioned before, the raw periodogram is not mean-square consistent because its variance does not decrease even when the length of the time series increases. So, we choose the lag window estimator (smoothed periodogram) defined in (1.2), with a Parzen window of width a. A lag window estimator can be rewritten as a spectral average estimator, i.e., the properties of lag window estimators are similar to those of spectral average estimators.
\hat{f}_N^{X_i}(\omega) = \frac{\hat{f}^{X_i}(\omega)}{\hat{\gamma}^{X_i}(0)}.   (2.5)
Using the estimator (2.5) and the numerical approximation (2.4), we can write an estimator of the TV distance, \hat{d}_{TV}, as follows:

\hat{d}_{TV} = \frac{1}{2T} \sum_{k=1}^{T} \left| \hat{f}_N^{X_1}\!\left(\frac{k}{T} - \frac{1}{2}\right) - \hat{f}_N^{X_2}\!\left(\frac{k}{T} - \frac{1}{2}\right) \right|,   (2.6)
is satisfied, then

\frac{S_n}{s_n} \xrightarrow{w} N(0, 1).

Lemma 2.1. If the Lyapunov condition holds, i.e., there exists a δ > 0 such that

\frac{1}{s_n^{2+\delta}} \sum_{j=1}^{n} E|X_{n,j}|^{2+\delta} \longrightarrow 0 \quad \text{as } n \to \infty,
The proof of the theorem and lemma can be found in Brockwell and Davis
(2006), Chapter 6, or Gnedenko and Kolmogorov (1968).
We shall now consider two processes X₁(t) and X₂(t) that satisfy the following hypothesis.
Assumption A1. Suppose that X1 (t) and X2 (t) are independent stationary
processes with mean µ, variance σ 2 , absolutely summable covariance
functions and the spectral densities f Xi are continuous functions with the
first three spectral moments finite.
Lemma 2.2. Let

\hat{f}_L(\omega) = (2\pi)^{-1} \sum_{|h| \le a} \beta(h/a)\, \hat{\gamma}(h)\, e^{-ih\omega},

where β is a taper function, a is the bandwidth, T is the length of the time series X_t,

\hat{\gamma}(h) = \frac{1}{T} \sum_{t=1}^{T-|h|} (X_t - \bar{X})(X_{t+|h|} - \bar{X})

is the sample covariance function, and \bar{X} = \frac{1}{T}\sum_{t=1}^{T} X_t. Under Assumption A1, if a → ∞ when T → ∞ and a/T → 0, then

\sqrt{\frac{T}{a}} \left( \hat{f}_L(\omega) - f(\omega) \right) \xrightarrow{w} N\!\left( 0,\ f^2(\omega) \int_{-1}^{1} \beta^2(u)\,du \right).   (2.9)
The proof can be found in Brillinger (1981), Section 5.6.
Proof. Let \hat{f}_N^{X_i} be the normalized lag window estimator for the time series X_i(t) with bandwidth a, T the length of the observed time series, σ² = Var(X_i(t)), i = 1, 2, and \hat{d}_{TV} as in (2.6). Let Z_{T,k} and Z^*_{T,k}, k = 1, \ldots, T, be independent Gaussian random variables with parameters

\mu = \frac{1}{\sigma^2} \quad \text{and} \quad \sigma^2_{a,T} = \frac{a}{T\sigma^4} \int_{-1}^{1} \beta^2(u)\,du.
Step 1. First, we will show that \hat{d}_{TV} - \sum_{k=1}^{T} c_{T,k} |Z_{T,k} - Z^*_{T,k}| converges in probability to zero.
Notice that c_{T,k} ≥ 0. Without loss of generality, we can assume that c_{T,k} > 0 (otherwise the corresponding summands are zero). Now, under H₀,

\left| \hat{d}_{TV} - \sum_{k=1}^{T} c_{T,k} |Z_{T,k} - Z^*_{T,k}| \right|
= \left| \sum_{k=1}^{T} c_{T,k} \left( |D_{T,k}| - |Z_{T,k} - Z^*_{T,k}| \right) \right|
\le \sum_{k=1}^{T} c_{T,k} \left| \, |D_{T,k}| - |Z_{T,k} - Z^*_{T,k}| \, \right|
\le \sum_{k=1}^{T} c_{T,k} \left| D_{T,k} - (Z_{T,k} - Z^*_{T,k}) \right|.

Then,

0 \le \left| \hat{d}_{TV} - \sum_{k=1}^{T} c_{T,k} |Z_{T,k} - Z^*_{T,k}| \right| \le \sum_{k=1}^{T} c_{T,k} \left| D_{T,k} - (Z_{T,k} - Z^*_{T,k}) \right|.   (2.11)
Then,

E\left( \left( \frac{\hat{f}_N^{X_1}(\omega_k)}{f(\omega_k)} - Z_{T,k} \right) - \left( \frac{\hat{f}_N^{X_2}(\omega_k)}{f(\omega_k)} - Z^*_{T,k} \right) \right)^{\!2}
= E\left( \frac{\hat{f}_N^{X_1}(\omega_k)}{f(\omega_k)} - Z_{T,k} \right)^{\!2}
- 2\, E\left( \frac{\hat{f}_N^{X_1}(\omega_k)}{f(\omega_k)} - Z_{T,k} \right) E\left( \frac{\hat{f}_N^{X_2}(\omega_k)}{f(\omega_k)} - Z^*_{T,k} \right)
+ E\left( \frac{\hat{f}_N^{X_2}(\omega_k)}{f(\omega_k)} - Z^*_{T,k} \right)^{\!2}.   (2.12)
Define

G_{T,k} = \frac{\hat{f}_N^{X_1}(\omega_k)}{f(\omega_k)} - Z_{T,k}
\quad \text{and} \quad
G^*_{T,k} = \frac{\hat{f}_N^{X_2}(\omega_k)}{f(\omega_k)} - Z^*_{T,k}.
Substituting this notation and using the previous inequalities, we get

\sum_{k=1}^{T} P\left( |G_{T,k} - G^*_{T,k}| > \frac{\varepsilon}{c_{T,k}} \right)
\le \sum_{k=1}^{T} \frac{c^2_{T,k}}{\varepsilon^2}\, E\left( G_{T,k} - G^*_{T,k} \right)^2
= \sum_{k=1}^{T} \frac{c^2_{T,k}}{\varepsilon^2} \left( E[G^2_{T,k}] - 2E[G_{T,k}]E[G^*_{T,k}] + E[(G^*_{T,k})^2] \right)
= \sum_{k=1}^{T} \frac{c^2_{T,k}}{\varepsilon^2} \left( \operatorname{Var}[G_{T,k}] + E^2[G_{T,k}] - 2E[G_{T,k}]E[G^*_{T,k}] + \operatorname{Var}[G^*_{T,k}] + E^2[G^*_{T,k}] \right).
Notice that the moments of GT,k and G∗T,k are equal and have the same value
Assuming that the first three spectral moments are finite, then

\frac{1}{T} \sum_{k=1}^{T} f^2(\omega_k) \xrightarrow{T \to \infty} \int_{-1/2}^{1/2} f^2(\omega)\,d\omega.   (2.15)
when T → ∞. The convergence of (2.16) and the bound in (2.11) prove that \hat{d}_{TV} - \sum_{k=1}^{T} c_{T,k} |Z_{T,k} - Z^*_{T,k}| converges in probability to zero.
Step 2. Now, we show that

\frac{1}{s_T} \sum_{k=1}^{T} \left( c_{T,k} |Z_{T,k} - Z^*_{T,k}| - \mu_{T,k} \right) \xrightarrow{w} N(0, 1).
Figure 2.3: Probability density and distribution functions of the Half-Normal (HN) distribution for different values of the standard deviation (1, 1.5, 2).
where \Sigma^2 = \frac{2}{\sigma^4} \int_{-1}^{1} \beta^2(u)\,du and HN denotes the Half-Normal distribution. The HN distribution is a particular case of the folded normal (see Leone et al., 1961), when the mean is equal to zero. This distribution is used when the measurements are Gaussian but only their absolute values are considered. Figure 2.3 plots examples of the density and distribution functions.
Using properties of this distribution,

E(|Z_{T,k} - Z^*_{T,k}|) = \Sigma \sqrt{\frac{2a}{\pi T}},   (2.18)

\operatorname{Var}(|Z_{T,k} - Z^*_{T,k}|) = \Sigma^2 \left( 1 - \frac{2}{\pi} \right) \frac{a}{T}.   (2.19)
Let Y_{T,k} = c_{T,k}|Z_{T,k} - Z^*_{T,k}| - \mu_{T,k}, where

\mu_{T,k} = c_{T,k}\, \Sigma \sqrt{\frac{2a}{\pi T}} = m_1 f(\omega_k) \sqrt{\frac{a}{T^3}},

and m₁ is a constant. Then {(Y_{T,k}, 1 ≤ k ≤ T), T ≥ 1} is an independent triangular array with mean zero and variance

s^2_{T,k} = c^2_{T,k}\, \Sigma^2 \left( 1 - \frac{2}{\pi} \right) \frac{a}{T} = m_2 f^2(\omega_k) \frac{a}{T^3}.
In fact, we will verify the Lyapunov condition, since it implies the Lindeberg condition.

Lyapunov condition: there exists a δ > 0 such that

\frac{1}{s_T^{2+\delta}} \sum_{k=1}^{T} E|Y_{T,k}|^{2+\delta} \xrightarrow{T \to \infty} 0.
Then,

s_T^3 = m_5 \frac{a^{3/2}}{T^{9/2}} \left( \sum_{k=1}^{T} f^2(\omega_k) \right)^{3/2} = m_5 \frac{a^{3/2}}{T^3} \left( \frac{1}{T} \sum_{k=1}^{T} f^2(\omega_k) \right)^{3/2}.
So,

\frac{1}{s_T^3} \sum_{k=1}^{T} E|Y_{T,k}|^3
= \frac{ \frac{1}{T} \sum_{k=1}^{T} f^3(\omega_k) \left[ m_3 \frac{a^{3/2}}{T^{7/2}} + m_4 \frac{a^{3/2}}{T^{7/2}} \Phi\!\left( \Sigma \sqrt{\frac{2a}{\pi T}} \right) \right] }{ m_5 \frac{a^{3/2}}{T^3} \left( \frac{1}{T} \sum_{k=1}^{T} f^2(\omega_k) \right)^{3/2} }
= \frac{ \frac{1}{T} \sum_{k=1}^{T} f^3(\omega_k) \left[ m_3 \frac{1}{T^{1/2}} + m_4 \frac{1}{T^{1/2}} \Phi\!\left( \Sigma \sqrt{\frac{2a}{\pi T}} \right) \right] }{ m_5 \left( \frac{1}{T} \sum_{k=1}^{T} f^2(\omega_k) \right)^{3/2} }.   (2.22)
Notice that

\Phi\!\left( \Sigma \sqrt{\frac{2a}{\pi T}} \right) \xrightarrow{T \to \infty} \Phi(0) = 1/2, \qquad
m_3 \frac{1}{T^{1/2}} \xrightarrow{T \to \infty} 0, \qquad
m_4 \frac{1}{T^{1/2}} \xrightarrow{T \to \infty} 0,

so that

m_3 \frac{1}{T^{1/2}} + m_4 \frac{1}{T^{1/2}} \Phi\!\left( \Sigma \sqrt{\frac{2a}{\pi T}} \right) \xrightarrow{T \to \infty} 0.
Since ω_k = k/T − 1/2, and assuming that the first three spectral moments are finite, then

\frac{1}{T} \sum_{k=1}^{T} f^3(\omega_k) \xrightarrow{T \to \infty} \int_{-1/2}^{1/2} f^3(\omega)\,d\omega, \qquad
\frac{1}{T} \sum_{k=1}^{T} f^2(\omega_k) \xrightarrow{T \to \infty} \int_{-1/2}^{1/2} f^2(\omega)\,d\omega.
So,

\frac{1}{s_T} \sum_{k=1}^{T} Y_{T,k} = \frac{1}{s_T} \sum_{k=1}^{T} \left( c_{T,k} |Z_{T,k} - Z^*_{T,k}| - \mu_{T,k} \right) \xrightarrow{w} N(0, 1)

when T → ∞.
Finally, we conclude from Step 1 and Step 2 that \hat{d}_{TV} converges to the same distribution as \sum_{k=1}^{T} c_{T,k} |Z_{T,k} - Z^*_{T,k}|, i.e., \hat{d}_{TV} is asymptotically Normal with the same parameters as \sum_{k=1}^{T} c_{T,k} |Z_{T,k} - Z^*_{T,k}|.
where c_{T,k} = \frac{f(\omega_k)}{2T}, and Z_{T,k}, Z^*_{T,k}, k = 1, \ldots, T, are independent random variables with distribution

Z_k \sim \frac{\sigma^2}{2\pi} \frac{\chi^2_{2L_h}}{2L_h}, \quad \text{with } L_h = \frac{T}{a \int_{-1}^{1} \beta^2(u)\,du}.
2.3.4 Bootstrapping

As an alternative to the asymptotic distribution, one can obtain a critical value for the statistic \hat{d}_{TV} based on a bootstrap procedure. If X(t) is a linear process, i.e.,

X(t) = \sum_{j=-\infty}^{\infty} \psi_j W(t-j),

where W(t) is white noise, it can be proved (see Bloomfield, 1976) that the periodogram of X(t) satisfies

I^X(\omega_k) \approx G(\psi, \omega_k)\, I^W(\omega_k),

where G(\psi, \omega) = \left| \sum_j \psi_j e^{-2\pi i \omega j} \right|^2 and I^W(\cdot) is the periodogram of the white noise W(t). Moreover, if the white noise has variance equal to one, then G(\psi, \omega) is equal to the spectral density of X. So, the observed periodogram consists of the spectral density of X multiplied by the periodogram of a white noise process. A natural way to obtain a replicate of the observed spectral density is therefore to multiply the density f by the estimated periodogram of white noise. We will explain this in more detail when the consistency of the bootstrap estimator is proved.

This proposal is motivated by the method presented in Kreiss and Paparoditis (2015).
Algorithm:
1. From X1 (t) and X2 (t), estimate fˆNX1 (ω) and fˆNX2 (ω).
5. Repeat steps 3 and 4 and estimate \hat{d}_{TV} using the bootstrap spectral densities, i.e.,

\hat{d}^{B}_{TV} = \hat{d}_{TV}(\hat{f}^{B_1}_N, \hat{f}^{B_2}_N),
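A sketch of this bootstrap in Python is given below. It is an illustrative reading of the algorithm, not the thesis's R implementation: the intermediate steps (not reproduced above) are filled in by estimating a pooled spectrum under H₀ and multiplying it by smoothed periodograms of fresh Gaussian white noise, and a running-mean smoother stands in for the Parzen lag window — all of these are assumptions of this sketch.

```python
import numpy as np

def smooth_pgram(x, m):
    """Smoothed periodogram, normalized to sum to one (stand-in for f_N)."""
    x = x - np.mean(x)
    I = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    f = np.convolve(I, np.ones(2 * m + 1) / (2 * m + 1), mode="same")
    return f / np.sum(f)

def bootstrap_tv(x1, x2, n_boot=200, m=10, rng=None):
    """Bootstrap replicates of the TV statistic under H0: each replicate
    multiplies a pooled spectral estimate by the smoothed periodogram of
    fresh white noise, following the description in the text."""
    if rng is None:
        rng = np.random.default_rng()
    T = len(x1)
    f0 = 0.5 * (smooth_pgram(x1, m) + smooth_pgram(x2, m))   # pooled under H0
    reps = np.empty(n_boot)
    for b in range(n_boot):
        g1 = f0 * smooth_pgram(rng.standard_normal(T), m)
        g2 = f0 * smooth_pgram(rng.standard_normal(T), m)
        reps[b] = 0.5 * np.sum(np.abs(g1 / g1.sum() - g2 / g2.sum()))
    return reps

rng = np.random.default_rng(7)
x1, x2 = rng.standard_normal(2048), rng.standard_normal(2048)
reps = bootstrap_tv(x1, x2, n_boot=100, rng=rng)
crit = np.quantile(reps, 0.95)      # bootstrap critical value at level 0.05
```

The observed statistic would then be compared against `crit` to decide whether the two spectra differ at level 0.05.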
Consistency of the bootstrap estimator for the spectral density. Suppose that X(t) is a linear process, i.e., X(t) = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}, where Z_t is white noise with variance equal to 1, and assume that \sum_{j=-\infty}^{\infty} |\psi_j|\, |j|^{1/2} < \infty and E Z_1^4 < \infty.
From Theorem 10.3.1 in Brockwell and Davis (2006), we know that,
Define the functional J as J(f )(ω) = fNX (ω)fˆNZ (ω), where fˆZ is the lag
window estimator for the spectral density of Gaussian standard white noise.
Since we are interested in bootstrapping the lag window estimator instead
of just the periodogram, we now study the behavior of J(f )(ω). We shall
consider the equivalent representation of a lag window estimator as an
averaged periodogram. If
\hat{f}_L(\omega) = \sum_{|h| \le a} \beta(h/a)\, \hat{\gamma}(h)\, e^{-i 2\pi h \omega}

is the lag window estimator with bandwidth a, then we can approximate \hat{f}_L as

\hat{f}_L(\omega) \approx \sum_{|j| < \lfloor (T-1)/2 \rfloor} \beta_T(\omega_j)\, I(\omega_j),
\max_{\omega_k \in [0,\pi]} E|R_T(\omega_k)|^2 = O(T^{-1})
\;\Rightarrow\; \max_{\omega_k \in [0,\pi]} E|R_T(\omega_k)|^2 = o(1)
\;\Rightarrow\; E|R_T(\omega_k)|^2 = o(1) \ \forall \omega_k
\;\Rightarrow\; R_T(\omega_k) \xrightarrow{P} 0 \ \forall \omega_k
\;\Rightarrow\; \max_{\omega_k \in [0,\pi]} |R_T(\omega_k)| \xrightarrow{P} 0.

Hence,

\max_{\omega_k \in [0,\pi]} |\tilde{R}_T(\omega_k)| = \max_{\omega_k \in [0,\pi]} \left| \sum_j \beta_T(\omega_j)\, R_T(\omega_{k+j}) \right|
\le \sum_j \beta_T(\omega_j) \max_{\omega_k \in [0,\pi]} |R_T(\omega_{k+j})| \xrightarrow{P} 0.   (2.30)
We conclude from (2.30) that (2.29) holds and the bootstrap estimator of
the spectral density is consistent. Then, the consistency of the bootstrap
approximation for the distribution of dˆT V is a consequence, since dˆT V is a
continuous function of the spectral density.
Figure 2.4: Spectra used in the simulation study to draw X_i(t), i = 1, 2.
Figure 2.5: Simulation results for AR(1), T = 1000, 2000, 5000, 10000, a = T^{(p-1)/p}, (a) p = 2, (b) p = 3. Each panel compares the empirical density of the TVD values with the Normal and Chi-square approximations.
Figure 2.6: Simulation results for AR(1), T = 1000, 2000, 5000, 10000, a = T^{(p-1)/p}, (a) p = 4, (b) p = 5.
2.4 Simulation Study

Figure 2.7: Simulation results for AR(2), T = 1000, 2000, 5000, 10000, a = T^{(p-1)/p}, (a) p = 2, (b) p = 3.
Figure 2.8: Simulation results for AR(2), T = 1000, 2000, 5000, 10000, a = T^{(p-1)/p}, (a) p = 4, (b) p = 5.
These results show that we need at least 5000 points to get a good approximation. However, sometimes that much data is not available. We explore the case of a “small” sample size and propose a modification to get a better approximation. “Small” is relative to the time series setting, because we cannot obtain a good estimate of the spectral density if T is too small. We consider the cases T = 1000 and T = 2000.

First, we would like to verify whether the convergence is faster with a better choice of the bandwidth a. Consider the same AR(1) and AR(2) processes as before, but with a = 100, 200, 300, and 400. Figures 2.9 and 2.10 show the results for each case. Increasing the value of a approximates the dispersion of \hat{d}_{TV} better; however, a bias appears.
2\, E(\tilde{d}_{TV}) = \sigma^2\, \Sigma \sqrt{\frac{2a}{\pi T}} = K_1 \sqrt{\frac{a}{T}},   (2.32)

4\, \operatorname{Var}(\tilde{d}_{TV}) = \int f^2(\omega)\,d\omega\; \Sigma^2 \left( 1 - \frac{2}{\pi} \right) \frac{a}{T} = K_2 \frac{a}{T},
Figure 2.9: Simulation results for AR(1) with small sample size and different values of the bandwidth, T = 1000, 2000, a = 100, 200, 300, 400.
Figure 2.10: Simulation results for AR(2) with small sample size and different values of the bandwidth, T = 1000, 2000, a = 100, 200, 300, 400.
When a is closer to T, we increase the dispersion but also the mean, and a bias appears. This is the phenomenon we observe in the simulations.
We would like to increase the dispersion but not the mean, and also we
would like to know how much we should increase it. To prove the convergence
of the smoothed periodogram the following inequality is used (Brockwell and
Davis, 2006),
\left( \sum_j \beta^2_T(\omega_j) \right)^{-1} \operatorname{Var}(\hat{f}(\omega)) \le f^2(\omega) + o\!\left( \left( \sum_j \beta^2_T(\omega_j) \right)^{-1} \right) + c_2 \frac{2a+1}{T},

where c_2 is a constant and \sum_{|j|<a} \beta^2_T(\omega_j) \approx \frac{a}{T} \int_{-1}^{1} \beta^2(u)\,du. This inequality and (2.32) motivate the following proposal.
Consider a transformation of the random variable \tilde{d}_{TV} that approximates \hat{d}_{TV} by the function

\left( 1 + \frac{2a+1}{T} \right) \tilde{d}_{TV} - \frac{2a+1}{T}\, E(\tilde{d}_{TV}).   (2.33)
The results obtained are presented in Figures 2.11 and 2.12. The transformed approximations, for values of a larger than 300, capture the right dispersion of the distribution and reduce the bias. However, the approximations are not completely accurate; this is to be expected, since the values of T are “small”.
Bootstrapping. Now, we explore the approximation using the bootstrap
procedure. The simulation setting in this case is T = 1000, 2000 and
a = 100, 150, 200, 250. We consider the AR(1) and AR(2) processes as before.
In this case, we take one pair of samples [X₁(t), X₂(t)] and draw bootstrap samples based on them. Finally, we compare this bootstrap density with the density of \hat{d}_{TV} obtained from independent replicates of [X₁(t), X₂(t)].
Figures 2.13 and 2.14 show the results.
The bootstrap density is a good approximation to the density of \hat{d}_{TV}. It does not depend on a, in the sense that for any value of a the approximations are very close to the empirical density. The performance of the bootstrap is equally precise for both processes.
Figure 2.11: Results using the transformed values for AR(1), T = 1000, 2000, a = 100, 200, 300, 400.
Figure 2.12: Results using the transformed values for AR(2), T = 1000, 2000, a = 100, 200, 300, 400.
Figure 2.13: Results using bootstrap for AR(1), T = 1000, 2000, a = 100, 150, 200, 250. Each panel ((a) T = 1000, (b) T = 2000) compares the density of d̂TV (TVD) with its bootstrap approximation.
Figure 2.14: Results using bootstrap for AR(2), T = 1000, 2000, a = 100, 150, 200, 250. Each panel ((a) T = 1000, (b) T = 2000) compares the density of d̂TV (TVD) with its bootstrap approximation.
Test statistic:
$$\hat{d}_{TV} = \frac{1}{2T}\sum_{k=1}^{T}\left|\hat{f}^{X_1}_N\!\left(\frac{k}{T}-\frac{1}{2}\right)-\hat{f}^{X_2}_N\!\left(\frac{k}{T}-\frac{1}{2}\right)\right|.$$
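On the Fourier frequency grid the statistic is half the ℓ1 distance between the two normalized spectral estimates. A minimal sketch (the function name and the internal normalization are illustrative; the thesis' own implementation is listed in Appendix A.1):

```python
import numpy as np

def tv_distance(f1, f2):
    # Normalize each estimated spectrum so it behaves like a probability
    # mass function over the frequency grid, then take half the sum of
    # absolute differences.
    f1 = np.asarray(f1, float)
    f2 = np.asarray(f2, float)
    p1, p2 = f1 / f1.sum(), f2 / f2.sum()
    return 0.5 * np.abs(p1 - p2).sum()
```

Because of the normalization, the statistic is invariant to rescaling of either spectrum and always lies in [0, 1].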
• We draw two time series from the AR(2) process with the same
parameters.
Under H0
Table 2.3 shows the proportion of times that the null hypothesis is rejected
using the critical value associated with each approximation. For the
theoretical approximations, the transformed values have rejection proportions
closer to α than the non-transformed ones. As expected, for the theoretical
approximation the value of a influences the proportion of rejections:
larger values of T require larger values of a. On the other hand, the bootstrap
procedure outperforms the rest in all cases; its proportion of rejections
is almost equal to α and is not influenced by the choice of a.
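The bootstrap calibration can be sketched as follows. Here periodogram ordinates are resampled as the common estimated spectrum multiplied by independent standard exponential draws, which mimics the asymptotic behavior of the periodogram; this resampling rule and the moving-average smoother are illustrative assumptions, not necessarily the exact algorithm used in the thesis:

```python
import numpy as np

def tv_distance(f1, f2):
    p1, p2 = f1 / f1.sum(), f2 / f2.sum()
    return 0.5 * np.abs(p1 - p2).sum()

def smooth(p, a):
    # crude lag-window smoother: moving average over a ordinates
    return np.convolve(p, np.ones(a) / a, mode="same")

def bootstrap_critical_value(f_hat, a, n_boot=500, alpha=0.05, seed=None):
    """(1 - alpha) quantile of d_TV under H0: both series share f_hat."""
    rng = np.random.default_rng(seed)
    T = len(f_hat)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Exp(1) multipliers mimic the distribution of periodogram
        # ordinates around the true spectrum.
        p1 = smooth(f_hat * rng.exponential(size=T), a)
        p2 = smooth(f_hat * rng.exponential(size=T), a)
        stats[b] = tv_distance(p1, p2)
    return np.quantile(stats, 1 - alpha)
```

The test then rejects equality of spectra when the observed d̂TV exceeds the returned quantile.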
Power. Now we draw one of the two time series from an AR(2) process with
different parameters. Figure 2.15 shows the spectrum for X1 (the continuous
black curve); for X2 we use three different spectra (the dotted curves). We
fix the spectrum of X1 and use one of the others for X2. The spectra are very
close, and we want to see how often the test fails to detect the difference
as they get closer.
Figure 2.15: AR(2) spectra used in the power study. The continuous black curve is the spectrum of X1 (peak frequency 0.3); the dotted curves are the alternatives for X2 (peak frequencies 0.27, 0.28 and 0.29).
We use T = 1000, a = 200 (Table 2.4) and T = 2000, a = 300 (Table 2.5),
and 1000 replicates to approximate the distributions and 1000 replicates to
study the test performance.
In all cases, the power decreases as the spectra get closer. The Chi-
squared approximation has the highest power. Comparing the power of
the theoretical and the transformed approximations, the transformation does
not improve the power. This is a consequence of the underestimation of
the dispersion of the distribution of d̂TV.
The power in the bootstrap case is closer to the theoretical approximation
when the spectrum of X2 has the peak frequency at .27. When the spectra
are closer the power decreases but it is still acceptable (around .7) when the
peak frequency is at .28.
The comparison between the power of the bootstrap procedure and that of
the asymptotic approximations is not completely fair, since the significance
level of the tests based on the asymptotic distributions is larger than the
nominal level. Since the bootstrap procedure is the only option that preserves
the significance level, it is the best option to use in practice, even though
its power may be low.
[Figure: Proportion of rejection of H0, under H0 and under HA, for the Normal, transformed Normal, Chi-squared, transformed Chi-squared and bootstrap approximations, compared with the nominal level α.]
2.5 Discussion
In comparison with other similarity measures, the total variation distance
has some desirable properties. The intuition and easy interpretation is one
of them. Also, contamination models give an interpretation of the distance as
the level of similarity. It is important to note that we use the total variation
distance to compare continuous functions, since the total variation distance
is not useful to compare discrete with continuous functions.
We explored the statistical properties of the estimator of the total
variation distance, d̂TV. Two approximations of its distribution were
proposed, using Gaussian or Chi-squared variables, and a transformation
of them was introduced for the case of small samples. In the simulation study,
the tests based on these distributions had a larger significance level than the
nominal α. The transformations gave a significance level closer to α; however,
they are not sufficiently precise for “small” T.
As an alternative, we propose a bootstrap algorithm and the results are
very good. The bootstrap outperforms the asymptotic methods and the
significance level is almost equal to α. It has the limitation of low power when
the spectral densities are very close. In general, the bootstrap procedure
is the best option to approximate the distribution of dˆT V , under the null
hypothesis.
The theory developed here can be extended to the multivariate case. Another
possible extension is to consider distances between functionals of the
spectral density, such as its first or second derivative; this
would be useful in some applications.
Chapter 3
Clustering Methods
Our main goal is to detect changes in spectra and the previous chapter
explores the proposal of considering the TV distance as a similarity measure
between spectra. As was mentioned in the introduction, several methods for
detecting instantaneous breaks in time series have been proposed, but they
do not produce good results when the changes are slow. In this situation
it is convenient to change the point of view from detecting change-points to
determining time intervals during which the spectra are similar, in the sense
that their TV distance is small. If one considers that time series that have
similar spectral densities also share similar properties, one could think about
them as a group. Taking this into account, clustering methods are a natural
approach. Clustering based on spectral densities will be intuitive in many
applications.
In general, clustering is a procedure whereby a set of unlabeled data
is divided into groups so that members of the same group are similar, while
members of different groups differ as much as possible. Our goal is to develop
a method that produces groups or clusters consisting of time series having
similar spectral representation.
The subject of time series clustering is an active research area with
applications in many fields. Frequently, finding similarity between time
series plays a central role in the applications. In fact, time series clustering
problems arise in a natural way in a wide variety of fields, including
economics, finance, medicine, ecology, environmental studies, engineering,
and many others. This is not an easy task, since it requires a notion
of similarity between time series. Liao (2005) and Caiado et al. (2015)
review the field, and Montero and Vilar (2014) present an R
package (TSclust) for time series clustering with a wide variety of alternative
procedures. According to Liao (2005), there are three approaches to time
series clustering: methods based on the comparison of raw data, feature-
based methods, where the similarity between time series is gauged through
features extracted from the data, and methods based on parameters from
models adjusted to the data.
The first approach, comparison of raw data, becomes computationally
burdensome for long time series.
The third approach, based on parameters, is one of the most frequently used;
however, it has the limitation of assuming a specific parametric model.
Our proposals are feature-based and the spectral density of the time series
is considered the central feature for classification purposes. The resulting
clusters will be similar in the sense that the time series in a cluster will have
similar spectral density. This will have an interpretation depending on the
application; Chapter 4 presents two different cases and the corresponding
interpretation for each one.
To build a clustering method the first question is how to measure the
similarity between spectral densities. We propose the use of the total
variation distance as a measure of similarity. Then, we need a clustering
algorithm, and we use a hierarchical algorithm with classical linkage functions
as our first proposal.
However, hierarchical clustering algorithms with linkage functions (such
as complete, average, Ward, and so on) are based on geometric ideas:
the distance between a new cluster and the old ones is computed as a
linear combination of the distances between their members. This may not be
meaningful for clustering time series, since such linear combinations need not
have a meaning in terms of the spectral densities. So, our second proposal
considers a new clustering algorithm, which takes advantage of the spectral
theory. We propose the Hierarchical Spectral Merger algorithm, which is
a modification of the classical hierarchical algorithms. The main difference
is the consideration of a new representative, i.e. a new estimation of the
spectral density for an updated cluster. This is intuitive and the updated
spectral estimates are smoother, less noisy and hence give better estimates
of the TV distance.
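The merging loop described above can be sketched as follows. This is a schematic version only (the thesis' implementation is the HSM function of the HSMClust package); here the representative of a merged cluster is the member-weighted mean of the two representatives, which is one natural way to obtain the smoother updated estimate:

```python
import numpy as np

def tv_distance(f1, f2):
    p1, p2 = f1 / f1.sum(), f2 / f2.sum()
    return 0.5 * np.abs(p1 - p2).sum()

def hsm(spectra, k):
    """Hierarchical spectral merger: repeatedly merge the two clusters
    whose representative spectra are closest in TV distance, until k
    clusters remain.  `spectra` is a list of 1-d arrays on a common grid."""
    clusters = [[i] for i in range(len(spectra))]
    reps = [np.asarray(f, float) for f in spectra]
    while len(clusters) > k:
        # find the closest pair of representatives
        best, pair = np.inf, None
        for i in range(len(reps)):
            for j in range(i + 1, len(reps)):
                d = tv_distance(reps[i], reps[j])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        # merge: the new representative is a member-weighted mean spectrum,
        # smoother and less noisy than the individual estimates
        ni, nj = len(clusters[i]), len(clusters[j])
        reps[i] = (ni * reps[i] + nj * reps[j]) / (ni + nj)
        clusters[i] += clusters[j]
        del clusters[j], reps[j]
    return clusters
```

With spectra concentrated around two different frequencies, the loop first merges estimates with the same peak and only then crosses peaks, recovering the two groups.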
We explain each proposal in detail and compare their performance through
simulation studies.
3.1 TV distance in a clustering method
Example 3.1. Consider two different AR(2) models with spectra concen-
trated at 0.05 Hz and 0.06 Hz, respectively. We simulate three time series
for each process, each one consisting of 1000 points with a sampling fre-
quency of 1 Hz, being X1 , X3 , X5 from the first process and X2 , X4 , X6 from
the second process. Figure 3.1(a) shows the estimated spectra for each series.
We compute the dissimilarity matrix with the TV distance; the values,
shown in Figure 3.1(b), are small since the spectra are close.
When we apply the hierarchical algorithm with the complete and average link
functions, we obtain the dendrograms shown in Figure 3.2.
Figure 3.1: (a) Estimated spectra for Example 3.1, (b) dissimilarity matrix computed using the TV distance, and (c) clustering result using either the complete or average link functions.
Figure 3.2: Dendrograms obtained for Example 3.1 using (a) the complete link function and (b) the average link function.
3.2 Hierarchical spectral merger (HSM) method
Algorithm:
Table 3.1: Hierarchical Merger Algorithm proposed using the total variation distance
and the estimated spectra.
Figure 3.3: Estimated spectra. (a) Different colors correspond to different time series. (b) Red spectra are from the AR(2) model with activity at the alpha (8-12 Hz) and beta (12-30 Hz) bands; black spectra are from the AR(2) model with activity at the alpha and gamma (30-50 Hz) bands.
power at 21 Hz while the other has power at 40 Hz. We simulate three time
series for each process, 10 seconds of each one with a sampling frequency
of 100 Hz (t = 1, . . . , 1000). Figure 3.3(a) shows the estimated spectra for
each series and Figure 3.3(b) shows by different colors (red and black) which
one belongs to the first or second process. If we only look at the spectra,
it is hard to recognize the number of clusters and their memberships. We
probably could not identify some cases, like the red and purple spectra.
The dynamics of the HSM method are shown in Figure 3.4. We start with
six clusters; at the first iteration we find the closest spectra, represented in
Figure 3.4(a) with the same color (red). After the first iteration we merge
these time series and get 5 estimated spectra, one per cluster, Figure 3.4(b)
shows the estimated spectra where the new cluster is represented by the
dashed red curve. We can follow the procedure in Figures 3.4(c), (d), (e)
and (f). In the end, the proposed clustering algorithm reaches the correct
solution: Figures 3.4(g) and 3.3(b) coincide. Also, the estimated spectra for
the two clusters, shown in Figure 3.4(h), are better than any of the initial
estimates, and we can identify the dominant frequency bands for each cluster.
We developed the HSMClust package written in R that implements
our proposed clustering method. The package can be downloaded from
http://ucispacetime.wix.com/spacetime#!project-a/cxl2.
The principal function, called HSM, executes the HSM method given a
matrix X that contains the signals by column. HSMClust also includes some
other useful functions. One of them is the Sim.Ar function, which draws
Figure 3.4: Dynamics of the hierarchical merger algorithm. (a), (c), (e) and (g) show the clustering process for the spectra. (b), (d), (f) and (h) show the evolution of the estimated spectra, which improve when we merge the series in the same cluster.
3.3 TV distance and other dissimilarity measures
The JONSWAP spectral density is given by
$$S(\omega) = \frac{g^2}{\omega^5}\exp\!\left(-\frac{5\omega_p^4}{4\omega^4}\right)\gamma^{\exp\left(-(\omega-\omega_p)^2/(2\omega_p^2 s^2)\right)},$$
where g is the acceleration of gravity; s = 0.07 if ω ≤ ωp and s =
0.09 otherwise; ωp = π/Tp and γ = exp(3.484(1 − 0.1975(0.036 −
0.0056Tp / √Hs )Tp4 /Hs2 )). The parameters of the model are the significant
wave height Hs, defined as 4 times the standard deviation of
the time series, and the spectral peak period Tp, which is the period
corresponding to the modal frequency of the spectrum. This spectral family
was developed empirically from data collected during the Joint
North Sea Wave Observation Project, JONSWAP (Hasselmann et al., 1973).
It is a reasonable model for wind-generated seas when 3.6√Hs ≤ Tp ≤ 5√Hs.
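A direct transcription of the JONSWAP density follows. As in the displayed formula, no normalizing constant is included, so the output is unnormalized; practical implementations (e.g., the WAFO toolbox) rescale the spectrum so that its integral corresponds to the significant wave height:

```python
import numpy as np

def jonswap(w, Hs, Tp, g=9.81):
    """Unnormalized JONSWAP spectral density at angular frequencies w > 0,
    following the formula in the text."""
    w = np.asarray(w, float)
    wp = np.pi / Tp                     # peak frequency, as given in the text
    s = np.where(w <= wp, 0.07, 0.09)   # peak-width parameter
    gamma = np.exp(3.484 * (1 - 0.1975 * (0.036 - 0.0056 * Tp / np.sqrt(Hs))
                            * Tp ** 4 / Hs ** 2))
    enhancement = gamma ** np.exp(-((w - wp) ** 2) / (2 * wp ** 2 * s ** 2))
    return g ** 2 / w ** 5 * np.exp(-5 * wp ** 4 / (4 * w ** 4)) * enhancement
```

The Pierson-Moskowitz factor and the peak-enhancement factor are both maximized at ωp, so the modal frequency of the resulting spectrum sits at ωp.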
where εt is a white noise process. The characteristic polynomial for this model
is φ(z) = 1 − φ1 z − φ2 z2. The roots of this polynomial determine the
properties of the oscillations. If the roots, denoted z01 and z02, are complex-
valued, then they must be complex conjugates, i.e., z01 = z̄02. These roots
have the polar representation
$$|z_{01}| = |z_{02}| = M, \qquad \arg(z_{01}) = \frac{2\pi\eta}{F_s}. \tag{3.2}$$
Figure 3.5: Top: spectra of the AR(2) process for different peak frequencies, η = 10, 21, 40. Bottom: realizations of the corresponding AR(2) processes.
The spectral peak becomes sharper as M → 1+.
Then, given (η, M, Fs) we take
$$\phi_1 = \frac{2\cos(\omega_0)}{M} \quad\text{and}\quad \phi_2 = \frac{-1}{M^2}, \tag{3.3}$$
where ω0 = 2πη/Fs. If one computes the roots of the characteristic polynomial
with the coefficients in (3.3), they satisfy (3.2). To illustrate the type of
oscillatory patterns that can be observed in time series from processes with
such spectra, Figure 3.5 plots the spectra (top) for different
values of η, with M = 1.1 and Fs = 100 Hz, together with the generated time
series (bottom). Larger values of η give rise to faster oscillations of the signal.
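Equations (3.2) and (3.3) translate directly into code; a small sketch (the simulation loop and the burn-in length are illustrative choices):

```python
import numpy as np

def ar2_coefficients(eta, M, Fs):
    # coefficients from (3.3); w0 = 2*pi*eta/Fs is the peak angular frequency
    w0 = 2 * np.pi * eta / Fs
    return 2 * np.cos(w0) / M, -1.0 / M ** 2

def simulate_ar2(T, eta, M, Fs, seed=None, burn=200):
    """Simulate T points of the AR(2) process with peak frequency eta,
    root modulus M and sampling frequency Fs, discarding a burn-in."""
    rng = np.random.default_rng(seed)
    phi1, phi2 = ar2_coefficients(eta, M, Fs)
    x = np.zeros(T + burn)
    e = rng.standard_normal(T + burn)
    for t in range(2, T + burn):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + e[t]
    return x[burn:]
```

One can check that the roots of φ(z) = 1 − φ1 z − φ2 z2 with these coefficients have modulus M and argument ±2πη/Fs, as stated in (3.2).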
Figure 3.6: Spectra used in the simulation study to compare the TV distance with other similarity measures. Each spectrum, with a different color and line type, corresponds to a cluster.
where
$$\mathrm{Sim}(G_j, C_i) = \frac{2|G_j \cap C_i|}{|G_j| + |C_i|}.$$
Note that this similarity measure returns 0 if the two clusterings are
completely dissimilar and 1 if they are identical.
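The index can be computed as below. Matching each true group with its best-scoring cluster and averaging is one reasonable way to turn the pairwise Sim values into a single rate of success; the exact matching rule is an assumption here:

```python
def rate_of_success(true_groups, found_clusters):
    """Average, over the true groups G_j, of the best Dice-type score
    Sim(G_j, C_i) = 2|G_j ∩ C_i| / (|G_j| + |C_i|) over found clusters."""
    total = 0.0
    for G in true_groups:
        G = set(G)
        total += max(2 * len(G & set(C)) / (len(G) + len(C))
                     for C in found_clusters)
    return total / len(true_groups)
```

A perfect recovery of the groups gives 1.0, and the score degrades toward 0 as the clusterings diverge.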
In the comparative study, we replicate each simulation setting N times, and
compute the rate of success for each one. The mean of this index is shown
in Tables 3.2, 3.3 and 3.4 and a box plot of the values obtained is shown in
Figures 3.7, 3.8 and 3.9.
We consider three different experiments. The first one is motivated by
the applications in Oceanography, where the differences between spectra
could be produced by a small change in the modal frequency. The second
experiment was designed to test if the proposals are able to distinguish
between a unimodal and a bimodal spectrum. Finally, the third one considers
models that are frequently used in the study of signals using a frequency
domain approach. For all the experiments, the lengths of the time series
were T = 500, 1000, and 2000.
For Experiment 1 the sampling frequency was set to 1.28 Hz, which is a common
value for wave data recorded using sea buoys.
• Experiment 2 is based on the AR(2) process. Let Ztj be the j-th
latent component, j = 1, 2, 3, each an AR(2) process with Mj = 1.1 for all
j and peak frequency ηj = .1, .13, .16 for j = 1, 2, 3, respectively. Ztj
represents a latent signal oscillating at a pre-defined band. Define the
observed time series to be a mixture of these latent AR(2) processes:
$$\begin{pmatrix} X_t^1 \\ X_t^2 \\ \vdots \\ X_t^K \end{pmatrix}_{K\times 1} = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_K^T \end{pmatrix}_{K\times 3} \begin{pmatrix} Z_t^1 \\ Z_t^2 \\ Z_t^3 \end{pmatrix}_{3\times 1} + \begin{pmatrix} \varepsilon_t^1 \\ \varepsilon_t^2 \\ \vdots \\ \varepsilon_t^K \end{pmatrix}_{K\times 1} \tag{3.5}$$
where εjt is Gaussian white noise, Xtj is a signal with oscillatory behavior
generated by the linear combination eiT Zt, and K is the number of
clusters. In this experiment K = 3, with e1T = (1, 0, 0), e2T = (0, 1, 0)
and e3T = (0, 1, 1), and the number of draws of each signal Xti is
5; Figure 3.6(b) plots the three different spectra. So we have three
clusters with five members each. For this experiment N = 1000
replicates were made, and the sampling frequency was set to 1 Hz.
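The construction in (3.5) is just a matrix product plus noise; a sketch (function name and seed handling are illustrative):

```python
import numpy as np

def mixture_series(Z, mixing, noise_sd=1.0, seed=None):
    """Observed series X (K x T) from latent series Z (3 x T) and a
    K x 3 mixing matrix whose rows are the vectors e_i^T, as in (3.5)."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, float)
    E = np.asarray(mixing, float)
    noise = noise_sd * rng.standard_normal((E.shape[0], Z.shape[1]))
    return E @ Z + noise

# mixing matrix of Experiment 2: each row repeated 5 times -> 15 series
rows = [(1, 0, 0), (0, 1, 0), (0, 1, 1)]
E = np.array([r for r in rows for _ in range(5)])
```

With this E, the first five observed series load only on Zt1, the next five only on Zt2, and the last five on the sum of Zt2 and Zt3.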
• Experiment 3 considers time series with two additive components.
The first component is an oscillating process with random amplitude
while the second one is a random noise with an autoregressive structure.
The general model is:
X(t) = A cos(2πtω0 ) + B sin(2πtω0 ) + Z(t),
where Z(t) is an AR(2) process with parameters (η, M ), and A, B
are independent Gaussian N (0, 1) random variables. We look at three
different models, one per cluster.
a) Model 1 ω0 = .3, η = .2, M = 1.3
b) Model 2 ω0 = .1, η = .18, M = 1.3
c) Model 3 ω0 = .25, η = .22, M = 1.3
For each model, five time series were generated, with T = 500, 1000, 2000
and N = 1000 replicates of the experiment. Figure 3.6(c) presents the
spectral densities of the AR(2) components and shows that they are
close to each other. The sampling frequency was set to 1 Hz.
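A sketch of the data-generating mechanism of Experiment 3 (the function name and the burn-in length are illustrative; Fs = 1 Hz as stated):

```python
import numpy as np

def experiment3_series(T, omega0, eta, M, seed=None, burn=200):
    """X(t) = A cos(2*pi*t*omega0) + B sin(2*pi*t*omega0) + Z(t), where
    Z is an AR(2) process with parameters (eta, M) and A, B iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    w0 = 2 * np.pi * eta          # Fs = 1 Hz, so w0 = 2*pi*eta
    phi1, phi2 = 2 * np.cos(w0) / M, -1.0 / M ** 2
    z = np.zeros(T + burn)
    e = rng.standard_normal(T + burn)
    for t in range(2, T + burn):
        z[t] = phi1 * z[t - 1] + phi2 * z[t - 2] + e[t]
    A, B = rng.standard_normal(2)
    t = np.arange(T)
    return (A * np.cos(2 * np.pi * t * omega0)
            + B * np.sin(2 * np.pi * t * omega0) + z[burn:])
```

The random amplitudes A and B make the oscillating component at ω0 stochastic across replicates, while Z(t) contributes the AR(2) spectral peak.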
Figure 3.7: Box plots of the rate of success for the replicates under the simulation setting of Experiment 1, using different distances (NP, LNP, CEP, TV, SKL, HSM1 and HSM2).
Experiment 1
Table 3.2: Mean values of the similarity index obtained using different distances and the
two proposed methods in Experiment 1. The number of replicates is N = 500.
Experiment 2
Table 3.3: Mean values of the similarity index obtained using different distances and the
two proposed methods in Experiment 2. The number of replicates is N = 1000.
Experiment 3
Table 3.4: Mean values of the similarity index obtained using different distances and the
two proposed methods in Experiment 3. The number of replicates is N = 500.
Figure 3.8: Box plots of the rate of success for the replicates under the simulation setting of Experiment 2, using different distances (NP, LNP, CEP, TV, SKL, HSM1 and HSM2).
Figure 3.9: Box plots of the rate of success for the replicates under the simulation setting of Experiment 3, using different distances (NP, LNP, CEP, TV, SKL, HSM1 and HSM2).
3.4 Detection of transitions between spectra
Figures 3.7, 3.8 and 3.9 show the box plots of the rates of success
obtained in each experiment. The box plots for Experiment 1 show
that the CEP distance has many values smaller than 0.9, even in the
case T = 2000.
In Experiment 2, the HSM method did not perform well for small and
medium-sized time series compared to the others; T = 2000 is needed
for the HSM method to identify the clusters more precisely.
The NP distance has the worst performance overall. In Experiment 3, the
performance of all distances does not improve significantly when we increase
the length of the series; the LNP and CEP distances, instead of improving,
get worse as T increases.
In general, the rates of success for the methods that use the TV distance
are good; in some cases they give the best results, and when they do not, they
are close to the best. The methods based on logarithms, such as LNP and
CEP, perform well in some cases but very poorly in others. The SKL has the
best results in many cases; however, as mentioned in Chapter 2, this distance
cannot be computed when two spectra have disjoint supports. In addition,
methods that use logarithmic functions require more computational time than
methods based directly on the spectra.
It is important to mention that these methods can be applied to big
data, i.e., long time series or many series; in this sense the proposed methods
are efficient. The computational complexity is O(n3 T), where
n is the number of time series to be clustered and T is the length of each
series, which implies that the computational time does not grow exponentially,
as it does for some other methods.
Considering the properties of the TV distance and its performance when
used as a dissimilarity measure, we consider the proposed procedures a good
option for time series clustering.
where a and b are functions with slow changes, a(t) ≈ a(t + h) and
b(t) ≈ b(t + h) if h is small, a(0) = b(T ) = 1 and a(T ) = b(0) = 0. Then it
is easy to see that, for small values of h,
$$\mathrm{cov}(X_t, X_{t+h}) = \sqrt{a(t)}\sqrt{a(t+h)}\,r_1(h) + \sqrt{b(t)}\sqrt{b(t+h)}\,r_2(h),$$
so Xt is a process whose local covariance function is r1 for t near 0 and
r2 for t near T.
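The covariance structure above corresponds to mixing two independent stationary series Y1 and Y2 (with covariance functions r1 and r2) through square-root weights. A sketch with the common choice b(t) = 1 − a(t):

```python
import numpy as np

def transition_series(Y1, Y2, a):
    """X_t = sqrt(a(t)) * Y1_t + sqrt(1 - a(t)) * Y2_t, where a decreases
    slowly from 1 to 0 over the observation window (so b = 1 - a)."""
    a = np.asarray(a, float)
    return (np.sqrt(a) * np.asarray(Y1, float)
            + np.sqrt(1.0 - a) * np.asarray(Y2, float))

# linear transition over T points: pure Y1 at t = 0, pure Y2 at t = T - 1
T = 1000
a = 1.0 - np.arange(T) / (T - 1)
```

The result is locally stationary: near the start its spectrum is that of Y1, near the end that of Y2, with a slow drift in between.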
Figure 3.10: Simulation of a transition between two spectral densities. (a) The transition starts with Tp = 5 (the blue curve) and finishes with Tp = 3.6 (the red curve). (b) Estimated densities from the simulated data during the transition.
Example 3.3. To test our method we have to produce data from a process
that has a transition period, with a slow change from one spectrum to
another. Take f1 and f2 as JONSWAP spectra with Hs = 1 in both cases and
Tp = 5 and 3.6, respectively. We choose a(t) = 1 − t/T and b(t) = 1 − a(t),
where T = 5 hours is the total observation time. Figure 3.10(a) shows
the spectra involved in the transition, starting with the blue spectrum and
finishing with the red one. Figure 3.10(b) shows the estimated densities
after applying the algorithm; we can observe the form of the transition and
how the process starts at f1 and finishes at f2.
Figure 3.11: Elements of Experiment 4. (a) Spectra involved in the stationary periods: JONSWAP with Tp = 3.6, JONSWAP with Tp = 4.2, and Torsethaugen with Tp = 5. (b) Sketch of the simulation sequence, alternating stationary periods and transitions (100 points per period).
simulation has three stationary periods and two transitions between them;
each transition lasts 3 hours from one stationary period to the next.
Stationary Period 1 - the simulated series starts with waves from a stationary
period of 4 hours, from a JONSWAP spectrum with peak period Tp = 3.6,
Stationary Period 2 - the second stationary period corresponds to another
4 hours of waves drawn from a JONSWAP spectrum with Tp = 4.2, and
Stationary Period 3 - a third 4-hour stationary period but in this case from
a bimodal family, Torsethaugen spectrum with Tp = 5.0.
In this case, we simulate N = 1000 replicates and the sampling frequency
was set to 1.28 Hz.
Figure 3.11(b) shows a sketch of the simulation setting, where we get
one continuous signal. We start with the stationary period in red which
corresponds to the red spectrum in Figure 3.11(a), then a transition period
in gray color, and so on. Figure 3.11(a) plots the spectra involved in the
experiment for the stationary periods.
The test procedure is the following.
1. Each time series has 82944 time points, 4608 points per hour (18 hrs).
First, we consider that there are just three genuine clusters, since
the transition periods, by definition, do not represent intervals with a
homogeneous behavior. We consider the TV distance in a hierarchical
algorithm with the complete link function and the HSM with the two possible
algorithms, HSM 1 and HSM 2.
Figure 3.12 shows the results obtained if we set the number of clusters to
3. Each plot represents by the corresponding color (red, blue and black) the
members that are assigned to the same cluster. For the three methods, the
resulting clusters contain each one of the stationary periods, 0-4 hours, 7-11
hours and 14-18 hours. The beginning of each transition is mostly assigned
to the previous stationary period (for example, the two segments from 4 to 5
hours are assigned to the same cluster as the first stationary period), while
the end of each transition is assigned to the next stationary period. This is
reasonable, since these are the most similar periods, respectively. The middle
of the transitions seems to be assigned randomly between the two closest
stationary periods.
It is interesting to observe that, in general, the elements in a cluster are
contiguous in time, even when no information about the time structure of
the series is included in the procedure, and the methods identify the changes
in the transition periods.
The problem of deciding whether the intervals close to the border belong
to a cluster or should be classified as transition periods requires a criterion
for deciding whether a given interval is “well classified” within a given cluster.
In Alvarez-Esteban et al. (2016a), we explore the use of the silhouette index,
proposed by Rousseeuw (1987), which gives a measure of the adequacy of
each point to its cluster.
Another approach that was also attempted was the use of trimming
procedures in the clustering process, as is considered in the work of Cuesta-
Albertos and Fraiman (2007) for functional data. In this context, the
spectral densities would be the functional data to be classified. The trimming
procedure “discards” a certain fraction of the information in the classification
process, in order to robustify the result, and it seems reasonable to consider
the trimmed information as data objects that do not fit properly within any
of the clusters. In consequence they could be labelled as transition periods.
An important shortcoming of this method is the long time it takes even
with moderately sized samples, and therefore the difficulty of handling real-
life data.
As an alternative one could consider that there should be five clusters,
Figure 3.12: Members of groups 1 (red), 2 (blue), and 3 (black), when only 3 clusters are considered.
Figure 3.13: Members of groups 1 (red), 3 (blue), and 5 (black), which correspond to
each one of the stationary periods.
From the definition of VD it is clear that high values point to suitable values
of k. However, the maximum value of VD(k) is not always the best choice,
especially when we have patterns that include clusters close to
each other. This situation is common in random sea waves, where consecutive
stationary periods can have similar characteristics. All these issues mean that
the choice of the “optimal” k is not an automatic process.
Since this is a well-known criterion, we do not present any simulation for
it. In applications, the results obtained with this index are similar to those of
other indices, such as the Davies-Bouldin index.
Test based on the distribution of d̂TV. Another procedure for deciding the
number of clusters is based on the bootstrap algorithm proposed in Chapter
2. We use this methodology to approximate the distribution of the total
variation distance between two clusters. Note that, due to the hierarchical
structure of the algorithms used in all the proposed methods, the test
H0: k − 1 Clusters vs HA: k Clusters
can always be reduced to the test
H0: 1 Cluster vs HA: 2 Clusters.
This is because the (k − 1) clusters are built by joining two of the k clusters.
The distribution of the total variation distance between two clusters
depends on the clustering procedure. When using the HSM method we aim
to approximate the distribution of the distance between the mean spectra in
each cluster while for the hierarchical clustering with the TV distance, we
need to produce samples from each cluster to approximate the distribution
of the distance calculated through the link function.
The procedure for this test is:
• Run the clustering procedure, either the HSM method or hierarchical
clustering with the average or complete linkage.
• Identify the two clusters that are joined to get the (k − 1) clusters.
• Consider as the estimate of the common spectrum, f̂, the mean spectrum
over all elements in both clusters.
Case 1. When using the HSM method, simulate two spectral densities from
the common spectrum f and compute the TV distance between
them. We repeat this procedure M times.
Case 2. When using hierarchical clustering with the TV distance, simulate
two sets of spectral densities of sizes g1 and g2 from the common
spectrum f, where gi is the number of members in cluster i = 1, 2
(the clusters to be joined). We compute the link function (complete or
average) between these two sets of spectra using the TV distance.
Remark. Notice that this test assumes that there exists a common spectrum f.

Experiment 1

Table 3.5: Proportion of times that the null hypothesis is rejected. Complete corresponds
to the TV distance in the hierarchical algorithm with the complete link function, and
Average to the average link.
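The Case 1 decision rule can be sketched as follows, assuming user-supplied routines `simulate_spectrum` and `tv_distance` (hypothetical interfaces standing in for the simulation method of Section 3.3.1 and the TV distance; not the thesis code):

```python
import numpy as np

def bootstrap_test(d_obs, f_hat, simulate_spectrum, tv_distance,
                   M=1000, alpha=0.05):
    """Case 1 sketch: reject 'k-1 clusters' when the observed TV distance
    between the two clusters to be merged exceeds the bootstrap
    (1 - alpha) quantile of distances between pairs of spectral
    estimates simulated from the common spectrum f_hat."""
    boot = np.empty(M)
    for m in range(M):
        g1 = simulate_spectrum(f_hat)   # two independent estimates
        g2 = simulate_spectrum(f_hat)   # from the common spectrum
        boot[m] = tv_distance(g1, g2)
    return d_obs > np.quantile(boot, 1 - alpha)
```

A large observed distance relative to the bootstrap distribution indicates that the two clusters should not be merged.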
To explore the performance of our proposals, we used Experiments 1
and 2. We consider the TV distance feeding a hierarchical algorithm with
two different link functions, average and complete, as well as the HSM
method. In this case, we use 500 replicates for each experiment.
Tables 3.5 and 3.6 present the proportion of times that the null hypothesis
is rejected. To reject we consider a bootstrap quantile of probability α. We
do not expect to have a proportion of rejection equal to α, since in the case of
using the complete or average link, these values are not a direct observation
of the TV distance. However, we expect to have a good performance. In
general, it could be possible to overestimate the number of clusters.
In Experiment 1 the true number of clusters is 2. From Table 3.5,
we observe that all methods reject the hypothesis of one cluster at all
significance levels. This means that the procedure will not underestimate
the number of clusters. To test 2 vs 3, the proportion of rejections is high
when we use the average link function, except in the case of α = 0.01. If we
use the complete link, the results are better. However, the best results are
for the HSM method.

Experiment 2

Table 3.6: Proportion of times that the null hypothesis is rejected, in Experiment 2.

In Experiment 2 the true number of clusters is 3. This is a more
difficult case, since the spectra are very close. From Table 3.6, when testing 2
vs 3 clusters, we observe that the complete and average link functions do not
underestimate the number of clusters. However, the HSM method cannot
distinguish 3 clusters at level α = 0.01, though it can at higher levels.
For testing 3 vs 4 clusters, the performance of the HSM method and the
complete link is better; again, a small value of α is necessary for the average
link to have reasonable performance.
Figure 3.14 shows the p-values obtained by comparing the value from
each simulation with the bootstrap distribution. We confirm that
the underestimation of the number of clusters has low probability, almost
zero in some cases, for the three methods. When the number of clusters to
test is the correct one, 2 in Experiment 1 and 3 in Experiment 2, the
p-values are widely distributed in the case of the complete link and HSM
method. With the average link, the p-values are smaller compared to the
other methods. In general, this test has a good performance when one uses
the complete link or the HSM method.
[Figure 3.14 panels: boxplots of p-values for (a) Experiment 1, tests 1 vs 2 and 2 vs 3, and (b) Experiment 2, tests 2 vs 3 and 3 vs 4, for each of the three methods.]
Figure 3.14: P-values obtained in the test of number of clusters using bootstrap samples.
Table 3.7: Proportion of times that the null hypothesis is rejected, using the permutation
test, in Experiments 1 and 2.
Let $G_1 = \{f^1_1, f^1_2, \ldots, f^1_{n_1}\}$ and $G_2 = \{f^2_1, f^2_2, \ldots, f^2_{n_2}\}$ be two clusters, where
$f^j_{i_j}$ is the spectral density of the time series $X^j_{i_j}$, a member of the cluster
$G_j$, $j = 1, 2$, $i_j = 1, \ldots, n_j$. If the two clusters belong to one bigger
cluster, $G_1 \cup G_2 \subseteq G$, we can take subsamples of these clusters as follows.
• Randomly split the pooled elements of $G_1 \cup G_2$ into two subsample
clusters $G^*_1$ and $G^*_2$ of sizes $n_1$ and $n_2$, repeating this split several times.
• Finally, compute the average link function between $G^*_1$ and $G^*_2$.
Then, the permutation test takes the link function values computed between
the subsample clusters as a sample of the distribution of our statistic, and
rejects the null hypothesis using the quantiles of this sample.
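A minimal sketch of this permutation scheme, with spectra represented abstractly and a user-supplied average-link function `link` (a hypothetical interface, not the thesis R code):

```python
import numpy as np

def permutation_test(d_obs, G1, G2, link, n_perm=500, alpha=0.05, seed=0):
    """Permutation test sketch: pool the two clusters, repeatedly split the
    pooled spectra at random into groups of the original sizes, and
    recompute the average-link statistic; reject when the observed link
    value exceeds the (1 - alpha) permutation quantile."""
    rng = np.random.default_rng(seed)
    pool = list(G1) + list(G2)
    n1 = len(G1)
    stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pool))
        A = [pool[i] for i in idx[:n1]]
        B = [pool[i] for i in idx[n1:]]
        stats.append(link(A, B))
    return d_obs > np.quantile(stats, 1 - alpha)
```

Here `link(A, B)` would return the average of the TV distances between every spectrum in A and every spectrum in B.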
Table 3.7 shows, for both experiments, the proportion of times that we
reject the null hypothesis using the average link function and the permutation
test. We observe that the level of the test is improved, however, it loses some
power.
Remark. This test is not useful when the complete link function is used.
Due to the hierarchical algorithm, the maximum between the original groups
will always be greater than or equal to the maximum over any of the subsamples.
3.6 Discussion
The use of the TV distance as a dissimilarity measure for clustering has
shown good results compared to other dissimilarity measures proposed in
the literature. In some of the experiments in the simulation study we obtained
the best success rate, or one close to the best. In addition, the clusters
generated by our proposal have an intuitive interpretation in terms of real
application problems. In the case of transitions, it is still difficult to identify
the beginning or end of a transition; however, the results are acceptable
and give a good approximation of the true clusters if we consider a transition
as a cluster. The HSM method does not seem to be a good option for the detection
of transitions.
The choice of the number of clusters will always be complicated.
However, the proposed test is a promising option. In particular, the HSM
method performs well when this test is used to choose the number of
clusters.
We proposed the use of time series clustering methods to detect changes
in spectra. However, the resolution of the change point detected will depend
on the time series length, since we need a reasonable number of time points
to have a good estimation of the spectral density.
The proposed methods are general for time series clustering; they can be
used to identify similarities in time and/or space. Moreover, they can be
used to cluster any set of time series where the goal is to find similarities
in spectra. These methods were proposed in Alvarez-Esteban et al. (2016b)
and Euán et al. (2015), where further details and discussion can be found.
Chapter 4
Applications to Data
4.1 Ocean wave analysis
4.1.1 Data description
Figure 4.1: One interval of the data set taken by Buoy 106. (a) Buoy at Waimea Bay,
Hawaii. (b) A 30 minute wave (centered) taken at 1.28 Hz. (c) Estimated spectrum of the
wave process.
Figure 4.2: Buoy 106, significant wave height and modal frequency of each segment.
We use raw wave height time series obtained from the U.S. Coastal Data
Information Program (CDIP) website. The data considered were measured
by a moored buoy: Buoy number 106 (number 51201 for the National Data
Buoy Center) which is located in Waimea Bay, Hawaii, with a water depth of
200 m. We consider the month of January, 2003. Figure 4.1 shows the wave
height time series corresponding to a 30 minute interval and the estimated
spectral density.
The significant wave height is defined as $H_s = 4\sqrt{m_0}$, where
$$m_0 = \int_{-\infty}^{\infty} f(\omega)\, d\omega.$$
The modal frequency, $\omega_p$, is the frequency at which a wave spectrum
reaches its maximum, the inverse of the peak period. Figure 4.2 shows $H_s$
(black line) and $\omega_p = 1/T_p$ (blue line) computed for each time segment. From
this plot we get a description of the behavior of the waves recorded by
Buoy 106. During the first 50 hours the significant wave height stays mainly
below 2.5 m, and for a long interval it is below 2 m. Around 50 hours it
rises to about 4.5 m and stays above 3 m for the rest of the period. The
dominant frequency decreases from approximately 0.073 to 0.06 Hz.
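These two summaries can be computed from an estimated spectrum as follows; this is a small illustrative sketch (the name `wave_summary` is hypothetical), using the trapezoidal rule for the zeroth spectral moment:

```python
import numpy as np

def wave_summary(w, f):
    """Significant wave height Hs = 4*sqrt(m0), where m0 is the area
    under the spectrum (trapezoidal rule), and modal frequency wp,
    the location of the spectral peak (inverse of the peak period Tp)."""
    w, f = np.asarray(w, dtype=float), np.asarray(f, dtype=float)
    m0 = np.sum((w[1:] - w[:-1]) * (f[1:] + f[:-1]) / 2)  # zeroth moment
    wp = w[np.argmax(f)]
    return 4.0 * np.sqrt(m0), wp
```

Applied to each 30-minute segment, this yields curves like those in Figure 4.2.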
Our goal is to find the stationary intervals and also to look at the changes
in spectra between the different intervals. The procedure to analyze the data
is the following:
• Divide the record into 30-minute segments and estimate the (normalized)
spectral density of each segment.
• Apply the clustering procedure, using the TV distance between the
estimated spectra.
• If two consecutive (in time) segments are in the same cluster, then they
will be considered to be part of a stationary period.
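The last step, turning per-segment cluster labels into stationary periods, can be sketched as follows (illustrative helper, not the thesis code):

```python
def stationary_periods(labels):
    """Group consecutive segments with the same cluster label into runs;
    each run is a candidate stationary period. Returns a list of
    (label, first segment index, last segment index) triples."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i - 1))
            start = i
    return runs
```

For example, labels [1, 1, 2, 2, 2, 1] produce the runs (1, 0, 1), (2, 2, 4) and (1, 5, 5).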
We get two different (but similar) results, one with the complete link and
the other with the average link. Figure 4.3 shows the value of Dunn’s index
in each case. The “best” number of clusters should be the one where Dunn’s
index reaches its maximum, however, the maximum value is not always the
best choice. The two highest values were considered and, after analyzing the
results it was observed that in most cases, the second highest value gave the
best clustering (see Alvarez-Esteban et al., 2016b). So, for the complete link
function we choose 6 clusters and for the average link function we choose 5
clusters. Figures 4.4 and 4.5 show the resulting dendrograms in each case,
together with the branches obtained by cutting the tree at 6 and 5 clusters, respectively.
As in the simulation study (Experiment 4 in Section 3.4.2), most of
the members in a cluster are contiguous segments in time, even though the
time structure plays no role in the clustering algorithm.
Figure 4.3: Dunn’s Index computed between 2 to 10 clusters for the complete and average
link functions.
Figure 4.4: Dendrogram of Buoy 106 using the complete link function; the index is the
number of the segment (each segment is a 30 minute recording).
Figure 4.5: Dendrogram of Buoy 106 using the average link function; the index is the
number of the segment (each segment is a 30 minute recording).
Figures 4.6 and 4.7 summarize the clustering results with the complete and
average link, respectively. In part (a) of each figure, the significant wave height
is plotted with each segment colored according to its cluster, so that contiguous
segments of the same color are members of the same cluster. In part (b), we show
(with the corresponding color) the estimated spectra of all members in a cluster
and, in black, the mean spectrum.
As we mentioned before, the clustering procedure captures the time
structure in the data using only information about the TV distance between
normalized spectral densities. In addition, using either the complete or
average link, the members in a cluster have very similar spectra and the
method is able to identify small differences between clusters. For example,
the method is able to discriminate between unimodal and bimodal spectra.
From Figure 4.6, we observe that from 0 to 27 hours, almost all segments
belong to Cluster 1 (black), just a few of the members in Cluster 1 are mixed
with Cluster 2 (red). This is reasonable since both spectra are unimodal and
the modal frequencies are close, the one from Cluster 2 being smaller than
the modal frequency of Cluster 1. Then, the members in Clusters 3 (blue), 4
(cyan) and 5 (magenta) are more mixed (in time) than the members in other
clusters. This could be related to a transition between Clusters 1 and 2.
Finally, Cluster 6 (green) has a bimodal spectrum, however we could not give
a precise interpretation because it is close to the border and one should take
a look at the following intervals.
In the case of the average link, we choose one cluster less than for the
complete link case. We observe in Figure 4.7 that the clusters between 28
and 45 hours (3 and 4 in the complete link case) merge into one cluster,
Cluster 3 (blue) in this case. However, some members of Cluster 1 (black)
appear between Cluster 4 (cyan) and 2 (red). On the other hand, the average
linkage function seems to produce clusterings that are more homogeneous in
time than those obtained using the complete link, although further research
in this respect is needed. Since Cluster 5 in case 1 (magenta, when we use
the complete link) and Cluster 4 in case 2 (cyan, when we use the average
link) are located where Hs increases and wp is changing, we could consider this
as a transition period. So, a possible conclusion from this analysis is that
there are three stable periods: 1 - 27, 28 - 45 and 52 - 89 hours, and the
other intervals correspond to transition periods.
This methodology has been applied to a longer data series. Results
show that the method is able to detect stable intervals, during which
the distribution of the energy as a function of frequency has similar
patterns, and it also allows the identification of unstable or transition periods.
This analysis gives statistical characteristics for the duration of stationary
intervals, which may vary for different periods of the year. The complete
analysis can be found in Alvarez-Esteban et al. (2016b).
Figure 4.6: Clustering result using the complete link function and 6 clusters
Figure 4.7: Clustering result using the average link function and 5 clusters
4.2 Clustering of EEG data
Figure 4.9: One second recording (1000 pts) of a brain signal and the estimated spectra.
Participants were asked to hold still with the forearms resting on the anterior
thigh and to direct their gaze at a fixation cross displayed on the computer
monitor. Data were recorded at 1000 Hz using a high input impedance
Net Amp 300 amplifier (Electrical Geodesics) and Net Station 4.5.3 software
(Electrical Geodesics). Data were preprocessed. The continuous EEG signal
was low-pass filtered at 100 Hz, segmented into non-overlapping 1 second
epochs, and detrended. The original number of channels (256) had to be
reduced to 194 because of the presence of artifacts in channels that could not
be corrected (e.g. loose leads).
Smoothing the periodogram curves. To determine a reasonable value for
the smoothing bandwidth, we adapted the Gamma-deviance generalised cross
validation (Gamma GCV) criterion in Ombao et al. (2001) to the multi-
channel setting. We applied the Gamma GCV criterion to each channel for all
epochs. Trajectories of the Gamma GCV for each channel were very different
because this criterion depends on the shape of the estimated spectra. There
is no common optimal bandwidth for all channels. A minimum appears at a = 80.

Figure 4.10: Minimum value obtained at the k-th step of the algorithm for each epoch.

From the spectral estimation point of view, one could select a = 80
over a = 100. However, in our simulations, choosing the smaller bandwidth
results in selecting unnecessarily many clusters. The choice of a slightly
larger bandwidth, a = 100, gave better overall results.
even if, in some cases, some clusters are close to each other. The following
table shows the number of epochs where the null hypothesis (9 clusters) is
rejected.

α            .01   .05   .1
Rejections     0     0     2

There is no significant evidence to reject 9 clusters in any of the epochs, so
we take 9 clusters as the number of clusters for all epochs.
Even though the number of clusters remains constant across epochs,
the cluster formation (i.e., location, spatial distribution, specific channel
memberships) of the clusters may vary across epochs. In this EEG analysis,
the total number of epochs was divided into three different phases of the
resting state: early (epoch numbers 1 to 50), middle (epochs 51-110) and
late (epochs 111-160).
In Figure 4.11, we show the “affinity matrix”, which is the proportion
of epochs in which a pair of channels belong to the same cluster. The (i, j)
element of the affinity matrix is the proportion of epochs such that channels
i and j are clustered together, regardless of how they cluster with other
channels. On the lower left corner of the affinity matrix, there are a few
small red squares that represent channels that are always clustered together
and completely separated from the rest.
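Given the per-epoch cluster labels, the affinity matrix can be computed as follows (a short sketch; `affinity_matrix` is an illustrative name, not the thesis code):

```python
import numpy as np

def affinity_matrix(label_matrix):
    """label_matrix[e, c] holds the cluster label of channel c in epoch e.
    Returns the proportion of epochs in which channels i and j share a
    cluster, regardless of how they cluster with other channels."""
    L = np.asarray(label_matrix)
    n_epochs, n_ch = L.shape
    A = np.zeros((n_ch, n_ch))
    for e in range(n_epochs):
        A += (L[e][:, None] == L[e][None, :])
    return A / n_epochs
```

The diagonal is always one, and off-diagonal values near one indicate channels that are almost always spectrally synchronized.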
It is evident that the clustering evolved across the three phases. The affinity
matrices for the early and late phases show darker red colors, with a wider
spread, than that for the middle phase.
The next step in our analysis is to compare the clustering results across
the different phases of resting state. Since there are 50 epochs per phase,
in order to present a summary of the clustering results for each phase, we
focus only on the “representative” clustering. Using the affinity matrices
(in Figure 4.11), we take as the representative clustering the 9 clusters whose
members remain in the same cluster most of the time. To obtain these
clusters, we applied hierarchical cluster analysis with the complete linkage
to the affinity matrices (considering each matrix as a similarity matrix).
Figure 4.12 shows formation of the clusters (location, spatial distribution
and specific channel membership) and the shape of the corresponding spectral
densities, coded in different colors, for the subject BLAK in each phase.
Figure 4.11: The affinity matrix (“Affinity Matrix -- BLAK_REST1”, 9 clusters per
epoch): proportion of epochs where channels i and j belong to the same cluster, by
segments 1-50, 51-110 and 111-160.
Figure 4.12: Clustering results for BLAK’s resting state during different phases: early
resting state (epochs 1-50), middle resting state (51-110) and late resting state (111-160).
a) Distribution of clusters across the cortical surface and b) Mean spectral estimates across
epochs by cluster.
Comparing the early and middle phases of resting state, we note that the
formation of clusters during these phases was heavily influenced by specific
bands: seven (out of the nine) clusters were dominated by the theta and alpha
bands, while the formation of the remaining two clusters was also influenced
by the gamma band. During the late phase, the influence of the alpha band
was reduced in some of the clusters, but the influence of the delta and beta
bands increased. The increased power in the beta band is interesting. It
suggests that this subject was engaged in some cognitive task (which could
not have been a response to an experimental stimulus but something
self-induced). A study providing evidence of a relation between the beta
band and attention disorders is Barry et al. (2010), who report
decreased levels of absolute beta and gamma power during resting state in
children with attention-deficit hyperactivity disorder (ADHD), compared to
healthy controls.
The formation of the clusters on the cortical surface varies across the three
phases of the resting state. In the early phase, channels at the left pre-motor
region belong to one cluster (green) and most of the channels at the prefrontal
and right pre-motor regions belong to another cluster (purple). However, the
clustering structure at these regions changes during the middle phase, where
the channels in the pre-motor region (which were originally clustered with the
other non pre-motor channels) are assigned back with the rest of the pre-motor
channels (dark blue cluster). As we transition from the middle to the late
phase, channels that were assigned to the right pre-motor region reverted back
to the channels at the prefrontal region. These changes in cluster assignment
were not entirely unexpected, since many of these channels lie at the boundaries
between the two anatomical regions.
Also, some channels which belonged to the yellow cluster during the early
phase switched to the orange cluster during the middle phase of the resting state.
In this switch, the alpha and beta bands played the key roles. The late phase
of the resting state shows more changes. For example, three channels located
at the right occipital region switched from the yellow to the brown cluster,
due to an increase of power in the alpha band and a decrease in the gamma
band. Another interesting change appears in the prefrontal region. There
we observe that three of the purple-colored channels switched to the light blue
cluster and a new cluster was formed. This allows the dark blue channels of
the middle phase to go back to the purple cluster, while the underlying process
that characterized the dark blue cluster completely changes its location.
While some channels displayed dynamic behavior across phases, there
4.2. Clustering of EEG data 105
are some clusters, such as the red and black, which showed consistent
membership. The red cluster is characterized by the presence of the delta,
theta bands and small activity on the gamma band while the black cluster
was dominated by the theta and alpha bands.
This subject had low improvement during the task compared with the
others. It is not possible to say whether this has implications for the change
in perceptual improvement, since a causal analysis was not performed, but
the presence of beta activity could produce a difference in the individuals'
improvement during the task.
The clusters produced are mostly consistent with the anatomically-based
parcellation of the cortical surface; thus, cluster formation based on the
spectra of the EEGs can be used to recover the spatial structure of the
underlying brain process.
In addition, the HSM method has been used to analyze epileptic seizure
data. This recording captures the brain activity of a subject
who suffered a spontaneous epileptic seizure while being connected to the
EEG. The recording is digitized at 100 Hz and about 500 sec long, providing
us with a time series of length T = 50000. We analyzed the multichannel
electroencephalograms as they exhibit “non-stationary” behavior. Our
goal was to analyze the changes in the clustering of the EEG signals before,
during and after the epileptic seizure. Using the HSM method we observe
that mostly lower frequencies are involved before and after the epileptic
seizure (approximately 90 seconds post seizure). In contrast, immediately
following seizure onset, the higher frequency bands dictated the clustering
distribution of the channels. Moreover, immediately following the seizure
onset but before the last subinterval, the channels were clustered similarly
but the clustering was heavily influenced by the beta and gamma frequency
bands. The complete analysis can be found in Ombao et al. (2016).
Conclusions
In this thesis we proposed the use of the total variation (TV)
distance as a similarity measure in clustering methods to detect similarities
in spectra.
First, we studied the theoretical properties of the TV distance between
estimated spectra. We considered two asymptotic approximations of the
distribution of the TV distance, a modified version for small sample sizes,
and a bootstrap procedure. The asymptotic convergence depends strongly
on the choice of the bandwidth, while the bootstrap procedure gives a
better approximation for all bandwidth values. We established a hypothesis
test which is able to detect differences in spectra and explored its power.
When one uses the TV distance in a clustering method, the results are
satisfactory. The rate of correct classification is close to one; in some cases it
outperforms other alternatives and in others it is as good as other distances.
The proposed methods are efficient and have shown good results in the
simulation experiments.
We have used the proposed methods to analyze two different data sets,
from different areas. The first analysis is related to the study of ocean waves,
where the interest is to find stationary intervals. This goal was achieved using
the TV distance in a hierarchical clustering algorithm, where segments in the
same cluster and contiguous in time were considered as a stationary period.
The second analysis is related to the study of brain signals; here we used the
HSM method to detect channels of a dense EEG array that were spectrally
synchronized. In both case studies the results are good.
In general, the proposed methods have shown good performance in
detecting similarities in spectra, and they can also be seen as methods to detect
changes. The methodologies do not require very high computational times
when one analyzes long time series or several time series. Even though we
explored just two different applications, the methods are not limited to those
problems and could be used in other areas.
The work performed in this project provides many possible directions for
future research. They include:
Appendix A
R Codes
Description:
Computes the total variation distance between f1 and f2 with respect
to the values w, using the trapezoidal rule.
Usage:
TVD(w, f1, f2)
Arguments:
w - Sorted vector of w values.
f1,f2 - Numeric vectors with the values of f1(w) and f2(w).
Description:
One-sided estimated spectrum using a lag window estimator with a
Parzen window.
Usage:
spec.parzen(x, a = 100, dt = 1, w0 = 10^(-5), wn = 1/(2 * dt), nn
= 512)
Arguments
x - Time series.
a - Bandwidth value.
Value
A matrix of 2 columns and nn rows, where the first column corresponds
to the grid of frequencies and the second column corresponds to the
spectrum at those frequencies.
Examples.
w<-seq(0,5,by=0.01)   # grid (the original definitions of w and f1 were lost;
f1<-dnorm(w,1.5,.5)   #  these two lines are an assumed reconstruction)
f2<-dnorm(w,2.5,.5)
diss<-TVD(w,f1,f2)
plot(w,f1,type="l",lwd=2,col=2,main=paste("TVD =",round(diss,3)),
xlab="x",ylab="")
lines(w,f2,col=3,lwd=2)
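For readers outside R, the same computation can be sketched in Python; this is a minimal version of the trapezoidal-rule TV distance, assuming f1 and f2 are normalized densities on the grid w:

```python
import numpy as np

def tvd(w, f1, f2):
    """TV distance 0.5 * integral of |f1 - f2| over the grid w, computed
    with the trapezoidal rule; f1 and f2 are normalized densities."""
    w = np.asarray(w, dtype=float)
    diff = np.abs(np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float))
    return 0.5 * np.sum(np.diff(w) * (diff[1:] + diff[:-1]) / 2)
```

The value is 0 for identical densities and approaches 1 for densities with disjoint supports.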
A.2 Methods
We present the code related to the example in Chapter 3.
Example 1.
#################
set.seed(2786)
library(HSMClust)
normaliza<-function(f,w){
nor<-((w[2:length(w)]-w[1:(length(w)-1)])%*%
(f[2:length(w)]+f[1:(length(w)-1)])/2)
return(f/nor)
}
#################
# Simulated Data
M<-1.05
eta1<-.053
114 Appendix A. R Codes
eta2<-.06
Time<-1000
k<-2
nk<-3
X<-matrix(0,nrow=Time,ncol=k*nk)
for(i in seq(1,k*nk,2))X[,i]<-Sim.Ar(Time,eta1,M)
for(i in seq(1,k*nk,2)+1)X[,i]<-Sim.Ar(Time,eta2,M)
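Sim.Ar above comes from the HSMClust package and its exact definition is not reproduced here. A plausible stand-in, under the assumption that it simulates an AR(2) process whose characteristic roots have modulus M and argument 2πη (so the spectrum peaks near frequency η), would be:

```r
# Hypothetical stand-in for HSMClust::Sim.Ar (assumption: AR(2) process with
# characteristic roots of modulus M and argument 2*pi*eta, so the spectral
# density peaks near frequency eta; M > 1 guarantees stationarity).
sim.ar2 <- function(n, eta, M) {
  phi1 <- 2 * cos(2 * pi * eta) / M   # AR(2) coefficients implied by (M, eta)
  phi2 <- -1 / M^2
  as.numeric(arima.sim(list(ar = c(phi1, phi2)), n = n))
}

x <- sim.ar2(1000, 0.053, 1.05)
```

With M = 1.05 the roots lie just outside the unit circle, which yields the sharply peaked spectra used in the simulation study.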
# 3- Results
clus<-slice(as.dendrogram(arbol),k=2)
clus2<-slice(as.dendrogram(arbol2),k=2)
# HSM Method
ClustHSM<-HSM(X)
cutk(ClustHSM,2)
Description:
Computes the hierarchical merger clustering algorithm or
the hierarchical spectral merger clustering algorithm
for a set of time series X.
Usage:
HSM(X, freq = 1, Merger = 1, par.spectrum = c(100, 1/(2 * dt), 512))
Arguments
Value:
An HSM object with the following variables:
Diss.Matrix = Initial dissimilarity matrix.
min.value = Trajectory of the minimum value.
Groups = List with the grouping structure at each step.
Description:
Returns k groups from a HSM object.
Usage
cutk(Clust, kg = NA, alpha = NA)
Arguments
kg - Number of groups.
where $dt = 1/F_s$ and $\delta(u)$ is the impulse function or Dirac delta function,
which satisfies
\[
\delta(u) = 0 \ \text{if } u \neq 0, \qquad \int \delta(u)\, du = 1.
\]
Then, the spectral density of the discrete signal can be written as a folding
of the original spectral density,
\[
f_d(\omega) = \sum_{m=-\infty}^{\infty} f\!\left(\omega - \frac{m}{dt}\right),
\]
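The folding effect can be seen directly in the time domain. In the sketch below (an illustration, not part of the thesis code), a 7 Hz cosine sampled at 10 Hz is indistinguishable from a 3 Hz cosine, since 7 Hz folds onto |7 − 10| = 3 Hz:

```r
# Aliasing/folding illustration: sampling at Fs = 10 Hz (dt = 0.1 s) maps
# a 7 Hz cosine onto the same samples as a 3 Hz cosine.
dt <- 1 / 10
t  <- seq(0, 1 - dt, by = dt)
x7 <- cos(2 * pi * 7 * t)
x3 <- cos(2 * pi * 3 * t)
# max(abs(x7 - x3)) is numerically zero: the two signals coincide on the grid
```

This is why a spectral estimate computed with the wrong sampling frequency misplaces the peaks.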
118 Appendix B. Effect of Sampling Frequency
when the sampling frequency is different from one (see Mandal and Asif, 2007).
Consider the Fourier transform of (B.1), using the following calculation. Let
$\hat{X}_d(\omega)$ be the Fourier transform of the discrete signal, $\hat{X}(\omega)$ the Fourier
transform of the continuous signal $X(t)$, and $\hat{g}(\omega)$ the Fourier transform of
$\sum_{m=-\infty}^{\infty} \delta(t - m\,dt)$. Then,
To obtain (B.2), we have to use the Fourier transform of $\delta(t - m\,dt)$, which is
\[
\frac{2\pi}{dt}\, \delta\!\left(\omega - m\,\frac{2\pi}{dt}\right).
\]
Aage, C., Allan, T., Carter, D., Lindgren, G., and Olagnon, M. (1999). Oceans from
Space: A textbook for offshore engineers and naval architects. Edition Ifremer.
Barry, R., Clarke, A., Hajos, M., McCarthy, R., Selikowitz, M., and Dupuy,
F. (2010). Resting-state EEG gamma activity in children with attention-
deficit/hyperactivity disorder. Clinical Neurophysiology, 121(11):1871–1877.
Brillinger, D. R. (1981). Time series: data analysis and theory. Holden-Day, Inc.,
Oakland, Calif., second edition.
Brodtkorb, P. A., Johannesson, P., Lindgren, G., Rychlik, I., Rydén, J., and Sjö,
E. (2011). WAFO - a Matlab toolbox for analysis of random waves and loads.
Mathematical Statistics, Centre for Mathematical Sciences, Lund University.
Euán, C., Ombao, H., and Ortega, J. (2015). Spectral synchronicity in brain
signals. arXiv:1507.05018v1.
Gavrilov, M., Anguelov, D., Indyk, P., and Motwani, R. (2000). Mining the stock
market: which measure is best? In Proceedings of the 6th ACM International
Conference on Knowledge Discovery and Data Mining, pages 487–496.
Hasselmann, K., Barnett, T., Bouws, E., Carlson, H., Cartwright, D., Enke,
K., Ewing, J., Gienapp, H., Hasselmann, D., Kruseman, P., Meerburg,
A., Müller, P., Olbers, D., Richter, K., Sell, W., and Walden, H. (1973).
Measurements of wind-wave growth and swell decay during the Joint North Sea
Wave Project (JONSWAP). Deutschen Hydrographischen Zeitschrift 12, Deutsches
Hydrographisches Institut, Hamburg.
Lachiche, N., Hommet, J., Korczak, J., and Braud, A. (2005). Neuronal clustering
of brain fMRI images. Pattern Recognition and Machine Intelligence: Lecture
Notes in Computer Science, 3776:300–305.
Lavielle, M. and Ludeña, C. (2000). The multiple change-points problem for the
spectral distribution. Bernoulli, 6(5):845–869.
Leone, F. C., Nelson, L. S., and Nottingham, R. B. (1961). The folded normal
distribution. Technometrics, 3(4):543–550.
Mandal, M. and Asif, A. (2007). Continuous and discrete time signals and systems.
Cambridge University Press, New York, first edition.
Montero, P. and Vilar, J. (2014). TSclust: An R package for time series clustering.
Journal of Statistical Software, 62(1).
Ochi, M. K. (1998). Ocean waves: the stochastic approach. Cambridge, U.K. ; New
York : Cambridge University Press.
Ombao, H., Schröder, A. L., Euán, C., Ting, C.-M., and Samdin, B. (2016).
Handbook of Neuroimaging Data Analysis, chapter Advanced topics for modeling
electroencephalograms (to appear), pages 567–621. Chapman & Hall/CRC
Handbooks of Modern Statistical Methods. Taylor & Francis.
Ombao, H., von Sachs, R., and Guo, W. (2005). SLEX analysis of multivariate
nonstationary time series. Journal of the American Statistical Association,
100(470):519–531.
Ombao, H. C., Raz, J. A., Strawderman, R. L., and Sachs, R. V. (2001). A simple
generalised crossvalidation method of span selection for periodogram smoothing.
Biometrika, 88(4):1186–1192.
Preuss, P., Vetter, M., and Dette, H. (2013). Testing semiparametric hypotheses
in locally stationary processes. Scandinavian Journal of Statistics. Theory and
Applications, 40(3):417–437.
Priestley, M. B. (1981). Spectral analysis and time series. Vol. 1. Academic Press,
Inc. [Harcourt Brace Jovanovich, Publishers], London-New York. Univariate
series, Probability and Mathematical Statistics.
Shumway, R. H. and Stoffer, D. S. (2011). Time series analysis and its applications.
With R examples. Springer, New York, third edition.
Wu, J., Srinivasan, R., Kaur, A., and Cramer, S. C. (2014). Resting-state cortical
connectivity predicts motor skill acquisition. NeuroImage, 91:84–90.