L1-Quantization and Clustering in Banach Spaces
Thomas LALOË
Institut de Mathématiques et de Modélisation de Montpellier
UMR CNRS 5149, Equipe de Probabilités et Statistique
Université Montpellier II, Cc 051
Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
[email protected]
Abstract
Let X be a random variable with distribution µ taking values in
a Banach space H. First, we establish the existence of an optimal
quantization of µ with respect to the L1 -distance. Second, we propose
several estimators of the optimal quantizer in the potentially infinite-
dimensional space H, with associated algorithms. Finally, we discuss
practical results obtained from real-life data sets.
1 Introduction
Clustering consists in partitioning a data set into subsets (or clusters), so
that the data in each subset share some common trait. Proximity is de-
termined according to some distance measure. For a thorough introduction
to the subject, we refer to the book by Kaufman and Rousseeuw [14]. The
origin of clustering goes back some 45 years, when biologists and
sociologists began to search for automatic methods to build different groups
from their data. Today, clustering is used in many fields. For example,
in medical imaging, it can be used to differentiate between different types
of tissue and blood in a three dimensional image. Market researchers use
it to partition the general population of consumers into market segments
and to better understand the relationships between different groups of con-
sumers/potential customers. There are also many different applications in
artificial intelligence, sociology, medical research, or political sciences.
In the present paper, the clustering method we investigate relies on the
technique of quantization, commonly used in signal compression (Graf and
Luschgy [12], Linder [17]). Given a normed space $(\mathcal{H}, \|\cdot\|)$, a codebook (of
size $k$) is defined by a subset $\mathcal{C} \subset \mathcal{H}$ with cardinality $k$. Then, each $x \in \mathcal{H}$
is represented by a unique $\hat{x} \in \mathcal{C}$ via the function $q$,
$$q : \mathcal{H} \to \mathcal{C}, \qquad x \mapsto \hat{x},$$
where proximity is measured through the distance
$$d : \mathcal{H} \times \mathcal{H} \to \mathbb{R}_+, \qquad (x, y) \mapsto \|x - y\|.$$
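To fix ideas, here is a minimal sketch of such a nearest neighbor quantizer, assuming each functional observation is represented by its values on a common sampling grid; the function names and the use of the Euclidean norm of the discretized curves are ours, for illustration only:

import numpy as np

def quantize(x, codebook):
    """Nearest neighbor quantizer q: map x to the closest codepoint.

    x        : array of shape (d,), one observation on a sampling grid
    codebook : array of shape (k, d), the k codepoints y_1, ..., y_k
    Returns the index i minimizing ||x - y_i|| and the codepoint itself.
    """
    dists = np.linalg.norm(codebook - x, axis=1)  # ||x - y_i||, i = 1..k
    i = int(np.argmin(dists))
    return i, codebook[i]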
Since the early work of Hartigan [13] and Pollard [19, 20, 21], the performance
of clustering methods has been studied by many authors. Convergence
properties of the minimizer $q_n^*$ of the empirical distortion have mostly been
studied in the case where $\mathcal{H} = \mathbb{R}^d$. Consistency of $q_n^*$ was shown by Pollard
[19, 21] and Abaya and Wise [1]. Rates of convergence were obtained by
Pollard [20], Linder, Lugosi, and Zeger [18], and Linder [17].
As a matter of fact, in many practical problems, input data items come in the
form of random functions (speech recordings, spectra, images) rather than
standard vectors, and this casts the clustering problem into the general class
of functional data analysis. Even though in practice such functions are
observed at discrete sampling points, the challenge in this context is to infer
the data structure by exploiting the infinite-dimensional nature of the
observations. The last few years have witnessed important developments in
both the theory and practice of functional data analysis, and many traditional
data analysis tools have been adapted to handle functional inputs. The
book by Ramsay and Silverman [22] provides a comprehensive introduction
to the area. Recently, Biau, Devroye, and Lugosi [2] gave some consistency
results in Hilbert spaces with an $L_2$-based distortion.
Thus, the first novelty of this paper is to consider data taking values in a
separable and reflexive Banach space, with no restriction on their dimension.
The second novelty is that we consider an $L_1$-based distortion, which
leads to more robust estimators. For a discussion of the advantages of the
$L_1$-distance, we refer the reader to the paper by Kemperman [15].
This setup calls for substantially different arguments to prove results which
are known to be true for finite-dimensional spaces and an $L_2$-based
distortion. In particular, specific notions will be required, such as the
weak topology (Dunford and Schwartz [10]), lower semi-continuity (Ekeland
and Temam [10]), and entropy (Van der Vaart and Wellner [23]).
Note that $D(\mu, q) = \mathbb{E}\|X - q(X)\| < \infty$ since $\mathbb{E}\|X\| < \infty$. For a given $k$, the
aim is to minimize $D(\mu, \cdot)$ among the set $\mathcal{Q}_k$ of all possible $k$-quantizers. The
optimal distortion is then defined by
$$D_k^*(\mu) = \inf_{q \in \mathcal{Q}_k} D(\mu, q).$$
Observe that a quantizer $q$ with codebook $\{y_1, \ldots, y_k\}$ induces a partition
of $\mathcal{H}$ into cells $S_1, \ldots, S_k$ through the relation
$$q(x) = y_i \iff x \in S_i.$$
Thus, from now on, we will define a quantizer by its codebook and its cells.
More precisely, given two quantizers $q \in \mathcal{Q}_k$ and $q' \in \mathcal{Q}_k^{nn}$ with the same
codebook, where $\mathcal{Q}_k^{nn}$ denotes the subset of nearest neighbor quantizers,
we have
$$D(\mu, q') \leq D(\mu, q).$$
Therefore, in the following, we will restrict ourselves to nearest neighbor
quantizers. Similarly, given a quantizer $q$ with cells $S_1, \ldots, S_k$, the quantizer
$q'$ with the same cells whose codepoints are medians of $\mu$ restricted to the
corresponding cells satisfies
$$D(\mu, q') \leq D(\mu, q).$$
From the two previous optimality results, on the codebook and the associated
partition, we can derive a simple algorithm to find a good quantizer.
This algorithm is called the Lloyd algorithm and is based on the so-called
Lloyd iteration (Gersho and Gray [11], Chapter 6). The outline is as follows:
1. Start from an initial codebook $\mathcal{C}_0$ and set $m = 0$;
2. Given a codebook $\mathcal{C}_m$, build the associated Voronoi partition;
3. Compute a median of each cell of this partition; these medians form the new codebook $\mathcal{C}_{m+1}$. Increase $m$ by one and return to step 2 until the distortion stabilizes.
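A minimal sketch of this iteration, under the same discretization convention as before; we use coordinatewise medians as cell centers, one natural empirical surrogate for the median of a cell in the $L_1$ setting (the function names and stopping rule are ours):

import numpy as np

def lloyd(data, k, n_iter=20, seed=0):
    """Lloyd iteration: alternate Voronoi partition and median update.

    data : array of shape (n, d), curves sampled on a common grid.
    """
    rng = np.random.default_rng(seed)
    # Step 1: initial codebook drawn at random from the data.
    codebook = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: Voronoi partition -- assign each point to its closest codepoint.
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: replace each codepoint by a median of its cell.
        for j in range(k):
            cell = data[labels == j]
            if len(cell) > 0:
                codebook[j] = np.median(cell, axis=0)
    return codebook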
Identifying a nearest neighbor quantizer with its codebook $\mathbf{y}_k = (y_1, \ldots, y_k)$,
we denote by
$$D(\mu, q) = D(\mu, \mathbf{y}_k)$$
the associated distortion. Therefore our first task is to prove that the function
$D(\mu, \cdot)$ has at least one minimum, or, in other words, that there exists
at least one optimal codebook.
3 A consistent estimator
3.1 Construction and consistency
In a statistical context, the distribution µ of X is unknown and we only
have at hand n random variables, X1 , . . . , Xn , independent and distributed
as $X$. Let the empirical measure $\mu_n$ be defined as
$$\mu_n(A) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{[X_i \in A]},$$
for any measurable set A ⊂ H. For any quantizer q, the associated empirical
distortion is then given by
$$D(\mu_n, q) = \frac{1}{n} \sum_{i=1}^{n} \|X_i - q(X_i)\|.$$
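In code, the empirical distortion of a codebook is just the average distance from each observation to its nearest codepoint (same conventions as in the sketches above):

import numpy as np

def empirical_distortion(data, codebook):
    """D(mu_n, q) = (1/n) sum_i min_j ||X_i - y_j||."""
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()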
Let $q_n^*$ denote a quantizer minimizing the empirical distortion $D(\mu_n, \cdot)$.
Theorem 3.1 states that, under the assumptions above,
$$\lim_{n \to \infty} D(\mu, q_n^*) = D_k^*(\mu) \quad \text{a.s.} \qquad \text{and} \qquad \lim_{n \to \infty} D(\mu_n, q_n^*) = D_k^*(\mu) \quad \text{a.s.}$$
Recently, Biau, Devroye, and Lugosi [2] proved that when $\mathcal{H}$ is a Hilbert
space and the distortion is $L_2$-based, then
$$\mathbb{E}\, D(\mu, q_n^*) - D^*(\mu) \leq C \frac{k}{\sqrt{n}}$$
for some positive constant $C$.
Theorem 3.2 A probability $\phi \in \mathcal{P}(\mathcal{H})$ satisfies a transportation inequality
$T_1(\lambda)$ if and only if, for all $\alpha < \lambda/2$,
$$\int_{\mathcal{H}} e^{\alpha \|x - y\|^2} \, d\phi(x) < \infty.$$
For a subset $\Lambda$ of $\mathcal{H}$, let
$$N(r, \Lambda) = \inf\left\{ n \in \mathbb{N} \ \text{s.t.}\ \exists\, x_1, \ldots, x_n \in \mathcal{P}(\Lambda) : \bigcup_{i=1}^{n} B_{\mathcal{P}(\Lambda)}(x_i, r) \supset \mathcal{P}(\Lambda) \right\},$$
where $B_{\mathcal{P}(\Lambda)}(x_i, r)$ is the ball in $\mathcal{P}(\Lambda)$ centered at $x_i$ and with radius $r$ (for
the metric $\rho$). The quantity $\ln(N(r, \Lambda))$ is the entropy of $\mathcal{P}(\Lambda)$ (Van der
Vaart and Wellner [23]).
In the same way, let $N(r, \Lambda)$ be the smallest number of balls of radius $r/2$
required to cover $\Lambda$, with respect to the metric of $\mathcal{H}$.
H1 is satisfied, for example, when $X$ is a diffusion process solving a stochastic
differential equation of the form $dX_t = b(X_t)\,dt + s(X_t)\,dW_t$, where $t \in [0, T]$,
$T < \infty$, and $b(\cdot)$, $s(\cdot)$ satisfy suitable properties (Djellout,
Guillin and Wu [7], Corollary 4.1). H2 is satisfied, for example, if $\mathcal{H}$ is a
Sobolev space on a compact domain of $\mathbb{R}^d$ (Cucker and Smale [6], Example 3).
From now on, BR stands for the ball of center 0 and radius R in H. Ac-
cording to assumption H2 and Theorem A.1 in Bolley, Guillin, and Villani
[4], there exists a positive constant C such that for all r, R > 0,
$$N(r, B_R) \leq \left( \frac{CR}{r} \right)^{N(r/2, B_R)}. \tag{3.1}$$
Theorem 3.3 Assume that $\mathcal{H}$ is a reflexive and separable Banach space,
and H1, H2 are satisfied. Then, for all $\lambda' < \lambda$ and $\varepsilon > 0$, there exist three
positive constants $K$, $\gamma$, and $R_1$ such that if $R = R_1 \max\!\big(1, \varepsilon^2, \ln(1/\varepsilon^2)\big)^{1/2}$
and $n \geq K \ln\!\big(N(\gamma\varepsilon, B_R)\big)/\varepsilon^2$, we have:
$$\mathbb{P}\left[ \rho(\mu, \mu_n) \geq \varepsilon \right] \leq e^{-(\lambda'/2)\, n \varepsilon^2}.$$
3.3 Algorithm
Calculating $q_n^*$ appears to be an NP-complete problem. In order to approximate
$q_n^*$, one can adapt the Lloyd algorithm presented in Section 2 to the
statistical context, using $\mu_n$ instead of $\mu$. Moreover, rather than calculating
empirical medians in each cell, a possible solution is to consider medoids,
i.e., centers taken within the sample $\{X_1, \ldots, X_n\}$. For more details about
the Lloyd algorithm and medoids, we refer the reader to the book by
Kaufman and Rousseeuw [14].
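With medoids, the only change to the Lloyd iteration sketched above is the center update: the new center of a cell is the sample point of that cell minimizing the summed distances to the other points of the cell. A sketch of this update (a PAM-style step; the helper name is ours):

import numpy as np

def medoid(cell):
    """Return the point of the cell minimizing the sum of distances
    to all other points of the cell (the empirical medoid)."""
    dists = np.linalg.norm(cell[:, None, :] - cell[None, :, :], axis=2)
    return cell[dists.sum(axis=1).argmin()]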
However, this Lloyd algorithm with medoids has the same drawbacks as the
Lloyd algorithm presented in Section 2: non-optimality and dependence on
the initial codebook. Thus, in the next section, we present a new estimator
designed to overcome these drawbacks.
4 Minimization on data
4.1 Construction and Consistency
The basic idea of the estimator presented in this section consists in searching
for the minimum of the empirical distortion $D(\mu_n, \cdot)$ within the sample
$\{X_1, \ldots, X_n\}$. It is a generalization of a method of Cadre [5], who considered
the case $k = 1$ only. Formally, our estimator $\mathbf{y}_{k,n}^* = (y_{1,n}^*, \ldots, y_{k,n}^*)$ is
defined by
$$\mathbf{y}_{k,n}^* \in \operatorname*{arg\,min}_{z \in \{X_1, \ldots, X_n\}^k} D(\mu_n, z).$$
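Taken literally, this estimator is an exhaustive search over codebooks made of sample points; since the criterion is symmetric in the codepoints, scanning $k$-element subsets suffices. A direct sketch (tractable only for very small $n$ and $k$, as discussed in Section 4.3; names ours):

import itertools
import numpy as np

def alter(data, k):
    """Minimize the empirical distortion over codebooks of k sample points."""
    best, best_dist = None, np.inf
    for idx in itertools.combinations(range(len(data)), k):
        codebook = data[list(idx)]
        # Empirical distortion D(mu_n, .) of this candidate codebook.
        d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        dist = d.min(axis=1).mean()
        if dist < best_dist:
            best, best_dist = codebook, dist
    return best, best_dist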
Theorem 4.1 then states that, under condition (4.1) below,
$$\lim_{n \to \infty} D(\mu, \mathbf{y}_{k,n}^*) = D_k^*(\mu) \quad \text{a.s.}$$
Remark: The condition (4.1) in Theorem 4.1 simply requires that the probability
that $k$ observations fall in the neighborhood of $\mathbf{y}_k^*$ is not zero. The
necessity of this condition is easy to understand. Indeed, suppose there exists
$\varepsilon > 0$ such that, for every optimal codebook $\mathbf{y}_k^*$ for $\mu$, $(X_1, \ldots, X_k) \notin B_{\mathcal{H}^k}(\mathbf{y}_k^*, \varepsilon)$
with probability 1. Then, by construction, $D(\mu, \mathbf{y}_{k,n}^*)$ cannot converge to
$D_k^*(\mu)$.
Theorem 4.2 Assume that $\mathcal{H}$ is a reflexive and separable Banach space,
and (4.1), H1 and H2 hold. Then, we have
$$\lim_{n \to \infty} \mathbb{E}\, D(\mu, \mathbf{y}_{k,n}^*) = D_k^*(\mu).$$
Remarks:
• Assumption H3 is necessary. Indeed, we will see in the proof
of Theorem 4.3 that if H3 does not hold, there exists no decreasing
function $V_1 : \mathbb{N}^* \to \mathbb{R}_+^*$ such that
$$\mathbb{E}\, D(\mu, \mathbf{y}_{k,n}^*) - D_k^*(\mu) \leq V_1(n).$$
4.3 Algorithms
In order to calculate $\mathbf{y}_{k,n}^*$, we provide an algorithm which we will call the
Alter algorithm. The outline is the following:
1. List all the codebooks made of $k$ points of the sample, i.e., all $z \in \{X_1, \ldots, X_n\}^k$;
2. Compute the empirical distortion $D(\mu_n, z)$ of each such codebook;
3. Select the codebook $\mathbf{y}_{k,n}^*$ achieving the smallest empirical distortion.
This algorithm overcomes the two drawbacks of the Lloyd algorithm: it does
not depend on initial conditions and it converges to the optimal distortion.
Unfortunately, its complexity is $O(n^k)$, which makes it impossible to use for
large values of $n$ or $k$. To reduce this complexity, we propose the Alter-fast
algorithm, whose outline is the following:
1. Select randomly $n_1 < n$ data points in the whole data set ($n_1$ should be
small);
2. Run the Alter algorithm on this reduced data set to obtain a codebook;
3. Repeat steps 1 and 2 a number $n_2$ of times;
4. Select, among all the obtained codebooks, the one which minimizes
the associated empirical distortion (calculated using the whole data
set).
The Alter-fast algorithm provides a usable alternative to the Alter algorithm,
in the same way as the Lloyd algorithm using medoids is an alternative to
the Lloyd algorithm. Its complexity is $O(n_2 \times n_1^k)$. We will see in the next
section that the Alter-fast algorithm seems to perform almost as well as the
Alter algorithm on real-life data.
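A sketch of Alter-fast, reusing the alter helper sketched in Section 4.1 and the empirical_distortion helper from Section 3.1, with $n_1$ and $n_2$ as in the outline:

import numpy as np

def alter_fast(data, k, n1, n2, seed=0):
    """Run Alter on n2 random subsamples of size n1 and keep the codebook
    with the smallest empirical distortion on the whole data set."""
    rng = np.random.default_rng(seed)
    best, best_dist = None, np.inf
    for _ in range(n2):
        sub = data[rng.choice(len(data), size=n1, replace=False)]
        codebook, _ = alter(sub, k)                  # exhaustive search, small set
        dist = empirical_distortion(data, codebook)  # evaluated on all the data
        if dist < best_dist:
            best, best_dist = codebook, dist
    return best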
5 Application: speech recognition
Here we use a part of the TIMIT database (http://www-stat.stanford.edu/~tibs/ElemStatLearn/).
The data are log-periodograms corresponding to
recordings of phonemes of 32 ms duration. We are interested in the discrimination
of five speech frames corresponding to five phonemes, transcribed as
follows: “sh” as in “she” (872 items), “dcl” as in “dark” (757 items), “iy”
as the vowel in “she” (1163 items), “aa” as the vowel in “dark” (695 items),
and “ao” as the first vowel in “water” (1022 items). The database is a
multi-speaker database. Each speaker is recorded at a 16 kHz sampling rate and
we retain only the first 256 frequencies (see Figure 1).
Thus the data consist of 4509 series of length 256. We compare here the
Lloyd and Alter-fast algorithms. We split the data into a learning set and a
testing set. The quantizer is constructed using only the first set, and its
performance (i.e., the rate of correct classification) is evaluated on the second
one. We give the rates of correct classification associated with the codebooks
selected by the Lloyd and Alter-fast algorithms in Table 1. Recall that,
for each center, a cluster includes the data which are closer to this center
than to any other. Moreover, we give the variance induced by the dependence
on initial conditions: the initial codebook for the Lloyd algorithm, and the
successive reduced data sets for the Alter-fast algorithm. We note that the
results of the Alter-fast algorithm are better than those of the Lloyd algorithm.
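The evaluation protocol can be sketched as follows; labeling each cluster by the majority phoneme of the learning curves it contains is our assumption about how the rate of correct classification is computed (names ours, phoneme labels assumed integer-coded):

import numpy as np

def correct_classification_rate(train_X, train_y, test_X, test_y, codebook):
    """Assign each curve to its nearest center, label clusters by majority
    vote on the learning set, then score the testing set."""
    def assign(X):
        d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
        return d.argmin(axis=1)

    train_cells = assign(train_X)
    labels = np.full(len(codebook), -1)
    for j in range(len(codebook)):
        members = train_y[train_cells == j]
        if len(members) > 0:
            labels[j] = np.bincount(members).argmax()
    return float(np.mean(labels[assign(test_X)] == test_y))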
6 Conclusion
This paper thus provided an answer to the problem of functional
$L_1$-clustering: we first proved that for any measure $\mu \in \mathcal{P}(\mathcal{H})$ with a finite
first moment, an optimal quantizer always exists (Theorem 2.1). Then we
proposed a consistent estimator of $q^*$ (Theorem 3.1) and stated its rate
of convergence (Theorem 3.4). In order to offset the main drawbacks of the
Lloyd algorithm, we then proposed the Alter algorithm and its accelerated
version, the Alter-fast algorithm. Finally, a comparison of our algorithms
on real-life data demonstrates the practical suitability of our theoretical results.
One of the most interesting points in our results is that the assumptions
we make are as light as possible. For example, we made no restriction on
the support of µ, and the assumptions H1, H2 are satisfied in classical
stochastic modeling.
A Appendix: Proofs
A.1 Proof of Theorem 2.1
Before we prove Theorem 2.1, we will need to introduce the following defi-
nition.
For a proof of this equivalence and of the following proposition, we refer the
reader to the book by Ekeland and Temam [10].
Proposition A.1 With the notation of Definition A.1, the two following
properties hold:
(i) If φ is convex and l.s.c. for the strong topology, then φ is weakly l.s.c.;
(ii) If φ is weakly l.s.c. on a set Λ which is compact for the weak topology,
then φ has a minimum on Λ.
For $x \in \mathcal{H}$ and $\mathbf{y}_k = (y_1, \ldots, y_k) \in \mathcal{H}^k$, set
$$g_{i,x}(\mathbf{y}_k) = \|x - y_i\| \qquad \text{and} \qquad g_x(\mathbf{y}_k) = \min_{i=1,\ldots,k} g_{i,x}(\mathbf{y}_k).$$
Proof of Lemma A.2 For each x in H, the functions gi,x are continuous
and convex, thus they are weakly l.s.c. according to Proposition A.1. For
all $t \in \mathbb{R}$, the sets
$$\left\{ \mathbf{y}_k \in \mathcal{H}^k : g_{i,x}(\mathbf{y}_k) \leq t \right\}$$
are then weakly closed. We deduce that
$$\left\{ \mathbf{y}_k \in \mathcal{H}^k : g_x(\mathbf{y}_k) \leq t \right\} = \bigcup_{i=1}^{k} \left\{ \mathbf{y}_k \in \mathcal{H}^k : g_{i,x}(\mathbf{y}_k) \leq t \right\}$$
is weakly closed as a finite union of weakly closed sets, so that $g_x$ is weakly
l.s.c. Since $D(\mu, \mathbf{y}_k) = \int_{\mathcal{H}} g_x(\mathbf{y}_k)\, d\mu(x)$, Fatou's lemma then shows that
$D(\mu, \cdot)$ satisfies the condition (ii) of Definition A.1 as well.
Proof of Theorem 2.1 According to Lemma A.1, there exists $R > 0$ such
that the infimum of $D(\mu, \cdot)$ on $\mathcal{H}^k$ is also the infimum of $D(\mu, \cdot)$ on $B_R^k$.
Moreover, on the one hand $B_R^k$ is compact for the weak topology, and on the
other hand $D(\mu, \cdot)$ is weakly l.s.c. according to Lemma A.3. Thus, according
to Proposition A.1, the function $D(\mu, \cdot)$ reaches its infimum on $B_R^k$.
Let $R > 0$. We consider $\mu^R$ defined, for all Borel sets $A \subset \mathcal{H}$, by
$$\mu^R[A] = \frac{\mu[A \cap B_R]}{\mu[B_R]} = \mu[A \,|\, B_R].$$
Consider now the independent random variables $\{X_i\}_{i=1}^n$ with distribution
$\mu$ and $\{Y_i\}_{i=1}^n$ with distribution $\mu^R$. We define, for $i \leq n$,
$$X_i^R = \begin{cases} X_i & \text{if } \|X_i\| \leq R, \\ Y_i & \text{if } \|X_i\| > R. \end{cases}$$
Lemma A.4 Let $\eta \in\, ]0, 1[$, $\varepsilon, \theta > 0$, $\alpha_1 \in\, ]0, \lambda/2[$, and $\alpha \in\, ]\alpha_1, \lambda/2[$. Then,
for all $R > \max\!\big(\sqrt{1/(2\alpha)}, \sqrt{2\theta/\alpha_1}\big)$, we have
$$\mathbb{P}\left[ \rho(\mu_n, \mu) > \varepsilon \right] \leq \mathbb{P}\left[ \rho\big(\mu^R, \mu^R_n\big) > \eta\varepsilon - 2E_\alpha R e^{-\alpha R^2} \right] + \exp\left( -n\theta \left[ (1 - \eta)\varepsilon - E_\alpha e^{(\alpha_1 - \alpha) R^2} \right] \right).$$
Proof of Lemma A.4 For a fixed $\varepsilon > 0$, we bound $\mathbb{P}[\rho(\mu, \mu_n) > \varepsilon]$ in terms
of $\mu^R$ and $\mu^R_n$. First, following the arguments of the proof of Theorem
1.1 by Bolley, Guillin, and Villani [4] (step 1), it can be proven that for all
$\alpha < \lambda/2$ and $R \geq \sqrt{1/(2\alpha)}$,
$$\rho(\mu, \mu^R) \leq 2 E_\alpha R e^{-\alpha R^2}. \tag{A.1}$$
The conclusion follows from (A.1), (A.2), and the triangle inequality for
$\rho$.
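For the reader's convenience, the triangle inequality step alluded to here decomposes the distance as follows; this display is our reconstruction, with (A.1) controlling the first term and (A.2), the deviation bound for the truncated sample, controlling the last one:
$$\rho(\mu, \mu_n) \;\leq\; \rho\big(\mu, \mu^R\big) \;+\; \rho\big(\mu^R, \mu^R_n\big) \;+\; \rho\big(\mu^R_n, \mu_n\big).$$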
Lemma A.5 Given $\theta, \alpha, \alpha_1, \lambda_1 > 0$ such that $\lambda_1 < \lambda$, $\alpha \in\, ]\alpha_1, \lambda/2[$, and
$\zeta > 1$, there exist positive constants $\delta_1$, $\lambda_2 < \lambda_1$, $K_1$, $K_2$, and $K_3$ such that,
for all $R > \zeta \max\!\big(\sqrt{1/(2\alpha)}, \sqrt{2\theta/\alpha_1}\big)$ and $\varepsilon > 0$,
$$\mathbb{P}\left[ \rho(\mu, \mu_n) > \varepsilon \right] \leq N(\delta_1 \varepsilon/2, B_R) \exp\left( -n \left[ \frac{\lambda_2}{2} \varepsilon^2 - K_1 R^2 e^{-\alpha R^2} \right] \right) + \exp\left( -n \left[ K_2 \zeta \varepsilon - K_3 e^{(\alpha_1 - \alpha) R^2} \right] \right),$$
From now on, we consider that $\mathcal{P}(B_R)$ is equipped with the distance $\rho$. Consider
$\delta > 0$ and $A$ a measurable subset of $\mathcal{P}(B_R)$. We set $N^A = N(\delta/2, A)$.
Then there exist $N^A$ balls $B_i$, $i = 1, \ldots, N^A$, covering $A$. Each of these balls
is convex and included in the $\delta$-neighborhood $A^\delta$ of $A$. Moreover, by
assumption H2, the balls $B_i$ are totally bounded.
Define now
$$A = \left\{ \nu \in \mathcal{P}(B_R) : \rho(\nu, \mu^R) \geq \eta\varepsilon - 2E_\alpha R e^{-\alpha R^2} \right\}.$$
From this and equation (A.4) we conclude that
$$\mathbb{P}\left[ \rho\big(\mu^R, \mu^R_n\big) \geq \eta\varepsilon - 2E_\alpha R e^{-\alpha R^2} \right] \leq N^A \exp\left( -n \left[ \frac{\lambda_1}{2} m^2 - K R^2 e^{-\alpha R^2} \right] \right). \tag{A.6}$$
Now, given $\lambda_2 < \lambda_1$, it follows from (A.5) that there exist three positive
constants $\delta_1$, $\eta_1$, and $K_1$, depending only on $\alpha$, $\lambda_1$, and $\lambda_2$, such that
$$\frac{\lambda_1}{2} m^2 - K R^2 e^{-\alpha R^2} \geq \frac{\lambda_2}{2} \varepsilon^2 - K_1 R^2 e^{-\alpha R^2},$$
where $\delta = \delta_1 \varepsilon$. This leads, together with (A.6), to
$$\mathbb{P}\left[ \rho\big(\mu^R, \mu^R_n\big) \geq \eta\varepsilon - 2E_\alpha R e^{-\alpha R^2} \right] \leq N^A \exp\left( -n \left[ \frac{\lambda_2}{2} \varepsilon^2 - K_1 R^2 e^{-\alpha R^2} \right] \right). \tag{A.7}$$
To bound $N^A$, we observe that, since $A \subset \mathcal{P}(B_R)$, we have $N^A \leq N(\delta_1 \varepsilon/2, B_R)$.
On the other hand, let $\alpha' < \alpha_2 < \alpha_1$. We can choose $\zeta$ such that $K_2 \zeta = \alpha_2 \varepsilon$.
With this choice we obtain
$$\exp\left( -n \left[ K_2 \zeta \varepsilon - K_3 e^{(\alpha_1 - \alpha) R^2} \right] \right) = \exp\left( -n \left[ \alpha_2 \varepsilon^2 - K_3 e^{(\alpha_1 - \alpha) R^2} \right] \right),$$
which can be bounded by $\exp(-\alpha' n \varepsilon^2)$ for $R$ and $R^2/\ln(1/\varepsilon^2)$ large enough,
as desired.
From (A.10), we deduce that, for all $p \geq 1$,
$$\limsup_{n} D(\mu, \mathbf{y}_{k,n}^*) \leq \min_{z \in \{X_1, \ldots, X_p\}^k} D(\mu, z). \tag{A.11}$$
Let us now evaluate the limit of the right-hand side of (A.11) as $p \to \infty$.
Denote, for $\varepsilon > 0$ and $p \geq 1$, by $N(p, \varepsilon)$ the event
$$N(p, \varepsilon) = \left\{ \exists\, z^* \in \operatorname*{arg\,min}_{z \in \{X_1, \ldots, X_p\}^k} D(\mu, z) \cap B_{\mathcal{H}^k}(\mathbf{y}_k^*, \varepsilon) : D(\mu, z^*) \geq D(\mu, \mathbf{y}_k^*) + 2\varepsilon \right\}.$$
Then
$$\begin{aligned}
\mathbb{P}\Big[ \min_{z \in \{X_1, \ldots, X_p\}^k} D(\mu, z) - D(\mu, \mathbf{y}_k^*) > 2\varepsilon \Big]
&\leq \mathbb{P}\big[ N(p, \varepsilon) \big] + \mathbb{P}\big[ \forall\, z \in \{X_1, \ldots, X_p\}^k,\ z \notin B_{\mathcal{H}^k}(\mathbf{y}_k^*, \varepsilon) \big] \\
&\leq \mathbb{P}\big[ (X_1, \ldots, X_k) \notin B_{\mathcal{H}^k}(\mathbf{y}_k^*, \varepsilon) \big]^{\lfloor p/k \rfloor} \\
&= \Big( 1 - \mathbb{P}\big[ (X_1, \ldots, X_k) \in B_{\mathcal{H}^k}(\mathbf{y}_k^*, \varepsilon) \big] \Big)^{\lfloor p/k \rfloor},
\end{aligned} \tag{A.12}$$
where $\lfloor \cdot \rfloor$ stands for the integer part function. Then, by the Borel-Cantelli
lemma,
$$\lim_{p \to \infty} \min_{z \in \{X_1, \ldots, X_p\}^k} D(\mu, z) = D(\mu, \mathbf{y}_k^*) \quad \text{a.s.},$$
according to (A.9).
On the other hand,
$$\lim_{n \to \infty} D(\mu_n, \mathbf{y}_{k,n}^*) = D_k^*(\mu) \quad \text{a.s.}$$
Moreover,
$$D(\mu_n, \mathbf{y}_{k,n}^*) = \min_{z \in \{X_1, \ldots, X_n\}^k} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\ldots,k} \|X_i - z_j\|
\leq \frac{1}{n} \sum_{i=1}^{n} \|X_i - X_1\|
\leq \frac{1}{n} \sum_{i=1}^{n} \|X_i\| + \|X_1\|,$$
and
$$D(\mu_n, \mathbf{y}_{k,n}^*) - \min_{z \in \{X_1, \ldots, X_n\}^k} D(\mu, z) \leq \rho(\mu, \mu_n).$$
Thus,
$$D(\mu, \mathbf{y}_{k,n}^*) - D_k^*(\mu) \leq 2\rho(\mu, \mu_n) + \min_{z \in \{X_1, \ldots, X_n\}^k} D(\mu, z) - D_k^*(\mu). \tag{A.13}$$
We deduce that
$$\mathbb{E}\Big[ \min_{z \in \{X_1, \ldots, X_n\}^k} D(\mu, z) - D_k^*(\mu) \Big] \leq C\, \Gamma^{n},$$
where $\Gamma < 1$ and $C$ are some positive constants. Theorem 4.3 follows from
(A.13), Theorem 3.3, and Theorem 3.4.
References
[1] E. Abaya and G. Wise. Convergence of vector quantizers with applica-
tion to optimal quantization. SIAM Journal on Applied Mathematics,
44:183–189, 1984.
[13] J. A. Hartigan. Clustering Algorithms. John Wiley & Sons, New York-
London-Sydney, 1975. Wiley Series in Probability and Mathematical
Statistics.
[20] D. Pollard. A central limit theorem for k-means clustering. The Annals
of Probability, 10:919–926, 1982.
[23] A. W. Van der Vaart and J. A. Wellner. Weak Convergence and Empir-
ical Processes. Springer Series in Statistics. Springer-Verlag, New York,
1996. With applications to statistics.