
Based on the Appendix of the textbook

A Probabilistic Theory of Pattern Recognition


Luc Devroye

László Györfi

Gábor Lugosi

Edited by András Antos


[We handle only discrete random variables and joint distributions of finitely many of them.]

Appendix
In this appendix we summarize some basic definitions and results from the theory of probability. Most
proofs are omitted as they may be found in standard textbooks on probability, such as Feller [1], Ash [2],
Shiryayev [3], Chow and Teicher [4], Durrett [5], Grimmett and Stirzaker [6], and Zygmund [7]. We also give
a list of useful inequalities that are used in the text.

Basics of Measure Theory

Definition 1 Let $S$ be a set, and let $\mathcal{F}$ be the family of all subsets of $S$. Then $(S, \mathcal{F})$ is called a measurable space. The subsets of $S$ are called measurable sets.
Definition 2 Let $(S, \mathcal{F})$ be a measurable space and let $\mu : \mathcal{F} \to [0, \infty)$ be a function. $\mu$ is a measure on $\mathcal{F}$ if

(i) $\mu(\emptyset) = 0$,

(ii) $\mu$ is $\sigma$-additive, that is, $A_1, A_2, \ldots \in \mathcal{F}$ and $A_i \cap A_j = \emptyset$, $i \neq j$, imply that $\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)$.

In other words, a measure is a nonnegative, $\sigma$-additive set function.

Definition 3 The triple $(S, \mathcal{F}, \mu)$ is a measure space if $(S, \mathcal{F})$ is a measurable space and $\mu$ is a measure on $\mathcal{F}$.
Definition 4 Let $\mu_1$ and $\mu_2$ be measures on the measurable spaces $(S_1, \mathcal{F}_1)$ and $(S_2, \mathcal{F}_2)$, respectively. Let $(S, \mathcal{F})$ be a measurable space such that $S = S_1 \times S_2$. $\mu$ is called the product measure of $\mu_1$ and $\mu_2$ on $\mathcal{F}$ if for $F_1 \in \mathcal{F}_1$ and $F_2 \in \mathcal{F}_2$, $\mu(F_1 \times F_2) = \mu_1(F_1)\,\mu_2(F_2)$. The product of more than two measures can be defined similarly.

Probability

Definition 5 A (countable) measure space $(\Omega, \mathcal{F}, P)$ is called a probability space if $P\{\Omega\} = 1$. $\Omega$ is the sample space or sure event, the measurable sets are called events, and the $\Omega \to \mathbb{R}$ functions are called (discrete) random variables. If $X_1, \ldots, X_n$ are random variables, then $X = (X_1, \ldots, X_n)$ is a vector-valued random variable.
Definition 6 Let $X$ be a random variable; then $X$ induces the measure $\mu$ on the subsets of $\mathbb{R}$ by
$$\mu(B) = P\{\{\omega : X(\omega) \in B\}\} = P\{X \in B\}, \qquad B \subseteq \mathbb{R}.$$
The probability measure $\mu$ is called the distribution of the random variable $X$.


Definition 7 Let $X$ be a random variable. The expectation of $X$ is
$$E\{X\} = \sum_{x} x\,P\{X = x\} = \sum_{x > 0} x\,P\{X = x\} + \sum_{x < 0} x\,P\{X = x\},$$
if at least one term on the right-hand side is finite.


Definition 8 Let $X$ be a random variable. The variance of $X$ is
$$\mathrm{Var}\{X\} = E\left\{(X - E\{X\})^2\right\}$$
if $E\{X\}$ is finite, and $\mathrm{Var}\{X\} = \infty$ if $E\{X\}$ is not finite or does not exist.
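As a concrete illustration (not part of the original appendix), the following minimal Python sketch computes the expectation and variance of a discrete random variable given by its probability mass function; the fair-die distribution is an arbitrary example.

```python
# Expectation and variance of a discrete random variable given by its
# probability mass function (here: a fair six-sided die, chosen only
# as an illustrative example).
pmf = {x: 1/6 for x in range(1, 7)}

# E{X} = sum over x of x * P{X = x}   (Definition 7)
mean = sum(x * p for x, p in pmf.items())

# Var{X} = E{(X - E{X})^2}            (Definition 8)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, var)  # 3.5 and approximately 2.9167
```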

Definition 9 Let $X_1, \ldots, X_n$ be random variables. They induce the measure $\mu^{(n)}$ on the subsets of $\mathbb{R}^n$ with the property
$$\mu^{(n)}(B) = P\{\{\omega : (X_1(\omega), \ldots, X_n(\omega)) \in B\}\}, \qquad B \subseteq \mathbb{R}^n.$$
$\mu^{(n)}$ is called the joint distribution of the random variables $X_1, \ldots, X_n$. Let $\mu_i$ be the distribution of $X_i$ ($i = 1, \ldots, n$). The random variables $X_1, \ldots, X_n$ are independent if their joint distribution $\mu^{(n)}$ is the product measure of $\mu_1, \ldots, \mu_n$. The events $A_1, \ldots, A_n \in \mathcal{F}$ are independent if the random variables $I_{A_1}, \ldots, I_{A_n}$ are independent.
Theorem 1 If the random variables $X_1, \ldots, X_n$ are independent and have finite expectations, then
$$E\{X_1 X_2 \cdots X_n\} = E\{X_1\}\, E\{X_2\} \cdots E\{X_n\}.$$
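As an illustrative check (not from the text), the following Python snippet estimates $E\{XY\}$ and $E\{X\}E\{Y\}$ by simulation for two independent random variables; the particular distributions and sample size are arbitrary.

```python
import random

random.seed(0)
n = 200_000

# Two independent random variables: X uniform on {1,...,6},
# Y a Bernoulli(0.3) indicator.
samples = [(random.randint(1, 6), 1 if random.random() < 0.3 else 0)
           for _ in range(n)]

e_xy = sum(x * y for x, y in samples) / n
e_x = sum(x for x, _ in samples) / n
e_y = sum(y for _, y in samples) / n

# By Theorem 1, E{XY} should be close to E{X} E{Y} (3.5 * 0.3 = 1.05).
print(e_xy, e_x * e_y)
```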

Inequalities

Theorem 2 (Cauchy-Schwarz inequality). If the random variables $X$ and $Y$ have finite second moments ($E\{X^2\} < \infty$ and $E\{Y^2\} < \infty$), then
$$|E\{XY\}| \le \sqrt{E\{X^2\}\, E\{Y^2\}}.$$

Theorem 3 (Markov's inequality). Let $X$ be a nonnegative-valued random variable. Then for each $t > 0$,
$$P\{X \ge t\} \le \frac{E\{X\}}{t}.$$

Theorem 4 (Chebyshev's inequality). Let $X$ be a random variable. Then for each $t > 0$,
$$P\{|X - E\{X\}| \ge t\} \le \frac{\mathrm{Var}\{X\}}{t^2}.$$

Theorem 5 (Jensen's inequality). If $f$ is a real-valued convex function on a finite or infinite interval of $\mathbb{R}$, and $X$ is a random variable with finite expectation, taking its values in this interval, then
$$f(E\{X\}) \le E\{f(X)\}.$$
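These inequalities are easy to check numerically. The sketch below (an illustrative Python snippet, not part of the original appendix) verifies Markov, Chebyshev, and Jensen (with the convex function $f(x) = x^2$) on simulated draws of a geometric random variable, chosen arbitrarily.

```python
import random

random.seed(1)
n = 100_000

# Geometric random variable on {1, 2, ...} with success probability 0.4,
# used here only to illustrate the inequalities.
def geometric(p=0.4):
    k = 1
    while random.random() >= p:
        k += 1
    return k

xs = [geometric() for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

t = 5.0
# Markov: P{X >= t} <= E{X} / t  (X is nonnegative).
print(sum(x >= t for x in xs) / n, "<=", mean / t)
# Chebyshev: P{|X - E{X}| >= t} <= Var{X} / t^2.
print(sum(abs(x - mean) >= t for x in xs) / n, "<=", var / t ** 2)
# Jensen with the convex function f(x) = x^2: f(E{X}) <= E{f(X)}.
print(mean ** 2, "<=", sum(x * x for x in xs) / n)
```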

Convergence of Random Variables

Definition 10 Let $\{X_n\}$, $n = 1, 2, \ldots$, be a sequence of random variables. We say that
$$\lim_{n \to \infty} X_n = X \quad \text{in probability}$$
if for each $\epsilon > 0$
$$\lim_{n \to \infty} P\{|X_n - X| > \epsilon\} = 0.$$
We say that
$$\lim_{n \to \infty} X_n = X \quad \text{with probability one (or almost surely)}$$
if
$$P\left\{\omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\} = 1.$$
For a fixed number $p \ge 1$ we say that
$$\lim_{n \to \infty} X_n = X \quad \text{in } L_p$$
if
$$\lim_{n \to \infty} E\{|X_n - X|^p\} = 0.$$

Theorem 6 Convergence in Lp implies convergence in probability.


Theorem 7 $\lim_{n \to \infty} X_n = X$ with probability one if and only if
$$\lim_{n \to \infty} \sup_{m \ge n} |X_m - X| = 0$$
in probability. Thus, convergence with probability one implies convergence in probability.
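For example, by Chebyshev's inequality the average of $n$ independent coin flips converges in probability to $p$ (the weak law of large numbers). The sketch below (an illustration, not from the appendix; sample sizes and $\epsilon$ are arbitrary) estimates $P\{|\bar{X}_n - p| \ge \epsilon\}$ for growing $n$ and shows it tending to zero.

```python
import random

random.seed(2)
p, eps = 0.5, 0.05
trials = 2000

# Empirical P{|mean of n coin flips - p| >= eps} for growing n; this
# probability should shrink toward 0, i.e., the sample mean converges
# to p in probability.
for n in (10, 100, 1000):
    bad = sum(
        abs(sum(random.random() < p for _ in range(n)) / n - p) >= eps
        for _ in range(trials)
    )
    print(n, bad / trials)
```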

Conditional Expectation

If Y is a random variable with finite expectation and A is an event with positive probability, then the conditional expectation of Y given A is defined by
$$E\{Y|A\} = \frac{E\{Y I_A\}}{P\{A\}}.$$

The conditional probability of an event B given A is


$$P\{B|A\} = E\{I_B|A\} = \frac{P\{A \cap B\}}{P\{A\}}.$$

Definition 11 Let $Y$ be a random variable with finite expectation and $X$ be a $d$-dimensional vector-valued random variable. For $x \in \mathbb{R}^d$ such that $P\{X = x\} > 0$, let
$$g(x) = E\{Y|X = x\} = \frac{E\{Y I_{\{X = x\}}\}}{P\{X = x\}}.$$
The conditional expectation $E\{Y|X\}$ of $Y$ given $X$ is a random variable with the property that $E\{Y|X\} = g(X)$ with probability one.
Definition 12 Let $C$ be an event and $X$ be a $d$-dimensional vector-valued random variable. Then the conditional probability of $C$ given $X$ is $P\{C|X\} = E\{I_C|X\}$.
Theorem 8 Let Y be a random variable with finite expectation. Let C be an event, and let X and Z be
vector-valued random variables. Then
(i)
(ii) $E\{Y\} = E\{E\{Y|X\}\}$, $P\{C\} = E\{P\{C|X\}\}$.
(iii) $E\{Y|X\} = E\{E\{Y|X, Z\}|X\}$, $P\{C|X\} = E\{P\{C|X, Y\}|X\}$.
(iv) If $Y$ is a function of $X$, then $E\{Y|X\} = Y$.
(v) If $(Y, X)$ and $Z$ are independent, then $E\{Y|X, Z\} = E\{Y|X\}$.
(vi) If $Y = f(X, Z)$ for a function $f$, and $X$ and $Z$ are independent, then $E\{Y|X\} = g(X)$, where $g(x) = E\{f(x, Z)\}$.
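To make Definition 11 and Theorem 8 (ii) concrete, here is a minimal Python sketch (the joint distribution is a made-up example, not from the text): it computes $g(x) = E\{Y|X = x\}$ from a joint pmf and checks that $E\{E\{Y|X\}\}$ recovers $E\{Y\}$.

```python
# Joint pmf of (X, Y) given as {(x, y): probability}; this particular
# table is a hypothetical example, for illustration only.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

def p_x(x):
    # Marginal probability P{X = x}.
    return sum(p for (xx, _), p in joint.items() if xx == x)

def g(x):
    # g(x) = E{Y | X = x} = E{Y I_{X=x}} / P{X = x}   (Definition 11)
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x(x)

e_y = sum(y * p for (_, y), p in joint.items())   # E{Y}
e_g = sum(g(x) * p_x(x) for x in (0, 1))          # E{E{Y|X}}
print(g(0), g(1))   # 0.75 and 0.333...
print(e_y, e_g)     # both 0.5, as Theorem 8 (ii) requires
```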

The Binomial Distribution

An integer-valued random variable $X$ is said to be binomially distributed with parameters $n$ and $p$ if
$$P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n.$$
If $A_1, \ldots, A_n$ are independent events with $P\{A_i\} = p$, then $X = \sum_{i=1}^{n} I_{A_i}$ is binomial $(n, p)$. $I_{A_i}$ is called a Bernoulli random variable with parameter $p$.
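The sketch below (illustrative Python, with arbitrary parameters $n = 10$, $p = 0.3$) draws $X$ as a sum of independent Bernoulli indicators and compares the empirical frequencies with the binomial probabilities above.

```python
import math
import random
from collections import Counter

random.seed(3)
n, p, trials = 10, 0.3, 50_000

# X = sum of n independent Bernoulli(p) indicators is binomial(n, p).
counts = Counter(
    sum(random.random() < p for _ in range(n)) for _ in range(trials)
)

for k in range(n + 1):
    pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    print(k, counts[k] / trials, round(pmf, 4))
```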

The Multinomial Distribution

A vector $(N_1, \ldots, N_k)$ of integer-valued random variables is multinomially distributed with parameters $(n, p_1, \ldots, p_k)$ if
$$P\{N_1 = i_1, \ldots, N_k = i_k\} = \begin{cases} \dfrac{n!}{i_1! \cdots i_k!}\, p_1^{i_1} \cdots p_k^{i_k} & \text{if } \sum_{j=1}^{k} i_j = n,\ i_j \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

References
[1] W. Feller. An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley, New York, 1968.
[2] R. B. Ash. Real Analysis and Probability. Academic Press, New York, 1972.
[3] A. N. Shiryayev. Probability. Springer-Verlag, New York, 1984.
[4] Y. S. Chow and H. Teicher. Probability Theory: Independence, Interchangeability, Martingales. Springer Texts in Statistics. Springer-Verlag, New York, first edition, 1978.
[5] R. Durrett. Probability: Theory and Examples. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1991.
[6] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford University Press, Oxford, 1992.
[7] A. Zygmund. Trigonometric Series I. Cambridge University Press, Cambridge, 1959.
