A Probabilistic Theory of Pattern Recognition: Based on the Appendix of the Textbook
László Györfi
Gábor Lugosi
Appendix
In this appendix we summarize some basic definitions and results from the theory of probability. Most
proofs are omitted as they may be found in standard textbooks on probability, such as Feller [1], Ash [2],
Shiryayev [3], Chow and Teicher [4], Durrett [5], Grimmett and Stirzaker [6], and Zygmund [7]. We also give
a list of useful inequalities that are used in the text.
Definition 1 Let $S$ be a set, and let $\mathcal{F}$ be the family of all subsets of $S$. Then $(S, \mathcal{F})$ is called a measurable space. The subsets of $S$ are called measurable sets.
Definition 2 Let $(S, \mathcal{F})$ be a measurable space and let $\mu : \mathcal{F} \to [0, \infty)$ be a function. $\mu$ is a measure on $\mathcal{F}$ if
(i) $\mu(\emptyset) = 0$,
(ii) $\mu$ is $\sigma$-additive, that is, $A_1, A_2, \ldots \in \mathcal{F}$ and $A_i \cap A_j = \emptyset$ for $i \neq j$ imply that
$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i).$$
In other words, a measure is a nonnegative, $\sigma$-additive set function.
Definition 3 The triple $(S, \mathcal{F}, \mu)$ is a measure space if $(S, \mathcal{F})$ is a measurable space and $\mu$ is a measure on $\mathcal{F}$.
Definition 4 Let $\mu_1$ and $\mu_2$ be measures on the measurable spaces $(S_1, \mathcal{F}_1)$ and $(S_2, \mathcal{F}_2)$, respectively. Let $(S, \mathcal{F})$ be a measurable space such that $S = S_1 \times S_2$. $\mu$ is called the product measure of $\mu_1$ and $\mu_2$ on $\mathcal{F}$ if for $F_1 \in \mathcal{F}_1$ and $F_2 \in \mathcal{F}_2$, $\mu(F_1 \times F_2) = \mu_1(F_1)\mu_2(F_2)$. The product of more than two measures can be defined similarly.
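To make the product measure concrete, here is a minimal Python sketch (illustrative, not from the text): two finite measures are stored as dictionaries mapping points to masses, and the rectangle identity $\mu(F_1 \times F_2) = \mu_1(F_1)\mu_2(F_2)$ is checked numerically. The particular measures mu1 and mu2 are arbitrary choices.

```python
# A minimal sketch (illustrative, not from the text): the product of two
# finite measures, each represented as a dict mapping points to masses.

def product_measure(mu1, mu2):
    """Return the product measure on S1 x S2 as a dict over pairs."""
    return {(s1, s2): m1 * m2
            for s1, m1 in mu1.items()
            for s2, m2 in mu2.items()}

mu1 = {"a": 0.5, "b": 0.5}   # an arbitrary measure on S1 = {a, b}
mu2 = {0: 0.2, 1: 0.8}       # an arbitrary measure on S2 = {0, 1}
mu = product_measure(mu1, mu2)

# For a rectangle F1 x F2, mu(F1 x F2) = mu1(F1) * mu2(F2):
F1, F2 = {"a"}, {0, 1}
lhs = sum(m for (s1, s2), m in mu.items() if s1 in F1 and s2 in F2)
rhs = sum(mu1[s] for s in F1) * sum(mu2[s] for s in F2)
assert abs(lhs - rhs) < 1e-12   # 0.5 * 1.0 = 0.5
```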
Probability
Definition 5 A (countable) measure space $(\Omega, \mathcal{F}, P)$ is called a probability space if $P\{\Omega\} = 1$. $\Omega$ is the sample space or sure event, the measurable sets are called events, and the functions $X : \Omega \to \mathbf{R}$ are called (discrete) random variables. If $X_1, \ldots, X_n$ are random variables, then $X = (X_1, \ldots, X_n)$ is a vector-valued random variable.
Definition 6 Let $X$ be a random variable. Then $X$ induces the measure $\mu$ on the subsets of $\mathbf{R}$ by
$$\mu(B) = P\{\omega : X(\omega) \in B\} = P\{X \in B\}, \qquad B \subseteq \mathbf{R}.$$
$\mu$ is called the distribution of $X$.
Definition 9 Let $X_1, \ldots, X_n$ be random variables. They induce the measure $\mu^{(n)}$ on the subsets of $\mathbf{R}^n$ with the property
$$\mu^{(n)}(B) = P\{\omega : (X_1(\omega), \ldots, X_n(\omega)) \in B\}, \qquad B \subseteq \mathbf{R}^n.$$
$\mu^{(n)}$ is called the joint distribution of the random variables $X_1, \ldots, X_n$. Let $\mu_i$ be the distribution of $X_i$ ($i = 1, \ldots, n$). The random variables $X_1, \ldots, X_n$ are independent if their joint distribution $\mu^{(n)}$ is the product measure of $\mu_1, \ldots, \mu_n$. The events $A_1, \ldots, A_n \in \mathcal{F}$ are independent if the indicator random variables $I_{A_1}, \ldots, I_{A_n}$ are independent.
Theorem 1 If the random variables $X_1, \ldots, X_n$ are independent and have finite expectations, then
$$E\{X_1 X_2 \cdots X_n\} = E\{X_1\} E\{X_2\} \cdots E\{X_n\}.$$
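As an illustration of Theorem 1 (a simulation sketch, not from the text), the following code compares the sample mean of $X_1 X_2$ with the product of the individual sample means for two independent random variables; the uniform and Gaussian distributions below are arbitrary choices.

```python
# Monte Carlo check (illustrative): for independent X1, X2, the sample
# mean of X1*X2 should approach E{X1}E{X2}.
import random

random.seed(0)
n = 200_000
x1 = [random.uniform(0, 1) for _ in range(n)]   # E{X1} = 0.5
x2 = [random.gauss(2, 1) for _ in range(n)]     # E{X2} = 2.0

mean_of_product = sum(a * b for a, b in zip(x1, x2)) / n
product_of_means = (sum(x1) / n) * (sum(x2) / n)
print(mean_of_product, product_of_means)        # both close to 1.0
```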
Inequalities

Theorem (Markov's inequality) Let $X$ be a nonnegative random variable. Then for every $t > 0$,
$$P\{X \geq t\} \leq \frac{E\{X\}}{t}.$$

Theorem (Chebyshev's inequality) Let $X$ be a random variable with finite variance. Then for every $t > 0$,
$$P\{|X - E\{X\}| \geq t\} \leq \frac{\mathrm{Var}\{X\}}{t^2}.$$
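The following sketch (illustrative only) compares the empirical tail probability $P\{X \geq t\}$ of an exponential random variable with the Markov bound $E\{X\}/t$; the exponential distribution and the values of $t$ are arbitrary choices.

```python
# Empirical check of Markov's inequality for a nonnegative random variable.
import random

random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]   # exponential, E{X} = 1
mean = sum(xs) / n

for t in (1.0, 2.0, 4.0):
    tail = sum(x >= t for x in xs) / n             # empirical P{X >= t}
    print(f"t={t}: P(X >= t) ~ {tail:.4f} <= bound {mean / t:.4f}")
```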
Convergence

We say that $\lim_{n \to \infty} X_n = X$ with probability one (or almost surely) if
$$P\left\{\omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\} = 1.$$
We say that $\lim_{n \to \infty} X_n = X$ in probability if for every $\epsilon > 0$,
$$\lim_{n \to \infty} P\{|X_n - X| > \epsilon\} = 0.$$
We say that $\lim_{n \to \infty} X_n = X$ in $L_p$ if
$$\lim_{n \to \infty} E\{|X_n - X|^p\} = 0.$$
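As a sketch of convergence in probability (not from the text), the simulation below estimates $P\{|\bar{X}_n - 1/2| > \epsilon\}$ for the sample mean of i.i.d. uniform random variables; by the law of large numbers this probability shrinks as $n$ grows. The values of eps, trials, and n are arbitrary choices.

```python
# Sample means of i.i.d. Uniform(0,1) variables converge to 1/2 in
# probability: the exceedance probability shrinks as n grows.
import random

random.seed(0)
eps, trials = 0.05, 2_000

for n in (10, 100, 1_000):
    exceed = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        exceed += abs(xbar - 0.5) > eps
    print(f"n={n}: P(|Xbar - 0.5| > {eps}) ~ {exceed / trials:.3f}")
```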
Conditional Expectation
If $Y$ is a random variable with finite expectation and $A$ is an event with positive probability, then the conditional expectation of $Y$ given $A$ is defined by
$$E\{Y|A\} = \frac{E\{Y I_A\}}{P\{A\}}.$$
In particular, the conditional probability of an event $B$ given $A$ is
$$P\{B|A\} = \frac{P\{A \cap B\}}{P\{A\}}.$$
Definition 11 Let $Y$ be a random variable with finite expectation and $X$ be a $d$-dimensional vector-valued random variable. For $x \in \mathbf{R}^d$ such that $P\{X = x\} > 0$, let
$$g(x) = E\{Y|X = x\} = \frac{E\{Y I_{\{X = x\}}\}}{P\{X = x\}}.$$
The conditional expectation $E\{Y|X\}$ of $Y$ given $X$ is a random variable with the property that $E\{Y|X\} = g(X)$ with probability one.
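The ratio in Definition 11 suggests a direct empirical estimate of $g(x)$: average $Y$ over the samples with $X = x$. The sketch below (illustrative, not from the text) does exactly this for an arbitrary discrete joint distribution in which $E\{Y|X = x\} = x$ by construction.

```python
# Estimate g(x) = E{Y | X = x} as E{Y I_{X=x}} / P{X = x}, both computed
# from empirical frequencies.
import random

random.seed(0)
n = 100_000
samples = []
for _ in range(n):
    x = random.randint(0, 2)       # X uniform on {0, 1, 2}
    y = x + random.gauss(0, 1)     # Y = X + noise, so E{Y|X=x} = x
    samples.append((x, y))

for x0 in (0, 1, 2):
    num = sum(y for x, y in samples if x == x0) / n   # ~ E{Y I_{X=x0}}
    den = sum(1 for x, _ in samples if x == x0) / n   # ~ P{X = x0}
    print(f"g({x0}) ~ {num / den:.3f}")               # close to x0
```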
Definition 12 Let $C$ be an event and $X$ be a $d$-dimensional vector-valued random variable. Then the conditional probability of $C$ given $X$ is $P\{C|X\} = E\{I_C|X\}$.
Theorem 8 Let $Y$ be a random variable with finite expectation. Let $C$ be an event, and let $X$ and $Z$ be vector-valued random variables. Then
(i)
(ii) $E\{Y\} = E\{E\{Y|X\}\}$ and $P\{C\} = E\{P\{C|X\}\}$.
(iii) $E\{Y|X\} = E\{E\{Y|X, Z\}|X\}$ and $P\{C|X\} = E\{P\{C|X, Z\}|X\}$.
(iv) If $Y$ is a function of $X$, then $E\{Y|X\} = Y$.
(v) If $(Y, X)$ and $Z$ are independent, then $E\{Y|X, Z\} = E\{Y|X\}$.
(vi) If $Y = f(X, Z)$ for a function $f$, and $X$ and $Z$ are independent, then $E\{Y|X\} = g(X)$, where $g(x) = E\{f(x, Z)\}$.
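Property (vi) lends itself to a quick simulation check. In the sketch below (not from the text) the function f and the distributions of X and Z are arbitrary illustrative choices; the empirical conditional mean of $f(X, Z)$ given $X = x$ is compared with an estimate of $g(x) = E\{f(x, Z)\}$.

```python
# Check of property (vi): with X and Z independent and Y = f(X, Z),
# E{Y | X = x} should equal g(x) = E{f(x, Z)}.
import random

random.seed(0)

def f(x, z):
    """An arbitrary example function; here g(x) = E{f(x, Z)} = 2x."""
    return x * (1.0 + z * z)

n = 200_000
zs = [random.gauss(0, 1) for _ in range(n)]      # Z ~ N(0, 1)
xs = [random.choice([1, 2]) for _ in range(n)]   # X independent of Z

for x0 in (1, 2):
    vals = [f(x, z) for x, z in zip(xs, zs) if x == x0]
    lhs = sum(vals) / len(vals)                  # empirical E{Y | X = x0}
    rhs = sum(f(x0, z) for z in zs) / n          # estimate of g(x0)
    print(f"x={x0}: E{{Y|X=x}} ~ {lhs:.3f}, g(x) ~ {rhs:.3f}")
```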
A vector $(N_1, \ldots, N_k)$ of integer-valued random variables is multinomially distributed with parameters $(n, p_1, \ldots, p_k)$ if
$$P\{N_1 = i_1, \ldots, N_k = i_k\} = \begin{cases} \dfrac{n!}{i_1! \cdots i_k!}\, p_1^{i_1} \cdots p_k^{i_k} & \text{if } \sum_{j=1}^{k} i_j = n \text{ and } i_j \geq 0, \\[4pt] 0 & \text{otherwise.} \end{cases}$$
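To connect the formula with sampling, the sketch below (standard library only, not from the text) draws multinomial vectors by classifying $n$ independent trials into $k$ cells and compares empirical frequencies with the probability mass function above; the parameters n and probs and the helper draw are illustrative assumptions.

```python
# Empirical multinomial frequencies versus the pmf above.
import math
import random
from collections import Counter

random.seed(0)
n, probs = 5, [0.2, 0.3, 0.5]
trials = 200_000

def draw(n, probs):
    """One multinomial draw: n independent trials classified into cells."""
    cells = random.choices(range(len(probs)), weights=probs, k=n)
    counts = [0] * len(probs)
    for c in cells:
        counts[c] += 1
    return tuple(counts)

def pmf(ii, n, probs):
    """P{N1 = i1, ..., Nk = ik} from the displayed formula."""
    coef = math.factorial(n) / math.prod(math.factorial(i) for i in ii)
    return coef * math.prod(p ** i for p, i in zip(probs, ii))

freq = Counter(draw(n, probs) for _ in range(trials))
for ii in [(1, 1, 3), (0, 2, 3), (5, 0, 0)]:
    print(ii, round(freq[ii] / trials, 4), round(pmf(ii, n, probs), 4))
```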
References
[1] W. Feller. An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley, New York, 1968.
[2] R. B. Ash. Real Analysis and Probability. Academic Press, New York, 1972.
[3] A. N. Shiryayev. Probability. Springer-Verlag, New York, 1984.
[4] Y. S. Chow and H. Teicher. Probability Theory: Independence, Interchangeability, Martingales. Springer Texts in Statistics. Springer-Verlag, New York, first edition, 1978.
[5] R. Durrett. Probability: Theory and Examples. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1991.
[6] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford University Press, Oxford, 1992.
[7] A. Zygmund. Trigonometric Series I. Cambridge University Press, Cambridge, 1959.