CS434a/541a: Pattern Recognition Prof. Olga Veksler
Lecture 1
Outline of the lecture
Syllabus
Introduction to Pattern Recognition
Review of Probability/Statistics
Syllabus
Prerequisite
Analysis of algorithms (CS 340a/b)
First-year course in Calculus
Introductory Statistics (Stats 222a/b or equivalent; will review)
Linear Algebra (040a/b)
Grading
Midterm 30%
Assignments 30%
Final Project 40%
Syllabus
Assignments
bi-weekly
theoretical or programming in Matlab or C
no extensive programming
may include extra credit work
may discuss but work individually
due at the beginning of class
Midterm
open anything
roughly on November 8
Syllabus
Final project
Choose from the list of topics or design your own
May work in a group of 2, in which case the project is expected to be more extensive
5 to 8 page report
proposals due roughly November 1
due December 8
Intro to Pattern Recognition
Outline
What is pattern recognition?
Some applications
Our toy example
Structure of a pattern recognition system
Design stages of a pattern recognition system
What is Pattern Recognition?
Informally
Recognize patterns in data
More formally
Assign an object or an event to one of several pre-specified categories (a category is usually called a class)
[Examples: tea cup, face, phone]
Application: male or female?
[Figure: objects (pictures of people) go into a perfect PR system, which outputs the classes male and female]
Application: photograph or not?
[Figure: a perfect PR system assigns each input image to the class photograph or not photograph]
Application: Character Recognition
[Figure: character images go into a perfect PR system, which outputs the text "hello world"]
Application: Medical diagnostics
[Figure: patient data go into a perfect PR system, which outputs a diagnosis class]
Application: speech understanding
Application: Loan applications
classes: approve, deny
objects (people), described by features:

name       income  debt    married  age
Susan Ho   0       20,000  no       25
Our Toy Application: fish sorting
[Figure: fish on a conveyor pass under a camera; the fish image goes to a classifier, which outputs the fish species (salmon or sea bass) and routes each fish to the correct sorting chamber]
How to design a PR system?
Collect data (training data) and classify by hand
[Figure: six hand-labeled training fish: salmon, sea bass, salmon, salmon, sea bass, sea bass]
[Histogram: Count vs. Length (2 to 16) for salmon and sea bass in the training data]
Fish length as discriminating feature
Find the best length threshold L:
classify fish with length < L as one species and fish with length > L as the other
even the best threshold misclassifies 17 of the 50 training fish: error = 17/50 = 34%
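A minimal sketch of this threshold search in Python (the lengths below are hypothetical stand-ins for the 50-fish training set, and we assume salmon are the shorter class, which the slides do not state):

```python
import numpy as np

# Hypothetical hand-labeled lengths (stand-ins for the 50-fish training set).
salmon = np.array([4, 5, 5, 6, 7, 7, 8, 9])
bass   = np.array([6, 8, 9, 10, 10, 11, 12, 14])

def best_threshold(salmon, bass):
    """Try each candidate threshold L; classify length < L as salmon,
    length >= L as sea bass; keep the L with the fewest training errors."""
    best_L, best_err = None, np.inf
    for L in np.unique(np.concatenate([salmon, bass])):
        errors = np.sum(salmon >= L) + np.sum(bass < L)
        if errors < best_err:
            best_L, best_err = L, errors
    return best_L, best_err / (len(salmon) + len(bass))

L, err = best_threshold(salmon, bass)
print(f"best threshold L = {L}, training error = {err:.0%}")
```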
Fish Length as discriminating feature
[Histogram: Count vs. Length with the chosen threshold marked; fish to its left are classified as salmon, fish to its right as sea bass]
Lesson learned:
Length is a poor feature alone!
What to do?
Try another feature
Salmon tends to be lighter
Try average fish lightness
Fish lightness as discriminating feature
lightness    1   2   3   4   5
sea bass     0   1   2  10  12
salmon       6  10   6   1   0
[Histogram: Count vs. Lightness for salmon and sea bass, plotting the counts in the table above]
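Reading the best threshold straight off the table above (a quick sketch; it assumes we classify low-lightness fish as salmon, as the table suggests), the best split misclassifies only 4 of the 48 fish, about 8%, far better than length alone:

```python
import numpy as np

# Counts from the table above; index i corresponds to lightness bin i+1.
bass   = np.array([0, 1, 2, 10, 12])
salmon = np.array([6, 10, 6, 1, 0])
total  = bass.sum() + salmon.sum()  # 48 fish

# Split after bin s: lightness <= s -> salmon, lightness > s -> sea bass.
for s in range(1, 5):
    errors = bass[:s].sum() + salmon[s:].sum()
    print(f"split after bin {s}: {errors}/{total} = {errors / total:.1%}")
```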
Next, combine two features: lightness and length
[Figure: feature space with length and lightness axes; a linear decision boundary separates the salmon and sea bass decision regions]
Better decision boundary
[Figure: a more complex decision boundary in the lightness vs. length feature space that separates the training data perfectly]
What Went Wrong?
The decision boundary was too complicated: it fits the training data perfectly but generalizes poorly
Generalization
[Figure: the classifier must work well not just on the training data but also on new testing data]

Design Cycle
[Flowchart: start → collect data → choose features → choose model → train classifier → evaluate classifier → end; prior knowledge informs the choice of features and model]
Design Cycle cont.
Collect data
Can be quite costly
How do we know when we have collected an adequately representative set of testing and training examples?
Design Cycle cont.
Choose features
Should be discriminating, i.e. similar for objects in the same category, different for objects in different categories
Design Cycle cont.
Trade-off between computational complexity and performance
Conclusion
useful: a lot of exciting and important applications
but hard: many issues must be solved for a successful pattern recognition system
Review: mostly probability and some statistics
Content
Probability
Axioms and properties
Conditional probability and independence
Law of Total probability and Bayes theorem
Random Variables
Discrete
Continuous
Pairs of Random Variables
Random Vectors
Gaussian Random Variable
Basics
We are performing a random experiment (catching one fish from the sea)
S: all fish in the sea (the sample space)
an event A is a subset of S; with 12 fish in S, the total number of events is 2^12
probability P is a function that assigns a number P(A) to every event A in S
Axioms of Probability
1. P (A) ≥ 0
2. P (S ) = 1
3. If A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B)
Properties of Probability
P(∅) = 0
P(A) ≤ 1
P(Aᶜ) = 1 − P(A)
if A ⊂ B then P(A) ≤ P(B)
Conditional Probability
suppose event B occurred; the conditional probability of A given B is
P(A|B) = P(A ∩ B) / P(B)
multiplication rule: P(A ∩ B) = P(A|B) P(B)
Independence
A and B are independent events if
P(A ∩ B) = P(A) P(B)

Law of Total Probability
let B1, B2, B3, B4 be a partition of S; then
A = (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ (A ∩ B4)
thus P(A) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3) + P(A ∩ B4)
or, using the multiplication rule:
P(A) = P(A|B1)P(B1) + ⋯ + P(A|B4)P(B4)
in general, for a partition B1, …, Bn:
P(A) = Σ_{k=1}^{n} P(A|Bk) P(Bk)
Bayes Theorem
Let B1, B2, …, Bn be a partition of the sample space S. Suppose event A occurs. What is the probability of event Bi?
Answer: Bayes Rule
P(Bi|A) = P(Bi ∩ A) / P(A) = P(A|Bi) P(Bi) / Σ_{k=1}^{n} P(A|Bk) P(Bk)
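A tiny numeric sketch of the law of total probability and Bayes rule together (the priors and likelihoods below are made up):

```python
import numpy as np

# Hypothetical priors P(Bk) over a 3-way partition, and likelihoods P(A|Bk).
prior      = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.9, 0.4, 0.1])

# Law of total probability: P(A) = sum_k P(A|Bk) P(Bk)
p_a = np.sum(likelihood * prior)

# Bayes rule: P(Bk|A) = P(A|Bk) P(Bk) / P(A)
posterior = likelihood * prior / p_a
print(p_a, posterior, posterior.sum())  # posterior sums to 1
```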
Random Variables
In a random experiment we usually assign some number to the outcome, for example, the number of fish fins
A random variable X is a function from the sample space S to the real numbers, e.g. X(ω) = the number of fins of fish ω
P(X = a) = P(ω ∈ S | X(ω) = a)
Two Types of Random Variables
discrete: X takes on only countably many values, e.g. the number of fins
continuous: X takes values in a continuous range, e.g. fish weight
Properties of CDF F(a) = P(X ≤ a)
1. F is non-decreasing
2. lim_{b→∞} F(b) = 1
3. lim_{b→−∞} F(b) = 0
Example: P(20 < X ≤ 30) = F(30) − F(20)
Discrete RV: Probability Mass Function
Given a discrete random variable X, we
define the probability mass function as
p(a) = P( X = a)
Satisfies all axioms of probability
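For instance, a made-up pmf and a check that it satisfies the axioms:

```python
# A made-up pmf for a discrete r.v. X, say the number of fins on a fish.
pmf = {2: 0.1, 3: 0.5, 4: 0.3, 5: 0.1}

assert all(p >= 0 for p in pmf.values())     # axiom 1: p(a) >= 0
assert abs(sum(pmf.values()) - 1) < 1e-12    # total probability is 1
print("P(X = 3) =", pmf[3])
```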
Continuous RV: Probability Density Function
X is a continuous r.v. if there exists a function f(x), the probability density function (pdf), such that F(a) = P(X ≤ a) = ∫_{−∞}^{a} f(x) dx
Properties of Probability Density Function
d/dx F(x) = f(x)
P(X = a) = ∫_{a}^{a} f(x) dx = 0
P(−∞ ≤ X ≤ ∞) = ∫_{−∞}^{∞} f(x) dx = 1
f(x) ≥ 0
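These properties can be checked numerically for a concrete density; a sketch using the standard Gaussian (any valid f would do):

```python
import numpy as np

def f(x):
    """Standard Gaussian density, used here as a concrete example of a pdf."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]

print((f(x) * dx).sum())   # ~1: f integrates to 1 over the real line
print(np.all(f(x) >= 0))   # True: f is nonnegative

# F'(x) = f(x): a finite difference of the numeric CDF matches f at x = 0.
F = np.cumsum(f(x)) * dx   # numeric CDF
i = len(x) // 2            # index of x = 0
print((F[i + 1] - F[i - 1]) / (2 * dx), f(x[i]))
```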
[Figure: an example probability mass function (pmf) next to an example probability density function (pdf); the pmf bars sum to 1, the pdf integrates to 1]

Expected Value
E(X) = Σ_a a p(a) for a discrete r.v.; E(X) = ∫_{−∞}^{∞} x f(x) dx for a continuous r.v.
Variance
An important function of X: (X − E(X))²
Variance: E[(X − E(X))²] = Var(X) = σ²
Variance measures the spread around the mean
Standard deviation = [Var(X)]^(1/2), has the same units as the r.v. X
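A quick sanity check of these definitions on synthetic samples; the second variance form, E(X²) − E(X)², is a standard identity not shown on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)   # samples of X

mean = x.mean()
var_def = ((x - mean) ** 2).mean()      # E[(X - E(X))^2]
var_alt = (x ** 2).mean() - mean ** 2   # equivalent form E(X^2) - E(X)^2
print(mean, var_def, var_alt)           # ~5, ~4, ~4
print(np.sqrt(var_def))                 # standard deviation ~2
```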
Properties of Expectation
If X is a constant r.v., X = c, then E(X) = c
Pairs of Random Variables
Say we have 2 random variables:
Fish weight X
Fish lightness Y
Can define joint CDF
F(a,b) = P(X ≤ a, Y ≤ b) = P(ω ∈ S | X(ω) ≤ a, Y(ω) ≤ b)
Similar to single variable case, can define
discrete: joint probability mass function
p(a,b) = P( X = a,Y = b)
continuous: joint density function f (x, y )
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_{a}^{b} ∫_{c}^{d} f(x, y) dy dx
Marginal Distributions
given the joint mass function p_X,Y(a,b), the marginal, i.e. the probability mass function for r.v. X, can be obtained from p_X,Y(a,b):
p_X(a) = Σ_{∀y} p_X,Y(a, y)        p_Y(b) = Σ_{∀x} p_X,Y(x, b)
similarly for continuous r.v.'s: f_X(x) = ∫_{−∞}^{∞} f_X,Y(x, y) dy
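A sketch with a small made-up joint pmf: marginalizing is just summing the joint table along one axis.

```python
import numpy as np

# Made-up joint pmf p_{X,Y}(a,b): rows index values of X, columns values of Y.
p_xy = np.array([[0.10, 0.05, 0.05],
                 [0.20, 0.25, 0.05],
                 [0.05, 0.05, 0.20]])
assert abs(p_xy.sum() - 1) < 1e-12

p_x = p_xy.sum(axis=1)   # marginal of X: sum over all values of Y
p_y = p_xy.sum(axis=0)   # marginal of Y: sum over all values of X
print(p_x, p_y)          # each sums to 1
```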
Independence of Random Variables
X and Y are independent if and only if
p_X,Y(a, b) = p_X(a) p_Y(b) for all a, b (discrete case)
f_X,Y(x, y) = f_X(x) f_Y(y) for all x, y (continuous case)
More on Independent RV’s
If X and Y are independent:
E(XY) = E(X) E(Y)
Var(X + Y) = Var(X) + Var(Y)
G(X) and H(Y) are independent for any functions G and H
Covariance
Given r.v. X and Y, covariance is defined as:
cov(X,Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
Covariance is useful for checking if features X
and Y give similar information
Covariance (from co-vary) indicates tendency
of X and Y to vary together
If X and Y tend to increase together, cov(X,Y) > 0
If X tends to decrease when Y increases, cov(X,Y) < 0
If a decrease (increase) in X does not predict the behavior of Y, cov(X,Y) is close to 0
Covariance and Correlation
If cov(X,Y) = 0, then X and Y are said to be
uncorrelated (think unrelated). However X
and Y are not necessarily independent.
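The classic example of this distinction (a standard construction, not from the slides): take X symmetric around 0 and Y = X², so Y is fully determined by X yet uncorrelated with it.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200_000)   # symmetric around 0
y = x ** 2                             # Y is a deterministic function of X

cov_xy = (x * y).mean() - x.mean() * y.mean()
print(cov_xy)   # ~0: uncorrelated, yet clearly not independent
```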
Random Vectors
Generalize from pairs of r.v. to vector of r.v.
X = [X1 X2 … Xn] (think multiple features)
Joint CDF, PDF, PMF are defined similarly to the case of pairs of r.v.'s
Example:
F (x1, x2,...,xn ) = P( X1 ≤ x1, X2 ≤ x2,...,Xn ≤ xn )
Covariance Matrix
a characteristic summary of a random vector
cov(X) = cov([X1 X2 … Xn]ᵀ) = Σ = E[(X − µ)(X − µ)ᵀ]
an n×n matrix: the (i,j) entry is cov(Xi, Xj), and the diagonal entries are the variances var(Xi)

Gaussian Random Variable
density f(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²))
mean µ and variance σ²
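Before moving to the multivariate Gaussian, a quick sketch of estimating a mean vector and covariance matrix from samples; numpy's np.cov implements the E[(X − µ)(X − µ)ᵀ] formula, up to the usual sample-size normalization:

```python
import numpy as np

rng = np.random.default_rng(3)
# 10,000 samples of a 3-dimensional random vector (think 3 fish features).
X = rng.normal(size=(10_000, 3))
X[:, 1] += 0.8 * X[:, 0]          # make features 0 and 1 co-vary

mu = X.mean(axis=0)               # estimated mean vector
Sigma = np.cov(X, rowvar=False)   # estimates E[(X - mu)(X - mu)^T]
print(mu)
print(Sigma)   # diagonal: variances; off-diagonal: covariances
```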
Multivariate Gaussian
a random vector X with mean vector µ = [µ1, …, µn] and covariance matrix Σ has density
f(x) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ))
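A direct transcription of this density into code (the mean and covariance below are made up):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate Gaussian density at x, for mean vector
    mu of length n and n-by-n covariance matrix sigma."""
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5
    quad = diff @ np.linalg.solve(sigma, diff)   # (x-mu)^T Sigma^-1 (x-mu)
    return np.exp(-0.5 * quad) / norm

mu    = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(gaussian_pdf(np.array([0.5, 0.5]), mu, sigma))
```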
Why Gaussian?
sums of many small independent effects are approximately Gaussian (the central limit theorem), and the Gaussian is analytically convenient to work with
Summary