Pattern Recognition Tutorial
March 2025
Applications of pattern recognition include:
• Face recognition
• Speech recognition
1.3 Bayes’ Theorem
\[
P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)},
\qquad \text{where } P(x) = \sum_j P(x \mid \omega_j)\, P(\omega_j)
\]
\[
\lambda(\omega_i \mid \omega_j) =
\begin{cases}
0 & \text{if } i = j \\
1 & \text{if } i \neq j
\end{cases}
\]
Conditional Risk:
\[
R(\omega_i \mid x) = \sum_j \lambda(\omega_i \mid \omega_j)\, P(\omega_j \mid x)
\]
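As a quick illustration of these formulas, here is a minimal Python sketch that computes the posteriors and the conditional risks under the zero-one loss above; the priors and likelihoods used are illustrative values, not taken from this tutorial.

# Minimal sketch: posteriors via Bayes' theorem and conditional risk
# under the zero-one loss defined above. The priors and likelihoods
# below are illustrative assumptions, not values from this tutorial.
priors = [0.6, 0.4]                  # P(w1), P(w2)  (assumed)
likelihoods = [0.2, 0.7]             # P(x | w1), P(x | w2)  (assumed)

evidence = sum(l * p for l, p in zip(likelihoods, priors))            # P(x)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]  # P(wi | x)

def zero_one_loss(i, j):
    return 0 if i == j else 1

# R(wi | x) = sum_j lambda(wi | wj) * P(wj | x)
risks = [sum(zero_one_loss(i, j) * posteriors[j] for j in range(2)) for i in range(2)]
decision = min(range(2), key=lambda i: risks[i])  # minimum-risk (Bayes) decision
print(posteriors, risks, "decide class", decision + 1)

Under the zero-one loss, minimizing the conditional risk is the same as picking the class with the largest posterior.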
Summary
• Bayesian decision theory uses probability to make optimal decisions.
• Bayes’ theorem relates prior, likelihood, and posterior.
• The optimal decision rule minimizes expected loss (risk).
Example: Spam Detection using Bayes’ Theorem
We want to determine the probability that an email is spam given that it contains the word "free".
Given Data:
• Total emails: 100
• Number of spam emails: 30 ⇒ P(Spam) = 30/100 = 0.3
• Number of non-spam emails: 70 ⇒ P(NotSpam) = 70/100 = 0.7
• Emails with the word "free":
  – Among spam emails: 18 ⇒ P("free" | Spam) = 18/30 = 0.6
  – Among non-spam emails: 7 ⇒ P("free" | NotSpam) = 7/70 = 0.1
\[
P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)},
\qquad \text{where } P(x) = \sum_j P(x \mid \omega_j)\, P(\omega_j)
\]
\[
P(\text{"free"}) = P(\text{"free"} \mid \text{Spam}) \cdot P(\text{Spam}) + P(\text{"free"} \mid \text{NotSpam}) \cdot P(\text{NotSpam})
= 0.6 \cdot 0.3 + 0.1 \cdot 0.7 = 0.25
\]
\[
P(\text{Spam} \mid \text{"free"}) = \frac{P(\text{"free"} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{"free"})} = \frac{0.18}{0.25} = 0.72
\]
Conclusion:
If an email contains the word "free", the probability that it is spam is 72%.
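The same calculation can be checked with a few lines of Python, using only the counts given above:

# Spam example: all numbers come from the counts listed above.
p_spam, p_not_spam = 30 / 100, 70 / 100
p_free_given_spam, p_free_given_not = 18 / 30, 7 / 70

p_free = p_free_given_spam * p_spam + p_free_given_not * p_not_spam  # P("free") = 0.25
p_spam_given_free = p_free_given_spam * p_spam / p_free              # 0.18 / 0.25
print(round(p_spam_given_free, 2))   # 0.72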
1. Classifier
A classifier is a system or formula that tells us which class an object belongs
to based on some features.
Example:
Suppose you want to classify a fruit based on its weight and texture. If a new fruit has weight = 160 g and is smooth, the classifier might predict it is an Apple.
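A classifier like this can be written as a simple rule. The sketch below is only illustrative, since the original weight/texture table is not shown here, and the 180 g threshold is an assumption.

# Hypothetical rule-based fruit classifier (the threshold is an assumption).
def classify_fruit(weight_g, texture):
    if texture == "smooth" and weight_g < 180:   # assumed decision rule
        return "Apple"
    return "Orange"

print(classify_fruit(160, "smooth"))   # Apple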
2. Discriminant Functions
A discriminant function gives a score to each class. We pick the class with the
highest score. Imagine you’re a robot trying to decide between multiple options
(like whether a fruit is an apple or an orange) based on some measurements
(like weight, color, or shape). But you don’t want to guess blindly — you want
to make the best decision using math. This is where Discriminant Functions
help.
Simple Example:
Consider two classes, Apple and Orange, each with its own discriminant function; the class with the larger value wins.
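Since the original functions are not reproduced here, one illustrative choice, assuming equal priors and the apple/orange Gaussian models used later in this tutorial (means 150 and 200, common variance 100), is the log-density pair with shared constants dropped:
\[
g_{\text{Apple}}(x) = -\frac{(x-150)^2}{200},
\qquad
g_{\text{Orange}}(x) = -\frac{(x-200)^2}{200}
\]
At x = 170 these give g_Apple(170) = −2 and g_Orange(170) = −4.5, so the larger score picks Apple.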
3. Decision Surfaces
A decision surface is the boundary that separates classes. It divides the
feature space.
1D Example: with a single feature (e.g., weight), the decision surface is a point (a threshold) on the number line.
2D Example:
In a two-feature case (e.g., Math score and English score), the decision surface
becomes a line.
4. The Normal Density
For a continuous feature, each class can be modeled with a Gaussian (normal) density:
\[
p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\]
Example:
For Apples:
\[
\mu = 150, \quad \sigma^2 = 100, \quad x = 170
\]
\[
p(170 \mid \text{Apple}) = \frac{1}{\sqrt{2\pi \cdot 100}} \cdot \exp\!\left(-\frac{(170-150)^2}{2 \cdot 100}\right) = \frac{1}{25.07} \cdot e^{-2} \approx 0.0054
\]
For Oranges:
\[
\mu = 200, \quad \sigma^2 = 100, \quad x = 170
\]
\[
p(170 \mid \text{Orange}) = \frac{1}{25.07} \cdot e^{-4.5} \approx 0.00044
\]
We choose Apple because its probability is higher.
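Tying this back to decision surfaces: if the two classes also have equal priors (an assumption here; the variances are equal in the data above), the 1D decision surface is the point where the two densities are equal:
\[
\frac{(x-150)^2}{200} = \frac{(x-200)^2}{200}
\;\Longrightarrow\;
(x-150)^2 = (x-200)^2
\;\Longrightarrow\;
x^* = \frac{150 + 200}{2} = 175
\]
Weights below 175 g are classified as Apple and weights above as Orange, which is consistent with choosing Apple at x = 170.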
Numerical Example:
You are a robot that guesses whether a fruit is an Apple or an Orange based on
how heavy it is.
Given Data:
Fruit     Average Weight (grams)    Variance
Apple     150                       100
Orange    200                       100
New Observation
A new fruit weighs x = 170 grams. What is it?
We use the Gaussian (Normal) probability density function:
Gaussian Formula
\[
p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\]
Where:
• x = observed weight (170 g)
• µ = average weight
• σ² = variance
For Apple
\[
\mu = 150, \quad \sigma^2 = 100
\]
\[
p(170 \mid \text{Apple}) = \frac{1}{\sqrt{2\pi \cdot 100}} \cdot \exp\!\left(-\frac{(170-150)^2}{2 \cdot 100}\right)
= \frac{1}{\sqrt{628.32}} \cdot \exp\!\left(-\frac{400}{200}\right) = \frac{1}{25.07} \cdot e^{-2} \approx 0.0054
\]
For Orange
\[
\mu = 200, \quad \sigma^2 = 100
\]
\[
p(170 \mid \text{Orange}) = \frac{1}{\sqrt{2\pi \cdot 100}} \cdot \exp\!\left(-\frac{(170-200)^2}{2 \cdot 100}\right)
= \frac{1}{25.07} \cdot \exp\!\left(-\frac{900}{200}\right) = \frac{1}{25.07} \cdot e^{-4.5} \approx 0.00044
\]
Decision
Compare the discriminant values (here the class-conditional densities, since no priors are given):
• For Apple: p(170 | Apple) ≈ 0.0054
• For Orange: p(170 | Orange) ≈ 0.00044
Decision: since 0.0054 > 0.00044, Apple is more likely based on the discriminant function. So, the robot says: "It's an Apple!"
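The whole numerical example can be reproduced with a short Python sketch; equal priors are assumed here, since the tutorial does not give any.

import math

# Gaussian class-conditional densities from the numerical example above.
def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x = 170
p_apple = gaussian_pdf(x, 150, 100)    # ≈ 0.0054
p_orange = gaussian_pdf(x, 200, 100)   # ≈ 0.00044

# With equal priors (an assumption of this sketch), comparing discriminant
# values reduces to comparing the two densities.
print("Apple" if p_apple > p_orange else "Orange")   # Apple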
5. Discrete Features
Discrete features take on a limited number of values (like colors: red, blue,
green).
Example: Candy Classification
Color    Wrapper    Type
Red      Shiny      Sweet
Blue     Dull       Sour
Red      Shiny      Sweet
A new candy is Red and Shiny. From the table, we guess it is Sweet.
Discrete classifiers often use frequency tables or decision trees.
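A frequency-table classifier for this candy example can be sketched in a few lines; the fallback to the overall majority class when no row matches is a choice made for this sketch, not something specified in the text.

from collections import Counter

# Frequency-table classifier for the candy example above.
data = [
    ("Red", "Shiny", "Sweet"),
    ("Blue", "Dull", "Sour"),
    ("Red", "Shiny", "Sweet"),
]

def classify(color, wrapper):
    matches = [label for c, w, label in data if (c, w) == (color, wrapper)]
    # Fall back to the overall majority class if nothing matches (assumed behavior).
    counts = Counter(matches or [label for _, _, label in data])
    return counts.most_common(1)[0][0]

print(classify("Red", "Shiny"))   # Sweet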
Conclusion
• Classifier: Guesses the class based on features
1. Maximum Likelihood Estimation (MLE)
Goal: Find the parameter values that make the observed data most likely.
MLE is used when:
• You know the model (e.g., normal distribution).
• You want a simple and direct estimate.
• All data is observed.
3. Expectation-Maximization (EM)
Goal: Estimate parameters when some information is hidden or incomplete.
How it works:
• E-step (Expectation): Estimate hidden variables using current param-
eters.
• M-step (Maximization): Update parameters using current estimates.
EM is used:
• When data is incomplete or has missing values.
• In algorithms like GMM where group membership is not observed.
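As a concrete instance (not spelled out in the text), for a Gaussian mixture with mixing weights π_k the two steps look like this:
\[
\text{E-step:} \quad
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_j \pi_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}
\qquad
\text{M-step:} \quad
\mu_k = \frac{\sum_i \gamma_{ik} \, x_i}{\sum_i \gamma_{ik}},
\quad
\pi_k = \frac{1}{n} \sum_i \gamma_{ik}
\]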
4. Bayesian Estimation
Goal: Combine prior knowledge with observed data to estimate parameters.
Example: You believe there’s a 70% chance of rain. Then you see dark
clouds. Bayesian estimation updates your belief based on the new observation.
Bayesian Estimation is used when:
• You have prior knowledge or beliefs about the parameters.
• You want a distribution over parameters, not just one estimate.
Summary Table

Method      What it does                 When to use it
MLE         Finds best-fit parameters    Data is complete and model is known
GMM         Finds groups in data         Data has hidden groupings
EM          Handles missing info         Incomplete or hidden data
Bayesian    Updates belief               Prior knowledge + new data
Numerical Example
1. Maximum Likelihood Estimation (MLE)
Example: You are helping in your family’s fruit shop and want to know how
much an apple usually weighs.
You weigh 3 apples: 140g, 150g, 160g.
\[
\text{Average} = \frac{140 + 150 + 160}{3} = 150 \text{ grams}
\]
This is called Maximum Likelihood Estimation — you choose the value
(in this case, the average weight) that makes the observed data most likely.
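Formally, assuming the apple weights follow a normal distribution with unknown mean µ (and any fixed variance), the MLE of µ is exactly this sample average:
\[
\hat{\mu}_{\text{MLE}} = \arg\max_{\mu} \prod_{i=1}^{n} p(x_i \mid \mu, \sigma^2)
= \frac{1}{n} \sum_{i=1}^{n} x_i
= \frac{140 + 150 + 160}{3} = 150
\]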
2. Gaussian Mixture Model (GMM)
Goal
Use a Gaussian Mixture Model (GMM) to:
• Estimate the average weight of each group
• Predict which cone belongs to which group
Steps (Simplified)
1. Guess two averages: µ1 = 81, µ2 = 132
2. For each cone, calculate which mean it’s closer to
3. Update the group means based on those guesses
4. Repeat until the groups stabilize (see the sketch after this list)
• Probabilities:
– Cone A = 95% Vanilla
– Cone D = 90% Chocolate
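A hard-assignment version of the steps above can be sketched in Python; the cone weights below are assumed for illustration, since the text only gives the initial guesses µ1 = 81 and µ2 = 132.

# Simplified (hard-assignment) version of the GMM steps listed above.
weights = [78, 85, 128, 135]         # assumed cone weights, for illustration only
mu1, mu2 = 81.0, 132.0               # step 1: initial guesses from the text

for _ in range(10):                  # step 4: a few iterations are enough to stabilize
    group1 = [w for w in weights if abs(w - mu1) <= abs(w - mu2)]  # step 2
    group2 = [w for w in weights if abs(w - mu1) > abs(w - mu2)]
    mu1 = sum(group1) / len(group1)  # step 3: update the group means
    mu2 = sum(group2) / len(group2)

print(mu1, mu2)   # estimated average weight of each group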
3. Expectation-Maximization (EM)
Example: You’re blindfolded and trying to guess how many red and green balls
are in a bag. You can’t see the color, only feel the texture.
1. Guess: Start with an initial guess of how many reds and greens there are.
2. Feel: For each ball, estimate how likely it is to be red or green from its texture, using your current guess.
3. Count: Update how many reds and greens you think there are.
4. Repeat steps 2–3 until your guess doesn’t change.
This is how the EM algorithm works — it makes a guess, improves it, and
keeps repeating.
”Let me guess, check, and fix my guess again and again!”
4. Bayesian Estimation
Example: You are guessing how many toy cars your friend has.
”I think he usually has about 5 cars.”
Then your friend says: ”Today I bought 3 more!”
Now you say: ”I think he has about 8 cars now.”
This is Bayesian Estimation — you start with a belief (called a prior),
and when you get new information, you update it.
\[
\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}
\]
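For instance, plugging in the numbers from the spam example earlier in this tutorial:
\[
\text{Posterior} = \frac{0.6 \times 0.3}{0.25} = 0.72
\]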
Summary Table
Conditional Risk R(αi |x)
The conditional risk is the expected loss for action αi given observation x:
\[
R(\alpha_i \mid x) = \sum_j L(\alpha_i \mid \omega_j) \cdot P(\omega_j \mid x)
\]
Bayes Risk R
Assume two observations x1 and x2 , each equally likely:
• For x1 : P (ω1 |x1 ) = 0.8, P (ω2 |x1 ) = 0.2
R(α1 | x1) = 0.4 (decision: α1)
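The loss matrix and the posteriors for x2 are not shown in this excerpt, so the sketch below fills them with assumptions; the matrix is chosen so that R(α1 | x1) matches the 0.4 stated above.

# Conditional risks and Bayes risk. Values marked "assumed" are illustrative.
loss = [[0, 2],    # L(alpha1 | w1), L(alpha1 | w2)  (assumed)
        [1, 0]]    # L(alpha2 | w1), L(alpha2 | w2)  (assumed)

post_x1 = [0.8, 0.2]   # P(w1 | x1), P(w2 | x1)  (from the text)
post_x2 = [0.3, 0.7]   # posteriors for x2  (assumed)

def conditional_risks(posteriors):
    # R(alpha_i | x) = sum_j L(alpha_i | w_j) * P(w_j | x)
    return [sum(loss[i][j] * posteriors[j] for j in range(2)) for i in range(2)]

r1 = min(conditional_risks(post_x1))   # 0.4 -> decide alpha1, as above
r2 = min(conditional_risks(post_x2))   # risk of the best action at x2
bayes_risk = 0.5 * r1 + 0.5 * r2       # both observations equally likely
print(r1, r2, bayes_risk)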
Summary
• Loss Function: Quantifies penalty for wrong decisions.
• Conditional Risk: Expected loss given an observation.
• Bayes Decision Rule: Choose action with minimum conditional risk.
• Bayes Risk: Average expected loss across all observations.