
5. Supervised and Unsupervised Classification

Electrical and Computer Engineering
Content
❖ Supervised Classification

❖ Unsupervised Classification

❖ Classifier Combination

Supervised vs. Unsupervised Classification
❖ Supervised learning: the data (observations, measurements,
etc.) are labelled with pre-defined classes.
❖ Test data are classified into these classes too.
❖ Unsupervised learning (clustering):
❖Class labels of the data are unknown
❖Given a set of data, the task is to establish the existence of
classes or clusters in the data

Supervised Classification: Phases
❖ Learning (training): Learn a model using the training data
❖ Testing: Test the model using unseen test data to assess the
model accuracy

$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$$
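As a small illustration of the testing phase, here is a minimal Python sketch of the accuracy measure; the predictions and labels are made up, standing in for the output of a trained model:

```python
def accuracy(predictions, true_labels):
    # Accuracy = number of correct classifications / total number of test cases
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)

# Hypothetical test run: 3 of the 4 unseen cases are classified correctly.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
```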
Supervised Classification
❖ Let x = (x1, x2, …, xn) be a vector defined in an n-dimensional
feature space.
❖ Let Ω be a set of C classes ωi (i=1, 2, …, C).
❖ Let X be a set of N training samples.
MLE Classifier
❖ Hypothesis: only the class-conditional pdfs p(x|ωi) (i = 1, 2, ..., C)
are assumed to be known.
❖ Goal: minimize the average probability of error.
❖ Decision rule:
$$\mathbf{x} \in \omega_j \quad \text{if} \quad p(\mathbf{x}|\omega_j) \ge p(\mathbf{x}|\omega_i), \quad \forall\, i = 1, 2, \dots, C$$

MLE Classifier
❖ Suppose we have two classes 1 and 2.
❖ Compute the likelihoods p(x | 1) and p(x | 2).
❖ To classify test data x assign it to class 1 if p(x | 1) is greater
than p(x| 2) and 2 otherwise.
❖Assume that class likelihood is represented by a Gaussian
distribution with parameters μ(mean) and σ(standard deviation).
(𝑥−𝜇1 )2 (𝑥−𝜇2 )2
1 − 2 1 −
𝑝 x|𝜔1 = 𝑒 2𝜎1 𝑝 x|𝜔2 = 𝑒 2𝜎2
2
2𝜋𝜎1 2𝜋𝜎2
❖ Decision rule:
𝑥 ∈ 𝜔𝑗 𝑝 𝑥 |𝜔𝑗 ≥ 𝑝 𝑥 ≥ 𝜔𝑖 , ∀ 𝑖 = 1,2,3, … , 𝐶

4/24/2024 6
Gaussian classification example
❖ Consider one dimensional data for two classes (SNP
genotypes for case and control subjects).
– Case (class 1): 1, 1, 2, 1, 0, 2
– Control (class 2): 0, 1, 0, 0, 1, 1
❖ Under the Gaussian assumption, the case and control classes are represented by Gaussian distributions with parameters (μ1, σ1) and (μ2, σ2), respectively. The maximum likelihood estimates of the means are

$$m_1 = \frac{\sum_{i=1}^{N_1} x_i}{N_1} = \frac{1+1+2+1+0+2}{6} = \frac{7}{6}$$

$$m_2 = \frac{\sum_{i=1}^{N_2} x_i}{N_2} = \frac{0+1+0+0+1+1}{6} = \frac{3}{6}$$

❖ The estimates of the class variances are
Gaussian classification example
$$\sigma_1^2 = \frac{\sum_{i=1}^{N_1}(x_i - m_1)^2}{N_1} = \frac{(1-\tfrac{7}{6})^2+(1-\tfrac{7}{6})^2+(2-\tfrac{7}{6})^2+(1-\tfrac{7}{6})^2+(0-\tfrac{7}{6})^2+(2-\tfrac{7}{6})^2}{6} \approx 0.47$$

❖ Similarly, $\sigma_2^2 = 0.25$
❖ Which class does x=1 belong to? What about x=0 and x=2?
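A minimal Python sketch of this worked example, assuming equal priors so that only the class likelihoods need to be compared:

```python
import math

case = [1, 1, 2, 1, 0, 2]      # class 1 (case subjects)
control = [0, 1, 0, 0, 1, 1]   # class 2 (control subjects)

def ml_estimates(samples):
    # Maximum likelihood estimates of the Gaussian mean and (biased) variance.
    m = sum(samples) / len(samples)
    var = sum((x - m) ** 2 for x in samples) / len(samples)
    return m, var

def gaussian_pdf(x, m, var):
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

m1, v1 = ml_estimates(case)      # m1 = 7/6, v1 ≈ 0.47
m2, v2 = ml_estimates(control)   # m2 = 3/6, v2 = 0.25

for x in (0, 1, 2):
    label = 1 if gaussian_pdf(x, m1, v1) >= gaussian_pdf(x, m2, v2) else 2
    print(f"x={x} -> class {label}")
```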
Maximum A Posteriori (MAP) Classification
❖ Hypothesis: a posteriori (posterior) probabilities of classes P(ωi |
x), (i = 1, 2, ..., C) are assumed to be known.
❖ Goal: minimize the average probability of error.
❖ Decision rule: a pattern x is assigned to the class that maximizes the a posteriori probability P(ωi|x):

$$\mathbf{x} \in \omega_j \quad \text{if} \quad P(\omega_j|\mathbf{x}) \ge P(\omega_i|\mathbf{x}), \quad \forall\, i = 1, 2, \dots, C$$

❖ Since the posterior probabilities are often not directly known, it is preferable to rewrite the MAP decision rule by using Bayes' theorem, $\text{Posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{evidence}}$, as follows:

$$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}$$
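A short sketch of the difference between the ML and MAP rules, using hypothetical likelihood and prior values (not taken from the slides):

```python
likelihoods = {"w1": 0.30, "w2": 0.10}   # p(x|w_i), assumed known
priors = {"w1": 0.20, "w2": 0.80}        # P(w_i)

# ML picks the class with the largest likelihood;
# MAP weights each likelihood by its prior (the evidence p(x) cancels out).
ml_class = max(likelihoods, key=likelihoods.get)
map_class = max(likelihoods, key=lambda c: likelihoods[c] * priors[c])
print(ml_class, map_class)   # here ML chooses w1 while MAP chooses w2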
Maximum A Posteriori (MAP) Classification
❖ If the number of features is small, estimating the class-conditional likelihood is feasible.
❖ However, when the number of features is large, estimating the joint likelihood reliably becomes a very expensive task and requires a very large dataset:

$$P(x_1, x_2, \dots, x_n|\omega_i) = P(x_1|x_2, \dots, x_n, \omega_i)\, P(x_2|x_3, \dots, x_n, \omega_i) \cdots P(x_n|\omega_i)$$

Naïve Bayes Classifier

❖ To simplify the estimation, we make an assumption:
❖ The features are class-conditionally independent. That is, if the class is known, knowing one feature does not give additional information for predicting another feature.

$$P(x_1|x_2, \dots, x_n, \omega_i) = P(x_1|\omega_i), \quad P(x_2|x_3, \dots, x_n, \omega_i) = P(x_2|\omega_i), \quad \dots$$

$$P(x_1, x_2, \dots, x_n|\omega_i) = P(x_1|\omega_i)\, P(x_2|\omega_i) \cdots P(x_n|\omega_i)$$
MAP-ML: Example

Naïve Bayes Classifier
$$P(\omega_i|X) = \frac{P(X|\omega_i)\,P(\omega_i)}{P(X)}$$

Bayesian: $P(X|\omega_i) = P(x_1|x_2, x_3, \dots, x_n, \omega_i)\, P(x_2|x_3, \dots, x_n, \omega_i) \cdots P(x_n|\omega_i)$

Naïve Bayes: $P(X|\omega_i) = P(x_1|\omega_i)\, P(x_2|\omega_i) \cdots P(x_n|\omega_i) = \prod_{j=1}^{n} P(x_j|\omega_i)$

❖ Then

$$P(\omega_i|X) = \frac{P(X|\omega_i)\,P(\omega_i)}{P(X)} \propto \left[\prod_{j=1}^{n} P(x_j|\omega_i)\right] P(\omega_i)$$
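A minimal, self-contained Naïve Bayes sketch for discrete features, using made-up training data and no smoothing of the frequency estimates:

```python
from collections import defaultdict

def train(samples, labels):
    # Count class frequencies and (class, feature index, value) frequencies.
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_counts[(y, j, v)] += 1
    return class_counts, feature_counts, len(labels)

def classify(x, class_counts, feature_counts, n):
    # Score each class by P(w_i) * prod_j P(x_j | w_i) and pick the maximum.
    best, best_score = None, -1.0
    for c, nc in class_counts.items():
        score = nc / n                                # prior P(w_i)
        for j, v in enumerate(x):
            score *= feature_counts[(c, j, v)] / nc   # likelihood P(x_j | w_i)
        if score > best_score:
            best, best_score = c, score
    return best

X = [(0, 1), (1, 1), (1, 0), (0, 0)]   # made-up two-feature samples
y = ["a", "a", "b", "b"]
model = train(X, y)
print(classify((1, 1), *model))        # -> "a"
```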
kNN Density Estimation as a Bayesian classifier
❖ The main advantage of kNN is that it leads to a very simple
approximation of the (optimal) Bayes classifier.
❖ Assume that we have a dataset with 𝑁 examples, 𝑁𝑖 from class 𝜔𝑖,
and that we are interested in classifying an unknown sample 𝑥𝑢
❖ We draw a hyper-sphere of volume 𝑉 around 𝑥𝑢. Assume this
volume contains a total of 𝑘 examples, 𝑘𝑖 from class 𝜔𝑖
❖ We can then approximate the likelihood functions as

$$p(x|\omega_i) \simeq \frac{k_i}{N_i V}$$

❖ Similarly, the unconditional density can be estimated as

$$p(x) \simeq \frac{k}{N V}$$
kNN Density Estimation as a Bayesian classifier
❖ And the priors are approximated by

$$P(\omega_i) \simeq \frac{N_i}{N}$$

❖ Putting everything together, the Bayes classifier becomes

$$P(\omega_i|x) = \frac{p(x|\omega_i)\,P(\omega_i)}{p(x)} \simeq \frac{\dfrac{k_i}{N_i V}\cdot\dfrac{N_i}{N}}{\dfrac{k}{N V}} = \frac{k_i}{k}$$

❖ That is, x is assigned to the class most represented among the k nearest neighbours.
The kNN classifier
❖ The kNN rule is a very intuitive method that classifies unlabeled
examples based on their similarity to examples in the training set
❖ For a given unlabeled example 𝑥𝑢 ∈ ℜ𝐷, find the 𝑘 “closest”
labeled examples in the training data set and assign 𝑥𝑢 to the class
that appears most frequently within the k-subset
❖ The kNN only requires
❖ An integer k
❖ A set of labeled examples (training data)
❖A metric to measure “closeness”

The kNN classifier
Example
❖ In the example here we have three classes and the goal is to find
a class label for the unknown example 𝑥𝑢
❖ In this case we use the Euclidean distance and a value of 𝑘 = 5
neighbors
❖ Of the 5 closest neighbors, 4 belong to 𝜔1 and 1 belongs to 𝜔3,
so 𝑥𝑢 is assigned to 𝜔1, the predominant class

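A minimal kNN sketch matching the procedure above (Euclidean distance, majority vote); the 2-D training points and labels are made up:

```python
from collections import Counter
import math

def knn_classify(x_u, samples, labels, k=5):
    # Sort training samples by distance to the unknown example x_u.
    nearest = sorted(zip(samples, labels), key=lambda s: math.dist(s[0], x_u))[:k]
    # Assign x_u to the class that appears most frequently among the k neighbours.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (1.2, 0.9),
     (1.1, 1.2), (0.0, 2.0), (0.1, 2.2)]
y = ["w1", "w1", "w1", "w2", "w2", "w2", "w3", "w3"]
print(knn_classify((0.9, 1.1), X, y, k=5))   # majority of the 5 neighbours -> "w2"
```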
Discriminant Functions
❖ A useful way of representing classifiers is through discriminant
functions gi(x) (i = 1, 2, ..., C), where the classifier assigns a feature
vector x to class ωi if

$$g_i(\mathbf{x}) > g_j(\mathbf{x}) \qquad \forall\, j \ne i$$

❖ These functions divide the feature space into C decision regions (ℜ1, ..., ℜC), separated by decision boundaries.
❖ For the classifier that minimizes the error probability (the Bayesian classifier):

$$g_i(\mathbf{x}) = P(\omega_i|\mathbf{x})$$

❖ For the maximum likelihood classifier:

$$g_i(\mathbf{x}) = p(\mathbf{x}|\omega_i)$$
Discriminant Functions
❖ A two-category classifier can often be written in the form: decide ω1 if g(x) > 0, otherwise decide ω2,
❖ where g(x) is a single discriminant function, and

$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$$
Discriminant Functions
❖ In the following, we will study in detail the behavior of the minimum-error-rate discriminant functions for classification problems with C classes and multivariate Gaussian class-conditional densities:

$$p(\mathbf{x}|\omega_i) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right)$$

❖ The minimum error-rate classification can be achieved by the discriminant functions

$$g_i(\mathbf{x}) = \ln p(\mathbf{x}|\omega_i) + \ln P(\omega_i)$$
Discriminant Functions
❖ In the case of multivariate normal densities, the discriminant functions become

$$g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \tfrac{n}{2}\ln 2\pi - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

❖ Case 1: Σi = σ²·I, where I is the identity matrix
❖ The features are statistically independent and each feature has the same variance, irrespective of the class. Dropping the class-independent terms,

$$g_i(\mathbf{x}) = -\frac{\lVert\mathbf{x}-\boldsymbol{\mu}_i\rVert^2}{2\sigma^2} + \ln P(\omega_i)$$

❖ which, after further dropping the class-independent term $\mathbf{x}^T\mathbf{x}/(2\sigma^2)$, can be written as the linear discriminant

$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{\boldsymbol{\mu}_i}{\sigma^2}, \quad w_{i0} = -\frac{\boldsymbol{\mu}_i^T\boldsymbol{\mu}_i}{2\sigma^2} + \ln P(\omega_i)$$

❖ A classifier that uses linear discriminant functions is called a "linear machine".
Discriminant Functions
❖ Decision boundaries are the hypersurfaces corresponding to

$$g_i(\mathbf{x}) = g_j(\mathbf{x}), \quad \text{i.e.,} \quad \mathbf{w}^T(\mathbf{x}-\mathbf{x}_0) = 0$$

❖ The hyperplane separating Ri and Rj passes through the point

$$\mathbf{x}_0 = \tfrac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j) - \frac{\sigma^2}{\lVert\boldsymbol{\mu}_i-\boldsymbol{\mu}_j\rVert^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$

❖ and is orthogonal to the vector $\mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j$.
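As a quick numerical check of the case-1 boundary point, the sketch below (with made-up means, variance, and priors) verifies that the two discriminant functions take the same value at x0:

```python
import numpy as np

mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 1.0])   # assumed class means
sigma2 = 0.5                                               # shared variance sigma^2
P_i, P_j = 0.7, 0.3                                        # assumed priors

def g(x, mu, prior):
    # Case-1 discriminant (class-independent terms dropped).
    return -np.sum((x - mu) ** 2) / (2 * sigma2) + np.log(prior)

diff = mu_i - mu_j
x0 = 0.5 * (mu_i + mu_j) - sigma2 / np.dot(diff, diff) * np.log(P_i / P_j) * diff
print(g(x0, mu_i, P_i), g(x0, mu_j, P_j))   # the two values coincide
```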
Discriminant Functions
❖ Case 2: Σi = Σ (the covariance matrices of all classes are identical, but otherwise arbitrary)
❖ Dropping the class-independent terms, the discriminant functions become

$$g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(\omega_i)$$

❖ which, after dropping the class-independent quadratic term $\mathbf{x}^T\Sigma^{-1}\mathbf{x}$, are again linear in x.
Discriminant Functions
❖ Decision boundaries are

$$g_i(\mathbf{x}) = g_j(\mathbf{x}), \quad \text{i.e.,} \quad \mathbf{w}^T(\mathbf{x}-\mathbf{x}_0) = 0, \qquad \mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$

$$\mathbf{x}_0 = \tfrac{1}{2}(\boldsymbol{\mu}_i+\boldsymbol{\mu}_j) - \frac{\ln\!\left[P(\omega_i)/P(\omega_j)\right]}{(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^T\Sigma^{-1}(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)}\,(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)$$

❖ The hyperplane passes through x0 but is not necessarily orthogonal to the line between the means.
Discriminant Functions
❖ Case 3: Σi = arbitrary
❖ The covariance matrices are different for each category.
❖ The discriminant functions are quadratic in x:

$$g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

❖ In the two-category case, the decision surfaces are hyperquadrics that can assume any of the general forms: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, and hyperhyperboloids.
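A short sketch of the general (case 3) quadratic discriminant with made-up parameters; cases 1 and 2 correspond to the particular choices Σi = σ²I and Σi = Σ:

```python
import numpy as np

def g(x, mu, sigma, prior):
    # Quadratic discriminant: -1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P.
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(sigma) @ d
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

# Made-up parameters for two classes.
mu1, sigma1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
mu2, sigma2 = np.array([2.0, 2.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
priors = (0.5, 0.5)

x = np.array([1.5, 1.0])
scores = [g(x, mu1, sigma1, priors[0]), g(x, mu2, sigma2, priors[1])]
print("assigned to class", int(np.argmax(scores)) + 1)
```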
Discriminant Functions
❖ Example
❖ Let ω1 and ω2 be two classes such that:
❖p(x| ωi) = Ν(mi, Σi) (i = 1, 2)
❖and

Discriminant Functions
❖ Applying the above results, the corresponding discriminant functions are:

❖ The decision boundary takes thus the following form:

Discriminant Functions
❖ Example
❖ Determine the expression for the following set of parameters.

Classifier Combination
❖ In order to achieve robust and accurate classification, a possible solution is to combine (fuse) an ensemble of different classifiers so as to exploit the peculiarities of each of them synergistically.

Classifier Combination
❖ The idea to combine different “experts” to solve a given complex
problem has been exploited in different application domains.
❖ Statistical Estimation: in 1818, Laplace proved that a suitable combination of two probabilistic methods can yield a more accurate statistical model.
❖ Electrical Component Reliability: the problem of designing a reliable system from unreliable components was addressed by von Neumann in 1956. Nowadays, redundancy and combination have become golden rules.
❖ Meteorology: the advantages of combining different meteorological
predictors have been widely recognized within the weather community.

Classifier Combination
❖ Under suitable conditions, it can be shown that the fusion of classifiers leads to reduced variance and bias of the classification error, as well as to superior robustness.
❖ Typical combination scenarios are as follows:
❖Traditional classification (TC): all classifiers of the ensemble
work on the same feature space.
❖Multisensor/multisource classification (MC): each classifier is
fed by a different source of information.
❖Hyperdimensional classification (HC): each classifier is
defined over a subset of the available features.

MC: Ensemble Definition
❖ Data acquired by each sensor (information source) are given as input to a corresponding classifier.
❖The classifier outputs are then combined to yield the final
decision.

MC: Ensemble Definition
❖Appropriate statistical models (classifiers) can be adopted for
each sensor (information source);
❖ It avoids dealing simultaneously with a large number of input features.

Fusion Architectures
❖ Parallel Architecture: The classifiers are gathered within a parallel architecture, and their outputs are combined by means of an appropriate fusion strategy (e.g., majority vote, averaging, dynamic selection).
❖ Cascade Architecture: Classifiers are put in cascade, each devoted to analyzing a single class (or subset of classes).
❖ Hybrid Architecture: A mix of the parallel and cascade architectures.

Parallel Architecture
❖ In the following, we will study different fusion strategies
developed for the parallel architecture, which is the most
widespread.
❖ Three main categories of strategies:

❖ In particular, we will analyze three fusion strategies typically adopted, each referring to one of the previous categories:
❖Majority Vote
❖Averaging
❖Weighted Averaging
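A brief sketch of the three fusion strategies for the parallel architecture; the per-classifier posteriors and the weights below are invented for illustration:

```python
import numpy as np

# Each row holds the posterior estimates produced by one classifier of the ensemble
# for the same test sample (three classes here).
posteriors = np.array([
    [0.6, 0.3, 0.1],    # classifier 1: P(w1|x), P(w2|x), P(w3|x)
    [0.2, 0.5, 0.3],    # classifier 2
    [0.4, 0.45, 0.15],  # classifier 3
])

# Majority vote: each classifier votes for its most probable class.
votes = posteriors.argmax(axis=1)
majority = np.bincount(votes, minlength=3).argmax()

# Averaging: average the posteriors across classifiers, then pick the maximum.
averaged = posteriors.mean(axis=0).argmax()

# Weighted averaging: weight each classifier, e.g. by its estimated accuracy.
weights = np.array([0.5, 0.3, 0.2])
weighted = (weights[:, None] * posteriors).sum(axis=0).argmax()

print(majority, averaged, weighted)   # class indices chosen by each strategy
```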
