Class10 14 PatternClassification - 13 24sept2019

The document discusses data science and machine learning concepts. It explains that data science uses scientific methods to extract knowledge and insights from structured and unstructured data. Machine learning uses data to build models that can perform predictive tasks like classification. Classification involves building a model from labeled training data, then using the model to predict the class of new unlabeled data. The document provides examples of classification problems and illustrates training data with labeled examples.


Data Modeling

Data Science
• Multi-disciplinary field that uses scientific methods,
processes, algorithms and systems to extract
knowledge and insight from structured and
unstructured data
• Central concept is gaining insight from data
• Machine learning uses data to extract knowledge

  [Pipeline diagram: Database → Data Collection → Data Preprocessing (Data Cleaning and Cleansing, Feature Representation) → Data Modeling (Machine Learning) → Inference]



Descriptive Data Analytics


• It helps us to study the general characteristics of data
and identify the presence of noise or outliers
• Data characteristics:
– Central tendency of data
• Centre of the data
• Measuring mean, median and mode
– Dispersion of data
• The degree to which numerical data tend to spread
• Measuring range, quartiles, interquartile range (IQR), the five-number summary and standard deviation
• Descriptive analytics are the backbone of reporting


Predictive Data Analytics


• It is used to identify the trends, correlations and
causation by learning the patterns from data
• Study and construction of algorithms that can learn
from data and make predictions on data
• It involves tasks like
– Classification:
• E.g.: predicting the presence or absence of disease or
• the classification of disease according to symptoms
– Regression: Numeric prediction
• E.g.: predicting the landslide or
• predicting the rainfall
– Clustering:
• E.g.: grouping the similar items to be sold or
• grouping the people from the same region
• Learning from data

Pattern Classification

Classification
• Problem of identifying to which of a set of categories a
new observation belongs
• Predicts categorical labels
• Example:
– Assigning a given email to the "spam" or "non-spam"
class
– Assigning a diagnosis (disease) to a given patient based
on observed characteristics of the patient
• Classification is a two-step process
– Step 1: Building a classifier (data modeling)
• Learning from data (training phase)
– Step 2: Using the classification model for classification
• Testing phase


Step 1: Building a Classification Model (Training Phase)
• A classifier is built describing a predetermined set of
data classes
• This is a learning step (or training phase)
• Training phase: A classification algorithm builds the
classifier by analysing or learning from a training data
set made up of tuples (samples) and their class labels
• In the context of machine learning, data tuples can be referred to as samples, examples, instances, data vectors, or data points

Step 1: Building a Classification Model (Training Phase)
• Suppose the training data consist of N tuples (or data vectors) described by d attributes (d dimensions)
  $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
• Each tuple (or data vector) is assumed to belong to a
predefined class
– Class is determined by another attribute ((d+1)th
attribute) called the class label attribute
– Class label attribute is discrete-valued and unordered
– It is categorical (nominal) in that each value serves as a category or class
• Individual tuples (or data vectors) making up the training set are referred to as training tuples, training samples, training examples or training data vectors


2-class Classification
• Example: Classifying a person as child or adult

  [Block diagram: Height ($x_1$) and Weight ($x_2$) are fed to an Adult/Child classifier, which outputs the class (Adult: Class C1, Child: Class C2)]
  [Scatter plot of Weight ($x_2$) vs Height ($x_1$) showing the Adult and Child regions; feature vector $\mathbf{x} = [x_1\ x_2]^T$]

Illustration of Training Set: Adult-Child


• Number of training examples (N) = 20
• Dimension of a training example = 2
• Class label attribute is 3rd dimension
• Class:
– Child (0)
– Adult (1)

  [Scatter plot of the 20 training examples: Weight in kg vs Height in cm]


Illustration of Training Set – Iris (Flower) Data
• Number of training
examples (N) = 20
• Dimension of a
training example =
4
• Class label attribute
is 5th dimension
• Class:
– Iris Setosa (1)
– Iris Versicolour (2)
– Iris Virginica (3)


Illustration of Training Set – Iris (Flower) Data
  [Scatter plots of the Iris training data; class labels: 1: Iris Setosa, 2: Iris Versicolour, 3: Iris Virginica]


Step 1: Building a Classification Model (Training Phase)
• Training phase or learning phase is viewed as the
learning of a mapping or function that can predict the
associated class label of a given training example
  $y_n = f(\mathbf{x}_n)$
– xn is the nth training example and yn is the associated
class label
• Supervised learning:
– Class label for each training example is provided
– In supervised learning, each example is a pair consisting
of an input example (typically a vector) and a desired
output value


Step 1: Building a Classification Model (Training Phase)
  [Diagram: feature extraction is applied to each input to form the training examples, which are fed to the classifier during the training phase]
  Training examples (height in cm, weight in kg, class label):
  90   21.5   Child
  100  32.45  Child
  98   28.43  Child
  183  90     Adult
  163  67.45  Adult


Step 2: Classification (Testing Phase)
• The trained model is used for classification
• Predictive accuracy of the classifier is estimated
• Accuracy of a classifier:
– Accuracy of a classifier on a test set is percentage of test
examples that are correctly classified by the classifier
– The associated class label of each test example (ground
truth) is compared with the learned classifier’s class
prediction for that example
• Generalization ability of trained model: Performance
of trained models on new (test) data
• Target of learning techniques: Good generalization
ability


Step 2: Classification (Testing Phase)
  [Diagram: the classifier built in the training phase from the training examples (90 21.5 Child; 100 32.45 Child; 98 28.43 Child; 183 90 Adult; 163 67.45 Adult) is applied, after feature extraction, to a new example]
  Testing phase: test example 150 50.6 → predicted class label: Adult


Pattern Classification Problems
  [Three scatter plots in the (x1, x2) plane: linearly separable classes, nonlinearly separable classes, and overlapping classes]

• 1, 2, 3, 4, 5, ?, …, 24, 25, 26, 27, ?


• 1, 3, 5, 7, 9, ?, …, 25, 27, 29, 31, ?
• 2, 3, 5, 7, 11, ?, …, 29, 31, 37, 41, ?
• 1, 4, 9, 16, 25, ?, …, 121, 144, 169, ?
• 1, 2, 4, 8, 16, 32, ?,…, 1024, 2048, 4096, ?
• 1, 1, 2, 3, 5, 8, ?, …, 55, 89, 144, 233, ?
• 1, 1, 2, 4, 7, 13, ?, 44, 81, 149, 274, 504, ?
• 3, 5, 12, 24, 41, ?, …., 201, 248, 300, 357, ?
• 1, 6, 19, 42, 59, ?, …, 95, 117, 156, 191, ?


• 1, 2, 3, 4, 5, 6, …, 24, 25, 26, 27, 28


• 1, 3, 5, 7, 9, 11, …, 25, 27, 29, 31, 33
• 2, 3, 5, 7, 11, 13, …, 29, 31, 37, 41, 43
• 1, 4, 9, 16, 25, 36, …, 121, 144, 169, 196
• 1, 2, 4, 8, 16, 32, 64,…, 1024, 2048, 4096, 8192
• 1, 1, 2, 3, 5, 8, 13, …, 55, 89, 144, 233, 377
• 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, 927
• 3, 5, 12, 24, 41, 63, ….., 201, 248, 300, 357, 419
(2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62)
• 1, 6, 19, 42, 59, ?, …, 95, 117, 156, 191, ?

• Pattern: Any regularity or structure in data or in the source of data
• Pattern Analysis: Automatic discovery of patterns in data

Image Classification
  [Example images with class labels: Tiger, Giraffe, Horse, Bear]


Scene Image Classification
  [Example scene categories: Tall building, Inside city, Street, Highway, Coast, Open country, Mountain, Forest]

Nearest-Neighbour Method
• Training data with N samples: $D = \{\mathbf{x}_n, y_n\}_{n=1}^{N}$, $\mathbf{x}_n \in \mathbb{R}^d$ and $y_n \in \{1, 2, \ldots, M\}$
– d: dimension of input example
– M: Number of classes
• Step 1: Compute the Euclidean distance of a test example x to every training example, x1, x2, …, xn, …, xN
  Euclidean distance $= \|\mathbf{x}_n - \mathbf{x}\| = \sqrt{(\mathbf{x}_n - \mathbf{x})^T(\mathbf{x}_n - \mathbf{x})} = \sqrt{\sum_{i=1}^{d}(x_{ni} - x_i)^2}$
  [Scatter plot in the (x1, x2) plane showing the test example $\mathbf{x} = [x_1\ x_2]^T$ among the training examples]


Nearest-Neighbour Method
• Training data: $D = \{\mathbf{x}_n, y_n\}_{n=1}^{N}$, $\mathbf{x}_n \in \mathbb{R}^d$ and $y_n \in \{1, 2, \ldots, M\}$
– d: dimension of input example
– M: Number of classes
• Step 1: Compute the Euclidean distance of a test example x to every training example, x1, x2, …, xn, …, xN
• Step 2: Sort the examples in the training set in the ascending order of the distance to x
• Step 3: Assign the class of the training example with the minimum distance to the test example, x
  [Scatter plot in the (x1, x2) plane showing the test example $\mathbf{x} = [x_1\ x_2]^T$ among the training examples]
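A minimal sketch of this nearest-neighbour rule in Python/NumPy. The array names X_train and y_train and the helper below are assumptions made for illustration; the sample values follow the adult-child figures used in these slides.

import numpy as np

def nearest_neighbour_classify(x, X_train, y_train):
    # Assign x the class of the closest training example (Euclidean distance)
    dists = np.sum((X_train - x) ** 2, axis=1)   # squared distances have the same argmin
    return y_train[np.argmin(dists)]

# Hypothetical adult(1)/child(0) data: columns are height in cm and weight in kg
X_train = np.array([[90.0, 21.5], [100.0, 32.45], [98.0, 28.43],
                    [183.0, 90.0], [163.0, 67.45]])
y_train = np.array([0, 0, 0, 1, 1])
print(nearest_neighbour_classify(np.array([150.0, 50.6]), X_train, y_train))  # 1 (Adult)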

Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
  Test example: [Scatter plot: Weight in kg vs Height in cm with the test example marked among the training examples]
• Step 1: Compute the Euclidean distance (ED) of the test example to each training example


Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
  Test example: [Scatter plot: Weight in kg vs Height in cm with the test example marked]
• Step 2: Sort the examples in the training set in the ascending order of the distance to the test example

Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
  Test example: [Scatter plot: Weight in kg vs Height in cm with the test example marked]
• Step 3: Assign the class of the training example with the minimum distance to the test example
– Class: Adult




K-Nearest Neighbours (K-NN) Method
• Consider the class labels of the K training examples nearest to the test example
• Step 1: Compute the Euclidean distance of a test example x to every training example, x1, x2, …, xn, …, xN
  Euclidean distance $= \|\mathbf{x}_n - \mathbf{x}\| = \sqrt{(\mathbf{x}_n - \mathbf{x})^T(\mathbf{x}_n - \mathbf{x})} = \sqrt{\sum_{i=1}^{d}(x_{ni} - x_i)^2}$
  [Scatter plot in the (x1, x2) plane showing the test example $\mathbf{x} = [x_1\ x_2]^T$ among the training examples]

K-Nearest Neighbours (K-NN) Method
• Consider the class labels of the K training examples nearest to the test example
• Step 1: Compute the Euclidean distance of a test example x to every training example, x1, x2, …, xn, …, xN
• Step 2: Sort the examples in the training set in the ascending order of the distance to x
• Step 3: Choose the first K examples in the sorted list
– K is the number of neighbours for the test example
• Step 4: The test example is assigned the most common class among its K neighbours
  [Scatter plot in the (x1, x2) plane showing the test example $\mathbf{x} = [x_1\ x_2]^T$ among the training examples]
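The K-NN rule itself is a small extension of the sketch above; X_train and y_train are the same assumed arrays, and the class labels are assumed to be integers 0, …, M−1 so that np.bincount can tally the votes.

import numpy as np

def knn_classify(x, X_train, y_train, K=5):
    # K-nearest-neighbours rule: majority vote among the K closest training examples
    dists = np.linalg.norm(X_train - x, axis=1)   # Step 1: Euclidean distances
    nearest = np.argsort(dists)[:K]               # Steps 2-3: first K of the sorted list
    votes = np.bincount(y_train[nearest])         # Step 4: count the labels of the K neighbours
    return int(np.argmax(votes))                  # most common class (ties go to the smaller label)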


Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
  Test example: [Scatter plot: Weight in kg vs Height in cm with the test example and its neighbours marked]
• Consider K = 5
• Step 3: Choose the first K = 5 examples in the sorted list

Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
  Test example: [Scatter plot: Weight in kg vs Height in cm with the test example and its neighbours marked]
• Consider K = 5
• Step 4: The test example is assigned the most common class among its K neighbours
– Class: Adult


Determining K, Number of Neighbours
• This is determined experimentally
• Starting with K = 1, the test set is used to estimate the accuracy of the classifier
• This process is repeated, each time incrementing K to allow for more neighbours
• The K value that gives the maximum accuracy may be selected (see the sketch below)
• Preferably the value of K should be an odd number.
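A sketch of this experimental search, reusing the knn_classify sketch given earlier; the candidate odd values of K and the accuracy measure on a held-out test set are assumptions consistent with the procedure described above.

import numpy as np

def accuracy_for_K(K, X_train, y_train, X_test, y_test):
    # Percentage of test examples that are correctly classified with this K
    preds = np.array([knn_classify(x, X_train, y_train, K) for x in X_test])
    return 100.0 * np.mean(preds == y_test)

def choose_K(X_train, y_train, X_test, y_test, K_values=range(1, 16, 2)):
    # Try odd K values starting from K = 1 and keep the one with maximum accuracy
    accuracies = {K: accuracy_for_K(K, X_train, y_train, X_test, y_test) for K in K_values}
    return max(accuracies, key=accuracies.get)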

Data Normalization
• Since a distance measure is used, the K-NN classifier requires normalising the values of each attribute
• Normalising the training data:
– Compute the minimum and maximum values of each of the attributes in the training data
– Store the minimum and maximum values of each of the attributes
– Perform min-max normalization on the training data set
• Normalizing the test data:
– Use the stored minimum and maximum values of each of the attributes from the training set to normalise the test examples
• NOTE: Ensure that test examples do not cause out-of-bound errors (normalised values falling outside the training range)
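A minimal sketch of the min-max scheme described above, assuming the attributes are columns of a NumPy array; clipping the normalised test data to [0, 1] is just one simple way of guarding against out-of-bound values.

import numpy as np

def fit_min_max(X_train):
    # Compute and store the per-attribute minimum and maximum on the training data only
    return X_train.min(axis=0), X_train.max(axis=0)

def min_max_normalize(X, x_min, x_max):
    # Map each attribute to [0, 1] using the stored training-set statistics
    # (assumes every attribute has a non-zero range in the training data)
    X_norm = (X - x_min) / (x_max - x_min)
    return np.clip(X_norm, 0.0, 1.0)   # test values outside the training range are clipped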



Learning from Data




Machine Learning for Pattern Recognition


• Learning: Acquiring new knowledge or modifying the
existing knowledge
• Knowledge: Familiarity with information present in data
• Learning by machines for pattern analysis: Acquisition of
knowledge from data to discover patterns in data
• Data-driven techniques for learning by machines: Learning
from examples (Training of models)
• Generalization ability of learning machines: Performance of
trained models on new (test) data
• Target of learning techniques: Good generalization ability
• Learning techniques: Estimation of parameters of models


Lazy Learning: Learning from Neighbours
• The K-nearest neighbour classifier is an example of a lazy learner
• Lazy learning waits until the last minute before doing any model construction to classify a test example
• When the training examples are given, a lazy learner simply stores them and waits until it is given a test example
• When it sees the test example, it classifies it based on its similarity to the stored training examples
• Since a lazy learner stores the training examples or instances, it is also called an instance-based learner
• Disadvantages:
– Making a classification or prediction is computationally intensive
– Requires efficient storage techniques when the number of training samples is huge


Data Preparation for Classification
• Divide the data into a training set and a test set
– Example:
  • The training data contains 70% of the samples from each class
  • The test data contains the remaining 30% of the samples from each class

Data Preparation for Classification using the K-Nearest Neighbour Classifier
• Suppose the data set has 3000 samples
• Each sample belongs to one of 3 classes
• Suppose each class has 1000 samples
– Step 1: From class 1, 70% (i.e. 700 samples) are taken as training samples and the remaining 30% (i.e. 300 samples) as test samples
– Step 2: From class 2, 70% (i.e. 700 samples) are taken as training samples and the remaining 30% (i.e. 300 samples) as test samples
– Step 3: From class 3, 70% (i.e. 700 samples) are taken as training samples and the remaining 30% (i.e. 300 samples) as test samples
– Step 4: Combine the training examples from each class
  • The training set now contains 700 + 700 + 700 = 2100 samples
– Step 5: Combine the test examples from each class
  • The test set now contains 300 + 300 + 300 = 900 samples
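A sketch of this per-class 70/30 split, assuming the samples sit in an array X with labels y; the shuffling and fixed seed are assumptions added for reproducibility.

import numpy as np

def per_class_split(X, y, train_fraction=0.7, seed=0):
    # Split class by class: 70% of each class for training, the remaining 30% for testing
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        n_train = int(train_fraction * len(idx))   # e.g. 700 out of 1000 samples per class
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]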


Performance Evaluation for Classification

Confusion Matrix
                                    Actual Class
                                    Class1 (Positive)   Class2 (Negative)
Predicted Class  Class1 (Positive)  True Positive       False Positive
                 Class2 (Negative)  False Negative      True Negative

• True Positive: Number of test samples correctly predicted as positive class.
• True Negative: Number of test samples correctly predicted as negative class.
• False Positive: Number of test samples predicted as positive class but actually belonging to negative class.
• False Negative: Number of test samples predicted as negative class but actually belonging to positive class.
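A sketch of these four counts for a two-class problem, assuming y_true and y_pred are NumPy arrays and class 1 is treated as the positive class.

import numpy as np

def binary_confusion_counts(y_true, y_pred, positive=1):
    # Count TP, TN, FP and FN with `positive` as the Class1 (positive) label
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    # Accuracy (percentage of correctly classified test examples):
    # 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return tp, tn, fp, fn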


Confusion Matrix
                                    Actual Class
                                    Class1 (Positive)   Class2 (Negative)
Predicted Class  Class1 (Positive)  True Positive       False Positive
                 Class2 (Negative)  False Negative      True Negative

• True Positive + False Negative = total test samples in class1 (actual positives)
• False Positive + True Negative = total test samples in class2 (actual negatives)
• True Positive + False Positive = total test samples predicted as class1
• False Negative + True Negative = total test samples predicted as class2


Accuracy
                                    Actual Class
                                    Class1 (Positive)   Class2 (Negative)
Predicted Class  Class1 (Positive)  True Positive       False Positive
                 Class2 (Negative)  False Negative      True Negative

  Accuracy = (True Positive + True Negative) / total number of test samples

Confusion Matrix – Multiclass
                           Actual Class
                           Class1   Class2   Class3
Predicted Class   Class1     C11      C21      C31
                  Class2     C12      C22      C32
                  Class3     C13      C23      C33

• True Positive: Number of test samples correctly predicted as positive class (C11).
• True Negative: Number of test samples correctly predicted as negative class (C22 + C33).
• False Positive: Number of test samples predicted as positive class but actually belonging to negative class (C21 + C31).
• False Negative: Number of test samples predicted as negative class but actually belonging to positive class (C12 + C13).


Confusion Matrix – Multiclass
                           Actual Class
                           Class1   Class2   Class3
Predicted Class   Class1     C11      C21      C31
                  Class2     C12      C22      C32
                  Class3     C13      C23      C33

• Each row sum gives the total samples predicted as that class (class1, class2, class3)
• Each column sum gives the total samples actually in that class; the grand total is the total number of samples used for testing

Accuracy of Multiclass Classification
                           Actual Class
                           Class1   Class2   Class3
Predicted Class   Class1     C11      C21      C31
                  Class2     C12      C22      C32
                  Class3     C13      C23      C33

  Accuracy = (C11 + C22 + C33) / total number of test samples
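A sketch of the multiclass confusion matrix and the accuracy computed from it, assuming integer labels 0, …, M−1 and the layout used on these slides (rows = predicted class, columns = actual class); the correctly classified samples lie on the diagonal.

import numpy as np

def confusion_matrix(y_true, y_pred, M):
    # C[i, j] = number of test samples of actual class j that were predicted as class i
    C = np.zeros((M, M), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        C[predicted, actual] += 1
    return C

def multiclass_accuracy(C):
    # Correct predictions are on the diagonal (C11 + C22 + C33 in the 3-class case)
    return 100.0 * np.trace(C) / np.sum(C)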



Reference Templates Method
• Each class is represented by its reference template
– Mean of the data points of each class as the reference template
• The class of the nearest reference template (mean) is assigned to the test pattern
  Euclidean distance $= \|\mathbf{x} - \boldsymbol{\mu}_i\| = \sqrt{(\mathbf{x} - \boldsymbol{\mu}_i)^T(\mathbf{x} - \boldsymbol{\mu}_i)} = \sqrt{\sum_{j=1}^{d}(x_j - \mu_{ij})^2}$
  – $\boldsymbol{\mu}_i$: Mean vector of class i
• Learning: Estimating first order statistics (mean) from the data of each class
  [Scatter plot in the (x1, x2) plane: two classes with mean vectors $\boldsymbol{\mu}_1 = [\mu_{11}\ \mu_{12}]^T$ and $\boldsymbol{\mu}_2 = [\mu_{21}\ \mu_{22}]^T$, and the test example $\mathbf{x} = [x_1\ x_2]^T$]
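A minimal sketch of this nearest-mean (reference template) classifier; the function and array names are assumptions.

import numpy as np

def fit_class_means(X_train, y_train):
    # Learning: estimate the first order statistics (mean) of each class
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_mean_classify(x, classes, means):
    # Assign the class of the nearest reference template (Euclidean distance)
    dists = np.linalg.norm(means - x, axis=1)
    return classes[np.argmin(dists)]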


Modified Reference Templates Method
• Each class is represented by one or more reference templates
– Mean and variance of the data points of each class as the reference template
• The class of the nearest reference template is assigned to the test pattern
  Mahalanobis distance $= \sqrt{(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)}$
  – $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$: Mean vector and covariance matrix of class i
• Learning: Estimating
  – first order statistics (mean) and
  – second order statistics (variance and covariance)
  from the data of each class
  [Scatter plot in the (x1, x2) plane: two classes with mean vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, and the test example $\mathbf{x} = [x_1\ x_2]^T$]
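A sketch of the Mahalanobis distance to one class template, assuming its mean vector and covariance matrix have already been estimated; np.linalg.solve could replace the explicit inverse for better numerical behaviour.

import numpy as np

def mahalanobis_distance(x, mu, Sigma):
    # Distance of x from a class with mean mu and covariance matrix Sigma
    diff = x - mu
    return np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)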

Probability Distribution
• Data of a class is represented by a probability
distribution
• For a class whose data is considered to be forming a
single cluster, it can be represented by a normal or
Gaussian distribution
• Multivariate Gaussian distribution:
– Adult-Child class

  [Scatter plot of the Adult-Child data: Weight in kg vs Height in cm]


Probability Distribution
• Data of a class is represented by a probability
distribution
• For a class whose data is considered to be forming a
single cluster, it can be represented by a normal or
Gaussian distribution
• Multivariate Gaussian distribution:
– Adult-Child class
– Bivariate Gaussian distribution
– Each example is sampled from the Gaussian distribution
  [Surface plot of the bivariate Gaussian density p(x) over Height in cm and Weight in kg]

Multivariate Gaussian Distribution
• Data in d-dimensional space
  $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
  (the quadratic form in the exponent is the squared Mahalanobis distance)
– $\boldsymbol{\mu}$ is the mean vector
– $\boldsymbol{\Sigma}$ is the covariance matrix
• Bivariate Gaussian distribution: d = 2
  $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \begin{bmatrix} E[x_1] \\ E[x_2] \end{bmatrix}$
  $\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} E[(x_1 - \mu_1)^2] & E[(x_1 - \mu_1)(x_2 - \mu_2)] \\ E[(x_2 - \mu_2)(x_1 - \mu_1)] & E[(x_2 - \mu_2)^2] \end{bmatrix}$
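A direct transcription of this density into NumPy (scipy.stats.multivariate_normal.pdf would give the same value); the explicit inverse and determinant are kept only to mirror the formula above.

import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # Multivariate Gaussian density N(x | mu, Sigma) for a d-dimensional x
    d = len(mu)
    diff = x - mu
    maha_sq = diff @ np.linalg.inv(Sigma) @ diff          # squared Mahalanobis distance
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha_sq) / norm_const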


Bayes Classifier: Multivariate Data
• Let C1, C2, …, Ci, …, CM be the M classes
– Each class Ci has Ni training examples
• Given: a test example x
• Bayes decision rule:
  $P(C_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_i)\, P(C_i)}{p(\mathbf{x})}$
  (posterior probability of a class = likelihood × prior / evidence)
– Prior: prior information of a class, $P(C_i) = \dfrac{N_i}{N}$
  • where N is the total number of training examples
– Evidence: probability that x exists, $p(\mathbf{x}) = \sum_{i=1}^{M} p(\mathbf{x} \mid C_i)\, P(C_i)$
  • Out of all the samples, the probability of the sample we are looking at
– Likelihood follows the distribution of the data of a class
  Class label for x $= \arg\max_{i} P(C_i \mid \mathbf{x}), \quad i = 1, 2, \ldots, M$

Probability Theory and Bayes Rule
• The sample space is partitioned into C1, C2, …, Ci, …, CM, where the partitions are disjoint
– Example:
  • The data space is the sample space
  • Each class is a partition
• Let x be an event defined in the sample space
– Example: a finite set of data points (training data) is the event x
  [Diagram: sample space partitioned into regions C1, C2, …, Ci, …, CM, with the event x overlapping several partitions]
• P(x): total probability, i.e. the joint probability of x and Ci, P(x, Ci), summed over all i
  $P(\mathbf{x}) = \sum_{i=1}^{M} p(\mathbf{x}, C_i) = \sum_{i=1}^{M} p(\mathbf{x} \mid C_i)\, P(C_i)$
• P(x) is the marginal probability – the probability of x is obtained by marginalising over the events Ci


Probability Theory and Bayes Rule
  [Same sample-space partition diagram as on the previous slide]
• Conditional probability:
  $p(\mathbf{x} \mid C_i) = \dfrac{p(\mathbf{x}, C_i)}{P(C_i)}$   (1)
  $p(C_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x}, C_i)}{P(\mathbf{x})}$   (2)
• From (1) and (2):
  $p(\mathbf{x} \mid C_i)\, P(C_i) = p(C_i \mid \mathbf{x})\, P(\mathbf{x})$
• Bayes decision rule:
  $P(C_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_i)\, P(C_i)}{P(\mathbf{x})}$

Bayes Classifier: Multivariate Data
• Data of a class is represented by a probability distribution
• Given: a test example x
• Bayes decision rule:
  $P(C_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_i)\, P(C_i)}{P(\mathbf{x})}$
  (posterior probability of a class = likelihood × prior / evidence)
– The likelihood of a class follows the distribution of the data of that class
– Computation of the likelihood of a class depends on the distribution of the data and the parameters of that distribution
• The Bayes decision rule can then be written as $P(\boldsymbol{\theta}_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid \boldsymbol{\theta}_i)\, P(C_i)}{P(\mathbf{x})}$
– $\boldsymbol{\theta}_i$ is the parameter vector of the distribution of class Ci


Maximum Likelihood (ML) Method for Parameter Estimation
• Given: training data for a class Ci having Ni samples, $D_i = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n, \ldots, \mathbf{x}_{N_i}\}$, $\mathbf{x}_n \in \mathbb{R}^d$
• Data of a class is represented by the parameter vector of its distribution: $\boldsymbol{\theta}_i = [\theta_{i1}, \theta_{i2}, \ldots, \theta_{iK}]^T$
• Unknown: $\boldsymbol{\theta}_i$
• Likelihood of the training data (total data likelihood) for a given $\boldsymbol{\theta}_i$:
  $p(D_i \mid \boldsymbol{\theta}_i) = \prod_{n=1}^{N_i} p(\mathbf{x}_n \mid \boldsymbol{\theta}_i)$
  $\mathcal{L}(\boldsymbol{\theta}_i) = \ln p(D_i \mid \boldsymbol{\theta}_i) = \sum_{n=1}^{N_i} \ln p(\mathbf{x}_n \mid \boldsymbol{\theta}_i)$
• Choose the parameters for which the total data likelihood (log likelihood) is maximum:
  $\boldsymbol{\theta}_i^{ML} = \arg\max_{\boldsymbol{\theta}_i} \mathcal{L}(\boldsymbol{\theta}_i)$

ML Method for Parameter Estimation of Multivariate Gaussian Distribution
• Given: training data for a class Ci having Ni samples, $D_i = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n, \ldots, \mathbf{x}_{N_i}\}$, $\mathbf{x}_n \in \mathbb{R}^d$
• Data of a class is represented by the parameter vector $[\boldsymbol{\mu}_i\ \boldsymbol{\Sigma}_i]^T$ of the Gaussian distribution
• Unknown: $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$
• Likelihood of the training data (total data likelihood) for given $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$:
  $p(D_i \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \prod_{n=1}^{N_i} p(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$
  $\mathcal{L}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \ln p(D_i \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \sum_{n=1}^{N_i} \ln p(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$
• Choose the parameters for which the total data likelihood (log likelihood) is maximum:
  $\boldsymbol{\mu}_i^{ML}, \boldsymbol{\Sigma}_i^{ML} = \arg\max_{\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i} \mathcal{L}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$


Illustration of ML Method:
Training Set: Adult-Child
• Number of training examples (N) = 20
• Dimension of a training example = 2
• Class label attribute is 3rd dimension
• Class:
– Child (0)
– Adult (1)

  [Scatter plot of the 20 training examples: Weight in kg vs Height in cm]

Illustration of ML Method: Child class


• Number of training examples (N) = 20
• Dimension of a training example = 2
• Sample mean: [103.6 30.66]
• Sample covariance matrix:

109.3778 61.3500 
 61.3500 43.5415 

  [Scatter plot of the Child-class training examples: Weight in kg vs Height in cm]


Illustration of ML Method: Child class


• Covariance matrix value is fixed at :

109.3778 61.3500 
 61.3500 43.5415 

• Search the values for mean vector


µ=[μ1, μ2]T that maximizes the total
data likelihood
• Range of values for mean vectors to
search:
– 1000 equally sampled values from 53.6
to 153.6 for μ1
– 1000 equally sampled values from
-20.66 to 80.66 for μ2
• Compute the likelihood value for each of the 1,000,000
(1000 x 1000) values of the mean vector

Illustration of ML Method: Child class


• A maximum value for the likelihood is
obtained for the value
[103.65 30.71]
• This value is close to sample mean
vector: [103.6 30.66]
  [Surface plot of the total data likelihood p(Di | µi, Σi) as a function of the mean components µ1 and µ2]


ML Method for Parameter Estimation of Multivariate Gaussian Distribution
• Parameters of the Gaussian distribution of class Ci: $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$
• Likelihood for a single example $\mathbf{x}_n$:
  $p(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}_n - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_i)\right)$
• Log likelihood for the total training data of class Ci, $D_i = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{N_i}\}$:
  $\mathcal{L}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \ln p(D_i \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \ln \prod_{n=1}^{N_i} p(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) = \sum_{n=1}^{N_i} \ln p(\mathbf{x}_n \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$
  $= \sum_{n=1}^{N_i} \left[ -\frac{1}{2} \ln |\boldsymbol{\Sigma}_i| - \frac{d}{2} \ln 2\pi - \frac{1}{2}(\mathbf{x}_n - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_i) \right]$
• Setting the derivatives of $\mathcal{L}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$ w.r.t. $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$ to zero, we get:
  $\boldsymbol{\mu}_i^{ML} = \frac{1}{N_i} \sum_{n=1}^{N_i} \mathbf{x}_n \qquad \boldsymbol{\Sigma}_i^{ML} = \frac{1}{N_i} \sum_{n=1}^{N_i} (\mathbf{x}_n - \boldsymbol{\mu}_i^{ML})(\mathbf{x}_n - \boldsymbol{\mu}_i^{ML})^T$
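A sketch of these ML estimates for the data of one class; note the 1/N normalisation of the covariance required by the ML solution (np.cov would use 1/(N−1) by default).

import numpy as np

def ml_gaussian_estimates(X):
    # X holds the Ni training examples of one class, one row per example
    mu = X.mean(axis=0)                      # sample mean vector
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]       # sample covariance matrix with 1/N
    return mu, Sigma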

Bayes Classifier with Unimodal Gaussian Density – Training Process
• Let C1, C2, …, Ci, …, CM be the M classes
• Let D1, D2, …, Di, …, DM be the training data for M
classes
• Estimate the parameters
– θ1= [µ1 Σ1]T ,
– θ2= [µ2 Σ2]T,
– …,
– θi= [µi Σi]T,
– …,
– θM= [µM ΣM]T for each of the classes
• Number of parameters to be estimated for each class
is dependent on dimensionality of the data space d
– Number of parameters: d + (d(d+1))/2

Bayes Classifier with Unimodal Gaussian Density – Training Process
• Let C1, C2, …, Ci, …, CM be the M classes
• Let D1, D2, …, Di, …, DM be the training data for M
classes
• Compute sample mean vector and sample covariance
matrix from training data of class 1, θ1= [µ1 Σ1]T
• Compute sample mean vector and sample covariance
matrix from training data of class 2, θ2= [µ2 Σ2]T,
• …,
• Compute sample mean vector and sample covariance
matrix from training data of class M, θM= [µM ΣM]T


Bayes Classifier with Unimodal Gaussian Density: Classification
• For a test example x:
– The likelihood of x being generated from each of the classes, p(x | µi, Σi), is computed
– Assign the label of the class for which p(x | µi, Σi) is maximum
  [Diagram: the test example x is scored against each class model θi = [µi Σi] to obtain the likelihoods p(x | µ1, Σ1), …, p(x | µM, ΣM); the decision logic outputs the class label]
  Class label $= \arg\max_{i}\ p(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$
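A sketch of the whole train-and-classify procedure, reusing the ml_gaussian_estimates and gaussian_pdf sketches given earlier; equal class priors are assumed, since the decision rule shown here takes the argmax of the class likelihoods.

import numpy as np

def train_gaussian_bayes(X_train, y_train):
    # Estimate theta_i = [mu_i, Sigma_i] from the training data of each class
    classes = np.unique(y_train)
    params = {c: ml_gaussian_estimates(X_train[y_train == c]) for c in classes}
    return classes, params

def classify_gaussian_bayes(x, classes, params):
    # Assign the class whose Gaussian gives the maximum likelihood p(x | mu_i, Sigma_i)
    likelihoods = [gaussian_pdf(x, *params[c]) for c in classes]
    return classes[int(np.argmax(likelihoods))]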


Illustration of Bayes Classifier with Unimodal Gaussian Density: Adult(1)-Child(0) Classification
• Training Phase:
– Compute the sample mean vector and sample covariance matrix from the training data of class 1 (Child):
  µ1 = [103.6000 30.6600]
  Σ1 = [109.3778 61.3500; 61.3500 43.5415]
– Compute the sample mean vector and sample covariance matrix from the training data of class 2 (Adult):
  µ2 = [166.0000 67.1150]
  Σ2 = [110.6667 160.5278; 160.5278 255.4911]

Illustration of Bayes Classifier with Unimodal Gaussian Density: Adult(1)-Child(0) Classification
• Test phase: Classification
  Test example x: [Scatter plot: Weight in kg vs Height in cm with the test example marked]
• Class 1 (Child): µ1 = [103.6000 30.6600], Σ1 = [109.3778 61.3500; 61.3500 43.5415]
• Class 2 (Adult): µ2 = [166.0000 67.1150], Σ2 = [110.6667 160.5278; 160.5278 255.4911]
• Compute the likelihood of the test sample x with class 1 (Child): $p(\mathbf{x} \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) = 3.5237 \times 10^{-8}$
• Compute the likelihood of the test sample x with class 2 (Adult): $p(\mathbf{x} \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2) = 3.7177 \times 10^{-4}$
• Class label of x = Adult


Summary: Bayes Classifier with Unimodal Gaussian Density
• The relation between examples and class can be captured in a statistical model
– Bayes classifier
• Statistical model:
– Unimodal Gaussian density
  • Univariate
  • Multivariate
  [Surface plot of the unimodal Gaussian density p(x) over Height in cm and Weight in kg]

Summary: Bayes Classifier with Unimodal Gaussian Density
• The relation between examples and class can be captured in a statistical model
– Bayes classifier
• Statistical model:
– Unimodal Gaussian density
  • Univariate
  • Multivariate
  [Scatter plot of Weight in kg vs Height in cm with the two class means marked: [103.6 30.1] and [166.0 67.1]]
• The real-world data need not be unimodal
– The shape of the density can be arbitrary
– Bayes classifier?
  • Multimodal density function


Adult-Child Data
  [Two scatter plots of Weight in kg vs Height in cm; the marked points are [117.2 31.5] and [149.7 65.1]]

Multimodal Distribution: Adult-Child Data
• For a class whose data is considered to have multiple clusters, the probability distribution is multimodal
  [Two scatter plots of Weight in kg vs Height in cm with the cluster means [101.7 30.1], [129.9 32.6], [138.2 59.6] and [171.1 75.2] marked]


Multimodal Distribution: Adult-Child Data
• For a class whose data is considered to have multiple clusters, the probability distribution is multimodal
• Multimodal Gaussian: Child Data
– M1: Cluster 1 (mode 1)
– M2: Cluster 2 (mode 2)
  [Surface plot of the bimodal density p(x) of the Child data over Height in cm and Weight in kg, with modes M1 and M2]

Multimodal Gaussian Distribution: Gaussian Mixture Model
• Given: training data for a class Ci having Ni samples, $D_i = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n, \ldots, \mathbf{x}_{N_i}\}$, $\mathbf{x}_n \in \mathbb{R}^d$
• Gaussian mixture model (GMM): used to represent a multimodal distribution
• A GMM is a linear superposition of multiple (Q) Gaussian components:
  $p(\mathbf{x} \mid C_i) = \sum_{q=1}^{Q} w_q\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)$
– The mixture is the overall envelope of the component curves
  [Surface plot of the multimodal Gaussian density of the Child data over Height in cm and Weight in kg]


Gaussian Mixture Model (GMM)
• A GMM is a linear superposition of multiple Gaussians:
  $p(\mathbf{x} \mid C_i) = \sum_{q=1}^{Q} w_q\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)$
• For a d-dimensional feature vector representation of the data, the parameters of the GMM are
– Mixture coefficients $w_q$, q = 1, 2, …, Q
  • Mixture weight or strength of each cluster (or mixture or mode)
  • Property: $\sum_{q=1}^{Q} w_q = 1$
– d-dimensional mean vectors $\boldsymbol{\mu}_q$, q = 1, 2, …, Q
– d×d covariance matrices $\boldsymbol{\Sigma}_q$, q = 1, 2, …, Q
• Training process objective: to estimate the parameters of the GMM

Parameter Estimation of GMM: Incomplete Data Problem
• Given: training data for a class Ci having Ni samples, $D_i = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n, \ldots, \mathbf{x}_{N_i}\}$, $\mathbf{x}_n \in \mathbb{R}^d$
• Known: the training data is multimodal in nature
• Unknown: the identity of the cluster (or mixture) that generated each training data point
• Incomplete data problem:
– Only the data points are given, not their identity (i.e. which cluster each belongs to)
– Hidden (latent) information: identity of the data points to the clusters


Parameter Estimation of GMM: Incomplete Data Problem
• If the identity (latent information) were given, how would we estimate the parameters of the GMM?
• Apply the maximum likelihood method to estimate the parameters of each of the Q mixtures ($\boldsymbol{\mu}_q$ and $\boldsymbol{\Sigma}_q$)
• The mixture coefficient $w_q$ is computed as
  $w_q = \dfrac{N_{iq}}{N_i}$
  • Niq: number of data points in cluster q
  • Ni: number of data points in class Ci
• In practice, we do not have this information
• Goal of parameter estimation: to find the best possible values of the GMM parameters such that the total likelihood of the data is maximized
– Maximum likelihood method for training a GMM: Expectation-Maximization (EM) method

Expectation-Maximization (EM) for GMMs


• An elegant and powerful method for finding the
maximum likelihood solution for a model with latent
variables
• Given a Gaussian mixture model, the goal is to
maximize the likelihood function with respect to the
parameters
1. Initialize the means μq, covariances Σq and mixing
coefficients wq, and evaluate the initial value of the log
likelihood
2. E-step: Evaluate the responsibilities γq(x) using the
current parameter values


EM Method – Responsibility Term
• A quantity that plays an important role is the responsibility term, γq(x)
• It is given by
  $\gamma_q(\mathbf{x}) = \dfrac{w_q\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)}{\sum_{j=1}^{Q} w_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$
• $w_q$: mixture coefficient or prior probability of component q
• $\gamma_q(\mathbf{x})$ gives the posterior probability of component q for the observation x
  [Illustration with four components: for a point xn close to component 1, γ1(xn) = 0.99, γ2(xn) = 0.01, γ3(xn) = 0.00, γ4(xn) = 0.00; for a point xm between components, γ1(xm) = 0.08, γ2(xm) = 0.42, γ3(xm) = 0.34, γ4(xm) = 0.16]
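A sketch of this E-step for a single observation, reusing the gaussian_pdf sketch from earlier; the lists of weights, means and covariances are the current GMM parameters.

import numpy as np

def responsibilities(x, weights, means, covariances):
    # gamma_q(x) = w_q N(x | mu_q, Sigma_q) / sum_j w_j N(x | mu_j, Sigma_j)
    numerators = np.array([w * gaussian_pdf(x, mu, Sigma)
                           for w, mu, Sigma in zip(weights, means, covariances)])
    return numerators / numerators.sum()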

Expectation-Maximization (EM) for GMMs
• Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the parameters
1. Initialize the means μq, covariances Σq and mixing coefficients wq, and evaluate the initial value of the log likelihood
2. E-step: Evaluate the responsibilities γq(x) using the current parameter values
3. M-step: Re-estimate the parameters $\boldsymbol{\mu}_q^{new}$, $\boldsymbol{\Sigma}_q^{new}$ and $w_q^{new}$ using the current responsibilities
4. Evaluate the log likelihood and check for convergence of the log likelihood
  • If the convergence criterion is not satisfied, return to step 2


Expectation-Maximization (EM) for GMMs
• Convergence criterion: the difference between the log likelihoods of successive iterations falls below a threshold (e.g. 10⁻³)
  [Plot of the log likelihood $\mathcal{L}(\boldsymbol{\theta}_i) = \ln p(D_i \mid \boldsymbol{\theta}_i)$ over EM iterations 1–12]

Illustration of Parameter Estimation
  [Figure from C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006]


Bayes Classifier: Multimodal Data
• Let C1, C2, …, Ci, …, CM be the M classes
• Given: a test example x
• Bayes decision rule:
  $P(C_i \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_i)\, P(C_i)}{p(\mathbf{x})}$
  (posterior probability of a class = likelihood × prior / evidence)
  $p(\mathbf{x} \mid C_i) = \sum_{q=1}^{Q} w_q\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)$
  Class label for x $= \arg\max_{i} P(C_i \mid \mathbf{x})$

Bayes Classifier with Multimodal Gaussian Density (GMM) – Training Process
• Let C1, C2, …, Ci, …, CM be the M classes
• Let D1, D2, …, Di, …, DM be the training data for the M classes
• Build a GMM (λ) for each of the classes: GMM for class 1, λ1; GMM for class 2, λ2; …; GMM for class M, λM
  GMM for class i: $\lambda_i = \{w_q, \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q\}_{q=1}^{Q}$


Bayes Classifier with Multimodal Gaussian Density (GMM) – Classification
  [Diagram: the test example x is scored against each class GMM λ1, λ2, …, λM to obtain the likelihoods p(x | λ1), p(x | λ2), …, p(x | λM); the decision logic outputs the class label]
  Class label $= \arg\max_{i}\ p(\mathbf{x} \mid \lambda_i)$

Determining Q, Number of Gaussian Components
• This is determined experimentally
• Starting with Q = 1, the test set is used to estimate the accuracy of the Bayes classifier
• This process is repeated, each time incrementing Q to allow for more Gaussian components
• The GMM with the number of components Q that gives the maximum accuracy may be selected (see the sketch below)
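A sketch of a per-class GMM Bayes classifier together with the experimental search over Q, using scikit-learn's GaussianMixture (whose fit method runs EM internally) as a stand-in for the EM procedure described in these slides; the candidate range of Q values is an assumption.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm_classifier(X_train, y_train, Q):
    # Fit one GMM with Q components to the training data of each class
    classes = np.unique(y_train)
    gmms = {c: GaussianMixture(n_components=Q, covariance_type='full',
                               random_state=0).fit(X_train[y_train == c])
            for c in classes}
    return classes, gmms

def predict_gmm_classifier(X, classes, gmms):
    # Score every example under every class GMM and take the argmax of the log-likelihood
    log_liks = np.column_stack([gmms[c].score_samples(X) for c in classes])
    return classes[np.argmax(log_liks, axis=1)]

def choose_Q(X_train, y_train, X_test, y_test, Q_values=range(1, 6)):
    # Keep the number of components that gives the maximum test-set accuracy
    best_Q, best_acc = None, -1.0
    for Q in Q_values:
        classes, gmms = train_gmm_classifier(X_train, y_train, Q)
        acc = np.mean(predict_gmm_classifier(X_test, classes, gmms) == y_test)
        if acc > best_acc:
            best_Q, best_acc = Q, acc
    return best_Q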


Bayes Classifier with Gaussian Mixture Models – Summary
• The multimodal probability distribution of each class is represented by a Gaussian mixture model
• A GMM is a powerful way of modeling data
• Using a GMM, data with an arbitrarily shaped distribution can be modeled
• In a GMM, the number of parameters to be estimated for each class depends on:
– the dimensionality of the data space d
– the number of Gaussian mixtures Q
  Number of parameters: Q·d + Q·(d(d+1))/2 + Q
• For large values of d and Q, the number of examples required to estimate the parameters properly will be large
• When the estimated class-conditional densities are the same as the true densities, the Bayes classifier gives the minimum classification error

Text Books
1. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann Publishers, 2011.
2. S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 2009.
3. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
