
Supervised Machine Learning: Pattern Classification
Classification
• Problem of identifying to which of a set of categories a new observation belongs
• Predicts categorical labels
• Examples:
  • Predicting whether a person is an adult or a child (2-class)
  • Predicting whether an employee will get a raise in salary, based on years of experience and current salary (2-class)
  • Identifying an email as spam or not (2-class)
  • Predicting the presence or absence of a disease (2-class)
    – Pima Indians Diabetes Database: predict whether a patient has diabetes based on diagnostic measurements
  • Categorizing a disease according to symptoms (multi-class)
  • Categorizing the Iris flowers (multi-class)
Classification
• Classification is a two-step process
  – Step 1: Building a classifier (data modeling)
    • Learning from data (training phase)
    • Supervised learning: each example is a pair consisting of an input example and a desired output value (class label)
    • The training phase (learning phase) is viewed as learning a mapping or function that can predict the associated class label of a given training example, from the training set {(x1, y1), (x2, y2), …, (xN, yN)}
      – xn is the nth training example and yn is the associated class label
  – Step 2: Using the classification model for prediction
    • Testing phase: predicting class labels for unseen data
    • Accuracy of a classifier: percentage of test examples that are correctly classified by the classifier (see the formula below)
    • Target of learning techniques: good generalization ability
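Written as a formula, the accuracy measured in Step 2 is:

\[
\text{Accuracy} \;=\; \frac{\text{number of correctly classified test examples}}{\text{total number of test examples}} \times 100\%
\]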
2-class Classification
• Example: Classifying a person as child or adult
• Feature vector: x = [x1 x2]T, where x1 is height and x2 is weight
• Classes: Adult (class C1) and Child (class C2)

[Figure: block diagram of a classifier taking height (x1) and weight (x2) as input and outputting the class label Adult/Child, alongside a scatter plot of weight (x2) versus height (x1) showing the Adult and Child regions.]
Illustration of Training Set: Adult-Child
• Number of training examples (N) = 20
• Dimension of a training example = 2
• The class label attribute is the 3rd dimension
• Classes:
  – Child (0)
  – Adult (1)

[Figure: scatter plot of the 20 training examples, weight in kg versus height in cm, labelled Child (0) and Adult (1).]
Step 1: Building a Classification Model (Training Phase)

[Figure: each input image passes through feature extraction to produce a (height, weight) feature vector; the labelled training examples are fed to the training phase, which builds the classifier.]

Training examples (height in cm, weight in kg, class):

  Height   Weight   Class
  90       21.5     Child
  100      32.45    Child
  98       28.43    Child
  183      90       Adult
  163      67.45    Adult
Step 2: Classification (Testing Phase)

[Figure: the training phase builds the classifier from the same five feature-extracted training examples as in Step 1; in the testing phase, a test image passes through feature extraction to give the feature vector (150, 50.6), and the classifier outputs the class label (Adult).]
Data Preparation for the Classification
• Divide the data into a training set and a test set
• Approach 1: When the numbers of samples from each class are almost equal (balanced data)
  – The most common split is a 70-30 split:
    • Training data contains 70% of the samples from each class
    • Test data contains the remaining 30% of the samples from each class
  – One can use other splits like 50-50, 60-40, 80-20 or 90-10
Data Preparation for the Classification: Approach 1
• Suppose that we are doing a 70-30 split
• Suppose the data set has 3000 samples
• Each sample belongs to one of 3 classes
• Suppose each class has 1000 samples
  – Step 1: From class 1, 70% (700 samples) are taken as training samples and the remaining 30% (300 samples) as test samples
  – Step 2: From class 2, 70% (700 samples) are taken as training samples and the remaining 30% (300 samples) as test samples
  – Step 3: From class 3, 70% (700 samples) are taken as training samples and the remaining 30% (300 samples) as test samples
  – Step 4: Combine the training examples from each class
    • The training set now contains 700+700+700 = 2100 samples
  – Step 5: Combine the test examples from each class
    • The test set now contains 300+300+300 = 900 samples
A minimal sketch of this per-class split is shown below.
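A minimal sketch of the stratified split, assuming the samples are grouped in a dict keyed by class label (the function name and data layout are illustrative, not from the slides):

```python
import random

def stratified_split(samples_by_class, train_frac=0.7, seed=0):
    """Approach 1: put train_frac of each class into the training set
    and the remaining samples into the test set (balanced data)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]          # copy before shuffling
        rng.shuffle(shuffled)
        n_train = int(train_frac * len(shuffled))
        train += [(x, label) for x in shuffled[:n_train]]
        test  += [(x, label) for x in shuffled[n_train:]]
    return train, test
```

With 1000 samples in each of 3 classes and train_frac=0.7, this yields 2100 training and 900 test samples, as in the slide.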
Data Preparation for the Classification
• Divide the data into a training set and a test set
• Approach 1: When the numbers of samples from each class are almost equal (balanced data)
  – Example:
    • Training data contains 70% of the samples from each class
    • Test data contains the remaining 30% of the samples from each class
• Approach 2: When the numbers of samples from each class are not equal (imbalanced data)
  – One class may have a large number of samples and another a small number of samples
  – A 70%-30% division may cause the learned model to be biased towards the class with the larger number of training samples
  – Solution:
    • Take 70% or 80% of the samples from the class with the smallest number of samples as the training data for that class
    • Take the same number of samples from each other class as training examples
    • Each class will then have the same number of training examples
Data Preparation for the Classification: Approach 2
• Suppose the data set has 3000 samples
• Each sample belongs to one of 3 classes
• Suppose class 1 has 700 samples, class 2 has 300 samples and class 3 has 2000 samples
  – Step 1: From class 2 (the smallest class), 70% (210 samples) are taken as training samples and the remaining 30% (90 samples) as test samples
  – Step 2: From class 1, 210 samples are taken as training samples and the remaining 490 samples as test samples
  – Step 3: From class 3, 210 samples are taken as training samples and the remaining 1790 samples as test samples
  – Step 4: Combine the training examples from each class
    • The training set now contains 210+210+210 = 630 samples
  – Step 5: Combine the test examples from each class
    • The test set now contains 490+90+1790 = 2370 samples
A sketch of this balanced split is shown below.
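A sketch of the balanced split for imbalanced data, sized by the smallest class (the function name and data layout are illustrative):

```python
import random

def balanced_split(samples_by_class, train_frac=0.7, seed=0):
    """Approach 2: take train_frac of the smallest class as its training
    data, the same number from every other class, and put all remaining
    samples into the test set (imbalanced data)."""
    rng = random.Random(seed)
    smallest = min(len(s) for s in samples_by_class.values())
    n_train = int(train_frac * smallest)   # e.g. 0.7 * 300 = 210
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train += [(x, label) for x in shuffled[:n_train]]
        test  += [(x, label) for x in shuffled[n_train:]]
    return train, test
```

With classes of 700, 300 and 2000 samples this gives 210 training samples per class (630 total) and 2370 test samples, matching the slide.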
Nearest-Neighbour Method
• Training data with N samples: {(x1, y1), (x2, y2), …, (xN, yN)}
  – d: dimension of an input example
  – M: number of classes
• Step 1: Compute the Euclidean distance from a test example x to every training example x1, x2, …, xn, …, xN

[Figure: scatter plot of the training examples in the (x1, x2) plane with the test example x.]
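Written out for d-dimensional examples, the Euclidean distance in Step 1 is the standard one:

\[
\mathrm{ED}(\mathbf{x}, \mathbf{x}_n) \;=\; \sqrt{\sum_{i=1}^{d} \left(x_i - x_{n,i}\right)^2}
\]

where x_i and x_{n,i} are the i-th attributes of the test example x and of training example x_n respectively.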
Nearest-Neighbour Method
• Training data: {(x1, y1), (x2, y2), …, (xN, yN)}
  – d: dimension of an input example
  – M: number of classes
• Step 1: Compute the Euclidean distance from a test example x to every training example x1, x2, …, xn, …, xN
• Step 2: Sort the examples in the training set in ascending order of their distance to the test example x
• Step 3: Assign the class of the training example with the minimum distance to the test example x

[Figure: the same scatter plot, highlighting the training example nearest to x.]

A minimal code sketch of these steps is given below.
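A minimal sketch of steps 1-3 as a single nearest-neighbour classifier, assuming training data stored as (feature_vector, label) pairs (the function name is illustrative):

```python
import math

def nearest_neighbour_classify(train, x):
    """1-NN: return the label of the training example closest to x.
    train is a list of (feature_vector, label) pairs."""
    # Steps 1-3: the minimum-distance training example decides the class
    nearest_x, nearest_y = min(train, key=lambda pair: math.dist(pair[0], x))
    return nearest_y
```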
Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
Test Example:

[Figure: scatter plot of the training examples, weight in kg versus height in cm, with the test example marked.]

• Step 1: Compute the Euclidean distance (ED) of the test example to each training example
Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
Test Example:

[Figure: the same scatter plot.]

• Step 2: Sort the examples in the training set in ascending order of their distance to the test example
Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
Test Example:

[Figure: the same scatter plot, with the nearest training example highlighted.]

• Step 3: Assign the class of the training example with the minimum distance to the test example
  – Class: Adult (1)
Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
Test Example (a second case):

[Figure: scatter plot of the training examples, weight in kg versus height in cm, with a different test example marked.]

• Step 1: Compute the Euclidean distance (ED) of the test example to each training example
Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
Test Example (a second case):

[Figure: the same scatter plot.]

• Step 2: Sort the examples in the training set in ascending order of their distance to the test example
Illustration of Nearest Neighbour Method: Adult(1)-Child(0) Classification
Test Example (a second case):

[Figure: the same scatter plot, with the nearest training example highlighted.]

• Step 3: Assign the class of the training example with the minimum distance to the test example
  – Class: Adult (1)?
• Here the single nearest neighbour gives a questionable label, which motivates considering more than one neighbour (K-NN)
K-Nearest Neighbours (K-NN) Method
• Consider the class labels of the K training examples nearest to the test example
• Step 1: Compute the Euclidean distance from a test example x to every training example x1, x2, …, xn, …, xN

[Figure: scatter plot of the training examples in the (x1, x2) plane with the test example x.]
K-Nearest Neighbours (K-NN) Method
• Consider the class labels of the K training examples nearest to the test example
• Step 1: Compute the Euclidean distance from a test example x to every training example x1, x2, …, xn, …, xN
• Step 2: Sort the examples in the training set in ascending order of their distance to x
• Step 3: Choose the first K examples in the sorted list
  – K is the number of neighbours for the test example
• Step 4: The test example is assigned the most common class among its K neighbours

[Figure: the same scatter plot, highlighting the K nearest neighbours of x.]

A minimal sketch of these four steps is given below.
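A minimal sketch of the four K-NN steps, again assuming (feature_vector, label) pairs (knn_classify is an illustrative name, not from the slides):

```python
import math
from collections import Counter

def knn_classify(train, x, k=5):
    """Classify test example x by a majority vote among its k nearest
    training examples; train is a list of (feature_vector, label) pairs."""
    # Step 1: Euclidean distance from x to every training example
    dists = [(math.dist(xn, x), yn) for xn, yn in train]
    # Step 2: sort in ascending order of distance to x
    dists.sort(key=lambda pair: pair[0])
    # Step 3: keep the first k examples in the sorted list
    neighbours = [yn for _, yn in dists[:k]]
    # Step 4: assign the most common class among the k neighbours
    return Counter(neighbours).most_common(1)[0][0]

# Usage with the five training examples from the training-phase slide
# (unnormalized values, purely for illustration):
train = [((90, 21.5), "Child"), ((100, 32.45), "Child"), ((98, 28.43), "Child"),
         ((183, 90.0), "Adult"), ((163, 67.45), "Adult")]
print(knn_classify(train, (150, 50.6), k=3))   # -> "Adult"
```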
Illustration of K-Nearest Neighbours Method: Adult(1)-Child(0) Classification
Test Example:

[Figure: the scatter plot with the K = 5 nearest neighbours of the test example circled.]

• Consider K = 5
• Step 3: Choose the first K = 5 examples in the sorted list
Illustration of K-Nearest Neighbours Method: Adult(1)-Child(0) Classification
Test Example:

[Figure: the same scatter plot with the 5 nearest neighbours circled.]

• Consider K = 5
• Step 4: The test example is assigned the most common class among its K neighbours
  – Class: Adult
Determining K, the Number of Neighbours
• K is determined experimentally
• Starting with K = 1, the test set is used to estimate the accuracy of the classifier
• This process is repeated, each time incrementing K to allow for more neighbours
• The K value that gives the maximum accuracy may be selected
• Preferably the value of K should be an odd number (and, some texts suggest, a prime) to reduce the chance of ties
A sketch of this search over K is shown below.
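A sketch of the experimental search, reusing the knn_classify sketch above and trying odd values of K as the slide suggests (select_k is an illustrative name; the slide uses the test set for this search, so the sketch does too):

```python
def select_k(train, test, k_values=(1, 3, 5, 7, 9)):
    """Return the K (and its accuracy) that classifies the most examples
    correctly; test is a list of (feature_vector, label) pairs."""
    best_k, best_acc = None, -1.0
    for k in k_values:
        correct = sum(knn_classify(train, x, k) == y for x, y in test)
        acc = correct / len(test)
        if acc > best_acc:   # keep the K with maximum accuracy
            best_k, best_acc = k, acc
    return best_k, best_acc
```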
Data Normalization
• Since a distance measure is used, the K-NN classifier requires normalising the values of each attribute
• Normalising the training data:
  – Compute the minimum and maximum values of each attribute in the training data
  – Store these minimum and maximum values
  – Perform min-max normalization on the training data set
• Normalising the test data:
  – Use the stored minimum and maximum values of each attribute from the training set to normalise the test examples
• NOTE: Ensure that test examples do not cause out-of-bound values (a test attribute may fall outside the stored training minimum-maximum range)
A minimal sketch of this procedure follows.
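A minimal sketch of min-max normalization fitted on the training data only; clipping at test time is one plausible way to handle the out-of-bound note (an assumption, not prescribed by the slides):

```python
def fit_min_max(train_features):
    """Compute and store the per-attribute minima and maxima
    from the training data only."""
    mins = [min(col) for col in zip(*train_features)]
    maxs = [max(col) for col in zip(*train_features)]
    return mins, maxs

def min_max_normalize(x, mins, maxs):
    """Scale each attribute of x to [0, 1] using the stored training
    statistics; test values outside the training range are clipped."""
    scaled = []
    for v, lo, hi in zip(x, mins, maxs):
        z = (v - lo) / (hi - lo) if hi > lo else 0.0
        scaled.append(min(1.0, max(0.0, z)))  # guard against out-of-bound test values
    return scaled
```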
Lazy Learning: Learning from Neighbours
• The K nearest neighbour classifier is an example of a lazy learner
• Lazy learning waits until the last minute, doing no model construction before it has to classify a test example
• When the training examples are given, a lazy learner simply stores them and waits until it is given a test example
• When it sees the test example, it classifies it based on its similarity to the stored training examples
• Since lazy learners store the training examples, or instances, they are also called instance-based learners
• Disadvantages:
  – Making a classification or prediction is computationally intensive
  – Requires efficient storage techniques when the number of training samples is huge
Text Books
1. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann Publishers, 2011.
2. S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 2009.
3. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
