The document discusses non-metric methods for pattern classification, focusing on decision trees, perceptrons, and support vector machines. It explains the construction, advantages, and disadvantages of decision trees, including concepts like overfitting and pruning. Additionally, it covers the workings of perceptrons and support vector machines, detailing their applications in classification problems.
PARUL INSTITUTE OF ENGINEERING & TECHNOLOGY
FACULTY OF ENGINEERING & TECHNOLOGY
PARUL UNIVERSITY
Subject: Pattern Recognition
Chapter 4: Non-metric Methods for Pattern Classification
Computer Science & Engineering
Ishwarlal Rathod (Assistant Prof., PIET-CSE)

Outline
• Concept of construction
• Splitting of nodes
• Choosing of attributes
• Overfitting
• Pruning
• Linear discriminant based algorithms: Perceptron, Support Vector Machines
Decision Tree
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.

Why use Decision Trees?
• Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
• The logic behind the decision tree can be easily understood because it shows a tree-like structure.

Decision Tree Terminologies
• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub-Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child Node: The root node of the tree is called the parent node, and the other nodes are called child nodes.

How does the Decision Tree algorithm work?
• In a decision tree, for predicting the class of a given record, the algorithm starts from the root node of the tree.
• The algorithm compares the value of the root attribute with the corresponding record (real dataset) attribute and, based on the comparison, follows the branch and jumps to the next node.
• For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further.
• It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the steps below.

Steps
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are the leaf nodes.

Attribute Selection Measures
• While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes.
• To solve such problems there is a technique called the Attribute Selection Measure (ASM).
• Using this measure, we can easily select the best attribute for the nodes of the tree.
• There are two popular ASM techniques:
  • Information Gain
  • Gini Index

Information Gain
• Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
• It calculates how much information a feature provides about a class.
• According to the value of information gain, we split the node and build the decision tree.
• A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula below:
  Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
• Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data. Entropy can be calculated as:
  Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
  where S is the set of samples, P(yes) is the probability of "yes", and P(no) is the probability of "no".

Gini Index
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
• The Gini index can be calculated using the formula below:
  Gini Index = 1 - Σj (Pj)², where Pj is the proportion of samples belonging to class j.
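To make these attribute selection measures concrete, the following is a minimal sketch (not part of the original slides) that computes entropy, information gain, and the Gini index for a small, made-up binary-labelled toy dataset; the function names and data values are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_c P(c) * log2 P(c), over the classes present in `labels`."""
    n = len(labels)
    return -sum((cnt / n) * log2(cnt / n) for cnt in Counter(labels).values())

def gini(labels):
    """Gini Index = 1 - sum_c P(c)^2."""
    n = len(labels)
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy of the subsets
    obtained by splitting on the given feature."""
    n = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        subset = [lab for lab, f in zip(labels, feature_values) if f == value]
        weighted += (len(subset) / n) * entropy(subset)
    return entropy(labels) - weighted

# Hypothetical toy data: does the "Outlook" attribute help predict "Play"?
play    = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]

print("Entropy(S)      :", round(entropy(play), 3))
print("Gini(S)         :", round(gini(play), 3))
print("Gain(S, Outlook):", round(information_gain(play, outlook), 3))
```

Running this on the toy data gives Entropy(S) = 1.0 and a positive information gain for Outlook, which is exactly the signal the tree-building algorithm uses to pick the attribute to split on first.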
Overfitting
• Overfitting refers to the condition where the model fits the training data completely but fails to generalize to unseen test data.
• The overfit condition arises when the model memorizes the noise of the training data and fails to capture the important patterns.
• A perfectly fit decision tree performs well on the training data but performs poorly on unseen test data.
• If the decision tree is allowed to grow to its full depth, the model will overfit the training data.

Preventing overfitting
• There are various techniques to prevent the decision tree model from overfitting:
• Pruning
  • Pre-pruning
  • Post-pruning
• Ensembles
  • Random forest

Pruning
• Pruning is the process of deleting unnecessary nodes from a tree in order to obtain the optimal decision tree.
• A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learned tree without reducing accuracy is therefore known as pruning. Two main types of tree pruning techniques are used:
  • Cost Complexity Pruning
  • Reduced Error Pruning
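As a sketch of how overfitting and pruning play out in practice (not from the original slides), the snippet below uses scikit-learn's DecisionTreeClassifier, which implements CART and supports cost complexity pruning through its `ccp_alpha` parameter; the dataset, split, and alpha value are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: tends to fit the training data perfectly (risk of overfitting).
full_tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

# Cost complexity pruning: a larger ccp_alpha removes more branches (value chosen for illustration).
pruned_tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.02,
                                     random_state=0).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(f"{name}: train acc = {tree.score(X_train, y_train):.2f}, "
          f"test acc = {tree.score(X_test, y_test):.2f}, leaves = {tree.get_n_leaves()}")
```

Typically the full tree shows perfect training accuracy with more leaves, while the pruned tree is smaller and generalizes at least as well, which is the trade-off pruning is meant to manage.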
Advantages of Decision Trees
• A decision tree is simple to understand, as it follows the same process a human follows while making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think through all the possible outcomes of a problem.
• There is less need for data cleaning compared with other algorithms.

Disadvantages of Decision Trees
• A decision tree may contain many layers, which makes it complex.
• It may have an overfitting issue, which can be addressed using the Random Forest algorithm.
• For a larger number of class labels, the computational complexity of the decision tree may increase.

Linear discriminant based algorithm: Perceptron
• The perceptron is a linear supervised machine learning algorithm used for binary classification.
• It helps to detect certain input data computations in business intelligence.
• The perceptron learning algorithm is considered the most straightforward artificial neural network.
• It is a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.

Basic components of the Perceptron
• Frank Rosenblatt invented the perceptron model as a binary classifier containing three main components:
• Input Nodes or Input Layer: This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
• Weight and Bias: The weight parameter represents the strength of the connection between units. The weight is directly proportional to the influence of the associated input on the output. The bias can be thought of as the intercept term in a linear equation.
• Activation Function: This is the final and most important component, which helps to determine whether the neuron will fire or not. The activation function can be considered primarily as a step function.
• Types of activation functions:
  • Sign function
  • Step function
  • Sigmoid function
• The data scientist chooses the activation function based on the problem statement and the desired outputs. The activation function used (e.g., sign, step, or sigmoid) may differ between perceptron models depending on whether the learning process is slow or suffers from vanishing or exploding gradients.

How does the perceptron work?
• The perceptron model begins by multiplying all input values by their weights and adding these products together to create the weighted sum. This weighted sum is then passed to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.
• This step function or activation function plays a vital role in ensuring that the output is mapped to the required values, such as (0, 1) or (-1, 1).
• The weight of an input indicates the strength of that node; similarly, the bias value allows the activation function curve to be shifted up or down.
• The perceptron model works in two important steps, as follows:

Step-1
• First, multiply all input values by their corresponding weight values and add them to determine the weighted sum. Mathematically, the weighted sum can be calculated as:
  ∑ wi*xi = w1*x1 + w2*x2 + … + wn*xn
• A special term called the bias 'b' is added to this weighted sum to improve the model's performance:
  ∑ wi*xi + b

Step-2
• An activation function is applied to the above weighted sum, which gives an output either in binary form or as a continuous value, as follows:
  Y = f(∑ wi*xi + b)
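A minimal sketch of these two steps, plus the classic perceptron weight-update rule, is given below (it is not part of the original slides). It assumes a unit step activation and a small, hypothetical linearly separable dataset (a logical AND gate); the learning rate and epoch count are illustrative.

```python
import numpy as np

def step(z):
    """Unit step activation: f(z) = 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def predict(x, w, b):
    """Step-1 and Step-2: weighted sum plus bias, passed through the activation."""
    return step(np.dot(w, x) + b)

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - predict(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Hypothetical, linearly separable toy data: logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
print("predictions:", [predict(xi, w, b) for xi in X])
```

Because the AND data is linearly separable, the update rule converges after a few epochs and the learned weights reproduce the target labels, which is exactly the setting in which the perceptron is guaranteed to work.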
Types of Perceptron Models
• Based on the number of layers, perceptrons are broadly classified into two major categories:
• Single-layer perceptron model: This is the simplest Artificial Neural Network (ANN) model. A single-layer perceptron consists of a feed-forward network and uses a threshold transfer function to threshold the output. The main objective of the single-layer perceptron model is to classify linearly separable data with binary labels.
• Multi-layer perceptron model: The multi-layer perceptron learning algorithm has the same structure as a single-layer perceptron but includes one or more additional hidden layers, unlike the single-layer perceptron, which has no hidden layer.
• The distinction between these two types of perceptron models is shown in the accompanying figure.

Perceptron Function
• The perceptron function f(x) is obtained by multiplying the input vector 'x' by the learned weight vector 'w' and thresholding the result.
• Mathematically, we can express it as follows:
  f(x) = 1 if w·x + b > 0
  f(x) = 0 otherwise
• 'w' represents the real-valued weight vector
• 'b' represents the bias
• 'x' represents the vector of input values

Limitations of the Perceptron Model
• The output of a perceptron can only be a binary number (0 or 1) due to the hard-limit transfer function.
• A perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it cannot classify them properly.

Support Vector Machine algorithm
• SVM is one of the most popular supervised learning algorithms, used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that segregates the n-dimensional space into classes, so that a new data point can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, such data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, such data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.

How does SVM work?
• Linear SVM: The working of the SVM algorithm can be understood with an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue.
• Since this is a 2-D space, we can separate the two classes using just a straight line, but there can be multiple lines that separate these classes.
• The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called the hyperplane.
• The SVM algorithm finds the points of both classes that lie closest to the boundary. These points are called support vectors. The distance between the support vectors and the hyperplane is called the margin.
• The goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
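The maximum-margin idea can be sketched in code. The snippet below (not from the slides) fits scikit-learn's SVC with a linear kernel on a hypothetical two-class toy dataset and reports the support vectors and the margin width; the data points and the C value are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D, two-class, linearly separable toy data (features x1, x2).
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],    # class "blue" (label 0)
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])   # class "green" (label 1)
y = np.array([0, 0, 0, 1, 1, 1])

# A large C keeps the margin essentially "hard" for this separable example.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                   # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)   # width of the maximum-margin band

print("support vectors:\n", clf.support_vectors_)
print("hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("margin width: %.3f" % margin)
print("prediction for (3, 2):", clf.predict([[3.0, 2.0]])[0])
```

Only the points closest to the boundary appear in `support_vectors_`; moving any of the other points would not change the learned hyperplane, which is the defining property of an SVM.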
Non-linear SVM
• If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line.
• To separate such data points, we need to add one more dimension. For linear data we used the two dimensions x and y, so for non-linear data we add a third dimension z, which can be calculated as:
  z = x² + y²
• By adding the third dimension, the sample space becomes linearly separable, and SVM can divide the dataset into classes with a separating hyperplane in this higher-dimensional space.
• Since we are now in a 3-D space, the separating boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space by setting z = 1, the boundary becomes:
  x² + y² = 1
• Hence, we obtain a circular boundary of radius 1 in the case of non-linear data.
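As a closing sketch (not part of the original slides), the snippet below generates a hypothetical ring-shaped dataset, applies the z = x² + y² mapping described above, and shows that a linear SVM in the lifted 3-D space separates what a 2-D linear SVM cannot; in practice the same effect is obtained with a non-linear kernel such as RBF. The data generation and class counts are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical non-linear data: an inner cluster (class 0) surrounded by a ring (class 1).
inner = rng.normal(0.0, 0.3, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, 50)
ring = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)] + rng.normal(0, 0.1, size=(50, 2))
X = np.vstack([inner, ring])
y = np.array([0] * 50 + [1] * 50)

# Linear SVM in the original 2-D space: cannot separate the ring from the cluster.
linear_2d = SVC(kernel="linear").fit(X, y)

# Lift to 3-D with z = x^2 + y^2: the classes become linearly separable.
Z = np.c_[X, X[:, 0] ** 2 + X[:, 1] ** 2]
linear_3d = SVC(kernel="linear").fit(Z, y)

# An RBF kernel achieves the same separation without constructing z explicitly.
rbf_2d = SVC(kernel="rbf").fit(X, y)

print("2-D linear SVM accuracy :", linear_2d.score(X, y))
print("3-D lifted SVM accuracy :", linear_3d.score(Z, y))
print("2-D RBF SVM accuracy    :", rbf_2d.score(X, y))
```

The lifted-space and RBF classifiers reach (near-)perfect accuracy on this toy data, while the plain 2-D linear SVM does not, illustrating why non-linear SVMs rely on mapping the data into a higher-dimensional space.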