
Module 2

Supervised Learning Algorithms


Contents
• Regression
-Linear
-Logistic
-Polynomial
• Classification
- KNN Classifier
- Decision Tree
- Random Forest
- SVM
Performance Metrics – Confusion Matrix
Polynomial Regression
• Polynomial Regression is a regression algorithm that models the
relationship between the dependent variable (y) and the independent
variable (x) as an nth degree polynomial.
• The Polynomial Regression equation is given below:

y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n
Linear vs Polynomial
• The main steps involved in Polynomial Regression are given below:

K-Nearest Neighbors Algorithm (KNN)
• Intuition behind the KNN Algorithm
Features
• The K-NN algorithm is a versatile and widely used machine learning algorithm,
valued primarily for its simplicity and ease of implementation.
• It does not require any assumptions about the underlying data distribution.
• It can also handle both numerical and categorical data, making it a flexible
choice for various types of datasets in classification and regression tasks.
• It is a non-parametric method that makes predictions based on the
similarity of data points in a given dataset.
• K-NN is less sensitive to outliers compared to other algorithms.

• The K-NN algorithm works by finding the K nearest neighbors to a
given data point based on a distance metric, such as Euclidean
distance.
• The class or value of the data point is then determined by the
majority vote or average of the K neighbors.
• This approach allows the algorithm to adapt to different patterns and
make predictions based on the local structure of the data.

Distance Metrics Used in KNN Algorithm

• Euclidean Distance
• Manhattan Distance
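For reference (standard definitions), for two points x = (x1, …, xn) and y = (y1, …, yn):
- Euclidean distance: d(x, y) = sqrt( Σ (xi − yi)² )
- Manhattan distance: d(x, y) = Σ |xi − yi|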

• The K-NN algorithm compares a new data entry to the values in a
given data set (with different classes or categories).
• Based on its closeness or similarity to a given number (K) of neighbors,
the algorithm assigns the new data point to a class or category in the
training data set.
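A minimal sketch of this procedure, assuming scikit-learn's KNeighborsClassifier (the data is illustrative, not from the slides):

```python
# Minimal KNN classification sketch using scikit-learn (assumed library).
from sklearn.neighbors import KNeighborsClassifier

# Illustrative training data: two features per point, two classes (0 and 1)
X_train = [[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]]
y_train = [0, 0, 0, 1, 1, 1]

# K = 3 neighbors, Euclidean distance (the default metric)
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# The new entry is assigned the majority class among its 3 nearest neighbors
print(knn.predict([[5, 5]]))
```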

Steps in KNN Algorithm
KNN Example 1
• Since the value of K is 3, the algorithm will only consider the 3 nearest
neighbors to the green point (new entry). This is represented in the
graph above.

KNN Example 2
• Consider following dataset
Assumptions
KNN Algorithm
Decision Tree
• Decision trees are a key tool in machine learning.
• The model predicts outcomes from input data through a tree-like
structure.
• They offer interpretability, versatility, and simple visualization,
making them valuable for both classification and regression tasks.

Concept
• It is a tree-like structure where
- each internal node tests an attribute,
- each branch corresponds to an attribute value, and
- each leaf node represents the final decision or prediction.
• While decision trees have advantages like ease of understanding,
they may face challenges such as overfitting.
• Understanding their terminologies and formation process is essential
for effective application in diverse scenarios.
• Decision trees are drawn upside down, which means the root is at the top
and this root is then split into several nodes.
• In layman's terms, decision trees are nothing but a series of if-else
statements.
• The tree checks whether a condition is true, and if it is, it moves to the next
node attached to that decision.
Example 1:
• Here, it will ask –
• What is the weather?
• Is it sunny, cloudy, or rainy?
• Depending on the answer, it will go to the next feature, which is humidity or wind.
• It will again check whether the wind is strong or weak; if the wind is weak
and it is rainy, then the person may go and play.
We see that if the weather is cloudy, then the decision is to go and play.
Why didn't it split further? Why did it stop there?
• In simple terms,
• the output in the training dataset is always "yes" for cloudy weather; since
there is no disorder here, we don't need to split the node
further.

• The key measures used for splitting are entropy, information gain, and the Gini index.
• The goal of machine learning is to decrease uncertainty or disorder
in the dataset, and for this we use decision trees.
Questions
• How do I know what should be the root node?
• What should be the decision node?
• When should I stop splitting?
• To decide this, there is a metric called “Entropy” which is the amount
of uncertainty in the dataset.
• Decision Tree algorithm works in simpler steps
• Starting at the Root: The algorithm begins at the top, called the “root
node,” representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature or
question that splits the data into the most distinct groups. This is like
asking a question at a fork in the tree.
• Branching Out: Based on the answer to that question, it divides the
data into smaller subsets, creating new branches. Each branch
represents a possible route through the tree.
• Repeating the Process: The algorithm continues asking questions and
splitting the data at each branch until it reaches the final “leaf
nodes,” representing the predicted outcomes or classifications.
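As a minimal sketch of these steps (assuming scikit-learn; the weather-style data below is illustrative and hypothetically encoded, not from the slides):

```python
# Minimal decision tree sketch using scikit-learn (assumed library).
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative encoded data: [weather, wind] -> play (1) / don't play (0)
# weather: 0 = sunny, 1 = cloudy, 2 = rainy; wind: 0 = weak, 1 = strong
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [1, 0, 1, 1, 1, 0]

# criterion="entropy" selects splits by information gain
tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y)

# Print the learned if-else structure
print(export_text(tree, feature_names=["weather", "wind"]))
```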
Entropy
• Entropy is nothing but the uncertainty in our dataset or measure of
disorder.
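Formally (standard definition), for a node S whose instances fall into classes with proportions p1, …, pc:

E(S) = − Σ pi log2(pi)

so a pure node (all one class) has entropy 0, and a 50/50 node has entropy 1.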
• Examples to understand concept of Entropy
Example 1
Left node Entropy
• Feature 2:
Right Node Entropy
• For Feature 3:
• The left node has lower entropy (more purity) than the right node, since the
left node has a greater proportion of "yes" values and it is easier to decide here.

• The higher the entropy, the lower the purity and the higher
the impurity.
• The goal of machine learning is to decrease the uncertainty or
impurity in the dataset; by using the entropy we measure
the impurity of a particular node.
• However, entropy alone does not tell us whether the entropy of the parent
node has decreased after splitting.

• For this we use a new metric called "Information Gain", which tells us how much
the parent entropy has decreased after splitting on some feature.
Information Gain
• Information gain measures the reduction of uncertainty given some
feature and it is also a deciding factor for which attribute should be
selected as a decision node or root node.

• It is simply the entropy of the full dataset minus the entropy of the dataset
given some feature.
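Written out (standard form), for a dataset S split on a feature A into subsets Sv:

IG(S, A) = E(S) − Σv ( |Sv| / |S| ) · E(Sv)

where the second term is the weighted average entropy of the child nodes.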
Example
• Suppose our entire population has a total of 30 instances.
• The dataset is to predict whether the person will go to the gym or
not. Let's say 16 people go to the gym and 14 people don't.
• Feature 1 is "Energy", which takes two values: "high"
(13 instances) and "low" (17 instances).
• Feature 2 is "Motivation", which takes 3 values: "No motivation",
"Neutral" and "Highly motivated".
• Use Decision Tree
• Use Information gain to decide which feature should be the root
node and which feature should be placed after the split.
• Using Feature 1
- Calculate Entropy
- Calculate Information Gain
• Entropy and Information Gain

• The parent entropy was near 0.99, and looking at this value of information
gain we reach the following conclusion.
Conclusion: the entropy of the dataset will decrease by 0.37 if we make
"Energy" our root node.
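As a quick check of the parent entropy quoted above (16 "go" and 14 "don't go" out of 30 instances), a small sketch:

```python
# Parent entropy for the gym example: 16 "yes" and 14 "no" out of 30 instances.
from math import log2

p_yes, p_no = 16 / 30, 14 / 30
parent_entropy = -(p_yes * log2(p_yes) + p_no * log2(p_no))
print(round(parent_entropy, 3))  # ~0.997, i.e. near the 0.99 quoted on the slide
```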
• Feature 2
• Conclusions:
• The "Energy" feature gives a larger reduction (0.37) than the
"Motivation" feature. Hence we select the feature which has the
highest information gain and then split the node based on that
feature.
• "Energy" will be our root node, and we do the same for the
sub-nodes. Here we can see that when the energy is "high" the
entropy is low, and hence we can say a person will definitely go to
the gym if he has high energy,
• but what if the energy is low? We will again split the node based on
the new feature which is “Motivation”.
Pruning
• Pruning is another method that can help us avoid overfitting. It helps
in improving the performance of the Decision tree by cutting the
nodes or sub-nodes which are not significant. Additionally, it removes
the branches which have very low importance.
• There are mainly 2 ways for pruning:
• Pre-pruning – we can stop growing the tree earlier, which means we
can prune/remove/cut a node if it has low importance while
growing the tree.
• Post-pruning – once our tree is built to its depth, we can start
pruning the nodes based on their significance.
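As a minimal sketch (assuming scikit-learn; the iris dataset is illustrative), pre-pruning can be done by limiting growth up front, and post-pruning via cost-complexity pruning:

```python
# Pruning sketch with scikit-learn's DecisionTreeClassifier (assumed library).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing early by capping depth / minimum leaf size
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
pre_pruned.fit(X, y)

# Post-pruning: grow fully, then prune weak branches with cost-complexity pruning
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)
post_pruned.fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```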
Example 3
SVM (Support Vector Machine)
⦿ Concept
⦿ Types
⦿ Linear
⦿ Non-linear
⦿ Use of Dot products
⦿ Examples
⦿ Kernel in SVM
Concept
• SVM is a powerful supervised algorithm that works best on smaller
but complex datasets.
• It can be used for both regression and classification tasks, but generally it
works best in classification problems.
• It is a supervised machine learning algorithm in which we try to find a
hyperplane that best separates the two classes.
• Don’t get confused between SVM and logistic regression.
• Both the algorithms try to find the best hyperplane, but the main
difference is logistic regression is a probabilistic approach whereas
support vector machine is based on statistical approaches.
• Answers to questions like –
- which hyperplane does it select?
- There can be an infinite number of hyperplanes passing through a
point and classifying the two classes perfectly.
- So, which one is the best?
• Depending on the number of features you have you can either
choose Logistic Regression or SVM.
• SVM works best when the dataset is small and complex.
• It is advisable to first use logistic regression and see how it performs;
if it fails to give good accuracy, you can go for SVM without any
kernel.
• Logistic regression and SVM without any kernel have similar
performance but depending on your features, one may be more
efficient than the other.
Types of SVM
• Linear SVM: When the data is perfectly linearly separable only then
we can use Linear SVM. Perfectly linearly separable means that the
data points can be classified into 2 classes by using a single straight
line(if 2D).
• Non-Linear SVM: When the data is not linearly separable then we can
use Non-Linear SVM, which means when the data points cannot be
separated into 2 classes by using a straight line (if 2D) then we use
some advanced techniques like kernel tricks to classify them.
• In most real-world applications we do not find linearly separable
datapoints hence we use kernel trick to solve them.
Important Definitions
• Support Vectors: These are the points that are closest to the
hyperplane. A separating line will be defined with the help of these
data points.
• Margin: it is the distance between the hyperplane and the
observations closest to the hyperplane (support vectors). In SVM
large margin is considered a good margin. There are two types of
margins hard margin and soft margin.
Example – Linear SVM
• We want to classify the new data point as either blue or green.
• To classify these points, we can have many decision boundaries, but
the question is which is the best and how do we find it?
• The best hyperplane is the one that has the maximum distance
from both classes, and this is the main aim of SVM.
• This is done by finding different hyperplanes which classify the labels
in the best way then it will choose the one which is farthest from the
data points or the one which has a maximum margin.
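A minimal sketch of a linear SVM finding such a maximum-margin hyperplane, assuming scikit-learn (the data is illustrative):

```python
# Linear SVM sketch using scikit-learn (assumed library).
from sklearn.svm import SVC

# Illustrative 2D data: two linearly separable classes
X = [[1, 2], [2, 1], [2, 3], [6, 5], [7, 8], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel fits a maximum-margin separating hyperplane w.x + b = 0
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)       # the points closest to the hyperplane
print(clf.coef_, clf.intercept_)  # w and b of the learned hyperplane
print(clf.predict([[3, 4]]))      # classify a new point
```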
How does it work?
⦿ Identify Cat or Dog?
⦿ Support Vectors :
⦿ Linear SVM : Hyperplane
⦿ Non-linear SVM example
Example – Non-linear SVM
⦿ Finding equation for SV :
⦿ Final Classification result
Use of Dot Product in SVM
• The dot product can be defined as the projection of one vector onto
another, multiplied by the magnitude of the other vector.
• Consider a random point X and we want to know whether it lies on
the right side of the plane or the left side of the plane (positive or
negative).
• Assume this point is a vector (X) and then we make a vector (w) which
is perpendicular to the hyperplane. Let’s say the distance of vector w
from origin to decision boundary is ‘c’. Now we take the projection of
X vector on w.
• Criteria for classification based on the dot product:
- The projection of one vector onto another is given by the dot product; we
take the dot product of the x and w vectors.
• If the dot product is greater than ‘c’ point lies on the right side.
• If the dot product is less than ‘c’ then the point is on the left side
• If the dot product is equal to ‘c’ then the point lies on the decision
boundary.
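A small sketch of this criterion (illustrative numbers, assuming NumPy):

```python
# Classifying a point by its dot product with the normal vector w (NumPy assumed).
import numpy as np

w = np.array([2.0, 1.0])   # vector perpendicular to the decision boundary
c = 2.0                    # threshold for the boundary along w (illustrative value)
x = np.array([1.5, 2.0])   # the point to classify

projection = np.dot(x, w)
if projection > c:
    print("right side (positive)")
elif projection < c:
    print("left side (negative)")
else:
    print("on the decision boundary")
```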
Margin in Support Vector Machine
• To classify a point as negative or positive we need to define a decision
rule.
• The equation of a hyperplane is w·x + b = 0, where w is a vector normal
to the hyperplane and b is an offset.
• If w·x + b > 0 then we can say it is a positive point; otherwise
it is a negative point.
• We need (w, b) such that the margin has the maximum distance. Let's
say this distance is 'd'.
• To calculate 'd' we need the equations of L1 and L2.
• For this, we will make the assumption that –
• the equation of L1 is w·x + b = 1, and for
• L2 it is w·x + b = −1.
• Why are the magnitudes equal? Why didn't we take 1 and −2?
• Why did we take only 1 and −1, and not other values like 24 and
−100?
• Why did we assume these lines?
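For reference (standard SVM geometry, not specific to these slides): with L1 and L2 taken as w·x + b = 1 and w·x + b = −1, the distance between them is

d = |1 − (−1)| / ||w|| = 2 / ||w||

so maximizing the margin is equivalent to minimizing ||w||, and the choice of ±1 is just a convenient rescaling of (w, b).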
Example:
• Let’s say the equation of our hyperplane is 2x+y=2
• Create margin for this hyperplane,
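Working this out with the ±1 convention above (a small illustrative calculation):
- Hyperplane: 2x + y − 2 = 0, so w = (2, 1), b = −2 and ||w|| = sqrt(5).
- Margin lines: 2x + y − 2 = 1 (i.e. 2x + y = 3) and 2x + y − 2 = −1 (i.e. 2x + y = 1), each at distance 1/sqrt(5) ≈ 0.45 from the hyperplane.
- Multiplying everything by 10 gives the same hyperplane (20x + 10y − 20 = 0), but the margin lines 20x + 10y − 20 = ±1 are now only 1/sqrt(500) ≈ 0.045 away.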
Summary
• If you multiply these equations by 10,
- the parallel lines (red and green) get closer to our hyperplane.
• If we divide these equations by 10,
- the parallel lines move farther from the hyperplane.
• The parallel lines depend on the (w, b) of our hyperplane:
• If we multiply the equation of the hyperplane by a factor greater than
1, the parallel lines will shrink towards it,
• and if we multiply by a factor less than 1, they expand away from it.
• These lines move as we change (w, b), and this is how the margin
gets optimized.
SVM Error
• SVM Error = Margin Error + Classification Error.
• The larger the margin, the lower the margin error would be, and vice
versa.
• A high value of 'c' (e.g. 1000) would mean that you don't want to focus
on margin error and just want a model which doesn't misclassify any
data point.
• Which is a better model?
- the one where the margin is maximum but has 2 misclassified points,
or
- the one where the margin is very small, but all the points are correctly
classified?
• Increasing 'c' decreases the classification error, but
• if you want the margin to be maximized, then the value of
'c' should be minimized.
• That's why 'c' is a hyperparameter, and we search for the optimal value of
'c'.
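A minimal sketch of searching for this optimal 'c' (the C parameter in scikit-learn, assumed here; the dataset is synthetic and illustrative) using cross-validated grid search:

```python
# Tuning the C hyperparameter of an SVM with grid search (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Small C -> wider margin, more misclassified points tolerated
# Large C -> narrower margin, fewer misclassified points tolerated
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100, 1000]}, cv=5)
search.fit(X, y)

print(search.best_params_)
```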
Kernels in SVM
• Need
• Solution:
• Convert the lower-dimensional space to a higher-dimensional space
using some functions (for example quadratic functions), which allows us to find
a decision boundary that clearly divides the data points.
• The functions which help us do this are called Kernels.
• Which kernel to use is determined purely by hyperparameter tuning.
• Use of Kernel
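A minimal sketch of the kernel trick on data that is not linearly separable (assuming scikit-learn; make_circles is an illustrative dataset, not from the slides):

```python
# Kernel SVM sketch on non-linearly-separable data (scikit-learn assumed).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)  # kernel maps data to a higher dimension

# The RBF kernel should score far higher than the linear SVM here
print(linear_svm.score(X_test, y_test), rbf_svm.score(X_test, y_test))
```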
Evaluation Matrix for classification:
Confusion Matrix
• Machine learning models are increasingly used in various applications
to classify data into different categories.
• However, evaluating the performance of these models is crucial to
ensure their accuracy and reliability.
• One essential tool in this evaluation process is the confusion matrix.
• A confusion matrix is a matrix that summarizes the performance of a
machine learning model on a set of test data.
• It is a means of displaying the number of accurate and inaccurate
instances based on the model’s predictions.
• It is often used to measure the performance of classification models,
which aim to predict a categorical label for each input instance.

• The matrix displays the counts of correct and incorrect predictions the model
produced on the test data.

Metrics based on Confusion Matrix Data
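The metrics most commonly derived from the confusion matrix (standard definitions) are:
- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision = TP / (TP + FP)
- Recall (Sensitivity) = TP / (TP + FN)
- F1-score = 2 × (Precision × Recall) / (Precision + Recall)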
Confusion Matrix For binary classification

• A 2x2 confusion matrix is shown below for an image recognition task with
the classes "Dog" and "Not Dog".
• Scenario: Example: Confusion Matrix for Dog Image Recognition
with Numbers
• Confusion Matrix
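A minimal sketch of building such a matrix, assuming scikit-learn (the labels below are hypothetical counts, not the figures from the slide):

```python
# Binary confusion matrix sketch for "Dog" vs "Not Dog" (scikit-learn assumed;
# the labels below are illustrative, not taken from the slide's figure).
from sklearn.metrics import confusion_matrix

y_actual    = ["Dog", "Dog", "Dog", "Not Dog", "Not Dog", "Dog", "Not Dog", "Dog"]
y_predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog", "Dog", "Not Dog", "Dog"]

# Rows = actual class, columns = predicted class
cm = confusion_matrix(y_actual, y_predicted, labels=["Dog", "Not Dog"])
print(cm)
# [[TP FN]
#  [FP TN]]  with "Dog" treated as the positive class
```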
Confusion Matrix For Multi-class
Classification
• In multi-class classification, you have more than two possible classes
for your model to predict. The confusion matrix expands to
accommodate these additional classes.
• Rows: Represent the actual classes (ground truth) in your dataset.
• Columns: Represent the predicted classes by your model.
• Each cell within the matrix shows the count of instances where the
model predicted a particular class (column) when the actual class was
another (row).
• A 3x3 confusion matrix is shown below for an image classification task with three
classes.
• Example: Confusion Matrix for Image Classification (Cat, Dog, Horse)

• True Positive (TP): The image was of a particular animal (cat,
dog, or horse), and the model correctly predicted that animal.
For example, a picture of a cat correctly identified as a cat.
• False Negative (FN): The image was of a particular animal, but
the model incorrectly predicted it as a different animal. For
example, a picture of a dog mistakenly identified as a cat.
• In this scenario:
Cats: 8 were correctly identified, 1 was misidentified as a dog, and 1
was misidentified as a horse.
Dogs: 10 were correctly identified, 2 were misidentified as cats.
Horses: 8 were correctly identified, 2 were misidentified as dogs.
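Put in matrix form (rows = actual class, columns = predicted class; assuming no dogs were misclassified as horses, since no such count is mentioned above):

Actual \ Predicted   Cat   Dog   Horse
Cat                   8     1      1
Dog                   2    10      0
Horse                 0     2      8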
