Classification and Clustering Algorithm Notes
Classification algorithms can be divided into two main categories:
o Linear Models
   o Logistic Regression
   o Support Vector Machines
o Non-linear Models
   o K-Nearest Neighbours
   o Kernel SVM
   o Naïve Bayes
   o Decision Tree Classification
   o Random Forest Classification
Lazy Learners
A lazy learner first stores the training dataset and waits until the test dataset arrives. Classification is then carried out using the most closely related data in the stored training set. Less time is spent on training, but more time is spent on prediction. Examples include case-based reasoning and the KNN algorithm.
Eager Learners
An eager learner builds a classification model from the training dataset before receiving the test data. More time is spent on training, but less time is spent on prediction.
1. Logistic Regression
2. Naive Bayes
Naive Bayes determines whether a data point falls into a particular category.
It can be used to classify phrases or words in text analysis as either falling
within a predetermined classification or not. It assumes that predictors in a
dataset are independent. This means that it assumes the features are
unrelated to each other. For example, given a banana, the classifier sees that the fruit is yellow, oblong, and tapered. All of these features contribute independently to the probability of it being a banana and are not dependent on each other. Naive Bayes is based on
Bayes’ theorem, which is given as:
P(A|B) = [ P(B|A) × P(A) ] / P(B)
Where:
P(A|B) is the posterior probability of class A given predictor B,
P(A) is the prior probability of the class,
P(B|A) is the likelihood (the probability of the predictor given the class), and
P(B) is the prior probability of the predictor.
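A minimal sketch of Naive Bayes classification using scikit-learn's GaussianNB; the toy "fruit" features and labels below are invented purely for illustration, not taken from the notes.

```python
# Naive Bayes sketch: each feature is treated as independent given the class.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Rows: [length_cm, width_cm, yellowness 0-1]; labels: 1 = banana, 0 = not banana
X = np.array([
    [18.0, 3.5, 0.9],
    [17.0, 3.0, 0.8],
    [7.0, 7.0, 0.2],
    [8.0, 7.5, 0.3],
])
y = np.array([1, 1, 0, 0])

model = GaussianNB()
model.fit(X, y)

new_fruit = np.array([[19.0, 3.2, 0.85]])
print(model.predict(new_fruit))        # predicted class
print(model.predict_proba(new_fruit))  # class probabilities from Bayes' theorem
```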
3. K-Nearest Neighbors
It estimates how likely a data point is to belong to a group based on which group the data points closest to it belong to. When using k-NN for classification, you classify a point according to the classes of its nearest neighbors.
Given a point whose class we do not know, we can try to understand which
points in our feature space are closest to it. These points are the k-nearest
neighbors. Since similar things occupy similar places in feature space, it’s
very likely that the point belongs to the same class as its neighbors. Based on
that, it’s possible to classify a new point as belonging to one class or another.
There are a few rules of thumb, as well as more advanced methods, for selecting k. A common starting point is the square root of the total number of samples in the training dataset; an error plot or accuracy plot can then be used to find the most favorable K value. KNN performs well with multi-class data, but if the data contains many outliers it can fail, and you’ll need to use other methods.
Begin with k=1, then perform cross-validation (5- to 10-fold is common practice, as it provides a good balance between computational effort and statistical validity), and evaluate the accuracy. Keep repeating the
same steps until you get consistent results. As k goes up, the error usually
decreases, then stabilizes, and then grows again. The optimal k lies at the
beginning of the stable zone.
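The sketch below shows one way to pick k with cross-validation, following the rules of thumb above (k up to roughly the square root of the sample count, 5-fold CV). The Iris dataset and the exact parameter range are illustrative assumptions.

```python
# Choosing k for k-NN via 5-fold cross-validation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
max_k = int(np.sqrt(len(X)))  # square-root heuristic for the upper bound

scores = {}
for k in range(1, max_k + 1):
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean 5-fold cross-validated accuracy for this k
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("best k:", best_k, "accuracy:", round(scores[best_k], 3))
```

Plotting scores against k would reproduce the accuracy plot described above: the optimal k sits where the curve first stabilizes.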
K-distance is the distance between data points and a given query point. To
calculate it, we have to pick a distance metric.
Some of the most popular metrics are explained below.
Euclidean distance
The Euclidean distance between two points is the length of the straight line
segment connecting them. This most common distance metric is applied to
real-valued vectors.
Manhattan distance
The Manhattan distance between two points is the sum of the absolute
differences between the x and y coordinates of each point. Used to measure
the minimum distance by summing the length of all the intervals needed to get
from one location to another in a city, it’s also known as the taxicab distance.
Minkowski distance
The Minkowski distance is a generalization of the Euclidean and Manhattan distances. It has a parameter p: with p = 1 it reduces to the Manhattan distance, and with p = 2 to the Euclidean distance.
Hamming distance
Hamming distance is used to compare two binary vectors (also called data strings or bitstrings); it counts the number of positions at which the corresponding bits differ. To calculate it, the data first has to be translated into a binary system.
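A short sketch computing each of the metrics above by hand with NumPy; the example vectors are arbitrary.

```python
# Distance metrics used as k-distance in k-NN.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))          # straight-line distance
manhattan = np.sum(np.abs(a - b))                  # taxicab distance
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)  # p=1 -> Manhattan, p=2 -> Euclidean

# Hamming distance: number of positions where two bitstrings differ
u = np.array([1, 0, 1, 1, 0])
v = np.array([1, 1, 1, 0, 0])
hamming = np.sum(u != v)

print(euclidean, manhattan, minkowski, hamming)
```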
4. Decision Tree
Entropy
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, it has an entropy of one.
To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:
a) Entropy using the frequency table of one attribute:
E(S) = Σ −pᵢ log₂(pᵢ)
b) Entropy using the frequency table of two attributes:
E(T, X) = Σ P(c) · E(c), summed over each value c of attribute X
Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches):
Gain(T, X) = Entropy(T) − Entropy(T, X)
Step 1: Calculate the entropy of the target.
Step 2: Split the dataset on the different attributes, calculate the entropy for each branch, and subtract it from the entropy before the split; the result is the information gain.
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
Step 4a: A branch with entropy of 0 is a leaf node.
Step 4b: A branch with entropy more than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
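A small sketch of the entropy and information-gain calculations ID3 relies on; the tiny weather/play example dataset is an assumption made up for illustration.

```python
# Entropy and information gain, as used by ID3.
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p))."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * np.log2(c / total) for c in counts.values())

def information_gain(values, labels):
    """Entropy of the target minus the weighted entropy after splitting on one attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]

print("entropy(play):", round(entropy(play), 3))            # 1.0 (equally divided)
print("gain(play, outlook):", round(information_gain(outlook, play), 3))
```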
5. Random Forest Algorithm
Step 1: Select random samples from the given dataset.
Step 2: Construct a decision tree for each sample and get a prediction result from each tree.
Step 3: Perform a vote for each predicted result.
Step 4: Finally, select the most voted prediction result as the final prediction result.
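A minimal sketch of this workflow using scikit-learn's RandomForestClassifier, which trains each tree on a bootstrap sample and takes the majority vote; the Iris dataset and parameter values are assumptions for illustration.

```python
# Random forest: many trees on random samples, majority-vote prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
```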
6. Support Vector Machine (SVM)
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane.
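A sketch of a linear SVM learning such a hyperplane with scikit-learn's SVC; the blob data is generated for illustration.

```python
# Linear SVM: learns the maximum-margin hyperplane w·x + b = 0.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear")
clf.fit(X, y)

print("hyperplane weights w:", clf.coef_)
print("intercept b:", clf.intercept_)
print("prediction for a new point:", clf.predict([[1.0, 2.0]]))
```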
2. Confusion Matrix: A table that summarizes a classifier's predictions against the actual labels in terms of true positives, true negatives, false positives, and false negatives.
3. AUC-ROC curve: The ROC curve plots the true positive rate against the false positive rate at different classification thresholds; the AUC (area under the curve) summarizes how well the classifier separates the classes.
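A brief sketch of computing both metrics for a binary classifier; the labels and predicted scores below are illustrative assumptions.

```python
# Confusion matrix and ROC AUC for a binary classifier.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true   = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred   = [0, 1, 1, 1, 0, 0, 1, 0]                   # hard class predictions
y_scores = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

# Rows: actual class, columns: predicted class -> [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))

# Area under the ROC curve (TPR vs. FPR across thresholds)
print("AUC:", roc_auc_score(y_true, y_scores))
```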