
7 Types of Classification Algorithms

The purpose of this research is to bring together the 7 most common
types of classification algorithms along with their Python code: Logistic
Regression, Naïve Bayes, Stochastic Gradient Descent, K-Nearest
Neighbours, Decision Tree, Random Forest, and Support Vector
Machine.

1 Introduction

1.1 Structured Data Classification

Classification can be performed on structured or unstructured data.
Classification is a technique where we categorize data into a given
number of classes. The main goal of a classification problem is to
identify the category/class under which new data will fall.

A few of the terminologies encountered in machine learning
classification:

• Classifier: An algorithm that maps the input data to a specific
category.
• Classification model: A classification model tries to draw some
conclusion from the input values given for training. It will predict
the class labels/categories for the new data.
• Feature: A feature is an individual measurable property of a
phenomenon being observed.
• Binary Classification: Classification task with two possible
outcomes. Eg: Gender classification (Male / Female)
• Multi-class classification: Classification with more than two
classes. In multi-class classification each sample is assigned to
one and only one target label. Eg: An animal can be a cat or a dog
but not both at the same time.
• Multi-label classification: Classification task where each sample is
mapped to a set of target labels (more than one class). Eg: A
news article can be about sports, a person, and a location at the
same time.
The following are the steps involved in building a classification model
(illustrated by the sketch after this list):

• Initialize the classifier to be used.
• Train the classifier: All classifiers in scikit-learn use a fit(X, y)
method to fit (train) the model on the given training data X and
training labels y.
• Predict the target: Given an unlabeled observation X, the
predict(X) method returns the predicted label y.
• Evaluate the classifier model.
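
A minimal sketch of these four steps, using a small synthetic dataset in
place of the salary data and scikit-learn's LogisticRegression purely as a
placeholder classifier:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# A small synthetic binary dataset stands in for the salary data
X, y = make_classification(n_samples=1000, n_features=7, random_state=42)

# Hold out a test set for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=1000)  # 1. initialize the classifier
clf.fit(X_train, y_train)                # 2. train it on X_train / y_train
y_pred = clf.predict(X_test)             # 3. predict labels for unseen data
print(accuracy_score(y_test, y_pred))    # 4. evaluate the model

The later sections reuse this X_train / X_test / y_train / y_test split.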

1.2 Dataset Source and Contents

The dataset contains salaries. The following is a description of our
dataset:

• Number of classes: 2 (‘>50K’ and ‘<=50K’)
• Number of attributes (columns): 7
• Number of instances (rows): 48,842
This data was extracted from the census bureau database found at:
http://www.census.gov/ftp/pub/DES/www/welcome.html
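
A quick way to confirm these counts with pandas, assuming the extract has
been saved locally as a CSV (the file name "salary.csv" and the column
name "salary" below are assumptions, not the actual layout):

import pandas as pd

# Illustrative only: the file name and target column are assumed
df = pd.read_csv("salary.csv")

print(df.shape)                     # expected: (48842, 7)
print(df["salary"].value_counts())  # two classes: '>50K' and '<=50K'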

1.3 Exploratory Data Analysis

2 Types of Classification Algorithms (Python)

2.1 Logistic Regression


Definition: Logistic regression is a machine learning algorithm for
classification. In this algorithm, the probabilities describing the possible
outcomes of a single trial are modelled using a logistic function.

Advantages: Logistic regression is designed for this purpose
(classification), and is most useful for understanding the influence of
several independent variables on a single outcome variable.

Disadvantages: Works only when the predicted variable is binary,
assumes all predictors are independent of each other, and assumes the
data is free of missing values.
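
A minimal sketch with scikit-learn's LogisticRegression, reusing the
synthetic train/test split from section 1.1; predict_proba exposes the
per-class probabilities produced by the logistic function:

from sklearn.linear_model import LogisticRegression

# Reuses X_train, X_test, y_train, y_test from the sketch in section 1.1
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

print(log_reg.predict_proba(X_test[:5]))  # modelled class probabilities
print(log_reg.score(X_test, y_test))      # mean accuracy on the test set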

2.2 Naïve Bayes

Definition: The Naive Bayes algorithm is based on Bayes’ theorem with the
assumption of independence between every pair of features. Naive
Bayes classifiers work well in many real-world situations such as
document classification and spam filtering.

Advantages: This algorithm requires a small amount of training data to
estimate the necessary parameters. Naive Bayes classifiers are
extremely fast compared to more sophisticated methods.

Disadvantages: Naive Bayes is known to be a bad estimator.
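
A minimal sketch using scikit-learn's GaussianNB (one of several Naive
Bayes variants; choosing the Gaussian variant here is an assumption),
reusing the split from section 1.1:

from sklearn.naive_bayes import GaussianNB

# GaussianNB assumes each feature is normally distributed within a class
nb = GaussianNB()
nb.fit(X_train, y_train)         # parameter estimation is very fast
print(nb.score(X_test, y_test))  # mean accuracy on the test set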

2.3 Stochastic Gradient Descent


Definition: Stochastic gradient descent is a simple and very efficient
approach to fit linear models. It is particularly useful when the number
of samples is very large. It supports different loss functions and
penalties for classification.

Advantages: Efficiency and ease of implementation.

Disadvantages: Requires a number of hyper-parameters and is
sensitive to feature scaling.
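
A minimal sketch with scikit-learn's SGDClassifier, reusing the split from
section 1.1; because SGD is sensitive to feature scaling, a StandardScaler
is placed in front of it in a pipeline:

from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The default hinge loss fits a linear SVM by SGD; other loss functions
# and penalties can be chosen via the loss and penalty parameters.
sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
sgd.fit(X_train, y_train)
print(sgd.score(X_test, y_test))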

2.4 K-Nearest Neighbours

Definition: Neighbours-based classification is a type of lazy learning,
as it does not attempt to construct a general internal model, but simply
stores instances of the training data. Classification is computed from a
simple majority vote of the k nearest neighbours of each point.

Advantages: This algorithm is simple to implement, robust to noisy
training data, and effective if the training data is large.

Disadvantages: The value of K needs to be determined, and the
computation cost is high as it needs to compute the distance of each
instance to all the training samples.
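
A minimal sketch with scikit-learn's KNeighborsClassifier, reusing the
split from section 1.1; the value of k is set via n_neighbors:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)  # k must be chosen; 5 is the default
knn.fit(X_train, y_train)                  # "training" only stores the instances
print(knn.score(X_test, y_test))           # distances are computed at predict time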

2.5 Decision Tree


Definition: Given data with attributes and their classes, a
decision tree produces a sequence of rules that can be used to classify
the data.

Advantages: Decision Tree is simple to understand and visualise,
requires little data preparation, and can handle both numerical and
categorical data.

Disadvantages: The algorithm can create complex trees that do not
generalise well, and decision trees can be unstable because small
variations in the data might result in a completely different tree being
generated.
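
A minimal sketch with scikit-learn's DecisionTreeClassifier, reusing the
split from section 1.1; export_text prints the learned sequence of rules,
and limiting max_depth is one way to keep the tree from growing too complex:

from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))

# The fitted tree is a sequence of if/else rules over the features
print(export_text(tree))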

2.6 Random Forest

Definition: The random forest classifier is a meta-estimator that fits a
number of decision trees on various sub-samples of the dataset and uses
averaging to improve the predictive accuracy of the model and control
over-fitting. The sub-sample size is always the same as the original
input sample size, but the samples are drawn with replacement.

Advantages: Reduction in over-fitting, and the random forest classifier is
more accurate than decision trees in most cases.

Disadvantages: Slow real-time prediction, difficult to implement, and a
complex algorithm.
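
A minimal sketch with scikit-learn's RandomForestClassifier, reusing the
split from section 1.1:

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each fit on a bootstrap sub-sample drawn with replacement;
# their predictions are averaged to form the final prediction
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
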
2.7 Support Vector Machine

Definition: Support vector machine is a representation of the training
data as points in space separated into categories by a clear gap that is
as wide as possible. New examples are then mapped into that same
space and predicted to belong to a category based on which side of
the gap they fall.

Advantages: Effective in high-dimensional spaces, and uses a subset
of training points in the decision function so it is also memory efficient.

Disadvantages: The algorithm does not directly provide probability
estimates; these are calculated using an expensive five-fold cross-
validation.
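
A minimal sketch with scikit-learn's SVC, reusing the split from section
1.1; probability=True enables the (expensive) cross-validated probability
estimates mentioned above:

from sklearn.svm import SVC

svm = SVC(kernel="rbf", probability=True, random_state=42)
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))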

3 Conclusion

3.1 Comparison Matrix

• Accuracy: (True Positive + True Negative) / Total Population
o Accuracy is the ratio of correctly predicted observations to the
total observations. Accuracy is the most intuitive
performance measure.
o True Positive: The number of correct predictions that the
occurrence is positive
o True Negative: The number of correct predictions that the
occurrence is negative
• F1-Score: (2 x Precision x Recall) / (Precision + Recall)
o F1-Score is the weighted average of Precision and Recall, and is
used in all types of classification algorithms. Therefore, this
score takes both false positives and false negatives into
account. F1-Score is usually more useful than accuracy,
especially if you have an uneven class distribution.
o Precision: When a positive value is predicted, how often is
the prediction correct?
o Recall: When the actual value is positive, how often is the
prediction correct? (A sketch computing these metrics follows
this list.)
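
These metrics can be computed directly with scikit-learn, using the y_test
and y_pred arrays from the workflow sketch in section 1.1:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))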

Classification Algorithms      Accuracy   F1-Score
Logistic Regression            84.60%     0.6337
Naïve Bayes                    80.11%     0.6005
Stochastic Gradient Descent    82.20%     0.5780
K-Nearest Neighbours           83.56%     0.5924
Decision Tree                  84.23%     0.6308
Random Forest                  84.33%     0.6275
Support Vector Machine         84.09%     0.6145

Code location: https://github.com/f2005636/Classification

3.2 Algorithm Selection


