Unit 3 PDF
Classification algorithms: Logistic Regression, Decision Tree Classification, Neural Network, K-Nearest Neighbors (K-NN), Support Vector Machine, Naive Bayes (Gaussian, Multinomial, Bernoulli). Performance Measures: Confusion Matrix, Classification Accuracy, Classification Report: Precision, Recall, F1 Score and Support.
Decision Tree
Features of Decision Tree
1. Tree-Like Structure: Decision Trees have a flowchart-like structure, where each internal node represents a
"test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class
label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules.
2. Simple to Understand and Interpret: One of the main advantages of Decision Trees is their simplicity and
ease of interpretation. They can be visualized, which makes it easy to understand how decisions are made
and explain the reasoning behind predictions.
3. Versatility: Decision Trees can handle both numerical and categorical data and can be used for both
regression and classification tasks, making them versatile across different types of data and problems.
4. Feature Importance: Decision Trees inherently perform feature selection, giving insights into the most
significant variables for making predictions. The top nodes in a tree are the most important features,
providing a straightforward way to identify critical variables (see the sketch after this list).
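To make these points concrete, here is a minimal sketch using scikit-learn (an assumed library choice; the notes do not name one). It fits a small decision tree on the built-in Iris dataset, prints the learned root-to-leaf classification rules, and lists the feature importances mentioned in point 4.

```python
# A minimal decision-tree sketch using scikit-learn (assumed; not specified in the notes).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each internal node tests one attribute; each leaf assigns a class label.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# The root-to-leaf paths are the classification rules described above.
print(export_text(clf, feature_names=feature_names))

# Built-in feature importance: higher values mark more significant variables.
for name, importance in zip(feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")

print("Test accuracy:", clf.score(X_test, y_test))
```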
3. Random Forest
Random Forest is an ensemble learning technique that combines multiple decision trees to improve predictive
accuracy and control over-fitting. By aggregating the predictions of numerous trees, Random Forests enhance
the decision-making process, making them robust against noise and bias.
Random Forest uses numerous decision trees to increase prediction accuracy and reduce overfitting. It constructs
many trees and integrates their predictions to create a reliable model. Diversity is added by training each tree on a
random sample of the data and a random subset of the features. Random Forests excel at handling high-dimensional
data, provide feature importance metrics, and resist overfitting. Many fields use them for classification and regression.
Features of Random Forest
1. Ensemble Method: Random Forest uses the ensemble learning technique, where multiple learners (decision
trees, in this case) are trained to solve the same problem and combined to get better results. The ensemble
approach improves the model's accuracy and robustness.
2. Handling Both Types of Data: It can handle both categorical and continuous input and output variables,
making it versatile for different types of data.
3. Reduction in Overfitting: By averaging multiple trees, Random Forest reduces the risk of overfitting, making
the model more generalizable than a single decision tree (illustrated in the sketch after this list).
4. Handling Missing Values: Random Forest can handle missing values. When it encounters a missing value in a
variable, it can substitute the median (for numerical variables) or the mode (for categorical variables) of all
samples reaching the node where the missing value is encountered.
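The ensemble idea can be seen in a short sketch, again assuming scikit-learn: many randomized trees are trained and their votes aggregated, which typically generalizes better than a single tree.

```python
# A Random Forest sketch using scikit-learn (assumed library choice).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single tree: prone to overfitting the training sample.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Ensemble of 100 trees, each grown on a bootstrap sample with random
# feature subsets at every split; predictions are aggregated by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Single tree accuracy: ", tree.score(X_test, y_test))
print("Random Forest accuracy:", forest.score(X_test, y_test))
print("Feature importances:   ", forest.feature_importances_)
```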
4. Support Vector Machine (SVM)
SVM is an effective classification and regression algorithm. It seeks the hyperplane that best separates the classes
while maximizing the margin. SVM works well in high-dimensional spaces and handles nonlinear feature interactions
with its kernel technique. It is a powerful classification algorithm known for its accuracy in high-dimensional spaces.
SVM is robust against overfitting and generalizes well to different datasets. It finds applications in image recognition,
text classification, and bioinformatics, among other fields where precision is paramount.
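A minimal SVM sketch under the same scikit-learn assumption follows. The RBF kernel illustrates the kernel technique mentioned above: it lets the classifier fit a nonlinear boundary while still solving a maximum-margin problem.

```python
# An SVM sketch using scikit-learn (assumed library choice).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling matters for margin-based methods; the RBF kernel handles
# nonlinear feature interactions without explicit feature engineering.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```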
5. K-Nearest Neighbors (KNN)
Features of K-Nearest Neighbors (KNN)
1. Instance-Based Learning: KNN is a type of instance-based or lazy learning algorithm, meaning it does not
explicitly learn a model. Instead, it memorizes the training dataset and uses it to make predictions.
2. Simplicity: One of the main advantages of KNN is its simplicity. The algorithm is straightforward to
understand and easy to implement, requiring no training phase in the traditional sense.
3. Non-Parametric: KNN is a non-parametric method, meaning it makes no underlying assumptions about the
distribution of the data. This flexibility allows it to be used in a wide variety of situations, including those
where the data distribution is unknown or non-standard.
4. Flexibility in Distance Choice: The algorithm's performance can be significantly influenced by the choice of
distance metric (e.g., Euclidean, Manhattan, Minkowski). This flexibility allows for customization based on
the specific characteristics of the data (see the sketch after this list).
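These points show up directly in a short scikit-learn sketch (library assumed). Note that there is no real training step: fit simply stores the data, and the distance metric is a tunable choice.

```python
# A KNN sketch using scikit-learn (assumed library choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Lazy" learning: fit() just memorizes the training set.
# The distance metric (euclidean, manhattan, minkowski, ...) is a free choice.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)

# Each prediction finds the 5 nearest stored points and takes a majority vote.
print("Test accuracy:", knn.score(X_test, y_test))
```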
Confusion Matrix
A confusion matrix is a matrix that summarizes the performance of a machine learning model on a set of test data. It
is a means of displaying the number of accurate and inaccurate instances based on the model's predictions. It is often
used to measure the performance of classification models, which aim to predict a categorical label for each input
instance.
The matrix displays the number of instances produced by the model on the test data.
True Positive (TP): The model correctly predicted a positive outcome (the actual outcome was positive).
True Negative (TN): The model correctly predicted a negative outcome (the actual outcome was negative).
False Positive (FP): The model incorrectly predicted a positive outcome (the actual outcome was negative).
Also known as a Type I error.
False Negative (FN): The model incorrectly predicted a negative outcome (the actual outcome was positive).
Also known as a Type II error.
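These four counts can be read directly off a fitted binary classifier; a minimal sketch, again assuming scikit-learn (the classifier and dataset here are illustrative choices):

```python
# Building a confusion matrix with scikit-learn (assumed library choice).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# For binary labels, ravel() yields the four counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp} (Type I)  FN={fn} (Type II)")
```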
Why do we need a Confusion Matrix?
When assessing a classification model's performance, a confusion matrix is essential. It offers a thorough analysis of
true positive, true negative, false positive, and false negative predictions, facilitating a more profound
comprehension of a model's recall, accuracy, precision, and overall effectiveness in class distinction. When there is
an uneven class distribution in a dataset, this matrix is especially helpful in evaluating a model's performance beyond
basic accuracy metrics.
Metrics based on Confusion Matrix Data
1. Accuracy
Accuracy is used to measure the overall performance of the model. It is the ratio of correctly classified instances to
the total number of instances:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision
Precision is a measure of how accurate a model's positive predictions are. It is defined as the ratio of true positive
predictions to the total number of positive predictions made by the model:
Precision = TP / (TP + FP)
3. Recall
Recall measures the effectiveness of a classification model in identifying all relevant instances from a dataset. It is the
ratio of the number of true positive (TP) instances to the sum of true positive and false negative (FN) instances:
Recall = TP / (TP + FN)
4. F1-Score
F1-score is used to evaluate the overall performance of a classification model. It is the harmonic mean of precision
and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
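Precision, recall, F1-score, and support for each class can be obtained in one call; a sketch assuming scikit-learn, reusing a fitted classifier like the one above:

```python
# Classification report: precision, recall, F1-score and support per class.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# "support" is the number of true instances of each class in y_test.
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```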
5. Specificity
Specificity is another important metric in the evaluation of classification models, particularly in binary classification. It
measures the ability of a model to correctly identify negative instances. Specificity is also known as the True Negative
Rate. The formula is given by:
Specificity = TN / (TN + FP)
For example, with TN = 3 and FP = 1:
Specificity = 3 / (3 + 1) = 3/4 = 0.75
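The formulas above can be verified directly from the four counts. In this small sketch, TN = 3 and FP = 1 come from the worked example; TP = 4 and FN = 2 are hypothetical values added purely for illustration.

```python
# Recomputing the metrics from raw counts (TN=3, FP=1 from the worked
# example above; TP=4, FN=2 are hypothetical values added for illustration).
tp, tn, fp, fn = 4, 3, 1, 2

accuracy    = (tp + tn) / (tp + tn + fp + fn)               # 7/10 = 0.70
precision   = tp / (tp + fp)                                # 4/5  = 0.80
recall      = tp / (tp + fn)                                # 4/6  ≈ 0.67
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)                                # 3/4  = 0.75, as above

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f} specificity={specificity:.2f}")
```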
1. Type 1 Error
A Type 1 error occurs when the model incorrectly predicts a positive instance (a false positive); precision is directly
affected by false positives.
For example, in a courtroom scenario, a Type 1 Error, often referred to as a false positive, occurs when the court
mistakenly convicts an individual as guilty when, in truth, they are innocent of the alleged crime. This grave error can
have profound consequences, leading to the wrongful punishment of an innocent person who did not commit the
offense in question. Preventing Type 1 Errors in legal proceedings is paramount to ensuring that justice is accurately
served and innocent individuals are protected from unwarranted harm and punishment.
2. Type 2 Error
A Type 2 error occurs when the model fails to predict a positive instance. Recall is directly affected by false negatives,
as it is the ratio of true positives to the sum of true positives and false negatives.
In the context of medical testing, a Type 2 Error, often known as a false negative, occurs when a diagnostic test fails
to detect the presence of a disease in a patient who genuinely has it. The consequences of such an error are
significant, as it may result in a delayed diagnosis and subsequent treatment.
Precision emphasizes minimizing false positives, while recall focuses on minimizing false negatives.