
UNIT-III

Classification algorithms: Logistic Regression, Decision Tree Classification, Neural Network, K-Nearest Neighbors (K-NN), Support Vector Machine, Naive Bayes (Gaussian, Multinomial, Bernoulli). Performance Measures: Confusion Matrix, Classification Accuracy, Classification Report: Precision, Recall, F1 Score and Support.

What is Classification in Machine Learning?


Classification in machine learning is a type of supervised learning in which the goal is to predict the category or class of an instance based on its features. It involves training a model on a dataset whose instances (observations) are already labeled with classes, and then using that model to assign new, unseen instances to one of the predefined categories.
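As a brief illustration (not part of the original unit text), the sketch below assumes scikit-learn is available and uses its bundled Iris dataset; the model choice and split parameters are ours. It trains a classifier on labeled instances and then predicts the classes of held-out, unseen instances.

# Minimal end-to-end classification sketch (illustrative only, assuming scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                    # features and class labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)            # hold out unseen instances

model = LogisticRegression(max_iter=200)             # any classifier from this unit would do
model.fit(X_train, y_train)                          # learn from the labeled instances
print(model.predict(X_test[:5]))                     # classes predicted for unseen instances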
List of Machine Learning Classification Algorithms
Classification algorithms organize and understand complex datasets in machine learning. These algorithms are essential for categorizing data into classes or labels, automating decision-making and pattern identification. Classification algorithms are often used to detect email spam by analyzing email content. These algorithms enable machines to quickly recognize spam trends and make real-time judgments, improving email security.
Some of the top-ranked machine learning algorithms for classification are:
1. Logistic Regression
2. Decision Tree
3. Random Forest
4. Support Vector Machine (SVM)
5. Naive Bayes
6. K-Nearest Neighbors (KNN)
Let us look at each of these algorithms in turn:
1. Logistic Regression Classification Algorithm in Machine Learning
Logistic Regression is a classification algorithm used to estimate discrete values, typically binary, such as 0 and 1 or yes and no. It predicts the probability of an instance belonging to a class, which makes it well suited to binary classification problems such as spam detection or disease diagnosis.
Logistic functions are ideal for classification problems since their output lies between 0 and 1. Many fields employ Logistic Regression because of its simplicity, interpretability, and efficiency. It works well when the log-odds of the event are approximately linear in the features. Despite its name, it is a linear model used for binary classification: it predicts class-membership likelihood, with a logistic function modeling the probability.

Logistic Regression (Graph)


Features of Logistic Regression
1. Binary Outcome: Logistic regression is used when the dependent variable is binary in nature, meaning it has only two possible outcomes (e.g., yes/no, 0/1, true/false).
2. Probabilistic Results: It predicts the probability of the occurrence of an event by fitting data to a logistic function. The output is a value between 0 and 1, which represents the probability that a given input belongs to the '1' category.
3. Odds Ratio: It estimates the odds ratio in the presence of more than one explanatory variable. The odds ratio can be used to understand the strength of the association between the independent variables and the dependent binary variable.
4. Logistic (Sigmoid) Function: Logistic regression models the data with the logistic (sigmoid) function, an S-shaped curve that maps any real-valued number to a value between 0 and 1; the logit function is its inverse, mapping probabilities to log-odds.
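A minimal sketch of logistic regression for binary classification, assuming scikit-learn and NumPy are installed; the synthetic two-feature dataset and the query point are purely illustrative choices.

# Logistic regression sketch (illustrative; data and parameters are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))                       # two numeric features
y = (X[:, 0] + X[:, 1] > 0).astype(int)             # binary target: 0 or 1

clf = LogisticRegression()
clf.fit(X, y)

# The model outputs class-membership probabilities between 0 and 1
# (the logistic/sigmoid of a linear combination of the features).
print(clf.predict_proba([[0.5, -0.2]]))             # [[P(class 0), P(class 1)]]
print(clf.predict([[0.5, -0.2]]))                   # thresholded at 0.5 -> 0 or 1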
2. Decision Tree
Decision Trees are versatile and simple classification and regression techniques. Recursively splitting the dataset into subgroups based on key criteria produces a tree-like structure; decisions at each internal node lead down to leaf nodes that hold the final predictions. Decision trees are easy to understand and visualize, making them useful for decision-making. Overfitting may occur, so pruning improves generalization. A decision tree is a tree-like model of decisions and their consequences, including chance event outcomes, resource costs and utility.
The algorithm is used for both classification and regression tasks. It models decisions and their possible results as a tree, with branches representing choices and leaves representing outcomes.

Decision Tree
Features of Decision Tree
1. Tree-Like Structure: Decision Trees have a flowchart-like structure, where each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules.
2. Simple to Understand and Interpret: One of the main advantages of Decision Trees is their simplicity and ease of interpretation. They can be visualized, which makes it easy to understand how decisions are made and to explain the reasoning behind predictions.
3. Versatility: Decision Trees can handle both numerical and categorical data and can be used for both regression and classification tasks, making them versatile across different types of data and problems.
4. Feature Importance: Decision Trees inherently perform feature selection, giving insights into the most significant variables for making the predictions. The top nodes in a tree are the most important features, providing a straightforward way to identify critical variables.
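The following sketch assumes scikit-learn; the Iris data, the depth limit, and the feature names are our illustrative choices. It fits a decision tree, prints its root-to-leaf rules as if/else text, and shows the feature importances.

# Decision tree sketch (illustrative, assuming scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
# Limiting the depth is one simple way to keep the tree from overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned tree as if/else rules; each root-to-leaf path is a classification rule.
print(export_text(tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
# Feature importances: higher values indicate more significant variables.
print(tree.feature_importances_)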
3. Random Forest
Random Forest is an ensemble learning technique that combines multiple decision trees to improve predictive accuracy and control over-fitting. By aggregating the predictions of numerous trees, Random Forests enhance the decision-making process, making them robust against noise and bias.
Random Forest uses numerous decision trees to increase prediction accuracy and reduce overfitting. It constructs many trees and integrates their predictions to create a reliable model. Diversity is added by giving each tree a random sample of the data and a random subset of the features. Random Forests excel on high-dimensional data, provide feature importance metrics, and resist overfitting. Many fields use them for classification and regression.

Random Forest
Features of Random Forest
1. Ensemble Method: Random Forest uses the ensemble learning technique, where multiple learners (decision trees, in this case) are trained to solve the same problem and combined to get better results. The ensemble approach improves the model's accuracy and robustness.
2. Handling Both Types of Data: It can handle both categorical and continuous input and output variables, making it versatile for different types of data.
3. Reduction in Overfitting: By averaging multiple trees, Random Forest reduces the risk of overfitting, making the model more generalizable than a single decision tree.
4. Handling Missing Values: Random Forest can handle missing values. When it encounters a missing value in a variable, it can use the median for numerical variables or the mode for categorical variables of all samples reaching the node where the missing value is encountered.
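A short Random Forest sketch, assuming scikit-learn; the breast-cancer dataset, the number of trees, and the train/test split are illustrative assumptions, not prescribed by this unit.

# Random Forest sketch (illustrative, assuming scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees: each tree sees a bootstrap sample of the data
# and random feature subsets at each split, and their votes are aggregated.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("First feature importances:", forest.feature_importances_[:5])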
4. Support Vector Machine (SVM)
SVM is an effective classification and regression algorithm. It seeks the hyperplane that best separates the classes while maximizing the margin. SVM works well in high-dimensional spaces and handles nonlinear feature interactions with its kernel technique. It is a powerful classification algorithm known for its accuracy in high-dimensional spaces.
SVM is robust against overfitting and generalizes well to different datasets. Its use cases span image recognition, text classification, and bioinformatics, among other fields where precision is paramount.

Support Vector Machine


Features of Support Vector Machine
1. Margin Maximization: SVM aims to find the hyperplane that separates different classes in the feature space with the maximum margin. The margin is defined as the distance between the hyperplane and the nearest data points from each class, known as support vectors. Maximizing this margin increases the model's robustness and its ability to generalize well to unseen data.
2. Support Vectors: The algorithm is named after these support vectors, which are the critical elements of the training dataset. The position of the hyperplane is determined based on these support vectors, making SVMs relatively memory efficient since only the support vectors are needed to define the model.
3. Kernel Trick: One of the most powerful features of SVM is its use of kernels, which allows the algorithm to operate in a higher-dimensional space without explicitly computing the coordinates of the data in that space. This makes it possible to handle non-linearly separable data by applying linear separation in this higher-dimensional feature space.
4. Versatility: Through the choice of the kernel function (linear, polynomial, radial basis function (RBF), sigmoid, etc.), SVM can be adapted to solve a wide range of problems, including those with complex, non-linear decision boundaries.
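A minimal SVM sketch, assuming scikit-learn; the dataset, kernel, and C/gamma values are illustrative assumptions. Feature scaling is included because SVMs are sensitive to feature magnitudes; switching to kernel="linear" would fit a maximum-margin hyperplane directly.

# SVM sketch with an RBF kernel (illustrative, assuming scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features, then fit a non-linear (RBF-kernel) maximum-margin classifier.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))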
5. Naive Bayes
Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem; text categorization and spam filtering in particular benefit from it. Despite its simplicity and "naive" assumption of feature independence, Naive Bayes often works well in practice. It uses conditional probabilities of features to calculate the class likelihood of an instance, and it handles high-dimensional datasets quickly.
It builds on Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event. Naive Bayes classifiers assume that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable.
Features of Naive Bayes
1. Probabilistic Foundation: Naive Bayes classifiers apply Bayes' theorem to compute the probability that a given instance belongs to a particular class, making decisions based on the posterior probabilities.
2. Feature Independence: The algorithm assumes that the features used to predict the class are independent of each other given the class. This assumption, although naive and often violated in real-world data, simplifies the computation and is surprisingly effective in practice.
3. Efficiency: Naive Bayes classifiers are highly efficient, requiring a small amount of training data to estimate the necessary parameters (probabilities) for classification.
4. Easy to Implement and Understand: The algorithm is straightforward to implement and interpret, making it accessible for beginners in machine learning. It provides a good starting point for classification tasks.
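The sketch below, assuming scikit-learn and NumPy, illustrates the two Naive Bayes variants named in the syllabus for continuous and count data (Gaussian and Multinomial); the tiny arrays are made-up examples, and BernoulliNB would be used the same way for binary present/absent features.

# Naive Bayes sketch (illustrative; the toy arrays are assumptions).
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])                                   # class labels

# GaussianNB: continuous features, assumed normally distributed within each class.
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_cont, y).predict_proba([[1.0, 2.0]]))   # posterior per class

# MultinomialNB: count features (e.g. word counts in text classification).
X_counts = np.array([[2, 0, 1], [3, 0, 0], [0, 4, 1], [0, 3, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))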
6. K-Nearest Neighbors (KNN)
KNN uses the majority class of the k nearest neighbours for simple and adaptive classification and regression. Being non-parametric, KNN makes no assumptions about the data distribution. It copes well with irregular decision boundaries and performs well across varied tasks. K-Nearest Neighbors (KNN) is an instance-based, or lazy learning, algorithm, where the function is only approximated locally and all computation is deferred until function evaluation. It classifies new cases based on a similarity measure (e.g., distance functions). KNN is widely used in recommendation systems, anomaly detection, and pattern recognition due to its simplicity and effectiveness in handling non-linear data.

K-Nearest Neighbors Algorithm
Features of K-Nearest Neighbors (KNN)
1. Instance-Based Learning: KNN is a type of instance-based or lazy learning algorithm, meaning it does not explicitly learn a model. Instead, it memorizes the training dataset and uses it to make predictions.
2. Simplicity: One of the main advantages of KNN is its simplicity. The algorithm is straightforward to understand and easy to implement, requiring no training phase in the traditional sense.
3. Non-Parametric: KNN is a non-parametric method, meaning it makes no underlying assumptions about the distribution of the data. This flexibility allows it to be used in a wide variety of situations, including those where the data distribution is unknown or non-standard.
4. Flexibility in Distance Choice: The algorithm's performance can be significantly influenced by the choice of distance metric (e.g., Euclidean, Manhattan, Minkowski). This flexibility allows for customization based on the specific characteristics of the data.
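A short KNN sketch, assuming scikit-learn; the Iris data, k = 5, and the Minkowski/Euclidean metric are illustrative choices rather than requirements.

# KNN sketch (illustrative, assuming scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" just stores the data; prediction takes the majority class of the
# k nearest training points under the chosen distance metric.
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)  # p=2 -> Euclidean
knn.fit(X_train, y_train)
print(knn.predict(X_test[:5]))
print("Test accuracy:", knn.score(X_test, y_test))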

Performance Measures: Confusion Matrix, Classification Accuracy, Classification Report: Precision, Recall, F1 Score and Support

Confusion Matrix
A confusion matrix is a matrix that summarizes the performance of a machine learning model on a set of test data. It is a means of displaying the number of accurate and inaccurate predictions the model made. It is often used to measure the performance of classification models, which aim to predict a categorical label for each input instance.
The matrix displays the counts of each kind of prediction the model made on the test data:
• True Positive (TP): The model correctly predicted a positive outcome (the actual outcome was positive).
• True Negative (TN): The model correctly predicted a negative outcome (the actual outcome was negative).
• False Positive (FP): The model incorrectly predicted a positive outcome (the actual outcome was negative). Also known as a Type I error.
• False Negative (FN): The model incorrectly predicted a negative outcome (the actual outcome was positive). Also known as a Type II error.
Why do we need a Confusion Matrix?
When assessing a classification model's performance, a confusion matrix is essential. It offers a thorough analysis of true positive, true negative, false positive, and false negative predictions, facilitating a more profound comprehension of a model's recall, accuracy, precision, and overall effectiveness in class distinction. When there is an uneven class distribution in a dataset, this matrix is especially helpful in evaluating a model's performance beyond basic accuracy metrics.
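As an illustration of how such a matrix is computed in practice, the sketch below (assuming scikit-learn) uses toy label vectors chosen so that TP = 5, TN = 3, FP = 1, FN = 1, matching the worked example used for the metrics that follow.

# Confusion matrix sketch (illustrative; the toy labels are assumptions).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]       # actual classes
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]       # model predictions

cm = confusion_matrix(y_true, y_pred)          # rows = actual class, columns = predicted class
tn, fp, fn, tp = cm.ravel()                    # scikit-learn orders the 2x2 matrix as TN, FP, FN, TP
print(cm)
print("TP =", tp, "TN =", tn, "FP =", fp, "FN =", fn)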
Metrics based on Confusion Matrix Data
1. Accuracy
Accuracy is used to measure the overall performance of the model. It is the ratio of correctly classified instances to the total number of instances:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For the worked example above (TP = 5, TN = 3, FP = 1, FN = 1):

Accuracy = (5 + 3) / (5 + 3 + 1 + 1) = 8/10 = 0.8

2. Precision
Precision is a measure of how accurate a model's positive predictions are. It is defined as the ratio of true positive predictions to the total number of positive predictions made by the model:

Precision = TP / (TP + FP)

For the above case:

Precision = 5 / (5 + 1) = 5/6 = 0.8333

3. Recall
Recall measures the effectiveness of a classification model in identifying all relevant instances from a dataset. It is the ratio of the number of true positive (TP) instances to the sum of true positive and false negative (FN) instances:

Recall = TP / (TP + FN)

For the above case:

Recall = 5 / (5 + 1) = 5/6 = 0.8333
Note: We use precision when we want to minimize false positives, which is crucial in scenarios like spam email detection where misclassifying a non-spam message as spam is costly. We use recall when minimizing false negatives is essential, as in medical diagnosis, where identifying all actual positive cases is critical, even if it results in some false positives.

4. F1-Score
The F1-score is used to evaluate the overall performance of a classification model. It is the harmonic mean of precision and recall:

F1-Score = 2 x (Precision x Recall) / (Precision + Recall)

For the above case:

F1-Score = (2 x 0.8333 x 0.8333) / (0.8333 + 0.8333) = 0.8333

We balance precision and recall with the F1-score when a trade-off between minimizing false positives and false negatives is necessary, such as in information retrieval systems.
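Feeding the same toy labels to scikit-learn's metric functions reproduces the hand-computed values; this sketch (an illustration, assuming scikit-learn) also prints the classification report, which lists precision, recall, F1 score and support per class.

# Classification metrics sketch (illustrative; same toy labels as the confusion-matrix example).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))     # 0.8
print("Precision:", precision_score(y_true, y_pred))    # 0.8333...
print("Recall   :", recall_score(y_true, y_pred))       # 0.8333...
print("F1 score :", f1_score(y_true, y_pred))           # 0.8333...

# Per-class precision, recall, F1 score and support in one table.
print(classification_report(y_true, y_pred))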

5. Specificity
Specificity is another important metric in the evaluation of classification models, particularly in binary classification. It measures the ability of a model to correctly identify negative instances. Specificity is also known as the True Negative Rate. The formula is:

Specificity = TN / (TN + FP)

For the above case:

Specificity = 3 / (1 + 3) = 3/4 = 0.75
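To the best of our knowledge scikit-learn does not expose a dedicated specificity function, so a common approach, sketched below with the same toy labels as above, is to derive it from the confusion matrix (or, equivalently, as the recall of the negative class).

# Specificity sketch (illustrative; same toy labels as the earlier examples).
from sklearn.metrics import confusion_matrix, recall_score

y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)                    # True Negative Rate
print("Specificity:", specificity)              # 3 / (3 + 1) = 0.75

# Equivalently, specificity is the recall of the negative class:
print(recall_score(y_true, y_pred, pos_label=0))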

6. Type 1 and Type 2 error


1. Type 1 error
A Type 1 error occurs when the model predicts a positive instance, but it is actually negative. Precision is affected by false positives, as it is the ratio of true positives to the sum of true positives and false positives.

Type 1
For example, in a courtroom scenario, a Type 1 Error, often referred to as a false positive, occurs when the court mistakenly convicts an individual as guilty when, in truth, they are innocent of the alleged crime. This grave error can have profound consequences, leading to the wrongful punishment of an innocent person who did not commit the offense in question. Preventing Type 1 Errors in legal proceedings is paramount to ensuring that justice is accurately served and innocent individuals are protected from unwarranted harm and punishment.
2. Type 2 error
A Type 2 error occurs when the model fails to predict a positive instance. Recall is directly affected by false negatives, as it is the ratio of true positives to the sum of true positives and false negatives.
In the context of medical testing, a Type 2 Error, often known as a false negative, occurs when a diagnostic test fails to detect the presence of a disease in a patient who genuinely has it. The consequences of such an error are significant, as it may result in a delayed diagnosis and subsequent treatment.

Type 2
Precision emphasizes minimizing false positives, while recall focuses on minimizing false negatives.
