
Classification Algorithms

1. Logistic Regression:
Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used to predict a categorical dependent variable from a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value such as Yes or No, 0 or 1, True or False. However, instead of returning exactly 0 or 1, it gives a probabilistic value that lies between 0 and 1.
Logistic regression is very similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function that saturates at the two extreme values (0 and 1).
The curve of the logistic function indicates the likelihood of an event, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete features.

Key Concepts:

Logistic Function (Sigmoid Function): Logistic regression models the probability of class
membership using the sigmoid function, which is an S-shaped curve. The logistic function
maps any real-valued number (input to the model) to a value between 0 and 1. This output
can then be interpreted as the probability of the instance belonging to the positive class.

The sigmoid function is defined as:

σ(z) = 1 / (1 + e^(-z))

Where:

z is the linear combination of the input features, z = w·x + b.

e is the base of the natural logarithm.

The sigmoid function outputs values between 0 and 1, which can be interpreted as the
probability of the instance belonging to the positive class (e.g., class "1").
Purpose: Logistic regression is used for binary classification tasks, predicting one of two
outcomes. It's based on the logistic function (sigmoid), which outputs a probability value
between 0 and 1.

Mechanism: It estimates the probability that a given input point belongs to a certain class.
The model calculates a weighted sum of the input features, applies the sigmoid function, and
maps the result to a probability score. A threshold (usually 0.5) is applied to predict the class.

Mathematical Model:

P(y = 1 | x) = σ(w·x + b) = 1 / (1 + e^(-(w·x + b)))

Strengths: Simple, interpretable, works well with linearly separable data.

Assumptions for Logistic Regression:


The dependent variable must be categorical in nature.
The independent variables should not exhibit multicollinearity.
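As a quick illustration, here is a minimal sketch of logistic regression on a toy binary dataset using scikit-learn; the feature values, labels, and 0.5 threshold are made-up assumptions for this example only.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature (e.g., hours studied) and a binary pass/fail label
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns P(class 0) and P(class 1); thresholding at 0.5 gives the class
print(model.predict_proba([[4.5]]))   # probabilities between 0 and 1
print(model.predict([[4.5]]))         # hard class label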

2. Decision Tree Classification Algorithm

Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome.
In a decision tree there are two types of nodes, the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of the given dataset.
It is a graphical representation for obtaining all possible solutions to a problem/decision based on given conditions.
It is called a decision tree because, like a tree, it starts at the root node and expands into further branches, constructing a tree-like structure.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.

Advantages of Decision Trees

Simplicity and Interpretability: Decision trees are easy to understand and interpret. The visual representation closely mirrors human decision-making processes.
Versatility: Can be used for both classification and regression tasks.
No Need for Feature Scaling: Decision trees do not require normalization or
scaling of the data.
Handles Non-linear Relationships: Capable of capturing non-linear
relationships between features and target variables.

Disadvantages of Decision Trees

Overfitting: Decision trees can easily overfit the training data, especially if they
are deep with many nodes.
Instability: Small variations in the data can result in a completely different tree
being generated.
Bias towards Features with More Levels: Features with more levels can
dominate the tree structure.

Applications of Decision Trees

Business Decision Making: Used in strategic planning and resource allocation.
Healthcare: Assists in diagnosing diseases and suggesting treatment plans.
Finance: Helps in credit scoring and risk assessment.
Marketing: Used to segment customers and predict customer behavior.
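To make the construction concrete, here is a minimal sketch of decision tree classification with scikit-learn on the Iris dataset; the dataset choice and the max_depth=3 limit are illustrative assumptions, not taken from the notes above.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small labelled dataset
X, y = load_iris(return_X_y=True)

# Limiting the depth reduces the overfitting risk mentioned above
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Print the learned tree: each internal node is a feature test, each leaf a class
print(export_text(clf, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))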

3. Neural Networks (Artificial Neural Networks, ANN):
Purpose: Neural networks are used for complex classification tasks, capable of handling non-
linear relationships.

Mechanism: A neural network consists of layers of neurons (input, hidden, output). Each
neuron applies a weighted sum to the inputs, passes it through an activation function, and
sends the result to the next layer.

Types: Feedforward neural networks, Convolutional Neural Networks (CNN), Recurrent Neural
Networks (RNN).

Strengths: Can capture complex patterns in data, suitable for large and unstructured
datasets (e.g., images, text).

Advantages:

1. Parallel Processing: ANN can perform multiple tasks simultaneously.
2. Distributed Data Storage: Information is stored across the whole network, so the loss of a few pieces of data does not cause the system to fail.
3. Handling Incomplete Data: ANN can generate outputs even with incomplete information.
4. Memory Distribution: Adapts based on examples, even without seeing all data aspects.
5. Fault Tolerance: Can function even if some neurons fail.

Disadvantages:

1. Uncertain Structure: Optimal network design is found through trial and error.
2. Lack of Transparency: ANN doesn't explain how it arrives at decisions, reducing trust.
3. Hardware Dependency: Requires specialized hardware for parallel processing.
4. Data Conversion: Problems need to be converted to numerical data, impacting
performance.
5. Training Duration: Training time is uncertain, and optimal results aren’t always achieved.
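The mechanism described above (a weighted sum followed by an activation function at each layer) can be sketched in a few lines of NumPy; the layer sizes, random weights, and input values are arbitrary assumptions for illustration only.

import numpy as np

def sigmoid(z):
    # Activation function: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# A tiny feedforward network: 3 inputs -> 4 hidden neurons -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.2, 3.0])        # one input example
hidden = sigmoid(W1 @ x + b1)         # weighted sum + activation in the hidden layer
output = sigmoid(W2 @ hidden + b2)    # output layer produces a probability-like value
print(output)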

4. K-Nearest Neighbors (K-NN)

K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique.
The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be assigned to a well-suited category using the K-NN algorithm.
The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs the computation at classification time.
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:

Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.
Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.
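A minimal sketch of these steps in plain NumPy is given below; the toy training points, labels, and the choice K=3 are assumptions made up for illustration.

import numpy as np
from collections import Counter

# Toy training data: 2-D points with binary labels
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x_new, K=3):
    # Step 2: Euclidean distance to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(distances)[:K]
    # Steps 4-5: majority vote among the K neighbors
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([2, 2])))   # expected: 0
print(knn_predict(np.array([6, 7])))   # expected: 1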

Advantages of KNN Algorithm:


It is simple to implement.
It is robust to noisy training data.
It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


The value of K always needs to be determined, which can sometimes be complex.
The computation cost is high because the distance between the new data point and every training sample must be calculated.

5. Support Vector Machine Algorithm

Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms and is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that a new data point can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.

SVM can be of two types:

Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Purpose: SVM is a powerful classifier that finds the hyperplane that best separates the
classes in a high-dimensional space.

Mechanism: It tries to find a hyperplane that maximizes the margin (distance between the
hyperplane and the closest data points, known as support vectors). It can be extended to
non-linear decision boundaries by using kernel functions (e.g., polynomial, radial basis
function).

Strengths: Effective in high-dimensional spaces, robust to overfitting, works well for both
linear and non-linear data.
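As a quick illustration, here is a minimal sketch using scikit-learn's SVC; the toy points, the linear kernel, and the regularization value C=1.0 are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2-D
X = np.array([[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel finds the maximum-margin hyperplane; kernel="rbf" would
# handle non-linearly separable data instead
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)           # the points that define the margin
print(clf.predict([[2, 2], [6, 6]]))  # expected: [0 1]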

6. Naïve Bayes Classifier Algorithm

The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
It is mainly used in text classification, which typically involves a high-dimensional training dataset.
The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and article classification.

Why is it called Naïve Bayes?

The name Naïve Bayes is made up of the two words Naïve and Bayes, which can be described as follows:

Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
Bayes: It is called Bayes because it relies on the principle of Bayes' theorem.
Bayes' Theorem:
Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.
The formula for Bayes' theorem is:

P(A|B) = P(B|A) · P(A) / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a
hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.
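The text-classification use case mentioned above can be sketched with scikit-learn's MultinomialNB; the tiny corpus and its spam/not-spam labels below are invented purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy "spam vs. not spam" corpus
texts = ["win money now", "cheap prize win", "meeting at noon", "lunch tomorrow?"]
labels = [1, 1, 0, 0]          # 1 = spam, 0 = not spam

# Turn text into word-count features, then apply Bayes' theorem per class
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vec.transform(["win a cheap prize"])))  # expected: [1]
print(clf.predict(vec.transform(["lunch meeting"])))      # expected: [0]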

Performance Measures:
Confusion Matrix
A Confusion Matrix is used to evaluate the performance of a classification model, comparing
predicted values against actual values. It is particularly useful for understanding the types of
errors made by the model.

Key Terminologies:

True Negative (TN): Correctly predicted "No".
True Positive (TP): Correctly predicted "Yes".
False Negative (FN): Incorrectly predicted "No" when the actual value was "Yes" (Type II
error).
False Positive (FP): Incorrectly predicted "Yes" when the actual value was "No" (Type I
error).

Matrix Example for a Binary Classification:

                Predicted No            Predicted Yes
Actual No       True Negative (TN)      False Positive (FP)
Actual Yes      False Negative (FN)     True Positive (TP)

Importance:

Evaluates performance: It shows how well the classifier predicts both positive and
negative classes.
Helps calculate performance metrics: Accuracy, precision, recall, F1 score, etc.

Key Metrics: Accuracy, precision, recall, F1 score, and support (each defined below).

Additional Concepts:

Null Error Rate: Error rate if the model always predicts the majority class.
ROC Curve: A graph showing performance across all classification thresholds, plotting the true positive rate (recall) against the false positive rate (1 − specificity).

Classification Accuracy:

Definition: Accuracy is the proportion of correctly classified instances (both positive and
negative) to the total instances.
Formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Limitations: Accuracy can be misleading when dealing with imbalanced datasets, where
one class is much more frequent than the other.

Classification Report:
This report provides a summary of the precision, recall, F1 score, and support for each class
in the dataset.

Precision:

Definition: Precision is the proportion of correctly predicted positive instances out of all
instances predicted as positive.
Formula:

Precision = TP / (TP + FP)

Significance: Precision is important when the cost of false positives is high (e.g., email
spam detection).

Recall (Sensitivity, True Positive Rate):

Definition: Recall is the proportion of correctly predicted positive instances out of all
actual positive instances.
Formula:

Recall = TP / (TP + FN)

Significance: Recall is important when the cost of false negatives is high (e.g., in medical
diagnoses where missing a positive case is critical).

F1 Score:
Definition: F1 score is the harmonic mean of precision and recall. It balances the two
metrics, giving a single score that combines both.
Formula:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Significance: F1 score is useful when we want a balance between precision and recall,
especially when dealing with imbalanced classes.

Support:

Definition: Support refers to the number of actual occurrences of the class in the
dataset. It helps in interpreting the results, especially when classes have unequal
distribution.
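The metrics above can be computed directly from predictions; below is a minimal sketch using scikit-learn, where y_true and y_pred are made-up example label vectors.

from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical actual vs. predicted labels for a binary problem
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))

# Precision, recall, F1 score, and support for each class
print(classification_report(y_true, y_pred))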

Questions
1. Write a note on Classification Accuracy
Classification Accuracy is a metric used to evaluate the performance of a classification
model in machine learning. It measures how often the model makes correct predictions by
comparing the predicted labels with the actual labels in the dataset.

Formula for Classification Accuracy:

Accuracy = Number of Correct Predictions / Total Number of Predictions

Where:

Correct Predictions are the instances where the predicted label matches the actual label.
Total Predictions is the total number of instances in the dataset.

Example:

If a model makes 90 correct predictions out of 100 total predictions, its accuracy is 90 / 100 = 0.90, i.e. 90%.

Interpretation:

High Accuracy: Indicates the model is making a high proportion of correct predictions.
Low Accuracy: Suggests the model is performing poorly.

Limitations:

While accuracy is simple and easy to understand, it might not always provide a complete
picture of a model's performance, especially in cases of imbalanced classes (when one class
significantly outnumbers another). In such cases, other metrics like precision, recall, F1-score,
or the confusion matrix may be more informative.

2. How Logistic Regression Models Probability:


1. Linear Combination: Logistic regression computes a linear combination of the input
features (with weights) to produce a value z.
2. Sigmoid Function: This value z is then passed through the sigmoid function, which maps
it to a probability between 0 and 1.
3. Probability Output: The sigmoid output is interpreted as the probability of class
membership (probability of being in class "1").
4. Classification: A threshold (usually 0.5) is applied to convert the probability into a binary
class prediction.
5. Training: The model parameters (weights) are learned through optimization, typically by
maximizing the likelihood of observing the data using the log-loss function.
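As a tiny worked example of steps 1-4, assume two features with made-up weights; all numbers below are invented for illustration and are not learned from real data.

import math

# Assumed model parameters (for illustration only)
w = [0.8, -0.4]
b = 0.1
x = [2.0, 1.0]                                   # one input instance

z = sum(wi * xi for wi, xi in zip(w, x)) + b     # step 1: linear combination -> 1.3
p = 1.0 / (1.0 + math.exp(-z))                   # steps 2-3: sigmoid -> about 0.786
label = 1 if p >= 0.5 else 0                     # step 4: threshold at 0.5 -> class 1
print(z, p, label)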

This probabilistic interpretation allows logistic regression to provide more nuanced predictions and insights than simply classifying an instance as "0" or "1".

3. Describe decision tree classification and how it constructs a tree-based model for classification tasks

Decision tree classification: see above.


The process of creating a decision tree involves:

1. Selecting the Best Attribute: Using a metric like Gini impurity, entropy, or
information gain, the best attribute to split the data is selected.
2. Splitting the Dataset: The dataset is split into subsets based on the selected
attribute.
3. Repeating the Process: The process is repeated recursively for each subset,
creating a new internal node or leaf node until a stopping criterion is met (e.g.,
all instances in a node belong to the same class or a predefined depth is
reached).
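As a small illustration of step 1, the sketch below computes Gini impurity and scores one candidate split; the toy feature values, labels, and the threshold x < 5 are invented for this example.

import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Toy 1-D feature with binary labels
x = np.array([2.0, 3.0, 4.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Score the candidate split "x < 5": weighted Gini of the two resulting subsets
left, right = y[x < 5], y[x >= 5]
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
print(gini(y), weighted)   # impurity drops from 0.5 to 0.0 for this perfect split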

4. Support Vector Machine (SVM) for Linearly Separable Data

For linearly separable data, SVM aims to find the optimal hyperplane that separates the data
points into different classes with the maximum margin.

Key Concepts

1. Hyperplane
A hyperplane is a decision boundary that separates data points into distinct classes. For
2D data, it's a line; for 3D data, it's a plane.
2. Margin
The margin is the distance between the hyperplane and the closest data points from
each class. SVM maximizes this margin to ensure robust classification.
3. Support Vectors
Support vectors are the data points closest to the hyperplane. These points define the
margin and are crucial for determining the optimal hyperplane.
Steps in SVM for Linearly Separable Data

1. Preprocessing
Scale the features so that all dimensions carry equal importance.
2. Define the Hyperplane
Use optimization techniques (such as quadratic programming) to find the optimal weight vector w and bias b.
3. Classify
Use the hyperplane to classify new data points:

Predicted class = sign(w·x + b)
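A minimal numeric sketch of this decision rule is shown below; the weight vector w, bias b, and test points are arbitrary assumed values, not the result of an actual SVM optimization.

import numpy as np

# Assumed hyperplane parameters (in practice these come from the SVM optimizer)
w = np.array([1.0, -1.0])
b = -0.5

def classify(x):
    # Predicted class = sign(w · x + b); the sign selects one of the two classes
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 1.0])))   # w·x + b = 1.5  -> class +1
print(classify(np.array([1.0, 3.0])))   # w·x + b = -2.5 -> class -1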

5. Difference Between Regression and Classification

Aspect             | Regression                                                      | Classification
Objective          | Predict a continuous numerical value.                           | Predict a discrete label or category.
Output Type        | Real numbers (e.g., 3.5, 100, -2.3).                            | Categorical values (e.g., "Yes" or "No").
Example            | Predicting house prices or temperature.                         | Identifying whether an email is spam or not.
Model Output       | A numerical prediction function (e.g., y = mx + b).             | A probability or class label.
Evaluation Metrics | Mean Squared Error (MSE), Mean Absolute Error (MAE), R² score.  | Accuracy, Precision, Recall, F1-Score, AUC.
Algorithms         | Linear Regression, Ridge, Lasso, etc.                           | Logistic Regression, SVM (classification), Decision Trees.
Real-world Example | Predicting a person's income based on education and experience. | Predicting whether a customer will buy a product.