Classification Algorithms 3rd
1. Logistic Regression:
Logistic regression is one of the most popular Machine Learning algorithms and comes
under the Supervised Learning technique. It is used for predicting a categorical
dependent variable from a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or
False, etc. But instead of giving the exact values 0 and 1, it gives probabilistic
values which lie between 0 and 1.
Logistic Regression is similar to Linear Regression except in how the two are used:
Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function, which predicts two maximum values (0 and 1).
The curve from the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic Regression is a significant machine learning algorithm because it can
provide probabilities and classify new data using both continuous and discrete datasets.
Key Concepts:
Logistic Function (Sigmoid Function): Logistic regression models the probability of class
membership using the sigmoid function, an S-shaped curve. The logistic function
maps any real-valued number (the input to the model) to a value between 0 and 1:
σ(z) = 1 / (1 + e^(−z))
Where:
z = w · x + b is the weighted sum of the input features.
The sigmoid function outputs values between 0 and 1, which can be interpreted as the
probability of the instance belonging to the positive class (e.g., class "1").
Purpose: Logistic regression is used for binary classification tasks, predicting one of two
outcomes. It's based on the logistic function (sigmoid), which outputs a probability value
between 0 and 1.
Mechanism: It estimates the probability that a given input point belongs to a certain class.
The model calculates a weighted sum of the input features, applies the sigmoid function, and
maps the result to a probability score. A threshold (usually 0.5) is applied to predict the class.
Mathematical Model:
P(y = 1 | x) = σ(w · x + b) = 1 / (1 + e^(−(w · x + b)))
where w is the weight vector, x the feature vector, and b the bias (intercept).
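A minimal NumPy sketch of this model. The weights and bias below are illustrative values, not fitted parameters:

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued input to a value in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (not fitted) weights and bias for two features
w = np.array([0.8, -0.4])
b = 0.1

x = np.array([1.5, 2.0])           # one input instance
p = sigmoid(np.dot(w, x) + b)      # probability of the positive class
label = int(p >= 0.5)              # apply the usual 0.5 threshold
print(f"P(y=1|x) = {p:.3f} -> class {label}")
```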
Disadvantages of Decision Trees:
Overfitting: Decision trees can easily overfit the training data, especially if they
are deep with many nodes.
Instability: Small variations in the data can result in a completely different tree
being generated.
Bias towards Features with More Levels: Features with more levels can
dominate the tree structure.
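Limiting tree depth is one common way to curb the overfitting noted above. A short scikit-learn sketch; the dataset and the depth value are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training data;
# max_depth caps its complexity to reduce overfitting.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("deep    train/test:", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow train/test:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```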
Artificial Neural Network (ANN):
Mechanism: A neural network consists of layers of neurons (input, hidden, output). Each
neuron applies a weighted sum to its inputs, passes the result through an activation
function, and sends it to the next layer (a minimal forward-pass sketch follows the
disadvantages list below).
Types: Feedforward Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural
Networks (RNN).
Strengths: Can capture complex patterns in data; suitable for large and unstructured
datasets (e.g., images, text).
Disadvantages:
1. Uncertain Structure: Optimal network design is found through trial and error.
2. Lack of Transparency: ANN doesn't explain how it arrives at decisions, reducing trust.
3. Hardware Dependency: Requires specialized hardware for parallel processing.
4. Data Conversion: Problems need to be converted to numerical data, impacting
performance.
5. Training Duration: Training time is uncertain, and optimal results aren’t always achieved.
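A minimal NumPy sketch of a single forward pass through one hidden layer; the weights are random and untrained, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])   # input layer: 3 features
W1 = rng.normal(size=(4, 3))     # hidden layer: 4 neurons
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))     # output layer: 1 neuron
b2 = np.zeros(1)

# Each layer: weighted sum -> activation function -> next layer
h = sigmoid(W1 @ x + b1)         # hidden activations
y = sigmoid(W2 @ h + b2)         # output probability
print("output:", y)
```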
Support Vector Machine (SVM):
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put new data points in the correct category
in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed Support Vector Machine. The two
different categories are separated by this decision boundary or hyperplane.
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can
be classified into two classes by a single straight line, it is termed
linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If
a dataset cannot be classified by a straight line, it is termed non-linear
data, and the classifier used is called a Non-linear SVM classifier.
Purpose: SVM is a powerful classifier that finds the hyperplane that best separates the
classes in a high-dimensional space.
Mechanism: It tries to find a hyperplane that maximizes the margin (distance between the
hyperplane and the closest data points, known as support vectors). It can be extended to
non-linear decision boundaries by using kernel functions (e.g., polynomial, radial basis
function).
Strengths: Effective in high-dimensional spaces, robust to overfitting, works well for both
linear and non-linear data.
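A short scikit-learn sketch contrasting a linear SVM with an RBF-kernel SVM; the dataset and parameters are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# make_moons is not linearly separable, so the kernel trick should help
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

print("linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF SVM accuracy:   ", rbf_svm.score(X_test, y_test))
print("support vectors per class (RBF):", rbf_svm.n_support_)
```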
Naïve Bayes:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Each feature individually contributes to identifying it as an apple, without
depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior knowledge. It depends on the conditional
probability.
The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) · P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed evidence B.
P(B|A) is Likelihood probability: Probability of the evidence given that the
hypothesis is true.
P(A) is Prior probability: Probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: Probability of the evidence.
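The theorem itself is plain arithmetic; a sketch with made-up illustrative probabilities:

```python
# Illustrative numbers only
p_a = 0.01           # prior P(A), e.g. P(disease)
p_b_given_a = 0.95   # likelihood P(B|A), e.g. P(positive test | disease)
p_b = 0.06           # marginal P(B), e.g. P(positive test)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"posterior P(A|B) = {p_a_given_b:.3f}")   # ~0.158
```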
Performance Measures:
Confusion Matrix
A Confusion Matrix is used to evaluate the performance of a classification model, comparing
predicted values against actual values. It is particularly useful for understanding the types of
errors made by the model.
Key Terminologies:
True Positive (TP): The model predicted positive and the actual value is positive.
True Negative (TN): The model predicted negative and the actual value is negative.
False Positive (FP): The model predicted positive but the actual value is negative (Type I error).
False Negative (FN): The model predicted negative but the actual value is positive (Type II error).
Importance:
Evaluates performance: It shows how well the classifier predicts both positive and
negative classes.
Helps calculate performance metrics: Accuracy, precision, recall, F1 score, etc.
Key Metrics: Accuracy, Precision, Recall, F1 Score, and Support (each defined in the sections below).
Additional Concepts:
Null Error Rate: Error rate if the model always predicts the majority class.
ROC Curve: Graph showing performance across all classification thresholds, plotting the
true positive rate (Recall) vs. the false positive rate (1 − Specificity).
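A short scikit-learn sketch with made-up labels, showing the matrix and the metrics derived from it:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model predictions (illustrative)

# scikit-learn lays the binary matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
```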
Classification Accuracy:
Definition: Accuracy is the proportion of correctly classified instances (both positive and
negative) to the total instances.
Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
Limitations: Accuracy can be misleading when dealing with imbalanced datasets, where
one class is much more frequent than the other.
Classification Report:
This report provides a summary of the precision, recall, F1 score, and support for each class
in the dataset.
Precision:
Definition: Precision is the proportion of correctly predicted positive instances out of all
instances predicted as positive.
Formula: Precision = TP / (TP + FP)
Significance: Precision is important when the cost of false positives is high (e.g., email
spam detection).
Recall:
Definition: Recall is the proportion of correctly predicted positive instances out of all
actual positive instances.
Formula: Recall = TP / (TP + FN)
Significance: Recall is important when the cost of false negatives is high (e.g., in medical
diagnoses where missing a positive case is critical).
F1 Score:
Definition: F1 score is the harmonic mean of precision and recall. It balances the two
metrics, giving a single score that combines both.
Formula: F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Significance: F1 score is useful when we want a balance between precision and recall,
especially when dealing with imbalanced classes.
Support:
Definition: Support refers to the number of actual occurrences of the class in the
dataset. It helps in interpreting the results, especially when classes have unequal
distribution.
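A minimal sketch that prints such a report with scikit-learn; the labels are illustrative:

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1]

# One row per class: precision, recall, F1 score, and support
print(classification_report(y_true, y_pred, target_names=["class 0", "class 1"]))
```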
Questions
1. Write a note on Classification Accuracy
Classification Accuracy is a metric used to evaluate the performance of a classification
model in machine learning. It measures how often the model makes correct predictions by
comparing the predicted labels with the actual labels in the dataset.
Formula:
Accuracy = Correct Predictions / Total Predictions
Where:
Correct Predictions are the instances where the predicted label matches the actual label.
Total Predictions is the total number of instances in the dataset.
Example:
If a model makes 90 correct predictions out of 100 total predictions, its accuracy is:
Accuracy = 90 / 100 = 0.90, i.e., 90%.
Interpretation:
High Accuracy: Indicates the model is making a high proportion of correct predictions.
Low Accuracy: Suggests the model is performing poorly.
Limitations:
While accuracy is simple and easy to understand, it might not always provide a complete
picture of a model's performance, especially in cases of imbalanced classes (when one class
significantly outnumbers another). In such cases, other metrics like precision, recall, F1-score,
or the confusion matrix may be more informative.
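A small sketch of that pitfall with made-up imbalanced labels: a model that always predicts the majority class scores high accuracy while never finding the rare class:

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5   # 95 negatives, 5 positives (illustrative)
y_pred = [0] * 100            # always predicts the majority class

print("accuracy:", accuracy_score(y_true, y_pred))  # 0.95 -- looks great
print("recall:  ", recall_score(y_true, y_pred))    # 0.0 -- finds no positives
```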
Steps in building a Decision Tree (an entropy/information-gain sketch follows this list):
1. Selecting the Best Attribute: Using a metric like Gini impurity, entropy, or
information gain, the best attribute to split the data is selected.
2. Splitting the Dataset: The dataset is split into subsets based on the selected
attribute.
3. Repeating the Process: The process is repeated recursively for each subset,
creating a new internal node or leaf node until a stopping criterion is met (e.g.,
all instances in a node belong to the same class or a predefined depth is
reached).
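A minimal sketch of the entropy and information-gain computation used in step 1; the labels and the split are illustrative:

```python
import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 4/4 classes: H = 1.0
left   = np.array([0, 0, 0, 1])               # one child after a candidate split
right  = np.array([0, 1, 1, 1])               # the other child

# Information gain = parent entropy - weighted average child entropy
n = len(parent)
gain = entropy(parent) - (len(left) / n * entropy(left)
                          + len(right) / n * entropy(right))
print(f"information gain = {gain:.3f}")       # ~0.189
```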
4. Support Vector Machine (SVM) for Linearly Separable Data
For linearly separable data, SVM aims to find the optimal hyperplane that separates the data
points into different classes with the maximum margin.
Key Concepts
1. Hyperplane
A hyperplane is a decision boundary that separates data points into distinct classes. For
2D data, it's a line; for 3D data, it's a plane.
2. Margin
The margin is the distance between the hyperplane and the closest data points from
each class. SVM maximizes this margin to ensure robust classification.
3. Support Vectors
Support vectors are the data points closest to the hyperplane. These points define the
margin and are crucial for determining the optimal hyperplane.
Steps in SVM for Linearly Separable Data
1. Preprocessing
Scale the features to ensure equal importance for all dimensions.
2. Define the Hyperplane
Use optimization techniques (like quadratic programming) to find the optimal w and b
defining the hyperplane w · x + b = 0.
3. Classify
Use the hyperplane to classify new data points:
Predicted class = sign(w · x + b)
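A minimal NumPy sketch of this classification step; the w and b values are illustrative, not the result of an actual optimization:

```python
import numpy as np

# Illustrative hyperplane parameters (not optimized)
w = np.array([1.0, -1.0])
b = -0.5

def predict(x):
    # Predicted class = sign(w . x + b), mapped to labels {-1, +1}
    return 1 if np.dot(w, x) + b >= 0 else -1

for x in (np.array([2.0, 0.5]), np.array([0.2, 1.5])):
    print(x, "->", predict(x))
```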