Module 5 - Supervised Learning Algorithms

Machine Learning Algorithms
1 - Linear Regression
• Simple linear regression is a statistical method that we can use to find a
relationship between two variables and make predictions.
• It is used to learn to predict a continuous value (dependent variable)
based on the features (independent variable) in the training dataset.
• The value of the dependent variable, which represents the effect, is
influenced by changes in the value of the independent variable.
• A simple linear regression model will produce a line of best fit, or the
regression line.
• Predicting a person's weight based on their height is a straightforward
example of this concept.
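The height-to-weight example above can be sketched as a closed-form least-squares fit. This is a minimal illustration; the data values below are made up, not real measurements:

```python
# Toy data: heights (cm) and weights (kg) -- illustrative values only
heights = [150, 160, 170, 180, 190]
weights = [50, 56, 62, 68, 74]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Least-squares estimates: slope = cov(x, y) / var(x), intercept from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights)) \
        / sum((x - mean_x) ** 2 for x in heights)
intercept = mean_y - slope * mean_x

def predict(height):
    """Line of best fit: weight = intercept + slope * height."""
    return intercept + slope * height

print(round(slope, 2), round(intercept, 2))
print(round(predict(175), 2))
```

For this perfectly linear toy data the regression line recovers the generating rule (slope 0.6, intercept -40) exactly.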
2 - Logistic Regression
• Logistic Regression is a special case of Linear Regression where target
variable (y) is discrete/categorical such as 1 or 0, True or False, Yes or
No, Default or No Default.
• Logistic regression is a statistical method for predicting binary classes
normally.
• Using a logit function, logistic regression makes predictions about the
probability that a binary event will occur.
• Logistic Regression can be used for various classification problems, such
as spam detection.
• Linear regression gives a continuous output, but logistic regression
provides a discrete output.
• Examples of continuous outputs are house prices and stock prices. An
example of a discrete output is predicting whether a patient has cancer
or not.
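The logit idea above can be sketched as follows. The sigmoid squashes a linear score into a probability, which is then thresholded at 0.5; the spam-classifier coefficients below are hypothetical, made up for illustration:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical learned coefficients for a spam classifier (illustrative only)
bias, w_links, w_words = -3.0, 1.5, 0.04

def spam_probability(n_links, n_trigger_words):
    z = bias + w_links * n_links + w_words * n_trigger_words  # linear score
    return sigmoid(z)

p = spam_probability(3, 20)      # score z = -3 + 4.5 + 0.8 = 2.3
label = 1 if p >= 0.5 else 0     # threshold the probability at 0.5
print(round(p, 3), label)        # 0.909 1
```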
2 – Types of Logistic Regression
1. Binary Logistic Regression: The target variable has only two possible
outcomes such as Spam or Not Spam, Cancer or No Cancer.
2. Multinomial Logistic Regression: The target variable has three or more
nominal categories, such as predicting an object's color. Nominal categories
represent categories or groups that have no inherent order or ranking
between them. These categories are mutually exclusive, but there is no
meaningful way to say that one category is "higher" or "lower" than
another.
3. Ordinal Logistic Regression: The target variable has three or more
ordinal categories, such as a restaurant or product rating from 1 to 5.
Ordinal categories represent categories that have a specific order or
ranking but do not have a consistent interval between them. The order
implies that one category is higher or lower than another, but the magnitude
of the difference between them is not defined.
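For the multinomial case, the sigmoid generalizes to the softmax function, which turns one linear score per class into a probability distribution over the nominal categories. A sketch with made-up class scores:

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for three nominal color classes
scores = {"red": 2.0, "green": 1.0, "blue": 0.1}
probs = dict(zip(scores, softmax(list(scores.values()))))
predicted = max(probs, key=probs.get)   # pick the most probable class
print(predicted)                        # red
```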
3 – Decision Tree
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for
solving Classification problems.
• It is a tree-structured classifier, where internal nodes represent the features
of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
• In a Decision tree, there are two types of node: the Decision Node and the
Leaf Node. Decision nodes are used to make decisions and have multiple
branches, whereas Leaf nodes are the outputs of those decisions and do not
contain any further branches.
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like structure.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
3 – Decision Tree
• A decision tree is a non-parametric supervised learning algorithm.
• Decision Tree algorithms continuously separate data in order to categorize
or make predictions depending on the results of the previous set of questions.
• The model analyzes the data and provides responses to the questions in order
to assist us in making more informed choices.
• We could, for instance, utilize a decision tree in which the answers Yes or No
are used to select a certain species of bird based on data elements like the
bird's feathers, its ability to fly or swim, the sort of beak it has, and so on.
3 – Decision Tree
• Decision trees attempt to classify a pattern through a sequence of
questions. For example, attributes such as gender and height can be
used to classify people as short or tall. But the best threshold for height
is gender dependent.
• A decision tree consists of nodes and leaves, with each leaf
denoting a class.
• Classes (tall or short) are the outputs of the tree.
• Attributes (gender and height) are a set of features that describe
the data.
• The input data consists of values of the different attributes. Using these
attribute values, the decision tree generates a class as the output for
each input data.
Decision Tree Example
• Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or Not.
• So, to solve this problem, the decision tree starts with the root node (Salary
attribute by Attribute Selection Measures-ASM).
• The root node splits further into the next decision node (distance from the
office) and one leaf node based on the corresponding labels.
• The next decision node further gets split into one decision node (Cab facility)
and one leaf node.
• Finally, the decision node splits into two leaf nodes (Accepted offer and
Declined offer).
Attribute Selection Measures
• While implementing a Decision tree, the main issue is how to select the
best attribute for the root node and for the sub-nodes.
• To solve this problem, there is a technique called the Attribute Selection
Measure, or ASM.
• With this measure, we can easily select the best attribute for the nodes of
the tree.
• There are two popular techniques for ASM, which are:
• Information Gain
• Gini Index
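The two ASM techniques follow directly from their definitions: Gini impurity is 1 − Σ pₖ², entropy is −Σ pₖ log₂ pₖ, and information gain is the parent's entropy minus the weighted entropy of the children. A minimal sketch:

```python
import math

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions."""
    n = len(labels)
    props = [labels.count(c) / n for c in set(labels)]
    return 1 - sum(p * p for p in props)

def entropy(labels):
    """Entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    props = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in props if p > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average of child entropies."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5                       # perfectly mixed node
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]    # a candidate split
print(round(gini(parent), 3))                           # 0.5 for a 50/50 node
print(round(information_gain(parent, split), 3))        # 0.278
```

The attribute whose split yields the highest information gain (or the lowest weighted Gini) is chosen for the node.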
Pruning: Getting an Optimal Decision Tree
• Pruning is a process of deleting unnecessary nodes from a tree in order to
get the optimal decision tree.
• A too-large tree increases the risk of overfitting, and a small tree may not
capture all the important features of the dataset. Therefore, a technique that
decreases the size of the learning tree without reducing accuracy is known as
Pruning.
• There are mainly two types of tree pruning technology used:
• Cost Complexity Pruning
• Reduced Error Pruning.
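Cost Complexity Pruning can be sketched as scoring each candidate subtree by its error plus a penalty proportional to its size, R_α(T) = R(T) + α·|leaves|, and keeping the subtree with the lowest score. The candidate trees and numbers below are illustrative, not from a real dataset:

```python
# Cost-complexity score: training error plus a size penalty
def cost_complexity(error, n_leaves, alpha):
    return error + alpha * n_leaves

# Hypothetical candidate subtrees produced while pruning (illustrative numbers)
candidates = [
    {"name": "full tree",   "error": 0.05, "leaves": 20},
    {"name": "pruned once", "error": 0.08, "leaves": 8},
    {"name": "stump",       "error": 0.25, "leaves": 2},
]

alpha = 0.01  # larger alpha favors smaller trees
best = min(candidates,
           key=lambda t: cost_complexity(t["error"], t["leaves"], alpha))
print(best["name"])   # pruned once
```

At this α the mid-sized tree wins: the full tree pays too much size penalty, and the stump loses too much accuracy.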
4 – K Nearest Neighbors
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based
on Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most similar
to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from
the training set immediately; instead, it stores the dataset and, at the time
of classification, performs an action on it.
4 – K Nearest Neighbors
• At the training phase, the KNN algorithm just stores the dataset; when it
gets new data, it classifies that data into the category most similar to the
new data.
• Example: Suppose we have an image of a creature that looks similar to both
a cat and a dog, and we want to know whether it is a cat or a dog. For this
identification, we can use the KNN algorithm, as it works on a similarity
measure. The KNN model will find the features of the new image that are most
similar to the cat and dog images and, based on the most similar features,
will put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
• Suppose there are two categories, Category A and Category B, and we have a
new data point x1: in which of these categories will this data point lie? To
solve this type of problem, we need a K-NN algorithm. With the help of K-NN,
we can easily identify the category or class of a particular data point.
Consider the below diagram:
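The cat-versus-dog idea above can be sketched as a minimal K-NN classifier. The 2-D feature vectors below are hypothetical stand-ins for image features:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    by_distance = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g. ear shape, snout length)
train = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.9, 1.1), "cat"),
    ((3.0, 3.2), "dog"), ((3.1, 2.9), "dog"), ((2.8, 3.0), "dog"),
]
print(knn_predict(train, (1.1, 1.0), k=3))   # 'cat' -- all 3 neighbors are cats
```

Note there is no training step at all, which is exactly what "lazy learner" means: all the work happens at prediction time.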
5 – Random Forest
• Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML.
• It is based on the concept of ensemble learning, which is a process of
combining multiple classifiers to solve a complex problem and to improve the
performance of the model.
• Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset.
• Instead of relying on one decision tree, the random forest takes the
prediction from each tree and, based on the majority vote of those
predictions, produces the final output.
• A greater number of trees in the forest generally leads to higher accuracy
and reduces the risk of overfitting.
5 – Random Forest Example

• Suppose there is a dataset that contains multiple fruit images. This
dataset is given to the Random Forest classifier.
• The dataset is divided into subsets
and given to each decision tree.
• During the training phase, each
decision tree produces a prediction
result, and when a new data point
occurs, then based on the majority
of results, the Random Forest
classifier predicts the final decision.
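The bootstrap-and-vote procedure above can be sketched with deliberately tiny "trees" (one-feature threshold stumps) standing in for full decision trees. The fruit data is made up for illustration:

```python
import random
from collections import Counter

random.seed(0)  # make the bootstrap sampling reproducible

def train_stump(data):
    """A deliberately tiny 'tree': threshold at the midpoint of the class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def bootstrap(data):
    """Sample with replacement; fall back to the full data if a class is missing."""
    sample = random.choices(data, k=len(data))
    return sample if len({y for _, y in sample}) == 2 else data

def random_forest(data, n_trees=25):
    """Train each stump on its own bootstrap subset; predict by majority vote."""
    trees = [train_stump(bootstrap(data)) for _ in range(n_trees)]
    def predict(x):
        votes = Counter(tree(x) for tree in trees)
        return votes.most_common(1)[0][0]
    return predict

# Hypothetical 1-D feature (fruit diameter in cm): label 0 = cherry, 1 = orange
data = [(2.0, 0), (2.5, 0), (3.0, 0), (7.0, 1), (7.5, 1), (8.0, 1)]
forest = random_forest(data)
print(forest(2.2), forest(7.8))   # 0 1
```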
6 – Naive Bayes
• Naïve Bayes algorithm is a supervised learning algorithm, which is based on
Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• Naïve Bayes Classifier is one of the simple and most effective Classification
algorithms which helps in building fast machine learning models that can
make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of
the probability that an object belongs to a class.
• Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
Type of Naive Bayes Algorithm
• Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means if predictors take continuous values instead of
discrete, then the model assumes that these values are sampled from
the Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the
data is multinomial distributed. It is primarily used for document
classification problems; it means a particular document belongs to
which category, such as Sports, Politics, education, etc. The classifier
uses the frequency of words for the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial
classifier, but the predictor variables are independent Boolean variables,
such as whether a particular word is present or not in a document. This model
is also popular for document classification tasks.
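A minimal Bernoulli Naïve Bayes sketch for the spam example, using Laplace-smoothed word-presence probabilities; the five-word vocabulary and tiny corpus are made up for illustration:

```python
import math

# Tiny hypothetical corpus: each document is its set of present words
spam = [{"free", "win"}, {"free", "offer"}, {"win", "offer"}]
ham  = [{"meeting", "report"}, {"report", "free"}, {"meeting", "offer"}]
vocab = {"free", "win", "offer", "meeting", "report"}

def log_score(doc, docs):
    """log P(class) + sum of log P(word present/absent | class), Laplace-smoothed."""
    total = len(spam) + len(ham)
    score = math.log(len(docs) / total)          # class prior
    for word in vocab:
        p_word = (sum(word in d for d in docs) + 1) / (len(docs) + 2)
        score += math.log(p_word if word in doc else 1 - p_word)
    return score

doc = {"free", "win"}
label = "spam" if log_score(doc, spam) > log_score(doc, ham) else "ham"
print(label)   # spam
```

The "naïve" assumption is visible in the loop: each word's presence contributes an independent factor to the class score.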
7 – Support Vector Machine
• Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in Machine
Learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future. This
best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the
hyperplane. These extreme cases are called support vectors, and hence the
algorithm is termed a Support Vector Machine.
• Consider the below diagram in which there are two different categories that
are classified using a decision boundary or hyperplane:
Support Vector Machine Example
• SVM can be understood with the example that we used for the KNN classifier.
• Suppose we see a strange cat that also has some features of dogs. If we
want a model that can accurately identify whether it is a cat or a dog, such
a model can be created using the SVM algorithm.
• We will first train our model with lots of images of cats and dogs so that
it can learn their different features, and then we test it with this strange
creature.
• The SVM creates a decision boundary between these two classes (cat and
dog) and chooses the extreme cases (support vectors), so it will see the
extreme cases of cats and dogs.
• On the basis of the support vectors, it will classify it as a cat.
Types of Support Vector Machine
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes using a single straight line, such data is
termed linearly separable, and the classifier used is called a Linear SVM
classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If
a dataset cannot be classified using a straight line, such data is termed
non-linear, and the classifier used is called a Non-linear SVM classifier.
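Once trained, a linear SVM reduces to the decision rule sign(w·x + b); training chooses w and b to maximize the margin. The parameters below are hypothetical, standing in for what training would produce; points whose decision value has magnitude exactly 1 lie on the margin (the support vectors):

```python
# Hypothetical learned hyperplane parameters (illustrative only)
w = (1.0, -1.0)
b = 0.0

def decision(x):
    """Signed distance-like score: w . x + b."""
    return w[0] * x[0] + w[1] * x[1] + b

def classify(x):
    """Class is the sign of the decision value."""
    return +1 if decision(x) >= 0 else -1

support_vector = (1.5, 0.5)   # decision value exactly +1: on the margin
print(classify((3.0, 1.0)), classify((1.0, 4.0)), decision(support_vector))
# 1 -1 1.0
```

A non-linear SVM applies the same rule after mapping x through a kernel, which is what lets the boundary curve.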
Performance Metrics
It is extremely important to use quantitative metrics when evaluating a
machine learning model.
For classification
• Accuracy/Precision/Recall/F1-score, ROC curves,…
For regression
• Normalized RMSE, Normalized Mean Absolute Error (NMAE),…
Confusion Matrix
• A confusion matrix is a table used in classification to evaluate the
performance of a machine learning model, particularly in tasks like binary
classification.
• It helps in understanding the model's performance by providing a summary of
the model's predictions and their actual outcomes.
Confusion Matrix
• In a binary classification scenario, the confusion matrix consists of four
values:

1. True Positives (TP): These are cases where the model predicted the
positive class correctly, and the actual outcome is also positive.
2. True Negatives (TN): These are cases where the model predicted the
negative class correctly, and the actual outcome is also negative.
3. False Positives (FP): These are cases where the model predicted the
positive class, but the actual outcome is negative. Also known as a Type I
error.
4. False Negatives (FN): These are cases where the model predicted the
negative class, but the actual outcome is positive. Also known as a Type II
error.
Confusion Matrix
• The values in the confusion matrix allow you to calculate various metrics to
assess the performance of the classification model, including:

• Accuracy: (TP + TN) / (TP + TN + FP + FN) - the proportion of correct
predictions.
• Precision: TP / (TP + FP) - the accuracy of positive predictions.
• Recall (Sensitivity or True Positive Rate): TP / (TP + FN) - the ability to
find all the positive samples.
• Specificity (True Negative Rate): TN / (TN + FP) - the ability to find all the
negative samples.
• F1 Score: 2 * (Precision * Recall) / (Precision + Recall) - a balanced measure
of precision and recall.
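The formulas above can be computed directly from the four confusion-matrix cells. The label vectors below are a made-up prediction set for illustration:

```python
def confusion_metrics(y_true, y_pred):
    """Count the four confusion-matrix cells and derive the standard metrics."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # Type I errors
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # Type II errors
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical predictions from a binary classifier (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
acc, prec, rec, f1 = confusion_metrics(y_true, y_pred)
print(acc, prec, rec, round(f1, 3))   # 0.7 0.6 0.75 0.667
```

Here TP = 3, TN = 4, FP = 2, FN = 1, so all four metrics can be checked against the formulas by hand.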
Accuracy
Accuracy is a measure of how close a model's predictions are to their true
values.

• If a classifier makes 10 predictions and 9 of them are correct, the
accuracy is 90%.
• Accuracy is a measure of how well a binary classifier correctly identifies
or excludes a condition.
• It's the proportion of correct predictions among the total number of cases
examined.
Precision
Precision is a metric used to evaluate the accuracy of the positive
predictions made by a classification model (how precise we are in detection).
It is defined as TP / (TP + FP), where:

• TP (True Positives) is the number of correctly predicted positive
instances.
• FP (False Positives) is the number of instances that were predicted as
positive but were actually negative.
• In other words, precision measures the proportion of the positive
predictions made by the model that were actually correct.
• It is a crucial metric when you want to minimize false positives, as it
tells you how well the model performs when it predicts a positive outcome.
Recall (Sensitivity or True Positive Rate)
Recall, also known as Sensitivity or True Positive Rate, is a metric used to
evaluate the ability of a classification model to correctly identify all the
relevant or positive instances in a dataset (how good we are at detecting
them). Recall is the ability to find all the positive samples. It is defined
as TP / (TP + FN).

• High recall indicates that the model is good at finding most of the positive
cases, even though it may also produce some false positives.
• Recall is particularly important in situations where missing a true positive is
costly or has serious consequences, such as in medical diagnoses or fraud
detection.
F1 Score
• The F1 Score is a metric that combines both precision and recall into a single
value, providing a balanced measure of a classification model's overall
performance.
• It's especially useful when dealing with imbalanced datasets where the number
of negative and positive samples is significantly different.
• The F1 Score is calculated as the harmonic mean of precision and recall.
• The F1 Score takes both false positives (FP) and false negatives (FN) into
account.
• It reaches its best value at 1 (perfect precision and recall) and its worst value at
0.
• An F1 Score of 1 indicates that the model has perfect precision and recall, while
an F1 Score of 0 suggests that the model has either low precision or low recall.
Quiz No. 01
• Total Marks: 03
• Date: Next Week Lecture
• Experiment yourself with all studied “Supervised Learning Algorithms” using
Python code.
• I will randomly provide a scenario and dataset in class to all students. Students have
to solve that problem on their laptops with the randomly assigned “Supervised Learning
Algorithm” and provide results for all of the “studied” quality assessment metrics
in these slides.
• Use of the Internet, ChatGPT, or any other support in the class will lead to ZERO marks.
