Module 5 - Supervised Learning Algorithms
1 - Linear Regression
• Simple linear regression is a statistical method that we can use to find a
relationship between two variables and make predictions.
• It is used to learn to predict a continuous value (the dependent variable)
based on the features (independent variables) in the training dataset.
• The value of the dependent variable, which represents the effect, is
influenced by changes in the value of the independent variable.
• A simple linear regression model will produce a line of best fit, or the
regression line.
• Predicting a person's weight based on their height is a straightforward
example of this concept.
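Below is a minimal sketch of this idea in Python, assuming scikit-learn is available; the height/weight numbers are invented purely for illustration.

```python
# Simple linear regression sketch: predict weight (kg) from height (cm).
# The data below is made up for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

heights = np.array([[150], [160], [170], [180], [190]])  # feature (independent variable)
weights = np.array([50, 58, 66, 75, 84])                 # target (dependent variable)

model = LinearRegression()
model.fit(heights, weights)  # fits the line of best fit (the regression line)

print("slope:", model.coef_[0])        # weight change per cm of height
print("intercept:", model.intercept_)
print("weight at 175 cm:", model.predict([[175]])[0])
```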
2 - Logistic Regression
• Logistic Regression is an adaptation of Linear Regression for cases where the
target variable (y) is discrete/categorical, such as 1 or 0, True or False, Yes or
No, Default or No Default.
• Logistic regression is a statistical method normally used for predicting binary
classes.
• Using the logit function, logistic regression estimates the probability that a
binary event will occur.
• Logistic Regression can be used for various classification problems, such
as spam detection.
• Linear regression gives a continuous output, but logistic regression
provides a discrete output.
• Examples of continuous outputs are house prices and stock prices.
An example of a discrete output is predicting whether a patient has cancer
or not.
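As a hedged sketch of the probability output described above, here is a toy spam-detection example with scikit-learn; the features and labels are invented.

```python
# Binary logistic regression sketch: toy spam detection.
# Features per email: [number of links, number of exclamation marks] (invented data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [1, 0], [5, 3], [7, 4], [0, 1], [6, 5]])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = spam, 0 = not spam

clf = LogisticRegression()
clf.fit(X, y)

new_email = np.array([[4, 2]])
print("P(spam):", clf.predict_proba(new_email)[0, 1])  # probability from the logit model
print("predicted class:", clf.predict(new_email)[0])   # thresholded discrete output
```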
2 – Types of Logistic Regression
1. Binary Logistic Regression: The target variable has only two possible
outcomes such as Spam or Not Spam, Cancer or No Cancer.
2. Multinomial Logistic Regression: The target variable has three or more
nominal categories, such as predicting the color. Nominal categories
represent categories or groups that have no inherent order or ranking
between them. These categories are mutually exclusive, but there is no
meaningful way to say that one category is "higher" or "lower" than
another.
3. Ordinal Logistic Regression: The target variable has three or more
ordinal categories, such as a restaurant or product rating from 1 to 5.
Ordinal categories represent categories that have a specific order or
ranking but do not have a consistent interval between them. The order
implies that one category is higher or lower than another, but the magnitude
of the difference between them is not defined.
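A small sketch of the multinomial case with scikit-learn follows; the color data is invented. Note that scikit-learn's LogisticRegression handles three or more nominal classes out of the box, but has no built-in ordinal variant.

```python
# Multinomial logistic regression sketch: predicting a color (3 nominal classes).
# Features: [red channel, green channel] of a pixel -- invented data.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[255, 10], [250, 20], [10, 255], [20, 250], [5, 5], [0, 10]])
y = np.array(["red", "red", "green", "green", "black", "black"])

clf = LogisticRegression()
clf.fit(X, y)
print("nominal classes:", clf.classes_)            # unordered categories
print("prediction:", clf.predict([[240, 30]])[0])  # -> most likely "red"
```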
3 – Decision Tree
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for
solving Classification problems.
• It is a tree-structured classifier, where internal nodes represent the features
of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
• In a Decision tree, there are two types of nodes: the Decision Node and the
Leaf Node. Decision nodes are used to make decisions and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like structure.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits into subtrees.
3 – Decision Tree
• A decision tree is a non-parametric supervised learning algorithm.
• Decision Tree algorithms continuously separate data in order to categorize
or make predictions depending on the results of the previous set of questions.
• The model analyzes the data and provides responses to the questions in order
to assist us in making more informed choices.
• We could, for instance, utilize a decision tree in which the answers Yes or No
are used to select a certain species of bird based on data elements like the
bird's feathers, its ability to fly or swim, the sort of beak it has, and so on.
3 – Decision Tree
• Decision trees attempt to classify a pattern through a sequence of
questions. For example, attributes such as gender and height can be
used to classify people as short or tall. But the best threshold for height
is gender dependent.
• A decision tree consists of nodes and leaves, with each leaf
denoting a class.
• Classes (tall or short) are the outputs of the tree.
• Attributes (gender and height) are a set of features that describe
the data.
• The input data consists of values of the different attributes. Using these
attribute values, the decision tree generates a class as the output for
each input data.
Decision Tree Example
• Suppose a candidate has a job offer and wants to decide whether he should
accept it or not.
• To solve this problem, the decision tree starts with the root node (the Salary
attribute, chosen by an Attribute Selection Measure, ASM).
• The root node splits further into the next decision node (distance from the
office) and one leaf node, based on the corresponding labels.
• The next decision node splits further into one decision node (cab facility)
and one leaf node.
• Finally, the decision node splits into two leaf nodes (Accepted offer and
Declined offer).
3 – Decision Tree Example
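A hedged code sketch of this job-offer example with scikit-learn is below; the encoded rows are invented to mirror the story above, not real data.

```python
# Decision tree sketch for the job-offer example (invented data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary is good, office is near, cab facility] (1 = yes, 0 = no).
X = [[0, 0, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = ["Declined", "Declined", "Declined", "Accepted", "Accepted", "Accepted"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X, y)

# Print the learned rules; the root split lands on the most informative
# attribute (salary here), as the ASM discussion below explains.
print(export_text(tree, feature_names=["salary", "distance", "cab"]))
```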
Attribute Selection Measures
• While implementing a decision tree, the main issue that arises is how to select
the best attribute for the root node and for the sub-nodes.
• To solve this problem, there is a technique called the Attribute Selection
Measure, or ASM.
• Using this measure, we can easily select the best attribute for the nodes of
the tree.
• There are two popular techniques for ASM, which are:
• Information Gain
• Gini Index
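Both measures can be computed by hand; here is a hedged sketch with invented label counts, showing Information Gain (entropy-based) and the Gini Index for one candidate split.

```python
# Hand-rolled Attribute Selection Measures on a toy split (invented counts).
import numpy as np

def entropy(labels):
    # Shannon entropy, the quantity behind Information Gain.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini impurity of a set of labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array(["yes"] * 5 + ["no"] * 5)   # labels before the split
left   = np.array(["yes"] * 4 + ["no"] * 1)   # one child after the split
right  = np.array(["yes"] * 1 + ["no"] * 4)   # the other child

# Information Gain = parent entropy - weighted average child entropy.
children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print("information gain:", entropy(parent) - children)   # ~0.278
print("gini before:", gini(parent))                       # 0.5
print("gini after :", (len(left) * gini(left) + len(right) * gini(right)) / len(parent))
```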
Pruning: Getting an Optimal Decision Tree
• Pruning is a process of deleting unnecessary nodes from a tree in order to get
the optimal decision tree.
• A too-large tree increases the risk of overfitting, and a small tree may not
capture all the important features of the dataset. Therefore, a technique that
decreases the size of the learning tree without reducing accuracy is known as
Pruning.
• There are mainly two tree pruning techniques in use:
• Cost Complexity Pruning
• Reduced Error Pruning
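As a sketch of the first technique, scikit-learn exposes cost complexity pruning through the ccp_alpha parameter; the iris dataset here is a stand-in, not from the slides.

```python
# Cost complexity pruning sketch using scikit-learn's ccp_alpha.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Enumerate the candidate pruning strengths (alphas) for this training set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas:
    # Larger alpha prunes more aggressively, yielding a smaller tree.
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```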
4 – K Nearest Neighbors
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based
on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category most similar to the
available categories.
• The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can be easily
classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an action
on it at the time of classification.
4 – K Nearest Neighbors
• At the training phase, the KNN algorithm just stores the dataset; when it gets
new data, it classifies that data into the category most similar to it.
• Example: Suppose we have an image of a creature that looks similar to both a cat
and a dog, and we want to know whether it is a cat or a dog. For this identification,
we can use the KNN algorithm, as it works on a similarity measure. The KNN model
will find the features of the new image most similar to the cat and dog images and,
based on the most similar features, will place it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
• Suppose there are two categories, Category A and Category B, and we have a
new data point x1. In which of these categories will this data point lie? To solve
this type of problem, we need the K-NN algorithm. With the help of K-NN, we can
easily identify the category or class of a particular data point, as the sketch
below illustrates:
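A minimal K-NN sketch of this Category A / Category B situation, assuming scikit-learn; the points and the new sample x1 are invented.

```python
# K-NN sketch: which category does the new point x1 fall into? (Invented data.)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 3],    # Category A cluster
              [6, 5], [7, 7], [8, 6]])   # Category B cluster
y = np.array(["A", "A", "A", "B", "B", "B"])

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3 nearest neighbours
knn.fit(X, y)                              # "lazy": it only stores the data

x1 = np.array([[5, 5]])
print("x1 belongs to category:", knn.predict(x1)[0])  # majority of the 3 neighbours
```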
5 – Random Forest
• Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML.
• It is based on the concept of ensemble learning, which is a process of
combining multiple classifiers to solve a complex problem and to improve the
performance of the model.
• Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset.
• Instead of relying on one decision tree, the random forest takes the
prediction from each tree and, based on the majority vote of those predictions,
produces the final output.
• A greater number of trees in the forest leads to higher accuracy and helps
prevent overfitting.
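A hedged random forest sketch follows, again using the iris dataset as a stand-in; it shows individual trees voting and the forest aggregating them.

```python
# Random forest sketch: an ensemble of decision trees voting by majority.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 trees
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
# Each fitted tree votes individually; the forest returns the majority class.
print("one tree's vote:", forest.estimators_[0].predict(X_test[:1]))
print("forest prediction:", forest.predict(X_test[:1]))
```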
5 – Random Forest Example
Confusion Matrix
1. True Positives (TP): These are cases where the model predicted the
positive class correctly, and the actual outcome is also positive.
2. True Negatives (TN): These are cases where the model predicted the
negative class correctly, and the actual outcome is also negative.
3. False Positives (FP): These are cases where the model predicted the
positive class, but the actual outcome is negative. Also known as a Type I
error.
4. False Negatives (FN): These are cases where the model predicted the
negative class, but the actual outcome is positive. Also known as a Type II
error.
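These four cells can be pulled out of scikit-learn's confusion matrix as in the sketch below; the true/predicted labels are invented.

```python
# Confusion matrix sketch: recover TN, FP, FN, TP from toy labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = positive class
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() flattens the 2x2 matrix as (TN, FP, FN, TP).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```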
Confusion Matrix
• The values in the confusion matrix allow you to calculate various metrics to
assess the performance of the classification model, including:
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• High recall indicates that the model is good at finding most of the positive
cases, even though it may also produce some false positives.
• Recall is particularly important in situations where missing a true positive is
costly or has serious consequences, such as in medical diagnoses or fraud
detection.
F1 Score
• The F1 Score is a metric that combines both precision and recall into a single
value, providing a balanced measure of a classification model's overall
performance.
• It's especially useful when dealing with imbalanced datasets where the number
of negative and positive samples is significantly different.
• The F1 Score is calculated as the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall).
• The F1 Score takes both false positives (FP) and false negatives (FN) into
account.
• It reaches its best value at 1 (perfect precision and recall) and its worst value at
0.
• An F1 Score of 1 indicates that the model has perfect precision and recall, while
an F1 Score of 0 suggests that the model has either low precision or low recall.
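As a sketch, all three metrics can be computed with scikit-learn on the same invented labels used in the confusion matrix example above.

```python
# Precision, recall, and F1 sketch on toy labels (invented data).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)
f = f1_score(y_true, y_pred)         # harmonic mean: 2*p*r / (p + r)
print(f"precision={p:.3f}  recall={r:.3f}  F1={f:.3f}")  # all 0.750 here
```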
Quiz No. 01
• Total Marks: 03
• Date: Next Week Lecture
• Experiment on your own with all of the studied “Supervised Learning Algorithms”
using Python code.
• I will randomly assign a scenario and dataset to each student in class. Students
must solve that problem on their laptops with the randomly assigned “Supervised
Learning Algorithm” and report results using all of the “studied” quality
assessment metrics in these slides.
• Use of the Internet, ChatGPT, or any other support during the class will result in ZERO marks.