Unit 4 Supervised Learning

Supervised learning involves learning the relationship between input features and output targets using labeled datasets to train algorithms that classify data or predict outcomes accurately. K-nearest neighbors (KNN) is a simple supervised machine learning algorithm that stores all available data and classifies new data based on similarity, by finding the K closest neighbors and predicting the class based on majority voting.

Uploaded by

Soumya Mishra

What is supervised learning?

• Supervised learning, also known as supervised machine learning, is a
subcategory of machine learning and AI. It is defined by its use of
labeled datasets to train algorithms to classify data or predict
outcomes accurately.
• As input data is fed into the model, the model adjusts its weights
until it fits the data appropriately; cross-validation is used to check
that the fit generalizes.
• Supervised learning helps organizations solve a variety of real-
world problems at scale, such as classifying spam into a separate folder.
Suppose we have a dataset of different types of shapes, which
includes squares, rectangles, triangles, and polygons. We need to
train and test the model on each shape.
Steps Involved in Supervised Learning:
• First, determine the type of training dataset.
• Collect the labelled training data.
• Split the dataset into a training set, a validation set, and a test set.
• Determine the input features of the training dataset, which should carry enough information
for the model to predict the output accurately.
• Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.
• Run the algorithm on the training set. A validation set, held out from the training data, is
sometimes needed to tune the control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct
outputs, the model is accurate.
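The splitting and evaluation steps above can be sketched in plain Python. This is only an illustration: the function names `split_dataset` and `accuracy`, the 60/20/20 proportions, the seed, and the toy data are assumptions made for the example, not something the slides prescribe.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle labelled samples and split them into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left becomes the test set
    return train, val, test

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical labelled data: (feature, label) pairs.
data = [(i, "even" if i % 2 == 0 else "odd") for i in range(10)]
train, val, test = split_dataset(data)
```

The model is fitted on `train`, tuned on `val`, and scored once on `test` with `accuracy`, so the test score reflects data the model has not seen.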
Advantages of Supervised learning:
• With the help of supervised learning, the
model can predict the output on the basis of
prior experiences.
• In supervised learning, we can have an
exact idea about the classes of objects.
• No human intervention needed
(automation).
• Supervised learning model helps us to
Disadvantages of supervised learning:
•Supervised learning models are not suitable for handling very
complex tasks.
•Supervised learning cannot predict the correct output if the test
data differs from the training data.
•Training requires a lot of computation time.
•In supervised learning, we need enough knowledge about the
classes of objects.
Classification
• Classification algorithms are used when the output variable is
categorical, which means there are different classes such as Yes-No,
Male-Female, True-false, different fruits, colors etc. Classification
is a predictive model that approximates a mapping function from
input variables to identify discrete output variables, which can be
labels or categories.
• The mapping function of classification algorithms is responsible for
predicting the label or category of the given input variables. A
classification algorithm can have both discrete and real-valued
Scatter Plot of Multi-Class Classification Dataset

Unlike regression, the output variable of classification is a category, not a value, such as
"Green or Blue" or "fruit or animal". Because classification is a supervised learning
technique, it takes labeled input data, that is, inputs paired with their
corresponding outputs.
•Binary Classifier: If the classification problem has only two possible
outcomes, it is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT
or DOG, etc.

•Multi-class Classifier: If a classification problem has more than two
outcomes, it is called a multi-class classifier.
Examples: Classifications of types of crops, Classification of types of


The different types of classification algorithms include:

•Linear Models
• Logistic Regression
• Support Vector Machines
•Non-linear Models
• K-Nearest Neighbors
• Decision Tree Classification
Logistic Regression:

• Logistic regression is a supervised machine learning
algorithm mainly used for classification tasks, where the
goal is to predict the probability that an instance
belongs to a given class. It is a statistical
algorithm that analyzes the relationship between a set of
independent variables and a binary dependent variable.
It is a powerful tool for decision-making, for example
deciding whether an email is spam or not.
• It’s referred to as regression because it takes the linear
regression function as input and uses a sigmoid function to
SUPERVISED LEARNING
• Supervised machine learning involves learning the relationship
between input features and output targets.
• The inputs are known as features or ‘X variables’ and output is
generally referred to as the target or ‘y variable’.
• It is defined by its use of labeled
datasets to train algorithms that classify
data or predict outcomes accurately.
• As input data is fed into the model, it
adjusts its weights until the model has
been fitted appropriately; cross-validation
is used to check that the fit generalizes.
SUPERVISED LEARNING: Applications
• Prediction
• Classification
• Risk Assessment
• Fraud Detection, Spam detection
• Recommendation Systems
• Natural Language Processing
UNSUPERVISED AND SUPERVISED LEARNING
Types of Machine Learning Algorithms

Machine Learning
• Supervised Learning: Classification, Regression
• Unsupervised Learning: Clustering, Association


Supervised Learning
1.Classification: In classification tasks, the machine learning program
must draw a conclusion from observed values and determine to
what category new observations belong. For example, when filtering
emails as ‘spam’ or ‘not spam’, the program must look at existing
observational data and filter the emails accordingly.
2.Regression: In regression tasks, the machine learning program must
estimate – and understand – the relationships among variables.
Regression analysis focuses on one dependent variable and a series of
other changing variables – making it particularly useful for prediction
and forecasting.
Classification Examples
Regression Examples
[Figure: Student's Profile → Regression → Student's Percentage]
Regression vs Classification

Regression: predicting values. Classification: predicting classes.


Classification

K-Nearest Neighbor(KNN)

Predict movie genre

Predict genre of “Barbie” movie with IMDB rating 7.4 and duration 114 minutes.
What is KNN here?
OR
How to find KNN?
Step 1
What is Euclidean distance?

It is the straight-line distance between two points.


Step 2

Select the K nearest neighbors.

Take K = 1 here

Take K = 3 here

K = 5 is a commonly used starting value.

Step 3

Majority voting (classification)


K-Nearest Neighbor(KNN) Algorithm

•K-Nearest Neighbor is one of the simplest machine learning algorithms based on the supervised
learning technique.
•The K-NN algorithm assumes similarity between the new case and the available cases, and
puts the new case into the category that is most similar to the available categories.
•The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. When new data appears, it can be easily classified into a well-suited
category using the K-NN algorithm.
•K-NN can be used for regression as well as classification, but it is mostly used
for classification problems.
•K-NN is a non-parametric algorithm, which means it makes no assumptions about the
distribution of the underlying data.
K-Nearest Neighbor(KNN) Algorithm

•It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and does its computation at the time of
classification.
•In the training phase, the KNN algorithm just stores the dataset; when it gets new data, it
classifies that data into the category that is most similar to the new data.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1.
Which of these categories does this data point belong to? To solve this type of problem, we need a
K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point. Consider the below diagram:
How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

Step-1: Select the number K of neighbors.

Step-2: Calculate the Euclidean distance from the new data point to every data point in the training set.

Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.

Step-4: Among these K neighbors, count the number of data points in each category.

Step-5: Assign the new data point to the category with the largest number of neighbors.

Step-6: Our model is ready.
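The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `euclidean` and `knn_predict` and the six labelled points are hypothetical, chosen only to show distance calculation, neighbor selection, and majority voting.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, query, k=3):
    """train is a list of (point, label); return the majority label of the k nearest points."""
    # Steps 2-3: sort the stored points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Steps 4-5: count labels among the neighbors and pick the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training points with category labels.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
         ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B")]
prediction = knn_predict(train, (2.0, 2.0), k=3)
```

With two categories, an odd K avoids tied votes; in this sketch a tie would simply be resolved in favor of the label counted first.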


• First, we choose the number of neighbors; here we choose K = 5.

• Next, we calculate the Euclidean distance between the data points. The Euclidean distance
is the distance between two points, which we have already studied in geometry. For two points
$(p_1, p_2)$ and $(q_1, q_2)$ it can be calculated as:
$d = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}$

• By calculating the Euclidean distance we get the nearest neighbors: three nearest neighbors in
category A and two nearest neighbors in category B. Consider the below image:
How to select the value of K in the K-NN Algorithm?

• There is no particular way to determine the best value for "K", so we need to try
some values and keep the one that performs best. A commonly used value for K is 5.

• A very low value for K, such as K=1 or K=2, can be noisy and make the model
sensitive to outliers.

• Larger values of K smooth out noise, but a K that is too large can pull in points
from other categories and blur the boundary between classes.
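Trying several values of K can be sketched as below. This is one possible approach, not the slides' method: it scores each K with leave-one-out validation (each point is predicted from all the others). The names `knn_predict` and `leave_one_out_accuracy` and the toy data, which deliberately includes one outlier labelled B inside the A cluster, are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Predict every point from the remaining points and report the hit rate."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold out the i-th point
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

# Hypothetical data: two clusters plus one outlier labelled "B" inside cluster A.
data = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
        ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"), ((7, 7), "B"),
        ((1.5, 1.5), "B")]
scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

On this toy data K = 1 scores lower than K = 3, because with a single neighbor the outlier decides the vote for every nearby point.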
Advantages of KNN Algorithm:

•It is simple to implement.

•It is robust to the noisy training data

•It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

•The value of K always needs to be determined, which can sometimes be difficult.

•The computation cost is high because the distance from the new data point to
every training sample must be calculated.


Logistic regression


A data set with one or more independent variables is used to determine the binary output of the
dependent variable.
Example 2: Plot of the time a person spends operating a website against whether he/she clicked on the website.

Note:
• A straight regression line between the dependent and
independent variables does not give a usable prediction for a binary outcome.
• So, for classification such as Yes/No or True/False, we use
logistic regression, which is based on probability.
• The sigmoid function is used to convert independent variable values into a probability.
• All the probability values lie between 0 and 1.
• For binary classification, the data
points can be classified as Class A and
Class B.
Instead of a straight line, the probability changes gradually with the input, which results in the
sigmoid curve.
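The sigmoid mapping described in the note can be written out as a short Python sketch. The helper names and the coefficients used in the example (an intercept of -4.0 and a slope of 0.1 per minute on the website) are hypothetical values chosen for illustration, not fitted from any data in the slides.

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(x, intercept, slope):
    """Pass the linear function intercept + slope * x through the sigmoid."""
    return sigmoid(intercept + slope * x)

# Hypothetical coefficients: probability of a click after 10, 40 and 70 minutes.
probabilities = [predict_probability(t, -4.0, 0.1) for t in (10, 40, 70)]
```

The linear part can take any value, but the output always stays between 0 and 1 and rises gradually, which is the S-shaped curve the note refers to.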
What is logistic regression?

• This type of statistical model (also known as the logit model) is often used for classification. Logistic regression
estimates the probability of an event occurring, such as voted or didn’t vote, based on a given dataset of
independent variables.

• Since the outcome is a probability, the dependent variable is bounded between 0 and 1. In logistic regression, a
logit transformation is applied on the odds, that is, the probability of success divided by the probability of
failure.

• For binary classification, a probability less than .5 will predict 0 while a probability greater than .5 will predict
1. After the model has been computed, it’s best practice to evaluate how well the model predicts the
dependent variable, which is called goodness of fit.
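The odds, the logit transformation, and the .5 decision rule can be illustrated with a small Python sketch. The function names are hypothetical, and treating a probability of exactly .5 as class 0 is an assumption of this example; the slides do not say how that boundary case is handled.

```python
import math

def odds(p):
    """Probability of success divided by probability of failure."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the odds; maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

def classify(p, threshold=0.5):
    """Predict 1 when the probability exceeds the threshold, otherwise 0."""
    return 1 if p > threshold else 0

# A probability of 0.8 corresponds to odds of about 4 to 1.
example_odds = odds(0.8)
example_label = classify(0.8)
```

The logit is the inverse of the sigmoid, which is why a model that is linear in the logit produces probabilities between 0 and 1.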


Regression
• Regression is a statistical technique used to analyze the relationship
between two or more variables.
• The purpose of regression analysis is to identify the nature of the
relationship between the independent variable(s) and the dependent
variable, and to use this relationship to make predictions about the
dependent variable.
• In regression analysis, the dependent variable is the variable that is being
predicted, while the independent variable(s) are the variables that are
being used to make the prediction.
• The relationship between the independent and dependent variables can
be linear or non-linear, and the goal of regression analysis is to identify
the nature of this relationship.
Regression: Example
Suppose there is a marketing company A, which runs various advertisements every year and earns sales from them.
The list below shows the advertising spend by the company in the last 5 years and the corresponding sales:

Now, the company wants to spend $200 on advertising in the
current year and wants to predict the sales for this
year. To solve this type of prediction problem in machine learning,
we need regression analysis.
Types of Regression: Linear Regression
• The linear regression algorithm shows a
linear relationship between a
dependent (y) variable and one or more
independent (x) variables, hence it is called
linear regression.
Linear Regression

• Linear regression is one of the easiest and most popular Machine
Learning algorithms. It is a statistical method that is used for predictive
analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.
Underfitting & Overfitting
• Underfitting is a scenario in data science where a data model is
unable to capture the relationship between the input and output
variables accurately, generating a high error rate on both the training
set and unseen data.
• It occurs when a model is too simple, which can be a result of a model
needing more training time, more input features, or less
regularization.
• Overfitting is the opposite of underfitting, occurring when the model
has been overtrained or when it contains too much complexity,
resulting in a low error on the training set but high error rates on test data.
Underfitting vs. Overfitting
Underfitting & Overfitting
Regression: Examples
• Given data about the size of houses on the real estate market, try to
predict their price.
• Given a picture of a person, predict his/her age on
the basis of the picture.
Regression
• In regression, we plot a line or curve between the variables that best fits
the given datapoints; using this plot, the machine learning model can
make predictions about the data.
• In simple words, "Regression shows a line or curve that passes
through the datapoints on the target-predictor graph in such a way
that the vertical distance between the datapoints and the regression
line is minimum."
• The distance between the datapoints and the line tells whether a model has
captured a strong relationship or not.
[Figure: Regression. Axes: Independent Variable (x), Dependent Variable (y). Panels: Linear regression, Non-linear regression]

Linear Regression

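HYPOTHESIS: $h_\theta(x) = \theta_1 + \theta_2 x$

PARAMETERS: $\theta_1, \theta_2$

COST FUNCTION: $J(\theta_1, \theta_2) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

GOAL: minimize $J(\theta_1, \theta_2)$ w.r.t. $\theta_1, \theta_2$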
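The hypothesis and cost function on this slide can be evaluated directly in Python. The sketch below assumes the cost is the mean of the squared errors over the m training examples (a 1/m factor), which matches the mean squared error mentioned on the gradient descent slides; the function names and the three data points are hypothetical.

```python
def hypothesis(theta1, theta2, x):
    """Straight-line prediction: intercept theta1 plus slope theta2 times x."""
    return theta1 + theta2 * x

def cost(theta1, theta2, xs, ys):
    """Mean of squared differences between predictions and targets."""
    m = len(xs)  # number of training examples
    return sum((hypothesis(theta1, theta2, x) - y) ** 2 for x, y in zip(xs, ys)) / m

# Hypothetical data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
perfect = cost(1.0, 2.0, xs, ys)   # parameters that reproduce the data exactly
worse = cost(0.0, 2.0, xs, ys)     # intercept off by one unit
```

Parameters that fit the data exactly give zero cost, and any other choice gives a larger value, which is why the goal is stated as a minimization.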
STEP-BY-STEP PROCEDURE
1.Collect data: Collect data on the dependent variable and one or more
independent variables. Ensure that the data is in a format that can be used
for statistical analysis.
2.Plot the data: Plot the data on a scatter plot with the dependent variable on
the y-axis and the independent variable on the x-axis. This will give you a
visual representation of the relationship between the variables.
3.Calculate the correlation coefficient: Calculate the correlation coefficient
between the dependent variable and each independent variable. This will
tell you the strength and direction of the relationship between the variables.
4.Fit the regression line: Use a regression equation to fit a line to the data.
The equation of the line is typically in the form of y = mx + b, where y is
the dependent variable, x is the independent variable, m is the slope of the
line, and b is the y-intercept.
5.Evaluate the model: Evaluate the model by examining the residual
plot, which shows the difference between the actual and predicted
values. A good model will have residuals that are randomly scattered
around zero, with no visible pattern.
6.Test the model: Test the model by using it to predict the value of the
dependent variable for new values of the independent variable. This is
known as making predictions or inference.
7.Interpret the results: Interpret the results by analyzing the coefficients
of the model. The slope of the line tells you the strength and direction of
the relationship between the variables, while the intercept tells you the
value of the dependent variable when the independent variable is zero.
8.Refine the model: Refine the model by adding more independent
variables or using a more complex regression equation if necessary.
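Steps 3 to 7 of the procedure can be sketched for a single independent variable as follows. This is a minimal illustration that assumes an ordinary least-squares fit of y = mx + b; the function names and the five data points are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept (step 4)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def correlation(xs, ys):
    """Pearson correlation coefficient between x and y (step 3)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(xs, ys, slope, intercept):
    """Actual minus predicted values, used for the residual plot (step 5)."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
r = correlation(xs, ys)
errors = residuals(xs, ys, slope, intercept)
new_prediction = slope * 6.0 + intercept   # step 6: predict for a new x
```

For a least-squares line the residuals sum to zero, so the residual plot is judged by whether the points show a pattern, not by their average.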
Linear Regression
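HYPOTHESIS: $h_\theta(x) = a_1 + a_2 x$

PARAMETERS: $a_1, a_2$

COST FUNCTION: $J(a_1, a_2) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

GOAL: minimize $J(a_1, a_2)$ w.r.t. $a_1, a_2$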
What if there are multiple features?
GRADIENT DESCENT
• A linear regression model can be trained
using the gradient descent optimization
algorithm, which iteratively modifies the
model’s parameters to reduce the mean
squared error (MSE) of the model on a
training dataset.
• The model uses gradient descent to update
the θ1 and θ2 values so as to reduce the
cost function (minimizing the MSE) and
achieve the best-fit line.
• The idea is to start with random θ1 and θ2
values and then iteratively update the
values until the cost reaches a minimum.
GRADIENT DESCENT
PROCEDURE
i) Assume initial values of θ1, θ2 and a learning rate
ii) Calculate h, (h-y) and (h-y)*x for each training example
iii) Determine the gradient of the cost with respect to θ1 and θ2
iv) Update θ1 and θ2
v) Repeat the procedure until the convergence criterion is met (the error is
negligible)
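The procedure above can be sketched in Python for the hypothesis h = θ1 + θ2·x. Several choices here are assumptions made for illustration: the cost is the mean squared error, the parameters start at zero instead of random values so the run is repeatable, the learning rate is 0.05, and convergence is declared when both gradient components are nearly zero. The data points are hypothetical.

```python
def gradient_descent(xs, ys, learning_rate=0.05, max_iterations=10000, tolerance=1e-10):
    """Fit h = theta1 + theta2 * x by gradient descent on the mean squared error."""
    theta1, theta2 = 0.0, 0.0            # step i: initial values (zeros, by assumption)
    m = len(xs)
    for _ in range(max_iterations):
        # step ii: prediction errors (h - y) for every training example
        errors = [theta1 + theta2 * x - y for x, y in zip(xs, ys)]
        # step iii: partial derivatives of the MSE with respect to theta1 and theta2
        grad1 = 2.0 / m * sum(errors)
        grad2 = 2.0 / m * sum(e * x for e, x in zip(errors, xs))
        # step v: stop once the gradient is negligible
        if max(abs(grad1), abs(grad2)) < tolerance:
            break
        # step iv: move both parameters against the gradient
        theta1 -= learning_rate * grad1
        theta2 -= learning_rate * grad2
    return theta1, theta2

# Hypothetical data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta1, theta2 = gradient_descent(xs, ys)
```

The learning rate matters: a value that is too large makes the updates overshoot and diverge, while a very small one needs many more iterations to reach the same parameters.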
PRACTICE PROBLEM
For a component subjected to fluctuating load, its life was determined
experimentally for different loading conditions as mentioned in the
table. Determine regression model and predict the life of component
when maximum stress is 175 MPa.

Experiment   Maximum stress (MPa)   Life (million revolutions)
1            210                    203
2            200                    236
3            185                    249.5
4            163                    265.5
5            150                    319
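One way to work this practice problem is sketched below. The problem statement does not say which fitting method to use, so this example assumes a simple linear model, life = intercept + slope × stress, fitted by ordinary least squares; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Data from the table: maximum stress (MPa) and life (million revolutions).
stress = [210.0, 200.0, 185.0, 163.0, 150.0]
life = [203.0, 236.0, 249.5, 265.5, 319.0]

slope, intercept = fit_line(stress, life)
predicted_life = intercept + slope * 175.0   # life at a maximum stress of 175 MPa
print(f"life = {intercept:.2f} + ({slope:.4f}) * stress")
print(f"predicted life at 175 MPa: {predicted_life:.1f} million revolutions")
```

Under these assumptions the slope comes out at about -1.62 million revolutions per MPa, and the predicted life at 175 MPa is roughly 265 million revolutions.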
Multiple regression model
•Assumptions: Multiple regression assumes that there is a linear relationship
between the independent variables and the dependent variable. It also assumes
no or low multicollinearity (high correlation between independent variables),
homoscedasticity (equal variance of residuals), and that residuals follow a
normal distribution.
•Applications: Multiple regression is widely used in various fields such as
economics, social sciences, marketing, and science to model and predict real-
world phenomena. It's helpful for making predictions, understanding
relationships between variables, and identifying significant factors.
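A multiple regression model extends the straight-line hypothesis to several features. The sketch below assumes the form h = θ0 + θ1·x1 + θ2·x2 + ..., fitted by gradient descent on the mean squared error; the function names, learning rate, iteration count, and the two-feature data set are all hypothetical.

```python
def predict(theta, features):
    """theta[0] is the intercept; theta[1:] are the weights of the features."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))

def fit(rows, targets, learning_rate=0.05, iterations=5000):
    """Gradient descent on the mean squared error for any number of features."""
    n_features = len(rows[0])
    theta = [0.0] * (n_features + 1)
    m = len(rows)
    for _ in range(iterations):
        errors = [predict(theta, r) - y for r, y in zip(rows, targets)]
        # Gradient for the intercept, then one gradient per feature weight.
        grads = [2.0 / m * sum(errors)]
        for j in range(n_features):
            grads.append(2.0 / m * sum(e * r[j] for e, r in zip(errors, rows)))
        theta = [t - learning_rate * g for t, g in zip(theta, grads)]
    return theta

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
theta = fit(rows, targets)
```

If two features were strongly correlated, many different weight combinations would fit almost equally well, which is one reason the low-multicollinearity assumption matters.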
