Unit 4 Supervised Learning
Unit 4 Supervised Learning
Unlike regression, the output variable of Classification is a category, not a value, such as
"Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised
learning technique, hence it takes labeled input data, which means it contains input with
the corresponding output.
•Binary Classifier: If the classification, problem has only two possible
or DOG, etc.
•Linear Models
• Logistic Regression
• Support Vector Machines
•Non-linear Models
• K-Nearest Neighbors
• Decision Tree Classification
Logistic Regression:
Machine Learning
K-Nearest Neighbor(KNN)
Predict genre of “Barbie” movie with IMDB rating 7.4 and duration 114 minutes.
What is KNN here?
OR
How to find KNN?
Step 1
What is Euclidian distance?
Best is to take K = 5
Step 3
•K-Nearest Neighbor is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
•K-NN algorithm assumes the similarity between the new case/data and available cases and
put the new case into the category that is most similar to the available categories.
•K-NN algorithm stores all the available data and classifies a new data point based on the
similarity. This means when new data appears then it can be easily classified into a well suite
category by using K- NN algorithm.
•K-NN algorithm can be used for Regression as well as for Classification but mostly it is used
for the Classification problems.
•K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.
K-Nearest Neighbor(KNN) Algorithm
•It is also called a lazy learner algorithm because it does not learn from the training set
immediately instead it stores the dataset and at the time of classification, it performs an action
on the dataset.
•KNN algorithm at the training phase just stores the dataset and when it gets new data, then it
classifies that data into a category that is much similar to the new data.
Why do we need a K-NN Algorithm?
Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1,
so this data point will lie in which of these categories. To solve this type of problem, we need a K-
NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
dataset. Consider the below diagram:
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the
neighbor is maximum.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean distance
is the distance between two points, which we have already studied in geometry. It can be
calculated as:
• By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in
category A and two nearest neighbors in category B. Consider the below image:
How to select the value of K in the K-NN Algorithm?
• There is no particular way to determine the best value for "K", so we need to try
some values to find the best out of them. The most preferred value for K is 5.
• A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of
• Large values for K are good, but it may find some difficulties.
Advantages of KNN Algorithm:
•Always needs to determine the value of K which may be complex some time.
•The computation cost is high because of calculating the distance between the data
Logistic regression
Logistic regression
A data set with one or more independent variables is used to determine binary output of the dependent
variable.
Example 2: Plot between time of person operating a website and if he/she clicked on website.
Note:
• Plotting a regression line between dependent and
independent variable is not giving any prediction.
• So, in the classification, like Yes/No, True/False we use
logistic regression that is based on probability.
• Sigmoid function is used to convert independent variable values into an expression of probability.
• All the probability values will lie between 0 and 1.
• For the binary classification, the data
points can be classified as Class A and
Class B.
Instead of the straight line, we have gradual relationship in probability case, which results into the
sigmoid curve.
What is logistic regression?
• This type of statistical model (also known as logit model) is often used for classification. Logistic regression
estimates the probability of an event occurring, such as voted or didn’t vote, based on a given dataset of
independent variables.
• Since the outcome is a probability, the dependent variable is bounded between 0 and 1. In logistic regression, a
logit transformation is applied on the odds—that is, the probability of success divided by the probability of
failure.
• For binary classification, a probability less than .5 will predict 0 while a probability greater than 0 will predict
1. After the model has been computed, it’s best practice to evaluate the how well the model predicts the
Independent Variable
HYPOTHESIS h 𝜃 ( 𝑥 ) =𝜃 1+ 𝜃 2 𝑥
PARAMETERS 𝜃1 , 𝜃2
𝑚
1 (𝑖) 2
COST FUNCTION 𝐽 ( 𝜃1 , 𝜃 2 )= ∑ ( h𝜃 ( 𝑥 ) − 𝑦 )
(𝑖)
𝑚 𝑖=1
GOAL w.r.t.
STEP-BY-STEP PROCEDURE
1.Collect data: Collect data on the dependent variable and one or more
independent variables. Ensure that the data is in a format that can be used
for statistical analysis.
2.Plot the data: Plot the data on a scatter plot with the dependent variable on
the y-axis and the independent variable on the x-axis. This will give you a
visual representation of the relationship between the variables.
3.Calculate the correlation coefficient: Calculate the correlation coefficient
between the dependent variable and each independent variable. This will
tell you the strength and direction of the relationship between the variables.
4.Fit the regression line: Use a regression equation to fit a line to the data.
The equation of the line is typically in the form of y = mx + b, where y is
the dependent variable, x is the independent variable, m is the slope of the
line, and b is the y-intercept.
STEP-BY-STEP PROCEDURE
5.Evaluate the model: Evaluate the model by examining the residual
plot, which shows the difference between the actual and predicted
values. A good model will have residuals that are randomly distributed
around the regression line.
6.Test the model: Test the model by using it to predict the value of the
dependent variable for new values of the independent variable. This is
known as making predictions or inference.
7.Interpret the results: Interpret the results by analyzing the coefficients
of the model. The slope of the line tells you the strength and direction of
the relationship between the variables, while the intercept tells you the
value of the dependent variable when the independent variable is zero.
8.Refine the model: Refine the model by adding more independent
variables or using a more complex regression equation if necessary.
Linear Regression
HYPOTHESIS h 𝜃 ( 𝑥 ) =𝑎 1+ 𝑎 2 𝑥
PARAMETERS 𝑎 1 ,𝑎 2
J
COST FUNCTION
GOAL w.r.t.
What if there are multiple features?
GRADIENT DESCENT
• A linear regression model can be trained
using the optimization algorithm gradient
descent by iteratively modifying the
model’s parameters to reduce the mean
squared error (MSE) of the model on a
training dataset.
• To update θ1 and θ2 values in order to
reduce the Cost function (minimizing
RMSE value) and achieve the best-fit line
the model uses Gradient Descent.
• The idea is to start with random θ1 and θ2
values and then iteratively update the
values, reaching minimum cost.
GRADIENT DESCENT
PROCEDURE
i) Assume initial values of θ1, θ2 and learning rate
ii) Calculate h, (h-y) and (h-y)*x
iii) Determine gradient
iv) Update θ1and θ2
v) Repeat the procedure until convergence criteria is met (error is
negligible)
PRACTICE PROBLEM
For a component subjected to fluctuating load, its life was determined
experimentally for different loading conditions as mentioned in the
table. Determine regression model and predict the life of component
when maximum stress is 175 MPa.