Unit-2

Supervised Learning
Contents
Types of Supervised Learning
Supervised Machine Learning Algorithms
k-Nearest Neighbors
Regression Models
Naive Bayes Classifiers
Decision Trees
Ensembles of Decision Trees
Kernelized Support Vector Machines
Uncertainty Estimates from Classifiers
Supervised Machine Learning

 Supervised learning is a type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output.
 Labelled data means the input data is already tagged with the correct output.
 In supervised learning, the training data provided to the machine works as the supervisor that teaches the machine to predict the output correctly.
 It applies the same concept as a student learning under the supervision of a teacher.
 Supervised learning is the process of providing input data as well as correct output data to the machine learning model.
 The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
How Supervised Learning Works?

 In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data.
 Once the training process is completed, the model is tested using test data (a held-out subset of the dataset that was not used for training), and then it predicts the output.
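A minimal illustrative sketch of this train/test workflow with scikit-learn; the tiny dataset below is made up purely for illustration and is not from the slides:

# Illustrative sketch of the labelled-data train/test workflow (toy data).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = [[1], [2], [3], [4], [5], [6], [7], [8]]   # input variable (x)
y = [0, 0, 0, 0, 1, 1, 1, 1]                   # labelled output variable (y)

# Hold out part of the data for testing instead of reusing the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                    # train on labelled data
print(model.score(X_test, y_test))             # evaluate on unseen test data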
KNN Algorithm:
K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
The KNN algorithm simply stores the dataset during the training phase; when it gets new data, it classifies that data into the category most similar to the new data.
Why do we need a K-NN Algorithm?
How does K-NN work?
The K-NN working can be explained on the basis of
the below algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category with the maximum number of neighbors.
Step-6: Our model is ready.
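A minimal from-scratch sketch of Steps 1-5 in plain Python; the sample points and the choice K=3 are made up for illustration:

# Illustrative, from-scratch version of the steps above (toy data, K chosen arbitrarily).
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step-2: Euclidean distance from the new point to every training point
    distances = [math.dist(p, new_point) for p in train_points]
    # Step-3: take the K nearest neighbors
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Step-4: count the data points in each category among those neighbors
    votes = Counter(train_labels[i] for i in nearest)
    # Step-5: assign the category with the maximum number of neighbors
    return votes.most_common(1)[0][0]

points = [(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25)]
labels = [0, 0, 1, 0, 0, 1]
print(knn_predict(points, labels, (8, 21), k=3))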
How to select the value of K in the K-NN Algorithm:
There is no particular way to determine the best value for "K", so we need to try several values and pick the one that works best. A commonly used default value for K is 5.
A very low value of K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers.
Larger values of K are generally more robust to noise, but too large a value can include points from other categories and blur the distinction between classes.
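One common way to choose K in practice is to try several candidate values with cross-validation; a small illustrative sketch with scikit-learn (toy data and arbitrary candidate values):

# Illustrative sketch: trying several K values with cross-validation (toy data).
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = [[4, 21], [5, 19], [10, 24], [4, 17], [3, 16],
     [11, 25], [14, 24], [8, 22], [10, 21], [12, 21]]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=2)   # small cv because the dataset is tiny
    print(k, scores.mean())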
Advantages of KNN Algorithm:
It is simple to implement.
It is robust to noisy training data.
It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
The value of K always needs to be determined, which can sometimes be complex.
The computation cost is high because the distance to every training sample must be calculated for each prediction.
Example
Start by visualizing some data points:

import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

plt.scatter(x, y, c=classes)
plt.show()
 Now we fit the KNN algorithm with K=1:

from sklearn.neighbors import KNeighborsClassifier

data = list(zip(x, y))

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(data, classes)

 And use it to classify a new data point:
Example

new_x = 8
new_y = 21
new_point = [(new_x, new_y)]

prediction = knn.predict(new_point)

plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])
plt.text(x=new_x-1.7, y=new_y-0.7, s=f"new point, class: {prediction[0]}")
plt.show()
Regression Analysis

Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.
 More specifically, regression analysis helps us understand how the value of the dependent variable changes in response to one independent variable while the other independent variables are held fixed.
 It predicts continuous/real values such as temperature, age, salary, price, etc.
Example: Suppose there is a marketing company A, which runs various advertisements every year and gets sales from them. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales:
"Regression shows a line or curve that passes through the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells us whether the model has captured a strong relationship or not.
Some examples of regression are:
Prediction of rain using temperature and
other factors
Determining Market trends
Prediction of road accidents due to rash
driving.
Types of Regression
Linear Regression:
 Linear regression is a statistical regression method which is used for
predictive analysis.
 It is one of the simplest regression algorithms and shows the relationship between continuous variables.
 It is used for solving the regression problem in machine learning.
 Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable (Y-axis),
hence called linear regression.
 If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one
input variable, then such linear regression is called multiple linear
regression.
 The relationship between variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of years of experience.
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
Some popular applications of linear
regression are:
Analyzing trends and sales estimates
Salary forecasting
Real estate prediction
Arriving at ETAs in traffic.
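A brief illustrative sketch of fitting Y = aX + b with scikit-learn; the experience/salary numbers below are made up for illustration, not taken from the slides:

# Illustrative sketch of simple linear regression (Y = aX + b) with made-up data.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]            # years of experience (independent variable)
y = [30000, 35000, 41000, 45000, 50000]  # salary (dependent variable), illustrative values

model = LinearRegression()
model.fit(X, y)

print("a (slope):", model.coef_[0])
print("b (intercept):", model.intercept_)
print("predicted salary for 6 years:", model.predict([[6]])[0])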
Logistic Regression:
Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
It is a predictive analysis algorithm which works on the
concept of probability.
Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
Logistic regression uses the sigmoid function (also called the logistic function), which maps any real-valued input to a value between 0 and 1. This sigmoid function is used to model the data in logistic regression.
The function can be represented as:
f(x) = 1 / (1 + e^(-x))
where f(x) = output between the 0 and 1 value,
x = input to the function,
e = base of the natural logarithm.
When we provide the input values (data) to
the function, it gives the S-curve as follows:
It uses the concept of a threshold level: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.
There are three types of logistic regression:
Binary (0/1, pass/fail)
Multinomial (cats, dogs, lions)
Ordinal (low, medium, high)
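A minimal illustrative sketch of the sigmoid function and a binary logistic regression in scikit-learn; the data below is made up:

# Illustrative sketch: the sigmoid function and binary logistic regression (toy data).
import math
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), output always between 0 and 1
    return 1 / (1 + math.exp(-x))

X = [[1], [2], [3], [4], [5], [6], [7], [8]]   # e.g. hours studied (illustrative)
y = [0, 0, 0, 0, 1, 1, 1, 1]                   # pass/fail labels

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[4.5]]))          # class above/below the 0.5 threshold
print(clf.predict_proba([[4.5]]))    # probabilities from the S-curve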
Polynomial Regression:
Polynomial Regression is a type of regression which
models the non-linear dataset using a linear model.
It is similar to multiple linear regression, but it fits a
non-linear curve between the value of x and
corresponding conditional values of y.
Suppose there is a dataset whose datapoints follow a non-linear pattern; in such a case, linear regression will not fit those datapoints well. To cover such datapoints, we need Polynomial regression.
In Polynomial regression, the original features are transformed into polynomial features of a given degree and then modelled using a linear model, which means the datapoints are fitted using a polynomial curve.
The equation for polynomial regression is also derived from the linear regression equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.
The model is still linear because the coefficients enter the equation linearly; only the features (x², x³, etc.) are non-linear.
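A short illustrative sketch of polynomial regression as described above, transforming the features to a given degree and then fitting a linear model (toy data, with degree 2 chosen arbitrarily):

# Illustrative sketch: polynomial features + a linear model (toy non-linear data).
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = [[1], [2], [3], [4], [5], [6]]
y = [1, 4, 9, 16, 25, 36]            # roughly quadratic relationship, made up

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[7]]))          # close to 49 for this toy quadratic data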
Support Vector Regression:
 Support Vector Machine is a supervised learning algorithm
which can be used for regression as well as classification
problems. So if we use it for regression problems, then it is
termed as Support Vector Regression.
 Support Vector Regression is a regression algorithm which
works for continuous variables. Below are some keywords
which are used in Support Vector Regression:
 Kernel: It is a function used to map lower-dimensional data into a higher-dimensional space.
 Hyperplane: In a general SVM, it is the separation line between two classes, but in SVR, it is the line which helps to predict the continuous variable and covers most of the datapoints.
 Boundary lines: These are the two lines drawn apart from the hyperplane, which create a margin for the datapoints.
 Support vectors: Support vectors are the datapoints that lie nearest to the hyperplane and the boundary lines.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of datapoints is covered within that margin. The main goal of SVR is to include as many datapoints as possible within the boundary lines, and the hyperplane (best-fit line) must cover the maximum number of datapoints. Consider the below image:
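A brief illustrative sketch of Support Vector Regression with scikit-learn; the toy data, RBF kernel, and hyperparameter values are arbitrary choices for illustration:

# Illustrative sketch of Support Vector Regression (toy data, RBF kernel as an example).
from sklearn.svm import SVR

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9]   # made-up continuous targets

# epsilon controls the width of the margin (boundary lines) around the hyperplane
svr = SVR(kernel="rbf", C=100, epsilon=0.5)
svr.fit(X, y)
print(svr.predict([[9]]))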
Decision Tree Regression:
Decision Tree is a supervised learning algorithm which
can be used for solving both classification and
regression problems.
It can handle both categorical and numerical data.
Decision Tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.
A decision tree is constructed starting from the root node/parent node (the full dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children and so become parent nodes of those nodes. Consider the below image:
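A minimal illustrative sketch with scikit-learn's DecisionTreeRegressor; the toy data and max_depth value are arbitrary choices:

# Illustrative sketch of decision tree regression (toy data).
from sklearn.tree import DecisionTreeRegressor

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [5, 7, 9, 15, 18, 21, 30, 33]     # made-up numerical targets

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict([[4.5]]))          # prediction = value stored at the reached leaf node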
Random Forest Regression:
Random forest is one of the most powerful supervised learning algorithms, capable of performing regression as well as classification tasks.
Random Forest regression is an ensemble learning method which combines multiple decision trees and predicts the final output based on the average of each tree's output. The combined decision trees are called base models, and the ensemble can be represented more formally as:
g(x) = f0(x) + f1(x) + f2(x) + .... (for regression, the final prediction is the average of the individual tree outputs)
Random forest uses the Bagging or Bootstrap Aggregation technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other.
With the help of Random Forest regression, we can reduce overfitting in the model by training each tree on a random subset of the dataset.
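A short illustrative sketch with scikit-learn's RandomForestRegressor, which averages many bagged trees; the toy data and number of trees are arbitrary choices:

# Illustrative sketch of random forest regression (toy data).
from sklearn.ensemble import RandomForestRegressor

X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [5, 7, 9, 15, 18, 21, 30, 33]

# n_estimators trees are trained on bootstrap samples and their outputs averaged
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[4.5]]))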
Ridge Regression:
 Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
 The amount of bias added to the model is known as the Ridge Regression penalty. This penalty term is computed by multiplying lambda by the squared weight of each individual feature.
 The cost function for ridge regression will be:
Loss = Σ(yi − ŷi)² + λ Σ wj²
 A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, Ridge regression can be used.
 Ridge regression is a regularization technique which is used to reduce the complexity of the model. It is also called L2 regularization.
 It helps to solve problems where we have more parameters than samples.
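A minimal illustrative sketch of ridge (L2) regression with scikit-learn; alpha plays the role of lambda, and its value and the toy data are arbitrary:

# Illustrative sketch of ridge regression (L2 regularization); alpha acts as lambda.
from sklearn.linear_model import Ridge

X = [[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 10.1]]   # correlated features (made up)
y = [3, 6, 9, 12, 15]

ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print(ridge.coef_, ridge.intercept_)   # coefficients are shrunk toward (but not to) 0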
Lasso Regression:
Lasso regression is another regularization technique used to reduce the complexity of the model.
It is similar to Ridge Regression, except that the penalty term contains the absolute values of the weights instead of the squares of the weights.
Since it takes absolute values, it can shrink a slope exactly to 0, whereas Ridge Regression can only shrink it close to 0.
It is also called L1 regularization. The cost function for Lasso regression will be:
Loss = Σ(yi − ŷi)² + λ Σ |wj|

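A matching illustrative sketch for lasso (L1) regression; alpha again plays the role of lambda, and the toy data is made up:

# Illustrative sketch of lasso regression (L1 regularization); alpha acts as lambda.
from sklearn.linear_model import Lasso

X = [[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 10.1]]   # correlated features (made up)
y = [3, 6, 9, 12, 15]

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_, lasso.intercept_)   # lasso can drive some coefficients exactly to 0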