
Machine Learning
Subject code: AL3451
Regulations: 2021

Unit - 2 Supervised Learning

Linear Regression Models
• Linear regression is a type of supervised machine-learning algorithm that learns from a labelled dataset and fits an optimal linear function mapping the data points to outputs, which can then be used for prediction on new datasets.
• Supervised learning has two types of tasks:
• Classification: predicts a categorical (discrete) class of the dataset from the independent input variables, e.g., whether the image of an animal shows a cat or a dog.
• Regression: predicts a continuous output variable from the independent input variables, e.g., predicting house prices from parameters such as house age, distance from the main road, location, and area.
Linear Regression
• Linear regression is a supervised machine learning algorithm that computes the linear relationship between the dependent variable and one or more independent variables by fitting a linear equation to observed data.
• When there is only one independent variable, it is known as Simple Linear Regression; when there is more than one independent variable, it is known as Multiple Linear Regression.
• When there is only one dependent variable, it is considered Univariate Linear Regression; when there is more than one dependent variable, it is known as Multivariate Regression.
• Linear regression models are simple and require minimal memory to implement.
Simple Linear Regression using Least Square Method
• Simple linear regression is the simplest form of linear regression, and it involves only one independent variable and one dependent variable.
• The equation for simple linear regression is:
y = β₀ + β₁x
where:
y is the dependent variable
x is the independent variable
β₀ is the intercept
β₁ is the slope
Least Square Method
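The least square method is worked through as a slide example; as a minimal sketch, the coefficients can also be computed directly from the closed-form formulas b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄. The sample data below is made up for illustration.

```python
import numpy as np

def least_squares_fit(x, y):
    """Fit y = b0 + b1*x by minimizing the sum of squared residuals."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by variance of x
    b1 = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # Intercept: forces the fitted line through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = least_squares_fit(x, y)
print(f"y = {b0:.3f} + {b1:.3f} x")
```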
Multiple Linear Regression using Least Square Method
• Multiple linear regression involves more than one independent variable and one dependent variable.
• The equation for multiple linear regression is:
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ....... + βₙxₙ
where:
y is the dependent variable
x₁, x₂, x₃, .... are the independent variables
β₀ is the intercept
β₁, β₂, β₃, .... are the slopes
Linear Regression
Multiple Linear Regression
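For multiple linear regression, the least-squares coefficients can be obtained by solving the normal equations. A minimal sketch using NumPy's least-squares solver, with hypothetical data, is shown below.

```python
import numpy as np

# Hypothetical data: two features, one target
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.5, 3.0, 7.5, 7.0, 10.0])

# Prepend a column of ones so the first coefficient acts as the intercept
X_aug = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem min ||X_aug @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("intercept:", beta[0], "slopes:", beta[1:])
```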
Bayesian Linear Regression
• Regression is a machine learning task for predicting continuous values (real numbers), whereas classification is used to predict categorical (discrete) values.
• Bayesian regression can be very useful when we have insufficient data in the dataset or the data is poorly distributed.
• The output of a Bayesian regression model is obtained from a probability distribution.
• The aim of Bayesian linear regression is to find the 'posterior' distribution of the model parameters.
Bayesian Linear Regression
• The expression for the posterior is:
Posterior = (Likelihood × Prior) / Normalization
• Posterior is the probability of an event occurring given the observed data.
• Prior is the probability of the event before the data is observed.
• Likelihood represents the probability of observing the data given the parameters of the model.
Bayesian Linear Regression
Bayes' Theorem:
P(A∣B) = P(B∣A) · P(A) / P(B)
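A minimal sketch of the posterior computation for the weights, assuming a zero-mean Gaussian prior and Gaussian observation noise (the standard conjugate setup; the hyperparameters alpha and beta below are assumed values, not taken from the slides):

```python
import numpy as np

def bayesian_linear_fit(X, y, alpha=1.0, beta=25.0):
    """Posterior over weights for y = X @ w + noise.

    Assumes a prior w ~ N(0, alpha^-1 I) and Gaussian noise with
    precision beta; both choices are assumptions for this sketch.
    """
    d = X.shape[1]
    # Posterior covariance and mean (conjugate-prior update)
    S_inv = alpha * np.eye(d) + beta * X.T @ X
    S = np.linalg.inv(S_inv)
    m = beta * S @ X.T @ y
    return m, S

X = np.column_stack([np.ones(5), np.linspace(0, 2, 5)])  # bias + 1 feature
y = np.array([0.9, 1.6, 2.1, 2.9, 3.4])
m, S = bayesian_linear_fit(X, y)
print("posterior mean:", m)        # point estimate of the weights
print("posterior covariance:", S)  # uncertainty about the weights
```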
Gradient Descent
• Gradient Descent is an optimization algorithm
used to find the values of the coefficients of a
function (hypothesis) that minimize the cost
function.
• It is mostly used in supervised machine learning
models to optimize model parameters.
• Gradient means the slope of a curve, and
descent means movement to a lower point.
• The algorithm makes use of the gradient (slope)
to reach the minimum (lowest point) of a Mean
Squared Error (MSE) function.
Gradient Descent
[Scatter plot of the example data points (0.5, 1.4), (2.3, 1.9), and (2.9, 3.2).]
Gradient Descent
Take the derivatives of the sum of squared residuals with
respect to the intercept and the slope:

∂SSR/∂intercept = Σᵢ −2 · (yᵢ − (intercept + slope · xᵢ))
∂SSR/∂slope = Σᵢ −2 · xᵢ · (yᵢ − (intercept + slope · xᵢ))
Gradient Descent
[Plot of Height (y-axis) versus Weight (x-axis) for the example data points.]
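A minimal sketch of gradient descent on the three example points above, fitting height ≈ intercept + slope · weight by repeatedly stepping against the derivatives of the sum of squared residuals. The initial guesses and the learning rate are assumptions for this sketch.

```python
import numpy as np

# The three (weight, height) points from the slide example
x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])

intercept, slope = 0.0, 1.0   # initial guesses (assumed)
lr = 0.01                     # learning rate (assumed)

for step in range(1000):
    pred = intercept + slope * x
    # Gradients of the sum of squared residuals
    d_intercept = (-2 * (y - pred)).sum()
    d_slope = (-2 * x * (y - pred)).sum()
    # Step downhill along the gradient
    intercept -= lr * d_intercept
    slope -= lr * d_slope

print(f"height ≈ {intercept:.2f} + {slope:.2f} * weight")
```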
Gradient Descent
• Types of Gradient Descent:
1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Minibatch Gradient Descent
• Batch Gradient Descent involves calculations
over the full training set.
• Stochastic Gradient Descent runs one training
example per iteration.
• Minibatch Gradient Descent divides the training
datasets into small batch sizes and performs the
updates on those batches separately.
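A compact sketch of how the three variants differ only in how much data each update sees (the data here is synthetic and batch_size is an assumed hyperparameter):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2)), rng.normal(size=100)
w, lr, batch_size = np.zeros(2), 0.01, 16

def grad(Xb, yb, w):
    # Gradient of the MSE for a linear model on the given (mini)batch
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

for epoch in range(10):
    idx = rng.permutation(len(y))
    # Batch GD would call grad(X, y, w) once per epoch;
    # stochastic GD is the special case batch_size = 1.
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        w -= lr * grad(X[b], y[b], w)
print(w)
```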
Reminder .....
• Supervised learning has 2 types of tasks based on the nature of the target feature (dependent variable):
1. Regression
2. Classification
• When the target feature (dependent variable) is CONTINUOUS, it is a REGRESSION task.
• When the target feature (dependent variable) is DISCRETE, it is a CLASSIFICATION task.
• So far we have covered linear regression models: the least square method and Bayesian linear regression.
In this lecture....
• The different linear classification models to handle classification tasks are:
1. Probabilistic Discriminative Model
2. Probabilistic Generative Model
3. Maximum Margin Classifier
• Probabilistic Discriminative Model:
• Focuses on the boundary between the classes rather than on how the data of each class is distributed.
• Models the posterior probabilities P(Y∣X) using a parameterized function and typically does not model the distribution of the features.
• Example: Logistic Regression (models the posterior P(Y∣X) directly).
In this lecture....
• Probabilistic Generative Model:
• Unlike discriminative models, probabilistic generative models estimate how the data are generated for each class.
• Attempts to model the joint probability distribution P(X,Y), or separately P(X∣Y) and P(Y).
• Example: Naive Bayes (assumes independence among features given the class label and models P(X∣Y) for each class).
• Maximum Margin Classifier:
• Focuses on finding the decision boundary that maximizes the margin (distance) between the closest points of the different classes.
• Example: Support Vector Machine.
Linear Discriminant Function
• The primary use of the linear discriminant function is to classify data points into two or more classes.
• It achieves this by creating a decision boundary that separates different classes in a dataset.
• For example, in a binary classification problem, the function can predict whether an email is spam or not based on features derived from the email's content.
• The equation for a linear discriminant function in a 2D space is essentially a line equation that separates two different classes of data.
Linear Discriminant Function
• This line can be described by the general linear equation:
w₁x₁ + w₂x₂ + b = 0
Where:
• x₁ and x₂ are the input features (the coordinates in the 2D space),
• w₁ and w₂ are the weights assigned to these features, and
• b is the bias term.
• In the context of a linear discriminant function, w₁ and w₂ represent the coefficients that define the orientation of the line in the 2D space, and b shifts the line away from the origin.
Linear Discriminant Function
• This line forms the decision boundary that discriminates between the two classes:
w₁x₁ + w₂x₂ + b = 0
• Points on one side of the line are classified into one category, and points on the other side are classified into another.
• For easier interpretation, the line equation can also be converted into the slope-intercept form, which is more familiar in basic algebra:
y = mx + c
where y corresponds to x₂, x corresponds to x₁, m is the slope, and c is the y-intercept.
Linear Discriminant Function
• By rearranging the line equation for the 2D input feature space:
w₁x₁ + w₂x₂ + b = 0
x₂ = −(w₁/w₂)x₁ − (b/w₂)
• Here, the y-intercept is c = −b/w₂ and the slope is m = −w₁/w₂.
• For a d-dimensional space, the hyperplane equation will be:
w₁x₁ + w₂x₂ + ..... + w_d x_d + b = 0
Linear Discriminant Function
• In vector form, let w = [b w₁ w₂ ...... w_d]ᵀ and x = [1 x₁ x₂ ...... x_d]ᵀ.
• Therefore the linear discriminant function is wᵀx = 0.

Example: x₁ + x₂ − 1 = 0

wᵀx = [−1 1 1] [1 x₁ x₂]ᵀ = 0
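A minimal sketch of classifying points with the example discriminant x₁ + x₂ − 1 = 0, folding the bias into the weight vector w = [−1 1 1] as above (the sign convention for points exactly on the boundary is an assumption):

```python
import numpy as np

w = np.array([-1.0, 1.0, 1.0])   # [b, w1, w2] for x1 + x2 - 1 = 0

def classify(x1, x2):
    # Augment the point with a leading 1 so the bias folds into w
    g = w @ np.array([1.0, x1, x2])
    return 1 if g >= 0 else -1    # sign of g decides the class

print(classify(1.0, 1.0))   # above the line -> class 1
print(classify(0.0, 0.0))   # below the line -> class -1
```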
Perceptron Algorithm
• The perceptron is an example of a linear discriminant function.
• The core purpose of the perceptron is to classify input data into one of two categories (binary classification).
• The perceptron uses a generalized linear model with an activation function.
• This activation function is a step function from −1 to 1, of the form:
f(a) = +1 if a ≥ 0, and −1 otherwise
Example for Perceptron Algorithm
• Find the weights required to perform the following classification using a perceptron network.
• The vectors (1,1,1,1) and (−1,1,−1,−1) belong to class 1; the vectors (1,1,1,−1) and (1,−1,−1,1) belong to class −1.
• Assume the learning rate is 1 and the initial weights are 0.
Example for Perceptron Algorithm
[Worked example continues over EPOCH-1 through EPOCH-3.]
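A minimal sketch of the perceptron update rule on the example data above (learning rate 1, initial weights 0, as stated). Threshold and update conventions vary between textbooks, so the epoch-by-epoch trace may differ slightly from the slides' worked example; handling the bias as a separate term is an assumption here.

```python
import numpy as np

# Training data from the slide example (bipolar inputs and targets)
X = np.array([[ 1,  1,  1,  1],
              [-1,  1, -1, -1],
              [ 1,  1,  1, -1],
              [ 1, -1, -1,  1]])
t = np.array([1, 1, -1, -1])   # class labels

w = np.zeros(4)   # initial weights 0, as stated
b = 0.0           # bias (assumed to also start at 0)
lr = 1.0          # learning rate 1, as stated

def activation(a):
    # Bipolar step function: output is -1 or +1
    return 1 if a >= 0 else -1

for epoch in range(10):
    errors = 0
    for xi, ti in zip(X, t):
        if activation(w @ xi + b) != ti:
            # Perceptron update: move the weights toward the target class
            w += lr * ti * xi
            b += lr * ti
            errors += 1
    if errors == 0:
        print(f"converged after epoch {epoch + 1}: w = {w}, b = {b}")
        break
```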
Probabilistic Discriminative Model
• Discriminative models learn about the boundary between classes within a dataset.
• Discriminative models excel at classification tasks by effectively distinguishing between different classes.
• Discriminative models set out to answer the following question:
"What side of the decision boundary is this data found in?"
Logistic Regression
• Linear regression predicts a numerical response but is not suitable for predicting categorical values.
• When categorical variables are involved, it is called a classification problem, and logistic regression is suitable for binary classification problems.
Logistic Regression
• For example, what if the organization wants to know whether an employee would get a promotion or not based on their performance?
• A linear graph won't be suitable for this; the sigmoid curve is suitable for this.
• Based on a threshold value, the sigmoid output is mapped to one of the two classes.
Logistic Regression
• Odds of success: Odds(θ) = P(success) / P(failure) = p / (1 − p)
• Consider the equation of the straight line:
β₀ + β₁x
• Now, to predict the odds of success, we take the log of the odds formula:
log(p / (1 − p)) = β₀ + β₁x
• Exponentiating on both sides, we have:
p / (1 − p) = e^(β₀ + β₁x)
• Let Y = e^(β₀ + β₁x). Then:
p = Y(1 − p) = Y − Yp
p + Yp = Y
p(1 + Y) = Y
p = Y / (1 + Y) = e^(β₀ + β₁x) / (1 + e^(β₀ + β₁x)) = 1 / (1 + e^−(β₀ + β₁x))
• This is the sigmoid (logistic) function.
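A minimal sketch of turning the linear score β₀ + β₁x into a probability with the sigmoid derived above; the coefficient values and the 0.5 threshold below are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    # p = 1 / (1 + e^-z), the inverse of the logit (log-odds) function
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for z = b0 + b1 * x
b0, b1 = -4.0, 1.5

x = 3.0                               # e.g., a performance score
p = sigmoid(b0 + b1 * x)              # predicted probability of promotion
print(p, "-> class", int(p >= 0.5))   # threshold at 0.5 (a common default)
```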
Types of Logistic Regression
• There are three types of logistic regression:
1. Binary logistic regression
2. Multinomial logistic regression
3. Ordinal logistic regression
• Binary logistic regression:
It's an either/or solution: there are just two possible outcomes, typically represented as a 0 or a 1 in coding. Examples include classifying an object as an animal or not an animal.
Types of Logistic Regression
• Multinomial logistic regression:
A model where there are multiple classes (a set of three or more) that an item can be classified as. Examples include classifying texts by the language they are written in.
• Ordinal logistic regression:
Also a model where there are multiple classes that an item can be classified as; however, in this case an ordering of the classes is required. Examples include ranking restaurants on a scale of 0 to 5.
Probabilistic Generative Model
• A generative model learns a probability distribution for the dataset; it can reference this probability distribution to generate new data instances.
• Generative models often rely on Bayes' theorem to find the joint probability p(x,y).
• Generative models model how the data was generated, and answer the following question:
"What's the likelihood that this class or another class generated this data instance?"
Naive Bayes Classifier
• The following equation is used to calculate the posterior probability of each hypothesis, and the hypothesis that gives the maximum value is the solution:
h_MAP = argmax_{h∈H} P(D∣h) · P(h)
Naive Bayes Classifier
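The worked naive Bayes example is shown as figures; a minimal counting-based sketch of the rule ŷ = argmax_y P(y) · ∏ᵢ P(xᵢ∣y) on a made-up categorical dataset (no smoothing is applied, for brevity):

```python
from collections import defaultdict

# Tiny hypothetical dataset: each row is (feature1, feature2), plus a label
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]

# Estimate P(y) and P(x_i | y) by counting
prior = defaultdict(float)
cond = defaultdict(float)
for label in y:
    prior[label] += 1 / len(y)
for (f1, f2), label in zip(X, y):
    n_label = y.count(label)
    cond[(0, f1, label)] += 1 / n_label
    cond[(1, f2, label)] += 1 / n_label

def predict(f1, f2):
    # Pick the class maximizing P(y) * P(f1|y) * P(f2|y)
    return max(prior, key=lambda c: prior[c] * cond[(0, f1, c)] * cond[(1, f2, c)])

print(predict("sunny", "mild"))
```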
Support Vector Machine
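The SVM slides are presented as figures. As a hedged sketch of the maximum margin classifier described earlier, scikit-learn's SVC with a linear kernel can be used; the toy data below is made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)  # C trades margin width against violations
clf.fit(X, y)

print(clf.support_vectors_)        # the points closest to the boundary
print(clf.predict([[4, 4]]))
```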
Decision Tree
• A decision tree is a type of supervised learning algorithm that is commonly used in machine learning to model and predict outcomes based on input data.
• It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node represents the final decision or prediction.
• The decision tree algorithm falls under the category of supervised learning. It can be used to solve both regression and classification problems.
Decision Tree
• The goal of machine learning is to decrease uncertainty or disorder in the dataset, and decision trees are used for this.
• Concepts like entropy, information gain, and the Gini index are used in a decision tree to decide which attribute to split on at each node.
Decision Tree
• Entropy is the measure of the uncertainty of a random variable in our dataset, or the measure of disorder. The higher the entropy, the higher the information content.
• Information gain measures the reduction in uncertainty (entropy) achieved by splitting on an attribute; the standard formulas are given below.
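For reference, the standard definitions, with pᵢ the proportion of class i in set S and S_v the subset of S where attribute A takes value v:

Entropy: H(S) = − Σᵢ pᵢ log₂ pᵢ
Information gain: IG(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v)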
Decision Tree
• The Gini index is a measure of the inequality or impurity of a distribution, commonly used in decision trees and other machine learning algorithms. For k classes, Gini = 1 − Σᵢ pᵢ². For binary classification it ranges from 0 to 0.5, where 0 indicates a pure set (all instances belong to the same class) and 0.5 indicates a maximally impure set (instances are evenly distributed across the classes).
Decision Tree
ID3 Decision Tree Learning
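The ID3 walkthrough is shown as figures; a minimal sketch of its core computation follows: measuring entropy and choosing the attribute with the highest information gain as the split. The tiny weather-style dataset is made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on the attribute at attr_index."""
    total = entropy(labels)
    n = len(labels)
    for value in set(r[attr_index] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr_index] == value]
        total -= (len(subset) / n) * entropy(subset)
    return total

# Hypothetical weather-style data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["no", "no", "yes", "no"]

# ID3 picks the attribute with the highest information gain as the root
best = max(range(2), key=lambda i: information_gain(rows, labels, i))
print("split on attribute", best)
```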
Random Forest
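The random forest slides are presented as figures. As a hedged sketch, scikit-learn's RandomForestClassifier illustrates the idea of an ensemble of decision trees that vote on the final class; the data below is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# n_estimators = number of trees; each tree is trained on a bootstrap
# sample and considers random feature subsets at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))
```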
