ML Supervised Regression

The document discusses supervised learning, which is a machine learning framework where the machine is presented with example inputs and their desired outputs, and the goal is to learn a general rule that maps inputs to outputs. Some key aspects covered include classification vs regression problems, how models are trained and tested, reducing overfitting and underfitting, and evaluating model performance. The document provides an overview of important concepts in supervised learning.


Supervised Learning

DC - 1
DC - 2
Tree of Classification

CLASSIFICATION
├── NON-EXCLUSIVE (OVERLAPPING)
└── EXCLUSIVE (NON-OVERLAPPING)
    ├── INTRINSIC (UNSUPERVISED)
    │   ├── HIERARCHICAL
    │   └── NON-HIERARCHICAL (PARTITIONAL)
    └── EXTRINSIC (SUPERVISED)

DC - 3
The machine learning framework

• Apply a prediction function to a feature representation of the image to get the desired output:

  f(image of an apple)  = "apple"
  f(image of a tomato)  = "tomato"
  f(image of a cow)     = "cow"
DC - 4
The machine learning framework

y = f(x)

where y is the output, f is the prediction function, and x is the image feature.
• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set.
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
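To make the training/testing loop above concrete, here is a minimal sketch using scikit-learn; the synthetic one-feature dataset and the choice of a linear model are illustrative assumptions, not part of the slides.

  # Training/testing sketch: estimate f on labeled data, then apply it to unseen x.
  import numpy as np
  from sklearn.model_selection import train_test_split
  from sklearn.linear_model import LinearRegression

  rng = np.random.default_rng(0)
  X = rng.uniform(0, 10, size=(100, 1))            # feature representations x
  y = 3.0 * X[:, 0] + rng.normal(0, 1, size=100)   # desired outputs y

  # Training: estimate f by minimizing prediction error on the training set
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
  f = LinearRegression().fit(X_train, y_train)

  # Testing: apply f to never-before-seen examples and output y = f(x)
  y_pred = f.predict(X_test)
  print("test MSE:", np.mean((y_pred - y_test) ** 2))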

DC - 5
Steps
Training: Training Images → Image Features → Training (with Training Labels) → Learned model

Testing: Test Image → Image Features → Learned model → Prediction

DC - 6
Supervised Learning

DC - 7
Features

• Raw pixels
• Histograms
• GIST (global image) descriptors
• …

DC - 8
Recognition task and supervision
• Images in the training set must be annotated with the
“correct answer” that the model is expected to produce

[Example training annotation: "Contains a motorbike"]

DC - 9
[Figure: spectrum of supervision, ranging from unsupervised through "weakly" supervised to fully supervised]

Definition depends on task.


DC - 10
Generalization

[Figure: training set (labels known) vs. test set (labels unknown)]

• How well does a learned model generalize from the data it was trained on to a new test set?
DC - 11
Generalization
• Components of generalization error
  • Bias: how much does the average model over all training sets differ from the true model?
    • Error due to inaccurate assumptions/simplifications made by the model
  • Variance: how much do models estimated from different training sets differ from each other?
• Underfitting: the model is too "simple" to represent all the relevant class characteristics
  • High bias and low variance
  • High training error and high test error
• Overfitting: the model is too "complex" and fits irrelevant characteristics (noise) in the data
  • Low bias and high variance
  • Low training error and high test error

DC - 12
Bias-Variance Trade-off

E(MSE) = noise² + bias² + variance

where noise² is the unavoidable error, bias² is the error due to incorrect assumptions, and variance is the error due to the variance of the training samples.
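A small experiment makes the underfitting/overfitting behaviour above visible; everything here (the sine curve as the "true model", the noise level, the polynomial degrees) is an assumed toy setup rather than something from the slides.

  # Degree-1 fit: high bias (high train and test error). Degree-15 fit: high
  # variance (low train error, much higher test error). Degree 4 sits in between.
  import numpy as np

  rng = np.random.default_rng(1)
  x = rng.uniform(-1, 1, 40)
  y = np.sin(3 * x) + rng.normal(0, 0.2, 40)      # true model + unavoidable noise
  x_tr, y_tr, x_te, y_te = x[:30], y[:30], x[30:], y[30:]

  for degree in (1, 4, 15):
      coeffs = np.polyfit(x_tr, y_tr, degree)     # least-squares polynomial fit
      train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
      test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
      print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")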

DC - 13
The perfect classification algorithm

• Objective function: encodes the right loss for the problem
• Parameterization: makes assumptions that fit the problem
• Regularization: the right level of regularization for the amount of training data
• Training algorithm: can find parameters that maximize the objective on the training set
• Inference algorithm: can solve for the objective function at evaluation time

DC - 14
Remember…

• No classifier is inherently better than any other: you need to make assumptions to generalize

• Three kinds of error
  – Inherent: unavoidable
  – Bias: due to over-simplifications
  – Variance: due to inability to perfectly estimate parameters from limited data

DC - 15
How to reduce variance?

• Choose a simpler classifier

• Regularize the parameters (see the sketch after this list)

• Get more training data
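As a sketch of the second remedy, assuming ridge regression as the regularizer and a deliberately over-flexible polynomial feature set (both our choices):

  # Regularization shrinks the parameters of an over-flexible model, trading a
  # little bias for a large reduction in variance.
  import numpy as np
  from sklearn.linear_model import LinearRegression, Ridge
  from sklearn.preprocessing import PolynomialFeatures

  rng = np.random.default_rng(2)
  x = rng.uniform(-1, 1, (30, 1))
  y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.2, 30)
  X = PolynomialFeatures(degree=12).fit_transform(x)   # deliberately too flexible

  for name, model in [("unregularized", LinearRegression()),
                      ("ridge (alpha=1.0)", Ridge(alpha=1.0))]:
      model.fit(X, y)
      print(name, "-> max |coefficient|:", np.abs(model.coef_).max())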

DC - 16
Classification
• Assign input vector to one of two or more classes
• Any decision rule divides input space into decision regions
separated by decision boundaries

DC - 17
Supervised Machine Learning
The main goal of the supervised learning technique is:

 To map the input variable (x) to the output variable (y).

 Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be categorized into two types:

 Classification
 Regression

DC - 18
Classification
What is the Classification Algorithm?

 The classification algorithm is a supervised learning technique that is used to identify the category of new observations on the basis of training data.

 In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.

 Classes can be called targets, labels, or categories.

 In a classification algorithm, a discrete output variable (y) is mapped from the input variable (x): y = f(x), where y is a categorical output.

 A classic example of an ML classification algorithm is an email spam detector.


DC - 19
Classification
 Classification algorithms can be better understood using the diagram below.

 In the diagram, there are two classes, Class A and Class B. Points within a class have features similar to each other and dissimilar to the other class.

[Figure: two clusters of points, Class A and Class B, separated by a decision boundary]

DC - 20
Classification
The algorithm that implements classification on a dataset is known as a classifier. There are two types of classification:

 Binary classifier: If the classification problem has only two possible outcomes, it is called a binary classifier.
 Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.

 Multi-class classifier: If a classification problem has more than two outcomes, it is called a multi-class classifier.
 Examples: classification of types of crops, types of music, types of materials, targets, etc.

DC - 21
Learners of Classification Problems
In classification problems, there are two types of learners:

1. Lazy learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In the lazy-learner case, classification is done based on the most related data stored in the training dataset. It takes less time in training but more time for predictions. Examples: the K-NN algorithm, case-based reasoning.

2. Eager learners: Eager learners develop a classification model from the training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction. Examples: decision trees, Naïve Bayes, ANN. (See the sketch below.)
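The distinction can be seen directly in scikit-learn; the toy dataset and the specific classifiers are illustrative choices:

  # k-NN (lazy) essentially just stores the training set at fit time; a decision
  # tree (eager) builds its model up front, before any test data arrives.
  from sklearn.datasets import make_classification
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.tree import DecisionTreeClassifier

  X, y = make_classification(n_samples=200, n_features=4, random_state=0)

  lazy = KNeighborsClassifier().fit(X, y)     # cheap fit: stores the data
  eager = DecisionTreeClassifier().fit(X, y)  # expensive fit: builds the tree now

  print(lazy.predict(X[:3]))    # the neighbour search happens here, at prediction time
  print(eager.predict(X[:3]))   # prediction is a fast traversal of the prebuilt tree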

DC - 22
Types of Machine Learning Classification Algorithms
Classification algorithms can be further divided into two main categories:

 Linear Models

 Logistic Regression
 Support Vector Machines

 Non-Linear Models

 K-Nearest Neighbours
 Kernel SVM
 Naïve Bayes
 Decision Tree Classification
 Random Forest Classification

DC - 23
Evaluating a Classification Model

DC - 24
Evaluating a Classification Model
2. Confusion Matrix:

 The confusion matrix gives us a matrix (table) as output that describes the performance of the model.

 It is also known as the error matrix.

 The matrix summarizes the prediction results, giving the total number of correct and incorrect predictions. It looks like the table below:

                      Actual Positive        Actual Negative
Predicted Positive    True Positive (TP)     False Positive (FP)
Predicted Negative    False Negative (FN)    True Negative (TN)
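A minimal sketch of computing this matrix (and the precision/recall metrics discussed on the following slides) with scikit-learn; the two label vectors are made-up examples:

  from sklearn.metrics import confusion_matrix, precision_score, recall_score

  y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
  y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

  # sklearn returns the matrix as [[TN, FP], [FN, TP]] for labels {0, 1}
  tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
  print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
  print("precision:", precision_score(y_actual, y_predicted))  # TP / (TP + FP)
  print("recall:   ", recall_score(y_actual, y_predicted))     # TP / (TP + FN)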
DC - 25
Accuracy Assessment Model
Precision and Recall

One fine morning, Jack got a phone call. It was a stranger on the line. Jack, still sipping his freshly made morning coffee, was hardly in a position to understand what was coming for him. The stranger said, "Congratulations Jack! You have won a lottery of $10 million! I just need you to provide me your bank account details, and the money will be deposited in your bank account right away…"

DC - 26
Accuracy Assessment Model
Precision and Recall

Type I (False Positive) and Type II (False Negative) Errors

Let me try to explain the complexity here. Assuming Jack is a normal guy, he would think of this as a joke, or maybe a scam to get his bank details, and hence would decline to provide any information. However, this decision is based on his assumption that the call was a joke. If he is right, he will save the money in his bank account. But if he is wrong, this decision will cost him ten million dollars!
DC - 27
Accuracy Assessment Model
Precision and Recall
Let's talk in statistical terms for a bit. The null hypothesis in this case is that the call is a fraud. If Jack had believed the stranger and provided his bank details, and the call was in fact a fraud, he would have committed a Type I error, also known as a false positive. On the other hand, had he ignored the stranger's request, but later found out that he actually had won the lottery and the call was not a hoax, he would have committed a Type II error, or a false negative.
DC - 28
Accuracy Assessment Model
Precision and Recall

DC - 29
Accuracy Assessment Model
Precision and Recall

DC - 30
Accuracy Assessment Model
Precision and Recall
Precision is the percentage of your results that are relevant. Recall, on the other hand, is the percentage of the total relevant results correctly classified by your algorithm. In terms of the confusion matrix: precision = TP / (TP + FP) and recall = TP / (TP + FN).

DC - 31
Accuracy Assessment Model
Confusion Matrix

DC - 32
Accuracy Assessment Model
Confusion Matrix

DC - 33
Accuracy Assessment Model
Confusion Matrix

DC - 34
Accuracy Assessment Model

DC - 35
Accuracy Assessment Model

DC - 36
Regression in Machine Learning

DC - 37
Types of Regression

DC - 38
Linear Regression

DC - 39
Linear Regression

DC - 40
Linear Regression
 Linear regression is a statistical regression method used for predictive analysis.

 It is one of the simplest regression algorithms, and it shows the relationship between continuous variables.

 It is used for solving regression problems in machine learning.

 Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.

 If there is only one input variable (x), it is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.

DC - 41
Linear Regression
 The relationship between variables in a linear regression model can be explained using the image below, where we predict the salary of an employee on the basis of years of experience.

DC - 42
Linear Regression

DC - 43
Linear Regression

DC - 44
Linear Regression

DC - 45
Linear Regression
Example: Create a relationship model for the data given in the table below to find the relation between the height and weight of students.

Height (cm)    Weight (kg)
151            63
174            81
138            56
186            91
128            47
136            57
179            76
163            72
152            62
131            48
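One way to build the requested relationship model; the data come from the table above, while the library choice and the sample prediction at height 170 are ours:

  import numpy as np
  from sklearn.linear_model import LinearRegression

  height = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131]).reshape(-1, 1)
  weight = np.array([63, 81, 56, 91, 47, 57, 76, 72, 62, 48])

  model = LinearRegression().fit(height, weight)
  print(f"weight = {model.coef_[0]:.4f} * height + ({model.intercept_:.4f})")
  print("predicted weight for height 170:", model.predict([[170]])[0])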
DC - 46
Why Linear Regression is not suitable for classification

[Figure: a binary-labeled dataset (Y = 0 or Y = 1) with a fitted regression line; the predicted Y can exceed the 0-1 range]

 In the figure, the output is either 1 or 0, so a regression line cannot build a classifier model that gives categorical output.

 This shows that linear regression is not suitable for this class of classification problems.
DC - 47
Logistic Regression
 Logistic regression is a machine learning algorithm under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables.

 Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. But instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

 Logistic regression is very similar to linear regression, except in how the two are used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.

DC - 48
Logistic Regression
 In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).

 The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is overweight or not based on its weight, etc.

 Logistic regression is a significant machine learning algorithm because it has the ability to provide probabilities and to classify new data using continuous and discrete datasets.

 Logistic regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification.
DC - 49
Logistic Regression
Logistic Function (Sigmoid function):

 The sigmoid function is a mathematical function used to map predicted values to probabilities; a standard form is σ(z) = 1 / (1 + e⁻ᶻ).

 It maps any real value into a value within the range 0 to 1.

 The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. This S-form curve is called the sigmoid function or the logistic function.

 In logistic regression, we use the concept of a threshold value, which decides between 0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0.
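A direct translation of this description into code; the 0.5 threshold is the conventional default, our assumption rather than something fixed by the slides:

  import numpy as np

  def sigmoid(z):
      # Maps any real value into the (0, 1) range: the "S"-shaped logistic curve.
      return 1.0 / (1.0 + np.exp(-z))

  z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
  p = sigmoid(z)
  print(p)           # probabilities squeezed between 0 and 1
  print(p >= 0.5)    # thresholding turns probabilities into class labels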

DC - 50
Logistic Regression
Assumption for Logistic Regression:

 The dependent variable must be categorical in nature.

 The independent variables should not have multi-collinearity.

Note: Logistic regression uses the concept of predictive modeling as regression, and is therefore called logistic regression; but it is used to classify samples, so it falls under the classification algorithms.
DC - 51
Logistic Regression

(Logit function)

DC - 52
Logistic Regression

DC - 53
Building Logistic Regression Model (Logit Function)

DC - 54
Building Logistic Regression Model (Logit Function)

[Figure: logistic regression fits an S-shaped curve between Y = 0 and Y = 1, unlike a regression line whose predicted Y can exceed the 0-1 range]

 The output of logistic regression falls in either class 0 or class 1.

 The regression curve in this case is that of the logit function, which gives the probability that the output will fall in class 1 or class 0.

DC - 55
Maximum Likelihood Estimation
 In linear regression, we used the method of least squares to estimate the regression coefficients.

 In logistic regression, we use another approach called Maximum Likelihood Estimation. The maximum likelihood estimate of a parameter is the value that maximizes the probability of the observed data.

 The likelihood function is used to estimate the probability by observing the data, i.e., it is the probability that the observed values of the dependent variable may be predicted from the observed values of the independent variables.

 Like a probability, the likelihood varies from 0 to 1.

 It is easier to work with the logarithm of the likelihood function. This function is known as the log-likelihood.
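For reference, the log-likelihood that logistic regression maximizes can be written as below; this is the standard textbook form, with our notation β for the coefficients and p_i for the predicted probability that y_i = 1:

\ell(\beta) = \sum_{i=1}^{N} \Big[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \Big],
\qquad p_i = \frac{1}{1 + e^{-\beta^{\top} x_i}}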
DC - 56
Maximum Likelihood Estimation

DC - 57
Maximum Likelihood Estimation

DC - 58
Logistic Training

DC - 59
Logistic Training

DC - 60
Logistic Training

dp/dz = p(1 − p)   (derivative of the sigmoid, used in gradient-based training)
DC - 61
Type of Logistic Regression
On the basis of the categories, logistic regression can be classified into three types:

 Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.

 Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".

 Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".

DC - 62
Polynomial Regression
 Polynomial regression is a type of regression which models a non-linear dataset using a linear model.

 It is similar to multiple linear regression, but it fits a non-linear curve between the value of x and the corresponding conditional values of y.

 Suppose there is a dataset whose data points lie in a non-linear fashion. In such a case, linear regression will not best fit those data points. To cover such data points, we need polynomial regression.

 In polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model, which means the data points are best fitted using a polynomial curve.

DC - 63
Polynomial Regression
 The equation for polynomial regression is derived from the linear regression equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + … + bnxⁿ.

 Here Y is the predicted/target output and b0, b1, …, bn are the regression coefficients; x is our independent/input variable.

 The model is still linear because it is linear in the coefficients, even though the features (x², x³, …) are non-linear.
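A hedged sketch of this transform-then-fit idea; degree 2 and the toy quadratic data are assumptions:

  import numpy as np
  from sklearn.preprocessing import PolynomialFeatures
  from sklearn.linear_model import LinearRegression

  rng = np.random.default_rng(3)
  x = np.linspace(-2, 2, 50).reshape(-1, 1)
  y = 1 + 2 * x[:, 0] - 3 * x[:, 0] ** 2 + rng.normal(0, 0.3, 50)  # non-linear data

  X_poly = PolynomialFeatures(degree=2).fit_transform(x)   # columns [1, x, x^2]
  model = LinearRegression(fit_intercept=False).fit(X_poly, y)
  print("b0, b1, b2 =", model.coef_)   # still a linear model in the coefficients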

DC - 64
Support Vector Regression
 Support Vector Machine is a supervised learning algorithm which can be used for regression as well as classification problems. When we use it for regression problems, it is termed Support Vector Regression (SVR).

 Support Vector Regression is a regression algorithm which works for continuous variables. Below are some keywords used in Support Vector Regression:

 Kernel: a function used to map lower-dimensional data into higher-dimensional data.

 Hyperplane: in general SVM, it is a separation line between two classes, but in SVR it is the line which helps predict the continuous variable and covers most of the data points.

 Boundary lines: the two lines apart from the hyperplane which create a margin for the data points.

 Support vectors: the data points which are nearest to the hyperplane and of the opposite class.

DC - 65
Support Vector Regression
 In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of data points is covered in that margin.

 The main goal of SVR is to include the maximum number of data points within the boundary lines, and the hyperplane (best-fit line) must contain a maximum number of data points.

 Consider the image below: the blue line is called the hyperplane, and the other two lines are known as boundary lines.
DC - 66
Decision Tree Regression
A decision tree is a supervised learning algorithm which can be used for solving both classification and regression problems.

It can solve problems for both categorical and numerical data.

Decision tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents a result of the test, and each leaf node represents the final decision or result.

A decision tree is constructed starting from the root node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided, themselves becoming parent nodes of their children. Consider the image below:
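A small illustration of that tree structure, assuming a toy one-feature dataset and a max_depth of 2 chosen only to keep the printed tree readable:

  import numpy as np
  from sklearn.tree import DecisionTreeRegressor, export_text

  X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
  y = np.array([1.1, 1.0, 1.2, 5.0, 5.1, 4.9, 9.0, 9.2])

  tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
  print(export_text(tree, feature_names=["x"]))  # root, internal tests, leaf values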

DC - 67
Decision Tree Regression

 Image showing an example of decision tree regression; here, the model is trying to predict the choice of a person between a sports car and a luxury car.

DC - 68
Random Forest Regression

DC - 69
Random Forest Regression
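The slide content here is graphical; as a hedged supplement, a random forest regressor averages the predictions of many decision trees, each fit on a bootstrap sample of the data (the toy dataset and n_estimators are assumptions):

  import numpy as np
  from sklearn.ensemble import RandomForestRegressor

  rng = np.random.default_rng(5)
  X = rng.uniform(0, 5, (100, 1))
  y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 100)

  forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
  print("prediction at x = 2.5:", forest.predict([[2.5]])[0])
  print("averaged over", len(forest.estimators_), "trees")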

DC - 70
