ML Supervised Regression
Tree of Classification
CLASSIFICATION splits by supervision into INTRINSIC (UNSUPERVISED) and EXTRINSIC (SUPERVISED), and by structure into HIERARCHICAL and NON-HIERARCHICAL (PARTITIONAL).
The machine learning framework
f(image of an apple) = “apple”
f(image of a tomato) = “tomato”
f(image of a cow) = “cow”
The machine learning framework
y = f(x), where y is the output (prediction), f is the prediction function, and x is the image feature.
Steps
Training: training images + training labels → image features → training → learned model
Testing: test image → image features → learned model → prediction
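A minimal sketch of this pipeline in Python, assuming scikit-learn is available; the random arrays and the flatten-pixels feature extractor are toy stand-ins for real images and features:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    train_images = rng.random((100, 8, 8))    # stand-ins for training images
    train_labels = rng.integers(0, 2, 100)    # "correct answer" annotations

    def extract_features(images):
        # Toy feature extractor: flatten raw pixels into vectors.
        return images.reshape(len(images), -1)

    model = LogisticRegression()              # the model to be learned
    model.fit(extract_features(train_images), train_labels)   # training step

    test_image = rng.random((1, 8, 8))
    prediction = model.predict(extract_features(test_image))  # y = f(x)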
Supervised Learning
Features
• Raw pixels
• Histograms
• GIST (global image) descriptors
• …
Recognition task and supervision
• Images in the training set must be annotated with the “correct answer” that the model is expected to produce, e.g. “Contains a motorbike”.
Levels of supervision: unsupervised → “weakly” supervised → fully supervised.
Bias-Variance Trade-off
The perfect classification algorithm
Remember…
How to reduce variance?
Classification
• Assign input vector to one of two or more classes
• Any decision rule divides the input space into decision regions separated by decision boundaries
Supervised Machine Learning
The main goals of supervised learning techniques are:
• Classification
• Regression
Classification
What is a Classification Algorithm?
In classification, a program learns from a given dataset or set of observations and then classifies new observations into one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Consider a diagram with two classes, Class A and Class B: the points in each class have features similar to one another and dissimilar to those of the other class.
Classification
The algorithm that implements classification on a dataset is known as a classifier. There are two types of classification:
• Binary classifier: the problem has exactly two possible outcomes, such as Spam or Not Spam.
• Multi-class classifier: the problem has more than two possible outcomes, such as classifying types of crops or types of music.
Learners of Classification Problems
In classification problems, there are two types of learners:
1. Lazy learners: a lazy learner first stores the training dataset and waits until it receives the test dataset; classification is then done using the most closely related data stored in the training set. Lazy learners take less time in training but more time in prediction. Examples: the K-NN algorithm, case-based reasoning.
2. Eager learners: an eager learner builds a classification model from the training dataset before receiving the test dataset; eager learners take more time in training but less time in prediction. Examples: decision trees, Naïve Bayes, artificial neural networks.
Types of Machine Learning Classification Algorithms
Classification algorithms can be divided into two main categories:
Linear models:
• Logistic Regression
• Support Vector Machines
Non-linear models:
• K-Nearest Neighbours
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
Evaluating a Classification Model
Confusion Matrix:
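A minimal illustration of building a confusion matrix, assuming scikit-learn is available; the label vectors are hypothetical:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # hypothetical ground truth
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # hypothetical predictions

    # For binary labels, rows are actual classes and columns are predicted:
    # [[TN, FP],
    #  [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)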
Accuracy Assessment Model
Precision and Recall
Precision means the percentage of your results that are relevant; recall refers to the percentage of all relevant results correctly classified by your algorithm.
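In terms of the confusion-matrix counts (true positives TP, false positives FP, false negatives FN), these standard definitions read:

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)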
Regression in Machine Learning
Types of Regression
Linear Regression
Linear regression is a statistical regression method used for predictive analysis.
It is one of the simplest regression algorithms and models the relationship between continuous variables.
Linear regression shows a linear relationship between the independent variable (X axis) and the dependent variable (Y axis), hence the name linear regression.
If there is only one input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
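In the notation used later in this deck, simple linear regression fits the equation Y = b0 + b1x (b0 the intercept, b1 the slope), while multiple linear regression fits Y = b0 + b1x1 + b2x2 + ... + bnxn.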
Linear Regression
The relationship between the variables in a linear regression model can be illustrated with an example: predicting an employee's salary on the basis of years of experience.
Linear Regression
Example: Create a relationship model for the data given in the table below to find the relation between the height and weight of students.
Height Weight
151 63
174 81
138 56
186 91
128 47
136 57
179 76
163 72
152 62
131 48
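A minimal least-squares fit of this data in Python (numpy assumed), modelling weight = b0 + b1 * height:

    import numpy as np

    height = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131])
    weight = np.array([ 63,  81,  56,  91,  47,  57,  76,  72,  62,  48])

    # np.polyfit returns the highest-degree coefficient first: [b1, b0].
    b1, b0 = np.polyfit(height, weight, deg=1)
    print(f"weight ~ {b0:.2f} + {b1:.2f} * height")
    print("predicted weight at height 170:", round(b0 + b1 * 170, 1))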
Why Linear Regression is not suitable for classification
In the figure, the output Y is either 1 or 0. A fitted regression line cannot act as such a classifier, because the predicted Y can exceed the [0, 1] range while the task requires a categorical output.
Logistic regression is very similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
Logistic Regression
In logistic regression, instead of fitting a straight regression line, we fit an “S”-shaped logistic function whose output stays between the two extreme values 0 and 1.
The curve of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is overweight or not based on its weight.
Logistic regression can classify observations using different types of data and can easily determine the most effective variables for the classification.
Logistic Regression
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map predicted values to probabilities.
It maps any real value to a value within the range 0 to 1.
The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an “S”-shaped curve; this S-curve is called the sigmoid function or logistic function.
In logistic regression we use a threshold value that decides between the two classes: values above the threshold tend to 1, and values below the threshold tend to 0.
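A minimal sketch of the sigmoid and a thresholding step (numpy assumed; the 0.5 threshold is an illustrative choice):

    import numpy as np

    def sigmoid(z):
        # Maps any real value into the (0, 1) range.
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
    p = sigmoid(z)                      # probabilities between 0 and 1
    labels = (p >= 0.5).astype(int)     # values above the threshold map to 1
    print(p, labels)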
Logistic Regression
Assumptions for Logistic Regression (logit function)
Building Logistic Regression Model (Logit Function)
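In the standard one-predictor formulation, the logit (log-odds) is linear in x, and solving for p recovers the sigmoid:

    logit(p) = log(p / (1 - p)) = b0 + b1x
    p = 1 / (1 + e^-(b0 + b1x))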
Maximum Likelihood Estimation
In linear regression, we used the method of least squares to estimate the regression coefficients.
The likelihood function is used to estimate the probability of observing the data, i.e. the probability that the observed values of the dependent variable can be predicted from the observed values of the independent variables.
It is easier to work with the logarithm of the likelihood function; this function is known as the log-likelihood.
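For binary outcomes y_i with predicted probabilities p_i, the log-likelihood takes the standard form

    l(b) = sum over i of [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

and maximum likelihood estimation chooses the coefficients b that maximize it.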
Logistic Training
A key identity used when fitting the model is that the derivative of the sigmoid output p satisfies dp/dz = p(1 - p).
Types of Logistic Regression
On the basis of the categories of the dependent variable, logistic regression can be classified into three types:
Binomial: in binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
Multinomial: there can be three or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.
Ordinal: there can be three or more possible ordered types of the dependent variable, such as “low”, “medium”, or “high”.
Polynomial Regression
Polynomial regression is a type of regression that models a non-linear dataset using a linear model.
It is similar to multiple linear regression, but it fits a non-linear curve between the values of x and the corresponding conditional values of y.
Suppose a dataset consists of data points arranged in a non-linear fashion; in such a case a straight regression line will not fit those data points well. To cover such data points, we need polynomial regression.
Polynomial Regression
The equation for polynomial regression is derived from the linear regression equation: the linear equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
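A minimal sketch of a degree-3 polynomial fit (numpy assumed; the synthetic data is illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 30)
    y = 1.0 + 2.0 * x + 0.5 * x**3 + rng.normal(0, 1, 30)   # non-linear data

    # Least-squares fit of Y = b0 + b1x + b2x^2 + b3x^3;
    # np.polyfit returns coefficients highest degree first (b3, b2, b1, b0).
    coeffs = np.polyfit(x, y, deg=3)
    y_hat = np.polyval(coeffs, x)
    print("fitted coefficients (b3, b2, b1, b0):", coeffs)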
Support Vector Regression
Support Vector Machine (SVM) is a supervised learning algorithm that can be used for regression as well as classification problems. When it is used for regression problems, it is termed Support Vector Regression (SVR).
Kernel: a function used to map lower-dimensional data into a higher-dimensional space.
Hyperplane: in classification SVM, the separating line between two classes; in SVR, the line that helps predict the continuous variable and covers most of the data points.
Boundary lines: the two lines on either side of the hyperplane that create a margin for the data points.
Support vectors: the data points nearest to the hyperplane and the boundary lines.
Support Vector Regression
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of data points is covered within that margin.
The main goal of SVR is to include the maximum number of data points within the boundary lines, with the hyperplane (best-fit line) covering as many data points as possible.
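A minimal SVR sketch, assuming scikit-learn is available; the sine data is a toy example, and epsilon sets the width of the margin between the boundary lines:

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

    # The RBF kernel implicitly maps the data to a higher-dimensional space.
    model = SVR(kernel="rbf", epsilon=0.1)
    model.fit(X, y)
    print("prediction at x=2.5:", model.predict([[2.5]]))
    print("number of support vectors:", len(model.support_))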
Decision Tree Regression
A decision tree is a supervised learning algorithm that can be used for solving both classification and regression problems.
It can solve problems for both categorical and numerical data.
Decision tree regression builds a tree-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents the final decision or result.
A decision tree is constructed starting from the root node (the whole dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are divided further and themselves become parent nodes of their own children.
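A minimal sketch with scikit-learn's DecisionTreeRegressor (toy data; max_depth is an illustrative choice):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)
    y = np.sin(X).ravel()

    # Each internal node tests a threshold on x; each leaf stores the
    # predicted value for its region of the input space.
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, y)
    print(tree.predict([[2.5]]))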
Random Forest Regression
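Random forest regression aggregates many decision trees trained on bootstrap samples and averages their predictions. A minimal sketch, assuming scikit-learn is available (toy data, illustrative parameters):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 5, (60, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 60)

    # Average of 100 trees, each fit on a bootstrap sample of the data.
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(X, y)
    print(forest.predict([[2.5]]))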