Module 1

The document explains the concept of learning, particularly focusing on machine learning, which is the ability of a computer program to improve its performance on tasks through experience. It outlines different types of learning, such as supervised, unsupervised, semi-supervised, and reinforcement learning, along with their applications and workflows. Additionally, it discusses the importance of data, algorithms, and computing power in the context of machine learning.

What is Learning?

Learning
The ability to improve behavior based on experience.

1
What is Learning?

Identify fruits in the image

3
What is Learning?

Remember the names of the fruit

Ackee, Mangosteen, Rambutan, Horned Melon, Finger Lime

4
What is Learning?

Identify Fruits in the image

5
What is Learning?

How many fruit names were predicted correctly?

              Student 1   Student 2   Student 3
Correct           2           3           4
Incorrect         3           2           1
Performance      40%         60%         80%

6
What is Machine Learning?

What we are expecting from the machine:

Human: "Hey Machine! Can you learn yourself?"
Machine: "Sure, Human! Feed me data."

7
Machine Learning

A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.

8
Machine Learning to a Layman

You had a task: Identify fruit names

You experienced: Remembered the name and image of each fruit
Performance: How many fruit names you correctly identified

9
What is Machine Learning?
Machine Learning
Design of algorithms that
• Learn from data or build models using that data
• The learned model can be used to
  • Detect patterns/structures/themes/trends etc. in the data
  • Make predictions about future data and make decisions
• Modern ML algorithms are heavily "data-driven"
  • No need to pre-define and hard-code all the rules (usually
    infeasible/impossible anyway)
  • The rules are not "static"; they can adapt as the ML algorithm
    ingests more and more data

10
Machine Learning vs Programming

11
When to Use Machine Learning?
• Human expertise is absent
  Example: navigating on Mars
• Humans are unable to explain their expertise
  Example: vision, speech, language
• Requirements and data change over time
  Example: tracking, biometrics, personalized
  fingerprint recognition
• The problem or the data size is just too large

Example: Web Search


• When not to use it: If you can precisely/mathematically describe
how to solve the task. Just program it.
12
Why Machine Learning?
• The term "Machine Learning" was first coined in 1959
• A computer model based on a neural network was created in 1943

13
Why Machine Learning?

DATA | OPTIMIZED ALGORITHMS | COMPUTING POWER

15
Why Machine Learning?

DATA
• Structured Data
• Unstructured Data
• "More than 300 million photos get uploaded per day.
  Every minute there are 510,000 comments posted
  and 293,000 statuses updated."
• "Over 2.5 quintillion bytes of data are created
  every single day, and it's only going to grow from
  there. By 2020, an estimated 1.7 MB of data was
  created every second for every person on earth."
• More than 80% of data is unstructured.

16
Why Machine Learning?

OPTIMIZED ALGORITHMS
Python
Libraries: Pandas, Numpy, Sklearn, Keras,
TensorFlow, PyTorch, Theano

Less programming, more science!

17
Why Machine Learning?

COMPUTING POWER
Powerful CPUs
GPUs
Parallel and Distributed Computing

18
Jargon Difference!

[Venn diagram: relationship among Artificial Intelligence, Machine Learning, Deep Learning, and Data Science]

19
Jargon Difference!

20
Types of Learning
• Supervised (inductive) learning: training data includes desired
  outputs.
• Unsupervised learning: training data does not include desired
  outputs; find hidden/interesting structure in the data.
• Semi-supervised learning: training data includes a few desired
  outputs.
• Reinforcement learning: the learner interacts with the world via
  "actions" and tries to find an optimal policy of behavior with
  respect to the "rewards" it receives from the environment.

21
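As a concrete illustration of the first two paradigms, here is a minimal sketch (assuming scikit-learn is available; the tiny dataset is invented purely for illustration):

# Supervised vs. unsupervised learning in a few lines (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])          # desired outputs (labels)

# Supervised: training data includes the desired outputs y.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.2, 1.9]]))    # predicted label for a new input

# Unsupervised: no labels; find hidden grouping structure in X.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                   # cluster assignment for each point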
Types of Learning

22
A Typical Supervised Learning Workflow (for Classification)

23
A Typical Supervised Learning Workflow (for Classification)

24
A Typical Un-supervised Learning Workflow (for Clustering)

25
A Typical Un-supervised Learning Workflow (for Clustering)

Note: Unsupervised learning too can have (and often has) a "test" phase.

E.g., in this case, given a new cat/dog image, predict which of the
two clusters it belongs to.

It can be done by assigning the image to the cluster with the
closer centroid.
26
A Typical Reinforcement Learning Workflow

Agent's goal is to learn a policy for some task.

Agent does the following repeatedly:
• Senses/observes the environment
• Takes an action based on its current policy
• Receives a reward for that action
• Updates its policy

There IS supervision; it is not explicit (as in Supervised Learning)
but rather implicit (feedback based).

27
Geometric View of Some Basic ML Problems
Regression

Supervised Learning:
Learn a line/curve (the “model”)
using training data consisting of
Input-output pairs (each output is a
real-valued number)

Use it to predict the outputs for new


“test” inputs

Classification

Supervised Learning: Learn a
linear/nonlinear separator (the
"model") using training data
consisting of input-output pairs.

Use it to predict the labels for new
"test" inputs.

[Figures: two-class (binary) linear classification, multi-class linear
classification, two-class (binary) nonlinear classification, and
multi-class nonlinear classification]
28
Geometric View of Some Basic ML Problems

Clustering

Unsupervised Learning: Learn the


grouping structure for a given set
of unlabeled inputs

Dimensionality Reduction

Unsupervised Learning: Learn a
low-dimensional representation for
a given set of high-dimensional
inputs.

Note: DR also comes in supervised
flavors (supervised DR).

[Figures: two-dim to one-dim linear projection; three-dim to two-dim
nonlinear projection (a.k.a. manifold learning)]
29
Machine Learning = Probability Density Estimation
Supervised Learning (“predict y given x ”) can be thought of as estimating p(y|x )

[Diagram: labeled training data (images tagged "dog"/"cat") → Supervised ML → p(class|image)]

Unsupervised Learning ("model x") can also be thought of as estimating p(x)

[Diagram: unlabeled training data → Unsupervised ML → p(image)]

Harder for unsupervised learning because there is no supervision y.

Other ML paradigms (e.g., Reinforcement Learning) can also be thought of as learning probability densities.

30
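In symbols (standard maximum-likelihood notation; this formulation is a standard rendering of the slide's idea, not text from the deck):

Supervised:   given pairs (x1, y1), …, (xn, yn), choose model parameters θ to maximize Σᵢ log p(yᵢ | xᵢ; θ)
Unsupervised: given inputs x1, …, xn, choose model parameters θ to maximize Σᵢ log p(xᵢ; θ)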
Machine Learning = Function Approximation
Supervised Learning ("predict y given x") can be thought of as learning a function that maps x to y

[Diagram: labeled training data (images tagged "dog"/"cat") → Supervised ML → f: image → class]

Unsupervised Learning ("model x") can also be thought of as learning a function that maps x to some useful
latent representation of x

[Diagram: unlabeled training data → Unsupervised ML → f: image → latent
representation of the image (e.g., cluster id or compressed version)]

Harder for unsupervised learning because there is no supervision y.
Other ML paradigms (e.g., Reinforcement Learning) can also be thought of as doing function approximation.
31
Machine Learning in the real-world
Broadly applicable in many domains (e.g., internet, robotics, healthcare and
biology, computer vision, NLP, databases, computer systems, finance, etc.)

32
Machine Learning helps Computer Vision

33
Machine Learning helps Computer Vision

34
Machine Learning helps NLP

35
Machine Learning helps NLP

36
Machine Learning helps NLP

37
Machine Learning helps NLP- Search and Info Retrieval

38
Machine Learning meets Speech Processing
ML algorithms can learn to translate speech in real time

39
Machine Learning helps Chemistry
ML algorithms can understand properties of molecules and learn to
synthesize new molecules

40
Machine Learning helps Chemistry
ML algorithms can "read" databases of materials and recreate the
Periodic Table within hours
“Recreated” Periodic Table

41
Machine Learning helps in Biology, E-commerce

42
Inductive Learning

43
Classification Learning
Task T
Input
• A set of instances d1, d2, ...., dn
• An instance has a set of features
• We can represent an instance as a vector
• d = <x1,x2, x3, ...., xn>

Output
• A set of predictions y1, y2, y3, ...., yc
• One of the fixed set of constant values
• Eg: {+1, -1}

Performance P - How accurately model predicts the output

Experience E - A set of labeled examples (x, y) where y is the true
label of x.

44
Need for Inductive Learning
• There are basically two methods for knowledge extraction: firstly
  from domain experts and secondly with machine learning.
• For a very large amount of data, domain experts are not very
  useful and reliable.
• So we move towards the machine learning approach for this
  work.

45
Inductive Learning
• Also called Deterministic Supervised Learning
• In this, first an input x (a verified value) is given to a function f, and
  the output is f(x).
• Then we can give a different set of inputs (raw inputs) to the same
  function f, and verify the outputs f(x).
• By using the outputs we generate (learn) the rules.

46
Inductive Learning
• Inductive learning, also known as discovery learning, is a process
where the learner discovers rules by observing examples.
• We can often work out rules for ourselves by observing
examples. If there is a pattern; then record it.
• We then apply the rule in different situations to see if it works.
• With inductive language learning, tasks are designed specifically
to guide the learner and assist them in discovering a rule.

47
Inductive Learning
• Inductive learning or “Prediction”:
– Given examples of a function (X, F(X))
– Predict function F(X) for new examples X
This is the function which we are trying to learn.

• Classification
F(X) = Discrete
• Regression
F(X) = Continuous
• Probability estimation
F(X) = Probability(X):

Why is it called Inductive learning?


We are given some data and we are trying to do induction to
identify a function which can explain that data.
48
Basic Terminologies
Types of features
1. Categorical - It will have a finite number of categories or
   classes.
   Example
   – Gender: Male, Female
   – Age group: (0-12) children, (13-19) teenagers, (20-30)
     adults, (31-60) working professionals, above 60 senior
     citizens
   – Blood group: A, B, AB, O, etc.

2. Integer Valued
   Example: Number of words in a text

3. Continuous - Those which can take an infinite number of
   values.
   Example: Age, height, weight, price, etc.

49
Basic Terminologies
• Feature: Distinct characters that can be used to describe each
object in a quantitative manner.
• Feature Vector: n-dimensional vector of numerical features
that represent some object.
• Feature Space:
– Suppose we have two features x1 and x2
– Two features will define two dimensional feature space
– In general n-features will define n-dimensional feature space.

• Instance Space X: Set of all possible objects that can be
  described by the features.
• Target Function: The function we are trying to learn.
• Training data set:
  – Collection of examples observed by the learning algorithm
  – It is used to discover a potentially predictive relationship
50
Basic Terminologies
Feature Space:
Properties that describe the problem

[Scatter plot over a two-dimensional feature space]
51
Basic Terminologies

Example:
<0.5, 2.8, +>

[Scatter plot of positive (+) and negative (−) training examples in the
two-dimensional feature space]
52
Basic Terminologies
Possible Functions
1. Slanted line with 2 parameters
   y = mx + c
   -> We need to define both intercept and slope.

2. Polynomial
   Quadratic function: ax² + bx + c
   a, b, c - 3 parameters

3. Complex Function

Note: We are interested in a function which not only fits the
training data but also works well with future or test data.

53
Basic Terminologies
Representation of Function
• When we talk of the representation of these hypotheses (or
  functions), we have two things: one is the features and the
  other is the function class.

• Function / model / hypothesis all mean the same thing.

54
Basic Terminologies
Representation

55
Basic Terminologies
Representation

56
Basic Terminologies

Hypothesis:
Function for labeling examples

[Scatter plot as before, with some unlabeled points marked "?"; the
hypothesis must assign Label: + or Label: − to them]
57
Hypothesis Space
• There could be many possible functions that explain the
  given training data.
• A hypothesis space is the set of legal hypotheses.
• There could be multiple legal hypotheses; the set of
  all such legal hypotheses is called the hypothesis space.
• Eg: class 1: +ve, class 2: -ve

 Our objective is to come up with the best hypothesis.
 We denote the hypothesis space by H.
 The output of the learning algorithm will be h, where h ∈ H.

 One way to think about supervised machine learning is as a
device that explores a "Hypothesis Space".

58
Hypothesis Space

Hypothesis Space:
Set of legal hypotheses

[Scatter plot of the same + / − training examples; many different
separators are consistent with them]
59
Hypothesis Space
Target Function
It is the function which maps every input x to an output y; we
denote it by f.

 Our objective is to come up with a hypothesis h ∈ H that
approximates f based on the training data.

Input and output of a learning algorithm
 Input - Training set, S
 Output - Hypothesis, h where h ∈ H.

60
Hypothesis Space
 If there are 2 Boolean input features then there are 2² = 4 possible instances.
 If there are 3 Boolean input features then there are 2³ = 8 possible instances.
 If there are n Boolean input features then there are 2ⁿ possible instances.

 If there are 2 Boolean input features then there are 2⁴ = 16 possible Boolean
functions.
 If there are 3 Boolean input features then there are 2⁸ = 256 possible Boolean
functions.

 For n variables, how many Boolean functions are possible?
n variables → 2^(2ⁿ) Boolean functions

 When there are no variables, there are two expressions:
False = 0, True = 1

61
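A quick way to sanity-check these counts is to enumerate instances as truth-table rows; a minimal Python sketch (illustrative, not from the slides):

# Count instances and Boolean functions over n Boolean features.
# Each Boolean function is one truth table: one output bit per instance,
# so there are 2^(2^n) distinct functions.
from itertools import product

def count_boolean(n):
    instances = list(product([0, 1], repeat=n))   # all 2^n instances
    return len(instances), 2 ** len(instances)    # instances, functions

for n in [2, 3]:
    inst, funcs = count_boolean(n)
    print(f"n={n}: {inst} instances, {funcs} Boolean functions")
# n=2: 4 instances, 16 Boolean functions
# n=3: 8 instances, 256 Boolean functions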
Inductive Learning In General
 Inducing a general function from training examples.
 Constructs a hypothesis h to agree with all the training
examples.
 A hypothesis is consistent if it agrees (works well) with all
training examples.
 A hypothesis is said to be generalized if it correctly predicts the
value of y for new examples.

Inductive learning hypothesis (Rule)
 Any hypothesis that approximates the target function well
over a sufficiently large set of training examples will also
approximate the target function well over other unobserved
examples:
 If h works well on a sufficiently large set of training examples
 Then it works well on unobserved data
62
Supervised Learning
Given: <x, f(x)> for some unknown function f
Learn: A hypothesis h ∈ H that approximates f

Example Applications:
• Disease diagnosis
x: Properties of patient (e.g., symptoms, lab test results)
f(x): Predict disease
• Automated steering
x: Bitmap picture of road in front of car
f(x): Degrees to turn the steering wheel
• Credit risk assessment
x: Customer credit history and proposed purchase
f(x): Approve purchase or not
63
64
Learning = Representation + Evaluation + Optimization
• Combinations of just three elements:

Representation     Evaluation         Optimization
Instances          Accuracy           Greedy search
Hyperplanes        Precision/Recall   Branch & bound
Decision trees     Squared error      Gradient descent
Sets of rules      Likelihood         Quasi-Newton
Neural networks    Posterior prob.    Linear progr.
Graphical models   Margin             Quadratic progr.
Etc.               Etc.               Etc.

65
Inductive BIAS
 As we can see, the hypothesis space is very large. It is not possible
to look at every hypothesis individually to choose the best
hypothesis.
 So we put some restrictions on the hypotheses.
 If we restrict the hypotheses, it reflects a bias of the learning
algorithm.

 Bias could be of two types:
1. Restriction bias: Limits the hypothesis space.
2. Preference bias: Imposes an ordering on the hypothesis space.

 Example of restriction bias: We may say that we are looking for a
linear function or we are looking for a 3rd degree polynomial.
 Example of preference bias: We may say that we are considering all
possible polynomials but we will prefer a polynomial of lower
degree.

66
Generalization & Error
Coming up with a general function from training examples
 When we do generalization some errors get introduced.
 There are two components of generalization error
 Bias error
 Variance error

67
Generalization & Error
Bias
 This is the error introduced due to simplifying assumptions
made by a model
 Simplified assumptions limit the model's capacity to learn.

Low Bias
 Suggests less assumptions about the form of the target
function.

High Bias
 Suggests more assumptions about the form of the target
function.

68
Generalization & Error
Variance
 Variance tells how much a random variable differs
from its expected value.
 If the machine learning model performs well with the training
dataset but does not perform well with the test dataset, then
high variance is present.

Low variance.
 Suggests small change to the estimated models with Changes
to the training dataset.

High variance.
 Suggests large changes, to the estimated models with Changes
to the training dataset.

69
Bias Variance Trade-off

Note: This is a regression problem (not classes).
Data is divided into a train set and a test set.

[Scatter plot: train data and test data points]

70
Bias Variance Trade-off

Complex Model
No error on training data
What about the test set?

[Plot: an overfit model passing through every training point, shown with train and test data]

71
Bias Variance Trade-off

Complex Model
No error on training data
What about the test set?

The model performed well on train
data but the test error is high.

[Plot: overfit model shown with train and test data]

72
Bias Variance Trade-off

Train error: Low
Test error: High
→ High Variance

The error difference between the train and test sets is large.
Hence we don't want an overfitting model.

Overfit → High Variance

[Plot: overfit model shown with train and test data]

74
Bias Variance Trade-off

Simple Model
More error on training data
What about the test set?

[Plot: an underfit model, shown with train and test data]

75
Bias Variance Trade-off

Simple Model
More error on training data
What about the test set?

The model performs well neither on the train data nor on the test data.

[Plot: underfit model shown with train and test data]

76
Bias Variance Trade-off

Train error: High
Test error: High
→ High Bias

The error difference between the train and test sets is small.
Hence we don't want an underfit model.

Underfit → High Bias

[Plot: underfit model shown with train and test data]

78
Bias Variance Trade-off

[Plot: train and test data shown with both an overfit model and an underfit model]

79
Bias Variance Trade-off
              High             Low
Train Error   High Bias        Low Bias
Test Error    High Variance    Low Variance

[Plot: train and test data]

80
Over-fitting
• Over-fitting & under-fitting are the two main errors/problems in
the machine learning model, which cause poor performance in
Machine Learning.
• Over-fitting occurs when the model fits more data than required,
and it tries to capture each and every data point fed to it.
Hence it starts capturing noise and inaccurate data from the
dataset, which degrades the performance of the model.
• An over-fitted model doesn't perform accurately with the
test/unseen dataset and can’t generalize well.
• An over-fitted model is said to have low bias and high variance.

81
How to avoid Overfitting

• Using cross-validation
• Using Regularization techniques
• Implementing Ensemble Techniques.
• Picking a less parameterized/complex model
• Training the model with sufficient data
• Removing features
• Early stopping the training

82
Under-fitting
• Model cannot create a mapping between the input and the target
variable
• Under-observing the features leads to a higher error in the
training and unseen data samples.
• Under-fitting becomes obvious when the model is too simple and
cannot create a relationship between the input and the output.

83
How to avoid Under-fitting

• Preprocessing the data to reduce noise in data


• More training to the model
• Increasing the number of features in the dataset
• Increasing the model complexity
• Increasing the training time of the model to get
better results.

84
85
Over-fitting
Over-fitting during training

[Plot: model error vs. number of iterations - training error keeps
decreasing while the error on new data eventually starts rising]

86
Regularization and Over-fitting
Adding a regularizer:

[Plot: model error vs. number of iterations, with and without a regularizer]

87
Cross-Validation
• Cross-validation involves partitioning your data into
distinct training and test subsets.

• The test set should never be used to train the model.

• The test set is then used to evaluate the model after


training.

88
K-fold Cross-Validation
• To get more accurate estimates of performance you
can do this k times.
• Break the data into k equal-sized subsets Ai
• For each i in 1,…,k do:
– Train a model on all the other folds A1,…, Ai-1, Ai+1,…, Ak
– Test the model on Ai
• Compute the average performance of the k runs

89
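A minimal sketch of this procedure (illustrative Python; model_factory is a hypothetical helper that returns a fresh scikit-learn-style estimator each call):

# K-fold cross-validation: average test performance over k train/test splits.
import numpy as np

def k_fold_cv(model_factory, X, y, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)             # k roughly equal subsets A_i
    scores = []
    for i in range(k):
        test_idx = folds[i]                    # fold A_i is the test set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = model_factory()                # fresh model for each run
        model.fit(X[train_idx], y[train_idx])  # train on all other folds
        scores.append(model.score(X[test_idx], y[test_idx]))
    return np.mean(scores)                     # average of the k runs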
5-fold Cross-Validation

90
Occam's Razor

• A classical example of bias

• It says the simplest consistent hypothesis about the target function is
  actually the best.

91
Learning as a search

• Learning can be viewed as the task of searching through a large
  space of hypotheses implicitly defined by the hypothesis
  representation.
• The goal of this search is to find the hypothesis that best fits the
  training examples and generalizes well to unseen data.

92
Supervised Learning
Classification Vs. Regression

93
Training Eg.    Input Features                      Output
or Instance     X1    X2    X3    .  .  Xn          Y

I1              a1    a2    a3    .  .  an          Y1
I2              b1    b2    b3    .  .  bn          Y2
I3              c1    c2    c3    .  .  cn          Y3
.
.
Im              P1    P2    P3    .  .  Pn          Ym

Test Input      Z1    Z2    Z3    .  .  Zn          ?? (the model has to predict it)

Classification or Regression?
94
Supervised Learning-
For each input x, the desired output y is given. Here y is the label.

Pre-classified training examples:
Given: (x, y) pairs.
For any unseen value of x, determine the best label y.

Classification - when y is discrete.
Eg: Masked or not
    Red or Black
    Cat, Dog, Horse

Regression - when y is continuous.
Eg: carpet area, location → predict the price of the house.

95
Classification
• Classification is a process of categorizing a given set of data into classes.
• It can be performed on both structured and unstructured data.
• The process starts with predicting the class of given data points. The classes are often
referred to as target, label or categories.

96
Classification Example:-
You are given a collection of emails; determine the spam
and non-spam emails in it.

Here the task is to learn a
function which can classify
and predict whether an email is
spam or non-spam.
The line F2 is better than F1.

97
Regression
• A technique for determining the statistical relationship between two or more variables
where a change in a dependent variable is associated with, and depends on, a change
in one or more independent variables.

• A regression problem is used when the output variable is a real or continuous value,
such as "Salary" or "weight".

98
Regression Example –
• Sales of a product can be predicted by using the relationship
between sales volume and amount of advertising.
• The performance of an employee can be predicted by using
the relationship between performance and aptitude tests.
• The size of a child’s vocabulary can be predicted by using the
relationship between the vocabulary size, the child’s age and
the parents’ educational input.

99
Regression Example – Estimate the price of the house
from the given data.

100
Regression Example – Estimate the price of the house
from the given data.

101
Regression Example – Estimate the price of the house
from the given data.

What would be the price of the medium-size house?

102
Regression Example – Estimate the price of the house
from the given data.

103
Regression Example – Estimate the price of the house
from the given data.

104
Regression Example – Estimate the price of the house
from the given data.

105
Regression Analysis

• Regression analysis is used primarily to model causality and
  provide prediction.
• Predict the values of a dependent (response) variable based
  on values of at least one independent (explanatory) variable.
• Explain the effect of the independent variables on the
  dependent variable.

106
Dependent and Independent Variable
• Independent variables are considered as an input to a system
and may take on different values freely.
• Dependent variables are those values that change as a
consequence of changes in other values in the system.
• Independent variable is also called as predictor or
explanatory variable and is denoted by X
• Dependent variable is also called as response variable and is
denoted by Y.

107
Linear Regression
• The Simplest mathematical relationship between two
variables x and y is a linear relationship
• In a cause and effect relationship, the independent variable is
cause, and the dependent variable is the effect.
• Least squares linear regression is a method for predicting the
value of a dependent variable Y, based on the value of an
independent variable X.

108
Linear Regression
Height(cm) Weight(KG)

120 45.0
127 51.8
140 58.4
134 55.8
179 86.2
122 44.9
166 68.1
149 52.0
180 95.3
171 73.5
155 61.1
178 89.8

109
Linear Regression

110
The first order linear model
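The model equation itself did not survive extraction; the standard first-order linear model (standard notation, consistent with the slope/intercept and error-variable slides that follow) is:

y = β0 + β1·x + ε

where β0 is the y-intercept, β1 is the slope, and ε is a random error term.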

111
Slope & Intercept
Slope:
The slope of a line is the change in y for a one unit increase
in x.

Y-intercept:
It is the height at which the line crosses the vertical axis and
can be obtained by setting x = 0 in the below equation

y = mx + b

112
Error Variable

The inclusion of the random error term ε allows (x, y) to fall either
above the true regression line (when ε > 0) or below the line
(when ε < 0).
113
Basis          Linear Regression              Logistic Regression
Core concept   The data is modeled using a    It models the probability of a
               straight line.                 certain class or event existing,
                                              such as yes or no, win or lose,
                                              sick or healthy, and so on.
Used for       Continuous variable            Categorical variable
Output/        Value of the variable          Probability of occurrence of
Prediction                                    an event
Evaluation     Measured by loss, R-squared,   Accuracy, Precision, Recall,
measures       Adjusted R-squared             F1 Score, ROC curve,
                                              Confusion Matrix, etc.
114
Linear Regression using Least Square Method

115
Linear Regression using Least Square Method

116
Linear Regression using Least Square Method

117
Linear Regression using Least Square Method

118
Linear Regression using Least Square Method

Regression Line Equation is y = mx + b

m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
119
Linear Regression using Least Square Method

120
Linear Regression using Least Square Method

121
Linear Regression using Least Square Method

122
Linear Regression using Least Square Method

123
Linear Regression using Least Square Method

124
Linear Regression using Least Square Method

m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = 6/10 = 0.6

Find the value of b from y = mx + b using the mean values:
b = ȳ − m·x̄ = 4 − 0.6×3 = 2.2
125
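A small sketch reproducing this computation (illustrative Python, using the x, y data shown on the R-squared slides that follow):

# Least squares fit: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², b = ȳ − m·x̄
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
print(m, b)  # 0.6 2.2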
Linear Regression Performance Evaluation

126
R-Squared
• It is used to determine how
well the regression line fits the
data,
• It measures the proportion of
variance in the dependent
variable (y) that is explained
by the independent variables
(x) in a model.

127
R-Squared
• It ranges from 0 to 1, where a higher value indicates a better fit
of the model to the data.
• If R-Squared value is 1 means the model perfectly fits the data,
while an R-Squared value is 0 means the model explains none of
the variability in the data.
• R-squared will always be a non-negative number.

128
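In symbols (standard definition, consistent with the worked example on the following slides):

R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²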
R-Squared

129
R-Squared

130
R-Squared

131
R-Squared
 x   y
 1   2
 2   4
 3   5
 4   4
 5   5
Mean: x̄ = 3, ȳ = 4

Given,
Regression line equation: ŷ = mx + b
m = 0.6, b = 2.2
132
R-Squared
 x   y   ŷ = 0.6x + 2.2   |y − ŷ|   (y − ŷ)²   |y − ȳ|   (y − ȳ)²
 1   2        2.8           0.8       0.64        2          4
 2   4        3.4           0.6       0.36        0          0
 3   5        4.0           1.0       1.00        1          1
 4   4        4.6           0.6       0.36        0          0
 5   5        5.2           0.2       0.04        1          1
                                    Σ = 2.4               Σ = 6

R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)² = 1 − 2.4/6 = 0.6

133
R-Squared

134
R-Squared

135
R-Squared

136
R-Squared

137
R-Squared

138
Standard Error of Estimate (SEE)
• The standard error of the
estimate is a measure of the
variability of the predicted
values around the true
regression line.
• We calculate the distance
between actual and the
estimated/predicted values
which is called as the Error.
• Therefore our task is to
minimize this error.

139
Standard Error of Estimate (SEE)

When n observations are used to estimate
k+1 parameters, we have n − (k+1)
degrees of freedom.

140
Standard Error of Estimate (SEE)

Given,
n = 5, k= 1
141
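The SEE formula did not survive extraction; the standard definition, applied with Σ(y − ŷ)² = 2.4 from the earlier R-squared computation, gives:

SEE = √( Σ(y − ŷ)² / (n − (k+1)) ) = √( 2.4 / (5 − 2) ) ≈ 0.894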
SEE vs. R-Squared
SEE and R-squared are two different measures used in regression
analysis:
• SEE: It is a measure of the variability of the predicted values
around the true regression line. It provides an indication of the
accuracy of the prediction.
• R-squared: It is a statistic that measures the proportion of
variation in the dependent variable that can be explained by
the independent variables. It ranges from 0 to 1.
In summary, the SEE measures the accuracy of the predictions,
while R-squared measures the goodness of fit of the model to
the data.
142
SEE vs. R-Squared

143
Types of Regression
Univariate and Multivariate regression are two types of regression
analysis used in statistics:
• Univariate Regression: It is a type of regression analysis that
involves only one independent variable and one dependent
variable.
• Multivariate Regression: It is a type of regression analysis that
involves multiple independent variables and one dependent
variable.

144
LR Exercise 1
Q. Study the relationship between the monthly sales and the
advertising costs surveyed for different stores as given below.
Find the equation of the straight line that fits the data best.
Determine the R-squared value.

Store   Sales (units)   Advertising Cost
  1        368000            1700
  2        340000            1500
  3        665000            2800
  4        954000            5000
  5        331000            1300
  6        556000            2200
  7        376000            1300

145
LR Exercise 2
Q. Examine the relationship between the age and price for used
cars sold in the last year by a car dealership company. Find the
equation of the straight line that fits the data best. Determine
the R-squared value.

Car age (years)   Price (Lakhs)
      4               6.3
      4               5.8
      5               5.7
      5               4.5
      7               4.5
      7               4.2
      8               4.1
      9               3.1
     10               2.1
     11               2.5
     12               2.2

146
Other Evaluation Measures
Several evaluation measures are commonly used to assess the
performance of a linear regression model.

147
Error Calculation in Linear Regression
1. Mean Absolute Error (MAE):
It is the simplest regression error metric to understand; it can
be calculated as below:
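The formula itself did not survive extraction; the standard definition is:

MAE = (1/n) Σ |yᵢ − ŷᵢ|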

(Note: least squares linear regression, as used in this module, actually finds
the regression coefficients that result in the smallest mean squared error,
not the smallest MAE; MAE-minimizing fits exist but are less common.)

148
Error Calculation in Linear Regression
2. Mean Square Error (MSE):
The mean square error (MSE) is similar to the MAE, but squares the
difference before summing them all instead of using the absolute value.
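In symbols (standard definition):

MSE = (1/n) Σ (yᵢ − ŷᵢ)²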

• MSE will always be bigger than the MAE.


• MSE is one of the most widely used metrics for regression problems.
• The effect of the square term in the MSE equation is most apparent with the presence of
outliers in our data.
• Each residual in MAE contributes proportionally to the total error, while the error
grows quadratically in MSE.
• Outliers in data will contribute to much higher total error in the MSE than the MAE.

149
Error Calculation in Linear Regression

1. Residual Sum Of Squares (RSS)


2. Root Mean Squared Error (RMSE)
• RMSE is the square root of the MSE and is often preferred
because it is in the same unit as the dependent variable.

3. Mean Absolute percentage Error (MAPE)


4. Mean Percentage Error
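Their formulas did not survive extraction; the standard definitions are:

RSS  = Σ (yᵢ − ŷᵢ)²
RMSE = √( (1/n) Σ (yᵢ − ŷᵢ)² )
MAPE = (100/n) Σ |(yᵢ − ŷᵢ) / yᵢ|
MPE  = (100/n) Σ (yᵢ − ŷᵢ) / yᵢ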

150
Gradient Descent in LR

The linear relationship between


2 variables can be represented as
a straight line y = mx + b
“y” is target
“x” is feature
m & b are model parameters

151
Gradient Descent in LR

A general linear equation with multiple features can be written as:

y = b + m1·x1 + m2·x2 + … + mn·xn

xi are features
b, mi are model parameters or coefficients
y is the target variable

152
Gradient Descent in LR
Depending on the values of m and b, multiple lines are possible.

We need to find the values of the parameters b and m for which the
straight line fits the data best.
153
Gradient Descent in LR

• Error for one sample: eᵢ = yᵢ − ŷᵢ

• For n samples we will have the total error Σ eᵢ

• eᵢ can be +ve or −ve; therefore we take its square: Σ eᵢ²

154
The Cost Function of Linear Regression
• The cost function measures how well a machine learning model performs.
• The cost function is the calculation of the error between predicted
  values and actual values, represented as a single real number.
• The cost function of linear regression is the mean squared error:
  J(b, m) = (1/n) Σ (yᵢ − ŷᵢ)²
• After expanding ŷᵢ = b + m·xᵢ in the above equation we get
  J(b, m) = (1/n) Σ (yᵢ − (b + m·xᵢ))²
• Therefore the best fitting model would be the one which
  minimizes the value of the cost function.

155
How to minimize the Cost Function?
• We have established the fact that all the straight lines are just
different combination of model parameters b & m.
• Cost Function is the function of parameters b & m.
• Therefore by changing the values of b & m we can change the
cost function.
• We will keep changing the values of b & m till we find a
combination where cost function is minimized.
• To find the best combination we use Gradient Descent
Algorithm.

156
Gradient Descent in LR

157
Gradient Descent in LR

158
Gradient Descent in LR

159
Gradient Descent in LR

160
Gradient Descent in LR

161
Gradient Descent in LR

162
Gradient Descent in LR

• These iterations of the Gradient Descent algorithm can run multiple
  times depending on the nature of the function, on 'α' (the learning rate),
  and of course on where we start from, 'a1' in this case.

• The same methodology can be used to minimize the cost function (J),
  which is a function of the model parameters b & m, by changing them
  through iterations of the Gradient Descent Algorithm.

163
Gradient Descent in LR

Steps
1. Calculate the slope at the current values of parameters b & m separately.
2. Take a step of size α and update the parameters.
3. Calculate the cost function J with the new (b & m) values.

Repeat multiple times, as in the sketch below.

What happens if we take a larger value of α?
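A minimal sketch of these steps (illustrative Python; the data, learning rate, and iteration count are arbitrary choices):

# Gradient descent for ŷ = b + m·x, minimizing J(b, m) = (1/n) Σ (y − (b + m·x))²
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

b, m = 0.0, 0.0
alpha = 0.01   # learning rate; too large a value overshoots and can diverge
for _ in range(10000):
    y_hat = b + m * x
    db = (-2 / len(x)) * np.sum(y - y_hat)        # ∂J/∂b at current (b, m)
    dm = (-2 / len(x)) * np.sum((y - y_hat) * x)  # ∂J/∂m at current (b, m)
    b -= alpha * db    # step of size alpha downhill
    m -= alpha * dm

print(m, b)  # approaches the least squares solution m = 0.6, b = 2.2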

164
Multivariate Linear Regression

165
Multivariate Linear Regression

• Multivariate linear regression is an extension of simple linear


regression, which models the relationship between a dependent
variable and multiple independent variables.
• In multivariate linear regression, there are multiple independent
variables.

166
Multivariate Linear Regression

167
Multivariate Linear Regression

• The general form of a multivariate linear regression model with k
  independent variables is given by:

  y = β0 + β1·x1 + β2·x2 + … + βk·xk + ε

  Here, β0 is the intercept, β1 … βk are the regression coefficients,
  and ε is the error term.
168
How to use Multivariate Regression Analysis?
The processes involved in multivariate regression analysis include
the selection of features, engineering the features, feature
normalization, selection of loss functions, hypothesis analysis, and
creating a regression model.

1. Selection of features:
It is the most important step in multivariate regression. Also known
as variable selection, this process involves selecting viable variables
to build efficient models.

169
Feature Elimination

170
How to use Multivariate Regression Analysis?
2. Feature Normalization: This involves feature scaling to maintain
streamlined distribution and data ratios. This helps in better data analysis.
The value of all the features can be changed according to the requirement.

3. Selecting Loss function and hypothesis: The loss function is used for
predicting errors. The loss function comes into play when the hypothesis
prediction changes from the actual figures. Here, the hypothesis represents
the value predicted from the feature or variable.

4. Fixing hypothesis parameters: The parameters of the hypothesis are fixed
or set in such a way that they minimize the loss function and improve
prediction.

171
How to use Multivariate Regression Analysis?
5. Reducing the loss function: The loss function is minimized by
generating an algorithm specifically for loss minimization on the
dataset which in turn facilitates the alteration of hypothesis parameters.
Gradient descent is the most commonly used algorithm for loss
minimization.

6. Analyzing the hypothesis function: The function of the hypothesis


needs to be analyzed as it is crucial for predicting the values. After the
function is analyzed, it is then tested on test data.

172
Assumptions in the Multivariate Regression Model
• The dependent and the independent variables have a linear
relationship.
• The independent variables do not have a strong correlation among
themselves.
• The observations of yᵢ are chosen randomly and individually from
  the population.

173
Advantages of Multivariate Regression
• Multivariate regression helps us to study the relationships among
multiple variables in the dataset.
• The correlation between dependent and independent variables
helps in predicting the outcome.
• It is one of the most convenient and popular algorithms used in
machine learning.

174
Disadvantages of Multivariate Regression
• The complexity of multivariate techniques requires complex
mathematical calculations.
• It is not easy to interpret the output of the multivariate regression
model since there are inconsistencies in the loss and error outputs.
• Multivariate regression models cannot be applied to smaller
datasets; they are designed for producing accurate outputs when it
comes to larger datasets.

175
