
LINEAR REGRESSION

Nehal Khosla, Priyanshu Parida

National Institute of Science Education and Research

January 30, 2023


PART I: LINEAR REGRESSION: PROBLEM & SOLUTION

1 Regression
2 Linear Regression: The Problem
  2.1 Introduction
  2.2 Regression Line
  2.3 Types
  2.4 Mathematical Representation
  2.5 Loss Function: Mean Squared Error
3 Linear Regression: The Solution
  3.1 Solution
  3.2 Quality of Fit

PART II: LINEAR REGRESSION: THE INDUCTIVE BIAS

1 Inductive Bias
  1.1 A List
  1.2 Choice of Loss Function

PART III: LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS

1 Applications and Shortcomings

Part I

LINEAR REGRESSION: THE PROBLEM
REGRESSION

▶ Regression is a statistical method that estimates the strength and nature of the relationship
between a dependent variable and one or more independent variables.
▶ It does so by finding a curve that minimizes the error between the actual and predicted values of
the dependent variable on the training data set.
▶ For a proper interpretation of regression, several assumptions (the inductive bias) about the data
and the model must hold.
▶ Linear regression is one of the most common forms of this method. It establishes a linear
relationship between the variables.

LINEAR REGRESSION
INTRODUCTION

▶ Linear regression is a supervised machine learning algorithm.

▶ The model tries to find a best-fit line that establishes a linear relationship between the dependent
(y) and independent (x) variables.
▶ The model then uses this fit to predict the appropriate y-values for unseen x-values.
▶ The best-fit line is achieved by minimizing the error between predicted and actual values, i.e. by
minimizing the loss function, as the sketch below illustrates.
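
As a concrete illustration, here is a minimal sketch of this fit-then-predict workflow (assuming NumPy is available; the data points are made up):

import numpy as np

# Made-up training data: y is roughly 2x + 1 plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Fit a degree-1 polynomial (a straight line) by least squares.
a, b = np.polyfit(x, y, 1)

# Use the fitted line to predict y for an unseen x-value.
x_new = 5.0
y_pred = a * x_new + b
print(f"slope={a:.2f}, intercept={b:.2f}, prediction at x=5: {y_pred:.2f}")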

LINEAR REGRESSION
REGRESSION LINE

The line showing the linear relationship between the dependent and independent variable is called
a regression line. An example of a regression line is shown below:

Figure: Regression line for log R (dependent variable) vs log d (independent variable).

The regression line may be positive, wherein the dependent variable increases as the independent
variable increases; or it may be negative, wherein the dependent variable decreases as the
independent variable increases (as in the figure above).

LINEAR REGRESSION: THE PROBLEM
TYPES

Linear regression may be classified further into the following two types:
▶ Simple Linear Regression: Assumes a linear relationship between a single independent
variable and a dependent variable.
▶ Multiple Linear Regression: Assumes a linear relationship between two or more independent
variables and a dependent variable.

LINEAR REGRESSION: THE PROBLEM
MATHEMATICAL REPRESENTATION

Once a linear relationship has been determined by the algorithm, the general form of each model
may be represented as follows:
▶ Simple Linear Regression
y = ax + b + u
▶ Multiple Linear Regression
y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + b + u
where:
y = dependent variable
x (x_i) = independent variable(s)
a (a_i) = slope(s) of the variable(s)
b = the y-intercept
u = the regression residual/error term
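
To make this form concrete, here is a small sketch (a hypothetical illustration, not from the slides) of a multiple-regression prediction written as a dot product; the residual u is whatever the model fails to capture, so it does not appear in the prediction:

import numpy as np

def predict(x, a, b):
    # Linear model prediction: a_1*x_1 + a_2*x_2 + ... + b
    return np.dot(a, x) + b

a = np.array([2.0, -0.5])   # illustrative slopes a_1, a_2
b = 1.0                     # y-intercept
x = np.array([3.0, 4.0])    # one observation of the independent variables
print(predict(x, a, b))     # 2*3 - 0.5*4 + 1 = 5.0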

LINEAR REGRESSION: THE PROBLEM
LOSS FUNCTION: MEAN SQUARED ERROR

▶ The regression line is achieved by minimizing the mean squared error (the loss function) over
all points in the training set. The loss function is given as:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - f(x_i))^2

where f(x) = a_1 x_1 + a_2 x_2 + \dots + b
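
A minimal sketch of computing this loss (assuming NumPy; the values are made up):

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: the average of the squared residuals.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.1, 2.9, 5.2])
y_pred = np.array([1.0, 3.0, 5.0])
print(mse(y_true, y_pred))  # (0.1^2 + 0.1^2 + 0.2^2) / 3 = 0.02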

LINEAR REGRESSION: THE SOLUTION
SOLUTION

The best-fit line may be found in the following two manners:
▶ Closed-form (exact) solution:
• It solves the problem in terms of simple functions and matrix operations.
• The closed-form solution for linear regression is:

B = (X'X)^{-1} X'Y

where B = matrix of regression parameters
X = matrix of x values
X' = transpose of X
Y = matrix of y values
• Although this method gives an exact model, it is computationally expensive when the number
of dimensions grows, since it requires inverting a matrix whose size scales with the number of
features.
▶ Gradient Descent:
• It is an iterative optimization algorithm.
• It minimizes the MSE by repeatedly stepping against the gradient of the loss function.
A sketch of both approaches is given below.
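
Here is a minimal sketch of both approaches (assuming NumPy; the data and the learning rate are made up for illustration):

import numpy as np

# Made-up data: y is roughly 3x + 2 plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 11.1, 13.8])

# Closed form: B = (X'X)^{-1} X'Y, with a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])
B = np.linalg.inv(X.T @ X) @ X.T @ y
print("closed form:      slope=%.3f intercept=%.3f" % (B[0], B[1]))

# Gradient descent: iteratively step against the gradient of the MSE.
a, b = 0.0, 0.0   # initial parameters
lr = 0.02         # learning rate, hand-picked for this toy data
for _ in range(5000):
    y_pred = a * x + b
    grad_a = -2 * np.mean((y - y_pred) * x)  # dMSE/da
    grad_b = -2 * np.mean(y - y_pred)        # dMSE/db
    a -= lr * grad_a
    b -= lr * grad_b
print("gradient descent: slope=%.3f intercept=%.3f" % (a, b))

Both print (nearly) the same parameters: the closed form solves the problem exactly in one step, while gradient descent approaches the same answer iteratively.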

LINEAR REGRESSION: THE SOLUTION
QUALITY OF FIT

▶ The goodness of the fit achieved measures how strongly the variables are linearly correlated.
▶ The goodness of fit may be calculated using the Pearson correlation coefficient, which is given
by:

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}

▶ The closer |r| is to 1, the stronger the linear relationship and the better the fit.
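
A short sketch of computing r directly from this formula (assuming NumPy; the data is illustrative):

import numpy as np

def pearson_r(x, y):
    # Pearson correlation coefficient between two 1-D arrays.
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 11.1, 13.8])
print(pearson_r(x, y))          # close to 1: a nearly perfect linear fit
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in gives the same value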

Part II

LINEAR REGRESSION: THE INDUCTIVE BIAS

INDUCTIVE BIAS
A LIST

Linear regression makes the following assumptions, or inductive biases:

▶ Linearity: The assumption that the dependent and independent variables are linearly related.
▶ Homoscedasticity: The assumption that the variance of the error term is the same for all points.
▶ The assumption that MSE is the most appropriate loss function for linear regression.

CHOICE OF LOSS FUNCTION

Let us analyse some candidate loss functions to justify the choice of MSE as an appropriate loss function.
▶ L1 = (y − f(x)): This loss function gives both positive and negative values, which cancel out to
give a near-zero total error on large data sets.
▶ L2 = |y − f(x)|: Although errors do not cancel out here, outliers are penalised no more heavily
than typical points.
▶ L3 = (y − f(x))^2: In this case, the errors do not cancel out, and outliers are penalised more
heavily, giving a more appropriate regression line.
Hence, MSE is an appropriate choice of loss function, as the sketch below illustrates.
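
A small sketch comparing the three candidates (assuming NumPy; the residuals are made up, with two large outliers):

import numpy as np

# Made-up residuals y - f(x): mostly small, plus two large outliers.
residuals = np.array([1.0, -1.0, 0.5, -0.5, 6.0, -6.0])

print("L1 (signed):  ", np.mean(residuals))          # 0.0: errors cancel out
print("L2 (absolute):", np.mean(np.abs(residuals)))  # 2.5: outliers weighted linearly
print("L3 (squared): ", np.mean(residuals**2))       # ~12.4: outliers dominate the loss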

Part III

LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS

APPLICATIONS AND SHORTCOMINGS

▶ Linear regression finds applications in several fields, such as market analysis, financial analysis,
environmental health, and medicine.
▶ However, it leaves something to be desired. A linear correlation does not indicate causation,
i.e. a connection between two variables does not imply that one causes the other.
▶ Linear regression is prone to noise and overfitting.
▶ It is prone to multicollinearity, i.e. the occurrence of correlation between two or more independent
variables. This reduces the statistical significance of an independent variable; a quick check for it is
sketched below.
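
A brief sketch of spotting multicollinearity through pairwise feature correlations (assuming NumPy; the feature matrix is fabricated):

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)  # nearly a multiple of x1
x3 = rng.normal(size=100)                      # independent of the others

# Pairwise correlations between the independent variables:
# an off-diagonal entry close to +/-1 flags multicollinearity (here x1 vs x2).
print(np.round(np.corrcoef([x1, x2, x3]), 2))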

