DMJAP LinearRegression 3

The document provides an overview of linear regression, including its definition, historical context, and mathematical derivation for both single and multiple variable regression. It discusses the regression model for prediction, goodness of fit, and includes numerical examples to illustrate the concepts. The document also covers the calculation of regression coefficients and R-squared values to assess model accuracy.


Linear Regression

DR. MUHAMMED JAMSHED ALAM PATWARY

POSTDOCTORAL FELLOW, NOTTINGHAM TRENT UNIVERSITY, ENGLAND
&
ASSOCIATE PROFESSOR,
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING,
INTERNATIONAL ISLAMIC UNIVERSITY CHITTAGONG, BANGLADESH
WEB: HTTPS://SITES.GOOGLE.COM/VIEW/DRJAMSHED
Topics
• Introduction to the concept of regression.
• Manual fitting of a regression line.
• Mathematical analysis of single-variable regression with a numerical example.
• Regression goodness of fit (R² value).
• Mathematical analysis of multiple-variable regression with a numerical example.
What is Regression?
The probable movement of one variable in terms of other variables is called regression.
In other words, regression is the statistical technique by which we can estimate the unknown value of one variable (the dependent variable) from the known value of another variable (the independent variable).
Example: the amount of paddy produced (y) depends on the amount of rainfall (x). Here x is the independent variable and y is the dependent variable.
Historical Note
• The term "regression" was introduced by the famous biometrician Sir Francis Galton (1822-1911) in 1877.
• He used it to explain the relationship between the heights of fathers and their sons.
Linear Regression with a Single Variable
Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
The linear model is:
Y = mx + c
where,
Y = dependent variable
m = regression slope
x = independent variable
c = intercept value
Derivation of the equation
The equation of the line of dependence of y on the variable x is:
y = a + bx
Here y is the dependent variable, x is the independent variable, and a and b are two constants. "a" is the intercept, the distance from the origin to the point where the line crosses the y-axis. "b" is the slope of the line, called the coefficient of regression of y on x; it indicates the change in y for every unit change in x.
The constants a and b are two unknown parameters. Once their values are determined, the line of dependence is determined. The equation of the line of dependence is
Y = a + bx ………………… (1)
Below is a diagram of the line of dependence.
Different lines of dependence arise for different values of a and b. So the values of a and b are calculated from the observed values of the two variables using the least-squares method.

Suppose (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) are the observed values of the n pairs of the variables x and y. The principle of the least-squares method is to calculate the values of the constants a and b in such a way that the sum of the squares of the deviations of the observed values y from the fitted values Y is minimum. That is, the values of a and b have to be determined so that

S = Σeᵢ² = Σ(yᵢ − Yᵢ)² = Σ(yᵢ − a − bxᵢ)² ………………… (2)

is minimized. Here yᵢ and Yᵢ are the i-th observed and fitted values of the variable y, respectively. That is,
yᵢ = a + bxᵢ + eᵢ and Yᵢ = a + bxᵢ
The observed values of the variables x and y are given, so different values of S are obtained for different values of a and b; that is, S is a function of a and b.
Now, by the calculus principle of maxima and minima, the value of S will be minimal only when the partial derivatives of S with respect to a and b are both zero.
That means,

∂S/∂a = 0 and ∂S/∂b = 0

Setting ∂S/∂a = 0, we get:
Σ 2(yᵢ − a − bxᵢ)(−1) = 0
Σ(yᵢ − a − bxᵢ) = 0
Σyᵢ = na + bΣxᵢ ………………… (3)
Now divide equation (3) by n on both sides:
ȳ = a + b x̄
∴ a = ȳ − b x̄ ………………… (4)
Again, setting ∂S/∂b = 0:
Σ 2(yᵢ − a − bxᵢ)(−xᵢ) = 0
Σ(xᵢyᵢ − axᵢ − bxᵢ²) = 0
Σxᵢyᵢ = aΣxᵢ + bΣxᵢ² ………………… (5)

Here x̄ and ȳ are the means of x and y respectively.

Putting the value a = ȳ − b x̄ into equation (5):
Σxᵢyᵢ = (ȳ − b x̄)Σxᵢ + bΣxᵢ²
Since Σxᵢ = n x̄, this gives Σxᵢyᵢ = n x̄ ȳ + b(Σxᵢ² − n x̄²)

∴ b = (Σxᵢyᵢ − n x̄ ȳ) / (Σxᵢ² − n x̄²)
Here, a and b are the regression coefficients.
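As a quick check on the derivation, the closed-form estimates can be sketched in Python; the data points below are illustrative, not from the slides.

```python
# Least-squares fit of y = a + b*x using the closed-form results
# derived above: b = (Σxiyi − n·x̄·ȳ) / (Σxi² − n·x̄²), a = ȳ − b·x̄.

def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
        (sum(x * x for x in xs) - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

# Points lying exactly on y = 1 + 2x recover a = 1, b = 2.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```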
Regression Model for Prediction (Single Variable)
Suppose you have population data (in hundred thousands) for a medium-sized city over 20 years, based on a census every 5 years, as shown in Table 1. You want to predict the population in the year 2005.

Table 1: data for regression analysis

Year (x) | Population (y)
1980 | 2.1
1985 | 2.9
1990 | 3.2
1995 | 4.1
2000 | 4.9

Let us plot the data above in a graph. One point in the graph represents the data of one year. Since we have 5 data points, we have five points.
Regression Model for Prediction
Several candidate lines could be proposed. We plot the three that seem best:
• Blue line (dash-dot line)
• Red line (solid line)
• Green line (dotted line)
Regression Model for Prediction
The diagram below shows how we measure the error. When the data point is above the line model, we say the error is positive; when the line model is above the data point, we say the error is negative.

Square error = (Yᵢ − Ŷᵢ)²

Yᵢ = observed value
Ŷᵢ = calculated value

Simply summing the raw errors does not work, because some errors are positive and some are negative: their sum may be zero, and many different lines could achieve a zero sum. When we square each error, the number becomes positive regardless of sign, so the sum is a meaningful measure of fit.
Sum Square Error
Year ( y ) Population Blue Line Red Line( sq Green
(X) (sq error) error) Line( sq
error)

1980 2.1 2.10(0.00) 2.08(0.00) 1.66(0.19)

1985 2.9 2.92(0.0004) 2.76(0.02) 2.60(0.09)


1990 3.2 3.70(0.25) 3.44(0.06) 3.54(0.12)
1995 4.1 4.50(0.16) 4.12(0.00) 4.48(0.14)
2000 4.9 5.30(0.16) 4.80(0.01) 5.42(0.27)
Ʃ(SSE) = 0.57 0.09 0.81

We may obtain that the red line give the minimum sum of square error (=0.09)
among the three proposals.
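The SSE comparison above can be reproduced with a short helper; the predicted values for each candidate line are taken directly from the table.

```python
def sse(observed, predicted):
    # Sum of squared errors between the data and a candidate line.
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))

population = [2.1, 2.9, 3.2, 4.1, 4.9]          # observed values
blue_line  = [2.10, 2.92, 3.70, 4.50, 5.30]
red_line   = [2.08, 2.76, 3.44, 4.12, 4.80]     # Y = 0.136x − 267.2
green_line = [1.66, 2.60, 3.54, 4.48, 5.42]

for name, line in [("blue", blue_line), ("red", red_line), ("green", green_line)]:
    print(name, round(sse(population, line), 2))
# blue 0.57, red 0.09, green 0.81 — the red line wins
```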
Numerical Example
The best line model can be computed using the linear regression formula:
Y = mx + c
where Y is the dependent variable (the variable that goes on the Y axis), x is the independent variable (plotted on the X axis), m is the slope of the line, and c is the Y-intercept.
Regression Numerical Example
Suppose we have the following 5 data points and we want to predict the population for the year 2005 using a linear regression model. In this section, we carry out the calculation by hand using the linear regression formula.
Regression Numerical Example

Year (x) | Population (y) | xy | x²
1980 | 2.1 | 4158.0 | 3920400
1985 | 2.9 | 5756.5 | 3940225
1990 | 3.2 | 6368.0 | 3960100
1995 | 4.1 | 8179.5 | 3980025
2000 | 4.9 | 9800.0 | 4000000
Σx = 9950 | Σy = 17.2 | Σxy = 34262 | Σx² = 19800750

Here x̄ = 9950/5 = 1990, ȳ = 17.2/5 = 3.44, and the number of observations n = 5.

For the line y = mx + c, the least-squares estimates are:
m = (Σxy − n x̄ ȳ) / (Σx² − n x̄²) = (34262 − 5 × 1990 × 3.44) / (19800750 − 5 × 1990²) = 34/250 = 0.136
c = ȳ − m x̄ = 3.44 − 0.136 × 1990 = −267.2

So the regression line is Y = 0.136x − 267.2.
Regression Numerical Example
Using this regression line, we can predict the population of the city for the year 2005:
Population = 0.136 × year − 267.2 = (0.136 × 2005) − 267.2 = 5.48
Since the population is measured in hundred thousands, the predicted population for the year 2005 is 5.48 hundred thousand, i.e. about 548,000.
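The hand calculation above can be verified in a few lines of Python using the same closed-form formulas:

```python
years = [1980, 1985, 1990, 1995, 2000]
pop   = [2.1, 2.9, 3.2, 4.1, 4.9]           # population in hundred thousands

n = len(years)
x_bar = sum(years) / n                       # 1990
y_bar = sum(pop) / n                         # 3.44
# Slope and intercept from the least-squares formulas.
m = (sum(x * y for x, y in zip(years, pop)) - n * x_bar * y_bar) / \
    (sum(x * x for x in years) - n * x_bar ** 2)
c = y_bar - m * x_bar

print(round(m, 3), round(c, 1))              # 0.136 -267.2
print(round(m * 2005 + c, 2))                # 5.48
```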
Regression Goodness of Fit
Suppose you have a regression formula y = mx + c as the best line model. How well does the data fit the model?
There are unlimited model choices besides the linear model; the data might instead be better represented by a curvilinear or other non-linear model.
The most common goodness-of-fit indices are:
• R-squared, or coefficient of determination
• Adjusted R-squared
• Standard error
• F statistic
• t statistic
Regression Goodness of Fit (R²)
R² compares the sum of squared errors of the regression model (SSE) with the sum of squared differences around the mean (SST, the total sum of squares):

R² = 1 − SSE/SST = 1 − Σ(yᵢ − Ŷᵢ)² / Σ(yᵢ − ȳ)²

where yᵢ is the observed y, Ŷᵢ is the calculated y from the regression equation, and ȳ is the mean of y.

R² Calculation

yᵢ | Ŷᵢ = 0.136xᵢ − 267.2 | yᵢ − Ŷᵢ | (yᵢ − Ŷᵢ)² | yᵢ − ȳ | (yᵢ − ȳ)²
2.1 | (0.136 × 1980) − 267.2 = 2.08 | 0.02 | 0.0004 | −1.34 | 1.7956
2.9 | (0.136 × 1985) − 267.2 = 2.76 | 0.14 | 0.0196 | −0.54 | 0.2916
3.2 | (0.136 × 1990) − 267.2 = 3.44 | −0.24 | 0.0576 | −0.24 | 0.0576
4.1 | (0.136 × 1995) − 267.2 = 4.12 | −0.02 | 0.0004 | 0.66 | 0.4356
4.9 | (0.136 × 2000) − 267.2 = 4.80 | 0.10 | 0.0100 | 1.46 | 2.1316

Σ(yᵢ − Ŷᵢ)² = 0.088 (SSE)    Σ(yᵢ − ȳ)² = 4.712 (SST)

Now, putting the values of SSE and SST into the R² equation:

R² = 1 − SSE/SST = 1 − 0.088/4.712 = 0.981 (98.1%)
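The same R² figure can be checked in Python:

```python
years = [1980, 1985, 1990, 1995, 2000]
pop   = [2.1, 2.9, 3.2, 4.1, 4.9]

pred  = [0.136 * x - 267.2 for x in years]          # fitted values Ŷi
y_bar = sum(pop) / len(pop)                          # mean of y

sse = sum((y - p) ** 2 for y, p in zip(pop, pred))   # error around the line
sst = sum((y - y_bar) ** 2 for y in pop)             # spread around the mean
r2  = 1 - sse / sst

print(round(r2, 3))   # 0.981
```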
Linear Regression with Multiple Variables
Multiple linear regression attempts to determine the strength of the relationship between one dependent variable and a series of other explanatory variables, known as the independent variables.
Y = a + b₁x₁ + b₂x₂ + b₃x₃ + … + bₙxₙ
where a = intercept,
x₁, x₂, …, xₙ = the independent variables,
b₁, b₂, …, bₙ = the corresponding slopes.
Numerical Example
An example where the quantity demanded depends on the price of the product and on income.

Quantity Demanded (Y) | Price (X1) | Income (X2)
100 | 5 | 1000
75 | 7 | 600
80 | 6 | 1200
70 | 6 | 500
50 | 8 | 300
65 | 7 | 400
90 | 5 | 1300
100 | 4 | 1100
110 | 3 | 1300
60 | 9 | 300
? | 6 | 600
Calculation

Here,
Y = quantity demanded (Qd), X1 = price, X2 = income.
We work with deviations from the means:
y = Y − Ȳ, x1 = X1 − X̄1, x2 = X2 − X̄2
where Ȳ = 80, X̄1 = 6 and X̄2 = 800.
Calculation
From the data, the sums of squares and cross-products of the deviations are:
Σx1² = 30, Σx2² = 1,580,000, Σx1x2 = −5,900, Σx1y = −300, Σx2y = 65,000
Now,
b1 = (Σx2² · Σx1y − Σx1x2 · Σx2y) / (Σx1² · Σx2² − (Σx1x2)²)
   = (1,580,000 × (−300) − (−5,900) × 65,000) / (30 × 1,580,000 − (−5,900)²)
   = −90,500,000 / 12,590,000 ≈ −7.19
b2 = (Σx1² · Σx2y − Σx1x2 · Σx1y) / (Σx1² · Σx2² − (Σx1x2)²)
   = 180,000 / 12,590,000 ≈ 0.0143
a = Ȳ − b1X̄1 − b2X̄2 ≈ 80 + 7.19 × 6 − 0.0143 × 800 ≈ 111.7
So the estimated regression line is
Y = 111.7 − 7.19 x1 + 0.0143 x2
where x1 = price, x2 = income and Y = quantity demanded.

We want to predict the quantity demanded at a product price of 6 with income 600 using the linear regression model:
QD = 111.7 − (7.19 × 6) + (0.0143 × 600) = 111.7 − 43.14 + 8.58 ≈ 77.1
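The two-predictor calculation can be reproduced in Python by solving the 2×2 normal equations in deviation form; carrying full precision gives b1 ≈ −7.19 and a ≈ 111.7, which differ in the last digit from figures obtained with coefficients rounded mid-calculation.

```python
Y  = [100, 75, 80, 70, 50, 65, 90, 100, 110, 60]              # quantity demanded
X1 = [5, 7, 6, 6, 8, 7, 5, 4, 3, 9]                           # price
X2 = [1000, 600, 1200, 500, 300, 400, 1300, 1100, 1300, 300]  # income

n = len(Y)
ybar, x1bar, x2bar = sum(Y) / n, sum(X1) / n, sum(X2) / n

# Sums of squares and cross-products of the deviations from the means.
S11 = sum((v - x1bar) ** 2 for v in X1)
S22 = sum((v - x2bar) ** 2 for v in X2)
S12 = sum((u - x1bar) * (v - x2bar) for u, v in zip(X1, X2))
S1y = sum((u - x1bar) * (y - ybar) for u, y in zip(X1, Y))
S2y = sum((u - x2bar) * (y - ybar) for u, y in zip(X2, Y))

det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det
a  = ybar - b1 * x1bar - b2 * x2bar

print(round(b1, 2), round(b2, 3), round(a, 1))   # -7.19 0.014 111.7
print(round(a + b1 * 6 + b2 * 600, 1))           # 77.1
```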
R² Calculation
R² is calculated in the same way for both simple and multiple regression.
Thank You.
