DMJAP LinearRegression 3
Suppose (x1, y1), (x2, y2), ..., (xn, yn) are the observed values of the n pairs of variables
x and y. The principle of the least squares method is to determine the constants a and b
in such a way that the sum of the squares of the deviations of the observed values yi
from the fitted values Yi is a minimum. That is, the values of a and b have to be
determined in such a way that
S = Σ ei² = Σ (yi − Yi)² = Σ (yi − a − bxi)² …………………(2)
is a minimum.
Here yi and Yi are the i-th observed and determined values of the variable y,
respectively. That is
yi = a + bxi + ei and Yi = a + bxi
Here the observed values of the variables x and y are given, so different values of S
are obtained for different values of a and b. That is, S is a function of a and b.
Now, according to the calculus principle of maxima and minima, the value of S will be
a minimum only when the partial derivatives of S with respect to a and b are zero.
That means,
∂S/∂a = 0 and ∂S/∂b = 0
We get,
∂S/∂a = ∂/∂a Σ (yi − a − bxi)² = 0
⇒ 2 Σ (yi − a − bxi)(−1) = 0
⇒ Σ (yi − a − bxi) = 0
⇒ Σ yi = na + b Σ xi …………………(3)
Now divide equation (3) by n on both sides:
(Σ yi)/n = a + b (Σ xi)/n
⇒ ȳ = a + b x̄
∴ a = ȳ − b x̄
Again,
∂S/∂b = ∂/∂b Σ (yi − a − bxi)² = 0
⇒ 2 Σ (yi − a − bxi)(−xi) = 0
⇒ Σ (yi xi − a xi − b xi²) = 0
⇒ Σ xi yi = a Σ xi + b Σ xi² …………………(4)
Substituting a = ȳ − b x̄ into equation (4) and solving,
∴ b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = [n Σ xi yi − Σ xi Σ yi] / [n Σ xi² − (Σ xi)²]
Here, a and b are the regression coefficients (the intercept and the slope, respectively).
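As a minimal sketch of these formulas in code (a hedged illustration only; the x and y values below are hypothetical placeholders, not data from the text), a and b can be computed directly from the deviations about the means:

```python
# Minimal sketch of the least-squares formulas derived above.
# The x and y values are hypothetical placeholders, not data from the text.

def fit_line(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    b = num / den
    # a = y_bar - b * x_bar  (from dividing equation (3) by n)
    a = y_bar - b * x_bar
    return a, b

x = [1, 2, 3, 4, 5]            # hypothetical xi values
y = [2.1, 4.0, 6.2, 8.1, 9.9]  # hypothetical yi values
a, b = fit_line(x, y)
print("a =", a, "b =", b)
```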
Regression Model For Prediction (Single Variable)
Suppose you have population data (in hundred thousands) for a medium-sized city
over 20 years (based on a census every 5 years), as shown in Table 1. You want to predict
the population in the year 2005.
Table 1: Data for regression analysis
Let us plot the data above on a graph. Each point in the graph represents the data for
one census year. Since we have 5 data points, we have five points on the graph.
We have several candidate lines. Three of the most plausible proposals are plotted:
• Blue line (dash-dot line)
• Red line (solid line)
• Green line (dotted line)
The diagram below shows how we measure the error. When a data point lies above the
line model, the error is positive; when the line model lies above the data point, the
error is negative.
Simply summing the errors does not work, because some errors are positive and some are
negative, so they cancel: the sum of errors may be zero for many different lines.
When we square each error, the result is positive regardless of the error's sign, so the
errors can no longer cancel.
Sum of Squared Errors
Year (x) | Population (y) | Blue line (sq. error) | Red line (sq. error) | Green line (sq. error)
We find that the red line gives the minimum sum of squared errors (0.09) among the
three proposals.
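Because the values of Table 1 are not reproduced in the text, the sketch below uses hypothetical census data and hypothetical (m, c) parameters for the three candidate lines, just to show how the sum of squared errors of each proposal would be compared:

```python
# Compare candidate lines by their sum of squared errors (SSE).
# Census data and the (m, c) pairs are hypothetical placeholders.
years = [1980, 1985, 1990, 1995, 2000]
population = [4.2, 4.8, 5.5, 6.1, 6.8]

def sse(m, c, xs, ys):
    # error = observed - predicted; squaring makes every term non-negative
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

candidates = {"blue": (0.10, -193.5), "red": (0.13, -253.2), "green": (0.16, -312.9)}
for name, (m, c) in candidates.items():
    print(name, round(sse(m, c, years, population), 3))
# The line with the smallest SSE is preferred.
```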
Numerical Example
The best line model can be computed using the linear regression formula
Y = mx + c,
where Y is the dependent variable (the variable plotted on the Y axis), x is the
independent variable (plotted on the X axis), m is the slope of the line, and c is the
y-intercept.
Regression Numerical Example
Suppose we have the 5 data points from Table 1 and we want to predict the population
for the year 2005 using a linear regression model.
[ y = mx + c ]
Fitting the line y = mx + c to these data gives the regression line
Population = 0.136 × year − 267.2.
Using this regression line, we can predict the city's population for the year 2005:
Population = (0.136 × 2005) − 267.2 = 5.48
So the predicted population for the year 2005 is 5.48 million.
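As a quick check of this arithmetic in code (only the coefficients 0.136 and −267.2 stated above are used; the underlying Table 1 data are not needed here):

```python
# Predict the population for 2005 with the fitted line from the text:
# Population = 0.136 * year - 267.2
m, c = 0.136, -267.2
year = 2005
population_2005 = m * year + c
print(round(population_2005, 2))  # 5.48
```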
Regression Goodness Of Fit
Suppose you have a regression formula y = mx + c as the best line model.
How well does the data fit our model?
There are unlimited model choices besides the linear model; our data might be better
represented by a curvilinear or other non-linear model.
Most common indices are:
• R-squared, or coefficient of determination
• Adjusted R-squared
• Standard Error
• F statistics
• t statistics
Regression Goodness Of Fit (R2)
R2 is computed from the sum of squared errors (SSE) of the regression model and the
total sum of squares around the mean (SST, the sum of squares total): R2 = 1 − SSE/SST.
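A minimal sketch of this computation, using R2 = 1 − SSE/SST; the observed and predicted values below are hypothetical placeholders, since no dataset is given at this point:

```python
# R-squared = 1 - SSE/SST, where
#   SSE = sum of squared errors of the regression model
#   SST = sum of squared differences of the observations around their mean
def r_squared(observed, predicted):
    mean_y = sum(observed) / len(observed)
    sse = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1 - sse / sst

observed = [2.0, 4.1, 6.0, 8.2, 9.9]    # hypothetical observations
predicted = [2.1, 4.0, 6.1, 8.0, 10.0]  # hypothetical model predictions
print(round(r_squared(observed, predicted), 4))
```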
Here, for a multiple regression example with two explanatory variables,
Y = Qd (quantity demanded), X1 = price, X2 = income,
and the deviations from the respective means are
y = Y − Ȳ, x1 = X1 − X̄1, x2 = X2 − X̄2.
Calculation
Now, using the deviation-form formulas for the two-regressor case,
b1 = [Σx1y · Σx2² − Σx2y · Σx1x2] / [Σx1² · Σx2² − (Σx1x2)²] = −7.18
b2 = [Σx2y · Σx1² − Σx1y · Σx1x2] / [Σx1² · Σx2² − (Σx1x2)²] = 0.014
a = Ȳ − b1·X̄1 − b2·X̄2 = 111.8
So the estimated regression line will be
Y = 111.8 − 7.18·X1 + 0.014·X2
where X1 = price, X2 = income, and Y = quantity demanded.
We want to predict the quantity demanded for a product with price 6 and income 600
using this linear regression model:
QD = 111.8 − (7.18 × 6) + (0.014 × 600) = 111.8 − 43.08 + 8.4 = 77.12
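The same prediction expressed as a short code check, using only the coefficients stated above (a = 111.8, b1 = −7.18, b2 = 0.014):

```python
# Predict quantity demanded from the estimated regression line:
# Y = 111.8 - 7.18 * price + 0.014 * income
a, b1, b2 = 111.8, -7.18, 0.014
price, income = 6, 600
qd = a + b1 * price + b2 * income
print(round(qd, 2))  # 77.12
```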
R2 Calculation
Calculating R2 is the same for both simple and multiple regression.
References
• Linear Regression and Linear Models – YouTube
• Microsoft Excel Tutorials: Regression (revoledu.com)
• What is Multiple Regression | numerical explanation and interpretation of Multiple regression – YouTube
Thank You.