Chapter 2
Simple Linear Regression
Model : Two Variable Case
After learning this chapter you will understand :
Population Regression Function.
Stochastic Error Term.
Sample Regression Function.
Method of Ordinary Least Squares.
Assumptions of CLRM.
Properties of OLS Estimators.
Gauss-Markov Theorem.
Hypothesis Testing of OLS Estimators.
Coefficient of Determination.
Normality Tests :
Normal Probability Plot,
Jarque-Bera Test.
Forecasting.
Scaling and units of measurement.
For Full Course Video Lectures of
All Subjects of Eco. (Hons), B Com (H), BBE, MA Economics
Register yourself at
www.primeacademy.in
Dheeraj Suri Classes
Prime Academy
9899192027
Basic Concepts
1. Regression : Regression means returning or stepping back to the average or normal. It was first used by Sir Francis Galton. Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known value of another variable. In the words of M. M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data". It is one of the most important statistical tools and is extensively used in business and economics to study the relationship between two or more causally related variables, and for the estimation of demand and supply curves, cost functions, production and consumption functions, etc.
2. Linear Regression : Since in this text we are concerned only with linear regression models, it is essential to know the meaning of linearity. The term linearity can be interpreted in two different ways as under :
(i) Linearity in the Variables : If all the variables involved in the regression equation have degree one, then the regression equation is said to be linear in the variables.
(ii) Linearity in the Parameters : If the conditional expectation of Y, E(Y|Xᵢ), is a linear function of the parameters, the β's, then it is linear in the parameters. Here it may or may not be linear in the variables.
In our analysis, by linearity we mean linear in the parameters only. The regression model may or may not be linear in the variables, the X's, but it is essentially linear in the parameters, the β's.
3. Regression Equations : Regression equations are used to estimate the value of one variable on the basis of the value of the other variable. If we have two variables X and Y, the two regression equations for them are X on Y and Y on X.
4. Regression Equation of Y on X : The line of regression of Y on X is the line
which gives the best estimate of Y for any given value of X.
5. Regression Equation of X on Y : The line of regression of X on Y is the line
which gives the best estimate of X for any given value of Y.
Note : Generally we study only the regression equation of Y on X, where Y is the dependent or explained variable and X is the independent or explanatory variable.
6. A Linear Probabilistic Model : As a first approximation we may assume that the Population Regression Function (PRF) is a linear function, i.e., it is of the type :
E(Y|Xᵢ) = β₁ + β₂Xᵢ
Where,
E(Y|Xᵢ) means the mean or expected value of Y corresponding to, or conditional upon, a given value of X; here by linearity we mean linear in the parameters.
β₁ and β₂ are unknown but fixed parameters known as regression coefficients. β₁ is called the intercept term and β₂ is called the slope term.
7. Stochastic Specification of PRF : The E(Y|Xᵢ) = β₁ + β₂Xᵢ form of the regression function implies that the relationship between X and Y is exact, that is, all the variations in Y are solely due to changes in X and there are no other factors affecting the dependent variable. If this were true then all the points of X and Y pairs, if plotted on a two dimensional plane, would fall on a straight line. However, if we gather observations on the actual data and plot them on a diagram, we see they do not fall on a straight line (or any other smooth curve for that matter). The deviations of the observations from the line may be attributed to several factors.
In statistical analysis, however, one generally acknowledges the fact that the relationship is not exact by explicitly including a random factor, known as the disturbance term, in the linear regression model. So the linear regression model becomes :
Yᵢ = β₁ + β₂Xᵢ + uᵢ
Where uᵢ is known as the stochastic or random or residual or error term.
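To see the difference between the deterministic PRF and its stochastic specification, the following minimal Python sketch simulates data around an assumed PRF (the parameter values β₁ = 2, β₂ = 0.5 and the error standard deviation 5 are illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" parameters, for illustration only; in practice they are unknown
beta1, beta2 = 2.0, 0.5
X = np.array([80., 100., 120., 140., 160., 180., 200., 220., 240., 260.])

EY = beta1 + beta2 * X            # deterministic PRF: E(Y|Xi) = beta1 + beta2*Xi
u = rng.normal(0.0, 5.0, X.size)  # stochastic disturbance term ui
Y = EY + u                        # stochastic specification: Yi = E(Y|Xi) + ui

print(np.column_stack((X, EY, Y)))  # observed Y scatters around the PRF line
```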
8. Sample Regression Function : Suppose that we have taken a sample of a few values of Y corresponding to given Xᵢ. The line which is drawn to fit the data is called the sample regression line. Mathematically, the sample regression line can be expressed as :
Ŷᵢ = β̂₁ + β̂₂Xᵢ or Ŷᵢ = b₁ + b₂Xᵢ
Where,
Ŷᵢ is the estimator of E(Y|Xᵢ), or the estimator of the population conditional mean,
β̂₁ or b₁ is the estimator of β₁, and
β̂₂ or b₂ is the estimator of β₂.
9. Stochastic Specification of SRF : The stochastic sample regression function can be expressed as :
Yᵢ = β̂₁ + β̂₂Xᵢ + eᵢ or Yᵢ = Ŷᵢ + eᵢ
Where eᵢ is the estimator of the population error term uᵢ.
10. Estimating Model Parameters : Our basic objective in regression analysis is to estimate the stochastic PRF
Yᵢ = β₁ + β₂Xᵢ + uᵢ
on the basis of the SRF
Yᵢ = β̂₁ + β̂₂Xᵢ + eᵢ
because generally our estimation is based on a single sample from some population. Due to sampling variations, our estimate of the PRF based on the SRF is only approximate. To estimate the regression equation we use the method of least squares.
11. Assumptions of Classical Linear Regression Model : In order to use the method of ordinary least squares (OLS), the following basic assumptions must hold for a two variable regression model :
Assumption 1 : Linear regression model. The regression model Yᵢ = β₁ + β₂Xᵢ + uᵢ is linear in the parameters. This interpretation of linearity is that the conditional expectation of Y, E(Y|Xᵢ), is a linear function of the parameters, the β's; it may or may not be linear in the variable X. In this interpretation E(Y|Xᵢ) = β₁ + β₂Xᵢ² is a linear (in the parameters) regression model. To see this, let us suppose X takes the value 3. Therefore, E(Y|X = 3) = β₁ + 9β₂, which is obviously linear in β₁ and β₂.
Note : A function is said to be linear in the parameter, say, β₁, if β₁ appears with a power of 1 only and is not multiplied or divided by any other parameter (for example, β₁β₂, β₂/β₁, and so on).
Assumption 2: X values are fixed in repeated sampling. Values taken by the
regressor X are considered fixed in repeated samples. More technically, X is
assumed to be nonstochastic.
Assumption 3 : Zero mean value of disturbance uᵢ. Given the value of X, the mean, or expected, value of the random disturbance term uᵢ is zero. Technically, the conditional mean value of uᵢ is zero. Symbolically, we have E(uᵢ|Xᵢ) = 0. A violation of this assumption introduces bias in the intercept of the regression equation.
Assumption 4 : Homoscedasticity or equal variance of uᵢ. Given the value of X, the variance of uᵢ is the same for all observations. That is, the conditional variances of uᵢ are identical. Symbolically, we have
var(uᵢ|Xᵢ) = E[uᵢ − E(uᵢ|Xᵢ)]² = E(uᵢ²|Xᵢ) = σ²
where var stands for variance.
Assumption 5 : No autocorrelation between the disturbances. Given any two X values, Xᵢ and Xⱼ (i ≠ j), the correlation between any two uᵢ and uⱼ (i ≠ j) is zero. Symbolically,
cov(uᵢ, uⱼ|Xᵢ, Xⱼ) = E{[uᵢ − E(uᵢ)]|Xᵢ}{[uⱼ − E(uⱼ)]|Xⱼ}
= E(uᵢ|Xᵢ)E(uⱼ|Xⱼ)
= 0
where i and j are two different observations and where cov means covariance.
Assumption 6 : Zero covariance between uᵢ and Xᵢ, or E(uᵢXᵢ) = 0. Formally,
cov(uᵢ, Xᵢ) = E[uᵢ − E(uᵢ)][Xᵢ − E(Xᵢ)]
= E[uᵢ(Xᵢ − E(Xᵢ))] since E(uᵢ) = 0
= E(uᵢXᵢ) − E(Xᵢ)E(uᵢ) since E(Xᵢ) is nonstochastic
= E(uᵢXᵢ) since E(uᵢ) = 0
= 0 by assumption
Assumption 7: The number of observations n must be greater than the
number of parameters to be estimated. Alternatively, the number of
observations n must be greater than the number of explanatory variables.
Assumption 8: Variability in X values. The X values in a given sample must not
all be the same. Technically, var (X) must be a finite positive number.
Assumption 9: The regression model is correctly specified. Alternatively, there
is no specification bias or error in the model used in empirical analysis.
Assumption 10: There is no perfect multicollinearity. That is, there are no
perfect linear relationships among the explanatory variables.
12. Method of Least Squares : The method of Ordinary Least Squares is attributed to the German mathematician Carl Friedrich Gauss. This method is based on the principle that the sum of the squared differences between the estimated values and the actual observed values should be minimum, and that the variables X and Y are related according to the simple linear regression model. The values of β₁, β₂ and σ² will almost never be known to an investigator. Instead, sample data consisting of n observed pairs (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) will be available, from which the model parameters and the true regression line itself can be estimated. The least squares method is used to obtain the best fitting straight line to the given data, i.e., the line that best predicts the values of the dependent variable from those of the independent variable. Using this method the regression equation may be found as under :
Regression Equation of Y on X :
Ŷ = β̂₁ + β̂₂X
Normal Equations :
ΣY = nβ̂₁ + β̂₂ΣX and ΣXY = β̂₁ΣX + β̂₂ΣX²
where β̂₁ and β̂₂ are constants and their values can be obtained by solving the normal equations.
The least squares estimate of the slope coefficient β̂₂ of the true regression line is :
β̂₂ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = [nΣXᵢYᵢ − ΣXᵢΣYᵢ] / [nΣXᵢ² − (ΣXᵢ)²]
The least squares estimate of the intercept coefficient β̂₁ of the true regression line is :
β̂₁ = Ȳ − β̂₂X̄
Note : Regression Equation of X on Y :
X̂ = β̂₁ + β̂₂Y
Normal Equations :
ΣX = nβ̂₁ + β̂₂ΣY and ΣXY = β̂₁ΣY + β̂₂ΣY²
where β̂₁ and β̂₂ are constants and their values can be obtained by solving the normal equations.
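As a minimal sketch of the Y on X formulas above in Python (the data points are made up purely for illustration):

```python
import numpy as np

# Illustrative sample data (made up for this sketch)
X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])

# Slope: beta2_hat = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
x_dev = X - X.mean()
y_dev = Y - Y.mean()
beta2_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()

# Intercept: beta1_hat = Ybar - beta2_hat * Xbar
beta1_hat = Y.mean() - beta2_hat * X.mean()

print(beta1_hat, beta2_hat)
# Cross-check against numpy's built-in least-squares fit: returns [slope, intercept]
print(np.polyfit(X, Y, 1))
```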
13. Variations in Yᵢ : The variations in Yᵢ may be classified as under :
(i) Total Variation : The total variation in the actual Y values about their sample mean Ȳ is called the total variation in Y. It may also be called the total sum of squares (TSS). Σyᵢ², where yᵢ = Yᵢ − Ȳ, is a measure of the total variation in Y, such that
TSS = Σ(Yᵢ − Ȳ)², i.e., Σyᵢ²
(ii) Explained Variation : Σŷᵢ² = Σ(Ŷᵢ − Ȳ)² = β̂₂²Σ(Xᵢ − X̄)² is the variation of the estimated Y values (Ŷᵢ) about their mean (which equals Ȳ), which appropriately may be called the sum of squares due to regression (i.e., due to the explanatory variable), or simply the explained sum of squares (ESS), such that
ESS = Σ(Ŷᵢ − Ȳ)²
(iii) Unexplained Variation : Σeᵢ² represents the unexplained variation of the Y values about the regression line, or simply the residual sum of squares (RSS), such that
RSS = Σ(Yᵢ − Ŷᵢ)² or Σeᵢ²
Where,
Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares
i.e., TSS = ESS + RSS
Yᵢ = Actual Value, Ŷᵢ = Estimated Value, Ȳ = Actual Mean
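The decomposition TSS = ESS + RSS can be verified numerically; here is a minimal sketch reusing the illustrative data from the earlier OLS example:

```python
import numpy as np

X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])

# OLS fit (slope and intercept as in the formulas above)
b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X

TSS = ((Y - Y.mean()) ** 2).sum()      # total variation
ESS = ((Y_hat - Y.mean()) ** 2).sum()  # explained variation
RSS = ((Y - Y_hat) ** 2).sum()         # unexplained (residual) variation

print(TSS, ESS + RSS)  # equal up to floating-point rounding
```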
14. Estimating σ² and σ : The parameter σ² determines the amount of variability inherent in the regression model. A large value of σ² means that the observed (Xᵢ, Yᵢ) are quite spread out about the true regression line, whereas when σ² is small the observed points tend to fall very close to the true regression line. The variance of the population error term, σ², is usually unknown. We therefore need to replace it by an estimate using sample information. Since the population error term is unobservable, one can use the estimated residuals to find an estimate. We start by forming the residual term
eᵢ = Yᵢ − β̂₁ − β̂₂Xᵢ
After estimating the residuals, we compute the residual sum of squares, denoted
Σeᵢ² = Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − β̂₁ − β̂₂Xᵢ)²
We observe that, first of all, the two parameters β̂₁ and β̂₂ must be estimated, which implies a loss of two degrees of freedom. With this information we may use the following formula for estimating σ² :
σ̂² = Σeᵢ² / (n − 2)
Its positive square root, σ̂, is known as the standard error of estimate or the standard error of the regression (se). It is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the "goodness of fit" of the estimated regression line.
Note : The term number of degrees of freedom means the total number of observations in the sample (= n) less the number of independent (linear) constraints or restrictions put on them. In other words, it is the number of independent observations out of a total of n observations. For example, before Σeᵢ² can be computed, β̂₁ and β̂₂ must first be obtained. These two estimates therefore put two restrictions on the eᵢ. Therefore, there are n − 2, not n, independent observations to compute Σeᵢ².
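A short sketch of this estimator in Python, continuing with the illustrative data used above:

```python
import numpy as np

X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])
n = X.size

b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
e = Y - (b1 + b2 * X)                 # estimated residuals e_i

# sigma2_hat = RSS / (n - 2): two degrees of freedom lost to b1 and b2
sigma2_hat = (e ** 2).sum() / (n - 2)
se_regression = np.sqrt(sigma2_hat)   # standard error of the regression
print(sigma2_hat, se_regression)
```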
15. Variances and Standard Errors of OLS Estimators : OLS estimators are random variables, because their values change from sample to sample. So we would like to know something about the sampling variability of these estimators. These sampling variabilities are measured by the variances of the estimators. The variances and standard errors of the OLS estimators are computed by the following formulae :
Var(b₁) = σ²ΣXᵢ² / [nΣ(Xᵢ − X̄)²], SE(b₁) = √Var(b₁)
Var(b₂) = σ² / Σ(Xᵢ − X̄)², SE(b₂) = √Var(b₂)
In practice the unknown σ² is replaced by its unbiased estimator σ̂².
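In code, these formulas look as follows (a sketch; σ² is unknown in practice, so the estimate σ̂² from the previous section is used):

```python
import numpy as np

X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])
n = X.size

b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
e = Y - (b1 + b2 * X)
sigma2_hat = (e ** 2).sum() / (n - 2)  # estimate of the error variance

Sxx = ((X - X.mean()) ** 2).sum()
var_b2 = sigma2_hat / Sxx                         # Var(b2)
var_b1 = sigma2_hat * (X ** 2).sum() / (n * Sxx)  # Var(b1)

print(np.sqrt(var_b1), np.sqrt(var_b2))           # SE(b1), SE(b2)
```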
16. Test of Significance of Regression Coefficients : As per the central limit theorem, the regression coefficients b₁ and b₂ follow a normal distribution with their means equal to the true β₁ and β₂ and variances as computed above. The following steps are taken to test the significance of the slope term of the regression equation :
(i) Define the Null hypothesis (H₀) and the Alternative hypothesis (H₁).
H₀ : β₂ = 0, i.e., the slope term is statistically insignificant.
H₁ : The slope term is statistically significant, i.e.,
β₂ ≠ 0 (Two tailed test)
β₂ > 0 (Upper tailed test)
β₂ < 0 (Lower tailed test)
(ii) Find out the tail of the test, i.e., determine whether it is a single tail or two tail test.
(iii) Calculate the standard error of b₂.
(iv) Calculate the test statistic 't' as under :
t = (b₂ − β₂) / SE(b₂), which under H₀ : β₂ = 0 reduces to t = b₂ / SE(b₂)
(v) Set the Level of Significance 'α'.
(vi) Find tα (for a single tail test) or tα/2 (for a two tail test) for n − 2 degrees of freedom from the table.
(vii) Compare |t| and tα (or tα/2).
(a) If |t| < tα (or tα/2), then do not reject the Null hypothesis.
(b) If |t| > tα (or tα/2), then reject the Null hypothesis.
Similarly we can test the statistical significance of the intercept term β₁.
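A minimal sketch of steps (i) to (vii) for a two tailed test at α = 0.05, using scipy for the table value and the same illustrative data:

```python
import numpy as np
from scipy import stats

X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])
n = X.size

b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
e = Y - (b1 + b2 * X)
se_b2 = np.sqrt((e ** 2).sum() / (n - 2) / ((X - X.mean()) ** 2).sum())

t_stat = b2 / se_b2                            # t under H0 : beta2 = 0
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # two-tailed critical value

if abs(t_stat) > t_crit:
    print("Reject H0: the slope term is statistically significant")
else:
    print("Do not reject H0")
```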
17. Confidence Interval : Let us assume that 'α' is the level of significance, or the probability of committing a type I error; then the confidence intervals of the regression coefficients are computed as under :
Confidence Interval of Intercept Term :
P(b₁ − tα/2·SE(b₁) ≤ β₁ ≤ b₁ + tα/2·SE(b₁)) = 1 − α
Confidence Interval of Slope Term :
P(b₂ − tα/2·SE(b₂) ≤ β₂ ≤ b₂ + tα/2·SE(b₂)) = 1 − α
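The slope interval translates directly into code; a sketch with the same illustrative data and α = 0.05:

```python
import numpy as np
from scipy import stats

X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])
n = X.size

b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
e = Y - (b1 + b2 * X)
se_b2 = np.sqrt((e ** 2).sum() / (n - 2) / ((X - X.mean()) ** 2).sum())

alpha = 0.05
t_half = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t_{alpha/2} with n-2 df

# 95% confidence interval for the slope beta2
print(b2 - t_half * se_b2, b2 + t_half * se_b2)
```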
18. The Coefficient of Determination : The coefficient of determination is a measure of how well a regression model is likely to predict future outcomes. The coefficient of determination r² is the square of the sample correlation coefficient between the outcomes and the predicted values. It is given by
Coefficient of determination = r² = Explained Variance / Total Variance
Properties of r² : The following two properties of r² may be noted :
(i) r² is a non-negative quantity.
(ii) Its limits are 0 ≤ r² ≤ 1.
19. The Goodness of Fit Test : Once the regression line has been fitted, we would like to know how good the fit is; in other words, we would like to measure the discrepancy of the actual observations from the fitted line. This is important since the closer the data to the line, the better the fit, or, in other words, the better the explanation of the variation of the dependent variable by the independent variables. A usual measure of the goodness of fit is the square of the correlation coefficient, r². This is the proportion of the total variation of the dependent variable explained by the independent variable. In other words,
r² = Explained Sum of Squares (ESS) / Total Sum of Squares (TSS)
The closer the value of r² to 1, the better the fit of the regression model, because r² = 1 means the regression is a perfect fit, i.e., Ŷᵢ = Yᵢ.
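A short sketch computing r² both ways, as ESS/TSS and as the squared correlation coefficient, with the same illustrative data:

```python
import numpy as np

X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Y = np.array([3., 4., 6., 7., 8., 9., 11., 12., 13., 15.])

b2 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X

TSS = ((Y - Y.mean()) ** 2).sum()
ESS = ((Y_hat - Y.mean()) ** 2).sum()
r2 = ESS / TSS                        # equivalently 1 - RSS/TSS

# In the two-variable model r2 equals the squared correlation between X and Y
print(r2, np.corrcoef(X, Y)[0, 1] ** 2)
```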
20. Gauss-Markov Theorem : The Gauss-Markov theorem states that, provided the assumptions of the CLRM are satisfied, the OLS estimators are BLUE, i.e., Best (most efficient), Linear (combinations of Yᵢ), Unbiased Estimators of the regression parameters. Thus, the OLS estimators have the following properties :
(i) Linearity : b₁ and b₂ are linear estimators, i.e., they are linear functions of the random variable Yᵢ.
(ii) Unbiasedness : OLS estimators are unbiased.
(a) b₁ and b₂ are unbiased estimators of β₁ and β₂, i.e., E(b₁) = β₁ and E(b₂) = β₂.
(b) The OLS estimator of the error variance is unbiased, i.e., E(σ̂²) = σ².