Simple Linear Regression Analysis
Introduction
Simple linear regression models the relationship between a dependent variable and a single independent variable:
Y = β₀ + β₁X + ε
where:
o Y is termed as the dependent or study variable and
o X is termed as the independent or explanatory variable.
o The terms β₀ (intercept) and β₁ (slope) are the parameters of the model.
These parameters are usually called regression coefficients.
The unobservable error component ε accounts for the failure of the data to lie exactly on a straight line and represents the difference between the true and observed realizations of Y.
There can be several reasons for such a difference, e.g., the effect of variables omitted from the model, variables that are qualitative, inherent randomness in the observations, etc.
We assume that the errors ε are independent and identically distributed random variables with mean zero and constant variance σ².
Later, we will additionally assume that ε is normally distributed. (This is required only for inference, not for estimation of the parameters.) The normality assumption is often justified by appeal to the Central Limit Theorem.
Regression Line
The regression line indicates the average value of the dependent variable Y
associated with a particular value of independent variable X.
The regression line is given as:
E(Y | Xᵢ) = β₀ + β₁Xᵢ
where β₀ is the Y intercept, β₁ is the slope of the line, and X is the independent variable.
Regression Coefficients:
b₁ = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n], and
b₀ = Ȳ − b₁X̄, where Ȳ and X̄ are the means of Y and X.
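As an illustrative sketch (not part of the original notes), the two formulas above can be implemented directly from paired samples; the function name fit_simple_regression is mine.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates of the intercept b0 and slope b1,
    using the summation formulas for simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n   # b0 = Ybar - b1 * Xbar
    return b0, b1

# A perfectly linear toy sample, Y = 1 + 2X, so b0 = 1 and b1 = 2.
b0, b1 = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

For data with no scatter, the fitted line passes through every point exactly; with real data, b₀ and b₁ minimise the sum of squared vertical deviations.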
For example, fit a regression line of yield on rainfall for the following data:
Rainfall (mm) 12 9 8 10 11 13 7
Yield (kg) 14 8 6 9 11 12 3
Solution:
Rainfall (X) Yield (Y) XY X² Y²
12 14 168 144 196
9 8 72 81 64
8 6 48 64 36
10 9 90 100 81
11 11 121 121 121
13 12 156 169 144
7 3 21 49 9
Total 70 63 676 728 651
Mean 10 9
b₁ = (676 − (70 × 63)/7) / (728 − 70²/7) = (676 − 630) / (728 − 700) = 46/28 ≈ 1.64
b₀ = 9 − (1.64 × 10) = −7.4
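A quick way to check the hand computation is to reproduce the column totals and coefficients in Python (a sketch; the variable names are mine). Note that carrying full precision gives b₁ = 46/28 ≈ 1.643 and b₀ ≈ −7.43; the −7.4 above comes from rounding b₁ to 1.64 first.

```python
rainfall = [12, 9, 8, 10, 11, 13, 7]   # X
yield_kg = [14, 8, 6, 9, 11, 12, 3]    # Y
n = len(rainfall)

sum_x, sum_y = sum(rainfall), sum(yield_kg)              # 70, 63
sum_xy = sum(x * y for x, y in zip(rainfall, yield_kg))  # 676
sum_x2 = sum(x ** 2 for x in rainfall)                   # 728

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # 46/28
b0 = sum_y / n - b1 * sum_x / n                                # Ybar - b1 * Xbar
```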
Estimation
Estimation of the regression coefficients rests on the following assumptions:
1. The regression model is linear in the parameters (the true relationship between X and Y
is linear)
Yᵢ = β₀ + β₁Xᵢ + μᵢ
2. X is non-stochastic or pre-determined (i.e., X values are fixed in repeated sampling). This means that the regression analysis is conditional on the given values of X.
3. Zero mean value of the disturbance term μ: E[μᵢ | Xᵢ] = 0. This means μ does not affect the mean value of Y; i.e., the positive μᵢ values cancel out the negative μᵢ values so that their average or mean effect on Y is 0. Hence E[Yᵢ | Xᵢ] = β₀ + β₁Xᵢ.
4. Homoscedasticity or equal variance: Var(μᵢ | Xᵢ) = σ². This means that the Y populations corresponding to various X values have the same variance: Var(Yᵢ | Xᵢ) = σ².
5. No spatial correlation and no autocorrelation between the disturbance terms: Cov(μᵢ, μⱼ | Xᵢ, Xⱼ) = 0 for i ≠ j, and Cov(μₜ, μₜ₋₁ | Xₜ, Xₜ₋₁) = 0. This means that, given Xᵢ, the deviations of any two Y values from their mean value do not exhibit any pattern or correlation.
6. Zero covariance between Xᵢ and μᵢ: Cov(Xᵢ, μᵢ) = 0.
7. The number of observations n must be greater than the number of parameters to be
estimated (alternatively, the number of observations n must be greater than the number
of explanatory variables)
8. Sufficient variability in the X values (Var(X) must be a positive number): if X does not vary across observations, it cannot explain variation in Y.
9. Regression model is correctly specified (The form of relationship is correct and there is
no specification bias or error in the model)
a. No improper functional form; otherwise the model will be over-estimated or under-estimated
b. No inclusion of irrelevant variable or exclusion of relevant variable
10. At each fixed value of X, the corresponding values of Y are normally distributed about their mean.
11. No perfect multicollinearity (no perfect linear relationship between explanatory variables)
{Applicable in case of multiple linear regression analysis}
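The assumptions above can be illustrated with a small simulation (a sketch of my own, not from the notes): fixed X values, i.i.d. normal disturbances with mean zero and constant variance σ², and no correlation between disturbances. Under these conditions the least-squares estimates land close to the true parameters.

```python
import random

random.seed(42)

beta0, beta1, sigma = 2.0, 3.0, 1.0   # true parameters (chosen for illustration)
x = [i / 10 for i in range(200)]      # fixed (non-stochastic) X values: assumption 2

# i.i.d. N(0, sigma^2) disturbances: zero mean, constant variance,
# no autocorrelation (assumptions 3-5)
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

# Least-squares estimates via the summation formulas
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n
# b1 and b0 should be close to 3.0 and 2.0 respectively
```

If any assumption is deliberately broken in the simulation (e.g., the error variance made to grow with X), the estimates remain computable but lose the optimality properties the assumptions guarantee.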