
REGRESSION ANALYSIS

Regression helps us to estimate one variable, the dependent variable, from another variable, the independent variable.
The line describing the average relationship between two variables is known as the line of regression.
The dependent and independent variables are also known as the regressed (or explained) variable and the regressor (or explanatory) variable, respectively.
Definition:
"Regression is the measure of the average relationship between two or more variables in terms of the original units of the data." (Blair)
Uses of Regression Analysis:
1. It provides estimates of values of the dependent variable from values of the independent variable. The regression line describes the average relationship existing between the X and Y variables.
2. It gives a measure of the error involved in using the regression line as a basis for estimation. For this purpose, the standard error of estimate is calculated. An estimate is good only if the regression line fits the data closely; if the observations are widely scattered and do not lie close to the regression line, the line will not produce accurate estimates of the dependent variable.
3. It helps in obtaining a measure of the degree of association or correlation that exists between the two variables. For this purpose, the coefficient of determination, which measures the strength of the relationship between the variables, is calculated.
Scatter Diagram:
To determine whether there is a relationship between the dependent and independent variables, we examine a graph of the given data. This graph is called a scatter diagram.
From the scatter diagram we get two types of information:
1. patterns that indicate whether the variables are related, and
2. the kind of line, or estimating equation, that describes the relationship.
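As an optional illustration, the sketch below plots such a scatter diagram with matplotlib (an assumed dependency); the data are the X and Y pairs used in the first worked illustration later in these notes.

```python
import matplotlib.pyplot as plt

# Illustrative data (the same pairs used in the first worked example below)
X = [6, 2, 10, 4, 8]    # independent variable
Y = [9, 11, 5, 8, 7]    # dependent variable

plt.scatter(X, Y)
plt.xlabel("X (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.title("Scatter diagram of Y against X")
plt.show()
```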
Regression Lines:
If we take the case of two variables X and Y, we shall have two regression lines: the regression of X on Y and the regression of Y on X.
The regression line of Y on X gives the most probable values of Y for given values of X.
The regression line of X on Y gives the most probable values of X for given values of Y.
When there is either perfect positive or perfect negative correlation between the two variables (r = +1 or r = -1), the two regression lines coincide.
The farther the two regression lines are from each other, the lesser the degree of correlation; the nearer they are to each other, the higher the degree of correlation.
If the variables are independent, r is zero and the lines of regression are at right angles, i.e., parallel to OX and OY.
The two regression lines cut each other at the point of the averages of X and Y, i.e., at (X̄, Ȳ).
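A small numeric check of this last statement, as a plain Python sketch (the two lines used are the ones fitted in the worked illustration later in these notes):

```python
# Regression lines from the worked illustration below:
#   Y on X:  Y = 11.9 - 0.65 X
#   X on Y:  X = 16.4 - 1.3 Y
# The underlying data have means Xbar = 6 and Ybar = 8.
Xbar, Ybar = 6, 8
assert abs((11.9 - 0.65 * Xbar) - Ybar) < 1e-9   # the Y-on-X line passes through (Xbar, Ybar)
assert abs((16.4 - 1.3 * Ybar) - Xbar) < 1e-9    # the X-on-Y line passes through (Xbar, Ybar)
print("Both regression lines pass through (6, 8).")
```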
The regression lines are drawn on the least-squares assumption, which stipulates that the sum of the squares of the deviations of the observed Y values from the fitted line shall be a minimum.
The total of the squares of the deviations of the various points is a minimum only for the line of best fit.
Regression Equations:
These are algebraic expressions of the regression lines.
Regression equation of Y on X:
Y = a + bX
where
a is the Y-intercept,
X is the independent variable,
Y is the dependent variable, and
b is the slope of the line, representing the change in the Y variable for a unit change in the X variable.
Regression equation of X on Y:
X = c + dY
where
c is the X-intercept,
d is the slope of the line,
X is the dependent variable, and
Y is the independent variable.
To determine these two regression lines, the values of the constants a, b, c and d are to be obtained. These values are obtained by the method of least squares.
Method of Least Squares:
It states that the regression line should be drawn through the plotted points in such a way that the sum of the squares of the vertical deviations of the actual Y values from the estimated Y values is a minimum. Thus, the line of regression becomes the line of best fit.
According to the principle of least squares, the normal equations for estimating a and b, i.e., the Y-intercept and the slope of the best-fitting regression line, are:
ΣY = na + bΣX
ΣXY = aΣX + bΣX²
where n is the total number of observed pairs of values, and ΣX, ΣY, ΣXY and ΣX² are totals computed from the observed pairs of values of the two variables X and Y for which the least-squares line is to be fitted.
Solving the normal equations gives:
b = (ΣXY - nX̄Ȳ) / (ΣX² - nX̄²)
a = Ȳ - bX̄
where X̄ is the mean value of X and Ȳ is the mean value of Y.

Similarly, the normal equations for the regression equation of X on Y are:
ΣX = nc + dΣY
ΣXY = cΣY + dΣY²
which give:
d = (ΣXY - nX̄Ȳ) / (ΣY² - nȲ²)
c = X̄ - dȲ
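The closed-form expressions above are easy to check numerically. The following is a minimal Python sketch (NumPy assumed available; the function names are illustrative) that fits both regression lines from raw X and Y values using these formulas:

```python
import numpy as np

def fit_y_on_x(X, Y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = len(X)
    b = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
    a = Y.mean() - b * X.mean()
    return a, b

def fit_x_on_y(X, Y):
    """Return (c, d) for the least-squares line X = c + dY."""
    return fit_y_on_x(Y, X)   # same formulas with the roles of X and Y exchanged

# Data from the illustration that follows:
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
print(fit_y_on_x(X, Y))   # approximately (11.9, -0.65)
print(fit_x_on_y(X, Y))   # approximately (16.4, -1.3)
```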
Illustration
From the following data obtain the two regression equations:

X      Y      X²     Y²     XY
6      9      36     81     54
2      11     4      121    22
10     5      100    25     50
4      8      16     64     32
8      7      64     49     56
ΣX = 30   ΣY = 40   ΣX² = 220   ΣY² = 340   ΣXY = 214
Regression equation of Y on X:
Y = a + bX
Substituting the values in the normal equations
ΣY = na + bΣX
ΣXY = aΣX + bΣX²
we find:
40 = 5a + 30b        ...(i)
214 = 30a + 220b     ...(ii)
Solving these two equations gives b = -0.65.
Substituting the value of b in equation (i):
40 = 5a + 30(-0.65)
a = 11.9
Putting the values of a and b in the equation, the regression line of Y on X is:
Y = 11.9 - 0.65X
Regression line of X on Y:
X = c + dY
Substituting the values in the normal equations
ΣX = nc + dΣY
ΣXY = cΣY + dΣY²
we find:
30 = 5c + 40d        ...(i)
214 = 40c + 340d     ...(ii)
Solving these two equations we get d = -1.3.
Substituting the value of d in equation (i):
30 = 5c + 40(-1.3)
c = 16.4
Thus the regression line of X on Y is:
X = 16.4 - 1.3Y
Deviations Taken from the Arithmetic Mean
In this case, instead of dealing with the actual values of X and Y, we take the deviations of the X and Y series from their respective means.
In such a case the two regression equations are written as follows:
i) Regression equation of X on Y:
(X - X̄) = r (σx / σy) (Y - Ȳ) = bxy (Y - Ȳ)
where
X̄ = mean of the X series,
Ȳ = mean of the Y series, and
r (σx / σy) = bxy = the regression coefficient of X on Y, with bxy = Σxy / Σy².
ii) Regression equation of Y on X:
(Y - Ȳ) = r (σy / σx) (X - X̄) = byx (X - X̄)
where
byx = Σxy / Σx², x = X - X̄ and y = Y - Ȳ.
Illustration
Calculate the two regression equations of X on Y and Y on X from the data given below:

Price (X):            10   12   13   12   16   15
Amount demanded (Y):  40   38   43   45   37   43

Solution:

X     x = X - X̄   x²    Y     y = Y - Ȳ   y²    xy
10    -3           9     40    -1           1     3
12    -1           1     38    -3           9     3
13     0           0     43     2           4     0
12    -1           1     45     4          16    -4
16     3           9     37    -4          16   -12
15     2           4     43     2           4     4
ΣX = 78   Σx = 0   Σx² = 24   ΣY = 246   Σy = 0   Σy² = 50   Σxy = -6
Regression equation of X on Y:
(X - X̄) = r (σx / σy) (Y - Ȳ) = bxy (Y - Ȳ)
X̄ = ΣX / n = 78 / 6 = 13
Ȳ = ΣY / n = 246 / 6 = 41
bxy = Σxy / Σy² = -6 / 50 = -0.12
X - 13 = -0.12 (Y - 41)
X = 13 + 4.92 - 0.12Y
X = 17.92 - 0.12Y

Regression equation of Y on X:
(Y - Ȳ) = r (σy / σx) (X - X̄) = byx (X - X̄)
byx = Σxy / Σx² = -6 / 24 = -0.25
Y - 41 = -0.25 (X - 13)
Y = 41 + 3.25 - 0.25X
Y = 44.25 - 0.25X
Regression Coefficients
The regression equation of Y on X is given as Y = a + bX.
The quantity b, the slope of the line of regression of Y on X, is called the regression coefficient.
The regression coefficient of Y on X is also given by:
byx = r (σy / σx)
The regression coefficient of X on Y is given by:
bxy = r (σx / σy)
where r is the coefficient of correlation between X and Y, σx is the population standard deviation of X, and σy is the population standard deviation of Y.
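As a quick numeric illustration of these relations, here is a minimal sketch using NumPy, applied to the price-demand data from the previous illustration:

```python
import numpy as np

X = np.array([10, 12, 13, 12, 16, 15], dtype=float)   # price
Y = np.array([40, 38, 43, 45, 37, 43], dtype=float)   # amount demanded

r = np.corrcoef(X, Y)[0, 1]     # coefficient of correlation, about -0.173
sx, sy = X.std(), Y.std()       # population standard deviations (ddof = 0)

byx = r * sy / sx               # regression coefficient of Y on X: -0.25
bxy = r * sx / sy               # regression coefficient of X on Y: -0.12
print(byx, bxy)

# These match the deviation-method values sum(xy)/sum(x**2) and
# sum(xy)/sum(y**2) computed in the illustration above.
```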
Properties of Regression Coefficients:
i. The coefficient of correlation is the geometric mean of the two regression coefficients:
r = ±√(byx · bxy)
ii. If bxy is positive then byx must also be positive, and vice versa. Thus both regression coefficients have the same sign.
iii. Since the value of the coefficient of correlation lies between -1 and +1, the product of the two regression coefficients (byx · bxy = r²) cannot exceed one. Hence, if one of the regression coefficients is numerically greater than one, the other must be numerically less than one.
iv. The coefficient of correlation and the regression coefficients have the same sign. If the former is positive, the latter are also positive, and vice versa.
v. The arithmetic mean of byx and bxy is equal to or greater than the coefficient of correlation.
vi. Regression coefficients are independent of the origin but not of the scale.
vii. Since bxy = r (σx / σy), we can find any one of the four values given the other three.
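A minimal numeric check of properties (i) to (iii), again using the price-demand data (Python/NumPy assumed):

```python
import numpy as np

X = np.array([10, 12, 13, 12, 16, 15], dtype=float)
Y = np.array([40, 38, 43, 45, 37, 43], dtype=float)

x, y = X - X.mean(), Y - Y.mean()            # deviations from the means
byx = np.sum(x * y) / np.sum(x**2)           # -0.25
bxy = np.sum(x * y) / np.sum(y**2)           # -0.12
r = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))   # about -0.173

# (i) r is the geometric mean of the two regression coefficients,
#     carrying the common sign of byx and bxy:
assert np.isclose(abs(r), np.sqrt(byx * bxy))
# (ii) both coefficients have the same sign as r:
assert np.sign(byx) == np.sign(bxy) == np.sign(r)
# (iii) their product equals r**2 and therefore cannot exceed one:
assert np.isclose(byx * bxy, r**2) and byx * bxy <= 1
```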
Standard Error of Estimate (S.E.)
The standard error of estimate measures the dispersion about an average line, called the regression line.
It is analogous to the standard deviation. The S.E. of Y measures the variability of the observed values of Y around the regression line.
The deviations are not taken from the arithmetic means; they are the vertical distances of every point from the line of average relationship.
Formula:
Syx = √( Σ(Y - Yc)² / N ) = σy √(1 - r²)
where Syx is the standard error of the regression of the Y values from Yc, the values estimated from the regression line.
Similarly,
Sxy = √( Σ(X - Xc)² / N )
A more convenient formula is:
Syx = √( (ΣY² - aΣY - bΣXY) / N )
Sxy = √( (ΣX² - cΣX - dΣXY) / N )
The standard error measures the accuracy of the estimated figures.
The smaller the value of the S.E., the closer the dots lie to the regression line and the better the estimates based on the equation for this line.
If the S.E. is zero, then there is no variation about the line and the correlation will be perfect.
With the help of the S.E. it is possible for us to ascertain how good and representative the regression line is as a description of the average relationship between the two series.
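For a concrete check, here is a minimal Python/NumPy sketch that computes Syx for the price-demand data above in three equivalent ways (the definitional form, the convenient form, and σy √(1 - r²)):

```python
import numpy as np

X = np.array([10, 12, 13, 12, 16, 15], dtype=float)
Y = np.array([40, 38, 43, 45, 37, 43], dtype=float)
N = len(Y)

# Regression of Y on X fitted earlier: Y = 44.25 - 0.25 X
a, b = 44.25, -0.25
Yc = a + b * X                                    # estimated values of Y

Syx_def = np.sqrt(np.sum((Y - Yc)**2) / N)                             # definitional form
Syx_conv = np.sqrt((np.sum(Y**2) - a*np.sum(Y) - b*np.sum(X*Y)) / N)   # convenient form

r = np.corrcoef(X, Y)[0, 1]
Syx_r = Y.std() * np.sqrt(1 - r**2)               # sigma_y * sqrt(1 - r^2), population sd

print(Syx_def, Syx_conv, Syx_r)                   # all approximately 2.84
```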

Interpreting the S.E. of Estimate and finding the Confidence Limits for the Estimate in Large and Small Samples:

a) S.E. for Large Samples (where N > 30 in a sample)

With the assumption that the observed values of Y are normally distributed around the regression line and that the variance of the distribution around each possible value of Yc is the same, one can expect to find:
68% of all observations within the Yc ± 1 S.E. limits
95.5% of all observations within the Yc ± 2 S.E. limits
99.7% of all observations within the Yc ± 3 S.E. limits
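Continuing the sketch above, an illustrative large-sample calculation of such limits (the chosen X value of 14 is hypothetical) might look like:

```python
# Approximate 95.5% limits for Y at X = 14, using the fitted line
# Y = 44.25 - 0.25 X and the standard error Syx of about 2.84 computed above.
X_new = 14
Y_est = 44.25 - 0.25 * X_new            # point estimate: 40.75
lower, upper = Y_est - 2 * 2.84, Y_est + 2 * 2.84
print(lower, upper)                      # roughly 35.1 to 46.4
```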
