J. K.Shah Classes Regression Analysis
5. Regression Analysis
Introduction
Regression is the average linear relationship between two or more variables.
The word regression implies “estimation or prediction”. In other words, through regression
equations we can quantify the relationship between two variables and predict the
average value of one variable corresponding to a specific value of the other.
It establishes a functional relationship between two variables.
A regression equation enables us to find the nature and the extent of the relationship between
two variables. Correlation can measure only the degree of association between the two
variables, whereas regression quantifies that relationship.
Of the two variables, one is the dependent variable and the other the independent variable. Thus, we try to estimate the
average value of the dependent variable for a specified value of the independent variable using
regression analysis.
If there are two variables, then the independent variable is called the “Regressor” or
“Explaining Variable” and the dependent variable is called the “Regressed” or “Explained
Variable”.
Regression analysis is an absolute measure showing a change in the value of y or x for
a corresponding change in the value of x or y whereas correlation coefficient is a relative
measure of linear relationship between x and y.
This average linear relationship between two variables is expressed by means of two
straight-line equations known as regression lines or regression equations.
If there are two variables x and y we can have the following two types of regression lines,
i. Regression equation of y on x (y dependent, x independent)
ii. Regression equation of x on y (x dependent, y independent)
REGRESSION LINES
3. When deviations are taken from the actual means $\bar{x}$ and $\bar{y}$, such that $u = x - \bar{x}$, $v = y - \bar{y}$:
$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum uv}{\sum u^2}$
$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum uv}{\sum v^2}$
4. When deviations are taken from assumed means, say A and B for x and y, such that $u = x - A$, $v = y - B$:
$b_{yx} = \frac{\frac{\sum uv}{n} - \frac{\sum u}{n} \cdot \frac{\sum v}{n}}{\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2}$
$b_{xy} = \frac{\frac{\sum uv}{n} - \frac{\sum u}{n} \cdot \frac{\sum v}{n}}{\frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2}$
5. Using ‘r’:
$b_{yx} = r \frac{\sigma_y}{\sigma_x}$
$b_{xy} = r \frac{\sigma_x}{\sigma_y}$
where $\sigma_x$ = S.D.(x), $\sigma_y$ = S.D.(y) and r = correlation coefficient between x and y.
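The coefficient formulas above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values of `xs` and `ys` are hypothetical and not taken from these notes); the function names are illustrative only. It computes byx and bxy from deviations about any pair of assumed means A and B, and again from r and the standard deviations.

```python
from math import sqrt


def regression_coefficients(xs, ys, A=0.0, B=0.0):
    """Return (byx, bxy) using deviations u = x - A and v = y - B.

    A and B may be the actual means or any assumed means; the result
    is the same because the coefficients do not depend on the origin.
    """
    n = len(xs)
    u = [x - A for x in xs]
    v = [y - B for y in ys]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov_uv = sum(a * b for a, b in zip(u, v)) / n - mean_u * mean_v
    var_u = sum(a * a for a in u) / n - mean_u ** 2
    var_v = sum(b * b for b in v) / n - mean_v ** 2
    return cov_uv / var_u, cov_uv / var_v


def coefficients_from_r(r, sd_x, sd_y):
    """Return (byx, bxy) from r and the two standard deviations."""
    return r * sd_y / sd_x, r * sd_x / sd_y


# Hypothetical illustrative data (mean of xs is 3, mean of ys is 4).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

from_actual_means = regression_coefficients(xs, ys, A=3, B=4)
from_assumed_means = regression_coefficients(xs, ys, A=2, B=5)

# Population standard deviations and r, computed directly from the data.
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
r = cov_xy / (sd_x * sd_y)
from_r = coefficients_from_r(r, sd_x, sd_y)

print(from_actual_means, from_assumed_means, from_r)
```

All three pairs printed should agree up to rounding (about 0.6 for byx and 1.0 for bxy on this data).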
Signs of r, byx and bxy (all three carry the same sign):
r     byx   bxy
+     +     +
-     -     -
Note:
When byx and bxy are of opposite signs, data are inconsistent, r is imaginary.
8. Regression coefficients are independent of the Change of Origin but they are dependent
on Change of Scale. If $u = \frac{x - a}{c}$ and $v = \frac{y - b}{d}$, then
i. $b_{yx} = b_{vu} \times \frac{d}{c}$
ii. $b_{xy} = b_{uv} \times \frac{c}{d}$
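The effect of a change of origin and scale can be verified on a small example. This is only a sketch: the data and the constants a, b, c, d below are hypothetical, and `slope` is an illustrative helper name, not something defined in these notes.

```python
def slope(independent, dependent):
    """Regression coefficient of `dependent` on `independent`."""
    n = len(independent)
    m_i = sum(independent) / n
    m_d = sum(dependent) / n
    s_id = sum((p - m_i) * (q - m_d) for p, q in zip(independent, dependent))
    s_ii = sum((p - m_i) ** 2 for p in independent)
    return s_id / s_ii


# Hypothetical data and transformation constants.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, c = 3, 2   # u = (x - a) / c
b, d = 4, 5   # v = (y - b) / d

us = [(x - a) / c for x in xs]
vs = [(y - b) / d for y in ys]

byx = slope(xs, ys)   # y on x
bxy = slope(ys, xs)   # x on y
bvu = slope(us, vs)   # v on u
buv = slope(vs, us)   # u on v

# Change of origin alone (c = d = 1) leaves the coefficient unchanged.
byx_shifted = slope([x - a for x in xs], [y - b for y in ys])

print(byx, bvu * d / c)    # these two should match
print(bxy, buv * c / d)    # these two should match
print(byx, byx_shifted)    # these two should match
```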
9. There is no specific range within which the two regression coefficients must lie, but their values
should be such that the square root of the product of the two regression coefficients lies
between -1 and +1 (both inclusive). Thus, if one of the regression coefficients is numerically
greater than unity, then the other must be numerically less than unity.
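The relation between r and the two regression coefficients, together with the sign rule and the consistency checks noted above, can be sketched as follows. The function name and the input values are hypothetical; the sketch assumes r takes the common sign of byx and bxy.

```python
from math import sqrt


def correlation_from_coefficients(byx, bxy):
    """Return r as the signed square root of byx * bxy.

    Raises ValueError when the coefficients are inconsistent:
    opposite signs, or a product greater than 1.
    """
    product = byx * bxy
    if product < 0:
        raise ValueError("byx and bxy have opposite signs: data inconsistent")
    if product > 1:
        raise ValueError("byx * bxy exceeds 1: data inconsistent")
    r = sqrt(product)
    return -r if byx < 0 else r


# Hypothetical values for illustration.
r_positive = correlation_from_coefficients(0.6, 1.0)
r_negative = correlation_from_coefficients(-0.4, -0.9)
print(r_positive, r_negative)

try:
    correlation_from_coefficients(0.5, -0.5)
except ValueError as err:
    print("rejected:", err)
```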
When r = ±1, then
i. The two regression lines become identical, i.e., they coincide.
ii. $b_{yx} = \frac{1}{b_{xy}}$
iii. Perfect linear correlation is observed and the angle between the two regression
lines becomes 0°.
iv. For a particular value of x we shall obtain a specific value of y.
As the angle between the two regression lines numerically decreases from 90° to 0°,
the correlation increases numerically from 0 to 1 and the two regression lines come closer to
each other.
Angle between two regression lines: if A is the angle between the two regression lines, then
$\tan A = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
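The behaviour of the angle as r varies can be explored with a short sketch of the formula above. Two choices here are assumptions rather than part of the notes: |r| is used in the denominator so that the acute angle is returned for negative r, and r = 0 is treated as the limiting case of 90°. The function name and input values are hypothetical.

```python
from math import atan, degrees


def angle_between_regression_lines(r, sd_x, sd_y):
    """Acute angle, in degrees, between the two regression lines."""
    if r == 0:
        return 90.0  # limiting case: the lines are perpendicular
    tan_a = (1 - r * r) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return degrees(atan(tan_a))


# Hypothetical standard deviations; r is varied from 0 to 1.
for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r, round(angle_between_regression_lines(r, 1.0, 1.0), 2))
```

The printed angles shrink from 90 towards 0 as r grows, which matches the statement that the lines come closer together as the correlation strengthens.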
Miscellaneous Properties:
In regression analysis, the difference between the Observed value and the Estimated
value is known as Residue or Error.
Proportion of Total Variance explained by regression analysis is r².
Proportion of Total Unexplained Variance is (1 - r²).
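Standard error of estimate of x on y: $\sigma_x \sqrt{1 - r^2}$
Standard error of estimate of y on x: $\sigma_y \sqrt{1 - r^2}$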
When r² = 1, then:
i. Explained variance / Total variance = 1
ii. Explained variance = Total Variance
iii. The whole of the total variance is explained by regression.
iv. The unexplained variation is zero
v. All the points on the scatter diagram will lie on the regression line
vi. There is a perfect linear dependence between the variables
vii. The two regression lines coincide
viii. For a given value of one variable, we have a fixed value of the other variable
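The split of total variance into explained and unexplained parts can be illustrated on a small example. This is a sketch under stated assumptions: the data are hypothetical, the regression of y on x is fitted by the deviation formula, and variances are computed with divisor n.

```python
from math import sqrt

# Hypothetical illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)

byx = s_xy / s_xx                  # regression coefficient of y on x
intercept = mean_y - byx * mean_x  # the line passes through the means
r_squared = s_xy ** 2 / (s_xx * s_yy)

# Residual (error) = observed value - estimated value.
estimated = [intercept + byx * x for x in xs]
residuals = [y - e for y, e in zip(ys, estimated)]

total_variance = s_yy / n
unexplained_variance = sum(e * e for e in residuals) / n
explained_variance = total_variance - unexplained_variance
standard_error = sqrt(unexplained_variance)

print(explained_variance / total_variance, r_squared)          # should match
print(unexplained_variance / total_variance, 1 - r_squared)    # should match
print(standard_error, sqrt(total_variance) * sqrt(1 - r_squared))  # should match
```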
Regression Lines
1. From the following data, find the regression equation of X on Y:
X 1 2 3 4 5
Y 2 3 5 4 6
a) X = 0.9Y + 0.6
b) X = 0.9Y – 0.6
c) X = 0.9Y + 1.3
d) X = 0.9Y – 1.3
2. Taking data from the previous question, find the regression equation of Y on X.
a) Y = 0.9X – 1.3
b) Y = 0.9X – 0.6
c) Y = 0.9X + 1.3
d) Y = 0.9X + 0.6
3. The information given below relates to the advertisement expenses and sales revenue
of a company.
5. Given the following data, find the marks in statistics for a student who gets 90 marks in
accountancy:
                   Statistics   Accountancy
Average Marks      67           75
S.D. of Marks      8            10
Correlation coefficient r = +0.75
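Problems that give only the means, standard deviations and r can be approached with the coefficient formula that uses r. The following is a minimal sketch with hypothetical numbers (deliberately not those of the question above); it assumes the regression line passes through the point of means, and the function names are illustrative.

```python
def estimate_y(x0, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate y at x = x0 from the regression line of y on x."""
    byx = r * sd_y / sd_x
    return mean_y + byx * (x0 - mean_x)


def estimate_x(y0, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate x at y = y0 from the regression line of x on y."""
    bxy = r * sd_x / sd_y
    return mean_x + bxy * (y0 - mean_y)


# Hypothetical summary statistics.
summary = dict(mean_x=50, mean_y=60, sd_x=5, sd_y=10, r=0.6)

y_hat = estimate_y(55, **summary)   # byx = 1.2, so about 60 + 1.2 * 5
x_hat = estimate_x(70, **summary)   # bxy = 0.3, so about 50 + 0.3 * 10
print(y_hat, x_hat)
```

Note that the line used depends on which variable is being estimated: y on x to estimate y, and x on y to estimate x.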
6. In the estimation of the regression equation of two variables X and Y, the following results
were obtained: Mean(X) = 90, Mean(Y) = 70, n = 10, ∑x² = 6360, ∑y² = 2860, ∑xy =
3900 (x and y are the deviations taken from the respective means). Obtain the regression
equation of Y on X.
7. Given that the means of X and Y are 65 and 67 respectively. Their standard deviations
are 2.5 and 3.5 respectively, and the coefficient of correlation between them is 0.8.
Obtain the best estimate of X, when Y = 70.
a) 56.61
b) 66.71
c) 79.81
d) 70.71
8. With bxy = 0.5, r = 0.8 and variance of y = 16, the standard deviation of x equals:
a) 6.4
b) 2.5
c) 10.0
d) 26.5
10. Regression equation of Y on X is 8X – 10Y + 66 = 0 and SD(x) = 3, find the value of Cov
(x, y).
a) 11.25
b) 7.2
c) 2.4
d) None of the above
11. If Mean of x = 10, Mean of y = 50, SD(x) = 3, SD(y) = 15, r = 0.9, then find the estimated
value of x for corresponding y = 100.
a) 18
b) 19
c) 20
d) 21
12. For 100 students of a class, the regression equation of marks in Statistics (X) on the marks
in Economics (Y) is 3Y - 5X + 180 = 0. The mean marks in Economics is 50 and the variance of
marks in Statistics is 4/9 of the variance of the marks in Economics. The mean marks in
Statistics and the correlation coefficient are:
a) 56, r = 0.7
b) 55, r = 0.8
c) 66, r = 0.9
d) 65, r = 0.7
13. While calculating the coefficient of correlation between two variables X and Y, the
following results were obtained:
The number of observations N = 25,
$\sum X = 125$, $\sum Y = 100$, $\sum X^2 = 650$, $\sum Y^2 = 460$, $\sum XY = 508$. It was, however, later
discovered at the time of checking that two pairs of observations (X, Y) had been copied as (6, 14)
and (8, 6), while the correct values were (8, 12) and (6, 8) respectively. The correct value of byx
is:
a) 0.6
b) 0.7
c) 0.8
d) 0.65
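A correction problem of this kind can be worked by removing the contribution of each wrongly copied pair from the running sums and adding the contribution of the correct pair, then applying the usual formula for byx. The sketch below uses the figures stated in question 13; the function name is illustrative and the result is left for the reader to run.

```python
def corrected_byx(n, sum_x, sum_y, sum_x2, sum_xy, wrong_pairs, right_pairs):
    """Return byx after replacing wrongly copied (x, y) pairs.

    Only the sums needed for byx are adjusted (sum of y squared is not).
    """
    for x, y in wrong_pairs:       # take out what was wrongly included
        sum_x -= x
        sum_y -= y
        sum_x2 -= x * x
        sum_xy -= x * y
    for x, y in right_pairs:       # put in what should have been there
        sum_x += x
        sum_y += y
        sum_x2 += x * x
        sum_xy += x * y
    cov_xy = sum_xy / n - (sum_x / n) * (sum_y / n)
    var_x = sum_x2 / n - (sum_x / n) ** 2
    return cov_xy / var_x


byx = corrected_byx(
    n=25, sum_x=125, sum_y=100, sum_x2=650, sum_xy=508,
    wrong_pairs=[(6, 14), (8, 6)],
    right_pairs=[(8, 12), (6, 8)],
)
print(round(byx, 2))
```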
14. Obtain the estimate of ‘m’ if the regression equation of saving (x) on income (y) is expressed
as x = a + y/m, where ‘a’ and ‘m’ are constants, and it is known that the coefficient of correlation
between x and y based on a sample of 100 families is 0.4 and the variance of saving is
one quarter of the variance of income.
a) 5
b) 4
c) 3
d) 2
15. Find the regression line of profits on output from the following data using:
Output (1000 units) 5 7 9 11 13 15
Profits per unit (Rs.) 1.7 2.4 2.8 3.4 3.7 4.4
a) y=0.257x+0.497
b) y=0.255x+0.487
c) y=0.277x+0.447
d) None
19. If bxy = - 1.2 and byx = - 0.3, then the coefficient of correlation between x and y is:
a) – 0.698
b) – 0.36
c) – 0.51
d) – 0.6
20. If the regression coefficient of y on x is 4/3, then the regression coefficient of x on y is:
a) More than 1
b) Less than 1
c) Less than zero
d) None of the above
21. Given bxy = 1.36, byx = 0.613, then the value of coefficient of determination is:
a) 0.734
b) 0.634
c) 0.534
d) 0.834
22. Given bxy = 0.756, byx = 0.659, then the value of coefficient of non-determination is given
by:
a) 0.402
b) 0.502
c) 0.602
d) 0.702
23. Find the mean of x and y, if the two regression lines are 3x – y – 5 = 0 and 2x – y – 4 = 0.
a) 1, - 2
b) – 1, 2
c) 2, - 1
d) – 2, - 1
26. If the Coefficient of non-determination is 0.502, then the Coefficient of alienation is:
a) 0.71
b) 0.61
c) 0.51
d) 0.81
28. Given byx = 3, σy = 12 and r = 1, the standard error of estimate of X on Y is:
a) 4
b) 3
c) 2
d) 0
Identification Problems
32. Out of the two regression lines x + 2y = 4 and 2x + 3y – 5 = 0, which is the regression line
of x on y?
a) 2x + 3y – 5 = 0
b) x + 2y = 4
c) Both a) and b) above
d) Neither a) nor b) above
33. In a partially destroyed record, the following data are available: variance of x = 9 and the
regression lines are 8x – 10y + 66 = 0 and 40x – 18y = 214, find the coefficient of
correlation between x and y.
a) – 0.6
b) 0.36
c) – 0.36
d) + 0.6
34. Taking data from the previous question, what would be the standard deviation of y?
a) 9
b) 4
c) 6
d) 5
From the following regression equations, answer the questions that follow:
3x – 2y – 10 = 0
24x – 25y + 145 = 0
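One common way to handle identification problems is to assume one line is y on x and the other x on y, compute the two coefficients, and keep the assignment only if their product lies between 0 and 1; otherwise the roles are swapped. The means are then found where the two lines intersect, since both regression lines pass through the point of means. The sketch below applies this to the pair of equations above; the function name and the (a, b, c) representation of a line a·x + b·y + c = 0 are illustrative assumptions, and all x and y coefficients are assumed non-zero.

```python
from math import sqrt


def analyse_regression_lines(line1, line2):
    """Identify the two regression lines, then find r and the means.

    Each line is a tuple (a, b, c) standing for a*x + b*y + c = 0.
    If both assignments are admissible, the first one tried is returned.
    """
    def coefficients(y_on_x, x_on_y):
        a1, b1, _ = y_on_x
        a2, b2, _ = x_on_y
        return -a1 / b1, -b2 / a2   # byx, bxy

    y_on_x, x_on_y = line1, line2
    byx, bxy = coefficients(y_on_x, x_on_y)
    if not 0 <= byx * bxy <= 1:
        y_on_x, x_on_y = line2, line1
        byx, bxy = coefficients(y_on_x, x_on_y)
    if not 0 <= byx * bxy <= 1:
        raise ValueError("no admissible assignment: data inconsistent")

    r = sqrt(byx * bxy)
    if byx < 0:
        r = -r

    # Intersection of the two lines by Cramer's rule gives the means.
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det

    return {"y_on_x": y_on_x, "x_on_y": x_on_y, "byx": byx, "bxy": bxy,
            "r": r, "mean_x": mean_x, "mean_y": mean_y}


result = analyse_regression_lines((3, -2, -10), (24, -25, 145))
for key, value in result.items():
    print(key, value)
```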
Theoretical Aspects
Introduction
38. The word regression is used to denote ________ of the average value of one variable for
a specified value of the other variable.
a) Estimation
b) Prediction
c) Either a) or b) above
d) None of the above
43. If the curve plotted on a Scatter Diagram is a straight line, it is called the:
a) Line of correlation
b) Line of scatter diagram
c) Line of regression
d) Both a) and c) above
44. The estimation in regression analysis is done by means of suitable equations, derived
on the basis of available bivariate data. Such an equation is known as:
a) Recursive Equations
b) Recurring Equations
c) Regression Equations
d) Both a) and c) above
47. For two variables, the number of regression lines would be:
a) 1
b) 3
c) 2
d) Greater than 3
48. For the relationship between two variables X and Y, we have which of the following
lines of regression?
a) Y on X
b) X on Y
c) Both a) and b) above
d) Only b) above
49. Which of the following regression equations is used to estimate Y, when the value of X
is known?
a) X on Y
b) Y on X
c) Either of the two can be used
d) Neither of the two can be used
50. Since Yield of a crop depends upon amount of rainfall, we need to consider:
a) The regression equation of yield on rainfall
b) The regression equation of rainfall on yield
c) Any one of a) or b) above can be considered
d) Neither of a) or b) can be considered
52. The method applied for deriving the regression equations is:
a) Fitting of Normal Equations
b) Rank Correlation Method
c) Least Square Method
d) Product Moment Method
Properties:
58. When the slopes of the two regression lines are equal:
a) The lines are perpendicular to each other.
b) The lines will coincide.
c) The lines will be parallel to each other.
d) None of the above.
62. The sign analogy of the correlation coefficient and the two regression coefficients is:
a) -, +, +
b) -, -, -
c) +, +, +
d) Both b) and c) above
64. As “r” increases numerically from 0 to 1, the angle between the regression lines:
a) Increases from 0° to 90°
b) Diminishes from 90° to 0°
c) Increases from 0° to 180°
d) Both a) and c) above
66. If correlation coefficient between two variables is zero, which of the following is true?
a) Both regression coefficients are greater than one.
b) Both regression coefficients are negative.
c) One of the regression coefficients is zero.
d) Both regression coefficients are zero.
67. In regression analysis the difference between the observed value and the estimated
value is known as:
a) Residue
b) Deviation
c) Error
d) Either a) or c) above
69. Which of the following statement/s is/are FALSE regarding the regression coefficients?
a) If one of the regression coefficients is greater than unity, the other one is less than
unity.
b) The product of the two regression coefficients is equal to the square of the correlation
coefficient between the two variables.
c) A regression coefficient lies between – infinity and + infinity.
d) None of the above is FALSE.
38 c 46 a 54 a 62 d
39 c 47 c 55 a 63 c
40 c 48 c 56 b 64 b
41 b 49 b 57 a 65 b
42 b 50 a 58 b 66 d
43 c 51 b 59 b 67 d
44 c 52 c 60 c 68 c
45 b 53 c 61 b 69 d
70 b