Final - Chp. 17 Correlation and Regression
Bivariate data
Correlation
Concurrent Deviation
Regression
Regression line of y on x
Regression Equations
CA Foundation
Bivariate Data
Introduction :
So far we have studied the statistical methods used for analysis of data involving one
variable only. However, we may come across certain situations where observing two
variables is needed. e.g. we may have data regarding
(i) Demand and Price of a certain commodity over a period of time.
(ii) Weight of a person and Blood Pressure of the person.
(iii) Quantity of water given and yield of a crop.
Example 1 : Prepare a Bivariate Frequency table for the following data relating to the
marks in Statistics (x) and Mathematics (y):
(15,13), (1,3) (2,6) (8,3) (15,10) (3,9) (13,19)
(10,11) (6,4) (18,14) (10,19) (12,8) (11,14) (13,16)
(17,15) (18,18) (11,7) (10,14) (14,16) (16,15) (7,11)
(5,1) (11,15) (9,4) (10,15) (13,12) (14,17) (10,1)
(6,9) (13,17) (16,15) (6,4) (4,8) (8,11) (9,12)
(14,11) (16,15) (9,10) (4,6) (5,7) (3,11) (4,16)
(5,8) (6,9) (7,12) (15,6) (18,11) (18,19) (17,16)
(10,14)
Take mutually exclusive classification for both the variables, the first class interval being
0-4 for both.
We take the class intervals 0-4, 4-8, 8-12, 12-16, 16-20 for both the variables. Since the
first pair of marks is (15, 13) and 15 belongs to the fourth class interval (12-16) for x and
13 belongs to the fourth class interval for y, we put a stroke in the (4,4)-th cell. We carry
on giving tally marks till the list is exhausted.
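The tallying procedure described above can be sketched in Python. This is only an illustration: the names `pairs`, `class_index` and `table` are chosen here and are not part of the original notes, and each class is assumed to include its lower limit and exclude its upper limit (so a mark of 4 falls in 4-8).

```python
# Marks in Statistics (x) and Mathematics (y), copied from Example 1.
pairs = [
    (15, 13), (1, 3), (2, 6), (8, 3), (15, 10), (3, 9), (13, 19),
    (10, 11), (6, 4), (18, 14), (10, 19), (12, 8), (11, 14), (13, 16),
    (17, 15), (18, 18), (11, 7), (10, 14), (14, 16), (16, 15), (7, 11),
    (5, 1), (11, 15), (9, 4), (10, 15), (13, 12), (14, 17), (10, 1),
    (6, 9), (13, 17), (16, 15), (6, 4), (4, 8), (8, 11), (9, 12),
    (14, 11), (16, 15), (9, 10), (4, 6), (5, 7), (3, 11), (4, 16),
    (5, 8), (6, 9), (7, 12), (15, 6), (18, 11), (18, 19), (17, 16),
    (10, 14),
]

WIDTH = 4  # class width: 0-4, 4-8, 8-12, 12-16, 16-20
LABELS = ["0-4", "4-8", "8-12", "12-16", "16-20"]


def class_index(value):
    """Return the 0-based class number (assumption: lower limit included)."""
    return value // WIDTH


# table[i][j] = number of pairs with x in class i and y in class j
table = [[0] * len(LABELS) for _ in LABELS]
for x, y in pairs:
    table[class_index(x)][class_index(y)] += 1

# Marginal totals: row sums for x, column sums for y.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

for label, row, total in zip(LABELS, table, row_totals):
    print(label.ljust(6), row, total)
print("Total ", col_totals, sum(row_totals))
```

The row totals and column totals give the two marginal distributions, and any single row or column gives one conditional distribution.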
We may obtain the mean and SD from the above table. They would be known as conditional
mean and conditional SD of marks of Statistics. The same result holds for marks of
Mathematics. In particular, if there are m classifications for x and n classifications for y,
then there would be altogether (m + n) conditional distributions.
Note :
1) For a (5 x 5) i.e. (p x q) classification of bivariate data, the maximum no of
conditional distributions is 5 + 5 = 10 (i.e. p + q.)
2) Some of the cell frequencies in bivariate Frequency table may be zero.
3) For a 5 × 5 (p × q) bivariate frequency table, the maximum no. of marginal
distributions is 2.
CLASS WORK - 1
3. The statistical method, to find the relation /association between two variables of
bivariate data, known as
(a) Correlation (b) Regression (c) Mean (d) None
Ans.
1 2 3 4 5 6
C A A A A B
Correlation
If it is evident from the bivariate data that the change in value of one variable induces
change in value of other variable, then the variables are said to be correlated and we say
that there is a correlation between the two underlying variables. In other words,
correlation is the mutual or joint relationship between the two variables. For the study of
correlation, there must be evidence for dependence between the variables.
Study of correlation is essential for the variables such as
(i) Intelligence Quotient (IQ) and marks of a student.
(ii) Demand and price of a commodity.
Types of Correlation :
According to the direction of changes in two variables the correlation can be of two types:
i. Positive Correlation
ii. Negative Correlation
i. Positive Correlation : If change in the value of one variable causes change in value
of other variable in the same direction, then the variables are said to be positively
correlated. In positive correlation, the increase (decrease) in the value of one
variable results in the corresponding increase (decrease) in value of other variable.
Examples :
Income and expenditure of a family
Sale and profit of company
Supply and price of a commodity
Height and weight of a group of students
Marks in Mathematics and Statistics.
Marks in Maths (x) 25 50 70 80 90
Marks in Statistics (y) 30 40 55 85 87
ii. Negative Correlation : If change in the value of one variable causes change in value
of other variable in the opposite direction, then the variables are said to be
negatively correlated.
In negative correlation, the increase (decrease) in the value of one variable results
in the corresponding decrease (increase) in the value of other variable.
Examples :
Population and per capita income.
Expenditure and saving of a family.
Volume and pressure of gas
Sale of woolen garments and day temperature of a place.
Demand and price of commodity
Demand (y) 100 80 90 70 60
Price (x) 10 20 30 40 50
Spurious Correlation :
It is the correlation between two variables having no causal relation.
Examples :
1. Height of student and marks scored in exam.
2. Size of shirt and monthly income of a person.
CLASS WORK - 2
4. Age of Applicants for life insurance and the premium of insurance - correlations are
(a) Positive (b) Negative (c) Zero (d) None
5. “Unemployment index and the purchasing power of the common man” - Correlations
are
(a) Positive (b) Negative (c) Zero (d) None
6. If the ratio of change between two variables is the same then it is called ________
(a) Non linear correlation (b) Linear correlation
(c) a or b (d) None
7. Correlation would be __________ if the amount of change in one variable does not
bear a constant ratio to the amount of change in other variable
(a) Non linear (b) Linear (c) Can’t say (d) None
8. The distinction between linear & nonlinear correlation is based upon the constancy
of ratio of change between the variables
(a) False (b) true (c) Both (d) None
10. The correlation between the speed of an automobile and the distance travelled by
it after applying the brakes is…………..
(a) positive (b) negative (c) zero (d) None
11. Day temperature and sale of cold drinks indicate ________ correlation
(a) Positive (b) Negative (c) Zero (d) None
Ans.
1 2 3 4 5 6 7 8 9 10
D A D A B B A B B A
11 12
A C
Graphical Algebraic
Scatter diagram
It is a simple graphical tool to study correlation. It gives an idea about the type of
correlation and degree (extent) of correlation. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n
ordered pairs of values of two variables X and Y. A point is plotted for
each ordered pair by choosing a suitable scale on a Cartesian coordinate system.
We get a diagram of plotted points known as a scatter diagram. From these plotted points
we may locate a curve or a line. The trend of the points in the scatter diagram indicates the
nature of possible correlation between X and Y. The closeness of the points to the line gives
an idea of the extent of correlation. The closer the points, the higher the degree of correlation.
Following are different possible Scatter diagrams.
Case (A) n points are collinear i.e. all the points lie on a line.
(i) Perfect positive correlation: If the line containing n points is rising from left to right
then scatter diagram indicates a perfect positive correlation
(ii) Perfect negative correlation: If the line containing these n points is falling
from left to right, the scatter diagram indicates a perfect negative correlation.
CLASS WORK - 3
5. If the band of points is rising from left to right then it indicates_________ correlation
(a) negative (b) Zero (c) Positive (d) None
6. If the band of points is falling from left to right it indicates _________ correlation
(a) negative (b) Zero (c) Positive (d) None
10. In case the correlation coefficient between two variables is 1, the relationship
between the two variables would be _________
(a) y = a + bx
(b) y = a + bx, b > 0
(c) y = a + bx, b < 0
(d) y = a + bx, both a and b being positive
11. If the relationship between two variables x and y is given by 2x + 3y + 4 = 0, then the
value of the correlation coefficient between x and y is _________
(a) 0 (b) 1 (c) -1 (d) Negative
13. When in scatter diagram all the points lie on trend line we get_________
(a) Perfect positive correlation (b) Perfect negative correlation
(c) Either (a) or (b) (d) None
14. If the plotted points in a scatter diagram are evenly distributed, then the correlation is
(a) Zero (b) Negative (c) Positive (d) (a) or (b)
16. If the plotted points in a scatter diagram are evenly distributed, then the correlation is
(a) Zero (b) Negative (c) Positive (d) (a)or(b)
17. If the plotted points in a scatter diagram lie from upper left to lower right, then the
correlation is
(a) Positive (b) Zero (c) Negative (d) None of these
18. If all the plotted points in a scatter diagram lie on a single line, then the correlation is
(a) Perfect positive (b) Perfect negative (c) Both (a) and (b) (d) Either (a) or (b)
19. The more scattered the points are around a straight line in a scatter diagram, the
_____ is the correlation coefficient.
(a) Zero (b) More (c) Less (d) None
20. The correlation coefficient is +1 if the slope of the straight line in a scatter diagram is
(a) Positive (b) Negative (c) Zero (d) None
21. The correlation coefficient is -1 if the slope of the straight line in a scatter
diagram is
(a) Positive (b) Negative (c) Zero (d) None
Ans.
1 2 3 4 5 6 7 8 9 10
C A A D C A A B C B
11 12 13 14 15 16 17 18 19 20
C C C A A A C D C A
21 22
B A
TRY YOURSELF - 1
2. If the values of y are not affected by changes in the values of x, the variables are
said to be
(a) Correlated (b) Uncorrelated (c) Both (d) Zero
3. When the variables are not independent, the correlation coefficient may be zero
(a) True (b) False (c) Both (d) None
4. When high values of one variable are associated with high values of the other & low
values of one variable are associated with low values of another, then they are said
to be
(a) Positively correlated (b) Directly correlated (c) Both (d) None
5. If high values of one tend to low values of the other, they are said to be
(a) Negatively correlated (b) Inversely correlated
(c) Both (d) None
9. If two variables x and y are independent then the correlation coefficient between x
and y is _____.
(a) Positive (b) Negative (c) Zero (d) One
11. If x denotes height of a group of students expressed in cm. and y denotes their weight
expressed in kg, then the correlation coefficient between height and weight
(a) Would be shown in kg. (b) Would be shown in cm.
(c) Would be shown in kg. and cm. (d) Would be free from any unit.
13. The correlation between sale of cold drinks and day temperature is _____.
(a) Zero (b) Positive (c) Negative (d) None of these
16. Whatever may be the value of r, positive or negative, its square will be
(a) Negative only (b) Positive only (c) Zero only (d) None only
17. A small value of r indicates only a _____ linear type of relationship between the
variables
(a) Good (b) Poor (c) Maximum (d) Highest
18. Correlation methods are used to study the relationship between two time series of
data which are recorded annually, monthly, weekly, daily and so on.
(a) True (b) False (c) Both (d) None
20. A coefficient near +1 indicates tendency for the larger values of one variable to be
associated with the larger values of the other.
(a) True (b) False (c) Both (d) None
21. There is a high direct association between measures of 'cigarette smoking' and 'lung
damage'. The correlation coefficient consistent with the above statement is -
(a) 0.30 (b) 0.80 (c) -0.80 (d) -0.30
Ans.
1 2 3 4 5 6 7 8 9 10
A B A C C A A B C A
11 12 13 14 15 16 17 18 19 20
D A B A A B B A A A
21
B
Karl Pearson’s Coefficient of Correlation
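Definition : Given a set of N pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$ relating
to two variables X and Y, the Coefficient of Correlation between X and Y, denoted by the
symbol ‘r’, is defined as the ratio of the covariance between X and Y to the product of the standard
deviations of X and Y.
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$$
Where Cov(X, Y) = Covariance of X and Y
$\sigma_x$ = Standard Deviation of X
$\sigma_y$ = Standard Deviation of Y
This expression is known as Pearson’s product-moment formula and is used as a measure
of linear correlation between X and Y.
1. $r = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(x)}\,\sqrt{\mathrm{Var}(y)}}$, where $\mathrm{Cov}(X, Y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n}$
2. $r = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$
3. $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$
4. $r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$
5. If x and y are large numbers, then let $u = \dfrac{x - a}{h}$ and $v = \dfrac{y - b}{k}$, where a, b, h, k are constants and
h > 0 & k > 0. Then
$$r_{xy} = r_{uv} = \frac{n \sum uv - \sum u \sum v}{\sqrt{n \sum u^2 - (\sum u)^2}\,\sqrt{n \sum v^2 - (\sum v)^2}}$$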
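As an illustration of formula 4 above, here is a minimal Python sketch. The function name `pearson_r` is chosen for this example only; the data are the Maths and Statistics marks used earlier in the chapter to illustrate positive correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from raw sums (formula 4 in the list above)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one series is constant")
    return numerator / denominator


maths = [25, 50, 70, 80, 90]
stats = [30, 40, 55, 85, 87]
print(round(pearson_r(maths, stats), 2))  # strong positive correlation

# Points lying exactly on the line 2x + 3y + 4 = 0 (negative slope).
xs = [1, 2, 3, 4]
ys = [-(2 * x + 4) / 3 for x in xs]
print(round(pearson_r(xs, ys), 6))  # perfect negative correlation
```

The second data set shows why an exact linear relation with negative slope gives r = -1, whatever the values of the constants.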
Interpretation :
(i) If r > 0 the correlation is positive
(ii) If r < 0 the correlation is negative
(iii) If r = 0, no linear correlation
(iv) If r = 1, the correlation is perfect positive
(v) If r = -1, the correlation is perfect negative
(vi) If |r| > 0.8, there is high correlation
(vii) If 0.3 < |r| < 0.8, there is moderate correlation
(viii) If |r| < 0.3, there is marginal correlation.
Effect of shift of origin and change of scale :
1. It is not affected by shift of origin
2. It is affected by signs of changes of scale.
3. It is not affected by magnitude of change of scale.
Examples:
1. If u = 3x+4 and v = 2y+7 and rxy = 0.75 then ruv = +0.75 because both coefficients of
change of scale are positive.
2. If u = -3x+4 and v = 2y+7 and rxy = 0.75. Then ruv = -0.75 because one of the
coefficients of change of scale is negative.
3. If u = -3x+4 and v = -2y+7 and rxy = +0.75 then ruv = + 0.75 because both coefficients
of change of scale are negative.
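The three examples above can be checked numerically. The sketch below is illustrative only: `pearson_r` is a helper written for this example, and the data are the price and demand figures from the negative-correlation example earlier in the chapter, not the (unstated) data behind r = 0.75.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


price = [10, 20, 30, 40, 50]
demand = [100, 80, 90, 70, 60]
r_xy = pearson_r(price, demand)

# u = 3x + 4, v = 2y + 7: both scale coefficients positive, r unchanged.
r_same = pearson_r([3 * x + 4 for x in price], [2 * y + 7 for y in demand])
# u = -3x + 4, v = 2y + 7: one coefficient negative, sign of r reverses.
r_flip = pearson_r([-3 * x + 4 for x in price], [2 * y + 7 for y in demand])
# u = -3x + 4, v = -2y + 7: both negative, r unchanged again.
r_both = pearson_r([-3 * x + 4 for x in price], [-2 * y + 7 for y in demand])

print(round(r_xy, 4), round(r_same, 4), round(r_flip, 4), round(r_both, 4))
```

Only the signs of the scale coefficients matter; the shifts (+4, +7) and the magnitudes (3, 2) leave r untouched.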
CLASS WORK - 4
7. If cov (x,y) = 15, what restriction should be put for the standard deviations of x & y?
(a) No restriction
(b) The product of the standard deviations should be more than 15
(c) The product of the standard deviations should be less than 15
(d) The sum of the standard deviations should be less than 15
8. If the correlation coefficient between x & y is r, then between $u = \dfrac{x - 5}{10}$ & $v = \dfrac{y - 7}{10}$ it is
(a) r (b) -r (c) $\dfrac{r - 5}{10}$ (d) $\dfrac{r - 7}{10}$
10. If the covariance between two variables is 20 and the variance of one of the
variable is 16, what would be the variance of the other variable ?
(a) More than 100 (b) More than 10 (c) Less than 10 (d) More than 1.25
The coefficient of correlation was found to be 0.93. What is the correlation between
u and v as given below ?
u: 10 15 25 20 35
v: -24 -36 -42 -48 -60
(a) -0.93 (b) 0.6 (c) -0.93 (d) 0.93
14. In calculating the Karl Pearson's coefficient of correlation it is necessary that the
data should be of numerical measurements. The statement is
(a) Valid (b) Not valid (c) Both (d) None
21. If the coefficient of correlation between x and y is 0.28, co-variance between x and
y is 7.6 and the variance of x is 9, then the S.D. of y series is;
(a) 9.8 (b) 10.1 (c) 9.05 (d) 10.05
22. If for two variable X and Y, the covariance, variance of X and variance of Y are 40,
16 and 256 respectively, what is the value of the correlation coefficient?
(a) 0.01 (b) 0.625 (c) 0.4 (d) 0.5
23. The co-efficient of correlation between two variables x and y is 0.5, their co-variance
is 16. If S.D. of x is 4, then S.D. of y is equal to:
(a) 4 (b) 8 (c) 16 (d) 64
24. If the relationship between two variables x and y is given by 2x+3y+4=0, then the
value of the correlation coefficient between x and y is
(a) 0 (b) 1 (c) -1 (d) Negative
25. Coefficient of correlation between x and y for 20 items is 0.4 . The AM'S and the SD'S of x
and y are known to be 12, 15, 3 and 4 respectively. Later on, it was found that the pair
(20, 15) was wrongly taken as (15, 20). Find the correct value of correlation coefficient.
(a) 0.28 (b) 0.31 (c) 0.53 (d) 0.47
Ans.
1 2 3 4 5 6 7 8 9 10
B C C D C C B A D A
11 12 13 14 15 16 17 18 19 20
B B C A A B A A A C
21 22 23 24 25
C B B C B
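$$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} \approx \frac{2}{3} \times \frac{1 - r^2}{\sqrt{n}}$$
Where n = number of pairs of observations, r = coefficient of correlation.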
The probable error of the coefficient of correlation helps in interpreting its value. Since
the coefficients of correlation are, generally, computed from samples, they, like other
statistical quantities, are subject to errors of sampling. So from interpretation point of
view probable error of the coefficient of correlation is very useful.
Properties of Probable Error
It is used for judging whether the coefficient of correlation r is significant or not.
(i) If r < 6 × P.E., then it is not significant. Perhaps there is no evidence of
correlation.
(ii) If r ≥ 6 × P.E., then it is significant and the correlation exists.
(iii) By adding and subtracting the value of the Probable Error from r, we get
respectively the upper and lower limits within which the coefficient of correlation
in the population can be expected to lie. It is given as:
Correlation of the population = r ± P.E.
Thus P.E. is used for testing the reliability of the value of r.
Standard Error :
The standard error is defined as: Standard Error or S.E. = $\dfrac{1 - r^2}{\sqrt{n}}$
where r = coefficient of correlation; n = number of pairs of observations.
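The standard error and probable error can be computed with a few lines of Python. This is a sketch under stated assumptions: the function names are invented for this example, the probable error is taken as 0.6745 times the standard error (the usual textbook definition), and the significance check compares the magnitude of r with 6 × P.E.

```python
import math


def standard_error(r, n):
    """S.E. of r for n pairs of observations: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)


def probable_error(r, n):
    """P.E. = 0.6745 * S.E. (assumed textbook definition)."""
    return 0.6745 * standard_error(r, n)


def is_significant(r, n):
    """Rule of thumb: r is significant when |r| >= 6 * P.E. (assumption: magnitude of r)."""
    return abs(r) >= 6 * probable_error(r, n)


r, n = 0.3, 100
se = standard_error(r, n)
pe = probable_error(r, n)
print(f"S.E. = {se:.4f}, P.E. = {pe:.4f}")
print(f"limits for population correlation: {r - pe:.4f} to {r + pe:.4f}")
print("significant" if is_significant(r, n) else "not significant")
```

With r = 0.3 and n = 100 the value of r is smaller than six times its probable error, so by the rule above it would not be treated as significant.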
The correlation coefficient, which measures the linear relationship between two variables,
indicates the amount of variation in one variable accounted for by the other variable. A better
measure for this purpose is provided by the square of the correlation coefficient, known as the
'coefficient of determination'. This can be interpreted as the ratio of the explained
variance to the total variance, i.e.
$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$
Thus a value of 0.6 for r indicates that $(0.6)^2 \times 100\%$, or 36 per cent, of the variation has
been accounted for by the factor under consideration and the remaining 64 per cent of the variation
is due to other factors. The 'coefficient of non-determination' is given by $1 - r^2$ and can
be interpreted as the ratio of the unexplained variance to the total variance.
$$\text{Coefficient of non-determination} = 1 - r^2 = \frac{\text{Unexplained variance}}{\text{Total variance}}$$
From r = 0.6 we cannot conclude that 60% of the variation in the dependent variable is due to the
variation in the independent variable; the coefficient of determination $r^2 = 0.36$ implies
that only 36% of the variation in the dependent variable has been explained by the independent
variable.
CLASS WORK - 5
4. In a correlation analysis, the value of Karl Pearson’s correlation coefficient and its
probable error were found to be 0.90 and 0.04 respectively. Find the value of n.
(a) 10 (b) 12 (c) 13 (d) None of these
5. Find the coefficient of correlation r, when its probable error is 0.2 and the number
of pairs of items is 9.
(a) 0.123 (b) 0.3323 (c) 0.223 (d) None of these
6. A relationship $r^2 = 1 - \dfrac{580}{300}$ is not possible
(a) True (b) False (c) Both (d) None
8. If r 6 PE, then r is
(a) significant (b) Not Significant (c) Can’t say (d) None.
10. Correlation coefficient between x and y is 0.3 for n =100. Find Standard Error in r.
11. Correlation coefficient between x and y is 0.3 for n =100. Find Probable Error in r.
(a) 0.081 (b) 0.054 (c) 0.09 (d) None
12. Correlation coefficient between x and y is 0.3 for n =100. Find range of r.
(a) 0.246 - 0.354 (b) 0.3 to 0.354 (c) 0.354 – 0.9 (d) None
TRY YOURSELF - 2
1. Correlation coefficient is dependent on the change of both origin & the scale of
observations.
(a) True (b) False (c) Both (d) None
3. Neither y nor x can be estimated by a linear function of the other variable when r
equals
(a) +1 (b) -1 (c) 0 (d) None
7. $r_{12}$ is the correlation coefficient between
(a) X1 and X2 (b) X2and X1 (c) X1 and X3 (d) X2 and X3
9. If a, b, c, d are constants such that a and c are of opposite signs and r is the
correlation coefficient between X and Y, then the correlation coefficient between
aX + b and cY + d is -
a c
(a) r (b) r (c) r + (d) r-
c a
11. Two variables X and Y are related as 4x + 3y = 7, then the Correlation between x
and y is -
(a) Perfect Positive (b) Perfect Negative (c) Zero (d) None of these
12. If Cov(u, v) = 3, $\sigma_u^2 = 4.5$, $\sigma_v^2 = 5.5$, then $\rho(u, v)$ is:
(a) 0.347 (b) 0.603 (c) 0.07 (d) 0.121
14. The following data relate to the test scores obtained by eight salesmen in an
aptitude test and their daily sales in thousands of rupees:
Salesman Scores Sales
1 60 31
2 55 28
3 62 26
4 56 24
5 62 30
6 64 35
7 70 28
8 54 24
(a) 0.23 (b) 0.48 (c) 0.77 (d) 0.89
15. Find the product moment correlation coefficient from the following information:
X 2 3 5 5 6 8
Y 9 8 8 6 5 3
(a) -0.93 (b) 0.57 (c) -0.49 (d) 0.73
16. Examine the correlation between age and blindness on the basis of the following data.
Age in years No. of persons (in 000's) No of blind persons
0-10 90 10
10-20 120 15
20-30 140 18
30-40 100 20
40-50 80 15
50-60 60 12
60-70 40 10
70-80 20 06
(a) 0.73 (b) 0.96 (c) 0.58 (d) 0.37
18. What is the value of Karl Pearson's correlation coefficient on the basis of the
following data:
X -4 -3 -2 -1 0 1 2 3 4
Y 18 11 6 3 2 -3 -6 -11 -18
(a) 1 (b) -1 (c) 0 (d) -0.5
19. Compute the correlation coefficient between x and y from the following data.
n = 10, $\sum x = 40$, $\sum y = 50$, $\sum xy = 220$, $\sum x^2 = 200$, $\sum y^2 = 262$.
(a) 0.33 (b) 0.51 (c) 0.91 (d) 0.87
20. Given that for twenty pairs of observations, $\sum xu = 525$, $\sum x = 129$, $\sum u = 97$, $\sum x^2 = 687$,
$\sum u^2 = 427$ and y = 10 - 3u, the coefficient of correlation between x and y is
(a) -0.7 (b) 0.74 (c) -0.74 (d) 0.75
Ans.
1 2 3 4 5 6 7 8 9 10
D A C B C A A A D C
11 12 13 14 15 16 17 18 19 20
B B B B A B C C C C
e.g. suppose an item is repeated twice at rank 5, then the common rank to be assigned to
each item will be $\dfrac{5 + 6}{2} = 5.5$. The next item will be ranked 7. If an item is repeated thrice at
rank 3, then the common rank to be assigned to each item will be $\dfrac{3 + 4 + 5}{3} = 4$. The next rank
assigned will be 6.
In case a tie occurs, $\sum d_i^2$ needs to be corrected. The correction factor for an item repeated m times is
$\dfrac{m(m^2 - 1)}{12}$.
The corrections for ties are computed separately for each of the two series x and y.
Let $T_x$ = Total correction due to ties in x
$T_y$ = Total correction due to ties in y
Then, corrected $\sum d_i^2 = \sum d_i^2 + T_x + T_y$
$$R = 1 - \frac{6 \left(\text{corrected } \sum d_i^2\right)}{n(n^2 - 1)}$$
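The tie-handling rules above can be sketched in Python as follows. The helper names are invented for this illustration, and rank 1 is assumed to go to the largest value. The marks are those of the Eco and Stats question in Class Work 6.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    counts = Counter(values)
    rank_of = {}
    for value, m in counts.items():
        first = ordered.index(value) + 1      # best rank occupied by this value
        rank_of[value] = first + (m - 1) / 2  # mean of first, ..., first + m - 1
    return [rank_of[v] for v in values], counts


def tie_correction(counts):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in counts.values() if m > 1)


def rank_correlation(xs, ys):
    """Spearman's R using the corrected sum of squared rank differences."""
    n = len(xs)
    rank_x, counts_x = average_ranks(xs)
    rank_y, counts_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    corrected = sum_d2 + tie_correction(counts_x) + tie_correction(counts_y)
    return 1 - 6 * corrected / (n * (n * n - 1))


eco = [80, 56, 50, 48, 50, 62, 60]
stats = [90, 75, 75, 65, 65, 50, 65]
print(round(rank_correlation(eco, stats), 2))
```

Here the uncorrected sum of squared differences is 44.5, the corrections are 0.5 for the Eco series and 2.5 for the Stats series, and the corrected total of 47.5 is what enters the formula.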
CLASS WORK - 6
5. For finding the degree of agreement about beauty between two Judges in a Beauty
Contest, we use
(a) Scatter diagram (b) Coefficient of rank correlation
(c) Coefficient of correlation (d) Coefficient of concurrent deviation
6. If there is a perfect disagreement between the marks in Geography and Statistics, then
what would be the value of rank correlation coefficient?
(a) Any value (b) Only 1 (c) Only -1 (d) (b) or (c)
8. If the sum of squares of the rank difference in mathematics & physics marks of 10
students is 22, then the coefficient of rank correlation is
(a) 0.267 (b) 0.867 (c) 0.92 (d) None
9. Three competitors in a contest are ranked by two judges in the order 1, 2, 3 & 2, 3, 1
respectively. Calculate spearman’s rank correlation coefficient
(a) - 0.5 (b) - 0.8 (c) 0.5 (d) 0.8
10. While calculating Spearman’s rank correlation, if an item or value of any one
characteristic is repeated thrice at rank 3, then the common rank to be assigned to each
item will be
(a) $\dfrac{3+3+3}{3} = 3$ (b) $\dfrac{3+4+5}{3} = 4$ (c) $\dfrac{4+4+4}{3} = 4$ (d) $\dfrac{5+5+5}{3} = 5$
15. Great advantage of _____ is that it can be used to rank attributes which cannot be
expressed by way of numerical value.
(a) Concurrent correlation (b) Regression
(c) Rank correlation (d) None
16. Compute the coefficient of rank correlation between Eco marks and Stats marks as
given below.
Eco Marks Stats Marks
80 90
56 75
50 75
48 65
50 65
62 50
60 65
(a) 0.25 (b) 0.38 (c) 0.15 (d) 0.71
17. For a group of 8 students, the sum of squares of differences in ranks for Maths and
Stats mark was found to be 50. What is the value of rank correlation coefficient?
(a) 0.23 (b) 0.40 (c) 0.78 (d) 0.92
18. If the sum of squares of differences of ranks, given by two judges A and B, of 8 students
is 21, what is the value of rank correlation coefficient?
(a) 0.7 (b) 0.65 (c) 0.75 (d) 0.8
19. While computing rank correlation coefficient between profit and investment for the
last 6 years of a company the difference in rank for a year was taken 3 instead of 4.
What is the rectified rank correlation coefficient if it is known that the original value
of rank correlation coefficient was 0.4?
(a) 0.3 (b) 0.2 (c) 0.25 (d) 0.28
Ans.
1 2 3 4 5 6 7 8 9 10
C A A A B C C B A B
11 12 13 14 15 16 17 18 19
B B C B C C B C B
Concurrent deviation
Concurrent deviation method :
Meaning :
Concurrent Deviation Method is based on the direction of change in the two paired
variables. The Correlation Coefficient between two series of direction of change is called
Coefficient of Concurrent Deviation. It is given by the formula :
$$r_c = \pm \sqrt{\pm \frac{2c - m}{m}}$$
Where, $r_c$ = Coefficient of Concurrent Deviation.
c = Number of positive signs after multiplying the direction of change of X series and Y
series.
m = Number of pairs of observations compared (i.e. number of + or - signs).
The same sign, that of (2c - m), is taken both inside and outside the square root.
Important Points :
1. It is the quickest method to study correlation.
2. The results obtained by this method are only an approximate indicator of presence
or absence of correlation.
3. In this method, magnitudes are not considered and tendency of increasing or
decreasing is considered.
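A short Python sketch of the concurrent deviation method is given below. The function name is invented for this example, and a pair in which either series shows no change is assumed not to count as concurrent (conventions differ on this point). The data are the supply and demand figures from Class Work 7, read down the left-hand pair of columns and then the right-hand pair.

```python
import math


def concurrent_deviation(xs, ys):
    """Coefficient of concurrent deviations from two series of equal length."""
    dx = [b - a for a, b in zip(xs, xs[1:])]  # direction of change in X
    dy = [b - a for a, b in zip(ys, ys[1:])]  # direction of change in Y
    m = len(dx)                               # number of deviations compared
    if m == 0:
        raise ValueError("need at least two observations")
    # c = number of pairs moving in the same direction (assumption: ties excluded)
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)
    inner = (2 * c - m) / m
    sign = 1 if inner >= 0 else -1            # same sign inside and outside the root
    return sign * math.sqrt(sign * inner)


supply = [68, 43, 38, 78, 66, 83, 38, 23, 83, 63, 53]
demand = [65, 60, 55, 61, 35, 75, 45, 40, 85, 80, 85]
print(round(concurrent_deviation(supply, demand), 2))
```

For these figures 9 of the 10 deviations move in the same direction, so c = 9 and m = 10.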
CLASS WORK - 7
1. ________ method is based on the direction of change in the two paired variables
(a) Karl Pearson’s coefficient of correlation
(b) Spearman’s rank correlation
(c) Concurrent deviation
(d) None
2. In case of concurrent deviation method we are not concerned with the magnitude of
the variable
(a) True (b) False (c) (a) or (b) (d) None
9. What is the coefficient of concurrent deviations for Demand and Supply relating to
following data:
Supply Demand Supply Demand
68 65 38 45
43 60 23 40
38 55 83 85
78 61 63 80
66 35 53 85
83 75
(a) 0.82 (b) 0.85 (c) 0.89 (d) -0.81
TRY YOURSELF - 3
1. Two numbers within the brackets denote the ranks of 10 students of class in two
subjects : (1,10), (2,9), (3,8), (4,7), (5,6), (6,5) , (7,4), (8,3), (9,2), (10,1), then rank
correlation coefficient is -
(a) 0 (b) -1 (c) 1 (d) 0.5
2. What is the value of: Rank correlation coefficient between the following marks in
Physics and Chemistry:
Roll No. 1 2 3 4 5 6
Physics 25 30 46 30 55 80
Chemistry 30 25 50 40 50 78
(a) 0.782 (b) 0.696 (c) 0.932 (d) 0.857
4. Let x1, x2....xn be the ranks of n individuals according to character A and y1, y2...yn be
the ranks of the same individuals according to other character B such that xi + yi = n +
1, then the coefficients of rank correlation between the characters A and B is.
(a) -1 (b) 1 (c) 0 (d) None of these
5. If the sum of the squares of rank differences in the marks of 10 students in two subjects
is 44, then the coefficient of rank correlation is _____
(a) 0.78 (b) 0.73 (c) 0.87 (d) None
6. If the rank correlation coefficient between marks in management and mathematics for
a group of students is 0.6 and the sum of squares of the differences in ranks is 66, what
is the number of students in the group?
(a) 10 (b) 9 (c) 8 (d) 11
7. For a number of towns, the coefficient of rank correlation between the people living
below the poverty line and increase of population is 0.50. If the sum of squares of the
differences in ranks awarded to these factors is 82.50, find the number of towns.
(a) 10 (b) 50 (c) 20 (d) 70
8. While computing rank correlation coefficient between profits and investment for 10
years of a firm, the difference in rank for a year was taken as 7 instead of 5 by mistake
and the value of rank correlation coefficient was computed as 0.80. What would be the
correct value of rank correlation coefficient after rectifying the mistake?
(a) 0.23 (b) 0.57 (c) 0.78 (d) 0.95
Ans.
1 2 3 4 5 6 7 8
B D D A D A A D
Introduction of Regression
We have already learned that correlation is used to measure the strength and direction of
association between two variables. In statistics, correlation denotes association between two
quantitative variables. It is assumed that this association is linear. That is, one variable increases
or decreases by a fixed amount for every unit of increase or decrease in the other variable.
Consider the relationship between the two variables in each of the following examples.
1. Advertising and sales of a product. (Positive correlation)
2. Height and weight of a primary school student. (Positive correlation)
3. The amount of fertilizer and the amount of crop yield. (Positive correlation)
4. Duration of exercise and weight loss. (Positive correlation)
5. Demand and price of a commodity. (Negative correlation)
6. Income and consumption. (Positive correlation)
7. Supply and price of a commodity. (Positive correlation)
8. Number of days of absence (in school) and performance in examination. (Negative
correlation)
9. The more vitamins one consumes, the less likely one is to have a deficiency. (Negative
correlation)
Correlation coefficient measures association between two variables but cannot determine
the value of one variable when the value of the other variable is known or given. The
technique used for predicting the value of one variable for a given value of the other variable
is called regression. Regression is a statistical tool for investigating the relationship between
variables. It is frequently used to predict the value and to identify factors that cause an
outcome. Karl Pearson defined the coefficient of correlation known as Pearson’s Product
Moment correlation coefficient.
Carl Friedrich Gauss developed the method known as the Least Squares Method for finding
the linear equation that best describes the relationship between two or more variables. R.A.
Fisher combined the work of Gauss and Pearson to develop the complete theory of least
squares estimation in linear regression. Due to Fisher's work, linear regression is used for
prediction and understanding correlations.
Note: Some statistical methods attempt to determine the value of an unknown quantity,
which may be a parameter or a random variable. The method used for this purpose is called
estimation if the unknown quantity is a parameter, and prediction if the unknown quantity is
a variable.
The linear regression model assumes that the value of the dependent variable changes in
direct proportion to changes in the value of an independent variable, regardless of values of
other independent variables. Linear regression is the simplest form of regression and there
are more general and complicated regression models. We shall restrict our attention only to
linear regression model in this chapter.
Amount of fertilizer (X) (in kg per acre)    Yield (Y) (in ’00 kg)
30    43
40    45
50    54
60    53
70    56
80    63
The table above gives the amount of fertilizer and the crop yield for six cases. These pairs of
observations are used to obtain the scatter diagram.
Suppose the data consist of n pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and suppose that
the line that best fits the given data is written as $\hat{Y} = a + bX$ (here, $\hat{Y}$ is to be read
as "Y cap"). This equation is called the prediction equation. That is, using the same values of the
constants a and b, the predicted values of Y are given by $\hat{y}_i = a + b x_i$, where $x_i$ is the value
of the independent variable and $\hat{y}_i$ is the corresponding predicted value of Y. Note that the
observed value $y_i$ of the dependent variable Y is different from the predicted value $\hat{y}_i$.
The observed values ($y_i$) and predicted values ($\hat{y}_i$) of Y do not match perfectly because the
observations do not fall on a straight line. The differences between the observed values and
the predicted values are called errors or residuals.
Mathematically speaking, the quantities
$y_1 - \hat{y}_1, y_2 - \hat{y}_2, \ldots, y_n - \hat{y}_n$, or equivalently the quantities $y_1 - (a + b x_1), y_2 - (a + b x_2),
\ldots, y_n - (a + b x_n)$, are deviations of observed values of Y from the corresponding predicted
values and are therefore called errors or residuals. We write $e_i = y_i - \hat{y}_i = y_i - (a + b x_i)$, for i =
1, 2, ..., n.
Geometrically, the residual $e_i$, which is given by $y_i - (a + b x_i)$, denotes the vertical distance
(which may be positive or negative) between the observed value ($y_i$) and the predicted value
($\hat{y}_i$).
The principle of the method of least squares can be stated as follows.
Among all the possible straight lines that can be drawn on the scatter diagram, the line of
best fit is defined as the line that minimizes the sum of squares of residuals, that is, the sum
of squares of deviations of the predicted y-values from the observed y-values. In other
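words, the line of best fit is obtained by determining the constants a and b so that
$$S^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$
is minimum. Minimizing $S^2$ with respect to a and b leads to the following two linear
equations in a and b, known as the normal equations.
$$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$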
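When we solve these two linear equations, the values of a and b that minimize $S^2$ are given
by
$$a = \bar{y} - b\bar{x}, \qquad b = \frac{\operatorname{cov}(X, Y)}{\sigma_x^2}$$
where
$$\operatorname{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$$
and
$$\sigma_x^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$$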
Substituting the values of a and b obtained as indicated above in the regression equation
Y = a + bX,
we get the equation
$$Y - \bar{y} = b(X - \bar{x})$$
Note: The constant b is called the regression coefficient (or the slope of the regression line)
and the constant a is called the Y-intercept (that is, the Y value when X = 0). Recall that the
equation Y = a + bX defining a straight line is called the slope intercept formula of the straight
line.
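The formulas for a and b can be checked numerically. The following is a minimal sketch in Python that applies them to the fertilizer and yield table given earlier in this section; the function name `fit_line` and the variable names are illustrative choices, not part of the original text.

```python
# Least-squares fit of Y = a + bX using a = y_bar - b*x_bar and b = cov(X, Y) / var(X).
# Data: fertilizer (X) and yield (Y) pairs from the table in this section.
x = [30, 40, 50, 60, 70, 80]
y = [43, 45, 54, 53, 56, 63]


def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # cov(X, Y) = (1/n) * sum(x_i * y_i) - x_bar * y_bar
    cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    # var(X) = (1/n) * sum(x_i^2) - x_bar^2
    var_x = sum(xi ** 2 for xi in x) / n - x_bar ** 2
    b = cov_xy / var_x
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(x, y)
# Residuals e_i = y_i - (a + b * x_i); for a least-squares line they sum to zero.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"a = {a:.4f}, b = {b:.4f}")
print("sum of residuals:", round(sum(residuals), 10))
```

For this data the slope works out to about 0.377 and the intercept to about 31.59, and the residuals sum to zero up to rounding error.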
When observations on two variables, X and Y, are available, it is possible to fit a linear
regression of Y on X as well as a linear regression of X on Y. Therefore, we consider both
the models in order to understand the difference between the two and also the relationship
between the two.
Regression of Y on X
We now introduce the notation $b_{yx}$ for b when Y is the dependent variable and X is the
independent variable.
Linear regression of Y on X assumes that the variable X is the independent variable and the
variable Y is the dependent variable. In order to make this explicit, we express the linear
regression model as follows.
$$Y - \bar{y} = b_{yx}(X - \bar{x})$$
or
$$Y = a + b_{yx}X$$
Here, note that b is replaced by $b_{yx}$, where
$$b_{yx} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - n\bar{x}\,\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
Regression of X on Y
The notation b xy stands for b when X is the dependent variable and Y is the independent
variable. Linear regression of X on Y assumes that the variable Y is the independent variable
and the variable X is the dependent variable. In order to make it clear that this model is
different from the linear regression of Y on X, we express the linear regression model as
follows.
$$X = a' + b'Y$$
The method of least squares, when applied to this model, leads to the following expressions
for the constants a′ and b′.
$$a' = \bar{x} - b'\bar{y}$$
and
$$b' = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)}$$
Substituting the values of a′ and b′ in the linear regression model, we get
$$X = a' + b'Y = \bar{x} - b'\bar{y} + b'Y$$
i.e. $X - \bar{x} = b'(Y - \bar{y})$
Note: The constant b′ in the above equation is called the regression coefficient of X on Y. In
order to make this explicit, it will henceforth be written as $b_{xy}$ instead of b′. The least squares
regression of X on Y will therefore be written as
$$X - \bar{x} = b_{xy}(Y - \bar{y})$$
Note that the linear regression of X on Y is expressed as
$$X = a' + b_{xy}Y$$
Here, note that b′ is replaced by $b_{xy}$. This can be written as
$$Y = \frac{1}{b_{xy}}(X - a')$$
showing that the constant $\dfrac{1}{b_{xy}}$ is the slope of the line of regression of X on Y when
the line is drawn with X on the horizontal axis.
Further, note that the regression coefficient $b_{xy}$ involved in the linear regression of X on Y is
given by
$$b_{xy} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum (y_i - \bar{y})^2} = \frac{\sum x_i y_i - n\bar{x}\,\bar{y}}{\sum y_i^2 - n\bar{y}^2}$$
Observe that the point $(\bar{x}, \bar{y})$ satisfies the equations of both lines of regression. Therefore,
the point $(\bar{x}, \bar{y})$ is the point of intersection of the two lines of regression.
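As a hedged illustration, the sketch below computes both regression coefficients for the fertilizer and yield table from this section and checks that each line passes through the point of means. The helper names `y_on_x` and `x_on_y` are hypothetical and chosen only for this example.

```python
# Both least-squares lines for the fertilizer/yield data and their common point.
x = [30, 40, 50, 60, 70, 80]   # fertilizer (X)
y = [43, 45, 54, 53, 56, 63]   # yield (Y)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2

b_yx = s_xy / s_xx   # regression coefficient of Y on X
b_xy = s_xy / s_yy   # regression coefficient of X on Y


def y_on_x(x_value):
    """Line of regression of Y on X: Y = y_bar + b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x_value - x_bar)


def x_on_y(y_value):
    """Line of regression of X on Y: X = x_bar + b_xy * (Y - y_bar)."""
    return x_bar + b_xy * (y_value - y_bar)


# Substituting one line into the other gives (Y - y_bar) * (1 - b_yx * b_xy) = 0,
# so the lines meet at (x_bar, y_bar) unless b_yx * b_xy equals 1 (identical lines).
print("b_yx =", round(b_yx, 4), " b_xy =", round(b_xy, 4))
print("Y on X at x_bar:", y_on_x(x_bar), " X on Y at y_bar:", x_on_y(y_bar))
```

The two coefficients differ because one line minimizes vertical deviations and the other minimizes horizontal deviations, yet both lines pass through the same point of means.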
Properties of Regression
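The line of regression of Y on X is given by $Y = a + b_{yx}X$ and the line of regression of X on
Y is given by $X = a' + b_{xy}Y$.
Here, $b_{yx} = \dfrac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$, the slope of the line of regression of Y on X, is called the regression
coefficient of Y on X. Similarly,
$b_{xy} = \dfrac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)}$, the slope of the line of regression of X on Y (with Y as the independent
variable), is called the regression coefficient of X on Y. These two regression coefficients have
the following property.
$$b_{XY} \cdot b_{YX} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)} \cdot \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)} = \left[\frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}\right]^2 = r^2$$
Can it be said that the correlation coefficient is the square root of the product of the
two regression coefficients?
The last step uses the relation
$$\operatorname{cov}(X, Y) = r\,\sigma_X \sigma_Y, \quad \text{i.e.} \quad r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$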
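The regression coefficients can also be written as follows.
$$b_{YX} = \frac{\operatorname{cov}(X, Y)}{\sigma_X^2} = \frac{r\,\sigma_X \sigma_Y}{\sigma_X^2} = r\,\frac{\sigma_Y}{\sigma_X}$$
and
$$b_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_Y^2} = \frac{r\,\sigma_X \sigma_Y}{\sigma_Y^2} = r\,\frac{\sigma_X}{\sigma_Y}$$
(c) $\left|\dfrac{b_{YX} + b_{XY}}{2}\right| \ge |r|$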
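Proof: We have already seen that
$$b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad b_{XY} = r\,\frac{\sigma_X}{\sigma_Y}$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively.
Therefore,
$$b_{YX} + b_{XY} = r\,\frac{\sigma_Y}{\sigma_X} + r\,\frac{\sigma_X}{\sigma_Y} = r\left(\frac{\sigma_Y}{\sigma_X} + \frac{\sigma_X}{\sigma_Y}\right) \quad \ldots (1)$$
But $(\sigma_X - \sigma_Y)^2 \ge 0$ and therefore
$$\sigma_X^2 + \sigma_Y^2 - 2\sigma_X \sigma_Y \ge 0$$
$$\sigma_X^2 + \sigma_Y^2 \ge 2\sigma_X \sigma_Y$$
$$\frac{\sigma_Y^2 + \sigma_X^2}{\sigma_X \sigma_Y} \ge 2, \quad \text{i.e.} \quad \frac{\sigma_Y}{\sigma_X} + \frac{\sigma_X}{\sigma_Y} \ge 2$$
so that, for r > 0,
$$r\left(\frac{\sigma_Y}{\sigma_X} + \frac{\sigma_X}{\sigma_Y}\right) \ge 2r \quad \ldots (2)$$
From (1) and (2), we have
$$b_{YX} + b_{XY} \ge 2r$$
$$\frac{b_{YX} + b_{XY}}{2} \ge r$$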
This result shows that the arithmetic mean of the two regression coefficients, namely
bYX and bXY, is greater than or equal to r. This result, however, holds only when bYX,
bXY and r are positive. (Can you find the reason?) Consider the case where bYX = −0.8
and bXY = −0.45. In this case, we have r = −0.6. (Can you find the reason?)
Note that bYX + bXY = −1.25, and 2r = −1.2. This shows that bYX + bXY ≤ 2r.
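The arithmetic in this example can be verified with a short sketch. It assumes the negative case uses bYX = −0.8 and bXY = −0.45, the values consistent with the sum −1.25 quoted above; the function name is illustrative.

```python
import math


def r_from_coefficients(b_yx, b_xy):
    """r is the square root of b_yx * b_xy, carrying the common sign of the coefficients."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


for b_yx, b_xy in [(0.8, 0.45), (-0.8, -0.45)]:
    r = r_from_coefficients(b_yx, b_xy)
    mean = (b_yx + b_xy) / 2
    # "mean >= r" holds only in the positive case; the absolute-value form holds in both.
    print(f"b_yx={b_yx}, b_xy={b_xy}: r={r:.2f}, mean={mean:.3f}, "
          f"mean >= r: {mean >= r}, |mean| >= |r|: {abs(mean) >= abs(r)}")
```

In the negative case the mean of the coefficients is smaller than r, while the inequality between absolute values still holds.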
It may be interesting to note that
$$b_{YX} = \frac{\operatorname{cov}(X, Y)}{\sigma_X^2}, \qquad b_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_Y^2}, \qquad r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
It is evident from the above three equations that all the coefficients have the same
numerator and this numerator determines their sign. As a result, all these coefficients
have the same sign. In other words, if r > 0, then bYX > 0 and bXY > 0. Similarly, if
r < 0, then bYX < 0 and bXY < 0. Finally, if r = 0, then bYX = bXY = 0.
(d) bYX and bXY are not affected by change of origin, but are affected by change of scale.
This property is known as invariance property.
The invariance property states that bYX and bXY are invariant under change of origin,
but are not invariant under change of scale.
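Proof: Let $U = \dfrac{X - a}{h}$ and $V = \dfrac{Y - b}{k}$,
where a, b, h and k are constants with the condition that h, k ≠ 0.
We have already proved that $\sigma_X^2 = h^2 \sigma_U^2$, $\sigma_Y^2 = k^2 \sigma_V^2$ and cov(X, Y) = hk cov(U, V).
Therefore,
$$b_{YX} = \frac{\operatorname{cov}(X, Y)}{\sigma_X^2} = \frac{hk \operatorname{cov}(U, V)}{h^2 \sigma_U^2} = \frac{k}{h} \cdot \frac{\operatorname{cov}(U, V)}{\sigma_U^2}$$
That is,
$$b_{YX} = \frac{k}{h}\, b_{VU}$$
Similarly,
$$b_{XY} = \frac{h}{k}\, b_{UV}$$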
These two results show that regression coefficients are invariant under change of
origin, but are not invariant under change of scale.
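A small numerical check can make the invariance property concrete. The sketch below uses the fertilizer and yield table from this section; the constants a = 50, h = 10, b = 50 and k = 2 are arbitrary illustrative choices, and `slope_y_on_x` is a hypothetical helper name.

```python
def slope_y_on_x(x, y):
    """Regression coefficient of y on x: sum of cross-deviations over sum of squared x-deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


x = [30, 40, 50, 60, 70, 80]
y = [43, 45, 54, 53, 56, 63]

a, h = 50, 10   # illustrative change of origin and scale for X
b, k = 50, 2    # illustrative change of origin and scale for Y

b_yx = slope_y_on_x(x, y)

# Change of origin only: the coefficient is unchanged.
b_shifted = slope_y_on_x([xi - a for xi in x], [yi - b for yi in y])

# Change of origin and scale: U = (X - a) / h, V = (Y - b) / k.
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]
b_vu = slope_y_on_x(u, v)

print("b_yx         =", round(b_yx, 6))
print("origin shift =", round(b_shifted, 6))
print("b_vu         =", round(b_vu, 6), " (k/h) * b_vu =", round(k / h * b_vu, 6))
```

Shifting the origin leaves the coefficient unchanged, whereas rescaling changes it by the factor h/k, so multiplying b_vu by k/h recovers b_yx.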
CLASS WORK - 8
2. The process of developing an algebraic equation between two variables and predicting
the value of one variable given the value of the other variable is known as
(a) Correlation analysis (b) Regression analysis
(c) (a) or (b) (d) none
3. The variable whose value is predicted using the algebraic equation is called
(a) Dependent variable
(b) Response variable
(c) Explained variable or regressed variable
(d) All of the above
4. The variable whose value is used as the basis for prediction is called
(a) Independent variable
(b) Predictor variable
(c) Regressor or explanatory variable
(d) All of the above
5. The strength or degree of relationship between two variables is measured using ______
(a) Regression analysis (b) Correlation analysis
(c) Both (d) None
10. If there are two variables x and y, then the number of regression equations could be _____
(a) 1 (b) 2 (c) Any Number (d) 3
11. _________ gives the mathematical relationship of the variables.
(a) Correlation (b) Regression (c) Both (d) None
17. The line of regression passes through the points, bearing ________ no. of points on
both sides.
(a) equal (b) unequal (c) zero (d) none
18. The method applied for deriving the regression equations is known as
(a) Least squares (b) Concurrent deviation
(c) Product moment (d) Normal equation.
19. The difference between the observed value and the estimated value in regression
analysis is known as
(a) Error (b) Residue (c) Deviation (d) (a) or (b).
25. byx =
(a) r·σx/σy (b) r·σy/σx (c) Both (d) None
26. Regression line y on x is
(a) x − x̄ = bxy (y − ȳ) (b) x + x̄ = bxy (y + ȳ)
(c) y + ȳ = bxy (x − x̄) (d) None
33. byx =
(a) r·σx/σy (b) cov(x,y)/var(y)
(c) (Σxy − x̄ȳ)/(Σx² − n x̄²) (d) None
37. The HRD manager of a company wants to find a measure which he can use to fix the
monthly income of persons applying for a job in the production Department. As an
experimental project, he collected data of 7 persons from that department referring
to years of service and their monthly incomes.
Years of service (X) 11 7 9 5 8 6 10
Income (₹ in 1000's) (Y) 10 8 6 5 9 7 11
(i) Find the regression equation of income on years of service.
(ii) What initial start would you recommend for a person applying for the job after
having served in similar capacity in another company for 13 years?
(a) y = 2 + 0.75x; ₹11,750 (b) y = 3 + 0.75x; ₹12,750
(c) y = 4 + 0.8x; ₹14,400 (d) y = 2 − 0.75x; ₹11,750
38. The management of a large furniture store would like to determine sales (in
thousands of ₹) (X) on a given day, on the basis of the number of people (Y) who visited the
store on that day. The necessary records were kept and a random sample of ten
days was selected for the study. The summary results were as follows:
Σx = 370, Σy = 580, Σx² = 17206, Σy² = 41658, Σxy = 11494, n = 10
Obtain the line of regression of X on Y.
(a) y = 109.0912 - 1.243 x (b) x = 109.0912 - 1.243 y
(c) x = 109.0912 + 1.243 y (d) y = 109.0912 + 1.243 x
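When only summary totals are available, as in question 38, the line of regression of X on Y can be computed directly from them. The following is a minimal sketch, reading the totals as Σx = 370, Σy = 580, Σx² = 17206, Σy² = 41658, Σxy = 11494 and n = 10; the variable names are illustrative.

```python
# Line of regression of X on Y from summary totals:
# b_xy = (sum_xy - n * x_bar * y_bar) / (sum_y2 - n * y_bar**2),  a' = x_bar - b_xy * y_bar
n = 10
sum_x, sum_y = 370, 580
sum_y2, sum_xy = 41658, 11494

x_bar = sum_x / n
y_bar = sum_y / n

b_xy = (sum_xy - n * x_bar * y_bar) / (sum_y2 - n * y_bar ** 2)
a_prime = x_bar - b_xy * y_bar

print(f"x = {a_prime:.4f} + ({b_xy:.4f}) y")
```

The total Σx² is not needed for this line; it would be used for the regression of Y on X instead.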
Ans.
1 2 3 4 5 6 7 8 9 10
D B D D B A C A B B
11 12 13 14 15 16 17 18 19 20
B B D A B A A A D D
21 22 23 24 25 26 27 28 29 30
A A C B B D A B B A
31 32 33 34 35 36 37 38
A C D B A B A B
TRY YOURSELF - 4
1. The statistical method which helps us to estimate or predict the unknown value of
one variable from the known value of the related variable is called
(a) Correlation (b) Scatter diagram (c) Regression (d) Dispersion
6. Following are the two normal equations obtained for deriving the regression line of y
on x:
5a + 10b = 40
10a + 25b = 95
The regression line of y on x is given by
(a) 2x + 3y = 5 (b) 2y + 3x =5 (c) y = 2 + 3 x (d) y = 3 + 5x
7. Find the two regression equation from the following data and estimate y when x is 13,
x when y is 15.
X 2 4 5 5 8 10
y 6 7 9 10 12 12
(a) 16.2546 and 11.2489 (b) 15.3063 and 11.75
(c) 14.6352 and 10.50 (d) 18.2453 and 12.85
9. For y = 25, what is the estimated value of x, from the following data:
X 11 12 15 16 18 19 21
Y 21 15 13 12 11 10 9
(a) 15 (b) 13.926 (c) 13.588 (d) 14.986
10. The following data relate to the heights of 10 pairs of fathers and sons:
(175,173), (172,172), (167,171), (168,171),
(172,173), (171,170), (174,173), (176,175),
(169,170), (170,173)
The regression equation of height of son on that of father is given by
(a) Y=100+5x (b) Y=99.708+0.405x
(c) Y=89.653+0.582x (d) Y=88.758+0.562x
11. Given below the information about the capital employed and profit earned by a
company over the last twenty five years:
Particulars Mean SD
Capital employed (000 Rs.) 62 5
Profit earned (000 Rs.) 25 6
Correlation coefficient between capital and profit = 0.92. The sum of the Regression
coefficients for the above data would be:
(a) 1.871 (b) 2.358 (c) 1.968 (d) 2.346
13. The following data relate to the expenditure on advertisement in thousands of rupees
and the corresponding sales in lakhs of rupees.
Expenditure on Ad 8 10 10 12 15
Sales 18 20 22 25 28
Find the appropriate regression equation.
(a) y = 6.4927+ 1.4643x (b) y = 7.5864 - 2.6451x
(c) y = 8.3527 + 4.3673x (d) y = 7.4575+ 1.7648x
Ans.
1 2 3 4 5 6 7 8 9 10
C B D A B C B C C B
11 12 13
A C A
CLASS WORK - 9
7. If X and Y are independent, the value of regression coefficient byx is equal to:
(a) 0 (b) 1
(c) 2 (d) any positive value
11. If byx and bxy are two regression coefficients, they have
(a) Same sign (b) Opposite sign
(c) Either same or opposite signs (d) Nothing can be said
13. Given the following equations as 3x + y = 13 and 2x + 5y = 20, which one is the
regression equation of y on x?
(a) 1st equation (b) 2 nd equation
(c) Both (a) and (b) (d) None
14. Given the following equations: 7x + 3y = 90 and 3x + 4y = 15, which one is the
regression equation of y on x?
(a) 1st equation (b) 2 nd equation
(c) Both the equations (d) None
15. If the regression line of y on x and that of x on y are given by y = -2x + 3 and 8x =
-y + 3 respectively, what is the coefficient of correlation between x and y?
(a) 0.5 (b) -1/√2 (c) -0.5 (d) None
19. The greater the angle between the regression lines ______ the correlation
between the variable
(a) lesser (b) higher (c) medium (d) None
20. If the two lines of regression are perpendicular to each other, the relation
between the two regression coefficient is
(a) byx = bxy (b) byx bxy = 1 (c) byx < bxy (d) byx = bxy =0
21. If the two lines of regression are identical to each other, the relation between the
two regression coefficient is
(a) byx = bxy (b) byx bxy = 1 (c) byx < bxy (d) byx = - bxy
27. Correlation coefficient r lies between the regression coefficients b yx and bxy
(a) True (b) False (c) Both (d) None
28. Since the correlation coefficient r is the _____ of the two regression coefficients
byx and bxy
(a) A.M (b) G.M (c) H.M (d) None
32. The point which always lies on the two lines of regression is
(a) (𝑥̅ , 𝑦̅) (b) (bxy, byx) (c) (σx, σy) (d) (0,0)
33. If there are two variables x and y, then the number of regression equations could
be
(a) 1 (b) 2 (c) Any number (d) 3
39. The method applied for deriving the regression equations is known as
(a) Least squares (b) Concurrent deviation
(c) Product moment (d) Normal equation
40. A feature of least squares regression lines is: "The sum of the deviations of the Y's or
the X's from their regression lines is zero".
(a) True (b) False (c) Both (d) None
42. The line of regression passes through the points, bearing _____ no. of points on
both sides
(a) Equal (b) Unequal (c) Zero (d) None
44. Two lines of regression are given by 5x+7y-22=0 and 6x+2y-22=0. If the variance of
y is 15 find the standard deviation of x.
(a) 2.646 (b) 6.246 (c) 7.612 (d) 3.646
45. If two regression lines are: y = 4 + kx and x = 5+4y, then the range of k is -
(a) k ≤ 0 (b) k ≥ 0 (c) 0 ≤ k ≤ 1 (d) 0 ≤ 4k ≤ 1
46. If the relationship between two variables x and u is u + 3x = 10 and between two
other variables y and v is 2y + 5v = 25, and the regression coefficient of y on x is
known as 0.80, then what would be the regression coefficient of v on u?
(a) 0.32 (b) 0.1066 (c) 0.2548 (d) 0.1586
47. For the variables x and y, the regression equations are given as 7x-3y-18 = 0 and
4x-y-11 = 0. After finding the arithmetic means of x and y, compute the correlation
coefficient between x and y. If the variance of x is 9, find the SD of y.
(a) 8.5642 (b) 6.2453 (c) 9.1647 (d) 7.4789
HOME WORK
cov x1 , y S S
3
cov 2 x2 , y x y Sx S y
(a) (b) (c) (d)
Sx S y Sx S y cov x, y cov x, y
[Dec. 2022]
2. The coefficient of rank correlation between the ranking of following 6 students in
two subjects.
Mathematics and Statistics is:
Mathematics 3 5 8 4 7 10
Statistics 6 4 9 8 1 2
(a) -0.25 (b) 0.35 (c) 0.38 (d) 0.20
[Dec. 2022]
4. If the data points of (X, Y) series on a scatter diagram lie along a straight line that
goes downwards as X-values move from left to right, then the data exhibit
________correlation.
(a) Direct (b) Imperfect indirect
(c) Indirect (d) Imperfect direct [Dec. 2021]
5. If the sum of the product of the deviations of X and Y from their means is zero, the
correlation coefficient between X and Y is:
(a) Zero (b) Positive (c) Negative (d) 10
[July 2021]
8. Scatter diagram does not help us to
(a) Find the type of correlation
(b) Identify whether variables correlated or not
(c) Determine the linear (or) non- linear correlation
(d) Find the numerical value of correlation coefficient [Dec. 2020]
13. If the correlation coefficient between the variables X and Y is 0.5, then the
correlation coefficient between the variables 2x − 4 and 3 − 2y is
(a) 0.5 (b) 1 (c) -0.5 (d) 0
[Nov. 2018]
14. If the sum of squares of deviations of ranks of 8 students is 50 then the rank
correlation coefficient is :
(a) 0.40 (b) 0.45 (c) 0.5 (d) 0.8
[June 2018]
17. In the method of Concurrent Deviations, only the directions of change (Positive
direction/Negative direction) in the variables are taken into account for
calculation of
(a) Coefficient of S.D (b) Coefficient of regression
(c) Coefficient of correlation (d) None [May 2018]
23. For positive and perfectly correlated random variables, one of the regression
coefficient is 1.3 and the standard deviation of X is 2, the variance of Y is
(a) 2.66 (b) 6.76 (c) 6.56 (d) 3.16 [June 2021]
24. For any two variables x and y the regression equations are given as 2x + 5y-9=0
and 3x-y-5=0. What are the A.M. of x and y?
(a) 2,1 (b) 1,2 (c) 4,2 (d) 2,4 [Dec. 2021]
26. The straight-line graph of the linear equation Y = a + b X, slope is horizontal if:
29. If the regression line of Y on X is given by Y = X + 2 and Karl Pearson's coefficient
of correlation is 0.5, then σy²/σx² = _______.
(a) 3 (b) 2 (c) 4 (d) None
[June.2019]
30. The two lines of regression intersect at the point:
(a) Mean (b) Median (c) Mode (d) None of the these
[Nov. 2018]
31. Correlation coefficient can be found out by
(a) Scatter Diagram (b) Rank Method (c) Both (d) None
32. When we are not concerned with the magnitude of the two variables under
discussion, we consider
(a) Rank correlation coefficient
(b) Product moment correlation coefficient
(c) Coefficient of concurrent deviation
(d) (a) or (b) but not (c)
33. In the case of "Insurance companies' profits and the number of claims they have to pay", there is:
(a) Positive correlation (b) Negative correlation
(c) No correlation (d) None of these
35. If the correlation coefficient between x and y is zero, then the two regression lines are
(a) Perpendicular to each other (b) Coincide to each other
(c) Parallel to each other (d) None of these
36. If the correlation coefficient between the variables X and Y is 0.5, then the
correlation coefficient between the variables 2x - 4 and 3- 2y is
(a) 0.5 (b) 1 (c) -0.5 (d) 0
38. In case speed of an automobile and the distance required to stop the car after
applying brakes correlation is
(a) Positive (b) Negative (c) Zero (d) None
39. A relationship r² = 1 − 500/300 is not possible
(a) True (b) False (c) Both (d) None
43. Find the probable error if r = 2/√10 and N = 36
(a) 0.6745 (b) 0.067 (c) 0.5287 (d) None
46. If the points on a scatter diagram form a line moving from the lower left to the upper right
corner, then the correlation is
(a) Perfect Positive (b) Perfect Negative
(c) Simple Positive (d) No correlation
47. If correlation coefficient between x and y is 0.5, then find the correlation
coefficient between 2x-3 and 3-5y is
(a) 0.5 (b) -0.5 (c) 2.5 (d) -2.5
48. If the equations of the two regression lines are 2x-3y=0 and 4y-5x=8, then the
correlation coefficient between x and y is equal to
(a) 8/15 (b) 15/8 (c) 6/15 (d) 1/15
49. Consider to regression line 3x + 2y = 26, 6x + y = 31. Find the correlation coefficient
between x and y
(a) 0.5 (b) -0.5 (c) 0.25 (d) -0.25
52. For the set of observations {(1,2), (2, 5), (3,7), (4,8), (5,10)} the value of
Karl Pearson's coefficient of correlation is approximately given by
(a) 0.755 (b) 0.655 (c) 0.525 (d) 0.985
53. If the coefficient of correlation between x and y is 0.5, the covariance is 16, and
the SD of x is 4, then the standard deviation of y is
(a) 4 (b) 8 (c) 16 (d) 64
55. Given that the variance of x is equal to the square of standard deviation by and
the regression line of y on x is y = 40 + 0.5 (x-30).
Then regression line of x on y is
(a) y = 40 + 4 (x - 30) (b) y = 40 +(x-30)
(c) y = 40+ 2 (x-30) (d) x = 30 + 2 (y - 40)
56. The regression coefficients remain unchanged due to
(a) A shift of scale (b) A shift of origin
(c) Replacing x-values by 1/x (d) Replacing y-values by 1/y
57. If y = 9x and x = 0.01 y then r is equal to:
(a) -0.1 (b) 0.1 (c) +0.3 (d) -0.3
58. The straight-line graph of the linear equation Y = a + b X, slope is horizontal if:
(a) b = 1 (b) b ≠ 0 (c) b = 0 (d) a = b ≠ 0
59. If byx = -1.6 and bxy = -0.4, then rxy will be:
(a) 0.4 (b) -0.8 (c) 0.64 (d) 0.8
60. If the sum of the product of the deviations of X and Y from their means is zero
the correlation coefficient between X and Y is:
(a) Zero (b) Positive (c) Negative (d) 10
61. If the slope of the regression line is calculated to be 5.5 and the intercept 15, then
the value of Y when X is 6 is:
(a) 88 (b) 48 (c) 18 (d) 78
62. The sum of square of any real positive quantities and its reciprocal is never less than:
(a) 4 (b) 2 (c) 3 (d) 4.
63. If the data points of (X, Y) series on a scatter diagram lie along a straight line that
goes downwards as X-values move from left to right, then the data exhibit
_________correlation
(a) Direct (b)Imperfect indirect
(c) Indirect (d) Imperfect direct
64. The intersecting point of two regression lines falls on the X-axis. If the mean of
X-values is 16 and the standard deviations of X and Y are 3 and 4 respectively, then
the mean of Y-values is
(a) 16/3 (b) 4 (c) 0 (d) 1
66. If the plotted points in a scatter diagram lie from lower left to upper right, then the correlation is
(a) Negative (b) Perfect Negative (c) Indirect (d) Positive
67. For finding correlation between two qualitative characteristics, we use
(a) Coefficient of rank correlation
(b) Scatter diagram
(c) Coefficient of concurrent deviation
(d) Product moment correlation coefficient
69. For positive and perfectly correlated random variables, one of the regression
coefficient is 1.3 and the standard deviation of X is 2, the variance of Y is
(a) 2.66 (b) 6.76 (c) 6.56 (d) 3.16
70. The coefficient of rank correlation between the ranking of following 6 students in
two subjects.
Mathematics and Statistics is :
Mathematics 3 5 8 4 7 10
Statistics 6 4 9 8 1 2
(a) -0.25 (b) 0.35 (c) 0.38 (d) 0.20
72. The equations of the two lines of regression are 4x + 3y+7=0 and 3x + 4y +8=0.
Find the correlation coefficient between x and y.
(a) -0.75 (b) 0.25 (c) -0.92 (d) 1.25
73. If the regression equations are 2x+3y+1=0 and 5x+6y+1=0, then Mean of x and
y respectively are
(a) -1,-1 (b) -1,1 (c) 1,-1 (d) 2,3
74. If byx = 0.5 and bxy = 0.46, then the value of correlation coefficient r is:
(a) 0.23 (b) 0.25 (c) 0.39 (d) 0.48
75. For variables X and Y, we collect four observations with ΣX = 10; ΣY = 14; ΣX² = 65;
ΣY² = 5 and ΣXY = 3. What is the regression line of Y on X?
(a) Y = - 0.8X - 5.5 (b) Y = 0.8X - 5.5 (c) Y = - 0.8X + 5.5 (d) Y = 0.8X + 5.5
76. The regression lines will be perpendicular to each other when the value of r is:
(a) 1 (b) -1 (c) 1/2 (d) 0
77. Given that r = 0.4 and n = 81, determine the limits for the population correlation
coefficient.
(a) (0.333, 0.466) (b) (0.367, 0.433) (c) (0.337, 0.463) (d) (0.373, 0.427)
79. If the regression equations are x+2y-5=0 and 2x+3y-8=0, then the mean of x
and the mean of y are _________, respectively:
(a) -3 and 4 (b) 2 and 4 (c) 1 and 2 (d) 2 and 1.
81. The regression equations are 8x - 10y + 66 = 0 and 40x - 18y = 214. Find the coefficient
of correlation.
(a) 4/5 (b) -4/5 (c) 3/5 (d) -1
82. If two regression lines are x + 3y = 7 and 2x + 5y = 12, then x̄ and ȳ are respectively
(a) 2, 1 (b) 1, 2 (c) 2,3 (d) 2,4
83. If the mean of two variables x & y are 3 and 1 respectively. Then the equation of two
regression lines are _____
(a) 5x+7y-22=0 & 6x+2y-20=0 (b) 5x+7y-22=0 & 6x+2y+20=0
(c) 5x+7y+22=0 & 6x+2y-20=0 (d) 5x+7y+22=0 & 6x+2y+20=0
84. Out of the two lines of regression given by x+2y=4 and 2x+3y-5=0, the regression line
of x on y is:
(a) 2x+3y-5=0 (b) x+2y=4
(c) x+2y=0 (d) The given lines can't be regression lines.
85. Equations of two lines of regression are 4x+3y+7 = 0 and 3x+4y +8 = 0, the mean of x
and y are
(a) 5/7 and 6/7 (b) -4/7 and -11/7
(c) 2 and 4 (d) None of these
86. If the regression line of y on x and that of x on y are given by y = -2x + 3 and 8x = -y + 3
respectively, what is the coefficient of correlation between x and y?
(a) 0.5 (b) -1/√2 (c) -0.5 (d) None of these
87. 8x - 3y +7 = 0, 14x - 7y + 6 = 0 are two regression equations then the correlation coefficient,
r=
(a) 0.86 (b) -0.86 (c) 0.45 (d) -0.45
88. If y =3x + 4 is the regression line of y on x and the arithmetic mean of x is -1,
what is the arithmetic mean of y?
(a) 1 (b) -1 (c) 7 (d) None of these.
92. If the coefficient of correlation between two variables is - 0.2, then the coefficient of
determination is
(a) 0.4 (b) 0.02 (c) 0.04 (d) 0.16
93. If the coefficient of correlation between two variables is -0.3, then the coefficient of
determination is
(a) 0.3 (b) 0.09 (c) 0.7 (d) 0.9
94. If the coefficient of correlation between two variables is -0.9, then the coefficient of
determination is
(a) 0.9 (b) 0.81 (c) 0.1 (d) 0.19
95. The coefficient of correlation between two variables is 0.5, then the coefficient of
determination is
(a) 0.5 (b) 0.25 (c) -0.5 (d) √0.5
96. If the coefficient of correlation between two variables is 0.6, then the percentage of
variation accounted for is
(a) 60% (b) 40% (c) 64% (d) 36%
97. If the coefficient of correlation between two variables is 0.6, then the percentage of
variation unaccounted for is
(a) 60% (b) 40% (c) 64% (d) 36%
98. If the coefficient of correlation between two variables is 0.7 then the percentages of
variation unaccounted for is
(a) 70% (b) 30% (c) 51% (d) 49%
100. A relationship r2 = 1 - 580 is not possible
(a) True (b) False (c) Both (d) None
101. Find the coefficient of correlation when its probable error is 0.2 and the number of pairs
of item is 9.
(a) 0.505 (b) 0.332 (c) 0.414 (d) 0.316
102. The two regression lines are: 16x - 20y + 132 =0 and 80x -30y - 428 = 0, the value of
correlation coefficient is
(a) 0.6 (b) -0.6 (c) 0.54 (d) 0.45
103. Two regression equations are x+y=6 and x+2y=10, then correlation coefficient between x
and y is
1 1
(a) -1/2 (b) 1/2 (c) - (d)
2 √2
105. If the two lines of regression are x +2y -5 = 0 and 2x+3y -8=0, then the regression line
of y on x is
(a) x + 2y - 5 = 0 (b) x + 2y = 0
(c) 2x + 3y - 8 = 0 (d) 2x + 3y = 0
106. If the two regression lines are 3X = Y and 8Y = 6X, then the value of correlation
coefficient is
(a) -0.5 (b) 0.5 (c) 0.75 (d)-0.80
TEST PAPER
2. Take 200 and 150 respectively as the assumed means for the X and Y series of 11
values, so that dx = X - 200 and dy = Y - 150, with Σdx = 13, Σdx² = 2667, Σdy = 42,
Σdy² = 6964 and Σdx dy = 3943. The value of r is:
(a) 0.77 (b) 0.98 (c) 0.92 (d) 0.82
3. For some bivariate data, the following results were obtained for the two variables
4. If the sum of squares of the rank difference in mathematics and physics marks
of 10 students is 22, then the coefficient of rank correlation is:
(a) 0.267 (b) 0.867 (c) 0.92 (d) None
7. If the covariance between two variables is 20 and the variance of one of the
variables is 16, what would be the variance of the other variable?
(a) More than 10 (b) More than 100
(c) More than 1.25 (d) Less than 10
10. If the sum of square of differences of rank is 50 and number of items is 8 then
what is the value of rank correlation coefficient.
(a) 0.59 (b) 0.40 (c) 0.36 (d) 0.63
11. The correlation coefficient between x and y is -1/2. The value of bxy is -1/8.
Find byx.
(a) -2 (b) -4 (c) 0 (d) 2
15. The two regression lines are 7x-3y- 18 = 0 and 4x-y- 11 =0.Find the values of byx
and bxy
(a) 7/3, 1/4 (b) -7/3,-1/4 (c) -3/7,-1/4 (d) None of these.
19. For a bivariate data, two lines of regression are 40x - 18y = 214 and 8x - 10y + 66
= 0, then find the values of x̄ and ȳ
(a) 17 and 13 (b) 13 and 17 (c) 13 and-17 (d) -13 and 17
20. Three competitors in a contest are ranked by two judges in the order 1,2,3 and
2,3,1 respectively. Calculate the Spearman's rank correlation coefficient.
(a) -0.5 (b) -0.8 (c) 0.5 (d) 0.8
21. If Y is dependent variable and X is Independent variable and the S.D of X and Y
are 5 and 8 respectively and Co-efficient of co-relation between X and Y is 0.8.
Find the Regression coefficient of Y on X.
(a) 0.78 (b) 1.28 (c) 6.8 (d) 0.32
22. If the regression lines are 8x - 10y + 66 = 0 and 40x - 18y = 214, the correlation
coefficient between ‘x’ and ‘y’ is :
(a) 1 (b) 0.6 (c) - 0.6 (d) -1
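Questions of this type give two regression lines without saying which is which. One way to reason about them, sketched below under the assumption that each line is written as p·x + q·y + c = 0 with non-zero p and q, is to try both assignments and keep the one whose regression coefficients have a product between 0 and 1. The function names are hypothetical.

```python
import math


def correlation_from_lines(line1, line2):
    """Each line is (p, q, c) meaning p*x + q*y + c = 0. Returns (b_yx, b_xy, r)."""
    for y_line, x_line in ((line1, line2), (line2, line1)):
        p1, q1, _ = y_line
        p2, q2, _ = x_line
        b_yx = -p1 / q1   # solve y_line for y: y = -(p1/q1) x - c1/q1
        b_xy = -q2 / p2   # solve x_line for x: x = -(q2/p2) y - c2/p2
        product = b_yx * b_xy
        # r**2 = b_yx * b_xy must lie in [0, 1]; the other assignment gives 1/product.
        if 0 <= product <= 1:
            r = math.copysign(math.sqrt(product), b_yx)
            return b_yx, b_xy, r
    raise ValueError("the given lines cannot be a pair of regression lines")


def means_from_lines(line1, line2):
    """The lines intersect at (x_bar, y_bar); solve the 2x2 system by Cramer's rule."""
    p1, q1, c1 = line1
    p2, q2, c2 = line2
    det = p1 * q2 - p2 * q1
    x_bar = (-c1 * q2 + c2 * q1) / det
    y_bar = (-p1 * c2 + p2 * c1) / det
    return x_bar, y_bar


# Lines from the question above: 8x - 10y + 66 = 0 and 40x - 18y - 214 = 0.
lines = ((8, -10, 66), (40, -18, -214))
b_yx, b_xy, r = correlation_from_lines(*lines)
x_bar, y_bar = means_from_lines(*lines)
print("b_yx =", b_yx, " b_xy =", b_xy, " r =", round(r, 4))
print("means:", x_bar, y_bar)
```

Here the first line is the regression of y on x (coefficient 0.8) and the second is the regression of x on y (coefficient 0.45), since the reverse assignment would give a product greater than 1.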
23. The coefficient of correlation between two variables x and y is the simple
__________of the two regression coefficients.
(a) Arithmetic Mean (b) Geometric Mean
(c) Harmonic Mean (d) None of the above.
24. If the covariance between variables X and Y is 25 and variance of X and Y are
respectively 36 and 25, then the coefficient of correlation is
(a) 0.409 (b) 0.416 (c) 0.833 (d) 0.0277
28. Two regression lines for a bivariate data are: 2x - 5y + 6 = 0 and 5x-4y 4-3 = 0.
Then the coefficient of correlation should be:
2 2 2 2 2 2
(a) 5 (b) 5 (c) 5 (d) 5
31. The coefficient of correlation between the temperature of environment and power
consumption is always:
(a) Positive (b) Negative (c) Zero (d) Equal to 1
32. If two regression lines are x + y = 1 and x - y = 1 then mean values of x and y will
be:
(a) 0 and 1 (b) 1 and 1 (c) 1 and 0 (d) -1 and-1
36. If the correlation coefficient between the variables X and Y is 0.5, then the
correlation coefficient between the variables 2x - 4 and 3 - 2y is
(a) 1 (b) 0.5 (c) -0.5 (d) 0
i 1 n n2 1 i 1 n n 2 1
(c) (d)
38. Determine Spearman's rank correlation coefficient from the given data Σd² = 30,
n = 10:
(a) r = 0.82 (b) r = 0.32 (c) r = 0.40 (d) None of the above
Answers:
1 A 2 C 3 D 4 B 5 C
6 C 7 B 8 A 9 B 10 B
11 A 12 B 13 B 14 B 15 A
16 C 17 C 18 B 19 B 20 A
21 B 22 B 23 B 24 C 25 A
26 B 27 C 28 C 29 B 30 C
31 A 32 C 33 D 34 D 35 B
36 C 37 B 38 A 39 B 40 A