
Final - Chp. 17 Correlation and Regression

Uploaded by

vinit tandel
Copyright
© © All Rights Reserved

CA Foundation

5 CORRELATION AND REGRESSION

• Bivariate Data
• Correlation
• Methods of Studying Correlation
• Karl Pearson’s Coefficient of Correlation
• Probable Error and Standard Error
• Non-Repeated & Repeated Ranks
• Concurrent Deviation
• Regression
• Regression Line of y on x
• Formulae of Regression Equation of y on x
• Regression Equations
• Properties of Regression Coefficients


Bivariate Data
Introduction :
So far we have studied statistical methods for analysing data on one variable only.
However, we may come across situations in which two variables must be observed
together. For example, we may have data regarding
(i) Demand and Price of a certain commodity over a period of time.
(ii) Weight of a person and Blood Pressure of the person.
(iii) Quantity of water given and yield of a crop.

Such a set of observations or measurements made on two variables is called Bivariate
Data. The variables are denoted by X and Y respectively, and the observations on the two
variables can be represented by n ordered pairs
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
When data are collected on two variables simultaneously, they are known as bivariate
data, and the corresponding frequency distribution derived from them is known as a Bivariate
Frequency Distribution.

Example 1 : Prepare a Bivariate Frequency table for the following data relating to the
marks in Statistics (x) and Mathematics (y):
(15,13), (1,3) (2,6) (8,3) (15,10) (3,9) (13,19)
(10,11) (6,4) (18,14) (10,19) (12,8) (11,14) (13,16)
(17,15) (18,18) (11,7) (10,14) (14,16) (16,15) (7,11)
(5,1) (11,15) (9,4) (10,15) (13,12) (14,17) (10,1)
(6,9) (13,17) (16,15) (6,4) (4,8) (8,11) (9,12)
(14,11) (16,15) (9,10) (4,6) (5,7) (3,11) (4,16)
(5,8) (6,9) (7,12) (15,6) (18,11) (18,19) (17,16)
(10,14)
Take mutually exclusive classification for both the variables, the first class interval being
0-4 for both.

Solution : From the given data, we find that


Range for x = 19 - 1 = 18
Range for y = 19 - 1 = 18

We take the class intervals 0-4, 4-8, 8-12, 12-16, 16-20 for both the variables. Since the
first pair of marks is (15, 13) and 15 belongs to the fourth class interval (12-16) for x and
13 belongs to the fourth class interval for y, we put a stroke in the (4,4)-th cell. We carry
on giving tally marks till the list is exhausted.
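The tallying procedure just described can be sketched in a few lines of Python. This is only an illustration: the names `class_index` and `bivariate_table` are hypothetical, and the sketch assumes that each class excludes its upper limit (so a mark of 4 falls in 4-8, not 0-4), which is what the mutually exclusive classification implies.

```python
# Marks pairs (x = Statistics, y = Mathematics) as listed in Example 1.
pairs = [
    (15, 13), (1, 3), (2, 6), (8, 3), (15, 10), (3, 9), (13, 19),
    (10, 11), (6, 4), (18, 14), (10, 19), (12, 8), (11, 14), (13, 16),
    (17, 15), (18, 18), (11, 7), (10, 14), (14, 16), (16, 15), (7, 11),
    (5, 1), (11, 15), (9, 4), (10, 15), (13, 12), (14, 17), (10, 1),
    (6, 9), (13, 17), (16, 15), (6, 4), (4, 8), (8, 11), (9, 12),
    (14, 11), (16, 15), (9, 10), (4, 6), (5, 7), (3, 11), (4, 16),
    (5, 8), (6, 9), (7, 12), (15, 6), (18, 11), (18, 19), (17, 16),
    (10, 14),
]

CLASS_WIDTH = 4   # classes 0-4, 4-8, 8-12, 12-16, 16-20
N_CLASSES = 5


def class_index(mark):
    """0-based index of the class containing `mark` (upper limit excluded)."""
    return mark // CLASS_WIDTH


def bivariate_table(data):
    """Tally each (x, y) pair into the cell (class of x, class of y)."""
    table = [[0] * N_CLASSES for _ in range(N_CLASSES)]
    for x, y in data:
        table[class_index(x)][class_index(y)] += 1
    return table


table = bivariate_table(pairs)
row_totals = [sum(row) for row in table]   # one total per Statistics class
print(row_totals)
```

The first pair (15, 13) lands in row index 3 and column index 3, i.e. the (4,4)-th cell, as in the text. On the pairs as listed, the row totals come out as 4, 12, 14, 11, 9, which are the Statistics totals in the last column of the table that follows.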

Bivariate Frequency Distribution of Marks in Statistics and Mathematics

Marks in Statistics (X)    Marks in Mathematics (Y)
                           0-4    4-8    8-12   12-16   16-20   Total
0-4                         1      1      2      0       0        4
4-8                         1      4      5      1       1       12
8-12                        1      2      4      6       1       14
12-16                       0      1      3      2       5       11
16-20                       0      0      1      5       3        9
Total                       3      8     15     14      10       50
We note from the above table that some of the cell frequencies (f_ij) are zero. Starting
from the above Bivariate Frequency Distribution, we can obtain two types of univariate
distributions, which are known as:
(a) Marginal distribution.
(b) Conditional distribution.
If we consider the distribution of Statistics marks along with the marginal totals presented in
the last column of the table, we get the marginal distribution of marks in Statistics. Similarly,
the totals in the last row give the marginal distribution of marks in Mathematics. The following
table shows the marginal distribution of marks in Statistics.
Marginal Distribution of Marks of Statistics
Marks No. of Students
0-4 4
4-8 12
8-12 14
12-16 11
16-20 9
Total 50
We can find the mean and standard deviation of marks in Statistics from this table. They are
known as the marginal mean and marginal SD of Statistics marks. Similarly, we can
obtain the marginal mean and marginal SD of Mathematics marks. Any other statistical measure
in respect of x or y can be computed in a similar manner.
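As a rough illustration, the marginal mean and marginal SD of Statistics marks can be computed from the marginal distribution above. The sketch assumes the usual grouped-data conventions, namely that every observation in a class sits at the class midpoint and that the SD uses divisor n; the function name `grouped_mean_sd` is hypothetical.

```python
from math import sqrt


def grouped_mean_sd(classes, freqs):
    """Mean and SD of a grouped distribution, using class midpoints.

    Assumes the population form of the SD (divisor n, not n - 1).
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    variance = sum(f * m * m for f, m in zip(freqs, mids)) / n - mean ** 2
    return mean, sqrt(variance)


classes = [(0, 4), (4, 8), (8, 12), (12, 16), (16, 20)]
stat_freqs = [4, 12, 14, 11, 9]   # marginal distribution of Statistics marks

marginal_mean, marginal_sd = grouped_mean_sd(classes, stat_freqs)
print(round(marginal_mean, 2), round(marginal_sd, 2))  # 10.72 4.85
```

The same function applied to a single column of the bivariate table would give a conditional mean and conditional SD.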
If we want to study the distribution of Statistics marks for a particular group of students, say
those students who got marks between 8 and 12 in Mathematics, we come across another
univariate distribution known as a conditional distribution.
Conditional Distribution of Marks in Statistics for Students
Having Mathematics Marks between 8 and 12
Marks No. of Students
0-4 2
4-8 5
8-12 4
12-16 3
16-20 1
Total 15


We may obtain the mean and SD from the above table. They are known as the conditional
mean and conditional SD of marks in Statistics. The same holds for marks in
Mathematics. In particular, if there are m classifications for x and n classifications for y,
then there are altogether (m + n) conditional distributions.
Note :
1) For a (5 × 5), i.e. (p × q), classification of bivariate data, the maximum number of
conditional distributions is 5 + 5 = 10 (i.e. p + q).
2) Some of the cell frequencies in a bivariate frequency table may be zero.
3) For a 5 × 5 (p × q) bivariate frequency table, the maximum number of marginal
distributions is 2.

CLASS WORK - 1

1. Bivariate Data are the data collected for


(a) Two variables
(b) More than two variables
(c) Two variables at the same point of time
(d) Two variables at different points of time

2. Univariate Data is the data collected for


(a) One variable (b) Two variables
(c) Three variables (d) None

3. The statistical method used to find the relation/association between two variables of
bivariate data is known as
(a) Correlation (b) Regression (c) Mean (d) None

4. For a (p × q) classification of bivariate data, the maximum number of conditional
distributions is
(a) p + q (b) pq (c) p/q (d) None of these

5. For a (p × q) bivariate frequency table, the maximum number of marginal distributions is
(a) 2 (b) 4 (c) 1 (d) None of these

6. Some of the cell frequencies in a bivariate frequency table may be


(a) Negative (b) Zero (c) a or b (d) None

Ans.

1 2 3 4 5 6
C A A A A B

Correlation
If it is evident from the bivariate data that the change in value of one variable induces
change in value of other variable, then the variables are said to be correlated and we say
that there is a correlation between the two underlying variables. In other words,
correlation is the mutual or joint relationship between the two variables. For the study of
correlation, there must be evidence of dependence between the variables.
Study of correlation is essential for variables such as
(i) Intelligence Quotient (IQ) and marks of a student.
(ii) Demand and price of a commodity.

Types of Correlation :
According to the direction of changes in two variables the correlation can be of two types:
i. Positive Correlation
ii. Negative Correlation

i. Positive Correlation : If a change in the value of one variable causes a change in the value
of the other variable in the same direction, then the variables are said to be positively
correlated. In positive correlation, an increase (decrease) in the value of one
variable results in a corresponding increase (decrease) in the value of the other variable.
Examples :
• Income and expenditure of a family
• Sale and profit of a company
• Supply and price of a commodity
• Height and weight of a group of students
• Marks in Mathematics and Statistics
Marks in Maths (x) 25 50 70 80 90
Marks in Statistics (y) 30 40 55 85 87

ii. Negative Correlation : If change in the value of one variable causes change in value
of other variable in the opposite direction, then the variables are said to be
negatively correlated.
In negative correlation, the increase (decrease) in the value of one variable results
in the corresponding decrease (increase) in the value of other variable.

Examples :
• Population and per capita income
• Expenditure and saving of a family
• Volume and pressure of a gas
• Sale of woollen garments and day temperature of a place
• Demand and price of a commodity
Demand (y) 100 80 90 70 60
Price (x) 10 20 30 40 50
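One way to check the direction of correlation numerically is to look at the sign of the covariance, which the chapter defines later as the mean product of deviations from the two means. The sketch below applies that definition to the two small data sets shown above; the function name `covariance` is an illustrative choice, not part of the text.

```python
def covariance(xs, ys):
    """Mean of the products of deviations from the respective means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Marks data from the positive-correlation illustration.
maths = [25, 50, 70, 80, 90]
stats = [30, 40, 55, 85, 87]

# Price and demand data from the negative-correlation illustration.
price = [10, 20, 30, 40, 50]
demand = [100, 80, 90, 70, 60]

cov_marks = covariance(maths, stats)
cov_demand = covariance(price, demand)
print(cov_marks > 0)    # True: the marks move in the same direction
print(cov_demand < 0)   # True: price and demand move in opposite directions
```

A positive covariance corresponds to positive correlation and a negative covariance to negative correlation; the size of the covariance on its own does not measure the degree of correlation.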


Spurious Correlation :
It is the correlation between two variables having no causal relation.

Examples :
1. Height of student and marks scored in exam.
2. Size of shirt and monthly income of a person.

CLASS WORK - 2

1. Correlation analysis aims at ……….


(a) Predicting one variable for a given value of the other variable
(b) Establishing relation between two variables
(c) Measuring the extent of relation between two variables
(d) Both (b) and (c)

2. ____________ is concerned with the measurement of the “strength of association”


between variables
(a) Correlation (b) regression (c) Both (d) None

3. Correlation helps us answer the following questions


(a) are the two variables related (b) How they are related
(c) To what extent they are related (d) All of the above

4. Age of Applicants for life insurance and the premium of insurance - correlations are
(a) Positive (b) Negative (c) Zero (d) None

5. “Unemployment index and the purchasing power of the common man” - Correlations
are
(a) Positive (b) Negative (c) Zero (d) None

6. If the ratio of change between two variables is the same then it is called ________
(a) Non linear correlation (b) Linear correlation
(c) a or b (d) None

7. Correlation would be __________ if the amount of change in one variable does not
bear a constant ratio to the amount of change in other variable
(a) Non linear (b) Linear (c) Can’t say (d) None

8. The distinction between linear & nonlinear correlation is based upon the constancy
of ratio of change between the variables
(a) False (b) true (c) Both (d) None


9. Which of the following variables are negatively correlated?


(a) Number of vehicles & consumption of petrol
(b) Volume & pressure of gas
(c) Rainfall & yield of a crop
(d) Intelligence Quotient & marks of a student

10. The correlation between the speed of an automobile and the distance travelled by
it after applying the brakes is…………..
(a) positive (b) negative (c) zero (d) None

11. Day temperature and sale of cold drinks indicate ________ correlation
(a) Positive (b) Negative (c) Zero (d) None

12. What is spurious correlation?


(a) It is a bad relation between two variables
(b) Establishing relation between two variables
(c) It is the correlation between two variables having no causal relation
(d) It is a negative correlation

Ans.
1 2 3 4 5 6 7 8 9 10
D A D A B B A B B A
11 12
A C

Methods of Studying Correlation


The various methods of studying correlation are given below:

Graphical method:
   Scatter Diagram

Algebraic methods:
   1. Karl Pearson’s Coefficient of Correlation
   2. Rank Method
   3. Concurrent Deviation Method

Scatter diagram
It is a simple graphical tool to study correlation. It gives an idea about the type of
correlation and the degree (extent) of correlation. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n
ordered pairs of values of two variables X and Y. The point corresponding to each ordered
pair is plotted by choosing a suitable scale on a Cartesian co-ordinate system.
The resulting diagram of plotted points is known as a scatter diagram. From these plotted points
we may locate a curve or a line. The trend of the points in the scatter diagram indicates the
nature of the possible correlation between X and Y. The closeness of the points to the line gives
an idea of the extent of correlation: the closer the points, the higher the degree of correlation.
Following are the different possible scatter diagrams.

Case (A) n points are collinear i.e. all the points lie on a line.
(i) Perfect positive correlation: If the line containing the n points is rising from left to right,
the scatter diagram indicates a perfect positive correlation.

(ii) Perfect negative correlation: If the line containing the n points is falling
from left to right, the scatter diagram indicates a perfect negative correlation.

(iii) Zero correlation: In this case no trend line is observed.
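The collinear cases can be checked numerically. The sketch below uses the product-moment coefficient r that the chapter introduces later, applied to points that lie exactly on a rising line and on a falling line. The x values and the rising line y = 2 + 3x are arbitrary illustrative choices; the falling line 2x + 3y + 4 = 0 is the one used in the class work questions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]                        # arbitrary x values
rising = [2 + 3 * x for x in xs]            # points on y = 2 + 3x
falling = [-(2 * x + 4) / 3 for x in xs]    # points on 2x + 3y + 4 = 0

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(round(r_rising, 6), round(r_falling, 6))  # 1.0 -1.0
```

Only the sign of the slope matters here: any rising line gives r = +1 and any falling line gives r = -1, whatever the steepness.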


Case (B) The points form a band:


If the band is rising from left to right, it indicates positive correlation. The narrower the
band, the higher the degree of correlation. On the other hand, if the band is falling
from left to right, it indicates negative correlation. If the points are completely
scattered, i.e. no trend is observed, then there is no correlation.

Positive Correlation with higher degree

Positive Correlation with lower degree


CLASS WORK - 3

1. Method of studying correlation is ___________


(a) Graphical (b) Algebraic (c) Both A and B (d) None

2. Scatter diagram is graphical or diagrammatic method of studying correlation


(a) True (b) False (c) Both (d) None

3. Scatter diagram helps us to _________


(a) Find the nature of correlation between two variables
(b) Compute the extent of correlation between two variables
(c) Obtain the mathematical relationship between two variables
(d) Both (a) and (c)

4. Scatter diagram is considered for studying _________


(a) Linear relationship between two variables
(b) Curvilinear relationship between two variables
(c) Neither (a) nor (b)
(d) Both (a) and (b)

5. If the band of points is rising from left to right then it indicates_________ correlation
(a) negative (b) Zero (c) Positive (d) None

6. If the band of points is falling from left to right it indicates _________ correlation
(a) negative (b) Zero (c) Positive (d) None

7. If width of band is smaller it indicates


(a) Higher degree of correlation (b) Lower degree of correlation
(c) Both a & b (d) None

8. If width of band is bigger it indicates


(a) Higher degree of correlation (b) Lower degree of correlation
(c) Both a & b (d) None

9. If the points are completely scattered then


(a) There is low degree of positive correlation
(b) There is high degree of positive correlation
(c) There is no correlation
(d) None

10. In case the correlation coefficient between two variables is 1, the relationship
between the two variables would be _________
(a) y = a + bx
(b) y = a + bx, b > 0
(c) y = a + bx, b < 0
(d) y = a + bx, both a and b being positive

11. If the relationship between two variables x and y is given by 2x + 3y + 4 = 0, then the
value of the correlation coefficient between x and y is _________
(a) 0 (b) 1 (c) -1 (d) Negative

12. If y = a + bx, then what is the coefficient of correlation between x and y?


(a) 1 (b) -1
(c) 1 or -1 according as b > 0 or b < 0 (d) None

13. When in scatter diagram all the points lie on trend line we get_________
(a) Perfect positive correlation (b) Perfect negative correlation
(c) Either (a) or (b) (d) None

14. If the plotted points in a scatter diagram are evenly distributed, then the correlation is
(a) Zero (b) Negative (c) Positive (d) (a) or (b)

15. Which of the following statements is not false?


(a) Scatter diagram fails to measure the extent of relationship between the
variables.
(b) Scatter diagram can measure correlation only when the variables are having a
linear relationship.
(c) Scatter diagram can measure correlation only when the variables are having a
non-linear relationship.
(d) None of these.

16. If the plotted points in a scatter diagram are evenly distributed, then the correlation is
(a) Zero (b) Negative (c) Positive (d) (a)or(b)

17. If the plotted points in a scatter diagram lie from upper left to lower right, then the
correlation is
(a) Positive (b) Zero (c) Negative (d) None of these

18. If all the plotted points in a scatter diagram lie on a single line, then the correlation is
(a) Perfect positive (b) Perfect negative (c) Both (a) and (b) (d) Either (a) or (b)

19. The more scattered the points are around a straight line in a scatter diagram, the
_____ is the correlation coefficient.
(a) Zero (b) More (c) Less (d) None


20. The correlation coefficient is +1 if the slope of the straight line in a scatter diagram is
(a) Positive (b) Negative (c) Zero (d) None

21. The correlation coefficient is -1 if the slope of the straight line in a scatter
diagram is
(a) Positive (b) Negative (c) Zero (d) None

22. When r= 1, all the points in a scatter diagram would lie


(a) On a straight line directed from lower left to upper right
(b) On a straight line directed from upper left to lower right
(c) On a straight line
(d) Both (a) and (b)

Ans.
1 2 3 4 5 6 7 8 9 10
C A A D C A A B C B
11 12 13 14 15 16 17 18 19 20
C C C A A A C D C A
21 22
B A

TRY YOURSELF - 1

1. Simple correlation is called


(a) Linear correlation (b) Nonlinear correlation (c) Both (d) None

2. If the values of y are not affected by changes in the values of x, the variables are
said to be
(a) Correlated (b) Uncorrelated (c) Both (d) Zero

3. When the variables are not independent, the correlation coefficient may be zero
(a) True (b) False (c) Both (d) None

4. When high values of one variable are associated with high values of the other & low
values of one variable are associated with low values of another, then they are said
to be
(a) Positively correlated (b) Directly correlated (c) Both (d) None

5. If high values of one tend to low values of the other, they are said to be
(a) Negatively correlated (b) Inversely correlated
(c) Both (d) None


6. Correlation coefficient between two variables is a measure of their linear


relationship.
(a) True (b) False (c) Both (d) None

7. Correlation coefficient is a pure number.


(a) True (b) False (c) Both (d) None

8. Correlation coefficient is _____ of the units of measurement.


(a) Dependent (b) Independent (c) Both (d) None

9. If two variables x and y are independent then the correlation coefficient between x
and y is _____.
(a) Positive (b) Negative (c) Zero (d) One

10. The correlation is said to be positive


(a) When the values of two variables move in the same direction.
(b) When the values of two variables move in the opposite direction.
(c) When the values of two variables would not change.
(d) None of these.

11. If x denotes height of a group of students expressed in cm. and y denotes their weight
expressed in kg, then the correlation coefficient between height and weight
(a) Would be shown in kg. (b) Would be shown in cm.
(c) Would be shown in kg. and cm. (d) Would be free from any unit.

12. The correlation between height and intelligence is _____


(a) Zero (b) Positive (c) Negative (d) None of these

13. The correlation between sale of cold drinks and day temperature is _____.
(a) Zero (b) Positive (c) Negative (d) None of these

14. The correlation between Employment and Purchasing power is _____.


(a) Positive (b) Negative (c) Zero (d) None of these

15. The correlation between the ages of husbands and wives is


(a) Positive (b) Negative (c) Zero (d) None

16. Whatever may be the value of r, positive or negative, its square will be
(a) Negative only (b) Positive only (c) Zero only (d) None only

17. A small value of r indicates only a _____ linear type of relationship between the
variables
(a) Good (b) Poor (c) Maximum (d) Highest


18. Correlation methods are used to study the relationship between two time series of
data which are recorded annually, monthly, weekly, daily and so on.
(a) True (b) False (c) Both (d) None

19. _____ is a relative measure of association between two or more variables.


(a) Coefficient of correlation (b) Coefficient of regression (c) Both (d) None

20. A coefficient near +1 indicates tendency for the larger values of one variable to be
associated with the larger values of the other.
(a) True (b) False (c) Both (d) None

21. There is a high direct association between measures of 'cigarette smoking' and 'lung
damage'. The correlation coefficient consistent with the above statement is
(a) 0.30 (b) 0.80 (c) -0.80 (d) -0.30

Ans.
1 2 3 4 5 6 7 8 9 10
A B A C C A A B C A
11 12 13 14 15 16 17 18 19 20
D A B A A B B A A A
21
B
Karl Pearson’s Coefficient of Correlation

Definition : Given a set of N pairs of observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$ relating
to two variables X and Y, the Coefficient of Correlation between X and Y, denoted by the
symbol ‘r’, is defined as the ratio of the covariance between X and Y to the product of the
standard deviations of X and Y:

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \, \sigma_y}$$

where Cov(X, Y) = Covariance of X and Y
$\sigma_x$ = Standard Deviation of X
$\sigma_y$ = Standard Deviation of Y
This expression is known as Pearson’s product-moment formula and is used as a measure
of linear correlation between X and Y.

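Formula for Karl Pearson’s coefficient of correlation :

1. $r = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(x)} \, \sqrt{\mathrm{Var}(y)}}$, where $\mathrm{Cov}(X, Y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n}$

2. $r = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_x \, \sigma_y}$

3. $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$

4. $r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \, \sqrt{n \sum y^2 - (\sum y)^2}}$

5. If x and y are large numbers, take $u = \dfrac{x - a}{h}$ and $v = \dfrac{y - b}{k}$, where a, b, h, k are constants with $h > 0$ and $k > 0$. Then

$r_{xy} = r_{uv} = \dfrac{n \sum uv - \sum u \sum v}{\sqrt{n \sum u^2 - (\sum u)^2} \, \sqrt{n \sum v^2 - (\sum v)^2}}$

Condition of ‘r’ is $-1 \le r \le +1$.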
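As an illustration, formula 4 (raw sums) and formula 3 (deviations from the means) can be evaluated side by side. The sketch below uses the five pairs from Class Work 4, question 12, for which the text quotes r = 0.93; the function names are illustrative only.

```python
from math import sqrt


def r_from_sums(xs, ys):
    """Formula 4: r from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def r_from_deviations(xs, ys):
    """Formula 3: r from deviations about the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


x = [2, 3, 5, 4, 7]
y = [4, 6, 7, 8, 10]

r_sums = r_from_sums(x, y)
r_dev = r_from_deviations(x, y)
print(round(r_sums, 2))  # 0.93
```

Here n = 5, Σx = 21, Σy = 35, Σxy = 163, Σx² = 103 and Σy² = 265, so r = 80 / (√74 × √100), which is about 0.93. Both formulas give the same value, as they must.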

Properties of Correlation Coefficient


1. It measures only linear correlation
2. r(x, y) = r(y, x)
3. It is not affected by Shift of Origin
4. It is not affected by magnitude of change of Scale
5. r(x, x) = 1
6. $-1 \le r \le 1$
7. It is free from unit of measurement

Interpretation :
(i) If r > 0 the correlation is positive
(ii) If r < 0 the correlation is negative
(iii) If r = 0, no linear correlation
(iv) If r = 1, the correlation is perfect positive
(v) If r = -1, the correlation is perfect negative
(vi) If r > 0.8, there is high correlation
(vii) If 0.3 < r < 0.8, there is moderate correlation
(viii) If r < 0.3, there is marginal correlation.
Effect of shift of origin and change of scale :
1. It is not affected by shift of origin
2. It is affected by signs of changes of scale.
3. It is not affected by magnitude of change of scale.

Examples:
1. If u = 3x + 4 and v = 2y + 7 and $r_{xy}$ = 0.75, then $r_{uv}$ = +0.75 because both coefficients of
change of scale are positive.
2. If u = -3x + 4 and v = 2y + 7 and $r_{xy}$ = 0.75, then $r_{uv}$ = -0.75 because one of the
coefficients of change of scale is negative.
3. If u = -3x + 4 and v = -2y + 7 and $r_{xy}$ = +0.75, then $r_{uv}$ = +0.75 because both coefficients
of change of scale are negative.
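The three cases above can be verified on actual numbers. The sketch below applies the same transformations to the five pairs from Class Work 4, question 12, so its r is about 0.93 rather than the 0.75 assumed in the examples; `pearson_r` and `transform` are illustrative names.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def transform(values, scale, shift):
    """Change of scale and shift of origin: scale * value + shift."""
    return [scale * v + shift for v in values]


x = [2, 3, 5, 4, 7]
y = [4, 6, 7, 8, 10]
r_xy = pearson_r(x, y)

# Example 1: u = 3x + 4, v = 2y + 7 (both scale coefficients positive)
r_both_positive = pearson_r(transform(x, 3, 4), transform(y, 2, 7))
# Example 2: u = -3x + 4, v = 2y + 7 (one scale coefficient negative)
r_one_negative = pearson_r(transform(x, -3, 4), transform(y, 2, 7))
# Example 3: u = -3x + 4, v = -2y + 7 (both scale coefficients negative)
r_both_negative = pearson_r(transform(x, -3, 4), transform(y, -2, 7))

print(round(r_xy, 2), round(r_both_positive, 2),
      round(r_one_negative, 2), round(r_both_negative, 2))
```

The magnitude of r never changes; only its sign flips, and only when exactly one of the two scale coefficients is negative.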

CLASS WORK - 4

1. Pearson’s correlation coefficient is used for finding _________


(a) Correlation for any type of relation
(b) Correlation for linear relation only
(c) Correlation for curvilinear relation only
(d) Both (b) and (c)

2. Product moment correlation coefficient is considered for _________


(a) Finding the nature of correlation
(b) Finding the amount of correlation
(c) Both (a) and (b)
(d) None

3. Product moment correlation coefficient may be defined as the ratio of _________


(a) product of standard deviations of the two variables to the covariance between
them
(b) Covariance between the variables to the product of the variances of them
(c) Covariance between the variables to the product of their standard deviations
(d) Either (b) and (c)

4. What are the limits of the correlation coefficient?


(a) No limit
(b) -1 and 1
(c) 0 and 1, including the limits
(d) -1 and 1, including the limits

5. r12 is the correlation coefficient between


(a) x1 and x2 (b) x2 and x1 (c) Both (d) None

6. If r = 0, the two variables are


(a) uncorrelated (b) Perfectly related
(c) Linearly independent (d) None

7. If cov (x,y) = 15, what restriction should be put for the standard deviations of x & y?
(a) No restriction
(b) The product of the standard deviations should be more than 15
(c) The product of the standard deviations should be less than 15
(d) The sum of the standard deviations should be less than 15


8. If the correlation coefficient between x and y is r, then the correlation coefficient
between u = (x - 5)/10 and v = (y - 7)/10 is
(a) r (b) -r (c) (r - 5)/10 (d) (r - 7)/10

9. The covariance between two variables is _________


(a) Strictly positive (b) Strictly negative
(c) Always 0 (d) Either positive or negative or zero

10. If the covariance between two variables is 20 and the variance of one of the
variable is 16, what would be the variance of the other variable ?
(a) More than 100 (b) More than 10 (c) Less than 10 (d) More than 1.25

11. If u + 5x = 6 and 3y - 7v = 20 and the correlation coefficient between x and y is 0.58


then what would be the correlation coefficient between u and v?
(a) 0.58 (b) -0.58 (c) -0.84 (d) 0.84

12. From the following data


x: 2 3 5 4 7
y: 4 6 7 8 10
The coefficient of correlation was found to be 0.93. What is the correlation between
u and v as given below ?
u: -3 -2 0 -1 2
v: -4 -2 -1 0 2
(a) -0.93 (b) 0.93 (c) 0.57 (d) -0.57

13. From the following data


x: 2 3 5 4 7
y: 4 6 7 8 10

The coefficient of correlation was found to be 0.93. What is the correlation between
u and v as given below ?
u: 10 15 25 20 35
v: -24 -36 -42 -48 -60
(a) -0.93 (b) 0.6 (c) -0.93 (d) 0.93

14. In calculating the Karl Pearson's coefficient of correlation it is necessary that the
data should be of numerical measurements. The statement is
(a) Valid (b) Not valid (c) Both (d) None

15. Relation rxy = cov (x, y)/ σx × σy is


(a) True (b) False (c) Both (d) None

16. Co-variance may be positive, negative or zero.


(a) True (b) False (c) Both (d) None

17. Covariance measures_____ variations of two variables.


(a) Joint (b) Single (c) Both (d) None

18. Correlation coefficient between x and y = correlation coefficient between u and v


(a) True (b) False (c) Both (d) None

19. Karl Pearson's coefficient is defined from


(a) Ungrouped data (b) Grouped data (c) Both (d) none

20. When r = 0 then cov(x,y) is equal to


(a) +1 (b) -1 (c) 0 (d) None of these.

21. If the coefficient of correlation between x and y is 0.28, co-variance between x and
y is 7.6 and the variance of x is 9, then the S.D. of y series is;
(a) 9.8 (b) 10.1 (c) 9.05 (d) 10.05

22. If for two variable X and Y, the covariance, variance of X and variance of Y are 40,
16 and 256 respectively, what is the value of the correlation coefficient?
(a) 0.01 (b) 0.625 (c) 0.4 (d) 0.5

23. The co-efficient of correlation between two variables x and y is 0.5 and their co-variance
is 16. If the S.D. of x is 4, then the S.D. of y is equal to:
(a) 4 (b) 8 (c) 16 (d) 64

24. If the relationship between two variables x and y is given by 2x+3y+4=0, then the
value of the correlation coefficient between x and y is
(a) 0 (b) 1 (c) -1 (d) Negative

25. Coefficient of correlation between x and y for 20 items is 0.4 . The AM'S and the SD'S of x
and y are known to be 12, 15, 3 and 4 respectively. Later on, it was found that the pair
(20, 15) was wrongly taken as (15, 20). Find the correct value of correlation coefficient.
(a) 0.28 (b) 0.31 (c) 0.53 (d) 0.47
Ans.
1 2 3 4 5 6 7 8 9 10
B C C D C C B A D A
11 12 13 14 15 16 17 18 19 20
B B C A A A A A A C
21 22 23 24 25
C B B C B


Probable Error And Standard Error


Probable Error in Correlation :
The probable error of the coefficient of correlation is obtained by the following formula:

Probable Error of the Coefficient of Correlation:

$$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} \approx \frac{2}{3} \times \frac{1 - r^2}{\sqrt{n}}$$

where n = number of pairs of observations, r = coefficient of correlation.
The probable error of the coefficient of correlation helps in interpreting its value. Since
the coefficients of correlation are, generally, computed from samples, they, like other
statistical quantities, are subject to errors of sampling. So from interpretation point of
view probable error of the coefficient of correlation is very useful.
Properties of Probable Error
It is used for deciding whether the coefficient of correlation r is significant or not.
(i) If r < 6 × P.E., then r is not significant; there is no evidence of
correlation.
(ii) If r ≥ 6 × P.E., then r is significant and the correlation exists.
(iii) By adding the Probable Error to r and subtracting it from r, we get
respectively the upper and lower limits within which the coefficient of correlation
in the population can be expected to lie. It is given as:
Correlation of the population = r ± P.E.
Thus P.E. is used for testing the reliability of the value of r.
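The probable-error rule can be sketched as follows, using r = 0.9 and n = 10, the values that appear in the class work questions below. The function name `probable_error` is hypothetical, and comparing the absolute value of r with 6 × P.E. is an assumption made here so that the same check also works for negative r.

```python
from math import sqrt


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


r, n = 0.9, 10
pe = probable_error(r, n)

# Assumption: use |r| so that a negative r is judged by its magnitude.
significant = abs(r) >= 6 * pe

# Limits within which the population correlation is expected to lie.
lower, upper = r - pe, r + pe

print(round(pe, 4))   # 0.0405
print(significant)    # True
```

With these values 6 × P.E. is about 0.24, well below 0.9, so r is significant, and the population correlation is expected to lie between roughly 0.86 and 0.94.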

Standard Error :
The standard error is defined as:

$$\text{S.E.} = \frac{1 - r^2}{\sqrt{n}}$$

where r = coefficient of correlation; n = number of pairs of observations.
The correlation coefficient, which measures the linear relationship between two variables,
indicates the amount of variation of one variable accounted for by the other variable. A better
measure for this purpose is provided by the square of the correlation coefficient, known as the
'coefficient of determination'. This can be interpreted as the ratio of the explained
variance to the total variance, i.e.

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

Thus a value of 0.6 for r indicates that $(0.6)^2 \times 100\%$, or 36 per cent, of the variation has
been accounted for by the factor under consideration, and the remaining 64 per cent of the
variation is due to other factors. The 'coefficient of non-determination' is given by $(1 - r^2)$ and can
be interpreted as the ratio of the unexplained variance to the total variance:

$$1 - r^2 = \frac{\text{Unexplained variance}}{\text{Total variance}}$$

From r = 0.6 we cannot conclude that 60% of the variation in the dependent variable is due to the
variation in the independent variable; the coefficient of determination $r^2 = 0.36$ implies
that only 36% of the variation in the dependent variable has been explained by the independent
variable.
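The arithmetic for r = 0.6 is small enough to check directly. The helper name `determination` below is illustrative only.

```python
def determination(r):
    """Return (coefficient of determination, coefficient of non-determination)."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared


explained, unexplained = determination(0.6)
print(f"{explained:.0%} explained, {unexplained:.0%} unexplained")
# prints: 36% explained, 64% unexplained
```

The two coefficients always add up to 1, so knowing either one fixes the other.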

CLASS WORK - 5

1. If r is the simple correlation coefficient, the quantity $r^2$ is known as


(a) Coefficient of determination (b) Coefficient of non – determination
(c) coefficient of regression (d) None

2. If r is the simple correlation coefficient, the quantity $(1 - r^2)$ is called


(a) Coefficient of determination (b) Coefficient of non – determination
(c) coefficient of regression (d) None

3. In a correlation analysis, the value of Karl Pearson’s coefficient of correlation is r = 0.8 and
N = 100. Find PE(r) and the range of r.
(a) 0.8 ± 0.02482 (b) ± 0.01482
(c) 0.032 (d) None of these

4. In a correlation analysis, the value of Karl Pearson’s correlation coefficient and its
probable error were found to be 0.90 and 0.04 respectively. Find the value of n.
(a) 10 (b) 12 (c) 13 (d) None of these

5. Find the coefficient of correlation r, when its probable error is 0.2 and the number
of pairs of items is 9.
(a) 0.123 (b) 0.3323 (c) 0.223 (d) None of these

6. A relationship $r^2 = 1 - \dfrac{580}{300}$ is not possible
(a) True (b) False (c) Both (d) None

7. If r < 6 × PE, then r is


(a) significant (b) Not Significant (c) Can’t say (d) None.

8. If r ≥ 6 × PE, then r is
(a) significant (b) Not Significant (c) Can’t say (d) None.

9. P.E. is used for testing the reliability of the value of r.


(a) True (b) False (c) Can’t say (d) None

10. Correlation coefficient between x and y is 0.3 for n =100. Find Standard Error in r.

(a) 0.081 (b) 0.09 (c) 0.0081 (d) 0.009

11. Correlation coefficient between x and y is 0.3 for n =100. Find Probable Error in r.
(a) 0.081 (b) 0.054 (c) 0.09 (d) None


12. Correlation coefficient between x and y is 0.3 for n =100. Find range of r.

(a) 0.246 - 0.354 (b) 0.3 to 0.354 (c) 0.354 – 0.9 (d) None

13. If r = 0.9, PE = 0.04 then find n.


(a) n = 10 (b) n = 15 (c) n = 100 (d) n = 150
Ans.
1 2 3 4 5 6 7 8 9 10
A B A A B A B A A A
11 12 13
B A A

TRY YOURSELF - 2

1. Correlation coefficient is dependent on the change of both origin & the scale of
observations.
(a) True (b) False (c) Both (d) None

2. The value of correlation coefficient lies between


(a) -1 and +1 (b) -1 and 0 (c) 0 and 1 (d) None

3. Neither y nor x can be estimated by a linear function of the other variable when r
equals
(a) +1 (b) -1 (c) 0 (d) None

4. What are the limits of the correlation coefficient?


(a) No limit (b) -1 and 1, including the limits
(c) 0 and 1, including the limits (d) -1 and 0, including the limits.

5. If r is the correlation co-efficient, then


(a) r ≥ 1 (b) r ≤ 1 (c) |r| ≤ 1 (d) |r| ≥ 1

6. The partial correlation coefficient lies between


(a) -1 and +1 (b) 0 and +1 (c) -1 and 0 (d) None

7. r12 is the correlation coefficient between
(a) X1 and X2 (b) X2 and X1 (c) X1 and X3 (d) X2 and X3

8. r12 is the same as r21


(a) True (b) False (c) Both (d) None


9. If a, b, c, d are constants such that a and c are of opposite signs and r is the
correlation coefficient between X and Y, then the correlation coefficient between
aX + b and cY + d is
(a) (a/c) r (b) (c/a) r (c) r (d) −r

10. r(X, Y) equals
(a) cov(X, Y) (b) cov(X, Y) / [var(X) · var(Y)]
(c) cov(X, Y) / [√var(X) · √var(Y)] (d) None of these

11. Two variables X and Y are related as 4x + 3y = 7, then the Correlation between x
and y is -
(a) Perfect Positive (b) Perfect Negative (c) Zero (d) None of these

12. If Cov(u, v) = 3, σu² = 4.5, σv² = 5.5, then ρ(u, v) is:
(a) 0.347 (b) 0.603 (c) 0.07 (d) 0.121

13. If r = 0.28, Cov(x, y) = 7.6, V(x) = 9, then σy =


(a) 8.75 (b) 9.04 (c) 6.25 (d) None

14. The following data relate to the test scores obtained by eight salesmen in an
aptitude test and their daily sales in thousands of rupees:
Salesman Scores Sales
1 60 31
2 55 28
3 62 26
4 56 24
5 62 30
6 64 35
7 70 28
8 54 24
(a) 0.23 (b) 0.48 (c) 0.77 (d) 0.89

15. Find the product moment correlation coefficient from the following information:
X 2 3 5 5 6 8
Y 9 8 8 6 5 3
(a) -0.93 (b) 0.57 (c) -0.49 (d) 0.73

16. Examine the correlation between age and blindness on the basis of the following data.
Age in years No. of persons (in 000's) No of blind persons
0-10 90 10
10-20 120 15
20-30 140 18
30-40 100 20
40-50 80 15
50-60 60 12
60-70 40 10
70-80 20 06
(a) 0.73 (b) 0.96 (c) 0.58 (d) 0.37

17. What is the coefficient of correlation from the following data?


X 1 2 3 4 5
Y 8 6 7 5 5
(a) 0.75 (b) -0.75 (c) -0.85 (d) 0.82

18. What is the value of correlation coefficient Karl Pearson on the basis of the
following data:
X -4 -3 -2 -1 0 1 2 3 4
Y 18 11 6 3 2 3 6 11 18
(a) 1 (b) -1 (c) 0 (d) -0.5

19. Compute the correlation coefficient between x and y from the following data.
n = 10, ∑x = 40, ∑y = 50, ∑xy = 220, ∑x² = 200, ∑y² = 262.
(a) 0.33 (b) 0.51 (c) 0.91 (d) 0.87

20. Given that for twenty pairs of observations, ∑xu = 525, ∑x = 129, ∑u = 97, ∑x² = 687,
∑u² = 427 and y = 10 − 3u, the coefficient of correlation between x and y is
(a) -0.7 (b) 0.74 (c) -0.74 (d) 0.75

Ans.
1 2 3 4 5 6 7 8 9 10
D A C B C A A A D C
11 12 13 14 15 16 17 18 19 20
B B B B A B C C C C



Non- Repeated & Repeated Ranks


Spearman’s Rank Correlation:
Charles Edward Spearman developed a formula for measuring the correlation between
qualitative characteristics. It is defined as the coefficient of correlation between the ranks of
items of two variables according to some qualitative characteristic.
Let x1, x2, …, xn be the ranks given to n items of X according to characteristic A and y1,
y2, …, yn be the ranks given to n items of Y according to characteristic B.
Spearman's rank correlation coefficient, denoted by R, is defined as follows.
\[ R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, \qquad -1 \le R \le 1 \]
where d_i = x_i − y_i and n = number of pairs (x_i, y_i).
[Note: When ranks are not given, convert the quantitative data into rank data by assigning
rank 1 to the highest observation, rank 2 to the next highest observation and so on.]
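
The formula for R can be checked with a few lines of Python. This is a minimal sketch, assuming the ranks are already given and contain no ties; the function name `spearman_rank_correlation` and the sample ranks are illustrative only.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's R for two lists of ranks that contain no ties."""
    if len(rank_x) != len(rank_y):
        raise ValueError("both rank lists must have the same length")
    n = len(rank_x)
    # d_i = x_i - y_i for each pair of ranks; we need the sum of d_i squared
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Illustrative ranks of five students in two subjects
botany = [1, 2, 3, 4, 5]
chemistry = [2, 3, 1, 5, 4]
R = spearman_rank_correlation(botany, chemistry)
print(round(R, 2))
```

Here the differences are −1, −1, 2, −1, 1, so the sum of squared differences is 8 and R = 1 − 48/120 = 0.6.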


Rank correlation with ties:


Sometimes two or more items or observations in the given data have the same
merit, quality or value. In such a case a common rank is assigned to each of these items.
This common rank is the arithmetic mean of the ranks that would have been assigned to these
items had they differed slightly in merit or quality. The number of items getting the same
rank is called the length of the tie. It is denoted by ‘m’.

e.g. suppose an item is repeated twice at rank 5; then the common rank to be assigned to
each item will be (5 + 6)/2 = 5.5. The next item will be ranked 7. If an item is repeated thrice at
rank 3, then the common rank to be assigned to each item will be (3 + 4 + 5)/3 = 4. The next rank
assigned will be 6.
When ties occur, Σd_i² needs to be corrected. The correction factor for each tie of length m is
\[ \frac{m(m^2 - 1)}{12} \]
The corrections for ties are computed separately for each of the two series x and y.
Let T_x = total correction due to ties in x
T_y = total correction due to ties in y
Then,
\[ \text{corrected } \sum d_i^2 = \sum d_i^2 + T_x + T_y \]
\[ R = 1 - \frac{6\,(\text{corrected } \sum d_i^2)}{n(n^2 - 1)} \]

CLASS WORK - 6

1. For finding correlation between two attributes, we consider


(a) Pearson’s correlation coefficient
(b) Scatter diagram
(c) Spearman’s rank correlation coefficient
(d) Coefficient of concurrent deviation

2. Which of the following is true for the rank correlation R


(a) - 1 ≤ R ≤ 1 (b) - 1 ≥ R ≥ 1 (c) - 1 ≤ r ≤ 1 (d) None

3. In the rank correlation formula, ∑d is always equal to zero


(a) True (b) False (c) Can’t say (d) None

4. Spearman’s correlation coefficient is based on ranks rather than on actual observation


(a) True (b) False (c) Can’t say (d) None


5. For finding the degree of agreement about beauty between two Judges in a Beauty
Contest, we use
(a) Scatter diagram (b) Coefficient of rank correlation
(c) Coefficient of correlation (d) Coefficient of concurrent deviation

6. If there is a perfect disagreement between the marks in Geography and Statistics, then
what would be the value of rank correlation coefficient?
(a) Any value (b) Only 1 (c) Only -1 (d) (b) or (c)

7. For the following data the coefficient of rank correlation is


Rank in Botany 1 2 3 4 5
Rank in Chemistry 2 3 1 5 4
(a) 0.93 (b) 0.4 (c) 0.6 (d) None

8. If the sum of squares of the rank difference in mathematics & physics marks of 10
students is 22, then the coefficient of rank correlation is
(a) 0.267 (b) 0.867 (c) 0.92 (d) None

9. Three competitors in a contest are ranked by two judges in the order 1, 2, 3 & 2, 3, 1
respectively. Calculate spearman’s rank correlation coefficient
(a) - 0.5 (b) - 0.8 (c) 0.5 (d) 0.8

10. While calculating Spearman’s correlation, if an item or value of any one
characteristic is repeated thrice at rank 3, then the common rank to be assigned to each
item will be
(a) (3 + 3 + 3)/3 = 3 (b) (3 + 4 + 5)/3 = 4 (c) (4 + 4 + 4)/3 = 4 (d) (5 + 5 + 5)/3 = 5

11. Rank correlation coefficient lies between


(a) 0 to 1 (b) -1 to +1 (c) -1 to 0 (d) Both

12. Maximum value of Rank Correlation coefficient is


(a) -1 (b) +1 (c) 0 (d) None

13. The sum of the difference of rank is


(a) 1 (b) -1 (c) 0 (d) None

14. In rank correlation coefficient only an increasing / decreasing relationship is required.


(a) False (b) True (c) Both (d) None

15. Great advantage of _____ is that it can be used to rank attributes which cannot be
expressed by way of numerical value.
(a) Concurrent correlation (b) Regression
(c) Rank correlation (d) None


16. Compute the coefficient of rank correlation between Eco marks and Stats marks as
given below.
Eco Marks Stats Marks
80 90
56 75
50 75
48 65
50 65
62 50
60 65
(a) 0.25 (b) 0.38 (c) 0.15 (d) 0.71

17. For a group of 8 students, the sum of squares of differences in ranks for Maths and
Stats mark was found to be 50. What is the value of rank correlation coefficient?
(a) 0.23 (b) 0.40 (c) 0.78 (d) 0.92

18. If the sum of squares of differences of ranks, given by two judges A and B, of 8 students
is 21, what is the value of rank correlation coefficient?
(a) 0.7 (b) 0.65 (c) 0.75 (d) 0.8

19. While computing rank correlation coefficient between profit and investment for the
last 6 years of a company the difference in rank for a year was taken 3 instead of 4.
What is the rectified rank correlation coefficient if it is known that the original value
of rank correlation coefficient was 0.4?
(a) 0.3 (b) 0.2 (c) 0.25 (d) 0.28
Ans.
1 2 3 4 5 6 7 8 9 10
C A A A B C C B A B
11 12 13 14 15 16 17 18 19
B B C B C C B C B

Concurrent deviation
Concurrent deviation method :
Meaning :
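Concurrent Deviation Method is based on the direction of change in the two paired
variables. The correlation coefficient between the two series of directions of change is called the
Coefficient of Concurrent Deviation. It is given by the formula:
\[ r_c = \pm \sqrt{\pm \frac{2c - m}{m}} \]
where r_c = coefficient of concurrent deviation,
c = number of positive signs after multiplying the directions of change of the X series and the Y
series (that is, the number of concurrent deviations),
m = number of pairs of deviations compared (i.e. the number of + or − signs), which is one less
than the number of pairs of observations.
The sign taken both inside and outside the square root is the sign of (2c − m), so that the
quantity under the root is never negative.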


Important Points :
1. It is the quickest method to study correlation.
2. The results obtained by this method are only an approximate indicator of presence
or absence of correlation.
3. In this method, magnitudes are not considered and tendency of increasing or
decreasing is considered.
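
The method can be sketched in Python as follows. This is only an illustration: the function name `concurrent_deviation` and the sample series are assumptions, and a pair in which either series shows no change is treated here as not concurrent, which is one possible convention.

```python
import math


def concurrent_deviation(x, y):
    """Coefficient of concurrent deviation for two equally long series."""
    # Direction of change between successive observations of each series
    dx = [later - earlier for earlier, later in zip(x, x[1:])]
    dy = [later - earlier for earlier, later in zip(y, y[1:])]
    m = len(dx)  # number of deviations compared = number of pairs - 1
    # c counts the positions where both series move in the same direction
    c = sum(1 for u, v in zip(dx, dy) if u * v > 0)
    ratio = (2 * c - m) / m
    sign = 1 if ratio >= 0 else -1
    # The same sign is used inside and outside the square root
    return sign * math.sqrt(sign * ratio)


# Illustrative supply and demand figures for eleven periods
supply = [68, 43, 38, 78, 66, 83, 38, 23, 83, 63, 53]
demand = [65, 60, 55, 61, 35, 75, 45, 40, 85, 80, 85]
r_c = concurrent_deviation(supply, demand)
print(round(r_c, 2))
```

In this series the two variables move together in 9 of the 10 comparisons, so c = 9, m = 10 and r_c = √0.8, about 0.89.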

CLASS WORK - 7

1. ________ method is based on the direction of change in the two paired variables
(a) Karl Pearson’s coefficient of correlation
(b) Spearman’s rank correlation
(c) Concurrent deviation
(d) None

2. In case of concurrent deviation method we are not concerned with the magnitude of
the variable
(a) True (b) False (c) (a) or (b) (d) None

3. The quickest method to find the correlation between two variables is the concurrent
deviation method
(a) True (b) False (c) (a) or (b) (d) None

4. In case of the concurrent deviation method, magnitudes are not considered and the tendency of
increasing or decreasing is considered
(a) True (b) False (c) (a) or (b) (d) None

5. The coefficient of concurrent deviation for p pairs of observations was found to
be 1/√3. If the number of concurrent deviations was found to be 6, then the value of p is
(a) 10 (b) 9 (c) 8 (d) None

6. What is the quickest method to find correlation between two variables?


(a) Scatter diagram (b) Method of concurrent deviation
(c) Method of rank correlation (d) Method of product moment correlation.

7. In the Method of Concurrent Deviations, only the directions of change (positive direction
/ negative direction) in the variables are taken into account for calculation of
(a) Coefficient of S.D. (b) Coefficient of regression.
(c) Coefficient of correlation (d) None

8. For 10 pairs of observations, no. of concurrent deviations was found to be 4. What is
the value of the coefficient of concurrent deviation?
(a) √0.2 (b) - √0.2 (c) 1/3 (d) -1/3


9. What is the coefficient of concurrent deviations for Demand and Supply relating to
following data:
Supply Demand Supply Demand
68 65 38 45
43 60 23 40
38 55 83 85
78 61 63 80
66 35 53 85
83 75
(a) 0.82 (b) 0.85 (c) 0.89 (d) -0.81

10. Correlation coefficient between x and y is 0.3. Find coefficient of determination


(a) 0.09 (b) 0.81 (c) 0.90 (d) 0.10

11. Correlation coefficient between x and y is 0.4. Find coefficient of non-determination


(a) 0.16 (b) 0.08 (c) 0.84 (d) 0.91
Ans.
1 2 3 4 5 6 7 8 9 10 11
C A A A A B C D C A C

TRY YOURSELF - 3

1. Two numbers within the brackets denote the ranks of 10 students of class in two
subjects : (1,10), (2,9), (3,8), (4,7), (5,6), (6,5) , (7,4), (8,3), (9,2), (10,1), then rank
correlation coefficient is -
(a) 0 (b) -1 (c) 1 (d) 0.5

2. What is the value of: Rank correlation coefficient between the following marks in
Physics and Chemistry:
Roll No. 1 2 3 4 5 6
Physics 25 30 46 30 55 80
Chemistry 30 25 50 40 50 78
(a) 0.782 (b) 0.696 (c) 0.932 (d) 0.857

3. Compute the coefficient of rank correlation between sales and advertisement


expressed in thousands of rupees from the following data.
Sales Advertisement
90 7
85 6
68 2
75 3
82 4
80 5
95 8
70 1
(a) 0.53 (b) 0.28 (c) 0.77 (d) 0.95

4. Let x1, x2....xn be the ranks of n individuals according to character A and y1, y2...yn be
the ranks of the same individuals according to other character B such that xi + yi = n +
1, then the coefficients of rank correlation between the characters A and B is.
(a) -1 (b) 1 (c) 0 (d) None of these

5. If the sum of the squares of rank differences in the marks of 10 students in two subjects
is 44, then the coefficient of rank correlation is _____
(a) 0.78 (b) 0.73 (c) 0.87 (d) None

6. If the rank correlation coefficient between marks in management and mathematics for
a group of students is 0.6 and the sum of squares of the differences in ranks is 66, what
is the number of students in the group?
(a) 10 (b) 9 (c) 8 (d) 11

7. For a number of towns, the coefficient of rank correlation between the people living
below the poverty line and increase of population is 0.50. If the sum of squares of the
differences in ranks awarded to these factors is 82.50, find the number of towns.
(a) 10 (b) 50 (c) 20 (d) 70

8. While computing rank correlation coefficient between profits and investment for 10
years of a firm, the difference in rank for a year was taken as 7 instead of 5 by mistake
and the value of rank correlation coefficient was computed as 0.80. What would be the
correct value of rank correlation coefficient after rectifying the mistake?
(a) 0.23 (b) 0.57 (c) 0.78 (d) 0.95

Ans.
1 2 3 4 5 6 7 8
B D D A D A A D


Introduction of Regression
We have already learned that correlation is used to measure the strength and direction of
association between two variables. In statistics, correlation denotes association between two
quantitative variables. It is assumed that this association is linear. That is, one variable increases
or decreases by a fixed amount for every unit of increase or decrease in the other variable.
Consider the relationship between the two variables in each of the following examples.
1. Advertising and sales of a product. (Positive correlation)
2. Height and weight of a primary school student. (Positive correlation)
3. The amount of fertilizer and the amount of crop yield. (Positive correlation)
4. Duration of exercise and weight loss. (Positive correlation)
5. Supply and price of a commodity. (Positive correlation)
6. Income and consumption. (Positive correlation)
7. Demand and price of a commodity. (Negative correlation)
8. Number of days of absence (in school) and performance in examination. (Negative
correlation)

9. The more vitamins one consumes, the less likely one is to have a deficiency. (Negative
correlation)

Correlation coefficient measures association between two variables but cannot determine
the value of one variable when the value of the other variable is known or given. The
technique used for predicting the value of one variable for a given value of the other variable
is called regression. Regression is a statistical tool for investigating the relationship between
variables. It is frequently used to predict the value and to identify factors that cause an
outcome. Karl Pearson defined the coefficient of correlation known as Pearson’s Product
Moment correlation coefficient.
Carl Friedrich Gauss developed the method known as the Least Squares Method for finding
the linear equation that best describes the relationship between two or more variables. R.A.
Fisher combined the work of Gauss and Pearson to develop the complete theory of least
squares estimation in linear regression. Due to Fisher's work, linear regression is used for
prediction and understanding correlations.

Note: Some statistical methods attempt to determine the value of an unknown quantity,
which may be a parameter or a random variable. The method used for this purpose is called
estimation if the unknown quantity is a parameter, and prediction if the unknown quantity is
a variable.

Meaning and Types of Regression


Linear regression is a method of predicting the value of one variable when the values of all
other variables are known or specified. The variable being predicted is called the response or
dependent variable. The variables used for predicting the response or dependent variable are
called predictors or independent variables. Linear regression proposes that the relationship
between two or more variables is described by a linear equation. The linear equation used for
this purpose is called a linear regression model. A linear regression model consists of a linear
equation with unknown coefficients. The unknown coefficients in the linear regression model
are called parameters of the linear regression model.
Observed values of the variables are used to estimate the unknown parameters of the model.
The process of developing a linear equation to represent the relationship between two or more
variables using the available sample data is known as fitting the linear regression model to
observed data. Correlation analysis is used for measuring the strength or degree of the
relationship between the predictors or independent variables and the response or dependent
variable. The sign of correlation coefficient indicates the direction (positive or negative) of the
relationship between the variables, while the absolute value (that is, magnitude) of correlation
coefficient is used as a measure of the strength of the relationship.
Correlation analysis, however, does not go beyond measuring the direction and strength of
the relationship between predictor or independent variables and the response or dependent
variable. The linear regression model goes beyond correlation analysis and develops a
formula for predicting the value of the response or dependent variable when the values of the
predictor or independent variables are known. Correlation analysis is therefore a part of
regression analysis and is performed before performing regression analysis. The purpose of
correlation analysis is to find whether there is a strong correlation between two variables.
Linear regression will be useful for prediction only if there is strong correlation between the
two variables.


Types of Linear Regression


The primary objective of a linear regression is to develop a linear equation to express or
represent the relationship between two or more variables. Regression equation is the
mathematical equation that provides prediction of values of the dependent variable based
on the known or given values of the independent variables.
When the linear regression model represents the relationship between the dependent variable
(Y) and only one independent variable (X), then the corresponding regression model is called
a simple linear regression model. When the linear regression model represents the
relationship between the dependent variable and two or more independent variables, then the
corresponding regression model is called a multiple linear regression model.

Following examples illustrate situations for simple linear regression.


1. A firm may be interested in knowing the relationship between advertising (X) and sales of
its product (Y), so that it can predict the amount of sales for the allocated advertising budget.
2. A botanist wants to find the relationship between the ages (X) and heights (Y) of
seedling in his experiment.
3. A physician wants to find the relationship between the time since a drug is administered
(X) and the concentration of the drug in the blood-stream (Y).

Following examples illustrate situations for multiple linear regression


1. The amount of sales of a product (dependent variable) is associated with several
independent variables such as price of the product, amount of expenditure on its
advertisement, quality of the product, and the number of competitors.
2. Annual savings of a family (dependent variable) are associated with several
independent variables such as the annual income, family size, health conditions of
family members, and number of children in school or college.
3. The blood pressure of a person (dependent variable) is associated with several
independent variables such as his or her age, weight, the level of blood cholesterol,
and the level of blood sugar.

The linear regression model assumes that the value of the dependent variable changes in
direct proportion to changes in the value of an independent variable, regardless of values of
other independent variables. Linear regression is the simplest form of regression and there
are more general and complicated regression models. We shall restrict our attention only to
linear regression model in this chapter.

Fitting Simple Linear Regression


Consider an example where we wish to predict the amount of crop yield (in kg per acre) as
a linear function of the amount of fertilizer applied (in kg per acre). In this example, the crop
yield is to be predicted. Therefore, it is the dependent variable and is denoted by Y. The amount
of fertilizer applied is the variable used for the purpose of making the prediction. Therefore,
it is the independent variable and is denoted by X.


Amount of fertilizer (X) (in kg per acre) Yield (Y) (in ’00 kg)
30 43
40 45
50 54
60 53
70 56
80 63
The table above shows the amount of fertilizer and the crop yield for six cases. These pairs of
observations are used to obtain the scatter diagram.

[Figure: Scatter diagram of the yield of grain and amount of fertilizer used.]


We want to draw a straight line that is closest to the points in the scatter diagram. If all the
points were collinear (that is, on a straight line), there would have been no problem in
drawing such a line. There is a problem because all the points are not on a straight line.
Since the points in the scatter diagram do not form a straight line, we want to draw the straight
line that is closest to these points. Theoretically, the number of possible lines is unlimited. It
is therefore necessary to specify some condition in order to ensure that we draw the straight
line that is closest to all the data points in the scatter diagram. The method of least squares
provides the line of best fit because it is closest to the data points in the scatter diagram
according to the least squares principle.

Method of Least Squares


The principle used in obtaining the line of best fit is called the method of least squares. The
method of least squares was developed by Adrien-Marie Legendre and Carl Friedrich Gauss
independently of each other. Let us understand the central idea behind the principle of least
squares.


Suppose the data consist of n pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and suppose that
the line that fits the given data best is written as $\hat{Y} = a + bX$. (Here, $\hat{Y}$ is read
as "Y cap".) This equation is called the prediction equation. That is, using the same values of the
constants a and b, the predicted values of Y are given by $\hat{Y}_i = a + b x_i$, where $x_i$ is the value
of the independent variable and $\hat{Y}_i$ is the corresponding predicted value of Y. Note that the
observed value $y_i$ of the dependent variable Y is, in general, different from the predicted value $\hat{Y}_i$.
The observed values $y_i$ and the predicted values $\hat{Y}_i$ of Y do not match perfectly because the
observations do not fall on a straight line. The differences between the observed values and
the predicted values are called errors or residuals.
Mathematically speaking, the quantities
$y_1 - \hat{Y}_1, y_2 - \hat{Y}_2, \ldots, y_n - \hat{Y}_n$, or equivalently the quantities
$y_1 - (a + bx_1), y_2 - (a + bx_2), \ldots, y_n - (a + bx_n)$, are deviations of the observed values of Y
from the corresponding predicted values and are therefore called errors or residuals. We write
$e_i = y_i - \hat{Y}_i = y_i - (a + bx_i)$, for $i = 1, 2, \ldots, n$.
Geometrically, the residual $e_i$, which is given by $y_i - (a + bx_i)$, denotes the vertical distance
(which may be positive or negative) between the observed value $y_i$ and the predicted value
$\hat{Y}_i$.
The principle of the method of least squares can be stated as follows.
Among all the possible straight lines that can be drawn on the scatter diagram, the line of
best fit is defined as the line that minimizes the sum of squares of residuals, that is, the sum
of squares of deviations of the predicted y-values from the observed y-values. In other
words, the line of best fit is obtained by determining the constants a and b so that
\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2 \]
is minimum. The straight line obtained using this principle is called the least squares regression line.
Symbolically, we write
\[ S^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
as the sum of squared errors. It can also be written as
\[ S^2 = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2 \]
We want to determine the constants a and b in such a way that $S^2$ is minimum.
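
To make the idea of "closest line" concrete, the sum of squared errors can be evaluated for any candidate line. The sketch below uses the fertilizer data from the table earlier in this section; the function name `sum_squared_errors` and the two candidate lines are illustrative choices, and the second candidate uses rounded values of the least squares constants for these data.

```python
def sum_squared_errors(a, b, xs, ys):
    """S^2 = sum of [y_i - (a + b*x_i)]^2 for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


fertilizer = [30, 40, 50, 60, 70, 80]
crop_yield = [43, 45, 54, 53, 56, 63]

# Candidate 1: the line through the first and last data points.
s2_end_points = sum_squared_errors(31.0, 0.4, fertilizer, crop_yield)
# Candidate 2: rounded least squares constants for these data.
s2_least_squares = sum_squared_errors(31.5905, 0.3771, fertilizer, crop_yield)
print(s2_end_points, s2_least_squares)
```

The second line gives the smaller sum of squared errors, which is what the least squares principle requires of the line of best fit.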
Note that $S^2$ is a continuous and differentiable function of both a and b. We differentiate $S^2$
with respect to a (assuming b to be constant) and with respect to b (assuming a to be constant).
We then equate both these derivatives to zero in order to minimize $S^2$. As a result, we get the
following two linear equations in the two unknowns a and b.
\[ \sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i \]
\[ \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 \]
When we solve these two linear equations, the values of a and b that minimize $S^2$ are given
by
\[ a = \bar{y} - b\bar{x}, \qquad b = \frac{\operatorname{cov}(X, Y)}{\sigma_x^2} \]
where
\[ \operatorname{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y} \]
and
\[ \sigma_x^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2 \]
Substituting the values of a and b obtained as indicated above in the regression equation
Y = a + bX,
we get the equation
\[ Y - \bar{y} = b(X - \bar{x}) \]
Note: The constant b is called the regression coefficient (or the slope of the regression line)
and the constant a is called the Y-intercept (that is, the Y value when X = 0). Recall that the
equation Y = a + bX defining a straight line is called the slope intercept formula of the straight
line.
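
As a worked illustration, the formulas for a and b can be applied to the fertilizer data given earlier in this section. This is a minimal sketch; the function name `fit_line` and the prediction at X = 65 are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # cov(X, Y) = (1/n) * sum(x_i * y_i) - x_bar * y_bar
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    # variance of X = (1/n) * sum(x_i^2) - x_bar^2
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    b = cov_xy / var_x
    a = y_bar - b * x_bar
    return a, b


fertilizer = [30, 40, 50, 60, 70, 80]
crop_yield = [43, 45, 54, 53, 56, 63]
a, b = fit_line(fertilizer, crop_yield)
print(f"Y = {a:.4f} + {b:.4f} X")
print(f"predicted yield at X = 65: {a + b * 65:.2f}")
```

For these data cov(X, Y) = 110 and the variance of X is about 291.67, so b is about 0.3771 and a is about 31.59.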
When observations on two variables, X and Y, are available, it is possible to fit a linear
regression of Y on X as well as a linear regression of X on Y. Therefore, we consider both
the models in order to understand the difference between the two and also the relationship
between the two.
Regression of Y on X

We now introduce the notation $b_{yx}$ for b when Y is the dependent variable and X is the
independent variable.
Linear regression of Y on X assumes that the variable X is the independent variable and the
variable Y is the dependent variable. In order to make this explicit, we express the linear
regression model as follows.
\[ Y - \bar{y} = b_{yx}(X - \bar{x}) \]
or
\[ Y = a + b_{yx} X \]
Here, note that b is replaced by $b_{yx}$, where
\[ b_{yx} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - n\bar{x}\,\bar{y}}{\sum x_i^2 - n\bar{x}^2} \]


Regression of X on Y
The notation $b_{xy}$ stands for b when X is the dependent variable and Y is the independent
variable. Linear regression of X on Y assumes that the variable Y is the independent variable
and the variable X is the dependent variable. In order to make it clear that this model is
different from the linear regression of Y on X, we express the linear regression model as
follows.
\[ X = a' + b'Y \]
The method of least squares, when applied to this model, leads to the following expressions
for the constants a′ and b′.
\[ a' = \bar{x} - b'\bar{y} \]
and
\[ b' = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)} \]
Substituting the value of a′ in the linear regression model $X = a' + b'Y$, we get
\[ X = \bar{x} - b'\bar{y} + b'Y \]
i.e.
\[ (X - \bar{x}) = b'(Y - \bar{y}) \]

Note: The constant b′ in the above equation is called the regression coefficient of X on Y. In
order to make this explicit, it will henceforth be written as $b_{xy}$ instead of b′. The least squares
regression of X on Y will therefore be written as
\[ (X - \bar{x}) = b_{xy}(Y - \bar{y}) \]
Note that the linear regression of X on Y is expressed as
\[ X = a' + b_{xy} Y \]
Here, note that b′ is replaced by $b_{xy}$. This can be written as
\[ Y = \frac{1}{b_{xy}}(X - a') \]
showing that the constant $1/b_{xy}$ is the slope of the line of regression of X on Y.
Further, note that the regression coefficient $b_{xy}$ involved in the linear regression of X on Y is
given by
\[ b_{xy} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum (y_i - \bar{y})^2} = \frac{\sum x_i y_i - n\bar{x}\,\bar{y}}{\sum y_i^2 - n\bar{y}^2} \]

Observe that the point $(\bar{x}, \bar{y})$ satisfies the equations of both lines of regression. Therefore,
the point $(\bar{x}, \bar{y})$ is the point of intersection of the two lines of regression.
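
The two regression lines can be computed side by side to see that both pass through the point of means. The sketch below is illustrative: the helper names and the small data set are assumptions, and the coefficients use the sum formulas for the regression of Y on X and of X on Y.

```python
def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using the sum formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    # b_yx divides by the variation in X, b_xy by the variation in Y
    return x_bar, y_bar, s_xy / s_xx, s_xy / s_yy


xs = [2, 3, 5, 5, 6, 8]
ys = [9, 8, 8, 6, 5, 3]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(xs, ys)


def y_on_x(x):
    """Regression of Y on X: Y - y_bar = b_yx (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def x_on_y(y):
    """Regression of X on Y: X - x_bar = b_xy (Y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


print(round(b_yx, 4), round(b_xy, 4))
print(y_on_x(x_bar), x_on_y(y_bar))
```

Evaluating the first line at the mean of X returns the mean of Y, and evaluating the second line at the mean of Y returns the mean of X, so the lines intersect at the point of means.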

Properties of Regression
The line of regression of Y on X is given by $Y = a + b_{YX} X$ and the line of regression of X on
Y is given by $X = a' + b_{XY} Y$.
Here, $b_{YX} = \operatorname{cov}(X, Y)/\operatorname{var}(X)$, the slope of the line of regression of Y on X, is called the
regression coefficient of Y on X. Similarly,
$b_{XY} = \operatorname{cov}(X, Y)/\operatorname{var}(Y)$, the coefficient of Y in the line of regression of X on Y, is called the
regression coefficient of X on Y. These two regression coefficients have the following properties.

(a) $b_{YX} \cdot b_{XY} = r^2$, where r is the correlation coefficient between X and Y,
$b_{XY}$ is the regression coefficient of X on Y and $b_{YX}$ is the regression coefficient of Y
on X.

Proof: Note that
\[ b_{XY} \cdot b_{YX} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(Y)} \cdot \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)} = \left[ \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \right]^2 = r^2 \]
Can it be said that the correlation coefficient is the square root of the product of the
two regression coefficients?

309
CA Foundation

(b) If bYX  1, then bXY  1,


Proof: Let, if possible, bXY , bYX  1, implies that r2 > 1, which is impossible. (Can you
provide the reason?) This shows that our assumption must be invalid. That is, both the
regression coefficients cannot simultaneously exceed unity.
We already know that the two variances  x ,  y and the correlation coefficient r satisfy
2 2

the relation.
cov(X,Y)  r.X  Y
 cov(X,Y) 
 r 
 X  Y 
The regression coefficients can also be written as follows.
cov(X,Y)
bYX 
2x
r  x . y

2x
y
r
x
and
cov(X,Y)
b XY 
2y
r  x .y

2y
x
r
y

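(c) $\left| \dfrac{b_{YX} + b_{XY}}{2} \right| \ge |r|$
Proof: We have already seen that
\[ b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad b_{XY} = r\,\frac{\sigma_X}{\sigma_Y} \]
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively.
Therefore,
\[ b_{YX} + b_{XY} = r\,\frac{\sigma_Y}{\sigma_X} + r\,\frac{\sigma_X}{\sigma_Y} = r \left( \frac{\sigma_Y}{\sigma_X} + \frac{\sigma_X}{\sigma_Y} \right) = r\,\frac{\sigma_Y^2 + \sigma_X^2}{\sigma_X \sigma_Y} \qquad (1) \]
But $(\sigma_X - \sigma_Y)^2 \ge 0$ and therefore
\[ \sigma_X^2 + \sigma_Y^2 - 2\sigma_X \sigma_Y \ge 0 \]
\[ \sigma_X^2 + \sigma_Y^2 \ge 2\sigma_X \sigma_Y \]
\[ \frac{\sigma_Y^2 + \sigma_X^2}{\sigma_X \sigma_Y} \ge 2 \]
\[ r\,\frac{\sigma_Y^2 + \sigma_X^2}{\sigma_X \sigma_Y} \ge 2r \quad \text{(for } r > 0\text{)} \qquad (2) \]
From (1) and (2), we have
\[ b_{YX} + b_{XY} \ge 2r \]
\[ \frac{b_{YX} + b_{XY}}{2} \ge r \]
This result shows that the arithmetic mean of the two regression coefficients, namely
$b_{YX}$ and $b_{XY}$, is greater than or equal to r. This result, however, holds only when $b_{YX}$,
$b_{XY}$ and r are positive. (Can you find the reason?) Consider the case where $b_{YX} = -0.8$
and $b_{XY} = -0.45$. In this case, we have $r = -0.6$. (Can you find the reason?)

Note that $b_{YX} + b_{XY} = -1.25$ and $2r = -1.2$. This shows that $b_{YX} + b_{XY} \le 2r$; in absolute
value, however, the arithmetic mean 0.625 is still at least $|r| = 0.6$.
It may be interesting to note that
\[ b_{YX} = \frac{\operatorname{cov}(X, Y)}{\sigma_X^2}, \qquad b_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_Y^2}, \qquad r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \]

It is evident from the above three equations that all the coefficients have the same
numerator and this numerator determines their sign. As a result, all these coefficients
have the same sign. In other words, if r > 0, then $b_{YX} > 0$ and $b_{XY} > 0$. Similarly, if
r < 0, then $b_{YX} < 0$ and $b_{XY} < 0$. Finally, if r = 0, then $b_{YX} = b_{XY} = 0$.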

(d) $b_{YX}$ and $b_{XY}$ are not affected by change of origin, but are affected by change of scale.
This property is known as the invariance property.
The invariance property states that $b_{YX}$ and $b_{XY}$ are invariant under change of origin,
but are not invariant under change of scale.
Proof: Let
\[ U = \frac{X - a}{h} \quad \text{and} \quad V = \frac{Y - b}{k} \]
where a, b, h and k are constants with the condition that h, k ≠ 0.
We have already proved that $\sigma_X^2 = h^2 \sigma_U^2$, $\sigma_Y^2 = k^2 \sigma_V^2$ and cov(X, Y) = hk cov(U, V).
Therefore,
\[ b_{YX} = \frac{\operatorname{cov}(X, Y)}{\sigma_X^2} = \frac{hk\,\operatorname{cov}(U, V)}{h^2 \sigma_U^2} = \frac{k}{h} \cdot \frac{\operatorname{cov}(U, V)}{\sigma_U^2} \]
That is,
\[ b_{YX} = \frac{k}{h}\, b_{VU} \]
Similarly,
\[ b_{XY} = \frac{h}{k}\, b_{UV} \]
These two results show that regression coefficients are invariant under change of
origin, but are not invariant under change of scale.
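
The properties above can be checked numerically. The sketch below reuses the fertilizer data from earlier in the chapter; the function name, the constants a, b, h, k chosen for the change of origin and scale, and the use of `math.isclose` for the comparisons are all illustrative assumptions.

```python
import math


def regression_summary(xs, ys):
    """Return (b_yx, b_xy, r) from population variances and covariance."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / var_x, cov / var_y, cov / math.sqrt(var_x * var_y)


xs = [30, 40, 50, 60, 70, 80]
ys = [43, 45, 54, 53, 56, 63]
b_yx, b_xy, r = regression_summary(xs, ys)

# Change of origin and scale: U = (X - a)/h, V = (Y - b)/k
a, b, h, k = 50, 40, 10, 2
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]
b_vu, b_uv, r_uv = regression_summary(us, vs)

print(math.isclose(b_yx * b_xy, r ** 2))       # property (a)
print(abs((b_yx + b_xy) / 2) >= abs(r))        # property (c)
print(math.isclose(b_yx, (k / h) * b_vu))      # property (d), Y on X
print(math.isclose(b_xy, (h / k) * b_uv))      # property (d), X on Y
```

In this data set one coefficient exceeds 1 while the other is below 1, in line with property (b), and the correlation coefficient itself is unchanged by the transformation.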

CLASS WORK - 8

1. Regression analysis is concerned with _________


(a) Establishing a mathematical relationship between two variables
(b) Measuring the extent of association between two variables
(c) Predicting the value of the dependent variable for a given value of the
independent variable
(d) Both (a) and (c)

2. The process of developing an algebraic equation between two variables & predicting
the value of one variable given the value of the
other variable is known as
(a) Correlation analysis (b) Regression analysis
(c) (a) or (b) (d) none

3. The variable whose value is predicted using the algebraic equation is called
(a) Dependent variable
(b) Response variable
(c) Explained variable or regressed variable
(d) All of the above


4. The variable whose value is used as the basis for prediction is called
(a) Independent variable
(b) Predictor variable
(c) Regressor or explanatory variable
(d) All of the above

5. The strength or degree of relationship between two variables is measured using ______
(a) Regression analysis (b) Correlation analysis
(c) Both (d) None

6. An algebraic equation between two correlated variables is formed using ________


(a) Regression analysis (b) Correlation analysis
(c) Both (d) None

7. Correlation Analysis fails to give answers of following questions


(a) What is the direction of relationship between two variables
(b) What is the degree of relationship between two variables
(c) What is the functional(or algebraic) relationship between two variables?
(d) All of the above

8. Simple regression has


(a) One dependent variable & one independent variable
(b) One dependent variable & more than one independent variable
(c) More than one dependent variables & one independent variable
(d) All of the above

9. Multiple regression model has


(a) One dependent variable & one independent variable
(b) One dependent variable & more than one independent variable
(c) More than one dependent variables & one independent variable
(d) None

10. If there are two variables x and y, then the number of regression equations could be _____
(a) 1 (b) 2 (c) Any Number (d) 3

11. _________ gives the mathematical relationship of the variables.
(a) Correlation (b) Regression (c) Both (d) None

12. Under Algebraic Method we get _____ linear equations


(a) One (b) Two (c) Three (d) Four

13. Regression equation is also named as:


(a) Prediction equation (b) Estimating equation
(c) Line of average relationship (d) All the above


14. If the blood pressure of a person depends on age, then age is _________


(a) Independent variable (b) dependent variable
(c) Both (d) None

15. If sales turnover depends on advertisement, then sales turnover is __________


(a) Independent variable (b) dependent variable
(c) Both (d) None

16. The line of best fit is known as the regression line
(a) True (b) False (c) both (d) None

17. The line of regression passes through the points, bearing ________ no. of points on
both sides.
(a) equal (b) unequal (c) zero (d) none

18. The method applied for deriving the regression equations is known as
(a) Least squares (b) Concurrent deviation
(c) Product moment (d) Normal equation.

19. The difference between the observed value and the estimated value in regression
analysis is known as
(a) Error (b) Residue (c) Deviation (d) (a) or (b).

20. The errors in case of regression equations are


(a) Positive (b) Negative (c) Zero (d) All these.

21. The line Y = a + bX represents the regression equation of


(a) Y on X (b) X on Y (c) both (d) none.

22. The line Y = 13 – 3X /2 is the regression equation of


(a) Y on X (b) X on Y (c) both (d) none.

23. In the line Y = 19- 5X /2, byx is equal to


(a) 19 / 2 (b) 5 / 2 (c) -5 / 2 (d) none.

24. byx is called regression coefficient of regression line


(a) X on Y (b) Y on X (c) both (d) none.

25. byx =
(a) r·(σx/σy) (b) r·(σy/σx) (c) Both (d) None

26. Regression line y on x is
(a) x − x̄ = bxy (y − ȳ) (b) x + x̄ = bxy (y + ȳ)
(c) y + ȳ = bxy (x − x̄) (d) None


27. In the regression line y on x , x is known as


(a) Independent variable (b) dependent variable
(c) both (d) None

28. In the regression line y on x , y is known as


(a) Independent variable (b) dependent variable
(c) both (d) None

29. The line X = 31 / 6 – Y /6 is the regression equation of


(a) y on x (b) x on y (c) both (d) none.

30. In the equation X = 35/8 – 2Y /5, bxy is equal to


(a) -2/5 (b) 35/8 (c) 2/5 (d) 5/2

31. bxy is called regression coefficient of regression line of


(a) x on y (b) y on x (c) both (d) none.

32. The slope of the regression line of x on y


(a) b yx (b) bxy (c) 1/ bxy (d) 1/ b yx

33. byx =
(a) r·(σx/σy) (b) cov(x, y)/var(y)
(c) (Σxy − x̄ȳ)/(Σx² − n x̄²) (d) None

34. Regression line x on y is


(a) y − ȳ = byx (x − x̄) (b) x − x̄ = bxy (y − ȳ)
(c) Both (d) None

35. In the regression line x on y , y is known as


(a) Independent variable (b) Dependent variable
(c) Both (d) None

36. In the regression line x on y , x is known as


(a) Independent variable (b) Dependent variable
(c) Both (d) None


37. The HRD manager of a company wants to find a measure which he can use to fix the
monthly income of persons applying for a job in the production Department. As an
experimental project, he collected data of 7 persons from that department referring
to years of service and their monthly incomes.
Years of service (X) 11 7 9 5 8 6 10
Income (₹ in 1000's) (Y) 10 8 6 5 9 7 11
(i) Find the regression equation of income on years of service.
(ii) What initial start would you recommend for a person applying for the job after
having served in similar capacity in another company for 13 years?
(a) y = 2 + 0.75x; ₹11,750 (b) y = 3 + 0.75x; ₹12,750
(c) y = 4 + 0.8x; ₹14,400 (d) y = 2 − 0.75x; ₹11,750

38. The management of a large furniture store would like to determine sales (in
thousands of ₹) (X) on a given day, on the basis of the number of people (Y) who visited the
store on that day. The necessary records were kept and a random sample of ten
days was selected for the study. The summary results were as follows:
Σx = 370, Σy = 580, Σx² = 17206, Σy² = 41658, Σxy = 11494, n = 10
Obtain the line of regression of X on Y.
(a) y = 109.0912 - 1.243 x (b) x = 109.0912 - 1.243 y
(c) x = 109.0912 + 1.243 y (d) y = 109.0912 + 1.243 x
Ans.
1 2 3 4 5 6 7 8 9 10
D B D D B A C A B B
11 12 13 14 15 16 17 18 19 20
B B D A B A A A D D
21 22 23 24 25 26 27 28 29 30
A A C B B D A B B A
31 32 33 34 35 36 37 38
A C D B A B A B
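
Problems 37 and 38 above can be checked with a short least-squares computation. The following Python sketch is illustrative only; the function names `fit_from_sums` and `fit_line` are assumptions, not part of the text. It fits the line of the dependent variable on the independent variable, first from raw data (problem 37) and then from summary totals (problem 38).

```python
def fit_from_sums(n, sum_dep, sum_indep, sum_indep_sq, sum_prod):
    """Least-squares line dep = a + b * indep from summary totals; returns (a, b)."""
    b = (n * sum_prod - sum_dep * sum_indep) / (n * sum_indep_sq - sum_indep ** 2)
    a = (sum_dep - b * sum_indep) / n
    return a, b

def fit_line(dep, indep):
    """Least-squares line dep = a + b * indep from raw observations."""
    return fit_from_sums(
        len(dep),
        sum(dep),
        sum(indep),
        sum(i * i for i in indep),
        sum(d * i for d, i in zip(dep, indep)),
    )

# Problem 37: income (Y, in ₹1000s) on years of service (X)
years = [11, 7, 9, 5, 8, 6, 10]
income = [10, 8, 6, 5, 9, 7, 11]
a, b = fit_line(income, years)
print(a, b)          # 2.0 0.75
print(a + b * 13)    # 11.75, i.e. ₹11,750 for 13 years of service

# Problem 38: sales (X) on number of visitors (Y), from the summary totals
a_x, b_xy = fit_from_sums(10, 370, 580, 41658, 11494)
print(round(a_x, 4), round(b_xy, 3))
```

Note that in problem 38 the dependent variable is X, so Σy² (not Σx²) appears in the denominator.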

TRY YOURSELF - 4

1. The statistical method which helps us to estimate or predict the unknown value of
one variable from the known value of the related variable is called
(a) Correlation (b) Scatter diagram (c) Regression (d) Dispersion

2. _____ gives the mathematical relationship of the variables.


(a) Correlation (b) Regression (c) Both (d) none

3. The regression analysis measures


(a) The degree of co-variability between X & Y.
(b) The variation of series
(c) The variation of X series
(d) Functional relationship between X and Y.


4. If the regression line of x on y is 3x + 2y = 100, then find the value of bxy.
(a) −2/3 (b) 10/3 (c) 3/2 (d) 2/3

5. If the regression coefficient of y on x, the coefficient of correlation between x and y


and variance of y are -3/4, - √3/2 and 4 respectively, what is the variance of x?
(a) 2/√3/2 (b) 16/3 (c) 4/3 (d) 4

6. Following are the two normal equations obtained for deriving the regression line of y
on x:
5a + 10b = 40
10a + 25b = 95
The regression line of y on x is given by
(a) 2x + 3y = 5 (b) 2y + 3x =5 (c) y = 2 + 3 x (d) y = 3 + 5x

7. Find the two regression equation from the following data and estimate y when x is 13,
x when y is 15.
X 2 4 5 5 8 10
y 6 7 9 10 12 12
(a) 16.2546 and 11.2489 (b) 15.3063 and 11.75
(c) 14.6352 and 10.50 (d) 18.2453 and 12.85

8. The regression equation of y on x for the following data:


X 41 82 62 37 58 96 127 74 123 100
y 28 56 35 17 42 85 105 61 98 73
Is given by
(a) y=1.2x-15 (b) y=1.2x+15 (c) y=0.93x-14.64 (d) y=1.5x-10.89

9. For y = 25, what is the estimated value of x, from the following data:
X 11 12 15 16 18 19 21
Y 21 15 13 12 11 10 9
(a) 15 (b) 13.926 (c) 13.588 (d) 14.986

10. The following data relate to the heights of 10 pairs of fathers and sons:
(175,173), (172,172), (167,171), (168,171),
(172,173), (171,170), (174,173), (176,175),
(169,170), (170,173)
The regression equation of height of son on that of father is given by
(a) Y=100+5x (b) Y=99.708+0.405x
(c) Y=89.653+0.582x (d) Y=88.758+0.562x

11. Given below the information about the capital employed and profit earned by a
company over the last twenty five years:
Particulars Mean SD
Capital employed (000 Rs.) 62 5
Profit earned (000 Rs.) 25 6


Correlation coefficient between capital and profit = 0.92. The sum of the Regression
coefficients for the above data would be:
(a) 1.871 (b) 2.358 (c) 1.968 (d) 2.346

12. Marks of 8 students in Mathematics and Statistics are given as follows.


Maths 80 75 76 69 70 85 72 68
Stats 85 65 72 68 67 88 80 70
Find the regression lines. When mark of a student in Mathematics is 90, what is his
most likely mark in Statistics?
(a) 73 (b) 64 (c) 92 (d) 81

13. The following data relate to the expenditure on advertisement in thousands of rupees
and the corresponding sales in lakhs of rupees.
Expenditure on Ad 8 10 10 12 15
Sales 18 20 22 25 28
Find the appropriate regression equation.
(a) y = 6.4927+ 1.4643x (b) y = 7.5864 - 2.6451x
(c) y = 8.3527 + 4.3673x (d) y = 7.4575+ 1.7648x

Ans.
1 2 3 4 5 6 7 8 9 10
C B D A B C B C C B
11 12 13
A C A
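
Problem 6 above gives the two normal equations directly. As a hedged illustration, the sketch below solves the normal equations Σy = na + bΣx and Σxy = aΣx + bΣx² by Cramer's rule; the function name `solve_normal_equations` is an assumption made for this example.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve  n*a + sum_x*b = sum_y  and  sum_x*a + sum_x2*b = sum_xy  for (a, b)."""
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Problem 6: 5a + 10b = 40 and 10a + 25b = 95
a, b = solve_normal_equations(n=5, sum_x=10, sum_y=40, sum_x2=25, sum_xy=95)
print(a, b)   # 2.0 3.0, so the regression line of y on x is y = 2 + 3x
```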


Important Points of Regression

1. Regression is a mathematical relationship between two variables, which are known as
the independent variable and the dependent variable.
2. Regression is used to estimate the dependent variable.
3. The regression equation of y on x is obtained by minimizing vertical distances.
4. The regression equation of x on y is obtained by minimizing horizontal distances.
5. In the regression equation of y on x, byx represents the slope of the regression line.
6. byx is the rate of change in the value of y for a unit change in the value of x.
7. In the regression equation of y on x, y = a + bx, 'a' represents the y-intercept, which indicates
the average value of the dependent variable when x = 0.
8. 1/bxy represents the slope of the line of x on y.
9. bxy is the rate of change in the value of x for a unit change in the value of y.
10. In the regression equation of x on y, x = a + by, 'a' represents the x-intercept, which
indicates the average value of the dependent variable when y = 0.
11. If byx > 1, then bxy < 1.
12. (byx + bxy)/2 ≥ r (A.M. ≥ G.M.), provided both regression coefficients are positive.
13. Karl Pearson's correlation coefficient (r) is the geometric mean of the regression
coefficients.
14. If r = ±1, then the regression lines coincide (are identical).
15. Both regression coefficients and the correlation coefficient have the same sign (i.e. all
are +ve or all are -ve).
16. If r = 0, then the regression lines are perpendicular to each other.
17. Regression coefficients are not affected by a shift of origin.
18. Regression coefficients are affected by a change of scale, through the magnitudes of the
corresponding scale factors.
19. The lines of regression intersect at the point (x̄, ȳ).
20. The angle θ between the two regression lines is given by tan θ = (byx·bxy − 1)/(byx + bxy).
21. The difference between the observed value and the estimated value is called the error or residue.
If O − E > 0, then the error is positive.
If O − E < 0, then the error is negative.
If O − E = 0, then the error is zero.
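
Points 14, 16 and 20 can be illustrated together. The sketch below is an assumption-labelled example, not the text's own method: it computes the acute angle between the two regression lines from the magnitude of the expression in point 20, using `atan2` so that the case byx + bxy = 0 (r = 0) does not cause a division by zero. The function name `angle_between_lines` is hypothetical.

```python
from math import atan2, degrees

def angle_between_lines(b_yx, b_xy):
    """Acute angle (in degrees) between the regression lines of y on x and x on y.
    Uses |tan θ| = |b_yx * b_xy - 1| / |b_yx + b_xy|."""
    return degrees(atan2(abs(1 - b_yx * b_xy), abs(b_yx + b_xy)))

print(angle_between_lines(0.0, 0.0))    # r = 0: the lines are perpendicular
print(angle_between_lines(2.0, 0.5))    # r = 1: the lines coincide
print(angle_between_lines(0.8, 0.45))   # r = 0.6: an angle strictly between 0° and 90°
```

The closer the product byx·bxy is to 1, the smaller the angle, which matches the idea that a greater angle indicates a weaker correlation.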

CLASS WORK - 9

1. The regression coefficients remain unchanged due to …….


(a) Shift of origin (b) Shift of scale
(c) Both (a) and (b) (d) (a) or (b)

2. If u = 2x + 5 and v = -3y - 6 and regression coefficient of y on x is 2.4, what is the


regression coefficient of v on u?
(a) 3.6 (b) -3.6 (c) 2.4 (d) -2.4

3. If the regression coefficient of y on x, the coefficient of correlation between x and
y and the variance of y are −3/4, −√3/2 and 4 respectively, what is the variance of x?
(a) 2/√3/2 (b) 16/3 (c) 4/3 (d) 4

4. One of the regression coefficients of two perfectly correlated variables is 0.5,


hence the other regression co-efficient is
(a) 0.05 (b) -0.5 (c) 2 (d) -2

5. If each value of X is divided by 5 and each value of Y by 10, then byx computed from the coded values is


(a) Same as byx (b) Half of byx (c) Twice byx (d) None

6. If each value of X is divided by 2 and each value of Y is multiplied by 2, then byx computed from the
coded values is
(a) Same as byx (b) Half of byx
(c) Four times byx (d) Eight times byx

7. If X and Y are independent, the value of regression coefficient byx is equal to:
(a) 0 (b) 1
(c) 2 (d) any positive value

8. If byx > 1 then bxy is


(a) Less than 1 (b) Greater than 1 (c) Equal to 1 (d) Equal to 0

9. If a constant 50 is subtracted from each of the value of X and Y, the regression


coefficient is:
(a) Reduced by 50
(b) (1/50)th of the original regression co-efficient
(c) Increased by 50
(d) Not changed

10. If byx = -0.8 bxy = -0.45, then r =


(a) 0.5 (b) -0.6 (c) 0.6 (d) - 0.36

11. If byx and bxy are two regression coefficients, they have
(a) Same sign (b) Opposite sign
(c) Either same or opposite signs (d) Nothing can be said

12. From the two regression equations, find r, x̄ and ȳ.
4y = 9x + 5 and 25x = 6y + 7
(a) r = 0.7348, x̄ = 2.5652, ȳ = 9.5217 (b) r = −0.7348, x̄ = 2.5652, ȳ = 9.5217
(c) r = 0.9348, x̄ = 3.5652, ȳ = 7.5217 (d) r = 0-09348, x̄ = 3.5652, ȳ = 7.5217

13. Given the following equations as 3x + y = 13 and 2x + 5y = 20, which one is the
regression equation of y on x?
(a) 1st equation (b) 2 nd equation
(c) Both (a) and (b) (d) None


14. Given the following equations: 7x + 3y = 90 and 3x + 4y = 15, which one is the
regression equation of y on x?
(a) 1st equation (b) 2 nd equation
(c) Both the equations (d) None

15. If the regression line of y on x and that of x on y are given by y = −2x + 3 and
8x = −y + 3 respectively, what is the coefficient of correlation between x and y?
(a) 0.5 (b) −1/√2 (c) −0.5 (d) None

16. If y = 3x + 4 is the regression line of y on x and the arithmetic mean of x is -1,


what is the arithmetic mean of y ?
(a) 1 (b)-1 (c) 7 (d) None

17. The two lines of regression become identical or parallel when,


(a) r = 1 (b) r = -1 (c) r = 0 (d) a or b

18. The regression lines are perpendicular to each other if r = ?


(a) 0 (b) +1 (c) -1 (d) None

19. The greater the angle between the regression lines ______ the correlation
between the variable
(a) lesser (b) higher (c) medium (d) None

20. If the two lines of regression are perpendicular to each other, the relation
between the two regression coefficient is
(a) byx = bxy (b) byx bxy = 1 (c) byx < bxy (d) byx = bxy =0

21. If the two lines of regression are identical to each other, the relation between the
two regression coefficient is
(a) byx = bxy (b) byx bxy = 1 (c) byx < bxy (d) byx = - bxy

22. If two variable are uncorrelated their regression lines are,


(a) Parallel (b) Perpendicular (c) Coincident (d) Inclined at 45°

23. What are the limits of the two regression coefficients?


(a) No limit
(b) Must be positive
(c) One positive and the other negative
(d) Product of the regression coefficient must be numerically less than unity.

24. Regression coefficient is independent of the change of


(a) Scale. (b) Origin.
(c) Both origin and scale. (d) Neither origin nor scale.

25. Regression coefficients are affected by _____


(a) Change of Origin (b) Change of Scale
(c) Both Origin & Scale (d) Neither Origin nor Scale

26. Regression lines pass through the _____ point


(a) Mean (b) Standard deviation (c) Both (a) & (b) (d) None

27. Correlation coefficient r lies between the regression coefficients b yx and bxy
(a) True (b) False (c) Both (d) None

28. Since the correlation coefficient r is the _____ of the two regression coefficients
byx and bxy
(a) A.M (b) G.M (c) H.M (d) None

29. r, bxy, byx all have _____ sign.


(a) Different (b) Same (c) Both (d) None

30. If byx and bxy are both positive, then
(a) 1/byx + 1/bxy < 2/r (b) 1/byx + 1/bxy > 2/r
(c) 1/byx + 1/bxy < r/2 (d) 1/byx + 1/bxy = r/2

31. The two lines of regression meet at:


(a) (x̄, ȳ) (b) (σx, σy) (c) (σx², σy²) (d) (x, y)

32. The point which always lies on the two lines of regression is
(a) (x̄, ȳ) (b) (bxy, byx) (c) (σx, σy) (d) (0, 0)

33. If there are two variables x and y, then the number of regression equations could
be
(a) 1 (b) 2 (c) Any number (d) 3

34. The angle between the regression lines depends on


(a) Correlation coefficient (b) Regression coefficient
(c) Both (d) None

35. The regression lines are identical if r is equal to-


(a) +1 (b) -1 (c) + 1 or – 1 (d) 0

36. The regression lines are perpendicular to each other if r is equal to


(a) 0 (b) +1 (c) -1 (d) ±1


37. Two regression lines coincide when


(a) r = 0 (b) r = 2 (c) r = +1 or – 1 (d) None

38. The equations Y = a + bX and X = a + bY are based on the method of


(a) greatest squares (b) least squares (c) both (d) none

39. The method applied for deriving the regression equations is known as
(a) Least squares (b) Concurrent deviation
(c) Product moment (d) Normal equation

40. A feature of least squares regression lines is: "The sum of the deviations of the Y's or
the X's from their regression lines is zero".
(a) True (b) False (c) Both (d) None

41. The regression line of y on x is derived by


(a) The minimization of vertical distances in the scatter diagram
(b) The minimization of horizontal distances in the scatter diagram
(c) Both (a) and (b)
(d) (a)or(b)

42. The line of regression passes through the points, bearing _____ no. of points on
both sides
(a) Equal (b) Unequal (c) Zero (d) None

43. Two regression lines always intersect at the means.


(a) True (b) False (c) Both (d) None

44. Two lines of regression are given by 5x+7y-22=0 and 6x+2y-22=0. If the variance of
y is 15 find the standard deviation of x.
(a) 2.646 (b) 6.246 (c) 7.612 (d) 3.646

45. If two regression lines are: y = 4 + kx and x = 5+4y, then the range of k is -
(a) k ≤ 0 (b) k ≥ 0 (c) 0 ≤ k ≤ 1 (d) 0 ≤ 4k ≤ 1

46. If the relationship between two variables x and u is u + 3x = 10 and between two
other variables y and v is 2y + 5v = 25, and the regression coefficient of y on x is
known as 0.80, then what would be the regression coefficient of v on u?
(a) 0.32 (b) 0.1066 (c) 0.2548 (d) 0.1586

47. For the variables x and y, the regression equations are given as 7x − 3y − 18 = 0 and
4x − y − 11 = 0. After finding the arithmetic means of x and y, compute the correlation
coefficient between x and y. If the variance of x is 9, find the SD of y.
(a) 8.5642 (b) 6.2453 (c) 9.1647 (d) 7.4789


48. If the regression lines of y on x and of x on y are given by 2x + 3y = −1 and 5x + 6y = −1,
then the arithmetic means of x and y are given by
(a) (1, −1) (b) (−1, 1) (c) (−1, −1) (d) (2, 3)

49. If 4y - 5x = 15 is the regression line of y on x and the coefficient of correlation


between x and y is 0.75, what is the value of the regression coefficient of x on y?
(a) 0.45 (b) 0.9375 (c) 0.6 (d) None of these

50. If the relationship between two variables x and y is given by 2x + 3y + 4 = 0, then


the value of the correlation coefficient between x & y is
(a) 0 (b) 1 (c) -1 (d) Negative.

51. A student of Statistics treated the regression equation of X on Y as that of Y on X, and
that of Y on X as that of X on Y, and calculated bxy = −2/3 and byx = −6.
Calculate (a) Wrong Correlation Coefficient, (b) Correct Regression Coefficients, (c)
Correct Correlation Coefficient and (d) Regression Equations, if X̄ = 4 and Ȳ = 7
3 1 1 31 3
(a) -2,  ,  , X   Y  , Y   X  13
2 2 6 6 2
3 1 1 31 3
(b) 2, ,  , X   Y  , Y   X  13
2 2 6 6 2
3 1 1 31 3
(c) -2,  ,  , X   Y  , Y   X  13
2 2 6 6 2
(d) None of these
Ans.
1 2 3 4 5 6 7 8 9 10
A B B C B C A A D B
11 12 13 14 15 16 17 18 19 20
A A B B C A D A A D
21 22 23 24 25 26 27 28 29 30
B B D B B A A B B B
31 32 33 34 35 36 37 38 39 40
A A B A C A C B A A
41 42 43 44 45 46 47 48 49 50
A A A A D B C A A C
51
A
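
Several problems above (for example 13, 14 and 44) ask which of two given equations is the regression line of y on x. A common approach is to try both assignments and keep the one for which byx·bxy lies between 0 and 1, since that product equals r². The sketch below is illustrative; the function name `split_regression_lines` and the tuple representation (A, B, C) for Ax + By + C = 0 are assumptions made for this example, and all A and B are assumed to be nonzero.

```python
from math import copysign, sqrt

def split_regression_lines(eq1, eq2):
    """Each equation is a tuple (A, B, C) standing for A*x + B*y + C = 0.
    Returns (y_on_x, x_on_y, b_yx, b_xy, r)."""
    for y_on_x, x_on_y in ((eq1, eq2), (eq2, eq1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # solve for y: coefficient of x
        b_xy = -x_on_y[1] / x_on_y[0]   # solve for x: coefficient of y
        product = b_yx * b_xy
        if 0 <= product <= 1:           # r**2 must lie in [0, 1]
            r = copysign(sqrt(product), b_yx)
            return y_on_x, x_on_y, b_yx, b_xy, r
    raise ValueError("the pair cannot be a valid pair of regression lines")

# Problem 44: 5x + 7y - 22 = 0 and 6x + 2y - 22 = 0, with var(y) = 15
y_on_x, x_on_y, b_yx, b_xy, r = split_regression_lines((5, 7, -22), (6, 2, -22))
var_y = 15
sd_x = sqrt(var_y * b_xy / b_yx)   # because b_xy / b_yx = var(x) / var(y)
print(y_on_x, round(r, 4), round(sd_x, 3))
```

Exactly one assignment can satisfy the condition unless the product equals 1, because swapping the roles of the two equations replaces the product by its reciprocal.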



HOME WORK

1. Pearson's correlation coefficient between x and y is:
(a) cov(x, y)/(Sx Sy) (b) cov²(x, y)/(Sx Sy) (c) (Sx Sy)³/cov(x, y) (d) Sx Sy/cov(x, y)
[Dec. 2022]
2. The coefficient of rank correlation between the ranking of following 6 students in
two subjects.
Mathematics and Statistics is:
Mathematics 3 5 8 4 7 10
Statistics 6 4 9 8 1 2
(a) -0.25 (b) 0.35 (c) 0.38 (d) 0.20
[Dec. 2022]

3. For n pairs of observations, the coefficient of concurrent deviation is calculated
as 1/√5. If there are six concurrent deviations, then n =
(a) 11 (b) 10 (c) 9 (d) 8
[June 2022]

4. If the data points of (X, Y) series on a scatter diagram lie along a straight line that
goes downwards as X-values move from left to right, then the data exhibit
________correlation.
(a) Direct (b) Imperfect indirect
(c) Indirect (d) Imperfect direct [Dec. 2021]

5. If the sum of the product of the deviations of X and Y from their means is zero, the
correlation coefficient between X and Y is:
(a) Zero (b) Positive (c) Negative (d) 10
[July 2021]

6. The coefficient of correlation between x and y is 0.5, the covariance is 16, and
the standard deviation of y is ____ if the S.D. of x is 4
(a) 4 (b) 8 (c) 16 (d) 64
[Jan. 2021]
7. Which of the following is spurious correlation?
(a) Correlation between two variables having no causal relationship
(b) Negative Correlation
(c) Bad relation between two variables
(d) Very low correlation between two variables [Dec. 2020]

8. Scatter diagram does not help us to
(a) Find the type of correlation
(b) Identify whether variables correlated or not
(c) Determine the linear (or) non- linear correlation
(d) Find the numerical value of correlation coefficient [Dec. 2020]

9. The Covariance between two variables is


(a) Strictly Positive (b) Strictly Negative
(c) Always Zero (d) Either positive (or) Negative (or) Zero
[Dec. 2020]
10. Find the probable error if r = 2/√10 and N = 36
(a) 0.6745 (b) 0.067 (c) 0.5287 (d) None
[June 2019]

11. Correlation coefficient is __________of the units of measurement


(a) dependent (b) independent (c) both (d) None
[May 2018]

12. Rank correlation coefficient lies between


(a) 0 to 1 (b) -1 to +1 inclusive of these values
(c) -1 to 0 (d) Both [May 2018]

13. If the correlation coefficient between the variables X and Y is 0.5, then the
correlation coefficient between the variables 2x − 4 and 3 − 2y is
(a) 0.5 (b) 1 (c) -0.5 (d) 0
[Nov. 2018]

14. If the sum of squares of deviations of ranks of 8 students is 50 then the rank
correlation coefficient is :
(a) 0.40 (b) 0.45 (c) 0.5 (d) 0.8
[June 2018]

15. The covariance between two variables is


(a) Strictly positive (b) Strictly negative
(c) Always Zero (d) Either positive or negative or zero
[May 2018]
16. The coefficient of determination is defined by the formula
(a) r² = 1 − (unexplained variance/total variance) (b) r² = explained variance/total variance
(c) both (a) and (b) (d) None [May 2018]

17. In the method of Concurrent Deviations, only the directions of change (Positive
direction/Negative direction) in the variables are taken into account for
calculation of
(a) Coefficient of S.D (b) Coefficient of regression
(c) Coefficient of correlation (d) None [May 2018]

18. If r = 0.58, correlation coefficient of u=-5x+3 and v = y + 2 is__________:


(a) 0.58 (b) -0.58 (c) 0.62 (d) None
[June 2018]

19. Correlation between temperature and power consumption is


(a) Positive (b) Negative (c) Zero (d) None
[June 2017]

20. If r = 0.6 then the coefficient of non-determination is:


(a) 0.4 (b) -0.6 (c) 0.36 (d) 0.64
[Dec. 2017]

21. If byx = 0.5 and bxy = 0.46, then the value of the correlation coefficient r is:


(a) 0.23 (b) 0.25
(c) 0.39 (d) 0.48 [Dec.2022]

22. The equations of the two lines of regression are 4x + 3y + 7 = 0 and 3x + 4y + 8 = 0.


Find the correlation coefficient between x and y.
(a) -0.75 (b) -0.92 (c) 0.25 (d) 1.25 [Dec. 2022]

23. For positive and perfectly correlated random variables, one of the regression
coefficient is 1.3 and the standard deviation of X is 2, the variance of Y is
(a) 2.66 (b) 6.76 (c) 6.56 (d) 3.16 [June 2021]

24. For any two variables x and y the regression equations are given as 2x + 5y-9=0
and 3x-y-5=0. What are the A.M. of x and y?
(a) 2,1 (b) 1,2 (c) 4,2 (d) 2,4 [Dec. 2021]

25. The regression coefficients remain unchanged due to
(a) A shift of scale (b) A shift of origin
(c) Replacing x-values by 1/x (d) Replacing y-values by 1/y
[Jan. 2021]

26. The straight-line graph of the linear equation Y = a + bX is horizontal if:
(a) b = 1 (b) b ≠ 0 (c) b = 0 (d) a = b ≠ 0
[July 2021]
27. If Y = 9X and X = 0.01Y, then r is equal to:
(a) -0.1 (b) 0.1 (c) 0.3 (d) -0.3
[July 2021]

28. The intersecting point of the two regression lines, y on x and x on y, is
(a) (0, 0) (b) (x̄, ȳ) (c) (byx, bxy) (d) (1, 1)
[Jan. 2021]

29. If the regression line of Y on X is given by Y = X + 2 and Karl Pearson's coefficient
of correlation is 0.5, then σy²/σx² = _______.
(a) 3 (b) 2 (c) 4 (d) None
[June 2019]
30. The two lines of regression intersect at the point:
(a) Mean (b) Median (c) Mode (d) None of these
[Nov. 2018]
31. Correlation coefficient can be found out by
(a) Scatter Diagram (b) Rank Method (c) Both (d) None

32. When we are not concerned with the magnitude of the two variables under
discussion, we consider
(a) Rank correlation coefficient
(b) Product moment correlation coefficient
(c) Coefficient of concurrent deviation
(d) (a) or (b) but not (c)

33. In the case of "insurance companies' profits and the number of claims they have to pay", there is:
(a) Positive correlation (b) Negative correlation
(c) No correlation (d) None of these

34. If the correlation coefficient between x and y is 1, then the correlation coefficient between
x − 2 and (−y/2) + 1 is
(a) 1 (b) -1 (c) -1/2 (d) 1/2

35. Correlation coefficient between x and y is zero then two regression lines are
(a) Perpendicular to each other (b) Coincide with each other
(c) Parallel to each other (d) None of these

36. If the correlation coefficient between the variables X and Y is 0.5, then the
correlation coefficient between the variables 2x - 4 and 3- 2y is
(a) 0.5 (b) 1 (c) -0.5 (d) 0

37. Correlation coefficient between x and y is 0.2. Find coefficient of alienation


(a) 0.04 (b) 0.96 (c) 0.98 (d) 0.16

38. In case speed of an automobile and the distance required to stop the car after
applying brakes correlation is
(a) Positive (b) Negative (c) Zero (d) None

39. A relationship r² = 1 − 500/300 is not possible
(a) True (b) False (c) Both (d) None

40. The two line of regression intersect at the point


(a) Mean (b) Mode (c) Median (d) None of these

41. A.M. of regression coefficient is


(a) Equal to r (b) Greater than or equal to r
(c) Half of r (d) None

42. Given that


x -3 -3/2 0 3/2 3
y 9 9/4 0 9/4 9
Then Karl Pearson's coefficient of correlation is
(a) Positive (b) Zero (c) Negative (d) None

43. Find the probable error if r = 2/√10 and N = 36
(a) 0.6745 (b) 0.067 (c) 0.5287 (d) None

44. If the regression line of y on x is given by Y = X + 2 and Karl Pearson's coefficient
of correlation is 0.5, then σy²/σx² = _____.
(a) 3 (b) 2 (c) 4 (d) None

45. If two line of regression are x + 2y - 5 = 0 and 2x + 3y - 8 = 0. So x + 2y - 5 = 0 is


(a) y on x (b) x on y (c) both (d) None

46. If the points on a scatter diagram form a line moving from the lower left to the upper right
corner, then the correlation is
(a) Perfect Positive (b) Perfect Negative
(c) Simple Positive (d) No correlation

47. If correlation coefficient between x and y is 0.5, then find the correlation
coefficient between 2x-3 and 3-5y is
(a) 0.5 (b) -0.5 (c) 2.5 (d) -2.5

48. If the equation of the two regression lines are 2x-3y=0 and 4y -5x= 8 then the
correlation coefficient between x and y is equal to
8 15 6 1
(a) (b) (c) (d)
15 8 15 15

49. Consider two regression lines 3x + 2y = 26 and 6x + y = 31. Find the correlation coefficient
between x and y
(a) 0.5 (b) -0.5 (c) 0.25 (d) -0.25

50. Which of the following is spurious correlation?


(a) Correlation between two variables having no causal relationship
(b) Negative correlation
(c) Bad relation between two variables
(d) Very low correlation between two variables.

51. Scatter diagram does not help us to?


(a) Find the type of correlation
(b) Identify whether variables correlated or not
(c) Determine the linear or non-linear correlation
(d) Find the numerical value of correlation coefficient

52. For the set of observations {(1,2), (2, 5), (3,7), (4,8), (5,10)} the value of
Karl Pearson's coefficient of correlation is approximately given by
(a) 0.755 (b) 0.655 (c) 0.525 (d) 0.985

53. The coefficient of correlation between x and y is 0.5, the covariance is 16, and
the standard deviation of y is ____ if the SD of x is 4
(a) 4 (b) 8 (c) 16 (d) 64

54. The intersecting point of the two regression lines, y on x and x on y, is
(a) (0, 0) (b) (x̄, ȳ) (c) (byx, bxy) (d) (1, 1)

55. Given that the variance of x is equal to the square of the standard deviation of y and
the regression line of y on x is y = 40 + 0.5 (x-30).
Then regression line of x on y is
(a) y = 40 + 4 (x - 30) (b) y = 40 +(x-30)
(c) y = 40+ 2 (x-30) (d) x = 30 + 2 (y - 40)

56. The regression coefficients remain unchanged due to
(a) A shift of scale (b) A shift of origin
(c) Replacing x-values by 1/x (d) Replacing y-values by 1/y

57. If y = 9x and x = 0.01 y then r is equal to:
(a) -0.1 (b) 0.1 (c) +0.3 (d) -0.3

58. The straight-line graph of the linear equation Y = a + bX is horizontal if:
(a) b = 1 (b) b ≠ 0 (c) b = 0 (d) a = b ≠ 0

59. If byx = -1.6 and bxy = -0.4, then rxy will be:
(a) 0.4 (b) -0.8 (c) 0.64 (d) 0.8

60. If the sum of the product of the deviations of X and Y from their means is zero
the correlation coefficient between X and Y is:
(a) Zero (b) Positive (c) Negative (d) 10

61. If the slope of the regression line is calculated to be 5.5 and the intercept 15, then
the value of Y when X is 6 is:
(a) 88 (b) 48 (c) 18 (d) 78

62. The sum of square of any real positive quantities and its reciprocal is never less than:
(a) 4 (b) 2 (c) 3 (d) 4.

63. If the data points of (X, Y) series on a scatter diagram lie along a straight line that
goes downwards as X-values move from left to right, then the data exhibit
_________correlation
(a) Direct (b)Imperfect indirect
(c) Indirect (d) Imperfect direct

64. The intersecting point of the two regression lines falls on the X-axis. If the mean of the
X-values is 16 and the standard deviations of X and Y are 3 and 4 respectively, then
the mean of the Y-values is
(a) 16/3 (b) 4 (c) 0 (d) 1

65. The regression coefficients remain unchanged due to


(a) Shift of origin (b) Shift of scale (c) Always (d) Never

66. If the plotted points in a scatter diagram lie from lower left to upper right, then the correlation is
(a) Negative (b) Perfect Negative (c) Indirect (d) Positive

67. For finding correlation between two qualitative characteristics, we use
(a) Coefficient of rank correlation
(b) Scatter diagram
(c) Coefficient of concurrent deviation
(d) Product moment correlation coefficient

68. For n pairs of observations, the coefficient of concurrent deviation is calculated
as 1/√5. If there are six concurrent deviations, then n =
(a) 11 (b) 10 (c) 9 (d) 8

69. For positive and perfectly correlated random variables, one of the regression
coefficient is 1.3 and the standard deviation of X is 2, the variance of Y is
(a) 2.66 (b) 6.76 (c) 6.56 (d) 3.16

70. The coefficient of rank correlation between the ranking of following 6 students in
two subjects.
Mathematics and Statistics is :
Mathematics 3 5 8 4 7 10
Statistics 6 4 9 8 1 2
(a) -0.25 (b) 0.35 (c) 0.38 (d) 0.20

71. Pearson's correlation coefficient between x and y is:
(a) cov(x, y)/(Sx Sy) (b) cov²(x, y)/(Sx Sy) (c) (Sx Sy)³/cov(x, y) (d) Sx Sy/cov(x, y)

72. The equations of the two lines of regression are 4x + 3y+7=0 and 3x + 4y +8=0.
Find the correlation coefficient between x and y.
(a) -0.75 (b) 0.25 (c) -0.92 (d) 1.25

73. If the regression equations are 2x+3y+1=0 and 5x+6y+1=0, then Mean of x and
y respectively are
(a) -1,-1 (b) -1,1 (c) 1,-1 (d) 2,3

74. If byx = 0.5 and bxy = 0.46, then the value of the correlation coefficient r is:
(a) 0.23 (b) 0.25 (c) 0.39 (d) 0.48

75. For variables X and Y, we collect four observations with ΣX = 10; ΣY = 14; ΣX² = 65;
ΣY² = 5 and ΣXY = 3. What is the regression line of Y on X?
(a) Y = - 0.8X - 5.5 (b) Y = 0.8X - 5.5 (c) Y = - 0.8X + 5.5 (d) Y = 0.8X + 5.5
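
The working for question 75 can be sketched in Python as below, reading the sums as ΣX = 10, ΣY = 14, ΣX² = 65 and ΣXY = 3 with n = 4. The function name and argument names are illustrative assumptions, not part of the source.

```python
def regression_y_on_x(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (slope, intercept) of the least-squares line Y = intercept + slope * X."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    # b_yx = (sum(XY) - n * mean_x * mean_y) / (sum(X^2) - n * mean_x^2)
    slope = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    # The line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = regression_y_on_x(n=4, sum_x=10, sum_y=14, sum_x2=65, sum_xy=3)
print(f"Y = {slope:.1f}X + {intercept:.1f}")
```

With these sums the slope works out to (3 - 35)/40 = -0.8 and the intercept to 3.5 + 0.8 × 2.5 = 5.5.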

76. The regression lines will be perpendicular to each other when the value of r is:
(a) 1 (b) -1 (c) 1/2 (d) 0

77. Given that r = 0.4 and n = 81, determine the limits for the population correlation
coefficient.
(a) (0.333, 0.466) (b) (0.367, 0.433) (c) (0.337, 0.463) (d) (0.373, 0.427)
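
A short Python sketch of the limits asked for in question 77, assuming the usual probable error formula P.E. = 0.6745 (1 - r²)/√n and limits of r ± P.E. The function names are illustrative.

```python
import math


def probable_error(r, n):
    """Probable error of r for n pairs: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def population_limits(r, n):
    """Lower and upper limits for the population correlation: r -/+ P.E."""
    pe = probable_error(r, n)
    return r - pe, r + pe


lower, upper = population_limits(0.4, 81)
print(round(lower, 3), round(upper, 3))
```

For r = 0.4 and n = 81 the probable error is 0.6745 × 0.84 / 9, about 0.063.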

78. Spearman's rank correlation coefficient rR is given by:


6 di2 6 di2 6 di2 6 di2
(a) 1  (b) 1  (c) 1  (d) 1 
n  n 2  1 n  n 2  1 n  n2  1 n  n 2  1

79. If the regression equations are x + 2y - 5 = 0 and 2x + 3y - 8 = 0, then the mean of x
and the mean of y are _________, respectively:
(a) -3 and 4 (b) 2 and 4 (c) 1 and 2 (d) 2 and 1.

80. The regression coefficients are zero if r is equal to


(a) 2 (b) -1 (c) 1 (d) 0

81. The regression equations are 8x - 10y + 66 = 0 and 40x - 18y = 214. Find the coefficient
of correlation.
(a) 4/5 (b) -4/5 (c) 3/5 (d) -1
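
Several questions here give two regression lines and ask for r. One way to mechanise the usual check (the product of the two regression coefficients cannot exceed 1) is sketched below in Python. The tuple representation (a, b, c) for a line ax + by + c = 0 and the function name are assumptions made for illustration, and all x and y coefficients are assumed to be non-zero.

```python
import math


def correlation_from_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    Tries both ways of labelling the lines as 'y on x' and 'x on y' and keeps
    the labelling whose regression coefficients have a product between 0 and 1.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # coefficient of x after solving for y
        b_xy = -x_on_y[1] / x_on_y[0]  # coefficient of y after solving for x
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients.
            sign = 1 if b_yx > 0 else -1
            return sign * math.sqrt(product)
    raise ValueError("the given lines cannot be a pair of regression lines")


r = correlation_from_lines((8, -10, 66), (40, -18, -214))
print(round(r, 2))
```

For the lines in question 81 the first labelling gives byx = 0.8 and bxy = 0.45, so r² = 0.36 and r = 0.6.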

82. If two regression lines are x + 3y = 7 and 2x + 5y = 12, then the means of x and y are, respectively:
(a) 2, 1 (b) 1, 2 (c) 2,3 (d) 2,4

83. If the means of two variables x and y are 3 and 1 respectively, then the equations of the two
regression lines are _____
(a) 5x+7y-22=0 & 6x+2y-20=0 (c) 5x+7y+22=0 & 6x+2y-20=0
(b) 5x+7y-22=0 & 6x+2y+20=0 (d) 5x+7y+22=0 & 6x+2y+20=0

84. Out of the two lines of regression given by x+2y=4 and 2x+3y-5=0, the regression line
of x on y is:
(a) 2x+3y-5=0 (b) x+2y=4
(c) x+2y=0 (d) The given lines can't be regression lines.

85. The equations of two lines of regression are 4x + 3y + 7 = 0 and 3x + 4y + 8 = 0. The means of x
and y are
(a) 5/7 and 6/7 (c) 2 and 4
(b) -4/7 and -11/7 (d) None of these

86. If the regression line of y on x and that of x on y are given by y = -2x + 3 and 8x = -y + 3
respectively, what is the coefficient of correlation between x and y?
(a) 0.5 (b) -1/√2 (c) -0.5 (d) None of these

87. If 8x - 3y + 7 = 0 and 14x - 7y + 6 = 0 are two regression equations, then the correlation
coefficient r =
(a) 0.86 (b) -0.86 (c) 0.45 (d) -0.45

88. If y =3x + 4 is the regression line of y on x and the arithmetic mean of x is -1,
what is the arithmetic mean of y?
(a) 1 (b) -1 (c) 7 (d) None of these.

89. The square of the coefficient of correlation 'r' is called the coefficient of
(a) Determination (b) Regression (c) Both (d) None

90. Coefficient of determination is computed as
(a) r³ (b) 1 - r² (c) 1 + r² (d) r²

91. The coefficient of determination is defined by the formula
(a) R² = 1 - unexplained variance / total variance
(b) R² = explained variance / total variance
(c) Both
(d) none

92. If the coefficient of correlation between two variables is - 0.2, then the coefficient of
determination is
(a) 0.4 (b) 0.02 (c) 0.04 (d) 0.16

93. If the coefficient of correlation between two variables is -0.3, then the coefficient of
determination is
(a) 0.3 (b) 0.09 (c) 0.7 (d) 0.9

94. If the coefficient of correlation between two variables is -0.9, then the coefficient of
determination is
(a) 0.9 (b) 0.81 (c) 0.1 (d) 0.19

95. The coefficient of correlation between two variables is 0.5, then the coefficient of
determination is
(a) 0.5 (b) 0.25 (c) -0.5 (d) √0.5

96. If the coefficient of correlation between two variables is 0.6, then the percentage of
variation accounted for is
(a) 60% (b) 40% (c) 64% (d) 36%

97. If the coefficient of correlation between two variables is 0.6, then the percentage of
variation unaccounted for is
(a) 60% (b) 40% (c) 64% (d) 36%

98. If the coefficient of correlation between two variables is 0.7, then the percentage of
variation unaccounted for is
(a) 70% (b) 30% (c) 51% (d) 49%

99. If r = 0.6 then the coefficient of non-determination is


(a) 0.4 (b) -0.6 (c) 0.36 (d) 0.64
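
Questions 89 to 99 all rest on the same two quantities, the coefficient of determination r² and the coefficient of non-determination 1 - r². A minimal Python sketch, with illustrative function names:

```python
def coefficient_of_determination(r):
    """Share of total variation accounted for: r squared."""
    return r ** 2


def coefficient_of_non_determination(r):
    """Share of total variation left unaccounted for: 1 - r squared."""
    return 1 - r ** 2


for r in (-0.2, 0.6, 0.7):
    explained = coefficient_of_determination(r)
    unexplained = coefficient_of_non_determination(r)
    print(f"r = {r}: explained {explained:.2f}, unexplained {unexplained:.2f}")
```

The sign of r plays no part: r = -0.2 and r = 0.2 both give a coefficient of determination of 0.04.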

100. A relationship r² = 1 - 580 is not possible
(a) True (b) False (c) Both (d) None

101. Find the coefficient of correlation when its probable error is 0.2 and the number of pairs
of item is 9.
(a) 0.505 (b) 0.332 (c) 0.414 (d) 0.316

101. Which of the following is true:
(a) bxy = r.Sy/Sx (b) byx = r.Sy/Sx (c) Σxy/Sx (d) Σxy/Sy

102. The two regression lines are: 16x - 20y + 132 =0 and 80x -30y - 428 = 0, the value of
correlation coefficient is
(a) 0.6 (b) -0.6 (c) 0.54 (d) 0.45

103. Two regression equations are x + y = 6 and x + 2y = 10; then the correlation coefficient between x
and y is
(a) -1/2 (b) 1/2 (c) -1/√2 (d) 1/√2

104. The two lines of regression intersect at the point:


(a) Mean (c) Mode
(b) Median (d) None of these

105. If the two lines of regression are x +2y -5 = 0 and 2x+3y -8=0, then the regression line
of y on x is
(a) x +2y -5 = 0 (c) 2x + 3y - 8 = 0
(b) x +2y = 0 (d) 2x + 3y = 0

106. If the two regression lines are 3X = Y and 8Y = 6X, then the value of correlation
coefficient is
(a) -0.5 (b) 0.5 (c) 0.75 (d)-0.80

107. The regression coefficient is independent of the change of


(a) Origin (b) Scale (c) Scale and origin both (d) None of these
Ans.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
A A A C A B A D D B B B C A D C C B A D
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
D A A A B C C B C A B B A C C C A A A B
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
B B C A A A A C A D D B B B C C B A B
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
B C B A D A A B A A A C D D D C D C D C
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
B A A B C A C A D C C B B B D C C D A D
101 102 103 104 105 106 107
B C C A A B A


TEST PAPER

1. The coefficient of correlation r between x and y when
Cov (x, y) = -16.5, Var (x) = 2.89, Var (y) = 100 is:
(a) -0.97 (b) 0.97 (c) 0.89 (d) - 0.89
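
Question 1 uses r = Cov(x, y) / (σx σy), where each standard deviation is the square root of the corresponding variance. A small Python sketch, with an illustrative function name:

```python
import math


def correlation_from_covariance(cov_xy, var_x, var_y):
    """Pearson's r from the covariance and the two variances."""
    return cov_xy / math.sqrt(var_x * var_y)


r = correlation_from_covariance(-16.5, 2.89, 100)
print(round(r, 2))
```

Here σx = 1.7 and σy = 10, so r = -16.5/17, about -0.97. The same function applied to covariance 25 and variances 36 and 25 gives 25/30, about 0.833.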

2. Take 200 and 150 respectively as the assumed means for the X and Y series of 11
values, so that dx = X - 200 and dy = Y - 150. Given Σdx = 13, Σdx² = 2667, Σdy = 42, Σdy²
= 6964 and Σdx dy = 3943, the value of r is:
(a) 0.77 (b) 0.98 (c) 0.92 (d) 0.82
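
The assumed-mean (step deviation) form of Pearson's formula used in question 2 can be sketched in Python as follows. The function and argument names are illustrative assumptions.

```python
import math


def pearson_from_deviations(n, sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy):
    """Pearson's r from deviations taken about assumed means.

    r = (n*S(dx dy) - S(dx)*S(dy))
        / sqrt((n*S(dx^2) - S(dx)^2) * (n*S(dy^2) - S(dy)^2))
    """
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt(
        (n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2)
    )
    return numerator / denominator


r = pearson_from_deviations(
    n=11, sum_dx=13, sum_dy=42, sum_dx2=2667, sum_dy2=6964, sum_dxdy=3943
)
print(round(r, 2))
```

Because r does not depend on the choice of origin, the assumed means 200 and 150 never enter the calculation; only the deviation sums do.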

3. For some bivariate data, the following results were obtained for the two variables
x and y: x̄ = 53.2, ȳ = 27.9, byx = -1.5, bxy = -0.2.
The most probable value of y when x = 60 is:
(a) 15.6 (b) 13.4 (c) 19.7 (d) 17.7
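
Taking 53.2 and 27.9 as the means of x and y, the estimate in question 3 comes from the regression line of y on x, y - ȳ = byx (x - x̄). A minimal Python sketch with illustrative names:

```python
def predict_y(x, mean_x, mean_y, b_yx):
    """Estimate y from the regression line of y on x through (mean_x, mean_y)."""
    return mean_y + b_yx * (x - mean_x)


y_hat = predict_y(x=60, mean_x=53.2, mean_y=27.9, b_yx=-1.5)
print(round(y_hat, 1))
```

Only byx is needed to estimate y from x; bxy would be used for the reverse estimate of x from y.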

4. If the sum of squares of the rank difference in mathematics and physics marks
of 10 students is 22, then the coefficient of rank correlation is:
(a) 0.267 (b) 0.867 (c) 0.92 (d) None

5. For the following data, the coefficient of rank correlation is :


Rank in Botany: 1 2 3 4 5
Rank in Chemistry 2 3 1 5 4
(a) 0.93 (b) 0.4 (c) 0.6 (d) None

6. For 10 pairs of observations, the number of concurrent deviations was found to be 4.
What is the value of the coefficient of concurrent deviation?
(a) 0.2 (b) 1/3 (c) -1/3 (d) - 0.2
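
The coefficient of concurrent deviation is ±√(±(2c - m)/m), where c is the number of concurrent deviations and m = n - 1 is the number of deviation pairs; both signs follow the sign of (2c - m). A Python sketch with an illustrative function name:

```python
import math


def concurrent_deviation_coefficient(n_pairs, c):
    """Coefficient of concurrent deviation for n_pairs observations."""
    m = n_pairs - 1               # number of pairs of deviations
    ratio = (2 * c - m) / m
    sign = 1 if ratio >= 0 else -1
    # Take the root of the absolute ratio, then restore its sign.
    return sign * math.sqrt(sign * ratio)


r_c = concurrent_deviation_coefficient(n_pairs=10, c=4)
print(round(r_c, 3))
```

With n = 10 and c = 4, m = 9 and (2c - m)/m = -1/9, so the coefficient is -1/3.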

7. If the covariance between two variables is 20 and the variance of one of the
variables is 16, what would be the variance of the other variable?
(a) More than 10 (b) More than 100
(c) More than 1.25 (d) Less than 10

8. Assume 69 and 112 as the mean values for X and Y respectively.
Σdx = 47, Σdx² = 1475, Σdy = 108, Σdy² = 3468, Σdx dy = 2116 and N = 8,
where dx = X - 69 and dy = Y - 112. Then the value of r is:
(a) 0.95 (b) 0.65 (c) 0.75 (d) 0.85

9. The coefficient of rank correlation of marks obtained by 10 students in English
and Economics was found to be 0.5. It was later discovered that the difference
in ranks in the two subjects obtained by one student was wrongly taken as 3
instead of 7. The correct coefficient of rank correlation is:
(a) 0.32 (b) 0.26 (c) 0.49 (d) 0.93
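
The correction in question 9 works backwards from the wrong coefficient to the wrong Σd², swaps the squared difference, and recomputes. A minimal Python sketch with illustrative names:

```python
def sum_d2_from_rho(rho, n):
    """Invert rho = 1 - 6*sum_d2 / (n*(n^2 - 1)) to recover sum_d2."""
    return (1 - rho) * n * (n ** 2 - 1) / 6


def rho_from_sum_d2(sum_d2, n):
    """Spearman's coefficient for untied ranks."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


n = 10
wrong_sum_d2 = sum_d2_from_rho(0.5, n)
# Remove the wrongly used difference (3) and add the correct one (7).
corrected_sum_d2 = wrong_sum_d2 - 3 ** 2 + 7 ** 2
corrected_rho = rho_from_sum_d2(corrected_sum_d2, n)
print(round(corrected_rho, 2))
```

The wrong Σd² is 82.5, the corrected value is 122.5, and the corrected coefficient is 1 - 735/990.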

10. If the sum of squares of differences of ranks is 50 and the number of items is 8, then
what is the value of the rank correlation coefficient?
(a) 0.59 (b) 0.40 (c) 0.36 (d) 0.63

11. The correlation coefficient between x and y is -1/2. The value of bxy is -1/8.
Find byx.
(a) -2 (b) -4 (c) 0 (d) 2

12. Which of the following regression equations represents the regression line of Y on X:
7x + 2y + 15 = 0, 2x + 5y + 10 = 0
(a) 7x + 2y + 15 = 0 (b) 2x + 5y + 10 = 0
(c) Both (a) & (b) (d) None of these

13. If the rank correlation coefficient between marks in Management and
Mathematics for a group of students is 0.6 and the sum of the squares of the
differences in ranks is 66, then what is the number of students in the group?
(a) 9 (b) 10 (c) 11 (d) 12

14. Correlation coefficient between X and Y will be negative when:-


(a) X and Y are decreasing (b) X is increasing, Y is decreasing
(c) X and Y are increasing (d) None of these

15. The two regression lines are 7x - 3y - 18 = 0 and 4x - y - 11 = 0. Find the values of byx
and bxy.
(a) 7/3, 1/4 (b) -7/3,-1/4 (c) -3/7,-1/4 (d) None of these.

16. If the two lines of regression are
x + 2y - 5 = 0 and 2x + 3y - 8 = 0,
the regression line of y on x is
(a) x + 2y - 5 = 0 (b) 2x + 3y - 8 = 0
(c) Any of the two lines (d) None of the two lines

17. The ranks of five participants given by two judges are
Participants A B C D E
Judge 1      1 2 3 4 5
Judge 2      5 4 3 2 1
Rank correlation coefficient between ranks will be
(a) 1 (b) 0 (c) -1 (d) 1/2

18. Given: x̄ = 16, σx = 4.8, ȳ = 20, σy = 9.6.
The coefficient of correlation between x and y is 0.6. What will be the regression
coefficient of 'x' on 'y'?
(a) 0.03 (b) 0.3 (c) 0.2 (d) 0.05

19. For bivariate data, the two lines of regression are 40x - 18y = 214 and 8x - 10y + 66
= 0. Find the mean values of x and y.
(a) 17 and 13 (b) 13 and 17 (c) 13 and -17 (d) -13 and 17

20. Three competitors in a contest are ranked by two judges in the order 1,2,3 and
2,3,1 respectively. Calculate the Spearman's rank correlation coefficient.
(a) -0.5 (b) -0.8 (c) 0.5 (d) 0.8

21. If Y is the dependent variable and X is the independent variable, the S.D. of X and Y
are 5 and 8 respectively, and the coefficient of correlation between X and Y is 0.8,
find the regression coefficient of Y on X.
(a) 0.78 (b) 1.28 (c) 6.8 (d) 0.32

22. If the regression lines are 8x - 10y + 66 = 0 and 40x - 18y = 214, the correlation
coefficient between ‘x’ and ‘y’ is :
(a) 1 (b) 0.6 (c) - 0.6 (d) -1

23. The coefficient of correlation between two variables x and y is the simple
__________of the two regression coefficients.
(a) Arithmetic Mean (b) Geometric Mean
(c) Harmonic Mean (d) None of the above.

24. If the covariance between variables X and Y is 25 and variance of X and Y are
respectively 36 and 25, then the coefficient of correlation is
(a) 0.409 (b) 0.416 (c) 0.833 (d) 0.0277

25. In Spearman's Correlation Coefficient, the sum of the differences of ranks
between two variables shall be ___________
(a) 0 (b) 1 (c) -1 (d) None of the above

26. Determine the coefficient of correlation between the x and y series:
                                         x Series   y Series
No. of items                             15         15
Arithmetic Mean                          25         18
Sum of Squares of Deviations from Mean   136        138
Sum of products of Deviations of x and y series from Mean = 122
(a) -0.89 (b) 0.89 (c) 0.69 (d) - 0.69

27. Price and demand is an example of
(a) No correlation (b) Positive correlation
(c) Negative correlation (d) None of the above

28. Two regression lines for a bivariate data are: 2x - 5y + 6 = 0 and 5x-4y 4-3 = 0.
Then the coefficient of correlation should be:
2 2 2 2 2 2
(a) 5 (b) 5 (c) 5 (d) 5

29. Out of the following, which is correct?
(a) byx = r.σx/σy (b) byx = r.σy/σx (c) byx = Σxy/σx (d) byx = Σxy/σy

30. If r = 0.6, then the coefficient of determination is.


(a) 0.4 (b) -0.6 (c) 0.36 (d) 0.64

31. The coefficient of correlation between the temperature of environment and power
consumption is always:
(a) Positive (b) Negative (c) Zero (d) Equal to 1
32. If two regression lines are x + y = 1 and x - y = 1 then mean values of x and y will
be:
(a) 0 and 1 (b) 1 and 1 (c) 1 and 0 (d) -1 and-1

33. If r = 0.6 then the coefficient of non-determination will be:


(a) 0.40 (b) -0.60 (c) 0.36 (d) 0.64

34. The covariance between two variables is


(a) Strictly positive (b) Strictly negative
(c) Always Zero (d) Either positive or negative or zero

35. Rank correlation coefficient lies between


(a) 0 to 1 (b) -1 to +1, inclusive of these values
(c) - 1 to 0 (d) both

36. If the correlation coefficient between the variables X and Y is 0.5, then the
correlation coefficient between the variables 2x - 4 and 3 - 2y is
(a) 1 (b) 0.5 (c) -0.5 (d) 0
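
Question 36 turns on the rule that a change of origin leaves r unchanged, while a change of scale can only flip its sign. For u = a + bx and v = c + dy, the correlation between u and v equals r when b and d have the same sign and -r when their signs differ. A Python sketch with illustrative names:

```python
def correlation_after_linear_change(r, b, d):
    """Correlation between u = a + b*x and v = c + d*y, given r between x and y.

    The shifts a and c never matter; only the signs of the scale factors
    b and d do. Both factors must be non-zero.
    """
    if b == 0 or d == 0:
        raise ValueError("scale factors must be non-zero")
    return r if b * d > 0 else -r


r_uv = correlation_after_linear_change(0.5, b=2, d=-2)
print(r_uv)
```

For 2x - 4 and 3 - 2y the scale factors are 2 and -2, which have opposite signs, so the correlation changes from 0.5 to -0.5.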

37. Given the following series:


X 10 13 12 15 8 15
Y 12 16 18 16 7 18
The rank correlation coefficient r =
 2 2 m1  m12  1 
2 m1  m12  1 6 d   
6d 2  
12  1i 12 
1 i 1
1
n  n  1
2
n  n2  1
(a) (b)
m1  m12  1
2 m1  m  1
2 3
1  6d 2   1  y d 2   12
1

i 1 n  n2  1 i 1 n  n 2  1
(c) (d)

38. Determine Spearman's rank correlation coefficient from the given data: Σd² = 30,
n = 10:
(a) r = 0.82 (b) r = 0.32 (c) r = 0.40 (d) None of the above

39. Find the coefficient of correlation from the regression equations
2x + 3y = 2
4x + 3y = 4
(a) -0.71 (b) 0.71 (c) -0.5 (d) 0.5

40. What is the coefficient of correlation from the following data?


x: 1 2 3 4 5
y: 5 4 3 2 6
(a) 0 (b) -0.75 (c) -0.85 (d) 0.82
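
For raw data such as that in question 40, Pearson's r can be computed directly from deviations about the actual means. A minimal Python sketch with an illustrative function name:

```python
import math


def pearson_r(x, y):
    """Product moment correlation coefficient from paired raw observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 6])
print(r)
```

Here the means are 3 and 4 and the products of deviations are -2, 0, 0, -2 and 4, which sum to zero, so r = 0.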

Answers:
1 A 2 C 3 D 4 B 5 C
6 C 7 B 8 A 9 B 10 B
11 A 12 B 13 B 14 B 15 A
16 C 17 C 18 B 19 B 20 A
21 B 22 B 23 B 24 C 25 A
26 B 27 C 28 C 29 B 30 C
31 A 32 C 33 D 34 D 35 B
36 C 37 B 38 A 39 B 40 A


