UNIT V Correlation and Regression Important Questions and QB
PART – A
Positive Correlation :
An increase in the values of one variable results in a corresponding increase in the
values of the other variable (e.g. sales and profit, experience and salary).
Negative Correlation :
A decrease in the values of one variable corresponds to an increase in the
values of the other variable, and vice versa (e.g. as the bank interest rate increases,
house prices decrease).
Linear Correlation :
In this case the plotted points lie on or near a straight line in a graph.
Curvilinear Correlation :
In this case the plotted points on a graph form a curve.
PANIMALAR ENGINEERING COLLEGE(AUTONOMOUS) CHENNAI
$$r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\;\sqrt{n\sum Y^2 - (\sum Y)^2}}$$
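The formula above can be evaluated directly from the raw sums. The following is a minimal sketch; the function name `pearson_r` and the sample data are illustrative assumptions, not part of the question bank.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient computed from the raw sums in the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: an exactly linear increasing relation, so r should be close to +1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```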
7) What is regression ?
A regression is the measure of the average relation between two or more
variable in terms of the original units of the data.
Regression line of Y on X :
$y - \bar{y} = b_{yx}\,(x - \bar{x})$, where $b_{yx} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ is the regression coefficient
of y on x.
Regression line of X on Y :
$x - \bar{x} = b_{xy}\,(y - \bar{y})$, where $b_{xy} = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$ is the regression
coefficient of x on y.
$r = \pm\sqrt{b_{xy} \times b_{yx}}$, where r takes the common sign of the two regression coefficients.
11) Write the formula for the angle between the regression lines. (Panimalar March 2022,
May 2023)
Solution:
The angle between two regression lines is given by
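$$\tan\theta = \dfrac{(1 - r^2)\,\sigma_x \sigma_y}{r\,(\sigma_x^2 + \sigma_y^2)}$$
i.e. $\theta = \tan^{-1}\left(\dfrac{(1 - r^2)\,\sigma_x \sigma_y}{r\,(\sigma_x^2 + \sigma_y^2)}\right)$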
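As a numerical illustration of the angle formula, here is a minimal sketch. The function name is hypothetical, and it uses |r| in the denominator so that the acute angle is returned, which is an assumption beyond the formula as printed.

```python
from math import atan, degrees

def angle_between_regression_lines(r, sigma_x, sigma_y):
    """Acute angle in degrees between the two regression lines (requires r != 0)."""
    # Assumption: |r| is used so the result is the acute angle for negative r too.
    tan_theta = (1 - r ** 2) * sigma_x * sigma_y / (abs(r) * (sigma_x ** 2 + sigma_y ** 2))
    return degrees(atan(tan_theta))

# Hypothetical values: r = 0.5 with equal standard deviations.
print(angle_between_regression_lines(0.5, 1.0, 1.0))
```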
12) When do the regression lines coincide, and when are they perpendicular to each other?
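The answer is not present in this copy. As a sketch, the two limiting cases can be read off the angle formula of question 11:

```latex
r = \pm 1:\quad \tan\theta = \frac{(1 - r^2)\,\sigma_x\sigma_y}{r\,(\sigma_x^2 + \sigma_y^2)} = 0 \;\Rightarrow\; \theta = 0 \quad \text{(the lines coincide)}

r = 0:\quad \tan\theta \to \infty \;\Rightarrow\; \theta = \frac{\pi}{2} \quad \text{(the lines are perpendicular)}
```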
14) Write any two relations between correlation and regression. (Panimalar Feb 2023)
Solution:
1. The geometric mean of the regression coefficients is the correlation coefficient,
i.e. $r = \pm\sqrt{b_{xy} \times b_{yx}}$
2. The arithmetic mean of the regression coefficients is greater than or equal to the
correlation coefficient, i.e. $\dfrac{b_{xy} + b_{yx}}{2} \ge r$
15) Write the difference between correlation and regression.
Solution:
| Correlation | Regression |
|---|---|
| The degree and direction of the relationship between the variables are studied. | The nature of the relationship is studied. |
| If the value of one variable is known, the value of the other variable cannot be estimated. | If the value of one variable is known, the value of the other variable can be estimated using the functional relationship. |
| The correlation coefficient lies between -1 and 1. | At most one of the two regression coefficients can exceed 1 in magnitude, since their product is r² ≤ 1. |
| It is used to represent the linear relationship between two variables. | It is used to fit a line of best fit and calculate the value of one variable on the basis of the other. |
| Correlation | Covariance |
|---|---|
| It is a measure of how closely two random variables are connected. | It is a measure of how two random variables change together. |
| The correlation coefficient lies between -1 and 1. | The covariance can vary from −∞ to +∞. |
| It is a unit-free measure of the connection between variables, since it is dimensionless. | Its unit is the product of the units of the two variables. |
| It can be deduced by dividing the covariance by the product of the two standard deviations. | The correlation can be deduced from the covariance. |
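The unit dependence of covariance and the unit-free nature of correlation can be seen numerically. This is a minimal sketch using population (divide by n) formulas; the height and weight figures are hypothetical.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

heights_cm = [150, 160, 170, 180]           # hypothetical data
weights_kg = [50, 58, 66, 80]               # hypothetical data
heights_m = [h / 100 for h in heights_cm]   # same heights in different units

# Covariance scales with the unit of measurement; correlation does not.
print(covariance(heights_cm, weights_kg), covariance(heights_m, weights_kg))
print(correlation(heights_cm, weights_kg), correlation(heights_m, weights_kg))
```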
17) What are the merits of the rank correlation method? (Panimalar Feb 2023)
Solution:
1. It is simple to understand and easy to apply compared to Karl Pearson's method.
2. It is the only method of studying correlation when only the ranks are given and not
the actual data.
3. Even if the actual data are given, we may apply this method to study correlation
by assigning ranks to the observations.
4. The rank correlation method is distribution free.
5. It is not affected by extreme values.
2. The result obtained by this method is different from the result obtained from
Pearson's method, when ranks are repeated.
a period of time.
PART – B
1) The following table gives the aptitude test scores and productivity indices of 10 workers
selected at random :
X : 65 67 66 71 67 70 68 69
Y : 67 68 68 70 64 67 72 70
3) Calculate the value of Karl Pearson's coefficient of correlation for the following data.
25-35 - 10 25 2 - 37
35-45 - 1 12 2 - 15
45-55 - - 4 16 5 25
55-65 - - - 4 2 6
Total 5 20 44 24 7 100
5) The following table gives, according to age, the frequency of marks obtained by 100
students in an intelligence test.
6) Obtain the equations of the regression lines from the following data. Hence find the
coefficient of correlation between x and y. Also estimate the value of (i) y when x = 38
and (ii) x when y = 18.
X 20 26 29 30 31 31 34 35
Y 20 20 21 29 27 24 27 31
(Panimalar May 2023)
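One way to check a hand solution of question 6 is to evaluate the regression coefficient formulas from Part A on the given data. This is a sketch only; the function and variable names are illustrative.

```python
from math import copysign, sqrt

def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using the raw-sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return sum_x / n, sum_y / n, b_yx, b_xy

xs = [20, 26, 29, 30, 31, 31, 34, 35]
ys = [20, 20, 21, 29, 27, 24, 27, 31]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(xs, ys)

# r is the geometric mean of the coefficients, carrying their common sign.
r = copysign(sqrt(b_yx * b_xy), b_yx)

y_at_38 = y_bar + b_yx * (38 - x_bar)   # from the line of y on x
x_at_18 = x_bar + b_xy * (18 - y_bar)   # from the line of x on y

print(f"y - {y_bar} = {b_yx:.4f} (x - {x_bar})")
print(f"x - {x_bar} = {b_xy:.4f} (y - {y_bar})")
print(f"r = {r:.4f}, y(38) = {y_at_38:.3f}, x(18) = {x_at_18:.3f}")
```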
7) The following data represent the number of flash drives sold per day at a local computer
shop and their prices.
Price (X): $ 34 36 32 35 30 38
Units sold (Y): 3 4 6 5 9 2
i. Develop the least-squares regression lines.
ii. Compute the coefficient of determination.
iii. Compute the sample correlation coefficient between the price and the
number of flash drives sold. ( AU / 10, 12 )
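For part ii, the coefficient of determination can be computed as 1 − SSE/SST from the fitted line of units sold on price. The sketch below assumes units sold (Y) is the dependent variable; variable names are illustrative.

```python
from math import sqrt

prices = [34, 36, 32, 35, 30, 38]
units = [3, 4, 6, 5, 9, 2]

n = len(prices)
x_bar, y_bar = sum(prices) / n, sum(units) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, units))
s_xx = sum((x - x_bar) ** 2 for x in prices)

# Least-squares line of Y on X: y = intercept + slope * x
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in prices]
sse = sum((y - f) ** 2 for y, f in zip(units, fitted))   # unexplained variation
sst = sum((y - y_bar) ** 2 for y in units)               # total variation
r_squared = 1 - sse / sst

# The sample correlation coefficient carries the sign of the slope.
r = sqrt(r_squared) if slope >= 0 else -sqrt(r_squared)
print(f"y = {intercept:.3f} + ({slope:.4f}) x, R^2 = {r_squared:.4f}, r = {r:.4f}")
```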
8) Purchases: 62 72 98 76 81 56 76 92 88 49
Sales: 112 124 131 117 132 96 120 136 97 85
Obtain the regression equation of sales on purchases. Estimate sales when the purchases equal 100.
9) The following data give the experience of machine operators and their performance
ratings as given by the number of good parts turned out per 100 pieces.
Operator: 1 2 3 4 5 6 7 8
Experience: 16 12 18 4 3 10 5 12
Performance rating: 87 88 89 68 78 80 75 83
Obtain the regression line of performance ratings on experience and estimate the probable
performance if an operator has 9 years of experience. ( AU / 09 )
10) Calculate the coefficient of correlation, coefficient of determination and standard error of
estimate for the data given below.
Sales 33 38 24 61 52 45 65 82 29 63 50 79
No. of sections 3 7 6 6 10 12 12 13 12 13 14 15
(Panimalar May 2023)
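The three quantities asked for in question 10 can be sketched as follows. Two assumptions are made here because the question does not state them: sales is treated as the dependent variable, and the standard error of estimate divides SSE by n (set `dof_correction=2` for the n − 2 convention). The function name is hypothetical.

```python
from math import sqrt

def fit_summary(xs, ys, dof_correction=0):
    """Return (r, r_squared, standard_error) for the regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / sqrt(s_xx * s_yy)
    # Residual sum of squares about the fitted line (clamped against rounding).
    sse = max(s_yy - s_xy ** 2 / s_xx, 0.0)
    standard_error = sqrt(sse / (n - dof_correction))
    return r, r * r, standard_error

sales = [33, 38, 24, 61, 52, 45, 65, 82, 29, 63, 50, 79]
sections = [3, 7, 6, 6, 10, 12, 12, 13, 12, 13, 14, 15]

r, r_squared, se = fit_summary(sections, sales)
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, S_yx = {se:.4f}")
```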
11) Calculate the correlation and find the two lines of regression from the following data
X 57 58 59 59 60 61 62 64
Y 67 68 65 68 72 72 69 71
(Panimalar March 2022)
12) Fit a straight line trend by the method of least squares to the following data and also
forecast the earnings for the year 1985.
Year 1974 1975 1976 1977 1978 1979 1980
Earnings(Rs.Lakhs) 15 14 18 20 17 24 27
(Panimalar Feb 2023)
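A minimal sketch of the least-squares trend fit for question 12 follows. It assumes the usual shortcut of taking the middle year (1977) as the origin so that the coded time values sum to zero, and it assumes the linear trend continues to 1985.

```python
years = [1974, 1975, 1976, 1977, 1978, 1979, 1980]
earnings = [15, 14, 18, 20, 17, 24, 27]   # Rs. lakhs

# Code time as t = year - middle year, so that sum(t) = 0 (odd number of years).
origin = years[len(years) // 2]
t = [year - origin for year in years]

# With sum(t) = 0 the normal equations reduce to a = mean(y), b = sum(t*y) / sum(t^2).
a = sum(earnings) / len(earnings)
b = sum(ti * yi for ti, yi in zip(t, earnings)) / sum(ti * ti for ti in t)

def trend(year):
    """Trend value y = a + b * (year - origin)."""
    return a + b * (year - origin)

forecast_1985 = trend(1985)
print(f"y = {a:.4f} + {b:.4f} t (origin {origin}), forecast for 1985 = {forecast_1985:.2f}")
```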
13) Find both regression equations from the following: $\sum X = 60$, $\sum X^2 = 4160$, $\sum Y = 40$,
$\sum Y^2 = 1720$, $\sum XY = 1150$, $N = 10$. (Panimalar March 2022)
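A worked sketch for question 13, substituting the given sums into the regression coefficient formulas from Part A (decimals are rounded to four places):

```latex
\bar{X} = \frac{60}{10} = 6, \qquad \bar{Y} = \frac{40}{10} = 4

b_{YX} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}
       = \frac{10(1150) - (60)(40)}{10(4160) - 60^2} = \frac{9100}{38000} \approx 0.2395

b_{XY} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}
       = \frac{9100}{10(1720) - 40^2} = \frac{9100}{15600} \approx 0.5833

\text{Y on X: } Y - 4 = 0.2395\,(X - 6), \qquad \text{X on Y: } X - 6 = 0.5833\,(Y - 4)
```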
14) The two regression lines are $2x + 3y = 8$ and $4x + y = 10$. Compute $\bar{x}$, $\bar{y}$ and $r$. Also
compute $\sigma_y$ when $\sigma_x = 2$.
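The approach for question 14 can be sketched as follows: the means are the intersection of the two lines, and the assignment of "y on x" versus "x on y" is chosen so that the product of the regression coefficients does not exceed 1. Helper names are hypothetical.

```python
from math import copysign, sqrt

# Each line is stored as (a, b, c) for a*x + b*y = c.
line1 = (2, 3, 8)
line2 = (4, 1, 10)

def intersection(l1, l2):
    """Solve the two linear equations by Cramer's rule; gives (x_bar, y_bar)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

def coefficients(y_on_x, x_on_y):
    """b_yx is dy/dx of the first line, b_xy is dx/dy of the second line."""
    b_yx = -y_on_x[0] / y_on_x[1]
    b_xy = -x_on_y[1] / x_on_y[0]
    return b_yx, b_xy

x_bar, y_bar = intersection(line1, line2)

# Try one assignment; swap if it would give r^2 > 1, which is impossible.
b_yx, b_xy = coefficients(line1, line2)
if b_yx * b_xy > 1:
    b_yx, b_xy = coefficients(line2, line1)

r = copysign(sqrt(b_yx * b_xy), b_yx)

sigma_x = 2
sigma_y = b_yx * sigma_x / r   # from b_yx = r * sigma_y / sigma_x
print(x_bar, y_bar, r, sigma_y)
```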
15) From the following data, find the equations of the regression lines.
| | Marks in Mathematics | Marks in English |
|---|---|---|
| Mean | 62.5 | 39 |
| S.D. | 9.5 | 10 |
Coefficient of correlation between marks in Mathematics and English = 0.60
i) Estimate the marks in English when the marks in Mathematics are 70.
ii) Estimate the marks in Mathematics corresponding to 54 marks in English.
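When only means, standard deviations and r are given, the regression coefficients follow from b_yx = r·σ_y/σ_x and b_xy = r·σ_x/σ_y. A minimal sketch, taking x as Mathematics and y as English (a naming choice made here for illustration):

```python
mean_maths, sd_maths = 62.5, 9.5      # x: marks in Mathematics
mean_english, sd_english = 39.0, 10.0  # y: marks in English
r = 0.60

b_yx = r * sd_english / sd_maths   # regression coefficient of English on Mathematics
b_xy = r * sd_maths / sd_english   # regression coefficient of Mathematics on English

def english_given_maths(x):
    """Line of y on x: y - y_bar = b_yx (x - x_bar)."""
    return mean_english + b_yx * (x - mean_maths)

def maths_given_english(y):
    """Line of x on y: x - x_bar = b_xy (y - y_bar)."""
    return mean_maths + b_xy * (y - mean_english)

english_at_70 = english_given_maths(70)
maths_at_54 = maths_given_english(54)
print(f"English at Maths = 70: {english_at_70:.2f}")
print(f"Maths at English = 54: {maths_at_54:.2f}")
```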
16) Calculate Spearman's rank correlation for the following data.
X 48 34 40 12 16 16 66 25 16 57
Y 15 15 24 8 13 6 20 9 9 15
(Panimalar March 2022)
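The data in question 16 contain repeated values, so a sketch of the tie-corrected formula ρ = 1 − 6[Σd² + Σ m(m² − 1)/12] / [n(n² − 1)] is given below, where m is the size of each group of tied values. Ranking the smallest value as 1 is an assumption; ranking from the largest gives the same ρ. Function names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their positions."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def spearman_with_ties(xs, ys):
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # Add m(m^2 - 1)/12 for every group of m tied values in either series.
    correction = sum(m * (m * m - 1) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values())
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

xs = [48, 34, 40, 12, 16, 16, 66, 25, 16, 57]
ys = [15, 15, 24, 8, 13, 6, 20, 9, 9, 15]
rho = spearman_with_ties(xs, ys)
print(f"rho = {rho:.4f}")
```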
17) Ten competitors in a beauty contest are ranked by three judges in the following order.
I Judge : 1 6 5 10 3 2 4 9 7 8
II Judge : 3 5 8 4 7 10 2 1 6 9
III Judge : 6 4 9 8 1 2 3 10 5 7
Use the rank correlation coefficient to determine which pair of judges has the nearest
approach to common tastes in beauty.
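For question 17 the ranks have no ties, so the basic formula ρ = 1 − 6Σd² / [n(n² − 1)] applies to each pair of judges. A minimal sketch (the dictionary layout and names are illustrative):

```python
from itertools import combinations

ranks = {
    "I":   [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "II":  [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "III": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

def spearman_rho(a, b):
    """Spearman's rank correlation for untied ranks."""
    n = len(a)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Rank correlation for every pair of judges.
rhos = {(j, k): spearman_rho(ranks[j], ranks[k]) for j, k in combinations(ranks, 2)}
for pair, rho in rhos.items():
    print(pair, round(rho, 4))

# The pair with the largest rho has the nearest approach to common tastes.
closest_pair = max(rhos, key=rhos.get)
print("Closest pair:", closest_pair)
```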