The document discusses correlation, a statistical analysis measuring the relationship between two or more variables, and outlines its types, including positive, negative, simple, multiple, and partial correlation. It details methods for studying correlation, including graphic methods like scatter diagrams and mathematical methods such as Pearson's and Spearman's coefficients. Additionally, it compares correlation with regression, emphasizing their distinct purposes in statistical analysis.
CORRELATION:
* Correlation refers to the relationship between two or more variables.
Correlation is a statistical analysis which measures how two variables fluctuate with reference to each other.
TYPES OF CORRELATION:
* If two variables tend to move in the same direction, the correlation is called positive or direct.
* If two variables tend to move in opposite directions, the correlation is called negative or inverse.
* When only two variables are studied, the relationship is called simple correlation.
* When more than two variables are studied simultaneously, it is called multiple correlation.
* When two variables are studied while excluding the effect of some other variables, it is called partial correlation (for example, price and demand, eliminating supply). The study of all the variables together is called total correlation.
* If the ratio of change between two variables is uniform, the correlation between them is linear; otherwise it is non-linear.
METHOD OF STUDYING CORRELATION:
1. GRAPHIC METHOD
a) Scatter diagram (or) Scattergram
b) Simple graph
2. MATHEMATICAL METHOD
a) Karl Pearson's Coefficient of Correlation
b) Spearman's Rank Coefficient of Correlation
c) Coefficient of Concurrent Deviation
d) Method of Least Squares
1. GRAPHIC METHOD
A) SCATTER DIAGRAMS:
A scatter diagram is a chart obtained by plotting two variables to find out whether there is any relationship between them. In this diagram the X variable is plotted on the horizontal axis and the Y variable on the vertical axis.
[Figure: scatter diagrams showing perfect positive correlation, perfect negative correlation, high degree of positive correlation, high degree of negative correlation, and no correlation.]
* The scatter diagram is a simple and attractive method of finding out the nature of correlation.
* It is easy to understand.
* A rough idea of whether the correlation is positive or negative is obtained at a glance.
B) SIMPLE GRAPH:
In this method the two variables are plotted on graph paper, which gives two curves. By comparing the curves we can decide the relation between the variables.
2. MATHEMATICAL METHOD:
In this method, the relation between the variables is decided on the basis of the value of the correlation coefficient.
A) COEFFICIENT OF CORRELATION:
KARL PEARSON'S COEFFICIENT OF CORRELATION:
Karl Pearson, a British statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. This is known as the Pearsonian coefficient of correlation and is denoted by r. There are several formulas for r:
(1) $r = \frac{\text{covariance of } x, y}{\sigma_x \sigma_y}$
(2) $r = \frac{\sum XY}{N \sigma_x \sigma_y}$
(3) $r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$
where $X = (x - \bar{x})$, $Y = (y - \bar{y})$, and $\bar{x}, \bar{y}$ are the means of series x and y.
$\sigma_x$ = S.D. of series x
$\sigma_y$ = S.D. of series y
PROPERTIES OF CORRELATION COEFFICIENT:
1. Limits of the correlation coefficient are $-1 \le r \le 1$.
2. If r = 1, correlation is perfect and positive.
If r = -1, correlation is perfect and negative.
If r = 0, there is no linear relationship between the variables.
3. Two independent variables are uncorrelated, i.e., if x and y are independent then r(x, y) = 0.
PROBLEMS:
1. Find if there is any significant correlation between the heights and weights given below:
Height in inches | 57 | 59 | 62 | 63 | 64 | 65 | 55 | 58 | 57
Weight | 113 | 117 | 126 | 126 | 130 | 129 | 111 | 116 | 112
Sol: $\bar{x} = \frac{540}{9} = 60$; $\bar{y} = \frac{1080}{9} = 120$
Height in inches (x) | X = x - 60 | X² | Weight (y) | Y = y - 120 | Y² | XY
57 | -3 | 9 | 113 | -7 | 49 | 21
59 | -1 | 1 | 117 | -3 | 9 | 3
62 | 2 | 4 | 126 | 6 | 36 | 12
63 | 3 | 9 | 126 | 6 | 36 | 18
64 | 4 | 16 | 130 | 10 | 100 | 40
65 | 5 | 25 | 129 | 9 | 81 | 45
55 | -5 | 25 | 111 | -9 | 81 | 45
58 | -2 | 4 | 116 | -4 | 16 | 8
57 | -3 | 9 | 112 | -8 | 64 | 24
Total: 540 | 0 | 102 | 1080 | 0 | 472 | 216
Coefficient of correlation $r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{216}{\sqrt{102 \times 472}} \approx 0.98$
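The deviation method used in this problem can be checked numerically. The following is a minimal Python sketch; the function name `pearson_r` is illustrative and not from the notes, and the height and weight values are the ones read from the worked table (totals 540 and 1080).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means:
    r = sum(XY) / sqrt(sum(X^2) * sum(Y^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]   # X = x - mean of x
    dev_y = [y - mean_y for y in ys]   # Y = y - mean of y
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_x2 = sum(dx * dx for dx in dev_x)
    sum_y2 = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

heights = [57, 59, 62, 63, 64, 65, 55, 58, 57]
weights = [113, 117, 126, 126, 130, 129, 111, 116, 112]
print(round(pearson_r(heights, weights), 3))  # 0.984
```

The same function can be applied to the other Pearson problems in this section by swapping in their data.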
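2. Find Karl Pearson's coefficient of correlation from the following data:
Wages | 100 | 101 | 102 | 102 | 100 | 99 | 97 | 98 | 96 | 95
Cost of living | 98 | 99 | 99 | 97 | 95 | 92 | 95 | 94 | 90 | 91
Sol: $\bar{x} = \frac{990}{10} = 99$; $\bar{y} = \frac{950}{10} = 95$
Wages (x) | X = x - 99 | X² | Cost of living (y) | Y = y - 95 | Y² | XY
100 | 1 | 1 | 98 | 3 | 9 | 3
101 | 2 | 4 | 99 | 4 | 16 | 8
102 | 3 | 9 | 99 | 4 | 16 | 12
102 | 3 | 9 | 97 | 2 | 4 | 6
100 | 1 | 1 | 95 | 0 | 0 | 0
99 | 0 | 0 | 92 | -3 | 9 | 0
97 | -2 | 4 | 95 | 0 | 0 | 0
98 | -1 | 1 | 94 | -1 | 1 | 1
96 | -3 | 9 | 90 | -5 | 25 | 15
95 | -4 | 16 | 91 | -4 | 16 | 16
Total: 990 | 0 | 54 | 950 | 0 | 96 | 61
$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{61}{\sqrt{54 \times 96}} = \frac{61}{72} \approx 0.85$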
PRACTICE PROBLEM:
3. Calculate the coefficient of correlation between the age of cars and annual maintenance cost and comment.
Age of cars (years) | 2 | 4 | 6 | 7 | 8 | 10 | 12
Annual maintenance cost | 1600 | 1500 | 1800 | 1900 | 1700 | 2100 | 2000
Hint: $\bar{x} = \frac{49}{7} = 7$; $\bar{y} = \frac{12600}{7} = 1800$; $r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$
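4. With the following data in 6 cities, calculate the coefficient of correlation by Pearson's method between the density of population and death rate.
Cities | Area in sq. km | Population (000) | No. of deaths
A | 150 | 30 | 300
B | 180 | 90 | 1440
C | 100 | 40 | 560
D | 60 | 42 | 840
E | 120 | 72 | 1224
F | 80 | 24 | 312
Sol: Density of population = Population / Area; Death rate = No. of deaths / Population (000)
Density (x) | 200 | 500 | 400 | 700 | 600 | 300
Death rate (y) | 10 | 16 | 14 | 20 | 17 | 13
$\bar{x} = \frac{2700}{6} = 450$, $\bar{y} = \frac{90}{6} = 15$
x | X = x - 450 | X² | y | Y = y - 15 | Y² | XY
200 | -250 | 62500 | 10 | -5 | 25 | 1250
500 | 50 | 2500 | 16 | 1 | 1 | 50
400 | -50 | 2500 | 14 | -1 | 1 | 50
700 | 250 | 62500 | 20 | 5 | 25 | 1250
600 | 150 | 22500 | 17 | 2 | 4 | 300
300 | -150 | 22500 | 13 | -2 | 4 | 300
Total: 2700 | 0 | 175000 | 90 | 0 | 60 | 3200
$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{3200}{\sqrt{175000 \times 60}} = 0.988$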
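PRACTICE PROBLEM:
5. Calculate the coefficient of correlation for the following data.
x | 12 | 9 | 8 | 10 | 11 | 13 | 17
y | 14 | 8 | 6 | 9 | 11 | 12 | 3
1. Calculate Karl Pearson's correlation coefficient for the following data:
x | 28 | 41 | 40 | 38 | 35 | 33 | 40 | 32 | 36 | 33
y | 23 | 34 | 33 | 34 | 30 | 26 | 28 | 31 | 36 | 38
Sol: $\sum x = 356$ and $\sum y = 313$, so $\bar{x} = 35.6$ and $\bar{y} = 31.3$. Deviations are taken from the rounded values 35 and 31.
x | X = x - 35 | X² | y | Y = y - 31 | Y² | XY
28 | -7 | 49 | 23 | -8 | 64 | 56
41 | 6 | 36 | 34 | 3 | 9 | 18
40 | 5 | 25 | 33 | 2 | 4 | 10
38 | 3 | 9 | 34 | 3 | 9 | 9
35 | 0 | 0 | 30 | -1 | 1 | 0
33 | -2 | 4 | 26 | -5 | 25 | 10
40 | 5 | 25 | 28 | -3 | 9 | -15
32 | -3 | 9 | 31 | 0 | 0 | 0
36 | 1 | 1 | 36 | 5 | 25 | 5
33 | -2 | 4 | 38 | 7 | 49 | -14
Total: 356 | 6 | 162 | 313 | 3 | 195 | 79
Because the deviations are not taken from the exact means, their sums are not zero and the corrected formula is used:
$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left(N\sum X^2 - (\sum X)^2\right)\left(N\sum Y^2 - (\sum Y)^2\right)}} = \frac{790 - 18}{\sqrt{1584 \times 1941}} \approx 0.44$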
PRACTICE PROBLEMS:
2. Find a suitable coefficient of correlation for the following data:
Fertilizers used | 15 | 18 | 20 | 24 | 30 | 35 | 40 | 50
Productivity | 85 | 93 | 95 | 105 | 130 | 130 | 150 | 160
$\bar{x} = \frac{232}{8} = 29$; $\bar{y} \approx 119$; r = 0.99
3. Find out the coefficient of correlation in the following case:
Height of father in inches | 65 | 66 | 67 | 67 | 68 | 69 | 71 | 73
Height of son in inches | 67 | 68 | 64 | 68 | 72 | 70 | 69 | 70
r = 0.472
4. Find the correlation coefficient for the following data:
x | 65 | 66 | 67 | [remaining values illegible in the source]
y | 67 | [remaining values illegible in the source]
r = 0.603
5.
x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
y | [values illegible in the source]
r = 0.95
RANK CORRELATION COEFFICIENT:
A British psychologist, Charles Edward Spearman, devised the method of finding the coefficient of correlation by ranks. The method is useful in dealing with qualitative characteristics such as morality, character, intelligence and beauty, which cannot be measured quantitatively as required for Pearson's coefficient of correlation. It is based on the ranks given to the observations. Rank correlation is applicable only to individual observations. The formula for Spearman's rank correlation is
$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
where $\rho$ = rank coefficient of correlation
$\sum D^2$ = sum of squares of differences of the two ranks
N = number of paired observations
PROPERTIES OF RANK CORRELATION COEFFICIENT:
1. The value of $\rho$ lies between -1 and 1, i.e., $-1 \le \rho \le 1$.
2. If $\rho = 1$, there is complete agreement in the order of the ranks and the ranks are in the same direction. If $\rho = -1$, there is complete disagreement in the order of the ranks and they are in opposite directions.
PROCEDURE TO SOLVE PROBLEMS:
1. When the ranks are given
Step 1: Compute the difference of the two ranks and denote it by D.
Step 2: Square D to get D².
Step 3: Obtain $\rho$ by substituting the figures in the formula.
2. When the ranks are not given but the actual data are given, we must first assign ranks. We can rank by taking the highest (or the lowest) value as 1, the next highest (lowest) as 2, and so on, following the same procedure for both variables.
PROBLEMS:
1. Following are the ranks obtained by 10 students in two subjects, Statistics and Mathematics. To what extent is the knowledge of the students in the two subjects related?
Statistics | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Mathematics | 2 | 4 | 1 | 5 | 3 | 9 | 7 | 10 | 6 | 8
Sol:
D | -1 | -2 | 2 | -1 | 2 | -3 | 0 | -2 | 3 | 2
D² | 1 | 4 | 4 | 1 | 4 | 9 | 0 | 4 | 9 | 4
$\sum D^2 = 40$, N = 10
$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 40}{10(10^2 - 1)} = 1 - \frac{240}{990} = 0.76$
2. A random sample of 5 college students is selected and their grades in mathematics and statistics are found to be
Mathematics | 85 | 60 | 73 | 40 | 90
Statistics | 93 | 75 | 65 | 50 | 80
Calculate Spearman's rank coefficient.
Sol:
Marks in Maths (X) | Rank (x) | Marks in Stats (Y) | Rank (y) | Rank difference D = x - y | D²
85 | 2 | 93 | 1 | 1 | 1
60 | 4 | 75 | 3 | 1 | 1
73 | 3 | 65 | 4 | -1 | 1
40 | 5 | 50 | 5 | 0 | 0
90 | 1 | 80 | 2 | -1 | 1
Total: $\sum D^2 = 4$
N = 5, $\sum D^2 = 4$
Spearman's rank correlation:
$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 4}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.8$
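The ranking procedure and the formula $\rho = 1 - 6\sum D^2 / (N(N^2-1))$ can be sketched in Python as follows. The helper names `ranks_descending` and `spearman_rho` are illustrative, and this version assumes no repeated values (ties need the adjustment described in the next section).

```python
def ranks_descending(values):
    """Rank 1 for the highest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(rank_x, rank_y):
    """Spearman's rho from two lists of ranks (no ties)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Problem 2: marks of 5 students, ranks assigned from the marks
maths = [85, 60, 73, 40, 90]
stats = [93, 75, 65, 50, 80]
rho_marks = spearman_rho(ranks_descending(maths), ranks_descending(stats))

# Problem 1: ranks are given directly
stat_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
math_ranks = [2, 4, 1, 5, 3, 9, 7, 10, 6, 8]
rho_ranks = spearman_rho(stat_ranks, math_ranks)

print(round(rho_marks, 2), round(rho_ranks, 2))  # 0.8 0.76
```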
3. Ten competitors in a musical test were ranked by three judges A, B and C in the following order:
[The ranking table and the rest of the question are not legible in the source; the final words read "... approach to common likings in music."]
EQUAL OR REPEATED RANKS:
If two or more persons are equal in any classification, or if more than one item in the series has the same value, Spearman's formula for the rank correlation coefficient breaks down. In this case common ranks are given to the repeated items. The common rank is the average of the ranks which these items would have assumed if they were different from each other, and the next item gets the rank next to the ranks already assumed.
For example: if two individuals are placed equal at the 7th place, each of them is given the rank (7 + 8)/2 = 7.5 and the next rank will be 9. Similarly, if 3 are ranked equal at the 7th place, they are given the rank (7 + 8 + 9)/3 = 8, which is the common rank assigned to each, and the next rank will be 10.
In this case, $\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{N(N^2 - 1)}$
where m = the number of items whose ranks are common; one term $\frac{1}{12}(m^3 - m)$ is added for each group of repeated values.
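1. From the following data calculate the rank correlation coefficient after making adjustment for tied ranks.
X | 48 | 33 | 40 | 9 | 16 | 16 | 65 | 24 | 16 | 57
Y | 13 | 13 | 24 | 6 | 15 | 4 | 20 | 9 | 6 | 19
Sol: First we have to assign ranks to the variables.
X | Rank (x) | Y | Rank (y) | Rank difference D = x - y | D²
48 | 3 | 13 | 5.5 | -2.5 | 6.25
33 | 5 | 13 | 5.5 | -0.5 | 0.25
40 | 4 | 24 | 1 | 3 | 9
9 | 10 | 6 | 8.5 | 1.5 | 2.25
16 | 8 | 15 | 4 | 4 | 16
16 | 8 | 4 | 10 | -2 | 4
65 | 1 | 20 | 2 | -1 | 1
24 | 6 | 9 | 7 | -1 | 1
16 | 8 | 6 | 8.5 | -0.5 | 0.25
57 | 2 | 19 | 3 | -1 | 1
Total: $\sum D^2 = 41$
16 is repeated 3 times in the X items, hence m = 3.
13 and 6 are each repeated twice in the Y items, hence m = 2 for each.
$\rho = 1 - \frac{6\left[41 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} = 1 - \frac{6 \times 44}{990} = 0.733$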
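The tied-rank adjustment can be sketched in Python as below. The helper names are illustrative, and the X and Y values are those of the problem above as read from the statement (the last X value is taken as 57).

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the highest value; tied values share the average
    of the ranks they would otherwise occupy."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1            # first position held by v
        count = order.count(v)                # number of items tied at v
        rank_of[v] = first + (count - 1) / 2  # mean of first..first+count-1
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over each group of m repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_tied(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

x = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]
print(round(spearman_rho_tied(x, y), 3))  # 0.733
```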
2. Obtain the rank correlation coefficient for the following data:
X | 64 | 75 | 50 | 64 | 80 | 75 | 40 | 55 | 64
Y | 58 | 68 | 45 | 81 | 60 | 68 | 48 | 50 | 70
COMPARISON BETWEEN CORRELATION AND REGRESSION:
The correlation coefficient is a measure of the degree of covariability between two variables, while regression establishes a functional relation between the dependent and independent variables. In correlation both x and y are random variables, whereas in regression x is a random variable and y is a fixed variable. The coefficient of correlation is a relative measure, whereas the regression coefficient is an absolute figure.
METHOD OF STUDYING REGRESSION:
There are two methods to study regression.
1. Graphic method 2. Algebraic method
1. GRAPHIC METHOD:
In this method, the points representing the pairs of values of the variables are plotted on a graph. These points form a scatter diagram. A regression line is drawn through these points by free hand.
FIT A REGRESSION LINE ON THE SCATTER DIAGRAM FOR THE FOLLOWING DATA:
[Data not legible in the source.]
2. ALGEBRAIC METHOD:
REGRESSION LINE:
A regression line is a straight line fitted to the data by the method of least squares. It indicates the best possible mean value of one variable corresponding to the mean value of the other. There are always two regression lines constructed for the relationship between two variables X and Y.
REGRESSION EQUATION:
A regression equation is an algebraic expression of the regression line.
The standard form of the regression equation is Y = a + bX, where a and b are constants. 'a' indicates the value of Y when X = 0 and is called the Y-intercept. 'b' indicates the slope of the regression line and is also called the regression coefficient of Y on X. If we know the values of a and b we can easily compute the value of Y for a given value of X. The values of a and b are found with the help of the normal equations.
For Y = a + bX (regression equation of Y on X), the normal equations are
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
For X = a + bY (regression equation of X on Y), the normal equations are
$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
Determine the equation of a straight line which best fits the data:
X | 10 | 12 | 13 | 16 | 17 | 20 | 25
Y | 10 | 22 | 24 | 27 | 29 | 33 | 37
a = 0.82, b = 1.56
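DEVIATIONS TAKEN FROM ARITHMETIC MEANS OF X AND Y:
This method is easier and simpler than the previous method for finding the values of a and b.
We take the deviations of the X and Y series from their respective means, $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
Regression equation of X on Y:
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
The regression coefficient of X on Y is $b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$
The regression coefficient of Y on X is $b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$, and $r^2 = b_{xy} \cdot b_{yx}$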
REGRESSION:
EXAMPLE PROBLEMS:
1. Determine the equation of a straight line which best fits the data, or find the regression line of y on x.
X | 10 | 12 | 13 | 16 | 17 | 20 | 25
Y | 10 | 22 | 24 | 27 | 29 | 33 | 37
Sol: Let the required straight line be Y = a + bX.
The two normal equations are
$\sum Y = b\sum X + Na$
$\sum XY = b\sum X^2 + a\sum X$
X | X² | Y | XY
10 | 100 | 10 | 100
12 | 144 | 22 | 264
13 | 169 | 24 | 312
16 | 256 | 27 | 432
17 | 289 | 29 | 493
20 | 400 | 33 | 660
25 | 625 | 37 | 925
Total: 113 | 1983 | 182 | 3186
Substituting $\sum Y = 182$, $\sum X = 113$, N = 7 in the first equation:
113b + 7a = 182 ....... (1)
Substituting $\sum XY = 3186$, $\sum X^2 = 1983$, $\sum X = 113$ in the second equation:
1983b + 113a = 3186 ....... (2)
Multiplying (1) by 113:
12769b + 791a = 20566 ....... (3)
Multiplying (2) by 7:
13881b + 791a = 22302 ....... (4)
Subtracting (3) from (4): 1112b = 1736, so b = 1.56, and from (1), a = 0.82.
The equation of the required straight line is Y = 0.82 + 1.56X.
This is called the regression equation of Y on X.
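The two normal equations can also be solved in closed form. The sketch below (function name `fit_line` is illustrative) eliminates a from the normal equations exactly as in the worked solution, giving $b = \frac{N\sum XY - \sum X\sum Y}{N\sum X^2 - (\sum X)^2}$ and $a = \frac{\sum Y - b\sum X}{N}$.

```python
def fit_line(xs, ys):
    """Least-squares line Y = a + bX from the two normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

xs = [10, 12, 13, 16, 17, 20, 25]
ys = [10, 22, 24, 27, 29, 33, 37]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 0.8 1.56
```

Carrying b at full precision gives a of about 0.80; the value 0.82 in the worked solution comes from substituting the rounded b = 1.56 back into equation (1).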
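2. Calculate the regression equation of Y on X from the data given below, taking deviations from the actual means of X and Y.
Price (Rs.) | 10 | 12 | 13 | 12 | 16 | 15
Amount demanded | 40 | 38 | 43 | 45 | 37 | 43
Estimate the likely demand when the price is Rs. 20.
Sol: Calculation of the regression equation, with $\bar{X} = \frac{78}{6} = 13$ and $\bar{Y} = \frac{246}{6} = 41$:
X | x = X - 13 | x² | Y | y = Y - 41 | y² | xy
10 | -3 | 9 | 40 | -1 | 1 | 3
12 | -1 | 1 | 38 | -3 | 9 | 3
13 | 0 | 0 | 43 | 2 | 4 | 0
12 | -1 | 1 | 45 | 4 | 16 | -4
16 | 3 | 9 | 37 | -4 | 16 | -12
15 | 2 | 4 | 43 | 2 | 4 | 4
Total: 78 | 0 | 24 | 246 | 0 | 50 | -6
Regression equation of Y on X is
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-6}{24} = -0.25$
$Y - 41 = -0.25(X - 13) \Rightarrow Y = -0.25X + 44.25$
When X is 20, Y = 39.25.
When the price is Rs. 20, the likely demand is 39.25.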
We have $r = \pm\sqrt{b_{yx} \times b_{xy}}$, where r takes the common sign of the two regression coefficients.
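3. For the following data, find the equations of the two regression lines.
X | 1 | 2 | 3 | 4 | 5
Y | 15 | 25 | 35 | 45 | 55
Sol: Calculation of regression coefficients
X | Y | x = X - 3 | x² | y = Y - 35 | y² | xy
1 | 15 | -2 | 4 | -20 | 400 | 40
2 | 25 | -1 | 1 | -10 | 100 | 10
3 | 35 | 0 | 0 | 0 | 0 | 0
4 | 45 | 1 | 1 | 10 | 100 | 10
5 | 55 | 2 | 4 | 20 | 400 | 40
Total: 15 | 175 | 0 | 10 | 0 | 1000 | 100
We have $\bar{X} = \frac{15}{5} = 3$ and $\bar{Y} = \frac{175}{5} = 35$.
Now $b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{100}{10} = 10$
Therefore the regression equation of Y on X is given by
$Y - \bar{Y} = b_{yx}(X - \bar{X}) \Rightarrow Y - 35 = 10(X - 3)$
Now $b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{100}{1000} = \frac{1}{10}$
Therefore the regression equation of X on Y is given by
$X - \bar{X} = b_{xy}(Y - \bar{Y}) \Rightarrow X - 3 = \frac{1}{10}(Y - 35)$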
4. In the following table S is the weight of potassium bromide which will dissolve in 100 gms of water at T°C. Fit an equation of the form S = mT + b by the method of least squares. Use this relation to estimate S when T = 50°.
T | 0 | 20 | 40 | 60 | 80
S | 54 | 65 | 75 | 85 | 96
Sol: Take $d_T = (T - 40)/10$ and $d_S = (S - 75)/10$.
T | $d_T^2$ | S | $d_S^2$ | $d_T d_S$
0 | 16 | 54 | 4.41 | 8.4
20 | 4 | 65 | 1.00 | 2.0
40 | 0 | 75 | 0.00 | 0.0
60 | 4 | 85 | 1.00 | 2.0
80 | 16 | 96 | 4.41 | 8.4
Total: 200 | 40 | 375 | 10.82 | 20.8
Now, $m = \frac{\sum d_T d_S}{\sum d_T^2} = \frac{20.8}{40} = 0.52$
b is given by the equation $\sum S = m\sum T + Nb$:
375 = (0.52)(200) + 5b, so b = 54.2
At T = 50°C, S = 0.52 × 50 + 54.2 = 80.2
5. A panel of two judges P and Q graded seven dramatic performances by independently awarding marks as follows:
Performance | 1 | 2 | 3 | 4 | 5 | 6 | 7
Marks by P | 46 | 42 | 44 | 40 | 43 | 41 | 45
Marks by Q | 40 | 38 | 36 | 35 | 39 | 37 | 41
The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks would he be expected to have awarded to the eighth performance?
Hence, if judge Q had been present, he would have awarded 33.5 marks to the eighth performance.
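The intermediate working for this problem is not shown. One way to arrive at the stated answer, sketched here under the assumption that the regression line of Q's marks (Y) on P's marks (X) is intended, is to take deviations from the actual means and use $b_{yx} = \sum xy / \sum x^2$. The function name is illustrative.

```python
def regression_y_on_x(xs, ys):
    """Return (mean_x, mean_y, b_yx) using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    return mean_x, mean_y, sum_xy / sum_x2

marks_p = [46, 42, 44, 40, 43, 41, 45]
marks_q = [40, 38, 36, 35, 39, 37, 41]
mean_p, mean_q, b_qp = regression_y_on_x(marks_p, marks_q)

# Y - mean_y = b_yx * (X - mean_x), evaluated at X = 37
expected_q = mean_q + b_qp * (37 - mean_p)
print(mean_p, mean_q, b_qp, expected_q)  # 43.0 38.0 0.75 33.5
```

With means 43 and 38 and $b_{yx} = 21/28 = 0.75$, the line is Y - 38 = 0.75(X - 43), which gives 33.5 at X = 37.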
6. Find the most likely production corresponding to a rainfall of 40 from the following data.
 | Rainfall (X) | Production (Y)
Average | 30 | 500 kg
Standard deviation | 5 | 100 kg
Coefficient of correlation | 0.8
Sol: We have to calculate the value of Y when X = 40, so we have to find the regression equation of Y on X.
Mean of X series, $\bar{X} = 30$; mean of Y series, $\bar{Y} = 500$
S.D. of X series, $\sigma_x = 5$; S.D. of Y series, $\sigma_y = 100$
Regression of Y on X is
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) \Rightarrow Y - 500 = (0.8)\frac{100}{5}(X - 30)$
When X = 40, Y - 500 = 16(40 - 30) = 160, i.e., Y = 500 + 160 = 660
Hence the expected value of Y is 660 kg.
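When only the means, standard deviations and r are known, the regression of Y on X can be evaluated directly from $Y = \bar{Y} + r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$. A minimal sketch follows (the function name is illustrative); note that the slope uses $\sigma_y/\sigma_x$, not $\sigma_x/\sigma_y$, so for the rainfall data the slope is 0.8 × 100/5 = 16.

```python
def predict_y_from_summary(x, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate Y at a given X from summary statistics,
    using the regression line of Y on X."""
    b_yx = r * sd_y / sd_x            # regression coefficient of Y on X
    return mean_y + b_yx * (x - mean_x)

production = predict_y_from_summary(40, mean_x=30, mean_y=500,
                                    sd_x=5, sd_y=100, r=0.8)
print(round(production, 2))  # 660.0
```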
7. From a sample of 200 pairs of observations the following quantities were calculated.
$\sum X = 11.34$, $\sum Y = 20.78$, $\sum X^2 = 12.16$, $\sum Y^2 = 84.96$, $\sum XY = 22.13$
From the above data show how to compute the coefficients of the equation Y = a + bX.
Sol: We can compute the coefficients of the equation Y = a + bX by solving the normal equations:
$\sum Y = Na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
Substituting the values:
20.78 = 200a + 11.34b and 22.13 = 11.34a + 12.16b
From the first equation, a = (20.78 - 11.34b)/200 = 0.1039 - 0.0567b ... (1)
Using (1) in the second equation: 22.13 = 11.34(0.1039 - 0.0567b) + 12.16b = 1.178 - 0.643b + 12.16b
so 20.952 = 11.517b and b = 20.952/11.517 = 1.82
a = 0.1039 - 0.0567(1.82) = 0.0007
8. Analyse the following: the regression coefficient of Y on X is 0.7 and that of X on Y is 3.2.
Sol: $r = \sqrt{b_{xy} \cdot b_{yx}} = \sqrt{3.2 \times 0.7} = \sqrt{2.24} = 1.50$ (approximately)
But the correlation coefficient cannot exceed 1.
Hence there is some inconsistency in the information given.
9. Write the relation between correlation and regression coefficients. Is it possible to have two variables x and y with regression coefficients 2.8 and -0.5? Explain.
Sol: We have the correlation coefficient $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{2.8 \times (-0.5)} = \sqrt{-1.4}$, which is not a real number.
Thus it is not possible.
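The two checks used in these problems (the regression coefficients must have the same sign, and their product cannot exceed 1) can be collected in a small helper. This is an illustrative sketch; the function name and the choice of raising `ValueError` are assumptions, not part of the notes.

```python
import math

def correlation_from_regression(b_yx, b_xy):
    """Return r = +/- sqrt(b_yx * b_xy), with the common sign of the two
    coefficients. Raises ValueError when the pair is inconsistent."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("coefficients have opposite signs: r would not be real")
    if product > 1:
        raise ValueError("product exceeds 1: r would exceed 1 in magnitude")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r

print(round(correlation_from_regression(0.89, 0.85), 2))  # 0.87

for pair in [(0.7, 3.2), (2.8, -0.5)]:
    try:
        correlation_from_regression(*pair)
    except ValueError as err:
        print(pair, "is inconsistent:", err)
```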
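DEVIATIONS TAKEN FROM THE ASSUMED MEAN:
This method is used when the actual mean is a fraction.
In this method we take deviations from assumed means instead of the arithmetic means: dx = X - A and dy = Y - B, where A and B are the assumed means of X and Y.
The regression equation of X on Y is
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
We can find the value of $r\frac{\sigma_x}{\sigma_y}$ by applying the following formula:
$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sum dy^2 - \frac{(\sum dy)^2}{N}}$
The regression equation of Y on X is
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
We can find the value of $r\frac{\sigma_y}{\sigma_x}$ by applying the following formula:
$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sum dx^2 - \frac{(\sum dx)^2}{N}}$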
SOLVED PROBLEMS:
1. The following data, based on 450 students, are given for marks in Statistics and Economics at a certain examination.
Mean marks in Statistics = 40
Mean marks in Economics = 48
S.D. of marks in Statistics = 12
Variance of marks in Economics = 256
Sum of the products of deviations of marks from their respective means = 42075. Give the equations of the two lines of regression and estimate the average marks in Economics of candidates who obtained 50 marks in Statistics.
Sol: Given $\bar{X}$ = mean of marks in Statistics = 40
$\bar{Y}$ = mean of marks in Economics = 48
$\sigma_x$ = S.D. of marks in Statistics = 12
$\sigma_y$ = S.D. of marks in Economics = $\sqrt{256}$ = 16
Coefficient of correlation, $r = \frac{\sum xy}{N\sigma_x\sigma_y} = \frac{42075}{450 \times 12 \times 16} = 0.49$
Regression equation of X on Y is $X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) \Rightarrow X = 0.37Y + 22.24$
Regression equation of Y on X is $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) \Rightarrow Y = 0.65X + 22$
When X = 50, Y = 54.5
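2. Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.
Price index of cotton (X) | 78 | 77 | 85 | 88 | 87 | 82 | 81 | 77 | 76 | 83 | 97 | 93
Price index of wool (Y) | 84 | 82 | 82 | 85 | 89 | 90 | 88 | 92 | 83 | 89 | 98 | 99
Sol: Calculation of the regression equation.
Price index of cotton (X) | dx = X - 84 | dx² | Price index of wool (Y) | dy = Y - 88 | dy² | dx dy
78 | -6 | 36 | 84 | -4 | 16 | 24
77 | -7 | 49 | 82 | -6 | 36 | 42
85 | 1 | 1 | 82 | -6 | 36 | -6
88 | 4 | 16 | 85 | -3 | 9 | -12
87 | 3 | 9 | 89 | 1 | 1 | 3
82 | -2 | 4 | 90 | 2 | 4 | -4
81 | -3 | 9 | 88 | 0 | 0 | 0
77 | -7 | 49 | 92 | 4 | 16 | -28
76 | -8 | 64 | 83 | -5 | 25 | 40
83 | -1 | 1 | 89 | 1 | 1 | -1
97 | 13 | 169 | 98 | 10 | 100 | 130
93 | 9 | 81 | 99 | 11 | 121 | 99
Total: 1004 | -4 | 488 | 1061 | 5 | 365 | 287
$\bar{X} = \frac{1004}{12} = 83.67$; $\bar{Y} = \frac{1061}{12} = 88.42$
Now $b_{yx} = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sum dx^2 - \frac{(\sum dx)^2}{N}} = \frac{287 - \frac{(-4)(5)}{12}}{488 - \frac{(-4)^2}{12}} = \frac{288.67}{486.67} = 0.59$
Now the regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$
$\Rightarrow Y - 88.42 = 0.59(X - 83.67) \Rightarrow Y = 0.59X + 39.05$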
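The assumed-mean formula for $b_{yx}$ can be sketched in Python as follows. The function name is illustrative. The index values below are an assumption: they are the values as read from the partly garbled table, chosen because they reproduce the means 83.67 and 88.42 used in the solution.

```python
def b_yx_assumed_mean(xs, ys, a, b):
    """Regression coefficient of Y on X from deviations dx = X - a, dy = Y - b
    taken about assumed means a and b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    num = sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = sum(p * p for p in dx) - sum(dx) ** 2 / n
    return num / den

cotton = [78, 77, 85, 88, 87, 82, 81, 77, 76, 83, 97, 93]  # assumed reading
wool = [84, 82, 82, 85, 89, 90, 88, 92, 83, 89, 98, 99]    # assumed reading

slope = b_yx_assumed_mean(cotton, wool, a=84, b=88)
print(round(slope, 2))  # 0.59
```

The result does not depend on which assumed means are chosen; any convenient values of a and b give the same coefficient, which is why round numbers near the actual means are used.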
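3. From the following data, calculate (i) the correlation coefficient and (ii) the standard deviation of Y ($\sigma_y$).
$b_{xy} = 0.85$; $b_{yx} = 0.89$; $\sigma_x = 3$
Sol: (i) Coefficient of correlation:
$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.85 \times 0.89} = 0.87$
(ii) Standard deviation of Y:
$b_{xy} = r\frac{\sigma_x}{\sigma_y} \Rightarrow 0.85 = 0.87 \times \frac{3}{\sigma_y} \Rightarrow \sigma_y = \frac{0.87 \times 3}{0.85} = 3.07$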
4. Given the following data, calculate the expected value of Y when X = 12.
 | X | Y
Average | 7.6 | 14.8
Standard deviation | 3.6 | 2.5
r = 0.99
Sol: We have to calculate the expected value of Y when X = 12, so we have to find the regression equation of Y on X.
Mean of X series, $\bar{X} = 7.6$; mean of Y series, $\bar{Y} = 14.8$
S.D. of X series, $\sigma_x = 3.6$; S.D. of Y series, $\sigma_y = 2.5$
Coefficient of correlation (r) = 0.99
Regression of Y on X is
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) \Rightarrow Y - 14.8 = 0.99 \times \frac{2.5}{3.6}(X - 7.6) \Rightarrow Y = 0.688X + 9.57$
When X = 12, Y = 0.688(12) + 9.57 = 17.826
Hence the expected value of Y is 17.83.
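5. The heights of mothers and daughters are given in the following table. From the line of regression, estimate the expected average height of the daughter when the height of the mother is 64.5 inches.
Height of mother (inches) | 62 | 63 | 64 | 64 | 65 | 66 | 68 | 70
Height of daughter (inches) | 64 | 65 | 61 | 69 | 67 | 68 | 71 | 65
Sol: Let X = height of the mother
and Y = height of the daughter.
Let dx = X - 65 and dy = Y - 67. Then
$\sum X = 522$, $\sum dx = 2$, $\sum dx^2 = 50$, $\sum Y = 530$, $\sum dy = -6$, $\sum dy^2 = 74$, $\sum dx\,dy = 20$
$\bar{X} = \frac{522}{8} = 65.25$, $\bar{Y} = \frac{530}{8} = 66.25$
$b_{yx} = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sum dx^2 - \frac{(\sum dx)^2}{N}} = \frac{20 - \frac{(2)(-6)}{8}}{50 - \frac{2^2}{8}} = \frac{21.5}{49.5} = 0.4343$
Hence the regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$
$\Rightarrow Y = 37.91 + 0.4343X$. When X = 64.5, Y = 65.92.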
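6. The following calculations have been made for prices of 12 stocks (X) on a stock exchange on a certain day, along with the volume of sales in thousands of shares (Y). From these calculations find the regression equation of prices of stocks on the volume of sales of shares.
$\sum X = 580$, $\sum Y = 370$, $\sum XY = 11499$, $\sum X^2 = 41658$, $\sum Y^2 = 17206$
Sol: We have mean $\bar{X} = \frac{\sum X}{N} = \frac{580}{12} = 48.33$ and $\bar{Y} = \frac{\sum Y}{N} = \frac{370}{12} = 30.83$
$b_{xy} = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sum Y^2 - \frac{(\sum Y)^2}{N}} = \frac{11499 - 17883.33}{17206 - 11408.33} = \frac{-6384.33}{5797.67} = -1.101$
Regression equation of X on Y is
$X - 48.33 = (-1.101)(Y - 30.83)$
$\Rightarrow X = -1.101Y + 82.27$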
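7. Given the following information regarding a distribution: N = 5, $\bar{X} = 10$, $\bar{Y} = 20$, $\sum (X - 4)^2 = 100$, $\sum (Y - 10)^2 = 160$, $\sum (X - 4)(Y - 10) = 80$. Find the regression coefficients and hence the coefficient of correlation.
Sol: Here dx = X - 4, dy = Y - 10.
$\bar{X} = A + \frac{\sum dx}{N} \Rightarrow 10 = 4 + \frac{\sum dx}{5} \Rightarrow \sum dx = 30$ (here A = 4)
Also $\bar{Y} = B + \frac{\sum dy}{N} \Rightarrow 20 = 10 + \frac{\sum dy}{5} \Rightarrow \sum dy = 50$ (B = 10)
We know that
$b_{yx} = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sum dx^2 - \frac{(\sum dx)^2}{N}} = \frac{80 - 300}{100 - 180} = \frac{-220}{-80} = 2.75$
$b_{xy} = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sum dy^2 - \frac{(\sum dy)^2}{N}} = \frac{80 - 300}{160 - 500} = \frac{-220}{-340} = 0.65$
Coefficient of correlation $r = \pm\sqrt{b_{xy} \cdot b_{yx}} = \sqrt{0.65 \times 2.75} = \sqrt{1.7875} = 1.337$
Since $b_{xy}$ and $b_{yx}$ are both positive, r is also positive.
Here we get a coefficient of correlation greater than 1, which is not possible, so the given data are inconsistent.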
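ANGLE BETWEEN TWO REGRESSION LINES:
Let the lines of regression of X on Y and Y on X be given respectively by
$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$ ... (1) and $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$ ... (2)
Slope of line (1) = $m_1 = \frac{\sigma_y}{r\sigma_x}$; slope of line (2) = $m_2 = \frac{r\sigma_y}{\sigma_x}$
Let $\theta$ be the angle between the two regression lines X on Y and Y on X. Then
$\tan\theta = \frac{m_1 - m_2}{1 + m_1 m_2} = \frac{\frac{\sigma_y}{r\sigma_x} - \frac{r\sigma_y}{\sigma_x}}{1 + \frac{\sigma_y^2}{\sigma_x^2}} = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
NOTE:
1. If $\theta$ is acute, $\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
2. If $\theta$ is obtuse, $\tan\theta = \left(\frac{r^2 - 1}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
3. If r = 0 then $\tan\theta = \infty \Rightarrow \theta = \pi/2$
Thus if there is no relationship between the two variables (i.e., they are independent), the two regression lines are perpendicular: $\theta = \pi/2$.
4. If $r = \pm 1$ then $\tan\theta = 0 \Rightarrow \theta = 0$ or $\pi$
Hence the two regression lines are parallel or coincident. The correlation between the two variables is perfect.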
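SOLVED PROBLEMS:
1. If $\theta$ is the angle between the two regression lines, the S.D. of Y is twice the S.D. of X and r = 0.25, find $\tan\theta$.
Sol: Given $\sigma_y = 2\sigma_x$ and r = 0.25.
If $\theta$ is the angle between the two regression lines, then
$\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} = \left(\frac{1 - (0.25)^2}{0.25}\right)\frac{2\sigma_x^2}{\sigma_x^2 + 4\sigma_x^2} = \frac{0.9375}{0.25} \times \frac{2}{5} = 1.5$
2. If $\sigma_x = \sigma_y = \sigma$ and the angle between the regression lines is $\tan^{-1}\left(\frac{4}{3}\right)$, find r.
Sol: $\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
Here $\sigma_x = \sigma_y$, so
$\theta = \tan^{-1}\left(\frac{1 - r^2}{2r}\right)$ ... (1)
By data, $\theta = \tan^{-1}\left(\frac{4}{3}\right)$ ... (2)
From (1) and (2), we have
$\frac{1 - r^2}{2r} = \frac{4}{3} \Rightarrow 3 - 3r^2 - 8r = 0 \Rightarrow 3r^2 + 8r - 3 = 0$
$\Rightarrow (3r - 1)(r + 3) = 0 \Rightarrow r = \frac{1}{3}$ or $r = -3$
Since $-1 \le r \le 1$, $r = \frac{1}{3}$ (r = -3 is not possible).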
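Both solved problems can be checked with a direct evaluation of $\tan\theta = \left(\frac{1-r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}$. The sketch below uses an illustrative function name; only the ratio of the standard deviations matters, so $\sigma_x = 1$ is used as a convenient unit.

```python
import math

def tan_angle_between_regression_lines(r, sd_x, sd_y):
    """tan(theta) between the two regression lines; r must be non-zero."""
    return (1 - r ** 2) / r * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)

# Problem 1: sd_y = 2 * sd_x, r = 0.25
t1 = tan_angle_between_regression_lines(0.25, 1.0, 2.0)

# Problem 2: equal standard deviations, r = 1/3 should give tan(theta) = 4/3
t2 = tan_angle_between_regression_lines(1 / 3, 1.0, 1.0)

print(round(t1, 4), round(t2, 4))  # 1.5 1.3333
print(round(math.degrees(math.atan(t1)), 1))  # 56.3
```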
PRACTICE PROBLEMS:
MEASURE OF CENTRAL TENDENCY:
1. Calculate the A.M. of the following data.
Roll No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks (x) | 40 | 50 | 55 | 78 | 58 | 60 | 73 | 35 | 43 | 48
[Ans : 54]
2. From the following data find the mean profits.
Profits per shop | 100-200 | 200-300 | 300-400 | 400-500 | 500-600 | 600-700 | 700-800
Number of shops | 10 | 18 | 20 | 26 | 30 | 28 | 18
[Ans : 486]
3. Calculate the median from the following data.
Marks | 10-25 | 25-40 | 40-55 | 55-70 | 70-85 | 85-100
Frequency | 6 | 20 | 44 | 26 | 3 | 1
[Ans : 48.18]
4. Find the mode of the following distribution.
Class interval | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80
Frequency | 5 | [illegible] | [illegible] | [illegible] | 23 | 20 | 10 | 10
[Ans : 46.666]
5. Find the geometric mean of the following data.
Yield of wheat (kg) | 7.5-10.5 | 10.5-13.5 | 13.5-16.5 | 16.5-19.5 | 19.5-22.5 | 22.5-25.5 | 25.5-28.5
Frequency | [illegible] | [illegible] | [illegible] | [illegible] | 7 | 4 | [illegible]
[Ans : 16.02 kg]
6. Calculate the H.M. of the following data.
Size of items | 6 | 7 | 8 | 9 | 10 | 11
Frequency | 4 | 6 | 9 | 5 | 2 | 8
[Ans : 8.23]
MEASURE OF DISPERSION:
1. Calculate the mean deviation from the median.
Class | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
Frequencies | 5 | 10 | 20 | 5 | 10
[Ans : 9]
2. Find the variance and standard deviation for the following frequency distribution.
x | 6 | 10 | 14 | 18 | 24 | 28 | 30
f | 2 | 4 | 7 | 12 | 8 | 4 | 3
[Ans : 43.4]
3. Find the mean and variance using the step deviation method for the following data.
Age in years | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90
No. of members | 3 | 61 | 132 | 153 | 140 | 51 | 2
[Ans : 140.89]
CORRELATION AND REGRESSION:
1. Find the coefficient of correlation between X and Y for the following data.
X | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Y | [values illegible in the source]
[Ans : 0.8833]
2. A sample of 12 fathers and their elder sons gave the following data about their heights. Calculate the coefficient of rank correlation.
Fathers | 65 | 63 | 67 | 64 | 68 | 62 | 70 | 66 | 68 | 67 | 69 | 71
Sons | 68 | 66 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 68 | 70
[Ans : 0.722]
3. Calculate the regression equation of Y on X from the data given below, taking deviations from the actual means of X and Y.
Price (Rs.) | 10 | 12 | 13 | 12 | 16 | 15
Amount demanded | 40 | 38 | 43 | 45 | 37 | 43
Estimate the likely demand when the price is Rs. 20. [Ans : 39.25]