
Bivariate Data

The document discusses Karl Pearson's coefficient of correlation, detailing its properties, assumptions, and formulas for calculation. It explains the relationship between two variables, emphasizing that the correlation coefficient ranges from -1 to +1, indicating perfect negative to perfect positive correlation, respectively. Additionally, it includes examples and regression analysis concepts, highlighting the importance of linear relationships and the independence of correlation coefficients from changes in scale.

Uploaded by

Aniket Das
Copyright: © All Rights Reserved
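A FIRST COURSE IN STATISTICS
CORRELATION AND REGRESSION

[...] variables, which is known as Karl Pearson's co-efficient of correlation (or simply the correlation co-efficient). It is denoted by the symbol r. The correlation co-efficient between two variables x and y is defined by

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

where $\operatorname{cov}(x, y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y})$ and $\sigma_x = \sqrt{\frac{1}{n}\sum (x-\bar{x})^2}$. Similarly $\sigma_y = \sqrt{\frac{1}{n}\sum (y-\bar{y})^2}$.

Thus $r_{xy}$ can also be written in the following forms:

$$r_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$

When the actual means are in decimals, the calculations become very tedious, and in such cases we may take the help of the following formula:

$$r_{xy} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}$$

where $u = x - A$, $v = y - B$, and A and B are assumed means.

7.6 PROPERTIES OF KARL PEARSON'S CO-EFFICIENT OF CORRELATION (r)

1. $-1 \le r \le 1$

Proof: We have

$$\sum \left[\frac{x-\bar{x}}{\sigma_x} \pm \frac{y-\bar{y}}{\sigma_y}\right]^2 \ge 0$$

$$\Rightarrow \frac{\sum (x-\bar{x})^2}{\sigma_x^2} + \frac{\sum (y-\bar{y})^2}{\sigma_y^2} \pm \frac{2\sum (x-\bar{x})(y-\bar{y})}{\sigma_x \sigma_y} \ge 0$$

Dividing both sides by n,

$$1 + 1 \pm 2r \ge 0 \Rightarrow 1 \pm r \ge 0 \Rightarrow -1 \le r \le 1$$

Interpretation: The least value of r is −1 and the greatest is +1. If r = +1, there is perfect positive correlation between the two variables. If r = −1, there is perfect negative correlation between the variables. However, if r = 0, we say that there is no linear relationship between the variables x and y, although there may be a non-linear relationship. If r is positive but close to zero, we have weak positive correlation, and if r is close to 1, we have strong positive correlation.

2. Correlation co-efficient r is independent of change of origin and scale.

Proof: Let X and Y be the original variables and, after changing origin and scale, let the new variables be

$$U = \frac{X-a}{h}, \quad V = \frac{Y-b}{k}$$

where a, b, h, k are all constants (h > 0, k > 0). Then

$$X - a = hU, \quad Y - b = kV$$
$$\Rightarrow X = a + hU, \quad Y = b + kV$$
$$\Rightarrow \bar{X} = a + h\bar{U}, \quad \bar{Y} = b + k\bar{V}$$
$$\Rightarrow X - \bar{X} = h(U - \bar{U}), \quad Y - \bar{Y} = k(V - \bar{V})$$

Now,

$$r_{xy} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{\sum h(U-\bar{U})\,k(V-\bar{V})}{\sqrt{\sum h^2(U-\bar{U})^2 \sum k^2(V-\bar{V})^2}} = \frac{hk\sum (U-\bar{U})(V-\bar{V})}{hk\sqrt{\sum (U-\bar{U})^2 \sum (V-\bar{V})^2}} = r_{uv}$$

Hence proved.

3. Two independent variables are uncorrelated, but the converse is not true.

Proof: If two variables are independent then their covariance is zero, i.e. cov(X, Y) = 0, so

$$r_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} = \frac{0}{\sigma_x \sigma_y} = 0$$

Thus, if two variables are independent, their co-efficient of correlation is zero, i.e. independent variables are uncorrelated. But the converse is not true.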
If $r_{xy} = 0$, we say that there does not exist any linear correlation between the variables, as Karl Pearson's co-efficient of correlation $r_{xy}$ is a measure of only linear relationship. However, there may be a strong non-linear or curvilinear relationship even though $r_{xy} = 0$. As an illustration, consider the bivariate distribution

x | −3 | −2 | −1 | 0 | 1 | 2 | 3
y | 9 | 4 | 1 | 0 | 1 | 4 | 9

Using the formula of $r_{xy}$, we get $r_{xy} = 0$. But X and Y are not independent: they are connected by the non-linear relation $y = x^2$. Hence proved.

4. Correlation co-efficient r is a pure number, independent of the unit of measurement.

5. Correlation co-efficient is symmetric, i.e. $r_{xy} = r_{yx}$.

7.7 ASSUMPTIONS OF KARL PEARSON'S CO-EFFICIENT OF CORRELATION

There are three assumptions:
(i) the variables x and y are linearly related,
(ii) there is a cause and effect relationship between the factors affecting the values of the variables x and y,
(iii) the random variables x and y are normally distributed.

EXAMPLE 1.
Calculate the co-efficient of correlation r from the following data:
ΣX = 71, ΣY = 70, ΣX² = 555, ΣY² = 526, ΣXY = 527, n = 10 (AHSEC 1997)

SOLUTION.
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$
$$= \frac{10 \times 527 - 71 \times 70}{\sqrt{10 \times 555 - (71)^2}\,\sqrt{10 \times 526 - (70)^2}} = \frac{5270 - 4970}{\sqrt{509}\,\sqrt{360}} = \frac{300}{428.07} = .70$$

EXAMPLE 2.
Find the correlation co-efficient between X and Y from the following data. (AHSEC 1995)
[The data table and the solution are illegible in the source.]

EXAMPLE 3.
Find the correlation co-efficient between X and Y from the following data and interpret the result.

X | 16 | 20 | 24 | 28 | 32
Y | 30 | 40 | 25 | 35 | 45

SOLUTION.
Since $\bar{X}$ and $\bar{Y}$ are whole numbers ($\bar{X} = 120/5 = 24$, $\bar{Y} = 175/5 = 35$), we can proceed as follows:

X | Y | $X-\bar{X}$ | $Y-\bar{Y}$ | $(X-\bar{X})^2$ | $(Y-\bar{Y})^2$ | $(X-\bar{X})(Y-\bar{Y})$
16 | 30 | −8 | −5 | 64 | 25 | 40
20 | 40 | −4 | 5 | 16 | 25 | −20
24 | 25 | 0 | −10 | 0 | 100 | 0
28 | 35 | 4 | 0 | 16 | 0 | 0
32 | 45 | 8 | 10 | 64 | 100 | 80
ΣX = 120 | ΣY = 175 | | | 160 | 250 | 100

$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{100}{\sqrt{160 \times 250}} = \frac{100}{200} = .5$$

Interpretation: Since r = .5, we find positive correlation between the variables X and Y.

EXAMPLE 4.
Calculate the correlation co-efficient from the following data:
n = 10, Σx = 140, Σy = 150, Σ(x − 10)² = 180, Σ(y − 15)² = 215, Σ(x − 10)(y − 15) = 60.

SOLUTION.
Let us take u = x − 10 and v = y − 15. Then
Σu = Σ(x − 10) = Σx − n × 10 = 140 − 100 = 40
Σv = Σ(y − 15) = Σy − n × 15 = 150 − 150 = 0
Σu² = Σ(x − 10)² = 180
Σv² = Σ(y − 15)² = 215
Σuv = Σ(x − 10)(y − 15) = 60

$$r = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}} = \frac{10 \times 60 - 40 \times 0}{\sqrt{10 \times 180 - 40^2}\,\sqrt{10 \times 215 - 0^2}} = \frac{600}{\sqrt{200 \times 2150}} = \frac{600}{655.7} = .91$$

EXAMPLE 5.
Show that the correlation co-efficient between x and a − x is −1. (AHSEC 1997, 2000)

SOLUTION.
We know that $r_{xy} = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$. Since the mean of a − x is $a - \bar{x}$,

$$r_{x,\,a-x} = \frac{\operatorname{cov}(x, a-x)}{\sigma_x \sigma_{a-x}} = \frac{\frac{1}{n}\sum (x-\bar{x})(a - x - a + \bar{x})}{\sqrt{\frac{1}{n}\sum (x-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum (a - x - a + \bar{x})^2}} = \frac{-\sum (x-\bar{x})^2}{\sum (x-\bar{x})^2} = -1$$

EXAMPLE 6.
If a, b, c, d are constants, then show that the co-efficient of correlation between ax + b and cy + d is numerically equal to that between x and y. (AHSEC 1997, 2000)

SOLUTION.
Let u = ax + b and v = cy + d. Then
$\bar{u} = a\bar{x} + b$ and $\bar{v} = c\bar{y} + d$, so that
$u - \bar{u} = a(x - \bar{x})$ and $v - \bar{v} = c(y - \bar{y})$.

Now,
$$r_{uv} = \frac{\sum (u-\bar{u})(v-\bar{v})}{\sqrt{\sum (u-\bar{u})^2 \sum (v-\bar{v})^2}} = \frac{ac\sum (x-\bar{x})(y-\bar{y})}{|ac|\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}} = \frac{ac}{|ac|}\, r_{xy}$$

Since $ac/|ac| = \pm 1$, we get $|r_{uv}| = |r_{xy}|$.

EXAMPLE 7.
Given that $r_{xy}$ = .6, cov(x, y) = 7.2, var(x) = 16, find the standard deviation of y.

SOLUTION.
We know that
$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \Rightarrow .6 = \frac{7.2}{4\,\sigma_y} \Rightarrow \sigma_y = \frac{7.2}{2.4} = 3$$

EXAMPLE 8.
A computer, while calculating the correlation co-efficient between two variables X and Y from 25 pairs of observations, obtained the following results:
n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508.
It was, however, discovered at the time of checking that two pairs of observations were not correctly copied. They were taken as (6, 14) and (8, 6) while the correct values were (8, 12) and (6, 8). What is the correct value of the correlation co-efficient?

SOLUTION.
Corrected ΣX = 125 − 6 − 8 + 8 + 6 = 125
Corrected ΣY = 100 − 14 − 6 + 12 + 8 = 100
Corrected ΣX² = 650 − 6² − 8² + 8² + 6² = 650
Corrected ΣY² = 460 − 196 − 36 + 144 + 64 = 436
Corrected ΣXY = 508 − 6 × 14 − 8 × 6 + 8 × 12 + 6 × 8 = 520

The corrected correlation co-efficient is
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - (125)^2}\,\sqrt{25 \times 436 - (100)^2}} = \frac{500}{25 \times 30} = \frac{500}{750} = .67$$

7.8 REGRESSION

The concept of regression was first used by Sir Francis Galton in his study of heredity. The general meaning of 'regression' is a return, or going back, to the average value. Regression indicates the average relationship between two variables, and from this average relationship the average value of one variable is estimated corresponding to a given value of the other variable. This process is known as simple regression.

In regression analysis there are two types of variables: the dependent variable and the independent variable. A dependent variable is one whose value is to be predicted. It is also known as the explained variable. An independent variable is the variable which influences the value of the other variable. It is also known as the explanatory variable. In the words of M.M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data."

Lines of Regression: A line of regression is the line which gives the best estimate of one variable for any specific value of the other variable. For a bivariate distribution we have two lines of regression:
(i) The regression line of Y on X: this gives the best estimated value of Y corresponding to a given value of X.
(ii) The regression line of X on Y: this gives the best estimated value of X corresponding to a given value of Y.

There are always two lines of regression, since each variable may be treated as the dependent as well as the independent variable. When we consider X as the independent variable and Y as the dependent variable, we get the regression line of Y on X. Again, when we consider Y as the independent variable and X as the dependent variable, we get the regression line of X on Y.

7.9 REGRESSION EQUATIONS

The regression equations express the regression lines.
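1. Regression equation of Y on X is

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

where $b_{yx}$ = regression co-efficient of Y on X

$$b_{yx} = \frac{\operatorname{cov}(X, Y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

2. Regression equation of X on Y is

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

where $b_{xy}$ = regression co-efficient of X on Y

$$b_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (y-\bar{y})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$

Note: When the actual means are in decimals, we may take the help of the formulas given below:

$$b_{yx} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2}, \quad b_{xy} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2}$$

where u = x − A, v = y − B, and A and B are assumed means.

7.10 PROPERTIES OF REGRESSION CO-EFFICIENTS

1. The geometric mean of the regression co-efficients is the correlation co-efficient.

Proof. We have
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
$$\Rightarrow b_{yx} \cdot b_{xy} = r^2 \Rightarrow r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$

Both the regression co-efficients have the same sign, and that is the sign of r: since $\sigma_x > 0$ and $\sigma_y > 0$, the sign of each of r, $b_{yx}$ and $b_{xy}$ is the sign of cov(x, y). If both $b_{yx}$ and $b_{xy}$ are positive then r is also positive; if both are negative then r is also negative.

2. If one of the regression co-efficients is greater than 1, then the other must be less than 1.

Proof. Let $b_{yx} > 1$, so that $\dfrac{1}{b_{yx}} < 1$.
Also, $b_{yx} \cdot b_{xy} = r^2 \le 1$ (since $-1 \le r \le 1$, we have $r^2 \le 1$), so
$$b_{xy} \le \frac{1}{b_{yx}} < 1$$
Hence $b_{xy} < 1$.

3. Regression co-efficients are independent of the change of origin but not of scale.

Proof. Let X and Y be the original variables and, after changing origin and scale, let the new variables be
$$U = \frac{X-a}{h}, \quad V = \frac{Y-b}{k}$$
where a, b, h, k are all constants, h > 0, k > 0.

Since the correlation co-efficient is independent of change of origin and scale, $r_{xy} = r_{uv}$. Again, the S.D. is independent of change of origin but not of scale: $\sigma_x = h\sigma_u$ and $\sigma_y = k\sigma_v$.

Now,
$$b_{yx} = r_{xy}\,\frac{\sigma_y}{\sigma_x} = r_{uv}\,\frac{k\sigma_v}{h\sigma_u} = \frac{k}{h}\, b_{vu}$$
Also,
$$b_{xy} = r_{xy}\,\frac{\sigma_x}{\sigma_y} = r_{uv}\,\frac{h\sigma_u}{k\sigma_v} = \frac{h}{k}\, b_{uv}$$
Hence proved.

4. The arithmetic mean of the regression co-efficients is greater than or equal to the correlation co-efficient, provided r > 0.

Proof. We want to show
$$\frac{b_{yx} + b_{xy}}{2} \ge r$$
or $r\,\dfrac{\sigma_y}{\sigma_x} + r\,\dfrac{\sigma_x}{\sigma_y} \ge 2r$
or $\dfrac{\sigma_y}{\sigma_x} + \dfrac{\sigma_x}{\sigma_y} \ge 2$ (dividing by r > 0)
or $\sigma_y^2 + \sigma_x^2 \ge 2\sigma_x\sigma_y$
or $(\sigma_x - \sigma_y)^2 \ge 0$
which is always true. Hence proved.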
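The formulas of 7.9 and the properties of 7.10 can be checked numerically. The following is a minimal Python sketch and not part of the original text: the function name `regression_summary`, the five data pairs and the constants a, h, b, k are illustrative assumptions.

```python
import math

def regression_summary(xs, ys):
    """Return r, b_yx, b_xy and the two means for paired data.

    Uses the raw-sum formulas: n*Sxy - Sx*Sy for the numerator and
    n*Sxx - Sx^2, n*Syy - Sy^2 for the two denominators.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    dx = n * sxx - sx ** 2
    dy = n * syy - sy ** 2
    return {
        "r": num / math.sqrt(dx * dy),
        "b_yx": num / dx,   # regression co-efficient of y on x
        "b_xy": num / dy,   # regression co-efficient of x on y
        "x_bar": sx / n,
        "y_bar": sy / n,
    }

# Illustrative data (an assumption, chosen so the means are whole numbers).
X = [16, 20, 24, 28, 32]
Y = [30, 40, 25, 35, 45]
s = regression_summary(X, Y)

# Property 1: the product of the regression co-efficients equals r squared.
print(math.isclose(s["b_yx"] * s["b_xy"], s["r"] ** 2))

# Property 4: for r > 0 the arithmetic mean is at least r.
print((s["b_yx"] + s["b_xy"]) / 2 >= s["r"])

# Property 3: change of origin and scale, U = (X - a)/h, V = (Y - b)/k.
a, h, b, k = 24, 4, 35, 5
U = [(x - a) / h for x in X]
V = [(y - b) / k for y in Y]
t = regression_summary(U, V)
print(math.isclose(s["b_yx"], (k / h) * t["b_yx"]))  # b_yx = (k/h) b_vu
print(math.isclose(s["r"], t["r"]))                  # r is unchanged
```

All four checks should print True. The raw-sum form is used because it matches the formulas quoted in the text; for large values a deviation-from-mean form would be numerically safer.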
Some Important Remarks

1. The regression co-efficient of Y on X, denoted by $b_{yx}$, gives the change in the value of Y for a unit change in the value of X. The regression co-efficient of X on Y, denoted by $b_{xy}$, gives the change in the value of X for a unit change in the value of Y.
2. Both the regression lines pass through the point $(\bar{x}, \bar{y})$.
3. When two variables are uncorrelated (r = 0), the lines of regression become perpendicular to each other. In case of perfect positive or perfect negative correlation (r = ±1), the two lines have the same slope and a common point $(\bar{x}, \bar{y})$, so they coincide; they cannot be parallel.

EXAMPLE 9.
x and y are two variables for which 10 pairs of values are available. Further,
Σx = 10, Σy = 0, Σx² = 148, Σy² = 164, Σxy = 123.
Find the regression co-efficient of y on x. (AHSEC 1992)

SOLUTION.
The regression co-efficient of y on x is
$$b_{yx} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10 \times 123 - 10 \times 0}{10 \times 148 - (10)^2} = \frac{1230}{1380} = .89$$

EXAMPLE 10.
Given that $b_{xy}$ = .25, var(x) = 4, var(y) = 36, find the correlation between x and y. (AHSEC 1993)

SOLUTION.
Given $b_{xy}$ = .25, $\sigma_x$ = 2, $\sigma_y$ = 6. We know that
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \Rightarrow .25 = r \times \frac{2}{6} \Rightarrow r = .75$$

EXAMPLE 11.
If $r_{xy}$ = .6 and $b_{yx}$ = .8, what is the value of $b_{xy}$?

SOLUTION.
We know that
$$r^2 = b_{yx} \cdot b_{xy} \Rightarrow (.6)^2 = .8 \times b_{xy} \Rightarrow b_{xy} = \frac{.36}{.8} = .45$$

EXAMPLE 12.
If the two regression co-efficients are .8 and 1.2, what would be the value of the co-efficient of correlation? (AHSEC 1998)

SOLUTION.
We know that
$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{.8 \times 1.2} = \sqrt{.96} = .98$$
The positive root is taken because both regression co-efficients are positive.

EXAMPLE 13.
Two regression lines have the equations x + 2y − 5 = 0 and 2x + 3y − 8 = 0. Find the means of x and y.

SOLUTION.
Given the regression lines
x + 2y − 5 = 0
2x + 3y − 8 = 0
Since both the regression lines pass through the point $(\bar{x}, \bar{y})$, we have
$\bar{x} + 2\bar{y} = 5$ (i)
$2\bar{x} + 3\bar{y} = 8$ (ii)
(i) × 2 ⇒ $2\bar{x} + 4\bar{y} = 10$ (iii)
(iii) − (ii) ⇒ $\bar{y} = 2$
From (i), $\bar{x} + 2 \times 2 = 5 \Rightarrow \bar{x} = 1$

EXAMPLE 14.
Given that the regression line of y on x is y = 10 − 6x, derive the condition under which the regression line of x on y can be written as $x = \frac{1}{6}(10 - y)$. (AHSEC 1998)

SOLUTION.
Given the regression line of y on x:
y = 10 − 6x, so $b_{yx} = -6$.
If the regression line of x on y is
$$x = \frac{1}{6}(10 - y) = \frac{10}{6} - \frac{1}{6}\,y$$
then $b_{xy} = -\frac{1}{6}$. Since both the regression co-efficients are negative,
$$r = -\sqrt{b_{yx} \cdot b_{xy}} = -\sqrt{(-6)\left(-\tfrac{1}{6}\right)} = -1$$
Hence the required condition is that there must be perfect negative correlation between x and y.

EXAMPLE 15.
Find the line of regression of y on x from the following data:

x | 5 | 10 | 15 | 25 | 30 | 35 | 40 | 45
y | 25 | 32 | 44 | 32 | 39 | 49 | 55 | 60

What will be the value of y for x = 48? (AHSEC 2003)

SOLUTION.

x | y | u = x − 30 | v = y − 39 | u² | v² | uv
5 | 25 | −25 | −14 | 625 | 196 | 350
10 | 32 | −20 | −7 | 400 | 49 | 140
15 | 44 | −15 | 5 | 225 | 25 | −75
25 | 32 | −5 | −7 | 25 | 49 | 35
30 | 39 | 0 | 0 | 0 | 0 | 0
35 | 49 | 5 | 10 | 25 | 100 | 50
40 | 55 | 10 | 16 | 100 | 256 | 160
45 | 60 | 15 | 21 | 225 | 441 | 315
 | | Σu = −35 | Σv = 24 | Σu² = 1625 | Σv² = 1116 | Σuv = 975

Now,
$\bar{x} = \bar{u} + 30 = -\frac{35}{8} + 30 = 25.63$
$\bar{y} = \bar{v} + 39 = \frac{24}{8} + 39 = 42$

$$b_{yx} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} = \frac{8 \times 975 - (-35)(24)}{8 \times 1625 - (-35)^2} = \frac{8640}{11775} = .734$$

The regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 42 = .734\,(x - 25.63) \Rightarrow y = .734x + 23.19$$
When x = 48, y = .734 × 48 + 23.19 = 35.23 + 23.19 = 58.42

EXAMPLE 16.
The regression equation of x on y is 3y − 5x + 180 = 0. Given that [values illegible in the source], find r and [illegible]. (AHSEC 2005)

SOLUTION.
Given the regression equation of x on y is 3y − 5x + 180 = 0, i.e.
$$x = \frac{3}{5}\,y + 36 \quad \text{(i)}$$
so that $b_{xy} = \frac{3}{5} = .6$.
[The rest of the solution is illegible in the source.]

EXAMPLE 17.
Suppose $b_{yx}$ is the regression co-efficient of y on x. What does it indicate? Interpret the meaning of the statement $b_{yx}$ = −.53. (AHSEC 2005)

SOLUTION.
$b_{yx}$ indicates the increment in the value of the dependent variable y for a unit change in the value of the independent variable x.
Interpretation: When $b_{yx}$ = −.53, we mean that the change in the value of y is −.53 for a unit increase in the value of x, i.e. y decreases by .53 on average when x increases by one unit.

EXAMPLE 18.
The equations of two lines of regression are given below:
8x − 10y + 66 = 0, 40x − 18y − 214 = 0
Find the co-efficient of correlation between x and y. (AHSEC; year illegible in the source)
SOLUTION.
Let the equation of regression of y on x be 8x − 10y + 66 = 0 and that of x on y be 40x − 18y − 214 = 0.

8x − 10y + 66 = 0 ⇒ 10y = 8x + 66 ⇒ $y = \frac{8}{10}x + \frac{66}{10}$, so $b_{yx} = \frac{8}{10} = \frac{4}{5}$
40x − 18y − 214 = 0 ⇒ 40x = 18y + 214 ⇒ $x = \frac{18}{40}y + \frac{214}{40}$, so $b_{xy} = \frac{18}{40} = \frac{9}{20}$

Co-efficient of correlation:
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}} = \pm\sqrt{\frac{4}{5} \times \frac{9}{20}} = \pm\sqrt{\frac{36}{100}} = \pm .6$$
Since both the regression co-efficients are positive, r = .6.

Note: Here −1 ≤ .6 ≤ 1, so our supposition is correct. But if r goes outside these limits, we have to interchange the lines.

EXAMPLE 19.
Is the following statement correct? Give reasons.
"The regression co-efficient of x on y is 3.2 and that of y on x is .8" (AHSEC 1996)

SOLUTION.
Given $b_{xy}$ = 3.2 and $b_{yx}$ = .8. Now,
$$r = \sqrt{b_{xy} \cdot b_{yx}} = \sqrt{3.2 \times .8} = \sqrt{2.56} = 1.6$$
Since r is outside the limits $-1 \le r \le 1$, the statement is incorrect.

EXAMPLE 20.
You are given the following data:

 | x | y
A.M. | 36 | 85
S.D. | 11 | 8

Correlation co-efficient between x and y = .66
(i) Find the two regression equations.
(ii) Estimate the value of x when y = 75.

SOLUTION.
Given $\bar{x} = 36$, $\bar{y} = 85$, $\sigma_x = 11$, $\sigma_y = 8$, r = .66.

(i) Regression equation of y on x:
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \Rightarrow y - 85 = .66 \times \frac{8}{11}(x - 36) \Rightarrow y - 85 = .48\,(x - 36) \Rightarrow y = .48x + 67.72$$

Regression equation of x on y:
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \Rightarrow x - 36 = .66 \times \frac{11}{8}(y - 85) \Rightarrow x - 36 = .908\,(y - 85) \Rightarrow x = .908y - 41.14$$

(ii) When y = 75, x = .908 × 75 − 41.14 = 68.1 − 41.14 = 26.96

EXAMPLE 21.
If the two lines of regression are
4x − 5y + 30 = 0
20x − 9y − 107 = 0
which of these is the line of regression of x on y? Find r, and find $\sigma_y$ when $\sigma_x = 3$.

SOLUTION.
Let the regression line of y on x be 20x − 9y − 107 = 0 and that of x on y be 4x − 5y + 30 = 0.

20x − 9y − 107 = 0 ⇒ 9y = 20x − 107 ⇒ $y = \frac{20}{9}x - \frac{107}{9}$, so $b_{yx} = \frac{20}{9}$
4x − 5y + 30 = 0 ⇒ 4x = 5y − 30 ⇒ $x = \frac{5}{4}y - \frac{30}{4}$, so $b_{xy} = \frac{5}{4}$

Now, since the regression co-efficients are positive,
$$r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\frac{20}{9} \times \frac{5}{4}} = \sqrt{\frac{25}{9}} = \frac{5}{3}$$
Since r goes outside $-1 \le r \le 1$, our supposition is wrong. So the required regression line of x on y is 20x − 9y − 107 = 0.

Calculation of r: Let us interchange the regression lines.
20x − 9y − 107 = 0 ⇒ 20x = 9y + 107 ⇒ $x = \frac{9}{20}y + \frac{107}{20}$, so $b_{xy} = \frac{9}{20}$
4x − 5y + 30 = 0 ⇒ 5y = 4x + 30 ⇒ $y = \frac{4}{5}x + 6$, so $b_{yx} = \frac{4}{5}$
$$r = \sqrt{\frac{4}{5} \times \frac{9}{20}} = \sqrt{\frac{36}{100}} = .6$$
Now,
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \Rightarrow \frac{4}{5} = .6 \times \frac{\sigma_y}{3} \Rightarrow \sigma_y = 4$$

REVIEW QUESTIONS

1. Write a short note on scatter diagram. (AHSEC 1998)
2. Explain what you mean by positive correlation and negative correlation. Give examples. (AHSEC 1992)
3. Can two uncorrelated variables be independent? (AHSEC 1996) [Ans. Need not]
4. If $r_{xy}$ = 0, how are x and y related? (AHSEC 1997) [Ans. uncorrelated, i.e. no linear relationship]
5. Define Karl Pearson's co-efficient of correlation. What does it measure? (AHSEC 1999)
6. If r = 0, what is the value of cov(x, y) and how are x and y related? (AHSEC 2000) [Ans. 0; uncorrelated]
7. Show that $-1 \le r \le 1$, where r is the correlation co-efficient. (AHSEC 2004, 2007, 2015)
8. Show that the correlation co-efficient is independent of change of origin and scale. (AHSEC 2003, 2008)
9. If the correlation co-efficient between two related variables x and y is 0.5, what will be the correlation co-efficient between y and x? (AHSEC 1992) [Ans. .5]
10. x and y are two independent variables; show that they are uncorrelated. (AHSEC 1992, 2006)
11. Let the correlation co-efficient between x and y be r. If a, b are constants, then show that the correlation co-efficient between ax and by is numerically equal to r. (AHSEC 1992)
12. Suppose the correlation co-efficient between two variables x and y is zero (r = 0). Does it mean that x and y are independent? Explain by means of an example. (AHSEC; year illegible in the source)
13. Define correlation. Discuss positive, negative and zero correlation with the help of scatter diagrams.
14. What do you mean by bi-variate data?
15. Write down the relationship between correlation and regression co-efficients.

[The intervening questions are missing from the source.]

27. To study the relationship between the expenditure on lodging (Rs X) and on fooding (Rs Y), an enquiry into 50 families yielded the following results:
ΣX = 8500, ΣY = 9600, $\sigma_x$ = [illegible], $\sigma_y$ = [illegible], r = .6
Estimate the expenditure on fooding when the expenditure on lodging is Rs. 200. [Ans. 198]
28. Find the regression equation of y on x and that of x on y. Also find the value of y when x = [illegible] and the value of x when y = 5.
x | 10 | 20 | 30 | 40 | 50 | 60
y | 4 | 12 | 20 | 24 | 32 | 38
[Ans. illegible in the source]
29. Find the co-efficient of correlation of the heights of mothers and daughters from the following:
Height of Mothers: 65 66 67 68 69 70 [...]
Height of Daughters: 67 68 66 69 72 [...] 69
[Ans. .67 (approx.)]
30. (a) Given $b_{yx}$ = .85, $b_{xy}$ = .89 and $\sigma_y$ = 6, find the value of r and $\sigma_x$. [Ans. r = .87, $\sigma_x$ = 6.14]
(b) You are given r = .6, $\bar{x}$ = 10, $\bar{y}$ = 20, $\sigma_x$ = 1.875, $\sigma_y$ = 2.5. Find the line of regression of x on y. [Ans. x = 1 + .45y]
(c) Two lines of regression are x + 2y − 5 = 0 and 2x + 3y − 8 = 0, and var(x) = 12. Calculate $\bar{x}$, $\bar{y}$, $\sigma_y^2$ and r. [Ans. 1, 2, 4 (approx.), −.87]
31. What is the relationship between the correlation co-efficient and the regression co-efficients? (AHSEC 2008) [Ans. $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$]
32. Define Karl Pearson's correlation co-efficient. State its properties. (AHSEC 2006)
33. What are correlation and regression? (AHSEC 2007)
34. What is the value of the geometric mean of the regression co-efficients? (AHSEC 2015) [Ans. Correlation co-efficient]
35. What is a regression co-efficient? Show that the regression co-efficient is independent of change of origin but not of scale. (AHSEC 2015)
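Answers to questions that give two regression lines, in the manner of Examples 18 and 21 and question 30(c), can be checked with a short script. This is a hedged Python sketch and not part of the original text: the function name `correlation_from_lines` and the (a, b, c) encoding of the line ax + by + c = 0 are assumptions made for illustration, and the sketch assumes the two slopes have the same sign, as they must for genuine regression lines.

```python
import math

def correlation_from_lines(line1, line2):
    """Each line is a tuple (a, b, c) standing for a*x + b*y + c = 0.

    First suppose line1 is y on x and line2 is x on y. If the product of
    the two regression co-efficients exceeds 1, the supposition is wrong
    and the roles are interchanged.
    Returns (r, b_yx, b_xy, (x_bar, y_bar), line_of_x_on_y).
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    b_yx, b_xy = -a1 / b1, -b2 / a2
    x_on_y = line2
    if b_yx * b_xy > 1:
        b_yx, b_xy = -a2 / b2, -b1 / a1
        x_on_y = line1
    # r takes the common sign of the two regression co-efficients.
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    # Both lines pass through (x_bar, y_bar): solve the pair by Cramer's rule.
    det = a1 * b2 - a2 * b1
    x_bar = (b1 * c2 - b2 * c1) / det
    y_bar = (a2 * c1 - a1 * c2) / det
    return r, b_yx, b_xy, (x_bar, y_bar), x_on_y

# Lines of Example 21, passed with the wrong supposition first.
r, b_yx, b_xy, means, x_on_y = correlation_from_lines((20, -9, -107), (4, -5, 30))
print(round(r, 2), x_on_y)

# Lines of Example 13 and question 30(c).
r, b_yx, b_xy, means, x_on_y = correlation_from_lines((1, 2, -5), (2, 3, -8))
print(round(r, 2), means)
```

The interchange test relies on the fact that the product of the two candidate co-efficients under one supposition is the reciprocal of the product under the other, so at most one assignment can satisfy $r^2 \le 1$ unless the product is exactly 1.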
