correlation & Regression counters!!

The document discusses correlation, a statistical analysis measuring the relationship between two or more variables, and outlines its types, including positive, negative, simple, multiple, and partial correlation. It details methods for studying correlation, including graphic methods like scatter diagrams and mathematical methods such as Pearson's and Spearman's coefficients. Additionally, it compares correlation with regression, emphasizing their distinct purposes in statistical analysis.

Uploaded by

screative845
Copyright
© © All Rights Reserved
CORRELATION:
Correlation refers to the relationship between two or more variables. It is a statistical analysis that measures how two variables fluctuate with reference to each other.

TYPES OF CORRELATION:
* If two variables tend to move in the same direction, the correlation is called positive or direct.
* If two variables tend to move in opposite directions, the correlation is called negative or inverse.
* The study of the relationship between only two variables is called simple correlation.
* The study of more than two variables simultaneously is called multiple correlation.
* The study of two variables while excluding the effect of some other variables is called partial correlation (for example price and demand, eliminating supply). The study of all the variables is called total correlation.
* If the ratio of change between two variables is uniform, the correlation between them is linear; otherwise it is non-linear.

METHOD OF STUDYING CORRELATION:
1. GRAPHIC METHOD
   a) Scatter diagram (or) Scattergram
   b) Simple graph
2. MATHEMATICAL METHOD
   a) Karl Pearson's Coefficient of Correlation
   b) Spearman's Rank Coefficient of Correlation
   c) Coefficient of Concurrent Deviation
   d) Method of Least Squares

1. GRAPHIC METHOD
A) SCATTER DIAGRAMS:
A scatter diagram is a chart obtained by plotting two variables to find out whether there is any relationship between them. The X variable is plotted on the horizontal axis and the Y variable on the vertical axis.

[Figure: scatter diagrams illustrating Perfect Positive Correlation, Perfect Negative Correlation, High Degree of Positive Correlation, High Degree of Negative Correlation, and No Correlation]

* The scatter diagram is a simple and attractive method of finding the nature of the correlation.
* It is easy to understand.
* A rough idea of whether the correlation is positive or negative is obtained at a glance.

B) SIMPLE GRAPH:
In this method the two variables are plotted on graph paper. We get two curves.
By comparing the curves we can decide the relation between the variables.

2. MATHEMATICAL METHOD:
In this method the relation between the variables is decided on the basis of the value of the correlation coefficient.

A) COEFFICIENT OF CORRELATION:
KARL PEARSON'S COEFFICIENT OF CORRELATION:
Karl Pearson, a British statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. This is known as the Pearsonian coefficient of correlation and is denoted by r. There are several formulas for r:

(1) $r = \dfrac{\text{covariance}(x, y)}{\sigma_x \sigma_y}$
(2) $r = \dfrac{\sum XY}{N \sigma_x \sigma_y}$
(3) $r = \dfrac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$

where $X = (x - \bar{x})$, $Y = (y - \bar{y})$, $\bar{x}$ and $\bar{y}$ are the means of series x and y, $\sigma_x$ = S.D. of series x and $\sigma_y$ = S.D. of series y.

PROPERTIES OF CORRELATION COEFFICIENT:
1. The limits of the correlation coefficient are $-1 \le r \le 1$.
2. If $r = 1$, the correlation is perfect and positive. If $r = -1$, the correlation is perfect and negative. If $r = 0$, there is no linear relationship between the variables.
3. Two independent variables are uncorrelated, i.e., if x and y are independent then $r(x, y) = 0$.

PROBLEMS:
1. Find if there is any significant correlation between the heights and weights given below:

Height in inches | 57 | 59 | 62 | 63 | 64 | 65 | 55 | 58 | 57
Weight | 113 | 117 | 126 | 126 | 130 | 129 | 111 | 116 | 112

Sol: $\bar{x} = \dfrac{540}{9} = 60$; $\bar{y} = \dfrac{1080}{9} = 120$

Height in inches (x) | Deviation from mean X = (x - x̄) | X² | Weight (y) | Deviation from mean Y = (y - ȳ) | Y² | XY
57 | -3 | 9 | 113 | -7 | 49 | 21
59 | -1 | 1 | 117 | -3 | 9 | 3
62 | 2 | 4 | 126 | 6 | 36 | 12
63 | 3 | 9 | 126 | 6 | 36 | 18
64 | 4 | 16 | 130 | 10 | 100 | 40
65 | 5 | 25 | 129 | 9 | 81 | 45
55 | -5 | 25 | 111 | -9 | 81 | 45
58 | -2 | 4 | 116 | -4 | 16 | 8
57 | -3 | 9 | 112 | -8 | 64 | 24
Total 540 | 0 | 102 | 1080 | 0 | 472 | 216

Coefficient of correlation $r = \dfrac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \dfrac{216}{\sqrt{102 \times 472}} \approx 0.98$

2. Find Karl Pearson's coefficient of correlation from the following data:

Wages | 100 | 101 | 102 | 102 | 100 | 99 | 97 | 98 | 96 | 95
Cost of living | 98 | 99 | 99 | 97 | 95 | 92 | 95 | 94 | 90 | 91

Sol: $\bar{x} = \dfrac{990}{10} = 99$; $\bar{y} = \dfrac{950}{10} = 95$

Wages (x) | Deviation from mean X = (x - x̄) | X² | Cost of living (y) | Deviation from mean Y = (y - ȳ) | Y² | XY
100 | 1 | 1 | 98 | 3 | 9 | 3
101 | 2 | 4 | 99 | 4 | 16 | 8
102 | 3 | 9 | 99 | 4 | 16 | 12
102 | 3 | 9 | 97 | 2 | 4 | 6
100 | 1 | 1 | 95 | 0 | 0 | 0
99 | 0 | 0 | 92 | -3 | 9 | 0
97 | -2 | 4 | 95 | 0 | 0 | 0
98 | -1 | 1 | 94 | -1 | 1 | 1
96 | -3 | 9 | 90 | -5 | 25 | 15
95 | -4 | 16 | 91 | -4 | 16 | 16
Total 990 | 0 | 54 | 950 | 0 | 96 | 61

$r = \dfrac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \dfrac{61}{\sqrt{54 \times 96}} = \dfrac{61}{72} \approx 0.85$

PRACTICE PROBLEM:
3. Calculate the coefficient of correlation between age of cars and annual maintenance cost and comment.

Age of cars (years) | 2 | 4 | 6 | 7 | 8 | 10 | 12
Annual maintenance cost | 1600 | 1500 | 1800 | 1900 | 1700 | 2100 | 2000

Hint: $\bar{x} = \dfrac{49}{7} = 7$; $\bar{y} = \dfrac{12600}{7} = 1800$

4. With the following data in 6 cities, calculate the coefficient of correlation by Pearson's method between the density of population and the death rate.

Cities | Area in sq. km | Population (000) | No. of deaths
A | 150 | 30 | 300
B | 180 | 90 | 1440
C | 100 | 40 | 560
D | 60 | 42 | 840
E | 120 | 72 | 1224
F | 80 | 24 | 312

Sol: Density of population = Population / Area; Death rate = No. of deaths / Population (in thousands).

Density (x) | 200 | 500 | 400 | 700 | 600 | 300
Death rate (y) | 10 | 16 | 14 | 20 | 17 | 13

$\bar{x} = \dfrac{2700}{6} = 450$; $\bar{y} = \dfrac{90}{6} = 15$

x | X = (x - x̄) | X² | y | Y = (y - ȳ) | Y² | XY
200 | -250 | 62500 | 10 | -5 | 25 | 1250
500 | 50 | 2500 | 16 | 1 | 1 | 50
400 | -50 | 2500 | 14 | -1 | 1 | 50
700 | 250 | 62500 | 20 | 5 | 25 | 1250
600 | 150 | 22500 | 17 | 2 | 4 | 300
300 | -150 | 22500 | 13 | -2 | 4 | 300
Total 2700 | 0 | 175000 | 90 | 0 | 60 | 3200

$r = \dfrac{3200}{\sqrt{175000 \times 60}} = 0.988$

PRACTICE PROBLEM:
5. Calculate the coefficient of correlation for the following data.

x | 12 | 9 | 8 | 10 | 11 | 13 | 7
y | 14 | 8 | 6 | 9 | 11 | 12 | 3

6. Calculate Karl Pearson's correlation coefficient for the following data:

x | 28 | 41 | 40 | 38 | 35 | 33 | 40 | 32 | 36 | 33
y | 23 | 34 | 33 | 34 | 30 | 26 | 28 | 31 | 36 | 38

Sol: $\bar{x} = \dfrac{355}{10} \approx 35$, $\bar{y} = \dfrac{313}{10} \approx 31$, so deviations are taken from 35 and 31.

x | X = (x - 35) | X² | y | Y = (y - 31) | Y² | XY
28 | -7 | 49 | 23 | -8 | 64 | 56
41 | 6 | 36 | 34 | 3 | 9 | 18
40 | 5 | 25 | 33 | 2 | 4 | 10
38 | 3 | 9 | 34 | 3 | 9 | 9
35 | 0 | 0 | 30 | -1 | 1 | 0
33 | -2 | 4 | 26 | -5 | 25 | 10
40 | 5 | 25 | 28 | -3 | 9 | -15
32 | -3 | 9 | 31 | 0 | 0 | 0
36 | 1 | 1 | 36 | 5 | 25 | 5
33 | -2 | 4 | 38 | 7 | 49 | -14
Total 355 | 6 | 162 | 313 | 3 | 195 | 79

[The final value of r is illegible in the scan.]

PRACTICE PROBLEMS:
2. Find a suitable coefficient of correlation for the following data:

Fertilizers used | 15 | 18 | 20 | 24 | 30 | 35 | 40 | 50
Productivity | 85 | 93 | 95 | 105 | 130 | 130 | 150 | 160

Ans: $\bar{x} = \dfrac{232}{8} = 29$; $\bar{y} \approx 119$; $r = 0.99$

3. Find the coefficient of correlation in the following case:

Height of father in inches | 65 | 66 | 67 | 67 | 68 | 69 | 71 | 73
Height of son in inches | 67 | 68 | 64 | 68 | 72 | 70 | 69 | 70

Ans: r = 0.472

4.
Find the correlation coefficient for the following data:

x | 65 | 66 | 67 | ? | ? | ?
y | 67 | 68 | ? | 72 | 72 | 69
(? marks a value that is illegible in the scan.)

Ans: r = 0.603

5. Find the correlation coefficient for the following data:

x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
y | (values illegible in the scan)

Ans: r = 0.95

RANK CORRELATION COEFFICIENT:
A British psychologist, Charles Edward Spearman, devised the method of finding the coefficient of correlation by ranks. This method is based on ranks and is useful in dealing with qualitative characteristics such as morality, character, intelligence and beauty, which cannot be measured quantitatively as Pearson's coefficient of correlation requires. It is based on the ranks given to the observations. Rank correlation is applicable only to individual observations. The formula for Spearman's rank correlation is

$\rho = 1 - \dfrac{6 \sum D^2}{N(N^2 - 1)}$

where
$\rho$ = rank coefficient of correlation
$\sum D^2$ = sum of squares of the differences of the two ranks
$N$ = number of paired observations

PROPERTIES OF RANK CORRELATION COEFFICIENT:
1. The value of $\rho$ lies between -1 and 1, i.e., $-1 \le \rho \le 1$.
2. If $\rho = 1$, there is complete agreement in the order of the ranks and the ranks run in the same direction.
3. If $\rho = -1$, there is complete disagreement in the order of the ranks and they run in opposite directions.

PROCEDURE TO SOLVE PROBLEMS:
1. When the ranks are given:
Step 1: Compute the difference of the two ranks and denote it by D.
Step 2: Square D to get $D^2$ and add up to get $\sum D^2$.
Step 3: Obtain $\rho$ by substituting the figures in the formula.
2. When the ranks are not given but the actual data are given, we must assign ranks. We can take either the highest value as rank 1 or the lowest value as rank 1, the next highest (lowest) as rank 2, and so on, following the same procedure for both variables.

PROBLEMS:
1. Following are the ranks obtained by 10 students in two subjects, Statistics and Mathematics. To what extent is the knowledge of the students in the two subjects related?

Statistics | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Mathematics | 2 | 4 | 1 | 5 | 3 | 9 | 7 | 10 | 6 | 8

Sol: The rank differences D (Statistics minus Mathematics) are -1, -2, 2, -1, 2, -3, 0, -2, 3, 2, so $\sum D^2 = 40$.
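$\rho = 1 - \dfrac{6 \sum D^2}{N(N^2 - 1)} = 1 - \dfrac{6 \times 40}{10(10^2 - 1)} = 1 - \dfrac{240}{990} \approx 0.76$

2. A random sample of 5 college students is selected and their grades in Mathematics and Statistics are found to be:

Mathematics | 85 | 60 | 73 | 40 | 90
Statistics | 93 | 75 | 65 | 50 | 80

Calculate Spearman's rank coefficient.

Sol:
Marks in Maths (X) | Rank (x) | Marks in Stats (Y) | Rank (y) | Rank difference D = x - y | D²
85 | 2 | 93 | 1 | 1 | 1
60 | 4 | 75 | 3 | 1 | 1
73 | 3 | 65 | 4 | -1 | 1
40 | 5 | 50 | 5 | 0 | 0
90 | 1 | 80 | 2 | -1 | 1
Total | | | | | 4

$N = 5$, $\sum D^2 = 4$

Spearman's rank correlation $\rho = 1 - \dfrac{6 \sum D^2}{N(N^2 - 1)} = 1 - \dfrac{6 \times 4}{5(5^2 - 1)} = 1 - \dfrac{24}{120} = 0.8$

3. Ten competitors in a musical test were ranked by three judges A, B and C. [The table of ranks and part of the question are illegible in the scan; the question concerns which judges have the nearest approach to common likings in music.]

EQUAL OR REPEATED RANKS:
If two or more persons are equal in any classification, or if there is more than one item with the same value in the series, then Spearman's formula for calculating the rank correlation coefficient breaks down. In this case common ranks are given to the repeated items. The common rank is the average of the ranks which these items would have assumed if they were different from each other, and the next item gets the rank next to the ranks already assumed.

For example: if two individuals are placed in 7th place, each of them is given the rank $\dfrac{7 + 8}{2} = 7.5$ and the next rank will be 9. Similarly, if 3 are ranked equal at the 7th place then they are given the rank $\dfrac{7 + 8 + 9}{3} = 8$, which is the common rank assigned to each, and the next rank will be 10.

In this case,

$\rho = 1 - \dfrac{6\left[\sum D^2 + \dfrac{1}{12}(m^3 - m) + \dfrac{1}{12}(m^3 - m) + \dots\right]}{N(N^2 - 1)}$

where m = the number of items whose ranks are common; one correction term is added for each group of tied items.

1. From the following data calculate the rank correlation coefficient after making adjustment for tied ranks.

X | 48 | 33 | 40 | 9 | 16 | 16 | 65 | 24 | 16 | 57
Y | 13 | 13 | 24 | 6 | 15 | 4 | 20 | 9 | 6 | 19

Sol: First we have to assign ranks to the variables (highest value = rank 1).

X | Rank (x) | Y | Rank (y) | Rank difference D = x - y | D²
48 | 3 | 13 | 5.5 | -2.5 | 6.25
33 | 5 | 13 | 5.5 | -0.5 | 0.25
40 | 4 | 24 | 1 | 3 | 9
9 | 10 | 6 | 8.5 | 1.5 | 2.25
16 | 8 | 15 | 4 | 4 | 16
16 | 8 | 4 | 10 | -2 | 4
65 | 1 | 20 | 2 | -1 | 1
24 | 6 | 9 | 7 | -1 | 1
16 | 8 | 6 | 8.5 | -0.5 | 0.25
57 | 2 | 19 | 3 | -1 | 1
Total | | | | | 41

16 is repeated 3 times in the X items, hence m = 3. Since 13 and 6 are each repeated twice in the Y items, m = 2 for each of them.

$\rho = 1 - \dfrac{6\left[41 + \dfrac{1}{12}(3^3 - 3) + \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} = 1 - \dfrac{6 \times 44}{990} \approx 0.733$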
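The tied-rank adjustment can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names `average_ranks` and `spearman_tied` are hypothetical, and the X and Y values are the ten pairs of the tied-ranks problem as read from the scan (an assumption, since the scan is partly garbled). Ranks are assigned with the highest value as rank 1, and tied values share the average of the ranks they would otherwise occupy.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with the highest as rank 1; ties share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_tied(x, y):
    """Spearman's rho with the (m^3 - m)/12 correction for each tied group."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum(
        (m ** 3 - m) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Data assumed from the tied-ranks problem (values read from the scan).
x = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]
rho = spearman_tied(x, y)
print(round(rho, 3))  # 0.733
```

With these values the sum of squared rank differences is 41 and the three tie corrections add 3, which reproduces the hand calculation.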
2. Obtain the rank correlation coefficient for the following data:

X | ? | 64 | 75 | 50 | 64 | 80 | 75 | 40 | 55 | 64
Y | ? | 58 | 68 | 45 | 81 | 60 | 68 | 48 | 50 | 70
(? marks a value that is illegible in the scan.)

COMPARISON BETWEEN CORRELATION AND REGRESSION:
The correlation coefficient is a measure of the co-variability between two variables, while regression establishes a functional relation between the dependent and independent variables. In correlation both x and y are treated as random variables, whereas in regression of y on x the independent variable x is treated as fixed and the dependent variable y as random. The coefficient of correlation is a relative measure, whereas the regression coefficient is an absolute figure.

METHOD OF STUDYING REGRESSION:
There are two methods to study regression.
1. Graphic method
2. Algebraic method

1. GRAPHIC METHOD:
In this method, the points representing the pairs of values of the variables are plotted on a graph. These points form a scatter diagram. A regression line is drawn through these points by free hand.

FIT A REGRESSION LINE ON THE SCATTER DIAGRAM FOR THE FOLLOWING DATA:
[The data table is illegible in the scan.]

2. ALGEBRAIC METHOD:
REGRESSION LINE:
A regression line is a straight line fitted to the data by the method of least squares. It gives the best estimate of the mean value of one variable for a given value of the other. There are always two regression lines for the relationship between two variables X and Y.

REGRESSION EQUATION:
A regression equation is an algebraic expression of the regression line. The standard form of the regression equation is $Y = a + bX$, where a and b are constants.
'a' indicates the value of Y when X = 0. It is called the Y-intercept.
'b' indicates the slope of the regression line. It is also called the regression coefficient of Y on X.
If we know the values of a and b we can easily compute the value of Y for a given value of X. The values of a and b are found with the help of the normal equations.

For $Y = a + bX$ (regression equation of Y on X), the normal equations are
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
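The two normal equations for $Y = a + bX$ form a 2×2 linear system in a and b. The following is a minimal Python sketch of solving it by Cramer's rule; the function name `fit_line_normal_equations` is hypothetical, and the seven (x, y) pairs are assumed to be those of the straight-line fitting problem in these notes.

```python
def fit_line_normal_equations(xs, ys):
    """Solve  sum(Y)  = N*a + b*sum(X)
              sum(XY) = a*sum(X) + b*sum(X^2)   for (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of the 2x2 system
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy * sxx - sx * sxy) / det
    return a, b


# Data assumed from the straight-line fitting problem in these notes.
xs = [10, 12, 13, 16, 17, 20, 25]
ys = [10, 22, 24, 27, 29, 33, 37]
a, b = fit_line_normal_equations(xs, ys)
print(round(a, 2), round(b, 2))  # 0.8 1.56
```

Solving without intermediate rounding gives b ≈ 1.561 and a ≈ 0.80. The hand calculation rounds b to 1.56 before back-substituting, which is why it reports a = 0.82.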
For $X = a + bY$ (regression equation of X on Y), the normal equations are
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$

Determine the equation of a straight line which best fits the data:

x | 10 | 12 | 13 | 16 | 17 | 20 | 25
y | 10 | 22 | 24 | 27 | 29 | 33 | 37

Ans: a = 0.82, b = 1.56

DEVIATION TAKEN FROM ARITHMETIC MEAN OF X AND Y:
This method is easier and simpler than the previous method for finding the values of a and b. We find the deviations of the X and Y series from their respective means, $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

Regression equation of X on Y: $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$

Regression equation of Y on X: $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$

The regression coefficient of X on Y is $r \dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum xy}{\sum y^2} = b_{xy}$.
The regression coefficient of Y on X is $r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum xy}{\sum x^2} = b_{yx}$.

REGRESSION: EXAMPLE PROBLEMS:
1. Determine the equation of a straight line which best fits the data, or find the regression line of y on x.

x | 10 | 12 | 13 | 16 | 17 | 20 | 25
y | 10 | 22 | 24 | 27 | 29 | 33 | 37

Sol: Let the required straight line be $Y = a + bX$. The two normal equations are
$\sum Y = b \sum X + Na$
$\sum XY = b \sum X^2 + a \sum X$

X | X² | Y | XY
10 | 100 | 10 | 100
12 | 144 | 22 | 264
13 | 169 | 24 | 312
16 | 256 | 27 | 432
17 | 289 | 29 | 493
20 | 400 | 33 | 660
25 | 625 | 37 | 925
Total 113 | 1983 | 182 | 3186

Substituting $\sum Y = 182$, $\sum X = 113$, $N = 7$ in the first normal equation:
$113b + 7a = 182$ ... (1)
Substituting $\sum XY = 3186$, $\sum X^2 = 1983$, $\sum X = 113$ in the second:
$1983b + 113a = 3186$ ... (2)
Multiplying (1) by 113: $12769b + 791a = 20566$ ... (3)
Multiplying (2) by 7: $13881b + 791a = 22302$ ... (4)
Subtracting (3) from (4): $1112b = 1736$, so $b = 1.56$. Substituting b = 1.56 in (1): $7a = 182 - 176.28 = 5.72$, so $a = 0.82$.

The equation of the required straight line is $Y = 0.82 + 1.56X$. This is called the regression equation of Y on X.

2. Calculate the regression equation of Y on X from the data given below, taking deviations from the actual means of X and Y.

Price (Rs.) | 10 | 12 | 13 | 12 | 16 | 15
Amount demanded | 40 | 38 | 43 | 45 | 37 | 43

Estimate the likely demand when the price is Rs. 20.

Sol: Calculation of the regression equation ($\bar{X} = 13$, $\bar{Y} = 41$):

X | x = (X - 13) | x² | Y | y = (Y - 41) | y² | xy
10 | -3 | 9 | 40 | -1 | 1 | 3
12 | -1 | 1 | 38 | -3 | 9 | 3
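13 | 0 | 0 | 43 | 2 | 4 | 0
12 | -1 | 1 | 45 | 4 | 16 | -4
16 | 3 | 9 | 37 | -4 | 16 | -12
15 | 2 | 4 | 43 | 2 | 4 | 4
Total 78 | 0 | 24 | 246 | 0 | 50 | -6

The regression equation of Y on X is $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$, where

$r \dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum xy}{\sum x^2} = \dfrac{-6}{24} = -0.25$

$Y - 41 = -0.25(X - 13) \Rightarrow Y = -0.25X + 44.25$

When X is 20, Y = 39.25. When the price is Rs. 20, the likely demand is 39.25.

We have $r = \pm\sqrt{b_{yx} \times b_{xy}}$, since $b_{yx} \, b_{xy} = r \dfrac{\sigma_y}{\sigma_x} \cdot r \dfrac{\sigma_x}{\sigma_y} = r^2$; r takes the common sign of the two regression coefficients.

3. For the following data, find the equations of the two regression lines.

x | 1 | 2 | 3 | 4 | 5
y | 15 | 25 | 35 | 45 | 55

Sol: Calculation of regression coefficients

x | y | X = x - x̄ | Y = y - ȳ | X² | Y² | XY
1 | 15 | -2 | -20 | 4 | 400 | 40
2 | 25 | -1 | -10 | 1 | 100 | 10
3 | 35 | 0 | 0 | 0 | 0 | 0
4 | 45 | 1 | 10 | 1 | 100 | 10
5 | 55 | 2 | 20 | 4 | 400 | 40
Total 15 | 175 | 0 | 0 | 10 | 1000 | 100

We have $\bar{x} = \dfrac{15}{5} = 3$ and $\bar{y} = \dfrac{175}{5} = 35$.

Now $b_{yx} = \dfrac{\sum XY}{\sum X^2} = \dfrac{100}{10} = 10$.
The regression equation of y on x is $y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 35 = 10(x - 3)$.

Now $b_{xy} = \dfrac{\sum XY}{\sum Y^2} = \dfrac{100}{1000} = \dfrac{1}{10}$.
The regression equation of x on y is $x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 3 = \dfrac{1}{10}(y - 35)$.

4. In the following table S is the weight of potassium bromide which will dissolve in 100 gms of water at T°C. Fit an equation of the form $S = mT + b$ by the method of least squares. Use this relation to estimate S when T = 50°.

T | 0 | 20 | 40 | 60 | 80
S | 54 | 65 | 75 | 85 | 96

Sol: Take $d_T = (T - 40)/10$ and $d_S = (S - 75)/10$.

T | d_T | d_T² | S | d_S | d_S² | d_T d_S
0 | -4 | 16 | 54 | -2.1 | 4.41 | 8.4
20 | -2 | 4 | 65 | -1.0 | 1.00 | 2.0
40 | 0 | 0 | 75 | 0.0 | 0.00 | 0.0
60 | 2 | 4 | 85 | 1.0 | 1.00 | 2.0
80 | 4 | 16 | 96 | 2.1 | 4.41 | 8.4
Total 200 | 0 | 40 | 375 | 0 | 10.82 | 20.8

Now $m = \dfrac{\sum d_T d_S}{\sum d_T^2} = \dfrac{20.8}{40} = 0.52$ (both deviations are scaled by the same factor 10, so the slope is unchanged).
b is given by the equation $\sum S = m \sum T + Nb$: $375 = (0.52)(200) + 5b \Rightarrow b = 54.2$.
At T = 50°C, $S = 0.52 \times 50 + 54.2 = 80.2$.

5. A panel of two judges P and Q graded seven dramatic performances by independently awarding marks as follows:

Performance | 1 | 2 | 3 | 4 | 5 | 6 | 7
Marks by P | 46 | 42 | 44 | 40 | 43 | 41 | 45
Marks by Q | 40 | 38 | 36 | 35 | 39 | 37 | 41

The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks would he be expected to have awarded to the eighth performance?

Sol: Let X = marks by P and Y = marks by Q. Then $\bar{X} = 43$, $\bar{Y} = 38$, $\sum xy = 21$ and $\sum x^2 = 28$, so $b_{yx} = \dfrac{21}{28} = 0.75$ and the regression equation of Y on X is $Y - 38 = 0.75(X - 43)$. When X = 37, $Y = 38 - 4.5 = 33.5$.
Hence if judge Q had been present, he would have awarded 33.5 marks to the eighth performance.

6.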
Find the most likely production corresponding to a rainfall of 40 from the following data.

 | Rainfall (X) | Production (Y)
Average | 30 | 500 Kgs
Standard deviation | 5 | 100 Kgs
Coefficient of correlation | 0.8 |

Sol: We have to calculate the value of Y when X = 40, so we have to find the regression equation of Y on X.
Mean of X series, $\bar{X} = 30$; mean of Y series, $\bar{Y} = 500$.
S.D. of X series, $\sigma_x = 5$; S.D. of Y series, $\sigma_y = 100$.
The regression equation of Y on X is
$Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X}) \Rightarrow Y - 500 = (0.8) \dfrac{100}{5} (X - 30) = 16(X - 30)$
When X = 40, $Y - 500 = 16(40 - 30) = 160$, so $Y = 660$.
Hence the expected value of Y is 660 kg.

7. From a sample of 200 pairs of observations the following quantities were calculated:
$\sum X = 11.34$, $\sum Y = 20.78$, $\sum X^2 = 12.16$, $\sum Y^2 = 84.96$, $\sum XY = 22.13$
From the above data show how to compute the coefficients of the equation $Y = a + bX$.

Sol: We can compute the coefficients of the equation $Y = a + bX$ by solving the normal equations
$\sum Y = na + b \sum X$ and $\sum XY = a \sum X + b \sum X^2$.
Substituting the values:
$20.78 = 200a + 11.34b$ and $22.13 = 11.34a + 12.16b$
From the first equation, $a = \dfrac{20.78 - 11.34b}{200} = 0.1039 - 0.0567b$ ... (1)
Using (1) in the second equation:
$22.13 = 11.34(0.1039 - 0.0567b) + 12.16b = 1.178 - 0.643b + 12.16b$
$\Rightarrow 20.952 = 11.517b \Rightarrow b \approx 1.82$
Then $a = 0.1039 - 0.0567(1.82) \approx 0.0007$.

8. Discuss the following: the regression coefficient of Y on X is 0.7 and that of X on Y is 3.2.

Sol: $r = \sqrt{b_{yx} \, b_{xy}} = \sqrt{0.7 \times 3.2} = \sqrt{2.24} \approx 1.50$
But the correlation coefficient cannot exceed 1. Hence there is some inconsistency in the information given.

9. Write the relation between the correlation and regression coefficients. Is it possible to have two variables x and y with regression coefficients 2.8 and -0.5? Explain.

Sol: We have the correlation coefficient $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{2.8 \times (-0.5)} = \sqrt{-1.4}$, which is not a real number. Thus it is not possible: the two regression coefficients must have the same sign.

DEVIATIONS TAKEN FROM THE ASSUMED MEAN:
This method is used when the actual mean is a fraction. In this method we take deviations from the assumed mean instead of the arithmetic mean.
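The regression equation of X on Y is $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$.
We can find the value of $r \dfrac{\sigma_x}{\sigma_y}$ by applying the following formula:

$r \dfrac{\sigma_x}{\sigma_y} = b_{xy} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}$

The regression equation of Y on X is $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$, where $d_x = X - A$ and $d_y = Y - B$ are the deviations from the assumed means A and B.
We can find the value of $r \dfrac{\sigma_y}{\sigma_x}$ by applying the following formula:

$r \dfrac{\sigma_y}{\sigma_x} = b_{yx} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}$

SOLVED PROBLEMS:
1. The following data, based on 450 students, are given for marks in Statistics and Economics at a certain examination.
Mean marks in Statistics = 40
Mean marks in Economics = 48
S.D. of marks in Statistics = 12
Variance of marks in Economics = 256
Sum of the products of deviations of marks from their respective means = 42075
Give the equations of the two lines of regression and estimate the average marks in Economics of candidates who obtained 50 marks in Statistics.

Sol: Given
$\bar{X}$ = mean of marks in Statistics = 40
$\bar{Y}$ = mean of marks in Economics = 48
$\sigma_x$ = S.D. of marks in Statistics = 12
$\sigma_y$ = S.D. of marks in Economics = 16
Coefficient of correlation, $r = \dfrac{\sum xy}{N \sigma_x \sigma_y} = \dfrac{42075}{450 \times 12 \times 16} = 0.49$
The regression equation of X on Y is $X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y}) \Rightarrow X = 0.37Y + 22.24$
The regression equation of Y on X is $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X}) \Rightarrow Y = 0.65X + 22$
When X = 50, Y = 54.5.

2. Price indices of cotton and wool are given below for the 12 months of a year. Obtain the equations of the lines of regression between the indices.

Price index of cotton (X) | 78 | 77 | 85 | 88 | 87 | 82 | 81 | 77 | 76 | 83 | 97 | 93
Price index of wool (Y) | 84 | 82 | 82 | 85 | 89 | 90 | 88 | 92 | 83 | 89 | 98 | 99

Sol: Calculation of the regression equation, taking deviations from the assumed means 84 and 88.

Price index of cotton (X) | d_x = (X - 84) | d_x² | Price index of wool (Y) | d_y = (Y - 88) | d_y² | d_x d_y
78 | -6 | 36 | 84 | -4 | 16 | 24
77 | -7 | 49 | 82 | -6 | 36 | 42
85 | 1 | 1 | 82 | -6 | 36 | -6
88 | 4 | 16 | 85 | -3 | 9 | -12
87 | 3 | 9 | 89 | 1 | 1 | 3
82 | -2 | 4 | 90 | 2 | 4 | -4
81 | -3 | 9 | 88 | 0 | 0 | 0
77 | -7 | 49 | 92 | 4 | 16 | -28
76 | -8 | 64 | 83 | -5 | 25 | 40
83 | -1 | 1 | 89 | 1 | 1 | -1
97 | 13 | 169 | 98 | 10 | 100 | 130
93 | 9 | 81 | 99 | 11 | 121 | 99
Total 1004 | -4 | 488 | 1061 | 5 | 365 | 287

$\bar{X} = \dfrac{1004}{12} = 83.67$, $\bar{Y} = \dfrac{1061}{12} = 88.42$

Now $b_{yx} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} = \dfrac{287 - \dfrac{(-4)(5)}{12}}{488 - \dfrac{(-4)^2}{12}} = 0.59$

The regression equation of Y on X is $Y - \bar{Y} = b_{yx}(X - \bar{X})$:
$Y - 88.42 = 0.59(X - 83.67) \Rightarrow Y = 0.59X + 39.05$

3.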
From the following data, calculate (i) the correlation coefficient and (ii) the standard deviation of Y ($\sigma_y$):
$b_{xy} = 0.85$; $b_{yx} = 0.89$; $\sigma_x = 3$

Sol:
(i) Coefficient of correlation: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.85 \times 0.89} = 0.87$
(ii) Standard deviation of Y: $b_{xy} = r \dfrac{\sigma_x}{\sigma_y} \Rightarrow 0.85 = 0.87 \times \dfrac{3}{\sigma_y} \Rightarrow \sigma_y = \dfrac{0.87 \times 3}{0.85} = 3.07$

4. Given the following data, calculate the expected value of Y when X = 12.

 | X | Y
Average | 7.6 | 14.8
Standard deviation | 3.6 | 2.5
r = 0.99

Sol: We have to calculate the expected value of Y when X = 12, so we have to find the regression equation of Y on X.
Mean of X series, $\bar{X} = 7.6$; mean of Y series, $\bar{Y} = 14.8$.
S.D. of X series, $\sigma_x = 3.6$; S.D. of Y series, $\sigma_y = 2.5$.
Coefficient of correlation, r = 0.99.
The regression equation of Y on X is
$Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X}) \Rightarrow Y - 14.8 = 0.99 \times \dfrac{2.5}{3.6} (X - 7.6) \Rightarrow Y = 0.688X + 9.57$
When X = 12, $Y = 0.688(12) + 9.57 = 17.826$.
Hence the expected value of Y is 17.83.

5. The heights of mothers and daughters are given in the following table. From the lines of regression estimate the expected average height of a daughter when the height of the mother is 64.5 inches.

Height of mother (inches) | 62 | 63 | 64 | 64 | 65 | 66 | 68 | 70
Height of daughter (inches) | 64 | 65 | 61 | 69 | 67 | 68 | 71 | 65

Sol: Let X = height of the mother and Y = height of the daughter.
Let $d_x = X - 65$ and $d_y = Y - 67$. Then
$\sum X = 522$, $\sum d_x = 2$, $\sum d_x^2 = 50$, $\sum Y = 530$, $\sum d_y = -6$, $\sum d_y^2 = 74$, $\sum d_x d_y = 20$
$\bar{X} = \dfrac{522}{8} = 65.25$, $\bar{Y} = \dfrac{530}{8} = 66.25$

$b_{yx} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} = \dfrac{20 - \dfrac{(2)(-6)}{8}}{50 - \dfrac{(2)^2}{8}} = \dfrac{21.5}{49.5} = 0.4343$

Hence the regression equation of Y on X is $Y - \bar{Y} = b_{yx}(X - \bar{X}) \Rightarrow Y = 37.91 + 0.4343X$.
When X = 64.5, $Y \approx 65.92$.

6. The following calculations have been made for the prices of 12 stocks (X) on a stock exchange on a certain day, along with the volume of sales in thousands of shares (Y). From these calculations find the regression equation of the prices of stocks on the volume of sales of shares.
$\sum X = 580$, $\sum Y = 370$, $\sum XY = 11499$, $\sum X^2 = 41658$, $\sum Y^2 = 17206$

Sol: We have $\bar{X} = \dfrac{580}{12} = 48.33$ and $\bar{Y} = \dfrac{370}{12} = 30.83$.

$b_{xy} = \dfrac{\sum XY - N \bar{X} \bar{Y}}{\sum Y^2 - N \bar{Y}^2} = \dfrac{11499 - 12(48.33)(30.83)}{17206 - 12(30.83)^2} = -1.101$

The regression equation of X on Y is
$X - 48.33 = (-1.101)(Y - 30.83) \Rightarrow X = -1.101Y + 82.27$

7. Given the following information regarding a distribution: $N = 5$, $\bar{X} = 10$, $\bar{Y} = 20$, $\sum (X - 4)^2$
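$= 100$, $\sum (Y - 10)^2 = 160$, $\sum (X - 4)(Y - 10) = 80$. Find the regression coefficients and hence the coefficient of correlation.

Sol: Here $d_x = X - 4$, $d_y = Y - 10$.
$\bar{X} = A + \dfrac{\sum d_x}{N} \Rightarrow 10 = 4 + \dfrac{\sum d_x}{5} \Rightarrow \sum d_x = 30$ (here A = 4)
Also $\bar{Y} = B + \dfrac{\sum d_y}{N} \Rightarrow 20 = 10 + \dfrac{\sum d_y}{5} \Rightarrow \sum d_y = 50$ (here B = 10)
We know that

$b_{yx} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} = \dfrac{80 - 300}{100 - 180} = \dfrac{-220}{-80} = 2.75$

$b_{xy} = \dfrac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}} = \dfrac{80 - 300}{160 - 500} = \dfrac{-220}{-340} = 0.65$

Coefficient of correlation $= \pm\sqrt{b_{xy} \cdot b_{yx}} = \sqrt{0.65 \times 2.75} = \sqrt{1.7875} = 1.337$
Since $b_{xy}$ and $b_{yx}$ are positive, r is also positive.
Here we get a coefficient of correlation greater than 1, which is not possible, so the given data are inconsistent.

ANGLE BETWEEN TWO REGRESSION LINES:
Let the lines of regression of X on Y and Y on X be respectively
$X - \bar{X} = r \dfrac{\sigma_x}{\sigma_y} (Y - \bar{Y})$ ... (1) and $Y - \bar{Y} = r \dfrac{\sigma_y}{\sigma_x} (X - \bar{X})$ ... (2)
Slope of line (1) $= m_1 = \dfrac{\sigma_y}{r \sigma_x}$; slope of line (2) $= m_2 = \dfrac{r \sigma_y}{\sigma_x}$.
Let $\theta$ be the angle between the two regression lines. Then

$\tan\theta = \dfrac{m_1 - m_2}{1 + m_1 m_2} = \dfrac{\dfrac{\sigma_y}{r \sigma_x} - \dfrac{r \sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \left(\dfrac{1 - r^2}{r}\right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$

NOTE:
1. If $\theta$ is acute, $\tan\theta = \left(\dfrac{1 - r^2}{|r|}\right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$.
2. If $\theta$ is obtuse, $\tan\theta = \left(\dfrac{r^2 - 1}{|r|}\right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$.
3. If r = 0 then $\tan\theta = \infty \Rightarrow \theta = \pi/2$. Thus if there is no linear relationship between the two variables (for example, when they are independent), the two regression lines are perpendicular.
4. If $r = \pm 1$ then $\tan\theta = 0 \Rightarrow \theta = 0$ or $\pi$. Hence the two regression lines are parallel or coincident. The correlation between the two variables is perfect.

SOLVED PROBLEMS:
1. If $\theta$ is the angle between two regression lines, the S.D. of Y is twice the S.D. of X and r = 0.25, find $\tan\theta$.

Sol: Given $\sigma_y = 2\sigma_x$ and r = 0.25. If $\theta$ is the angle between the two regression lines, then

$\tan\theta = \left(\dfrac{1 - r^2}{r}\right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} = \left(\dfrac{1 - (0.25)^2}{0.25}\right) \dfrac{2\sigma_x^2}{\sigma_x^2 + 4\sigma_x^2} = 3.75 \times \dfrac{2}{5} = 1.5$

2. If $\sigma_x = \sigma_y = \sigma$ and the angle between the regression lines is $\tan^{-1}\left(\dfrac{4}{3}\right)$, find r.

Sol: $\tan\theta = \left(\dfrac{1 - r^2}{r}\right) \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$. Here $\sigma_x = \sigma_y = \sigma$, so
$\theta = \tan^{-1}\left[\dfrac{1 - r^2}{r} \cdot \dfrac{\sigma^2}{2\sigma^2}\right] = \tan^{-1}\left(\dfrac{1 - r^2}{2r}\right)$ ... (1)
By data, $\theta = \tan^{-1}\left(\dfrac{4}{3}\right)$ ... (2)
From (1) and (2), we have $\dfrac{1 - r^2}{2r} = \dfrac{4}{3}$
$\Rightarrow 3 - 3r^2 - 8r = 0 \Rightarrow 3r^2 + 8r - 3 = 0 \Rightarrow (3r - 1)(r + 3) = 0 \Rightarrow r = \dfrac{1}{3}$ or $r = -3$
Since $-1 \le r \le 1$, $r = \dfrac{1}{3}$ ($r = -3$ is not possible).

PRACTICE PROBLEMS:
MEASURE OF CENTRAL TENDENCY:
1.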
Calculate the A.M. of the following data.

Roll No | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks (x) | 40 | 50 | 55 | 78 | 58 | 60 | 73 | 35 | 43 | 48
[Ans: 54]

2. From the following data find the mean profits.

Profits per shop | 100-200 | 200-300 | 300-400 | 400-500 | 500-600 | 600-700 | 700-800
Number of shops | 10 | 18 | 20 | 26 | 30 | 28 | 18
[Ans: 486]

3. Calculate the median from the following data.

Marks | 10-25 | 25-40 | 40-55 | 55-70 | 70-85 | 85-100
Frequency | 6 | 20 | 44 | 26 | 3 | 1
[Ans: 48.18]

4. Find the mode of the following distribution.

Class interval | 0-10 | ? | ? | ? | 40-50 | 50-60 | 60-70 | 70-80
Frequency | 5 | ? | ? | ? | 23 | 20 | 10 | 10
(? marks a value that is illegible in the scan.)
[Ans: 46.666]

5. Find the geometric mean of the following data.

Yield of wheat (kg) | 7.5-10.5 | 10.5-13.5 | 13.5-16.5 | 16.5-19.5 | 19.5-22.5 | 22.5-25.5 | 25.5-28.5
Frequency | 5 | 9 | 19 | 23 | 7 | 4 | 1
[Ans: 16.02 kg]

6. Calculate the H.M. of the following data.

Size of items | 6 | 7 | 8 | 9 | 10 | 11
Frequency | 4 | 6 | 9 | 5 | 2 | 8
[Ans: 8.23]

MEASURE OF DISPERSION:
1. Calculate the mean deviation from the median.

Class | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
Frequencies | 5 | 10 | 20 | 5 | 10
[Ans: 9]

2. Find the variance and standard deviation for the following frequency distribution.

x | 6 | 10 | 14 | 18 | 24 | 28 | 30
f | 2 | 4 | 7 | 12 | 8 | 4 | 3
[Ans: 43.4]

3. Find the mean and variance using the step deviation method for the following data.

Age in years | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90
No. of members | 3 | 61 | 132 | 153 | 140 | 51 | 2
[Ans: 140.89]

CORRELATION AND REGRESSION:
1. Find the coefficient of correlation between X and Y for the following data.

X | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Y | (values illegible in the scan)
[Ans: 0.8833]

2. A sample of 12 fathers and their elder sons gave the following data about their heights. Calculate the coefficient of rank correlation.

Fathers | 65 | 63 | 67 | 64 | 68 | 62 | 70 | 66 | 68 | 67 | 69 | 71
Sons | 68 | 66 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 68 | 70
[Ans: 0.722]

3.
Calculate the regression equation of Y on X from the data given below, taking deviations from the actual means of X and Y.

Price (Rs) | 10 | 12 | 13 | 12 | 16 | 15
Amount demanded | 40 | 38 | 43 | 45 | 37 | 43

Estimate the likely demand when the price is Rs 20.
[Ans: 39.25]
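Answers to the correlation and regression practice problems can be checked with a few lines of code. The following is a minimal Python sketch of the deviation-from-actual-mean method; the helper name `deviation_sums` is hypothetical and the price and demand values are assumed to be the six pairs from the price and demand problem in these notes. It computes both regression coefficients, Pearson's r, and the estimate at a price of Rs 20, which the worked solution earlier in the notes gives as 39.25.

```python
from math import sqrt


def deviation_sums(xs, ys):
    """Return the means and the sums of x^2, y^2 and xy, where x and y
    are deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


# Data assumed from the price and demand problem in these notes.
price = [10, 12, 13, 12, 16, 15]
demand = [40, 38, 43, 45, 37, 43]

mean_x, mean_y, sxx, syy, sxy = deviation_sums(price, demand)
b_yx = sxy / sxx            # regression coefficient of Y on X
b_xy = sxy / syy            # regression coefficient of X on Y
r = sxy / sqrt(sxx * syy)   # Pearson's coefficient of correlation
estimate = mean_y + b_yx * (20 - mean_x)  # Y on X evaluated at X = 20

print(b_yx, b_xy, round(r, 4), estimate)
```

Here r comes out negative, matching the common sign of the two regression coefficients, and $r^2$ equals $b_{yx} \times b_{xy}$.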
