Correlation Analysis

LEARNING OBJECTIVES
After studying this chapter, you should be able to
• express quantitatively the degree and direction of the covariation or association between two variables.
• determine the validity and reliability of the covariation or association between two variables.
• provide a test of hypothesis to determine whether a linear relationship actually exists between the variables.

13.1 INTRODUCTION
The statistical methods discussed so far are used to analyse data involving only one variable. Often an analysis of data concerning two or more quantitative variables is needed to look for any statistical relationship or association between them that can describe specific numerical features of the association. The knowledge of such a relationship is important to make inferences from the relationship between variables in a given situation. A few instances where the knowledge of an association or a relationship between two variables would be helpful in making decisions are as follows:
• Family income and expenditure on luxury items.
• Yield of a crop and quantity of fertilizer used.
• Sales revenue and expenses incurred on advertising.
• Frequency of smoking and lung damage.
• Weight and height of individuals.

A statistical technique that is used to analyse the strength (magnitude) and direction of the relationship between two quantitative variables is called correlation analysis. A few definitions of correlation analysis are as follows:
• An analysis of the relationship of two or more variables is usually called correlation.
• When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation. (Croxton and Cowden)

The coefficient of correlation is a number that indicates the strength (magnitude) and direction of the relationship between two variables.
• The strength is determined by the closeness of the points to a straight line when pairs of values of the two variables are plotted on a graph. The straight line serves as the reference for evaluating the relationship.
• The direction is determined by whether one variable generally increases or decreases when the other variable increases.

The following questions determine the importance of examining the statistical relationship between two or more variables, and accordingly the statistical methods required to answer them:
(i) Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
(ii) Is the relationship strong or significant enough to be useful in arriving at a desirable conclusion?
(iii) Can the relationship be used to predict the value of a dependent variable from the value of an independent variable?
The first two questions will be answered in this chapter, while the third question will be answered in the next chapter.

For correlation analysis, the data on values of two variables must come from sampling in pairs, one for each of the two variables.

13.2 SIGNIFICANCE OF MEASURING CORRELATION
The objective of any scientific research is to establish relationships between two or more sets of observations or variables to arrive at some valid conclusion. A few advantages of measuring an association (or correlation) between two or more variables are as under:

Coefficient of correlation: A statistical measure of the degree of association between two variables.

1. Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective. (W. A. Neiswanger)
2. The effect of correlation is to reduce the range of uncertainty of our prediction. The prediction based on correlation analysis will be more reliable and near to reality.
(Tippett)
3. In economic theory, an association (or correlation) is established between two or more variables, such as price, supply and quantity demanded; customer retention and convenience, amenities and service standards; yield of a crop and quantity of fertilizer applied, type of soil, quality of seeds, rainfall and so on.
4. In healthcare, an association (or correlation) is established between two or more variables, such as validity and reliability of clinical measures; effect on health due to certain biological or environmental factors; blood pressure and age of a person; inter-observer reliability for two doctors who are assessing a patient's disease, and so on.

13.3 CORRELATION AND CAUSATION
Correlation is one of the three criteria for establishing a causal relationship between two or more variables. The correlation coefficient measures only the strength of a linear relationship; it does not necessarily imply a causal relationship. The following factors should be examined to interpret the nature and extent of the relationship between two or more variables:
(i) Chance Coincidence: The inferences drawn from the value of the correlation coefficient may not be of any statistical significance because the variables might be entirely different and unrelated. Any association between them may be due to chance. For example, (a) a positive correlation between growth in population and wheat production in the country has no statistical significance; (b) a correlation between sales revenue and expenditure on advertisements over a period of time should be statistically significant and not just due to biased sampling or sampling error.
(ii) Influence of Third Variable: Clinically, it has been proved that smoking causes lung damage. However, health problems often have multiple causes, such as stress, quality of food and air pollution. Similarly, the yield of rice and that of a second crop are positively correlated because both crops are influenced by the amount of rainfall, but the yield of either one is not influenced by the other.
(iii) Mutual Influence: Although two variables might be highly correlated, it may not be clear which variable is influencing the other. For example, variables like price, supply and demand of a commodity are mutually correlated. As the price of a commodity increases, its demand decreases, so price influences the demand level. But when demand for a commodity increases, its price also increases, so demand influences the price.

13.4 TYPES OF CORRELATIONS
There are three broad types of correlations:
(i) Positive and negative
(ii) Linear and non-linear
(iii) Simple, partial and multiple
In this chapter, we will discuss simple linear positive or negative correlation analysis.

13.4.1 Positive and Negative Correlations
A positive (or direct) correlation refers to an association between two variables where their values change (i.e., increase or decrease) in the same direction. A negative (or inverse) correlation refers to an association between two variables where their values change (i.e., increase or decrease) in opposite directions.

Illustration
Positive Correlation
Increasing x :  5  8 10 15 17
Increasing y : 10 12 16 18 20
Decreasing x : 17 15 10  8  5
Decreasing y : 20 18 16 12 10

Negative Correlation
Increasing x :  5  8 10 15 17
Decreasing y : 20 18 16 12 10
Decreasing x : 17 15 12 10  6
Increasing y :  2  7  9 13 14

Remarks: The change (increasing or decreasing) in values of both the variables may not be proportional or fixed.

13.4.2 Linear and Non-linear Correlations
A linear correlation refers to an association between two variables where the variation in their values is either proportional or fixed. The following pattern of variation in the values of two variables x and y reveals linear correlation.
[Table: paired values of x and y]

When pairs of values of x and y are plotted on graph paper and the line joining them is not a straight line but a curve, the correlation is non-linear (curvilinear).

13.4.3 Simple, Partial and Multiple Correlations
The distinction between simple, partial and multiple correlations is based upon the number of variables involved in the correlation analysis.

If only two variables are chosen to study the correlation between them, then such a correlation is referred to as a simple correlation. A study of the yield of a crop with respect to only the amount of fertilizer used, or of sales revenue with respect to the amount of advertising expenditure, are examples of simple correlation.

In partial correlation, two variables are chosen to study the correlation between them, but the effect of other influencing variables is kept constant. For example, (i) yield of a crop is influenced by the amount of fertilizer applied, whereas the effect of other influencing variables such as rainfall, quality of seed, type of soil and pesticides is kept constant, and (ii) sales revenue from a product is influenced by the level of advertising expenditure, whereas the effect of other influencing variables such as quality of the product, price, competitors, distribution and so on is kept constant.

In multiple correlation, more than two variables are chosen to study the correlation among them. For example, (i) the employer-employee relationship in any organization may be examined with reference to training and development facilities; medical, housing and children's education facilities; salary structure; grievance handling system and so on, and (ii) sales revenue from a product may be examined in relation to the level of advertising expenditure, quality of the product, price, competitors, distribution and so on.

13.5 METHODS OF CORRELATION ANALYSIS
The correlation between two ratio-scaled (numeric) variables is represented by the letter r, which takes on values between $-1$ and $+1$ only and is referred to as the 'Pearson product moment correlation' or correlation coefficient. The correlation coefficient is a relative (scale-free) number, and so its interpretation is independent of the units of measurement of the two variables, say x and y.

In this chapter, the following methods of calculating a correlation coefficient between two variables x and y are discussed:
• Scatter Diagram method
• Karl Pearson's Coefficient of Correlation method
• Spearman's Rank Correlation method
• Method of Least-squares

Figure 13.1 shows how the strength of the association between two variables is represented by the coefficient of correlation.

Figure 13.1 Interpretation of Correlation Coefficient (scale labels: perfect negative correlation, moderate negative correlation, no correlation, moderate positive correlation)

Scatter Diagram: A graph of pairs of values of two variables that is plotted to indicate a visual display of the pattern of their relationship.

13.5.1 Scatter Diagram Method
The scatter diagram method is an at-a-glance method to understand an apparent relationship between two variables, if any. A scatter diagram (or graph) can be obtained on graph paper by plotting pairs of values of variables x and y, taking values of variable x on the x-axis and values of variable y on the y-axis. The horizontal and vertical axes are scaled in units corresponding to the variables x and y, respectively. A straight line drawn through these pairs of values describes different types of relationships between the two variables.

Figure 13.2 shows examples of different types of relationships based on pairs of values of x and y. The patterns in Figs 13.2(a) and (b) represent linear relationships since the patterns are described by straight lines. The pattern in Fig. 13.2(a) shows a positive relationship since the value of y tends to increase as the value of x increases, whereas the pattern in Fig. 13.2(b) shows a negative relationship since the value of y tends to decrease as the value of x increases.

The pattern shown in Fig. 13.2(c) illustrates very low or no relationship between the values of x and y, whereas Fig. 13.2(d) represents a curvilinear relationship since it is described by a curve rather than a straight line. The wider scattering indicates that there is a lower degree of association between the two variables x and y than there is in Fig. 13.2.
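A scatter diagram is read by eye, but the direction it suggests can be cross-checked numerically. The following is a minimal sketch, not the chapter's own procedure: it assumes that the sign of the sum of products of deviations from the two means is an acceptable stand-in for the upward or downward drift of the plotted points. The function name `direction_of_association` is hypothetical, and the sample values are taken from the illustration in Section 13.4.1.

```python
def direction_of_association(xs, ys):
    """Classify paired values as moving together, apart, or neither.

    Assumption: the sign of sum((x - mean_x) * (y - mean_y)) is used as a
    numeric proxy for the drift seen in a scatter diagram.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Positive products dominate when x and y deviate from their means
    # in the same direction; negative products dominate otherwise.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > 0:
        return "positive"
    if co_movement < 0:
        return "negative"
    return "none"


x = [5, 8, 10, 15, 17]
print(direction_of_association(x, [10, 12, 16, 18, 20]))  # y rises with x
print(direction_of_association(x, [20, 18, 16, 12, 10]))  # y falls as x rises
```

The sketch reports direction only; it says nothing about how closely the points follow a straight line, which is what the correlation coefficient measures.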
Interpretation of Correlation Coefficients
While interpreting the correlation coefficient r, the following points should be taken into account:
• A low positive or negative value of the correlation coefficient r indicates that the relationship is poorly described by a straight line. A non-linear relationship may also exist.
• A correlation is an observed association and does not indicate any cause-and-effect relationship.

Types of Correlation Coefficients
Table 13.1 shows several types of correlation coefficients used in statistics along with the conditions of their use. All of them are appropriate for quantifying the linear relationship between two variables x and y.

Table 13.1: Types of Correlation Coefficients
Coefficient    Conditions Applied for Use
φ (phi)        Both x and y variables are measured on a nominal scale
ρ (rho)        Both x and y variables are measured on, or changed to, ordinal scales (rank data)
r              Both x and y variables are measured on interval or ratio scales (numeric data)

The correlation coefficient denoted by η (eta) is used for quantifying non-linear relationships (it is beyond the scope of this text). In this chapter, methods of calculating Pearson's correlation coefficient r and Spearman's rank correlation coefficient are discussed.

Figure 13.2 Scatter diagrams: (a) Positive Linear Relationship, (b) Negative Linear Relationship, (c) No Relationship, (d) Non-Linear Relationship

Features of the Correlation Coefficient
The following are the common features among all correlation coefficients:
(i) The value of the correlation coefficient r depends on the slope of the line passing through the data points and the scattering of the pairs of values of variables x and y about this line.
(ii) The sign of the correlation coefficient indicates the direction of the relationship. A positive correlation, denoted by + (plus sign), indicates that the direction of increase (or decrease) in the values of the two variables is the same, while a negative correlation, denoted by - (minus sign), indicates that the direction of increase (or decrease) in the values of the two variables is opposite.
(iii) The values of the correlation coefficient range from $+1$ to $-1$ regardless of the units of measurement of x and y. That is, the correlation coefficient is independent of the unit of measurement and is a pure number.
(iv) The value of the correlation coefficient $r = +1$ or $-1$ indicates perfect linear association (relationship) between two variables x and y. A perfect correlation implies that every observed pair of values of x and y falls on the straight line.
(v) The value of the correlation coefficient indicates the strength of association (relationship) between two variables, i.e., the closeness of the observed pairs of values of x and y to the straight line. The sign of the correlation coefficient indicates the direction, not the strength, of the linear relationship.
(vi) The value of the correlation coefficient remains unchanged when a constant is added to or subtracted from every pair of values of variables x and y (also referred to as change of origin), and also when every pair of values of variables x and y is divided by a constant (also referred to as change of scale).
(vii) The value of the correlation coefficient $r = 0$ indicates that the straight line through the data points is horizontal, and therefore there is no association between the two variables x and y.
(viii) The square, $r^2$, of the correlation coefficient r is called the coefficient of determination.

Example 13.1: Given the following data:
Student                   :   1    2    3    4    5    6    7    8    9   10
Management aptitude score : 400  675  475  350  425  600  550  325  675  450
Grade point average       : 1.8  3.8  2.8  1.7  2.8  3.1  2.6  1.9  3.2  2.3
(a) Draw this data on graph paper.
(b) Is there any correlation between management aptitude score and grade point average? If yes, what is your opinion?

Solution: By taking an appropriate scale on the x and y axes, the pairs of observations are plotted on graph paper as shown in Fig. 13.3. The scatter diagram in Fig. 13.3, with a straight line fitted through it, represents the relationship between x and y.

Figure 13.3 Scatter Diagram (x-axis: Management Aptitude Score; y-axis: Grade Point Average)

Interpretation: Since the pairs of values of the two variables are very close to a straight line passing through them, it appears that there is a high degree of association between the two variables. The pattern of dotted points also indicates a high degree of linear positive correlation.

13.5.2 Karl Pearson's Correlation Coefficient
Karl Pearson's correlation coefficient quantitatively measures the degree of association (relationship) between two variables x and y. For a set of n pairs of values of x and y, Pearson's correlation coefficient r is given by

$$r = \frac{\text{Covariance}(x, y)}{\sqrt{\text{Var}(x)\,\text{Var}(y)}} = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y}$$

where
$\text{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$
$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, the standard deviation of sample data on variable x
$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$, the standard deviation of sample data on variable y

Substituting the values of $\text{Cov}(x, y)$, $\sigma_x$ and $\sigma_y$, we have

$$r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{n}\sum (x - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y - \bar{y})^2}} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \qquad (13\text{-}1)$$

Step Deviation Method for Ungrouped Data
When the actual mean values of variables x and y are in fractions, the calculation of Pearson's correlation coefficient can be simplified by taking deviations of x and y values from their assumed means A and B, respectively, that is, $d_x = x - A$ and $d_y = y - B$. Formula (13-1) then becomes

$$r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n\sum d_x^2 - (\sum d_x)^2}\,\sqrt{n\sum d_y^2 - (\sum d_y)^2}} \qquad (13\text{-}2)$$

Step Deviation Method for Grouped Data
When values of variables x and y are classified into a frequency distribution, formula (13-2) is modified as

$$r = \frac{n\sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{\sqrt{n\sum f d_x^2 - (\sum f d_x)^2}\,\sqrt{n\sum f d_y^2 - (\sum f d_y)^2}} \qquad (13\text{-}3)$$

where $n = \sum f$ is the total frequency.

Assumptions for Using Pearson's Correlation Coefficient
1. Pearson's correlation coefficient is used only when both variables x and y are measured on an interval or a ratio scale.
2. Pearson's correlation coefficient is used only when two variables x and y are linearly related.
ly th vat iables x andy are measured 2, Pearson's correlation coefficient is used only oe Lea sed only when two variables x and y are linearly Advantages and Disadvantages of Poarson’s Correlation Coefficient ‘The numerical value of correlation coefficient between as well as the direction (positive or negative) of associ limitations of Pearson's method are as follows: 1 and 1 indicates the strength jon between two variables. Few 1. Pearson's correlation coefficient is used only when two variables x and y are linearly related. 2. The value of the coefficient is unduly affected by the extreme values of two variable values. 3. Comparatively, the computational time required to calculate the value of Pearson's correlation coefficient, r, is lengthy. 43.5.3 Probable Error and Standard Error of Coefficient of Correlation ‘The probable error (PE) of Pearson's correlation cocfficient,r, indicates the extent .o which its value depends on the condition of random sampling. If isthe value of correlation coefficient in a sample of n pairs of observations, then its standard error SE, is given by ‘The probable error of the coefficient of correlation is calculated as follows: a = S| 1.6745, PE, = 0.6745 SE, = 0.6745 —-— ‘The amount of Pe, is helpful to determine the range, P, = 7 * Pew within which population coefficient of correlation is expected to fall where 1p,(rho) represents population coefficient of correlation.\e proportion of m in the the ¥: dependent variable independent variable, Figure 13.4 Interpretation of Coefficient of Determination ue of ris not significant, ie. there is no relationship betwee, 2 Mfr > 6PE, then bles. i.e. there exists a relationships be iP between ten lustration: Ifr = 0.8 and n = 25, then PE, becomes =a? poras 110.80 0.36, 5 = 0.008 0.6745, slation correlation coefficient (p,) should ft jy 18 * 0.018 or 0.752 Sp, 50.848, 13.5.4 Coefficient of Determination “The coefficient of determination, 1 always hasa value between O and I. 
While squaring the value of the correlation coefficient, the information about the strength of the relationship is retained but the information about the direction is lost. The value of the coefficient of determination represents the proportion (or percentage) of the total variability in the dependent variable, y, that is explained by the independent variable, x. The proportion (or percentage) of variation in y that x can explain determines precisely the extent or strength of association between two variables x and y (see Chapter 14 for details).

According to Tuttle, the coefficient of correlation, r, has been grossly overrated and is used entirely too much. Its square, the coefficient of determination $r^2$, is a much more useful measure of the linear covariance of two variables. The reader should develop the habit of squaring every correlation coefficient he finds cited or stated before coming to any conclusion about the extent of the linear relationship between two correlated variables.

Interpretation of Coefficient of Determination
The knowledge of the coefficient of determination is helpful in interpreting the strength of association in terms of percentage between two variables. Figure 13.4 illustrates the proportion (percentage) of explained variation in the value of the dependent variable, y.
• If $r^2 = 0$, then no variation in y is due to variation in values of x. That is, there is no association between x and y.
• If $r^2 = 1$, then the entire variation in y is due to variation in values of x. That is, there is perfect association between x and y.
• If $0 < r^2 < 1$, then the degree of variation in y due to variation in values of x depends on the value of $r^2$. A value of $r^2$ close to zero shows a low proportion of variation in y due to variation in values of x. On the other hand, a value of $r^2$ close to one shows that almost the entire variation in y is due to variation in values of x.
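The reading of $r^2$ as a proportion of explained variation can be checked numerically. The sketch below is illustrative only: `pearson_r` follows the raw-sum form of Pearson's formula (13-1), while `explained_share` fits a least-squares line using the standard slope and intercept formulas (an assumption here, since the chapter defers regression to the next chapter) and divides the explained variability in y by the total variability. Both function names are hypothetical, and the sample data come from the illustration in Section 13.4.1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from raw sums, following formula (13-1)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when either variable has no variation")
    return numerator / denominator


def explained_share(xs, ys):
    """Explained variability in y divided by total variability in y.

    Assumption: the fitted values come from the ordinary least-squares
    line y_hat = a + b*x with b = S_xy / S_xx and a = mean_y - b*mean_x.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    explained = sum((intercept + slope * x - mean_y) ** 2 for x in xs)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


x = [5, 8, 10, 15, 17]
y = [10, 12, 16, 18, 20]
r = pearson_r(x, y)
print(round(r, 3), round(r ** 2, 3), round(explained_share(x, y), 3))
```

For a straight-line fit the two routes agree, so squaring r and computing the explained share directly should give the same figure up to rounding.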
Figure 13.4 Interpretation of Coefficient of Determination (scale of $r^2$ from 0.00 to 1.00: strength of association between x and y; proportion (percentage) of explained variation in y)

Mathematically, the coefficient of determination is determined as

$$r^2 = \frac{\text{Explained variability in } y}{\text{Total variability in } y} = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = \frac{a\sum y + b\sum xy - n\bar{y}^2}{\sum y^2 - n\bar{y}^2}$$

where $\hat{y} = a + bx$ is the estimated value of y for given values of x.

For example, let the correlation between variable x (height) and variable y (weight) be such that the coefficient of determination is $r^2 = 0.49$, or 49 per cent. This implies that only 49 per cent of the variations (changes) in the value of variable y (weight) can be accounted for by variable x (height). The remaining 51 per cent of the variations may be due to other factors, say a tendency to eat fatty foods.

We know that 'variability' refers to the dispersion of values around their mean value. The greater the correlation coefficient, the greater the coefficient of determination, and the greater the variability in the dependent variable that can be accounted for by the independent variable.

Example 13.2: The following table gives indices of industrial production and number of registered unemployed people. Calculate the value of the correlation coefficient.
Year                : 1991 1992 1993 1994 1995 1996 1997 1998
Index of production :  100  102  104  107  105  112  103   99
Number unemployed   :   15   12   13   11   12   12   19   26

Solution: Calculations of Karl Pearson's correlation coefficient are shown below:

Year   Production  d_x = x - 104  d_x^2  Unemployed  d_y = y - 15  d_y^2  d_x d_y
           x                                  y
1991      100          -4          16       15            0           0       0
1992      102          -2           4       12           -3           9       6
1993      104           0           0       13           -2           4       0
1994      107           3           9       11           -4          16     -12
1995      105           1           1       12           -3           9      -3
1996      112           8          64       12           -3           9     -24
1997      103          -1           1       19            4          16      -4
1998       99          -5          25       26           11         121     -55
Total     832           0         120      120            0         184     -92

$\bar{x} = \frac{\sum x}{n} = \frac{832}{8} = 104$ and $\bar{y} = \frac{\sum y}{n} = \frac{120}{8} = 15$

$$r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n\sum d_x^2 - (\sum d_x)^2}\,\sqrt{n\sum d_y^2 - (\sum d_y)^2}} = \frac{8 \times (-92)}{\sqrt{8 \times 120}\,\sqrt{8 \times 184}} = \frac{-92}{148.580} = -0.619$$

Interpretation: Since the coefficient of correlation $r = -0.619$ is moderately negative, it indicates that as the production index increases, the number of unemployed decreases and vice versa.

Example 13.3: The following table gives the distribution of items of production and also the relatively defective items among them, according to size groups. Find the correlation coefficient between size and defect in quality.
Size group             : 15-16 16-17 17-18 18-19 19-20 20-21
No. of items           :   200   270   340   360   400   300
No. of defective items :   150   162   170   180   180   114
[Delhi Univ., B.Com., 2007]

Solution: Let group size be denoted by variable x and the per cent of defective items by variable y. Calculations for Karl Pearson's correlation coefficient are shown below:

Size Group  Mid-value  d_x = m - 17.5  d_x^2  Per cent of        d_y = y - 50  d_y^2  d_x d_y
                m                              defective items y
15-16         15.5         -2            4          75               25         625     -50
16-17         16.5         -1            1          60               10         100     -10
17-18         17.5          0            0          50                0           0       0
18-19         18.5          1            1          50                0           0       0
19-20         19.5          2            4          45               -5          25     -10
20-21         20.5          3            9          38              -12         144     -36
Total                       3           19                           18         894    -106

Substituting values in the formula of Karl Pearson's correlation coefficient r, we have

$$r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n\sum d_x^2 - (\sum d_x)^2}\,\sqrt{n\sum d_y^2 - (\sum d_y)^2}} = \frac{6 \times (-106) - 3 \times 18}{\sqrt{6 \times 19 - (3)^2}\,\sqrt{6 \times 894 - (18)^2}} = \frac{-636 - 54}{\sqrt{105}\,\sqrt{5040}} = \frac{-690}{727.46} = -0.949$$

Interpretation: Since the value of r is negative and close to $-1$, the association (relationship) between x (size group) and y (per cent of defective items) is strong and negative. Hence, it may be concluded that when the size of group increases, the per cent of defective items decreases and vice versa.

Example 13.4: The following data relate to age of employees and the number of days they reported sick in a month.
Employee  :  1  2  3  4  5  6  7  8  9 10
Age       : 30 32 35 40 48 50 52 55 57 61
Sick days :  1  0  2  5  2  4  6  5  7  8
Calculate Karl Pearson's coefficient of correlation and interpret it. [Kashmir Univ., B.Com., 2005]

Solution: Let age and sick days be represented by variables x and y, respectively. Calculations for the value of the correlation coefficient are shown below:

Age   d_x = x - 46  d_x^2  Sick days  d_y = y - 4  d_y^2  d_x d_y
 x                             y
30       -16         256       1         -3          9      48
32       -14         196       0         -4         16      56
35       -11         121       2         -2          4      22
40        -6          36       5          1          1      -6
48         2           4       2         -2          4      -4
50         4          16       4          0          0       0
52         6          36       6          2          4      12
55         9          81       5          1          1       9
57        11         121       7          3          9      33
61        15         225       8          4         16      60
460        0        1092      40          0         64     230

$\bar{x} = \frac{460}{10} = 46$ and $\bar{y} = \frac{40}{10} = 4$

$$r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n\sum d_x^2 - (\sum d_x)^2}\,\sqrt{n\sum d_y^2 - (\sum d_y)^2}} = \frac{10 \times 230}{\sqrt{10 \times 1092}\,\sqrt{10 \times 64}} = \frac{230}{264.363} = 0.870$$

Interpretation: Since the value of r is positive, the age of employees and the number of sick days are positively correlated to a high degree. Hence, we may conclude that as the age of an employee increases, he is likely to go on sick leave more often than others.

Example 13.5: The following table shows the frequency, according to the marks, obtained by 67 students in an intelligence test. Measure the degree of relationship between age and marks.

                 Age in years
Test Marks     18   19   20   21   Total
200-250         4    4    2    1     11
250-300         3    5    4    2     14
300-350         2    6    8    5     21
350-400         1    4    6   10     21
Total          10   19   20   18     67
[Allahabad Univ., B.Com., 2007]

Solution: Let age of students and marks obtained by them be represented by variables x and y, respectively. Taking $d_x = x - 19$ (values $-1, 0, 1, 2$) and $d_y = (m - 275)/50$ for class mid-values m (values $-1, 0, 1, 2$), the table gives $n = \sum f = 67$, $\sum f d_x = 46$, $\sum f d_x^2 = 102$, $\sum f d_y = 52$, $\sum f d_y^2 = 116$ and $\sum f d_x d_y = 66$. Substituting in formula (13-3),

$$r = \frac{67 \times 66 - 46 \times 52}{\sqrt{67 \times 102 - (46)^2}\,\sqrt{67 \times 116 - (52)^2}} = \frac{2030}{\sqrt{4718}\,\sqrt{5068}} = 0.415$$

... two pairs of observations were recorded as (x, y) = (6, 14) and (8, 6) instead of (8, 12) and (6, 8). Obtain the correct value of the correlation coefficient between x and y. [MD Univ., M.Com., 2006; Kumaon Univ., MBA, 2007]

Solution: The corrected values of the terms required for the formula of Pearson's correlation coefficient are determined as follows:
Correct $\sum x = 125 - (6 + 8 - 8 - 6) = 125$
Correct $\sum y = 100 - (14 + 6 - 12 - 8) = 100$
Correct $\sum x^2 = 650 - \{(6)^2 + (8)^2 - (8)^2 - (6)^2\} = 650 - \{36 + 64 - 64 - 36\} = 650$
Correct $\sum y^2 = 460 - \{(14)^2 + (6)^2 - (12)^2 - (8)^2\} = 460 - \{196 + 36 - 144 - 64\} = 436$
Correct $\sum xy = 508 - \{(6 \times 14) + (8 \times 6) - (8 \times 12) - (6 \times 8)\} = 508 - \{84 + 48 - 96 - 48\} = 520$

Applying the formula

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{25 \times 650 - (125)^2}\,\sqrt{25 \times 436 - (100)^2}} = \frac{13{,}000 - 12{,}500}{\sqrt{625}\,\sqrt{900}} = \frac{500}{25 \times 30} = 0.667$$

Thus, the correct value of the correlation coefficient between x and y is 0.667.

Self-practice Problems 13A

13.1 Making use of the data summarized below, calculate the coefficient of correlation.
Case : A  B  C  D  E  F  G  H
x1   : 10  6  9 10 12 13 11  9
x2   :  9  4  6  9 11 13  8  4

13.2 Find the correlation coefficient by Karl Pearson's method between x and y and interpret its value.
x : 57 42 40 33 42 45 42 44 40 56 44 43
y : 10 60 30 41 29 27 27 19 18 19 31 29

13.3 Calculate the coefficient of correlation from the following data:
x : 100 200 300 400 500 600 700
y :  30  50  60  80 100 110 130

13.4 Calculate the coefficient of correlation between x and y from the following data and calculate the probable errors. Assume 69 and 112 as the mean values for x and y, respectively.
x :  78  89  99  60  50  79  68  61
y : 125 137 156 112 107 136 123 108

13.5 Find the coefficient of correlation from the following data:
Cost  : 39 65 62 90 82 75 25 98 36 78
Sales : 47 53 58 86 62 68 60 91 51 84
[Madras Univ., B.Com., 2005]

13.6 Calculate Karl Pearson's coefficient of correlation between age and playing habits from the data below. Also calculate the probable error and comment on the value.
Age             :  20  21  22  23  24  25
No. of students : 500 400 300 240 200 160
Regular players : 400 300 180  96  60  24
[HP Univ., MBA, 2005]

13.7 Find the coefficient of correlation between age and the sum assured (in '000 Rs) from the following table:

                 Sum Assured (in '000 Rs)
Age Group      10   20   30   40   50
20-30           4    6    3    7    1
30-40           2    8   15    7    1
40-50           3    9   12    6    2
50-60           8    4    2    -    -
[Delhi Univ., MBA, 2007]

13.8 Family income and its percentage spent on food in the case of one hundred families gave the following bivariate frequency distribution. Calculate the coefficient of correlation and interpret its value.

Food Expenditure             Monthly Family Income (Rs)
(in per cent)      2000-3000  3000-4000  4000-5000  5000-6000  6000-7000
10-15                  -          -          -          3          7
15-20                  -          4          9          4          3
20-25                  7          6         12          5          -
25-30                  3         10         19          8          -
[Delhi Univ.]

13.9 With the following data in 6 cities, calculate Pearson's coefficient of correlation between the density of population and death rate:

City   Area (in km²)   Population (in '000)   No. of Deaths
A          150                 30                   300
B          180                 90                  1440
C          100                 40                   560
D           60                 42                   840
E          120                 72                  1224
F           80                 24                   312
[Sukhadia Univ., B.Com., 2006]

13.10 The coefficient of correlation between two variables x and y is 0.3. The covariance is 9. The variance of x is 16. Find the standard deviation of the y series.

Hints and Answers
13.2 r = -0.554
13.3 r = 0.997
13.4 r = 0.954
13.5 r = 0.780
13.6 r = -0.991
13.7 r = -0.256
13.8 r = -0.438
13.9 r = 0.988
13.10 Given $\sigma_x = \sqrt{16} = 4$; $r = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y}$, or $0.3 = \frac{9}{4\sigma_y}$, or $\sigma_y = 7.5$.

13.5.5 Spearman's Rank Correlation Coefficient
In 1904, the British psychologist Charles Edward Spearman developed a method to measure the statistical association (relationship) between two variables, say x and y, when only ordinal (or rank) data are available. This implies that Spearman's rank correlation method is applied in a situation where a quantitative measure of qualitative factors such as judgment, brand personalities, TV programmes, leadership, beauty, intelligence, honesty, efficiency, colour and taste cannot be fixed, but individual observations can be arranged in a definite order (or rank). The ranking is done by using a set of ordinal rank numbers, with 1 for the individual observation ranked first, 2 for the individual observation ranked second, and so on. Spearman's rank correlation coefficient is given by

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \qquad (13\text{-}4)$$

where R = rank correlation coefficient; $R_1$ = rank of observations with respect to the first variable; $R_2$ = rank of observations with respect to the second variable; $d = R_1 - R_2$, the difference in a pair of ranks; and n = number of paired observations or individuals being ranked.

The number '6' is placed in the formula as a scaling device.

Advantages
(i) This method is easy to understand and its application is simpler than Pearson's method.
(ii) This method is useful for correlation analysis when variables are expressed in qualitative terms.
(iii) This method is developed to measure the statistical association (relationship) between two variables, say x and y, when only ordinal (or rank) data are available.

Disadvantages
(i) Values of both the variables are assumed to be normally distributed and to describe a linear relationship rather than a non-linear relationship.
(ii) A large computational time is required when the pairs of values of two variables exceed 30.
(iii) This method cannot be applied to grouped data to measure the association between two variables.

Case 1: When Ranks Are Given
When observations in a data set are already arranged in a particular order (rank), take the differences in pairs of ranks to determine the difference, d. Square these differences and obtain the total. Apply the formula to calculate Spearman's correlation coefficient.

Example 13.8: The coefficient of rank correlation between debenture prices and share prices is found to be 0.143. If the sum of the squares of the differences in ranks is given to be 48, then find the value of n.

Solution: Apply the formula of Spearman's correlation coefficient:

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Given $R = 0.143$ and $\sum d^2 = 48$. Substituting these values in the formula, we get

$$0.143 = 1 - \frac{6 \times 48}{n(n^2 - 1)} = 1 - \frac{288}{n^3 - n}$$

$$0.143(n^3 - n) = (n^3 - n) - 288$$

$$n^3 - n - 336 = 0 \quad \text{or} \quad (n - 7)(n^2 + 7n + 48) = 0$$

This implies that either $n - 7 = 0$, that is, $n = 7$, or $n^2 + 7n + 48 = 0$. But $n^2 + 7n + 48 = 0$ gives no real value of n because its discriminant $b^2 - 4ac$ is negative. Hence, $n = 7$.

Example 13.9: The ranks of 15 students in two subjects A and B are given below. The two numbers within brackets denote the ranks of a student in subjects A and B, respectively.
(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1), (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13)
Find Spearman's rank correlation coefficient. [Sukhadia Univ., 2005]

Solution: Since ranks of students with respect to their performance in two subjects are given, calculations for the rank correlation coefficient are shown below:

Rank in A   Rank in B   Difference        d^2
   R_1         R_2      d = R_1 - R_2
    1          10           -9             81
    2           7           -5             25
    3           2            1              1
    4           6           -2              4
    5           4            1              1
    6           8           -2              4
    7           3            4             16
    8           1            7             49
    9          11           -2              4
   10          15           -5             25
   11           9            2              4
   12           5            7             49
   13          14           -1              1
   14          12            2              4
   15          13            2              4
                                   Σd^2 = 272

Applying the formula,

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 272}{15\{(15)^2 - 1\}} = 1 - \frac{1632}{3360} = 1 - 0.4857 = 0.5143$$

The result shows a moderate positive correlation between the performances of students in the two subjects.

Example 13.10: There are 12 clerks working in an office. The long-serving clerks feel that they should get a seniority increment based on length of service built into their salary structure. Based on an assessment of their efficiency by the HR department, a ranking of efficiency was developed. The ranking of efficiency together with a ranking of their length of service is as follows:
Ranking according to length of service : 1  2  3  4  5  6  7  8  9 10 11 12
Ranking according to efficiency        : 2  3  5  1  9 10 11 12  8  7  6  4
Do the data support the clerks' claim for seniority increment? [Sukhadia Univ., MBA, 2000]

Solution: Since ranks are already given, calculations for the rank correlation coefficient are shown below:

Rank According to      Rank According     Difference        d^2
Length of Service R_1  to Efficiency R_2  d = R_1 - R_2
        1                     2               -1              1
        2                     3               -1              1
        3                     5               -2              4
        4                     1                3              9
        5                     9               -4             16
        6                    10               -4             16
        7                    11               -4             16
        8                    12               -4             16
        9                     8                1              1
       10                     7                3              9
       11                     6                5             25
       12                     4                8             64
                                                     Σd^2 = 178

Applying the formula,

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 178}{12(144 - 1)} = 1 - \frac{1068}{1716} = 0.378$$

The result shows a low degree of positive correlation between length of service and efficiency; the claim of the clerks for a seniority increment based on length of service may not be justified.

Example 13.11: Ten competitors in a beauty contest are ranked by three judges in the following order:
Judge 1 : 1  6  5 10  3  2  4  9  7  8
Judge 2 : 3  5  8  4  7 10  2  1  6  9
Judge 3 : 6  4  9  8  1  2  3 10  5  7
Use the rank correlation coefficient to determine which pair of judges has the nearest approach for judgment of beauty. [MD Univ., MBA, 2004]

Solution: The pair of judges who have the nearest approach for judgment of beauty can be obtained in $^3C_2 = 3$ ways as follows:
(i) Judge 1 and judge 2.
(ii) Judge 2 and judge 3.
(iii) Judge 3 and judge 1.
Calculations after comparing the rankings of judges are shown below:

Judge 1  Judge 2  Judge 3  d_1^2 = (R_1 - R_2)^2  d_2^2 = (R_2 - R_3)^2  d_3^2 = (R_3 - R_1)^2
  R_1      R_2      R_3
   1        3        6              4                      9                     25
   6        5        4              1                      1                      4
   5        8        9              9                      1                     16
  10        4        8             36                     16                      4
   3        7        1             16                     36                      4
   2       10        2             64                     64                      0
   4        2        3              4                      1                      1
   9        1       10             64                     81                      1
   7        6        5              1                      1                      4
   8        9        7              1                      4                      1
                                  200                    214                     60

Applying the formula,

$$R_{12} = 1 - \frac{6\sum d_1^2}{n(n^2 - 1)} = 1 - \frac{6 \times 200}{10(100 - 1)} = 1 - \frac{1200}{990} = -0.212$$

$$R_{23} = 1 - \frac{6\sum d_2^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{10(100 - 1)} = 1 - \frac{1284}{990} = -0.297$$

$$R_{13} = 1 - \frac{6\sum d_3^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10(100 - 1)} = 1 - \frac{360}{990} = 0.636$$

Since the correlation coefficient $R_{13} = 0.636$ is the highest, judges 1 and 3 have the nearest approach for judgment of beauty.

Case 2: When Ranks Are Not Given
When observations in a data set are not arranged in a particular order (rank), ranks are assigned by taking either the highest value or the lowest value as rank one, and so on, for values of both the variables.

Example 13.12: Quotations of index numbers of security prices of a certain joint stock company are given below:

Year   Debenture Price   Share Price
 1          97.8            73.2
 2          99.2            85.8
 3          98.8            78.9
 4          98.3            75.8
 5          98.4            77.2
 6          96.7            87.2
 7          97.1            83.8

Use the rank correlation method to determine the relationship between debenture prices and share prices. [Calicut Univ., B.Com., 2005]
Solution: Let us start ranking from the lowest value for both the variables as shown below:

Debenture    Rank    Share        Rank    Difference         d^2 = (R1 - R2)^2
Price (x)     R1     Price (y)     R2     d = R1 - R2
  97.8         3       73.2         1          2                   4
  99.2         7       85.8         6          1                   1
  98.8         6       78.9         4          2                   4
  98.3         4       75.8         2          2                   4
  98.4         5       77.2         3          2                   4
  96.7         1       87.2         7         -6                  36
  97.1         2       83.8         5         -3                   9
                                                         Sum d^2 = 62

Applying the formula,

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 62}{7(49 - 1)} = 1 - \frac{372}{336} = 1 - 1.107 = -0.107$$

The result shows a low degree of negative correlation between the debenture prices and share prices of a certain joint stock company.

Example 13.13: An economist wanted to find out if there is any relationship between the unemployment rate in a country and its inflation rate. Data gathered from 7 countries are given below:

Country    Unemployment Rate (Per Cent)    Inflation Rate (Per Cent)
   A                 4.0                            3.2
   B                 8.5                            8.2
   C                 5.5                            9.4
   D                 0.8                            5.1
   E                 7.3                           10.1
   F                 5.8                            7.8
   G                 2.1                            4.7

Find the degree of linear association between a country's unemployment rate and its level of inflation.

Solution: Ranking from the lowest value for both the variables as shown below:

Unemployment    Rank    Inflation    Rank    Difference         d^2 = (R1 - R2)^2
Rate (x)         R1     Rate (y)      R2     d = R1 - R2
   4.0            3        3.2         1          2                   4
   8.5            7        8.2         5          2                   4
   5.5            4        9.4         6         -2                   4
   0.8            1        5.1         3         -2                   4
   7.3            6       10.1         7         -1                   1
   5.8            5        7.8         4          1                   1
   2.1            2        4.7         2          0                   0
                                                            Sum d^2 = 18

Applying the formula,

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 18}{7(49 - 1)} = 1 - \frac{108}{336} = 0.679$$

The result shows a moderately high degree of positive correlation between the unemployment rate and the inflation rate of the seven countries.

Case 3: When Ranks Are Equal
If two or more observations of equal size are found at the time of ranking observations in the data set (taking either the highest value or the lowest value as rank one), then the rank to be assigned to each of these observations is the average of the ranks that they would otherwise have received. For example, if two observations are ranked equal at third place, then the average rank of (3 + 4)/2 = 3.5 is assigned to these two observations. Similarly, if three observations are ranked equal at third place, then the average rank of (3 + 4 + 5)/3 = 4 is assigned to these three observations.
The modified Spearman rank correlation coefficient formula for such a case is given by

$$R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right\}}{n(n^2 - 1)}$$

where $m_i$ (i = 1, 2, 3, ...) stands for the number of times an observation is repeated in the data set for both variables.

Example 13.14: A financial analyst wanted to find out whether inventory turnover influences any company's earnings per share (in per cent). A random sample of 7 companies listed in a stock exchange was selected and the following data were recorded for each.

Company    Inventory Turnover      Earnings per
           (Number of Times)       Share (Per Cent)
   A               4                     11
   B               5                      9
   C               7                     13
   D               8                      7
   E               6                     13
   F               3                      8
   G               5                      8

Find the strength of association between inventory turnover and earnings per share. Interpret this finding.

Solution: Ranking from the lowest value for both the variables. Since observations of equal size are found at the time of ranking in the data set, the rank to be assigned to repeated observations is the average of the ranks that these individual observations deserved, as shown below.

Inventory       Rank    Earnings         Rank    Difference         d^2 = (R1 - R2)^2
Turnover (x)     R1     per Share (y)     R2     d = R1 - R2
    4             2         11             5        -3.0                9.00
    5             3.5        9             4        -0.5                0.25
    7             6         13             6.5      -0.5                0.25
    8             7          7             1         6.0               36.00
    6             5         13             6.5      -1.5                2.25
    3             1          8             2.5      -1.5                2.25
    5             3.5        8             2.5       1.0                1.00
                                                              Sum d^2 = 51

It may be noted that the value 5 of variable x is repeated twice ($m_1 = 2$) and the values 8 and 13 of variable y are also each repeated twice, so $m_2 = 2$ and $m_3 = 2$. Applying the formula:

$$R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3)\right\}}{n(n^2 - 1)} = 1 - \frac{6\,(51 + 0.5 + 0.5 + 0.5)}{7(49 - 1)} = 1 - \frac{315}{336} = 1 - 0.9375 = 0.0625$$

The result shows a very weak positive association between inventory turnover and earnings per share.

Example 13.15: Obtain the rank correlation coefficient between the variables x and y from the following pairs of observed values.

x :   50    55    65    50    55    60    50    65    70    75
y :  110   110   115   125   140   115   130   120   115   160
[Mangalore Univ., B.Com, 2005]

Solution: Ranking from the lowest value for both the variables. Since observations of equal size are found at the time of ranking in the data set, the rank to be assigned to repeated observations is the average of the ranks that these individual observations deserved, as shown below.

Variable    Rank    Variable    Rank    Difference         d^2 = (R1 - R2)^2
   x         R1        y         R2     d = R1 - R2
   50         2       110        1.5        0.5                 0.25
   55         4.5     110        1.5        3.0                 9.00
   65         7.5     115        4          3.5                12.25
   50         2       125        7         -5.0                25.00
   55         4.5     140        9         -4.5                20.25
   60         6       115        4          2.0                 4.00
   50         2       130        8         -6.0                36.00
   65         7.5     120        6          1.5                 2.25
   70         9       115        4          5.0                25.00
   75        10       160       10          0.0                 0.00
                                                     Sum d^2 = 134.00

It may be noted that for variable x, 50 is repeated thrice ($m_1 = 3$), 55 is repeated twice ($m_2 = 2$) and 65 is repeated twice ($m_3 = 2$). Also for variable y, 110 is repeated twice ($m_4 = 2$) and 115 thrice ($m_5 = 3$). Applying the formula:

$$R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \frac{1}{12}(m_4^3 - m_4) + \frac{1}{12}(m_5^3 - m_5)\right\}}{n(n^2 - 1)}$$

$$= 1 - \frac{6\,(134 + 2 + 0.5 + 0.5 + 0.5 + 2)}{10(100 - 1)} = 1 - \frac{6 \times 139.5}{990} = 1 - \frac{837}{990} = 1 - 0.845 = 0.155$$

The result shows a weak positive association between variables x and y.

13.5.6 Method of Least Squares
The method of least squares to calculate the correlation coefficient requires the values of the regression coefficients $b_{xy}$ and $b_{yx}$, so that

$$r = \sqrt{b_{xy} \times b_{yx}}$$

In other words, the correlation coefficient is the geometric mean of the two regression coefficients (see Chapter 14 for details).

13.5.7 Auto-Correlation Coefficient
The auto-correlation coefficient describes mutual dependence between values of the same variable at different time periods. Thus, it provides information on how a variable relates to itself for a specific time lag. The difference in the period before a cause-and-effect relationship is established is referred to as lead time or lag. While computing the correlation, the time gap must be considered; otherwise misleading conclusions may be arrived at. For example, a decrease or increase in the supply of a commodity may not immediately reflect on its price; it may take some lead time or time lag. The formula for the auto-correlation coefficient at time lag k is stated as:

$$r_k = \frac{\sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where k is the length of the time lag; n is the number of observations; and $\bar{x}$ is the mean of all observations.
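The formula for the auto-correlation coefficient can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `autocorrelation` is an assumption made here, not part of the text, and the data are the six monthly sales figures used in Example 13.16.

```python
def autocorrelation(values, lag):
    """Lag-k auto-correlation coefficient.

    Numerator:   sum over i = 1..n-k of (x_i - mean)(x_{i+k} - mean)
    Denominator: sum over i = 1..n   of (x_i - mean)^2
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < number of observations")
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    numerator = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
    denominator = sum(d * d for d in deviations)  # zero only if all values are equal
    return numerator / denominator


# Monthly sales (thousands of units) from Example 13.16.
sales = [1.8, 2.5, 3.1, 3.0, 4.2, 3.4]
r1 = autocorrelation(sales, 1)
r2 = autocorrelation(sales, 2)
print(round(r1, 3))  # 0.312
```

For these figures the lag-2 coefficient `r2` is zero up to floating-point rounding, because the lag-2 cross-products cancel.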
Example 13.16: The monthly sales ofa product, in thousands of units, inthe lst 6 mon are given below: s Month: 1 2 3 « 5 6 3.1 3.0 4.2 34 coefficient up to lag 2. What conclusion can be deri rrend in the data? mae Sales : 18 25 Compute the auto-correlation from these values regarding the presence of @ t Solution: ‘The calculations for auto-correlation coefficient are shown below: Time Sales, = One Time Lag Xy = Two Time Lags @) Variable Constructed Variable Constructed From x From x 1 1.8 25 3.1 2 25 3.1 3.0 3 3.1 3.0 42 4 3.0 42 34 5 42 34 = ° 34 = 6 ss ee Fork= 1,8 = LB +25 +0 +34)= {(1.8-3)2.5- 3) + 2.58) (8.1 -3)+(3.1-3) (3-3) ne 1 8=942-9+42-984-9) (8-3 + 25-3) +(B.1-3y + (3-3 + (42-3 + 8.4-3" = 1.2) (0.5) + (-0.5) (0.1) + (0.1) (0) + (0) 1.2) + 0.2) (04) 144+ 0.25 + 0.014 0+ 144+ 0.16 — (0.6-0.5+0.48) _ = eon 312 (1.8 3) (3.1 - 3) + (2.5 - 3) (3- 3) +(3.1- 3)(4.2- 3)+8- 9B4-9) (8-37 + 25-3) +..+ (84-3 (1.20.1) + (-0.5 x0) + (0.11.2) + (00.4) _ -0.12+0.12 _ 9 33 33s danse the value of is positive itimplies that there isa seasonal pattern of G months = O implies that there is no significant change in sales.gel y jent of rank correlation of the marks st, Of py 10 students in statistics and account it pate Pg be 0:2. It was later discovered thatthe cos funy ranks in two subjects obtained by one of aifererus was wrongly taken as 9 instead of 7. Find {he sder coefficient of rank correlation, he (Delhi Unia, B.Com, 2004) ranking of 10 students in accordance with their ss janet Oo subjects A and B are as follows: -practice Problems 13B CORRELATONAUALYSSS 475 | + 13.16 An in ing data with 3.11 sea gitor collected the following. data wit espect to the socio-economic status and severity of rs Ae scio-economic status and severity of Socio-economic status (rank) Severity of illness rank) 67235418 534371 cep 8 wk 4 OT Ba 2 ey ese mOMnEscEnOMtriustea Calculate the rank correlation coefficient and comment on its value. 
ae the rank correlation coefficient and 13-17 You are given the following data of marks obtained fomment on its value by 11 students in statistics in two tests, one before and spetae Spearman's coeffident of corration other after special coaching: heen marks assigned to ten students by judges x Fa Tea Send Tet {na certain competitive test as shown below: we coathi i andy in a certain compet Stas shown below: (Before coaching) (After coaching) Grndent_ Marks by Judge x Marks by Judge y 23 um 1 52 6 ” os 19 2 2 58 68 21 1B 4 42 43 18 20 4 60 38 20 22 5 45 7 ue ES 6 41 48 0 a 7 37 35 7 20 8 38 30 3 23 9 25 25 16 20 10 27 50 3 a oS 18.4 An examination of eight applicants for a clerical post was taken by a firm. From the marks obtained by the applicants in the accountancy and statistics papers, ‘compute the rank correlation coellicient, apes ie ‘Applicant A BODE FGH Marks in accountancy: 15 20 28 12 40 60 20 80 Marks in statistics; 40 30 50 30 20 10 30 60 statistics +40 305080 18.15Seven methods of imparting business education were ranked by the MBA students of two universities as follows: ree Weed feng 1 2 3 4 5 ST Do the marks indicate that the special coaching has benefited the students? (Delhi Uni, M.Com, 2000] 13.18 Two departmental managers ranked a few trainees according to their perceived abilities. The ranking are given below: Traine =: AB CDE 1 Manager A: 19) 1 Manager B: 310 ° x 62 81 0 9 Calculate an appropriate correlation coefficient measure the consistency in the ranking, 18.19 In an office some keyboard operators, who were already ranked on their speed, were also ranked oon accuracy by their supervisor. The results were as follows: 8 Rank by swdents gg 4 7 6 Opraor :ABCDEFGHI J of Univ. 
A : Speed ae omay ech ater 000 Rank by students : 32475 6 Acuray :7 9 3 4 1 6 8 2 10 5 Gaatae the rank correlation coefficient and ‘comment on its value- Calculate the appropriate correlation coefficient between speed and accuracy.{478_oueren ny 13.20 The personnel departmentisimerested in comparing leant the tangs of fob applicants when menered by nH variety of standard test. The ratings of 8 applicants Interview views and standard. pychological test are Standard test show below: mele Caleulate Spearman's rank correlation coeffi comment ont vale. scien Hints and Answers 1S. Given R= 02. = 10; R= 1- 24 or fed? (n?-m)+ ty (mt-m)h nin?) 1314R ao 2 nF ie OF dor Ed? = 100 02= =o 82100 0,394 8OI=D Correct value of R= 1 = 1099 13.158 = 0.50 13.16 R= 0477 13.42R = 1 8435 2 o7s2 13.178 = 0.71 13.18 R= 0.812 Loe 13.19 R = 0.006 13.20 R= 0.817 0276 _ 4.539 ISSR = 1- Toa00- 13.6 HYPOTHESIS TESTING FOR CORRELATION COEFFICIENT ator to test whether the possible exits In other hypothesis ion coelficient is often used as an e between two random variables in the populat words, simple correlation coefficient, i used tor for testing a about true population correlation corfficient (Greek tetier tho) with the assumption that two ly distributed. random va 13.6.1 Hypothesis Testing About Population Correlation Coefficient (Small Sample) two variables x andy exis 1. The test of null hypothesis whether there is ne whether there exist any icant correlation between hypothesis that the value of the population correlation coefficient, p, is equal to zero. ‘The population correlation pp. measures the degree of association between two variables in a population of ineerest. 
The null and alternative hypotheses are expressed as

Two-tailed Test
  H0: ρ = 0 (No correlation between variables x and y)
  H1: ρ ≠ 0 (Correlation exists between variables x and y)
One-tailed Test
  H0: ρ = 0, and H1: ρ > 0 (or ρ < 0)

The test statistic for testing the null hypothesis is given by

$$t = \frac{r - \rho}{s_r} = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

where r is the sample correlation coefficient; $s_r$ is the standard error of the correlation coefficient; and n is the sample size. The t-test statistic follows the t-distribution with n - 2 degrees of freedom. The standard error of the correlation coefficient is given by

$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$

Decision rule: The calculated value of the test statistic is compared with its critical (or table) value at df = n - 2 degrees of freedom and level of significance α to arrive at a decision as follows:

One-tailed Test                                      Two-tailed Test
Reject H0 if t_cal > t_α (or t_cal < -t_α)           Reject H0 if t_cal > t_α/2 or t_cal < -t_α/2
Otherwise accept H0                                  Otherwise accept H0

Example 13.17: A random sample of 27 pairs of observations from a normal population gives a correlation coefficient of 0.42. Is it likely that the variables in the population are uncorrelated? [Delhi Univ., M.Com, 2005]

Solution: Let us take the null hypothesis that there is no significant difference in the sample and population correlation coefficients, that is,
H0: ρ = 0 and H1: ρ ≠ 0
Given n = 27, df = n - 2 = 25, r = 0.42. Applying the t-test statistic as follows:

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.42}{\sqrt{\{1 - (0.42)^2\}/(27 - 2)}} = \frac{0.42}{0.1815} = 2.314$$

Since the calculated value $t_{cal} = 2.314$ is more than its critical value $t_\alpha = 1.708$ at α = 0.05 level of significance and df = 25 (it also exceeds the two-tailed value 2.06), the null hypothesis is rejected. Hence, it may be concluded that the sample correlation differs significantly from zero, that is, the variables are likely to be correlated in the population.

Example 13.18: Is a correlation coefficient of 0.5 significant which is obtained from a random sample of 11 pairs of values from a normal population? [Madras Univ., B.Com, 2005]

Solution: Let us take the null hypothesis that the given correlation coefficient is not significant, that is, H0: ρ = 0. Applying the t-test statistic with r = 0.5 and n = 11:

$$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.5}{\sqrt{\{1 - (0.5)^2\}/(11 - 2)}} = \frac{0.5 \times 3}{0.866} = 1.732$$

The calculated value $t_{cal} = 1.732$ at α = 0.05 level of significance and df = 9 is less than the table value $t_{\alpha/2} = 2.26$, and hence the given correlation coefficient is not significant.

Example 13.19: To test the significance of a correlation coefficient r = 0.42, how many pairs of observations must be included in a sample so that the calculated value of t is more than 2.72?

Solution: Given r = 0.42 and t = 2.72. Applying the t-test statistic, we get

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} \quad \text{or} \quad t^2 = \frac{r^2 (n - 2)}{1 - r^2}$$

$$n - 2 = \frac{t^2 (1 - r^2)}{r^2} = \frac{(2.72)^2 \{1 - (0.42)^2\}}{(0.42)^2} = \frac{7.3984 \times 0.8236}{0.1764} = \frac{6.0933}{0.1764} = 34.542$$

so that n = 2 + 34.542 = 36.542 ≈ 37. Hence, the sample should be of 37 pairs of observations.

Example 13.20: To study the correlation between the stature of father and son, a sample of 1600 is taken from the universe of fathers and sons. The sample study gives the correlation between the two to be 0.80. Within what limits does it hold true for the universe?

Solution: Since the sample size is large, the standard error of the correlation coefficient is given by

$$SE_r = \frac{1 - r^2}{\sqrt{n}}$$

Given the correlation coefficient r = 0.8 and n = 1600. Thus,

$$SE_r = \frac{1 - (0.8)^2}{\sqrt{1600}} = \frac{1 - 0.64}{40} = \frac{0.36}{40} = 0.009$$

The limits within which the correlation coefficient should hold true are given by

$$r \pm 3\,SE_r = 0.80 \pm 3(0.009) \quad \text{or} \quad 0.773 \le \rho \le 0.827$$

13.6.2 Hypothesis Testing About Population Correlation Coefficient (Large Sample)
If the distribution of the sample correlation coefficient r is not normal and its probability curve is skewed in the neighbourhood of the population correlation coefficient ρ = ±1, even for large sample size n, then use Fisher's z-transformation for transforming r into z as follows:

$$z = \frac{1}{2} \log_e \frac{1 + r}{1 - r}$$

The value of z for different values of r can be seen from the standard table given in the Appendix.
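The small-sample test statistic of Section 13.6.1, the sample-size calculation of Example 13.19, the large-sample standard error of Example 13.20 and Fisher's transformation can all be evaluated with a short sketch such as the one below. The function names are assumptions chosen for illustration only; critical values of t still have to be read from a table.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n <= 2 or not -1 < r < 1:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def min_pairs_for_t(r, t_required):
    """Smallest whole n for which the t statistic reaches t_required (r must be non-zero)."""
    n_exact = 2 + t_required ** 2 * (1 - r * r) / (r * r)
    return math.ceil(n_exact)


def large_sample_se(r, n):
    """Large-sample standard error of r: (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / math.sqrt(n)


def fisher_z(r):
    """Fisher's transformation z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


print(round(t_statistic(0.5, 11), 3))        # Example 13.18: 1.732
print(min_pairs_for_t(0.42, 2.72))           # Example 13.19: 37
print(round(large_sample_se(0.8, 1600), 4))  # Example 13.20: 0.009
print(round(fisher_z(0.75), 3))              # 0.973
```

The transformation computed by `fisher_z` is the inverse hyperbolic tangent of r, so `math.atanh(r)` gives the same value.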
Changing the natural logarithm (base e) to the base 10 by multiplying with the constant 2.3026 as follows:

$$\log_e x = 2.3026 \log_{10} x$$

where x is a positive number. Thus the transformation formula becomes

$$z = \frac{1}{2}(2.3026) \log_{10} \frac{1 + r}{1 - r} = 1.1513 \log_{10} \frac{1 + r}{1 - r}$$

After Fisher's z-transformation of r into z, the variable z is approximately normally distributed with

$$\text{Mean } z_\rho = \frac{1}{2} \log_e \frac{1 + \rho}{1 - \rho} \quad \text{and} \quad \text{Standard deviation } \sigma_z = \frac{1}{\sqrt{n - 3}}$$

This approximation is useful for large sample sizes. However, it can also be used for small sample sizes of at least n ≥ 10.
The Z-test statistic to test the null hypothesis H0: ρ = 0 and H1: ρ ≠ 0 is given by

$$Z = \frac{z - z_\rho}{\sigma_z}$$

where $\sigma_z$ is the standard error of z.
Decision Rule
If $|Z_{cal}|$ < table value of $Z_{\alpha/2}$, then accept the null hypothesis H0.
Otherwise reject H0.

13.6.3 Hypothesis Testing About the Difference Between Two Independent Correlation Coefficients
The test statistic for testing a hypothesis about the correlation coefficient in a single population can be generalized to test the hypothesis about two correlation coefficients $r_1$ and $r_2$ derived from two independent samples as follows:

$$Z = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$

where

$$z_1 = 1.1513 \log_{10} \frac{1 + r_1}{1 - r_1} \quad \text{and} \quad z_2 = 1.1513 \log_{10} \frac{1 + r_2}{1 - r_2}$$

The statistic Z is approximately normally distributed with zero mean and unit standard deviation. If the absolute value $|Z_{cal}|$ is less than its table value $Z_{\alpha/2}$, then accept the null hypothesis; otherwise reject H0.

Example 13.21: What is the probability that a correlation coefficient of 0.75 or less arises in a sample of 30 pairs of observations from a normal population in which the true correlation is 0.9?

Solution: Given r = 0.75, n = 30, and ρ = 0.9. Applying Fisher's z-transformation, we get

$$z = 1.1513 \log_{10} \frac{1 + r}{1 - r} = 1.1513 \log_{10} \frac{1.75}{0.25} = 1.1513\,(\log_{10} 1.75 - \log_{10} 0.25) = 1.1513\,(0.24304 + 0.60206) = 0.973$$

The distribution of z is normal around the true population correlation value ρ = 0.9. Thus,

$$\text{Mean } z_\rho = 1.1513 \log_{10} \frac{1 + \rho}{1 - \rho} = 1.1513 \log_{10} \frac{1.90}{0.10} = 1.1513\,(0.27875 + 1) = 1.472$$

The Z-test statistic is given by

$$Z = \frac{|z - z_\rho|}{\sigma_z} = \frac{|0.973 - 1.472|}{1/\sqrt{30 - 3}} = 0.499 \times 5.196 = 2.59$$

Hence P(r ≤ 0.75) = P(Z ≤ -2.59) = 0.5 - 0.4952 = 0.0048.

Example 13.22: Test the significance of the correlation r = 0.5 from a sample of size 18 against the hypothesized population correlation ρ = 0.70.
Solution: Take the null hypothesis that the difference is not significant, that is,
H0: ρ = 0.70 and H1: ρ ≠ 0.70
Given n = 18, r = 0.5. Applying the z-transformation, we have

$$z = 1.1513 \log_{10} \frac{1 + r}{1 - r} = 1.1513 \log_{10} \frac{1.50}{0.50} = 1.1513 \log_{10} 3 = 1.1513\,(0.4771) = 0.5492$$

$$\text{Mean } z_\rho = 1.1513 \log_{10} \frac{1 + \rho}{1 - \rho} = 1.1513 \log_{10} \frac{1.70}{0.30} = 1.1513 \log_{10} 5.67 = 1.1513\,(0.7536) = 0.8676$$

Applying the Z-test statistic, we get

$$Z = \frac{|z - z_\rho|}{\sigma_z} = \frac{|z - z_\rho|}{1/\sqrt{n - 3}} = |0.5492 - 0.8676| \times \sqrt{15} = 0.3184\,(3.872) = 1.233$$

Since the calculated value $Z_{cal} = 1.233$ is less than its table value $Z_{\alpha/2} = 1.96$ at the 5 per cent significance level, the null hypothesis is accepted. Hence, it may be concluded that the difference (if any) is due to sampling error.

Example 13.23: Two independent samples of size 23 and 28 pairs of observations were analysed and their coefficients of correlation were found to be 0.5 and 0.8, respectively. Do these values differ significantly?

Solution: Take the null hypothesis that the two values do not differ significantly, that is, the samples are drawn from the same population.
Given $n_1 = 23$, $r_1 = 0.5$; $n_2 = 28$, $r_2 = 0.8$. Applying the Z-test statistic as follows:

$$z_1 = 1.1513 \log_{10} \frac{1 + r_1}{1 - r_1} = 1.1513 \log_{10} \frac{1 + 0.5}{1 - 0.5} = 1.1513 \log_{10} 3 = 0.55$$

$$z_2 = 1.1513 \log_{10} \frac{1 + r_2}{1 - r_2} = 1.1513 \log_{10} \frac{1 + 0.8}{1 - 0.8} = 1.1513 \log_{10} 9 = 1.10$$

$$Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} = \frac{0.55 - 1.10}{\sqrt{\dfrac{1}{20} + \dfrac{1}{25}}} = \frac{-0.55}{0.30} = -1.833$$

Since the calculated value $|Z_{cal}| = 1.833$ is less than its table value $Z_{\alpha/2} = 1.96$ at the 5 per cent significance level, the null hypothesis is accepted. Hence, the difference in correlation values is not significant.

Conceptual Questions 13A
1. What is the meaning of the coefficient of correlation?
2. Explain the meaning and significance of the term correlation. [Delhi Univ., MBA, 2003]
3. What is meant by 'correlation'? Distinguish between positive, negative, and zero correlation. [Ranchi Univ., MBA, 2004]
4. What are the numerical limits of $r^2$ and r? What does it mean when r equals one? zero? minus one?
5. What is correlation?
Clearly explain its role with suitable illustration from simple business problems. Wethi Unie, A404, 2005) 6. What is the relationship between the coefficient of determination and the coefficient of correlation? How is the coefficient of determination interpreted? 7. Does correlation always signify a cause-and-effect relationship between the variables? [Osmania Unie, MBA, 2000) 8. What information is provided by the coefficient of correlation of a sample? Why is it necessary to Perform a test of a hypothesis for correlation? 9. When the result of a Conceptual Questions 13A 10. u. 12. 13. 14. What is the ¢-statistic that is used in a test for correlation? What is meant by the number of degrees of freedom in a test for correlation and how isit used? What is coefficient of rank correlation? Bring out its usefulness. How does this coefficient t from the coefficient of correlation? [Delhi Univ., MBA, 2006) What is Spearman's rank correlation coefficient? How does it differ from Karl Pearson's coefficient of correlation? (a) What is a scatter diagram? How do you interpret a scatter diagram? (b) What is a scatter diagram? How does it help in studying the correlation between two variables in respect of both its direction and degree? [Delhi Univ,, MBA, 2007] Define correlation coefficient ‘r’ and give its limitations. What interpretation would you give if told that the correlation between the number of truck accidents per year and the age of the drivers (90.60 if only drivers with at least one accident a considered?gf sation between the price of two, econ Sample of 60 is 0.68, Could ina ease? ‘ue NE “correlated population? rom X population in which true correlation was 0 gp ‘ ing data give sample size . 
wing data give sample sizes and 1 est the significance of the oF ee values using Fisher's 2 ‘commodities the observed correlation difference transformation, be Value of + ‘Sample Sie ~s 0.870 12 0.560 a suapany wants t0 study the relationship between 8 onFenditure (it €1000's) and annual prof tin 860 oF Te following table presents the information ft ast 8 Yeats. 1988 87 86 85 84 83 82 81 ipepenses: 9 7 510 4 5 3 8 Kanal profit: 45 42 41 60 30 34 25 99 { fsimate the sample correlation coefficient. ‘iar Hints and Answers aera 140.68 ler 216) 2 = 1151S ogg ** = 1.1513 logo TE 1.68 = 1.1513 log) “2% = 0.829 11515 181059 = 013 Sandard error, 6, = ts . e Test statistic: = 2B = 0820-0 2 0.13 6.38 Since deviation of z from zp is 6 times more than ., the hypothesis is not correct, that is, population is correlated. l+p t= E41 _ [0829-1099] _ 998 > 2 times o, 0.13 andard error, p is likely to be less than 0.8. *Let Hy samples are drawn from the same population. 140.87 4 = 1.1513 logig a = 1.1513 logy 597 “hi = 1.333 13.254 small retail bus yepract ce Problems 13¢ CORRELATIONANALYSIS _481 sre (©) Tes the signicance of correlation coefciet ata © 5 per cent level o 13.24; eof significance. : a the least value of rin a sample of 27 pairs from ‘variate normal population at a = 0.05 level of Significance, where gq, = 2.06 atdf = 25. 4 siness has determined that the correlation coefficient between monthly expenses and Profits for the past year, measured at the end of each month, is = 0.56. Assuming that both expenses and Profits are approximately normal, test at a = 0.05 level of significance the null hypothesis that there is, ‘no correlation between them. 18.26 The manager of a small shop is hopeful that his sales are rising significantly week by week. Treating the sales for the previous six weeks as a typical example of this rising trend, he recorded them in 2100's and analysed the results. Has the rise been significant? 
Wek: 1 2 3 4 «5 6 Sales + 2.69 262 280 2.70 2.75 2.81 Find the correlation coefficient between sales and ‘week and test it for significance at a = 0.05. = LisISogig} 2 = 11815 o6F “5g = 0633 a 7 = 0,895 Since the calculated value Z = 0.895 is less than its table value Z, = 2.58 at a = 0.01 level of significance, His accepted. 13.23 (a) r= 0.95 (8) Let Ho: = Oand Hy:r #0 r . 0.95 =P yi(m-2 — YU- 0.95" /8-9 = 7512 447 for df= 6, the Hy is Since ty = 7.512 > 4 rejected. rjn-2 St > 2.06 =r re or |r |= 0.381 19.25 r = 0.560 and (,,) = 0.576, Hp is rejected. 13.26 1 = 0.656 and f,,, = 0.729, Hy is rejected.[482 carrer‘ _[Formutae used 1, Karl Pearson's correlation coefficient Covariance between x and y Covariance between = an Beem YEU -9" ion from assumed mean nSdy d, ~ Ed.) Edy) © Yaid? ~ Gd.) fndd? - (2d, 4, =2-A.d,=y-B AB = constants + Bivariate frequency distribution nE fll ~ (© fil.) (© fy) Vref fa? \nd fi fay Using actual values of x andy nEsy- (2s) @y) nEx? — (Ex) Yn¥y? - (Ey)? 2. Standard error of correlation coefficient, r * Probable error of correlation coefficient, 2 PE, = 0.674 lchapter Concepts Quiz True or False 1. [T][F] There are several types of correlat coefficients, the selection of which is determined by the level of scaling of the two variables. 2. [T][F] When both variables use measured on an interval or ratio scale, Pearson's correlation coefficient is most appropriate. 3. [T][F] To use Pearson's correlation coefficient, it is assumed that both variables are continuous and normally distributed, (TI[F] When there is no linear association between two variables, the value of r will be close to zero. (T][F] A correlation coefficient r = -1 represents a very low linear correlation. [7] [F] The coefficient of determination is the square of the correlation coefficient. 4. 5. 6. 3. Coefficient of determination Explained variance “Total variance ~ 4. Spearman's rank correlation coeticien, © + Ranks are not equal 6rd? a(n? 1) Re © Ranks are equal 2. 
6) xd" +g (h-m) n(n? 1) t Bp 5. Hypothesis testing + Population correlation coefficient r for sample asmal + Population correlation coefficient fora la sample 7 7. [TIF] As the correlation coefficient approaches zero, the possible error in Tinear predicion increases. 8. (T][F] The closer the correlation coefficient is zero, the greater the predictive validity of test, 9. (T][F] Ifa correlation coefficient for reliability f@ | testis close to 1, then the testis unreliable 10. [T]{F] Even a high correlation is not necessiit indicative of a casual relationship bet? two variables. 11. [T][F] As the value of r increases, the propor? | of variability of one variable y that am" accounted for another variable x deers 12, (T][F] If the relationship between wo ah is nonlinear, the value of the com coefficient must be negative.jonan’s correlation coe i SPE" one oF both variables 1 oer scaling. inter iagFAM iS Used 10 het spe ad Pt decide questions reese 1 er 6 at oof zand y values | 9 ooship between variables x andy ig |, eran the scatter diagram? # linear, yet exactly on a straight ph yfllonacurve | OM crepresent population parameters 0 eves represented by a straight line ct ceationship between x and the oy decreases, variable x a () decreases (@) changes linearly (b) must be linear 7 is positive, ag na nega ; 1 sinereases,y increases decreases, y decreases | ward ae () perinereases,j decreases | {i both (a) and (b) ‘he lowest strength of association is reflected by stich ofthe following correlation coefficients? | oss () ~0.60 | 035 (a) 0.29 |. The highest strength of association is reflected by hich of the following correlation coefficients? @)-10 (b) -0.95 oa (@) 0.85 2, There isa high inverse association between measures ‘overweight’ and ‘life expectancy’. 
A correlation coeficent consistent with the above statement is: a) r= 0.80 (b) r= 0.20 (©) r=-0.20 (@) r= -0.80 4% OF the following measurement levels which is the required evel for the valid calculation of the Pearson correlation coefficient 6) nominal (0 internal (b) ordinal (@) ratio ‘OF the ‘following measurement levels, which is tired for the valid calculation of the Spearman Correlation coefficient? (@) nominal f () ordinal _ fimeral (@) ratio Teresa high direct association between measures comgetete smoking’ and ‘lung damage’. ‘The ration coefficient consistent with the above Menem 130 (b) 0.80 (80 (030 ger telation coefficient appropriate for imlshing the degree of correlation between the “ables (assuming a linear relationship) he lating 15. cry lear orensanliP between wwo variables (ry When Glevlating ‘oefficieny Eat ing ‘Spearman's correlation ifference ie ithe timatinesqunchie cen he means aT of 8 which measure of correlation to eee ct of data, you should consider relationship is i the et tM eationship is linear or nonines 6) (© bau ey meatal ‘of measurement for each variable (@) neither (nor 28. The propor 8 INE Proportion of variance acount for bythe level (ay arrelation between two variables i ealelated by ¥ br © =x (@) not possible 29. The value of correlation «x i ficiem (3) depends on the origin (©) depends on the unit of scale (© depends on both origin and unit of scale @ Ss itdlependent with respect to origin and unit of 0. Which ofthe fotiowing statements is false? (@) In a perfect positive correlation, each individual obtains the same z value on each variable (0) Spearman's correlation coefficient (d) A correlation of r = 0.85 implies a stronger association than r = = 0,70 31. The strength of a linear relationship between two variables x and y is measured by @r oe © R (@) bor by 32, If value of r?=0.64, then what is the coefficient of correlation? (a) 0.40 (b) 0.04 (©) 0.80 (@) 0.08 33. 
If both dependent and independent variables increase in an estimating equation, then the coefficient of correlation falls in the range
(a) $-1 \le r \le 1$   (b) $0 \le r \le 1$   (c) $-3 \le r \le 3$   (d) none of these
34. If unexplained variation between variables x and y is 0.28, then $r^2$ is
(a) 0.25   (b) 0.50   (c) 0.75   (d) none of these
35. What type of relationship between the two variables is indicated by the sign of r?
(a) direct relation   (b) indirect relation   (c) both (a) and (b)   (d) none of these

Concepts Quiz Answers
[answer key illegible in source]

Review Self-practice Problems

13.27 The following are the monthly figures of advertising expenditure and sales. It is generally found that advertising expenditure has its impact on sales generally after 2 months. Allowing for this time lag, calculate the coefficient of correlation.
Month     Advertising Expenditure   Sales
Jan.      50                        1200
Feb.      60                        1500
March     70                        1600
April     90                        2000
May       120                       2200
June      150                       2500
July      [illegible]               2400
Aug.      160                       2600
Sep.      170                       2800
Oct.      190                       2900
Nov.      200                       3100
Dec.      250                       3000

13.28 The coefficient of correlation between two variables x and y is 0.64. Their covariance is 16. The variance of x is 19. Find the standard deviation of y series.

13.29 Given $r = 0.8$, $\sum xy = 60$, $\sigma_y = 2.5$ and $\sum x^2 = 90$, find the number of observations (items), where x and y are deviations from the arithmetic mean. [Delhi Univ., B.Com, 2006]

13.30 Calculate the Karl Pearson's coefficient of correlation between age and playing habits from the data given below. Comment on the value.
Age             : [illegible] 21 22 23 24 25
No. of students : 500 400 300 240 200 160
Regular players : 400 300 180 96 60 24
[Osmania Univ., MA, 2006]

13.31 A survey regarding income and savings provided the following data:
Income (₹)   Saving (₹): 500  1000  1500  2000
4,000        8  4  [illegible]
6,000        [illegible]  12  4
8,000        [illegible]  5  [illegible]
10,000       10  5
12,000       9  4
(The column positions of the frequencies are not recoverable from the source.)
Compute Karl Pearson's coefficient of correlation and interpret its value. [Kurukshetra Univ., MBA, 2005]

13.32 A company gives on-the-job training to its salesmen, followed by a test. It is considering whether it should terminate the services of any salesman who [illegible]. [illegible] sales (in ₹1000's) made by nine salesmen during the last one year.
Test scores : 4 19 24 21 26 22 15 [illegible] [illegible]
Sales       : 31 36 48 37 50 45 39 [illegible] [illegible]
Compute the coefficient of correlation between test scores and sales. Does it indicate that termination of the services of salesmen with low test scores is [illegible]?

13.33 Calculate the coefficient of correlation and its probable error from the following:
Subject       Per cent Marks in Final [illegible]   Per cent Marks in Sessionals
[illegible]   75                                    [illegible]
[rows illegible]
Maths         [illegible]                           69
Statistics    81                                    [illegible]
Botany        84                                    [illegible]
Zoology       75                                    [illegible]

13.34 Following figures give the rainfall in inches for the year and the production (in 100's kg) for the Rabi crop and Kharif crops. Calculate Karl Pearson's coefficient of correlation between rainfall and total production.
Rainfall          : 20 22 24 26 28 30 32
Rabi production   : 15 18 20 32 40 39 40
Kharif production : 15 17 20 18 20 21 15
[Pune Univ., BA, 2004]

13.35 President of a consulting firm is interested in the relationship between environmental work factors and the employees' turnover rate. He defines environmental factors as those aspects of a job other than salary and benefits. He visited 10 plants and gave each plant a rating of 1 to 25 on [illegible]. He then obtained each plant's turnover rate (annual, in percentage) and [illegible] the relationship.
Environmental rating : 11 19 7 12 13 10 16 22 14 [illegible]
Turnover rate        : 6 4 8 3 7 8 3 [illegible]
Compute the correlation coefficient between turnover rate and environmental rating and test it. [IGNOU, 2004]

13.36 Sixteen companies in a state have been ranked according to profit earned during a particular financial year, and the working capital for that year. Calculate the rank correlation coefficient.
Company   Rank (Profit)   Rank (Working capital)
A         1               [illegible]
B         2               [illegible]
C         3               [illegible]
D         4               15
E         5               10
F         6               2
G         7               4
H         8               [illegible]
I         9               [illegible]
J         10              [illegible]
K         11              [illegible]
L         12              5
M         13              1
N         14              6
O         15              [illegible]
P         16              2

13.37 Following are the percentage figures of expenditure incurred on clothing (in ₹100's) and entertainment (in ₹100's) by an average working class family in a period of 10 years.
Year                         : 1989 90 91 92 93 94 95 96 97 98
Expenditure on clothing      : 24 27 31 32 20 25 33 30 28 22
Expenditure on entertainment : 11 8 5 3 13 10 2 7 9 2
Compute Spearman's rank correlation coefficient and comment on the result.

Hints and Answers
13.27 r = 0.918
13.29 $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$ with $\sum y^2 = n\sigma_y^2 = 6.25n$, so $(0.8)^2 = \dfrac{(60)^2}{90 \times 6.25n}$, that is, $0.64 = \dfrac{3600}{90n \times 6.25}$, giving $n = 10$
13.30 r = -0.991
13.31 r = 0.0522
13.32 r = 0.947
13.33 r = 0.623, $PE_r$ = 0.146
13.34 r = 0.917
13.35 r = -0.801
13.36 R = -0.8176
13.37 R = -0.60
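The listed answers to Problems 13.34 and 13.37 can be checked numerically. What follows is a minimal Python sketch, not the textbook's own working. The function names (pearson_r, average_ranks, spearman_r) are illustrative choices. It assumes the 13.34 data read as rainfall 20 22 24 26 28 30 32, Rabi 15 18 20 32 40 39 40 and Kharif 15 17 20 18 20 21 15, and that the digit runs in 13.37 split as clothing 24 27 31 32 20 25 33 30 28 22 and entertainment 11 8 5 3 13 10 2 7 9 2. Spearman's coefficient is computed with a common textbook tie correction, adding (m^3 - m)/12 to the sum of squared rank differences for each group of m tied values.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's r from deviations about the arithmetic means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)
    counts = Counter(values)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def spearman_r(x, y):
    """Spearman's R with (m^3 - m)/12 added for each group of m ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    ties = sum((m ** 3 - m) / 12
               for series in (x, y)
               for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + ties) / (n * (n ** 2 - 1))


# Problem 13.34 (data as read from the source; an assumption).
rainfall = [20, 22, 24, 26, 28, 30, 32]
rabi = [15, 18, 20, 32, 40, 39, 40]
kharif = [15, 17, 20, 18, 20, 21, 15]
total = [a + b for a, b in zip(rabi, kharif)]
r_1334 = pearson_r(rainfall, total)

# Problem 13.37 (digit runs split as stated above; an assumption).
clothing = [24, 27, 31, 32, 20, 25, 33, 30, 28, 22]
entertainment = [11, 8, 5, 3, 13, 10, 2, 7, 9, 2]
r_1337 = spearman_r(clothing, entertainment)

print(round(r_1334, 3), round(r_1337, 2))
```

With these readings the sketch gives r of about 0.917 for Problem 13.34 and R = -0.60 for Problem 13.37, in agreement with the listed answers.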