Correlation

UNIT 2 - SEC-T

1. CONCEPT AND DEFINITION OF CORRELATION

In many practical applications, observations are available on two or more variables. The following examples illustrate such situations:

a) Heights and weights of persons of a certain group;
b) Sales revenue and advertising expenditure in business; and
c) Time spent on study and marks obtained by students in an exam.

If data are available for two variables, say X and Y, they are said to form a bivariate distribution.

Consider the example of sales revenue and expenditure on advertising in business. A natural question is whether there is any connection between the two: does sales revenue increase or decrease as expenditure on advertising increases or decreases? Similarly, for time spent on study and marks obtained by students, the question is whether marks increase or decrease as the time spent on study increases or decreases.

In all these situations we try to find the relation between two variables, and correlation answers the question of whether there is any relationship between one variable and another.

When two variables are related in such a way that a change in the value of one variable is accompanied by a change in the value of the other, the variables are said to be correlated, or there is said to be correlation between them.

2. TYPES OF CORRELATION

a) Positive Correlation: Correlation between two variables is said to be positive if the values of the variables deviate in the same direction, i.e. if the values of one variable increase (or decrease), the values of the other variable also increase (or decrease). Some examples of positive correlation are the correlation between:

1. Heights and weights of persons of a certain group;
2. Sales revenue and advertising expenditure in business; and
3. Time spent on study and marks obtained by students in an exam.
b) Negative Correlation: Correlation between two variables is said to be negative if the values of the variables deviate in opposite directions, i.e. if the values of one variable increase (or decrease), the values of the other variable decrease (or increase). Some examples of negative correlation are the correlation between:

1. Volume and pressure of a perfect gas;
2. Price and demand of goods;
3. Literacy and poverty in a country; and
4. Time spent on watching TV and marks obtained by students in an examination.

c) Simple, Partial and Multiple Correlation: The distinction between simple, partial and multiple correlation is based upon the number of variables studied.

Simple Correlation: When only two variables are studied, it is a case of simple correlation. For example, studying the relationship between the marks secured by a student and the attendance of that student in class is a problem of simple correlation.

Partial Correlation: In partial correlation one studies three or more variables but considers only two of them to be influencing each other, the effect of the other influencing variables being held constant. For example, in the relationship between student marks and attendance above, other influencing variables such as the effectiveness of the teacher and the use of teaching aids like a computer or smart board are assumed to be constant.

Multiple Correlation: When three or more variables are studied together, it is a case of multiple correlation. For example, if the study above covers the relationship between student marks, attendance of students, effectiveness of the teacher, use of teaching aids, etc., it is a case of multiple correlation.

d) Linear and Non-linear Correlation: Depending upon the constancy of the ratio of change between the variables, the correlation may be linear or non-linear.
Linear Correlation: If the amount of change in one variable bears a constant ratio to the amount of change in the other variable, the correlation is said to be linear.

Non-linear Correlation: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear.

3. METHODS OF FINDING THE CORRELATION COEFFICIENT

a) Scatter Diagram

A scatter diagram is a statistical tool for judging whether a correlation is likely to exist between a dependent variable and an independent variable. It does not give the exact relationship between two variables, but it indicates whether they are correlated or not.

Let $(x_i, y_i),\ i = 1, 2, \ldots, n$ be the bivariate distribution. If the values of the dependent variable Y are plotted against the corresponding values of the independent variable X in the XY plane, the resulting diagram of dots is called a scatter diagram or dot diagram. Note that the scatter diagram is not suitable for a large number of observations.

[Figure: scatter diagrams. Diagram I: Perfect Positive Correlation; Diagram II: Perfect Negative Correlation; Diagram III: High Positive Correlation; Diagram IV: High Negative Correlation; Diagram V: Low Positive Correlation; Diagram VI: Low Negative Correlation; Diagram VII: No Correlation.]

b) KARL PEARSON'S CORRELATION COEFFICIENT

A scatter diagram tells us whether variables are correlated or not, but it does not indicate the extent to which they are correlated. The coefficient of correlation gives an exact measure of that extent: it measures the intensity or degree of linear relationship between two variables. It was given by the British biometrician Karl Pearson (1857-1936).
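If X and Y are two random variables, then the correlation coefficient between X and Y is denoted by r and defined as

$$r = \mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)\,V(y)}} \qquad (1)$$

where $\mathrm{Cov}(x, y)$ is the covariance between X and Y, defined as

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),$$

and $V(x)$, the variance of X, is defined by

$$V(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

Similarly, $V(y)$, the variance of Y, is defined by

$$V(y) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2,$$

where n is the number of paired observations. The correlation coefficient r may then be written as

$$r = \mathrm{Corr}(x, y) = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}} \qquad (2)$$

REMARK 1: Karl Pearson's correlation coefficient r is also called the product moment correlation coefficient.

REMARK 2: Karl Pearson's correlation coefficient is also denoted by $\rho(X, Y)$.

The expression in equation (2) can be simplified into various forms. Some of them are:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \,\sum (y_i - \bar{y})^2}} \qquad (3)$$

$$r = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\left[\frac{1}{n}\sum x_i^2 - \bar{x}^2\right]\left[\frac{1}{n}\sum y_i^2 - \bar{y}^2\right]}} \qquad (4)$$

$$r = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\left[\sum x_i^2 - n\,\bar{x}^2\right]\left[\sum y_i^2 - n\,\bar{y}^2\right]}} \qquad (5)$$

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} \qquad (6)$$

4. ASSUMPTIONS FOR THE CORRELATION COEFFICIENT

1. Assumption of Linearity
The variables used to compute the correlation coefficient must be linearly related. The linearity of the variables can be checked with a scatter diagram.

2. Assumption of Normality
Both variables under study should follow the Normal distribution. They should not be skewed in either the positive or the negative direction.

3. Assumption of Cause and Effect Relationship
There should be a cause and effect relationship between the two variables, for example heights and weights of children, or demand and supply of goods. When there is no cause and effect relationship between the variables, the correlation coefficient should be zero. If it is non-zero, the correlation is termed chance correlation or spurious correlation.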
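For example, a non-zero correlation coefficient between the following pairs would be spurious:

a) Weight and income of a person over periods of time; and
b) Rainfall and literacy in a state over periods of time.

As correlation measures the degree of linear relationship, different values of the coefficient of correlation can be interpreted as below:

Value of correlation coefficient    Correlation is
+1                                  Perfect Positive Correlation
-1                                  Perfect Negative Correlation
0                                   There is no Correlation
0 to 0.25                           Weak Positive Correlation
0.75 to (+1)                        Strong Positive Correlation
-0.25 to 0                          Weak Negative Correlation
-0.75 to (-1)                       Strong Negative Correlation

5. PROPERTIES OF THE CORRELATION COEFFICIENT

Property 1: The correlation coefficient lies between -1 and +1.

Proof: We have to prove that $-1 \le r(X, Y) \le +1$. By definition,

$$r(X, Y) = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\left[\frac{1}{n}\sum (x_i - \bar{x})^2\right]^{1/2}\left[\frac{1}{n}\sum (y_i - \bar{y})^2\right]^{1/2}}$$

so that

$$r^2(X, Y) = \frac{\left(\sum a_i b_i\right)^2}{\left(\sum a_i^2\right)\left(\sum b_i^2\right)} \qquad (*)$$

where $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$.

The Schwarz inequality states that if $a_i, b_i;\ i = 1, 2, \ldots, n$ are real quantities, then

$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \le \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right),$$

the sign of equality holding if and only if

$$\frac{a_1}{b_1} = \frac{a_2}{b_2} = \cdots = \frac{a_n}{b_n}.$$

Using the Schwarz inequality, we get from (*)

$$r^2(X, Y) \le 1, \quad \text{i.e.} \quad |r(X, Y)| \le 1 \implies -1 \le r(X, Y) \le +1.$$

Property 2: The correlation coefficient is independent of change of origin and scale.

Proof: Let

$$u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k},$$

where a, b, h and k are constants with $h > 0$ and $k > 0$. We have to prove that $\mathrm{Corr}(x, y) = \mathrm{Corr}(u, v)$, i.e. there is no change in correlation when origin and scale are changed.

Since $x = a + hu$ and $y = b + kv$, we have $\bar{x} = a + h\bar{u}$ and $\bar{y} = b + k\bar{v}$. Then

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n}\sum (a + hu - a - h\bar{u})(b + kv - b - k\bar{v}) = hk\,\frac{1}{n}\sum (u - \bar{u})(v - \bar{v}),$$

$$\mathrm{Cov}(x, y) = hk\,\mathrm{Cov}(u, v),$$

and

$$V(x) = \frac{1}{n}\sum (x - \bar{x})^2 = \frac{1}{n}\sum (a + hu - a - h\bar{u})^2 = h^2\,V(u).$$

Similarly, $V(y) = k^2\,V(v)$. Therefore

$$\mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)\,V(y)}} = \frac{hk\,\mathrm{Cov}(u, v)}{\sqrt{h^2 V(u)\,k^2 V(v)}} = \frac{\mathrm{Cov}(u, v)}{\sqrt{V(u)\,V(v)}} = \mathrm{Corr}(u, v),$$

i.e. the correlation coefficient between X and Y is the same as the correlation coefficient between U and V. Thus, the correlation coefficient is independent of change of origin and scale.

Property 3: If X and Y are two independent variables, then the correlation coefficient between X and Y is zero, i.e. $\mathrm{Corr}(x, y) = 0$.

Proof: If X and Y are independent variables, then $\mathrm{Cov}(X, Y) = 0$, so

$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = 0.$$

Hence two independent variables are uncorrelated.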
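The bounds on r and its invariance under change of origin and scale, both proved above, can also be checked numerically. The following is a minimal Python sketch and not part of the original notes: the function name `pearson_r`, the data values and the constants a, h, b, k are all illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed directly as Cov(x, y) / sqrt(V(x) * V(y)).

    Uses the 1/n definitions of covariance and variance given in the text.
    Assumes neither variable is constant (otherwise the variance is zero).
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations, chosen only for illustration.
xs = [2, 4, 5, 7, 9]
ys = [1, 3, 4, 8, 6]

# Change of origin and scale: u = (x - a) / h, v = (y - b) / k with h, k > 0.
a, h = 4, 2      # assumed constants
b, k = 3, 5      # assumed constants
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)

print("r(x, y) =", r_xy)
print("r(u, v) =", r_uv)
print("within [-1, 1]:", -1.0 <= r_xy <= 1.0)
print("unchanged by origin/scale shift:", math.isclose(r_xy, r_uv))
```

The function follows the definition term by term rather than using a library routine, so each intermediate quantity can be compared with the hand calculations in the text.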
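The converse of Property 3 is not true, i.e. two uncorrelated variables may not be independent, as the following example illustrates:

[Table of X, Y and XY values for this example: entries not recoverable.]

$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = 0$$

Thus in the above example the variables X and Y are uncorrelated. But on careful examination we find that X and Y are not independent: they are connected by the relation $Y = X^2$. Hence two uncorrelated variables need not necessarily be independent.

Example 1. Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y):

X:  65  66  67  67  68  69  70  72
Y:  67  68  65  68  72  72  69  71

Solution.

CALCULATIONS FOR CORRELATION COEFFICIENT

X       Y       X^2      Y^2      XY
65      67      4225     4489     4355
66      68      4356     4624     4488
67      65      4489     4225     4355
67      68      4489     4624     4556
68      72      4624     5184     4896
69      72      4761     5184     4968
70      69      4900     4761     4830
72      71      5184     5041     5112
544     552     37028    38132    37560   (Total)

$$\bar{X} = \frac{1}{n}\sum X = \frac{1}{8} \times 544 = 68, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{1}{8} \times 552 = 69$$

$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = \frac{\frac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}}$$

$$= \frac{\frac{1}{8} \times 37560 - 68 \times 69}{\sqrt{\left(\frac{37028}{8} - 68^2\right)\left(\frac{38132}{8} - 69^2\right)}} = \frac{4695 - 4692}{\sqrt{(4628.5 - 4624)(4766.5 - 4761)}} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$$

Short Cut method:

Define $d_x = X - A_x$ and $d_y = Y - A_y$, where $A_x$ and $A_y$ are the assumed means of the X and Y series respectively. The correlation coefficient is then given by

$$r = \frac{\frac{1}{n}\sum d_x d_y - \bar{d}_x\,\bar{d}_y}{\sqrt{\left(\frac{1}{n}\sum d_x^2 - \bar{d}_x^2\right)\left(\frac{1}{n}\sum d_y^2 - \bar{d}_y^2\right)}}$$

where $\bar{d}_x = \frac{1}{n}\sum d_x$ and $\bar{d}_y = \frac{1}{n}\sum d_y$.

Example 2: Use the short-cut method to find the coefficient of correlation.

X:  10  12  14  18  20
Y:   5   6   7  10  12

Let $A_x$ = assumed mean of X = 14 and $A_y$ = assumed mean of Y = 7.

X     Y     d_x = X - 14    d_y = Y - 7    d_x^2    d_y^2    d_x d_y
10    5     -4              -2             16       4        8
12    6     -2              -1             4        1        2
14    7      0               0             0        0        0
18    10     4               3             16       9        12
20    12     6               5             36       25       30
Total        4               5             72       39       52

Here $\bar{d}_x = 4/5 = 0.8$ and $\bar{d}_y = 5/5 = 1$, so

$$r(X, Y) = \frac{\frac{1}{5}(52) - (0.8)(1)}{\sqrt{\left[\frac{1}{5}(72) - (0.8)^2\right]\left[\frac{1}{5}(39) - (1)^2\right]}} = \frac{10.4 - 0.8}{\sqrt{(14.4 - 0.64)(7.8 - 1)}} = \frac{9.6}{\sqrt{(13.76)(6.8)}} = \frac{9.6}{\sqrt{93.568}} = 0.99$$

Example 3. A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following results:

$$n = 25, \quad \sum X = 125, \quad \sum X^2 = 650, \quad \sum Y = 100, \quad \sum Y^2 = 460, \quad \sum XY = 508$$

It was, however, later discovered at the time of checking that he had copied down two pairs as (X, Y) = (6, 14) and (8, 6), while the correct values were (X, Y) = (8, 12) and (6, 8). Obtain the correct value of the correlation coefficient.

Solution.

Corrected $\sum X = 125 - 6 - 8 + 8 + 6 = 125$
Corrected $\sum Y = 100 - 14 - 6 + 12 + 8 = 100$
Corrected $\sum X^2 = 650 - 6^2 - 8^2 + 8^2 + 6^2 = 650$
Corrected $\sum Y^2 = 460 - 14^2 - 6^2 + 12^2 + 8^2 = 436$
Corrected $\sum XY = 508 - 6 \times 14 - 8 \times 6 + 8 \times 12 + 6 \times 8 = 520$

$$\bar{X} = \frac{1}{n}\sum X = \frac{1}{25} \times 125 = 5, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{1}{25} \times 100 = 4$$

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum XY - \bar{X}\,\bar{Y} = \ldots$$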
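The remaining arithmetic for Example 3 can be sketched in Python. This is an illustrative sketch under stated assumptions, not the notes' own solution: the helper names `correct_sums` and `r_from_sums` are hypothetical, and the formula used is the sums form $\left(\frac{1}{n}\sum XY - \bar{X}\bar{Y}\right) / \sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}$ applied in Example 1.

```python
import math


def correct_sums(sums, wrong_pairs, right_pairs):
    """Return new sums with the wrong (x, y) pairs removed and the right ones added.

    `sums` is a dict with keys "x", "y", "x2", "y2", "xy" (hypothetical layout).
    """
    fixed = dict(sums)
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            fixed["x"] += sign * x
            fixed["y"] += sign * y
            fixed["x2"] += sign * x * x
            fixed["y2"] += sign * y * y
            fixed["xy"] += sign * x * y
    return fixed


def r_from_sums(n, sums):
    """Pearson's r from n and the five sums, using 1/n variances."""
    mean_x = sums["x"] / n
    mean_y = sums["y"] / n
    cov = sums["xy"] / n - mean_x * mean_y
    var_x = sums["x2"] / n - mean_x ** 2
    var_y = sums["y2"] / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)


# Example 3: sums as first reported, then corrected for the two miscopied pairs.
reported = {"x": 125, "y": 100, "x2": 650, "y2": 460, "xy": 508}
corrected = correct_sums(reported, [(6, 14), (8, 6)], [(8, 12), (6, 8)])
r_example3 = r_from_sums(25, corrected)
print(corrected)             # corrected sums: 125, 100, 650, 436, 520
print(round(r_example3, 3))  # about 0.667

# Cross-check against the totals of Example 1 (n = 8), where r = 0.603.
example1 = {"x": 544, "y": 552, "x2": 37028, "y2": 38132, "xy": 37560}
r_example1 = r_from_sums(8, example1)
print(round(r_example1, 3))
```

Working from the sums rather than the raw pairs mirrors the situation in the example, where only the totals and the two miscopied pairs are known.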
