
Correlation and Regression

This document contains the syllabus for the second semester course 20MAT21 Differential Equations and Numerical Methods taught by Dr. Sowbhagya at Global Academy of Technology, Bangalore. The course covers five modules: differential equations of first order, linear differential equations, partial differential equations, statistical methods and numerical solution of equations, and finite difference and interpolation. Module 4 focuses on topics like correlation, regression analysis, curve fitting using the method of least squares, and numerical solutions of equations using methods like Regula-Falsi and Newton-Raphson. It provides examples and definitions of correlation, discussing Karl Pearson's coefficient of correlation and how it measures the strength and type of relationship between variables.

Global Academy of Technology, Bangalore.

Department of Mathematics

Second Semester 2020-2021, 20MAT21 Online Classes

By
Dr. Sowbhagya
Assistant Professor

Syllabus for Second Semester
Subject Code: 20MAT21
Subject Name: Differential Equations and Numerical Methods

Module-1: Differential Equations of First Order
Module-2: Linear Differential Equations
Module-3: Partial Differential Equations
Module-4: Statistical Methods and Numerical Solution of Equations
Module-5: Finite Difference and Interpolation
Module-4
Statistical Methods and Numerical Solution of Equations

• Topics you will be learning

1. Correlation: Karl Pearson’s coefficient of correlation.
2. Regression analysis: lines of regression.
3. Curve fitting by the method of least squares: straight line, parabola and geometrical curves.
4. Numerical solution of equations: solution of algebraic and transcendental equations using the Regula-Falsi method and the Newton-Raphson method.
Correlation
Co (together) + relation = association

A correlation is a measure of the association (relationship) that exists between two variables.

Correlation measures 3 characteristics of the relationship between variables:
1. Variations in association (+ve or −ve)
2. Form of association (linear or non-linear)
3. Degree of association (strength in magnitude)
1. Variations in association (+ve or −ve) (Types of correlation)
Can we relate two different real-life situations mathematically?

X | Y
• No. of hours spent on studies | Marks scored in exam
• Stress and tension | Blood pressure (BP)
• Height of a person | Weight of the same person
• Demand for a certain commodity | Price
• Temperature | Ice cream sales
• Amount of rainfall | Yield of crops

1) Positive Correlation: If the variables move in the same direction, the correlation between them is called positive correlation or direct correlation.
X | Y
• No. of hours spent on social media for entertainment | Marks scored in exam
• Volume | Pressure
• Production | Price
(An increase in X goes with a decrease in Y.)

2) Negative Correlation: If the variables move in opposite directions, the correlation between them is called negative correlation or indirect correlation.
3) Zero Relation: If two variables do not show any related variation, the relation between them is called zero relation, no correlation or orthogonal relation. Variation in X does not affect Y, and vice versa.
• Marks scored in the exam and amount of rainfall.
• Ice cream sales and volume of a container.
Graphical Representation of Relationship
[Figure: scatter diagrams labelled "Perfect relation" (two panels), "Positive correlation", "Negative correlation" and "No correlation".]
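How can the strength of the relationship between the variables be measured mathematically?
Karl Pearson’s Coefficient of Correlation
• Karl Pearson’s coefficient of correlation between two variables x and y is given by

$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$

where

$$ \operatorname{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}), \qquad \sigma_x = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2}, \qquad \sigma_y = \sqrt{\frac{1}{n}\sum (y - \bar{y})^2} $$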
3. Degree of association (strength in magnitude)
For positive correlation: $0 < r \le 1$
For negative correlation: $-1 \le r < 0$
For no correlation: $r = 0$
Note: The coefficient of correlation is always independent of the units of measurement.
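As a quick numerical check that the two forms of the formula agree, here is a minimal Python sketch. The function names `pearson_r` and `pearson_r_deviations` are illustrative only: the first uses the computational formula in terms of sums, the second uses the definition $r = \operatorname{Cov}(x, y)/(\sigma_x \sigma_y)$ with the $1/n$ convention used above. On the data of Problem 1 below both give r ≈ 0.9582.

```python
import math


def pearson_r(x, y):
    """r = (nΣxy − ΣxΣy) / (sqrt(nΣx² − (Σx)²) * sqrt(nΣy² − (Σy)²))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


def pearson_r_deviations(x, y):
    """Same coefficient from the definition r = Cov(x, y) / (σx σy), using 1/n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sigma_x * sigma_y)


x = list(range(1, 11))
y = [10, 12, 16, 28, 25, 36, 41, 49, 40, 50]
print(round(pearson_r(x, y), 4))             # 0.9582
print(round(pearson_r_deviations(x, y), 4))  # 0.9582
```

Because r is a ratio in which the units cancel, rescaling x (say from hours to minutes) leaves it unchanged, which is the unit-independence noted above.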
Problems
1. Calculate the correlation coefficient from the following data

x 1 2 3 4 5 6 7 8 9 10
y 10 12 16 28 25 36 41 49 40 50

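Solution:
From the given data we get $n = 10$, $\sum x = 55$, $\sum y = 307$, $\sum x^2 = 385$, $\sum y^2 = 11387$, $\sum xy = 2074$.

The correlation coefficient is given by

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{10(2074) - 55(307)}{\sqrt{10(385) - 55^2}\;\sqrt{10(11387) - 307^2}} = \frac{3855}{\sqrt{825}\;\sqrt{19621}} = 0.9582 $$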
2 . Psychological tests of intelligence and of engineering ability were applied
to 10 students. Here is a record of ungrouped data showing intelligence ratio
and engineering ratio. Calculate the coefficient of correlation.

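IR 105 104 102 101 100 99 98 96 93 92
ER 101 103 100 98 95 96 104 92 97 94

Solution:
Taking x = IR and y = ER, from the given data we get $n = 10$, $\sum x = 990$, $\sum y = 980$, $\sum xy = 97112$, $\sum x^2 = 98180$, $\sum y^2 = 96180$.
The correlation coefficient is given by

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{10(97112) - 990(980)}{\sqrt{10(98180) - 990^2}\;\sqrt{10(96180) - 980^2}} = \frac{920}{\sqrt{1700}\;\sqrt{1400}} = 0.5963 $$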
3. Find the correlation coefficient between x and y from the given data:

x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108

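Solution:
From the given data we get $n = 8$, $\sum x = 596$, $\sum y = 1006$, $\sum xy = 76538$, $\sum x^2 = 45770$, $\sum y^2 = 128560$.

The correlation coefficient is given by

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{8(76538) - 596(1006)}{\sqrt{8(45770) - 596^2}\;\sqrt{8(128560) - 1006^2}} = \frac{12728}{\sqrt{10944}\;\sqrt{16444}} = 0.9488 $$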
4. A person, while calculating the coefficient of correlation between two variables from a set of 25 observations, obtained the following data:
$\sum x = 125$, $\sum y = 100$, $\sum x^2 = 650$, $\sum y^2 = 460$, $\sum xy = 508$.
However, it was later discovered that the pairs of values

x 8 6
y 12 8

were wrongly copied as

x 6 8
y 14 6

Obtain the correct value of r.

Solution:
From the given data we have $n = 25$.
Corrected value of a sum = actual (obtained) sum − (wrong values) + (correct values):

$$ \sum x = 125 - (6 + 8) + (8 + 6) = 125, \qquad \sum y = 100 - (14 + 6) + (12 + 8) = 100 $$
$$ \sum x^2 = 650 - (6^2 + 8^2) + (8^2 + 6^2) = 650 $$
$$ \sum y^2 = 460 - (14^2 + 6^2) + (12^2 + 8^2) = 436 $$
$$ \sum xy = 508 - (6 \cdot 14 + 8 \cdot 6) + (8 \cdot 12 + 6 \cdot 8) = 520 $$

The corrected value of r is given by

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{25(520) - 125(100)}{\sqrt{25(650) - 125^2}\;\sqrt{25(436) - 100^2}} = \frac{500}{\sqrt{625}\;\sqrt{900}} = \frac{2}{3} \approx 0.67 $$
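The correction rule (actual sum − wrong values + correct values) is mechanical, so it is easy to script. The sketch below assumes the five running totals are stored in a dictionary under the hypothetical keys `x`, `y`, `xx`, `yy`, `xy`, and the helper name `corrected_r` is ours; it reproduces the corrected sums and r = 2/3 ≈ 0.67 from Problem 4.

```python
import math


def corrected_r(n, sums, wrong_pairs, correct_pairs):
    """Recompute r after replacing miscopied (x, y) pairs.

    Each corrected sum = actual sum - (wrong values) + (correct values).
    `sums` holds Σx, Σy, Σx², Σy², Σxy under the keys x, y, xx, yy, xy.
    """
    s = dict(sums)  # copy, so the caller's dictionary is not modified
    for sign, pairs in ((-1, wrong_pairs), (+1, correct_pairs)):
        for x, y in pairs:
            s["x"] += sign * x
            s["y"] += sign * y
            s["xx"] += sign * x * x
            s["yy"] += sign * y * y
            s["xy"] += sign * x * y
    num = n * s["xy"] - s["x"] * s["y"]
    den = math.sqrt(n * s["xx"] - s["x"] ** 2) * math.sqrt(n * s["yy"] - s["y"] ** 2)
    return s, num / den


sums = {"x": 125, "y": 100, "xx": 650, "yy": 460, "xy": 508}
fixed, r = corrected_r(25, sums,
                       wrong_pairs=[(6, 14), (8, 6)],
                       correct_pairs=[(8, 12), (6, 8)])
print(fixed)          # {'x': 125, 'y': 100, 'xx': 650, 'yy': 436, 'xy': 520}
print(round(r, 4))    # 0.6667
```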
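5. Prove the following:
1. $r = 1 - \dfrac{1}{2n}\sum\left(\dfrac{x - \bar{x}}{\sigma_x} - \dfrac{y - \bar{y}}{\sigma_y}\right)^2$
2. $r = \dfrac{1}{2n}\sum\left(\dfrac{x - \bar{x}}{\sigma_x} + \dfrac{y - \bar{y}}{\sigma_y}\right)^2 - 1$
3. $-1 \le r \le 1$

Proof:
Recall that $\sum (x - \bar{x})^2 = n\sigma_x^2$, $\sum (y - \bar{y})^2 = n\sigma_y^2$, $\sum (x - \bar{x})(y - \bar{y}) = n\operatorname{Cov}(x, y)$ and $r = \dfrac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$.

1. Consider

$$ \sum\left(\frac{x - \bar{x}}{\sigma_x} - \frac{y - \bar{y}}{\sigma_y}\right)^2 = \frac{\sum (x - \bar{x})^2}{\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{\sigma_y^2} - \frac{2\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} = \frac{n\sigma_x^2}{\sigma_x^2} + \frac{n\sigma_y^2}{\sigma_y^2} - \frac{2n\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = n + n - 2nr = 2n(1 - r) $$

Therefore

$$ r = 1 - \frac{1}{2n}\sum\left(\frac{x - \bar{x}}{\sigma_x} - \frac{y - \bar{y}}{\sigma_y}\right)^2 $$

2. Consider

$$ \sum\left(\frac{x - \bar{x}}{\sigma_x} + \frac{y - \bar{y}}{\sigma_y}\right)^2 = \frac{\sum (x - \bar{x})^2}{\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{\sigma_y^2} + \frac{2\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} = \frac{n\sigma_x^2}{\sigma_x^2} + \frac{n\sigma_y^2}{\sigma_y^2} + \frac{2n\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = n + n + 2nr = 2n(1 + r) $$

Therefore

$$ r = \frac{1}{2n}\sum\left(\frac{x - \bar{x}}{\sigma_x} + \frac{y - \bar{y}}{\sigma_y}\right)^2 - 1 $$

3. From result (1), $2n(1 - r)$ is a sum of squares, so $1 - r \ge 0$, i.e. $r \le 1$.
From result (2), $2n(1 + r)$ is a sum of squares, so $r + 1 \ge 0$, i.e. $r \ge -1$.
Hence $-1 \le r \le 1$.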
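A numerical spot-check (not a substitute for the proof) can make results (1) and (2) more concrete. The minimal sketch below, with the illustrative function name `identity_check`, computes r directly and from the right-hand sides of (1) and (2) for the data of Problem 3; all three values should coincide up to floating-point rounding.

```python
import math


def identity_check(x, y):
    """Return r and the right-hand sides of results (1) and (2) of Problem 5."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sigma_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = cov / (sigma_x * sigma_y)
    # Standardised deviations u = (x - x̄)/σx and v = (y - ȳ)/σy
    u = [(a - mx) / sigma_x for a in x]
    v = [(b - my) / sigma_y for b in y]
    result1 = 1 - sum((p - q) ** 2 for p, q in zip(u, v)) / (2 * n)
    result2 = sum((p + q) ** 2 for p, q in zip(u, v)) / (2 * n) - 1
    return r, result1, result2


x = [78, 89, 97, 69, 59, 79, 68, 57]
y = [125, 137, 156, 112, 107, 138, 123, 108]
r, r1, r2 = identity_check(x, y)
print(round(r, 4), round(r1, 4), round(r2, 4))   # 0.9488 0.9488 0.9488
```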
Homework Problems
1. Calculate the correlation coefficient from the following data.
x 1 2 3 4 5
y 2 5 3 8 7
Answer: r = 0.8062

2. Calculate the correlation coefficient from the following data.
x 3 5 6 9 10 12 15 20 22 28
y 10 12 15 18 20 22 23 30 32 34
Answer: r = 0.9834

3. If $\sigma_x = 2$, $\sigma_y = 3$ and $r = 0.4$, find the standard deviations of x + y and x − y.
Answer: 4.2190, 2.8636


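REGRESSION ANALYSIS
Definition: Regression is the theory of estimating the unknown value of one variable from the known value of the other variable.

Consider a set of values

X x1 x2 x3 ... xi ... xn
Y y1 y2 y3 ... yi ... yn

[Figure: the points (xi, yi) with a fitted line $Y = ax + b$; at each xi the observed value yi and the value Yi on the line are marked, with the vertical deviations labelled A1, A2, A3.]

Lines of Regression
Fit the straight line $Y = ax + b$, so that $Y_i = a x_i + b$ is the value on the line at $x = x_i$. The line of best fit is the one for which the sum of the squares of the deviations is a minimum:

$$ S = \sum_{i=1}^{n} (Y_i - y_i)^2 = \sum_{i=1}^{n} (a x_i + b - y_i)^2 \quad \text{(minimum)} $$

Differentiating w.r.t. a and b and equating to zero, we get

$$ \frac{\partial S}{\partial a} = 0 \;\Rightarrow\; a\sum x^2 + b\sum x = \sum xy \quad \cdots (1) $$
$$ \frac{\partial S}{\partial b} = 0 \;\Rightarrow\; a\sum x + nb = \sum y \quad \cdots (2) $$

These are the normal equations. Multiplying (2) by $\sum x$, multiplying (1) by $n$ and subtracting, we get

$$ a = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = b_{yx} $$

Substituting a in normal equation (2), we get

$$ b = \frac{\sum y}{n} - b_{yx}\frac{\sum x}{n} = \bar{y} - b_{yx}\bar{x} $$

Hence

$$ Y = ax + b = b_{yx}x + (\bar{y} - b_{yx}\bar{x}) \;\Rightarrow\; y - \bar{y} = b_{yx}(x - \bar{x}) $$

This is the regression equation (line) of y on x. Its slope is $b_{yx}$ and the line passes through $(\bar{x}, \bar{y})$.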
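The derivation above can be reproduced in a few lines of Python. This is a minimal sketch (the function name `least_squares_line` is ours) that solves normal equations (1) and (2) by the same elimination as in the text and returns the slope a = b_yx and the intercept b = ȳ − b_yx x̄.

```python
def least_squares_line(x, y):
    """Fit Y = a*x + b by solving the two normal equations
        a*Σx² + b*Σx = Σxy   ... (1)
        a*Σx  + n*b  = Σy    ... (2)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(p * q for p, q in zip(x, y))
    det = n * sxx - sx ** 2            # zero only when all x values are equal
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * sxy - sx * sy) / det      # slope a = b_yx
    b = sy / n - a * sx / n            # intercept b = ȳ - b_yx * x̄
    return a, b


a, b = least_squares_line([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(round(a, 4), round(b, 4))        # 1.3 1.1
```

Setting the partial derivatives to zero is what makes the residuals sum to zero, so the fitted line always passes through (x̄, ȳ).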
Role of coefficient of correlation in regression
[Figure: scatter diagrams of the points (xi, yi) with the regression line $y - \bar{y} = b_{yx}(x - \bar{x})$ for r = 1, r = 0.8 and r = 0.5.]
Regression Lines (Equations)
• If x is treated as the independent variable and y as the dependent variable, the straight line obtained is called the regression equation of y on x. It has the form

$$ y - \bar{y} = b_{yx}(x - \bar{x}), \qquad \text{where } b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \text{ or } b_{yx} = r\frac{\sigma_y}{\sigma_x} $$

• If y is treated as the independent variable and x as the dependent variable, the straight line obtained is called the regression equation of x on y. It has the form

$$ x - \bar{x} = b_{xy}(y - \bar{y}), \qquad \text{where } b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} \text{ or } b_{xy} = r\frac{\sigma_x}{\sigma_y} $$
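Points to Remember
1. $r^2 = b_{yx}\, b_{xy}$, i.e. $r = \pm\sqrt{b_{yx}\, b_{xy}}$, taking the common sign of the regression coefficients.
2. $r$, $b_{yx}$ and $b_{xy}$ have the same sign (all positive or all negative).
3. If $|b_{yx}| > 1$, then $|b_{xy}| < 1$, since $b_{yx}\, b_{xy} = r^2 \le 1$.
4. The point $(\bar{x}, \bar{y})$ satisfies both regression equations.
5. For the regression equation of y on x, $y - \bar{y} = b_{yx}(x - \bar{x})$, the slope is $b_{yx}$.
6. For the regression equation of x on y, $x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow y - \bar{y} = \dfrac{1}{b_{xy}}(x - \bar{x})$, so the slope is $\dfrac{1}{b_{xy}}$.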
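The formulas for the two regression lines and Points 1 and 2 fit into one small helper. In this sketch (the name `regression_coefficients` is illustrative), both coefficients share the numerator nΣxy − ΣxΣy, and r is recovered as ±√(b_yx b_xy) with the sign of the coefficients. On the data of Problem 1 below it reproduces b_yx = 1.3, b_xy = 0.5 and r ≈ 0.8062 (0.81 to two decimals).

```python
import math


def regression_coefficients(x, y):
    """Return x̄, ȳ, b_yx, b_xy and r for paired data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(p * q for p, q in zip(x, y))
    common = n * sxy - sx * sy               # shared numerator
    b_yx = common / (n * sxx - sx ** 2)      # regression coefficient of y on x
    b_xy = common / (n * syy - sy ** 2)      # regression coefficient of x on y
    # Points 1 and 2: r² = b_yx * b_xy, and r takes the common sign of b_yx, b_xy
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return sx / n, sy / n, b_yx, b_xy, r


mx, my, b_yx, b_xy, r = regression_coefficients([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(f"y - {my:g} = {b_yx:g}(x - {mx:g})")   # y - 5 = 1.3(x - 3)
print(f"x - {mx:g} = {b_xy:g}(y - {my:g})")   # x - 3 = 0.5(y - 5)
print(round(r, 4))                            # 0.8062
```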
Problems
1) Find the two regression lines and hence the coefficient of correlation
from the following data
x 1 2 3 4 5
y 2 5 3 8 7

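Solution:
From the given data we get $n = 5$, $\sum x = 15$, $\sum y = 25$, $\sum x^2 = 55$, $\sum y^2 = 151$, $\sum xy = 88$.

$$ \bar{x} = \frac{\sum x}{n} = 3, \quad \bar{y} = \frac{\sum y}{n} = 5, \quad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{65}{50} = 1.3, \quad b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} = \frac{65}{130} = 0.5 $$

• Regression equation of y on x is given by

$$ y - \bar{y} = b_{yx}(x - \bar{x}) \;\Rightarrow\; y - 5 = 1.3(x - 3) \;\Rightarrow\; y = 1.3x + 1.1 $$

• Regression equation of x on y is given by

$$ x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; x - 3 = 0.5(y - 5) \;\Rightarrow\; x = 0.5y + 0.5 $$

• The correlation coefficient is given by

$$ r^2 = b_{yx}\, b_{xy} = (1.3)(0.5) = 0.65 \;\Rightarrow\; r = 0.81 $$

(r is taken positive because both regression coefficients are positive.)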


2) For the following data, find the most likely value of x when y = 103.8, given r = −0.32.
x y
Mean 8.4 103
Standard deviation 1.21 6.4

Solution: From the given data, $\bar{x} = 8.4$, $\bar{y} = 103$, $\sigma_x = 1.21$, $\sigma_y = 6.4$.

$$ b_{xy} = r\frac{\sigma_x}{\sigma_y} = -0.32 \times \frac{1.21}{6.4} = -0.0605 $$

Regression equation of x on y is given by

$$ x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; x - 8.4 = -0.0605(y - 103) \;\Rightarrow\; x = -0.0605y + 14.6315 $$

Thus, when y = 103.8, $x = -0.0605(103.8) + 14.6315 = 8.3516$.
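Once b_xy = rσx/σy is known, the prediction is just the regression line of x on y evaluated at the given y. This sketch uses a hypothetical helper name `most_likely_x`.

```python
def most_likely_x(y_value, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate x from the regression line of x on y: x - x̄ = b_xy (y - ȳ)."""
    b_xy = r * sd_x / sd_y          # b_xy = r σx / σy
    return mean_x + b_xy * (y_value - mean_y)


x_hat = most_likely_x(103.8, mean_x=8.4, mean_y=103, sd_x=1.21, sd_y=6.4, r=-0.32)
print(round(x_hat, 4))              # 8.3516
```

At y = ȳ the function returns x̄, consistent with the line passing through (x̄, ȳ).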


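3) The regression equations of two variables x and y are 3x + 2y = 26 and 6x + y = 31. Find $\bar{x}$, $\bar{y}$ and r.
Solution:
Since $(\bar{x}, \bar{y})$ satisfies both regression equations, we get

$$ 3\bar{x} + 2\bar{y} = 26, \qquad 6\bar{x} + \bar{y} = 31 $$

Solving, we get $\bar{x} = 4$ and $\bar{y} = 7$.
• To find r:

$$ 3x + 2y = 26 \;\Rightarrow\; y = -\frac{3}{2}x + 13 $$
$$ 6x + y = 31 \;\Rightarrow\; y = -6x + 31 $$

Suppose $b_{yx} = -6$ and $\dfrac{1}{b_{xy}} = -\dfrac{3}{2}$, i.e. $b_{xy} = -\dfrac{2}{3}$. Then

$$ r^2 = b_{yx}\, b_{xy} = (-6)\left(-\frac{2}{3}\right) = 4 \;\Rightarrow\; |r| = 2 > 1, $$

which is not acceptable.
Hence let $b_{yx} = -\dfrac{3}{2}$ and $\dfrac{1}{b_{xy}} = -6$, i.e. $b_{xy} = -\dfrac{1}{6}$. Then

$$ r^2 = b_{yx}\, b_{xy} = \left(-\frac{3}{2}\right)\left(-\frac{1}{6}\right) = \frac{1}{4} \;\Rightarrow\; r = -0.5, $$

which is acceptable since $|r| \le 1$. (r is negative because both regression coefficients are negative.)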
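4) In a partially destroyed laboratory record of correlation data, only the following results were available: var(x) = 9, and the regression equations 4x − 5y + 33 = 0 and 20x − 9y = 107. Find $\bar{x}$, $\bar{y}$, r and $\sigma_y$.
Solution:
Since $(\bar{x}, \bar{y})$ satisfies both regression equations, we get

$$ 4\bar{x} - 5\bar{y} + 33 = 0, \qquad 20\bar{x} - 9\bar{y} = 107 $$

Solving, we get $\bar{x} = 13$ and $\bar{y} = 17$.
• To find r:

$$ 4x - 5y + 33 = 0 \;\Rightarrow\; y = \frac{4}{5}x + \frac{33}{5} $$
$$ 20x - 9y = 107 \;\Rightarrow\; y = \frac{20}{9}x - \frac{107}{9} $$

Suppose $b_{yx} = \dfrac{4}{5}$ and $\dfrac{1}{b_{xy}} = \dfrac{20}{9}$, i.e. $b_{xy} = \dfrac{9}{20}$. Then

$$ r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{\frac{4}{5} \cdot \frac{9}{20}} = 0.6 $$

• To find $\sigma_y$:
Since var(x) = 9, $\sigma_x = 3$. From $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$,

$$ \sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{\frac{4}{5} \times 3}{0.6} = 4 $$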
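Problems 3 and 4 follow the same recipe: solve the two equations for the means, then try one assignment of "y on x" and "x on y" and keep it only if $b_{yx} b_{xy} = r^2$ does not exceed 1. The sketch below automates that check. Assumptions: each line is supplied as a triple (a, b, c) meaning ax + by = c with integer coefficients (so `fractions.Fraction` keeps the arithmetic exact), and the helper name `analyse_regression_pair` is ours. For Problem 4 it also recovers σy from $b_{yx} = r\sigma_y/\sigma_x$.

```python
import math
from fractions import Fraction


def analyse_regression_pair(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y = c, with integer coefficients.

    Returns (x̄, ȳ, b_yx, b_xy, r). line1 is tried first as the line of
    y on x; if that makes r² = b_yx * b_xy exceed 1, the roles are swapped.
    """
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    # (x̄, ȳ) lies on both lines: solve the 2x2 system by Cramer's rule.
    mean_x = Fraction(c1 * b2 - c2 * b1, det)
    mean_y = Fraction(a1 * c2 - a2 * c1, det)
    for (ya, yb, _), (xa, xb, _) in ((line1, line2), (line2, line1)):
        b_yx = Fraction(-ya, yb)   # y on x: y = (-a/b) x + c/b
        b_xy = Fraction(-xb, xa)   # x on y: x = (-b/a) y + c/a
        r_squared = b_yx * b_xy
        if 0 <= r_squared <= 1:
            r = math.copysign(math.sqrt(r_squared), b_yx)  # common sign of b_yx, b_xy
            return mean_x, mean_y, b_yx, b_xy, r
    raise ValueError("neither assignment gives 0 <= r^2 <= 1")


# Problem 3: 3x + 2y = 26 and 6x + y = 31
print(analyse_regression_pair((3, 2, 26), (6, 1, 31)))
# Problem 4: 4x - 5y = -33 and 20x - 9y = 107, with var(x) = 9
mx, my, b_yx, b_xy, r = analyse_regression_pair((4, -5, -33), (20, -9, 107))
sigma_x = math.sqrt(9)
sigma_y = float(b_yx) * sigma_x / r      # from b_yx = r * sigma_y / sigma_x
print(mx, my, round(r, 4), round(sigma_y, 4))   # 13 17 0.6 4.0
```

Swapping the roles of the two lines replaces r² by its reciprocal, so unless r² = 1 exactly one assignment can give r² < 1; that is why the notes can reject the first guess in Problem 3 outright.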
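Angle between Regression Lines
Consider the two regression lines

$$ y - \bar{y} = b_{yx}(x - \bar{x}) \quad \cdots (1) $$
$$ x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; y - \bar{y} = \frac{1}{b_{xy}}(x - \bar{x}) \quad \cdots (2) $$

Slope of line (1): $m_1 = b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$

Slope of line (2): $m_2 = \dfrac{1}{b_{xy}} = \dfrac{1}{r}\dfrac{\sigma_y}{\sigma_x}$

If θ is the angle between the two lines, then

$$ \tan\theta = \frac{m_2 - m_1}{1 + m_1 m_2} = \frac{\dfrac{1}{r}\dfrac{\sigma_y}{\sigma_x} - r\dfrac{\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \frac{\dfrac{\sigma_y}{\sigma_x}\left(\dfrac{1 - r^2}{r}\right)}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} $$

For r = +1 or −1, tan θ = 0, so both lines coincide.

For r = 0, tan θ → ∞, so θ = π/2 and the lines are perpendicular.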
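To see the formula for tan θ at work, the sketch below evaluates it in two ways: directly, and from the slopes m1 and m2. The function names are illustrative, and the example uses r = 0.6, σx = 3, σy = 4 from Problem 4.

```python
import math


def tan_theta(r, sigma_x, sigma_y):
    """tan θ = ((1 - r²)/r) * σx σy / (σx² + σy²); infinite when r = 0."""
    if r == 0:
        return math.inf                  # θ = π/2: lines are perpendicular
    return (1 - r * r) / r * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)


def tan_theta_from_slopes(r, sigma_x, sigma_y):
    """Same quantity from m1 = r σy/σx and m2 = σy/(r σx); r must be non-zero."""
    m1 = r * sigma_y / sigma_x
    m2 = sigma_y / (r * sigma_x)
    return (m2 - m1) / (1 + m1 * m2)


# Values from Problem 4: r = 0.6, σx = 3, σy = 4
print(round(tan_theta(0.6, 3, 4), 4), round(tan_theta_from_slopes(0.6, 3, 4), 4))  # 0.512 0.512
```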


Homework Problems
1. Obtain the two regression lines and the coefficient of correlation from the following data.
x 1 3 4 2 5 8 9 10 13 15
y 8 6 10 8 12 16 16 10 32 32

Answer: y = 1.7647x + 2.647, x = 0.4401y + 0.3985, r = 0.8813

2. From the following data, obtain the two regression lines, the coefficient of correlation and the most likely value of y when x = 30.
x 25 28 35 32 31 36 29 38 34 32
y 43 46 49 41 36 32 31 30 33 39

Answer: y = −0.66429x + 59.257, x = −0.23367y + 40.88, r = −0.394


Spearman’s Coefficient of Correlation - Rank Correlation

• Spearman’s rank correlation is a measure of correlation between two ranked variables.
• Spearman’s correlation replaces the observations by their ranks in the calculation of the correlation coefficient.
• Consider a set of data for two variables x and y:

x x1 x2 x3 x4 ... xn
y y1 y2 y3 y4 ... yn

• Assign ranks to the x-series data and to the y-series data, giving the highest value rank 1 and the lowest value rank n.

x x1 x2 x3 x4 ... xn
y y1 y2 y3 y4 ... yn
Rx Rx1 Rx2 Rx3 Rx4 ... Rxn
Ry Ry1 Ry2 Ry3 Ry4 ... Ryn
Spearman’s Rank Correlation Coefficient - Formula
• Let d denote the difference in ranks, i.e. d = Rx − Ry.

x x1 x2 x3 x4 ... xn
y y1 y2 y3 y4 ... yn
Rx Rx1 Rx2 Rx3 Rx4 ... Rxn
Ry Ry1 Ry2 Ry3 Ry4 ... Ryn
d=Rx-Ry Rx1-Ry1 Rx2-Ry2 Rx3-Ry3 Rx4-Ry4 ... Rxn-Ryn

• Spearman’s rank correlation coefficient is given by

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad -1 \le \rho \le 1 $$

(This formula applies when ranks are not repeated.)
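The ranking rule (highest value gets rank 1) and the formula translate directly into Python. This minimal sketch, with illustrative function names, assumes there are no repeated values, as the formula requires; it raises an error otherwise rather than guessing how to share tied ranks. On the data of Problem 1 below it reproduces the ranks Rx and ρ ≈ 0.3214.

```python
def ranks_highest_first(values):
    """Rank 1 for the highest value and n for the lowest (no repeated values)."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values: this formula assumes ranks are not repeated")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """ρ = 1 - 6 Σd² / (n(n² - 1)) with d = Rx - Ry."""
    n = len(x)
    rx, ry = ranks_highest_first(x), ranks_highest_first(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


physics = [36, 43, 47, 28, 35, 50, 40]
maths = [73, 44, 35, 30, 20, 36, 40]
print(ranks_highest_first(physics))            # [5, 3, 2, 7, 6, 1, 4]
print(round(spearman_rho(physics, maths), 4))  # 0.3214
```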
Problems
1. Calculate the rank correlation coefficient from the following data
Marks in Physics 36 43 47 28 35 50 40
Marks in Maths 73 44 35 30 20 36 40

Solution:
Marks in Physics (x) 36 43 47 28 35 50 40
Marks in Maths (y) 73 44 35 30 20 36 40
Rx 5 3 2 7 6 1 4
Ry 1 2 5 6 7 4 3
d = Rx-Ry 4 1 -3 1 -1 -3 1
d² 16 1 9 1 1 9 1 (Σd² = 38)

The rank correlation coefficient is given by

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 38}{7(7^2 - 1)} = 0.3214 $$
2. Calculate the rank correlation coefficient from the following data
x 68 64 75 50 69 80 76 40 55
y 62 58 68 45 81 60 69 48 50
Solution:
x y Rx Ry d=Rx-Ry d²
68 62 5 4 1 1
64 58 6 6 0 0
75 68 3 3 0 0
50 45 8 9 -1 1
69 81 4 1 3 9
80 60 1 5 -4 16
76 69 2 2 0 0
40 48 9 8 1 1
55 50 7 7 0 0
Σd² = 28

The rank correlation coefficient is given by

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 28}{9(9^2 - 1)} = 0.7667 $$
3. In a beauty contest, 10 participants are ranked by 3 judges. Determine which pair of judges has a common taste in respect of beauty.

Judge 1 4 7 2 1 5 10 9 6 3 8
Judge 2 6 5 1 2 9 10 3 7 4 8
Judge 3 6 8 2 1 5 9 10 3 4 7

Solution:
To find: the rank correlation between Judges 1 & 2, Judges 1 & 3 and Judges 2 & 3 (d12 = Judge 1 − Judge 2, d13 = Judge 1 − Judge 3, d23 = Judge 2 − Judge 3).
Judge 1 Judge 2 Judge 3 d12 d13 d23 d12² d13² d23²
4 6 6 -2 -2 0 4 4 0
7 5 8 2 -1 -3 4 1 9
2 1 2 1 0 -1 1 0 1
1 2 1 -1 0 1 1 0 1
5 9 5 -4 0 4 16 0 16
10 10 9 0 1 1 0 1 1
9 3 10 6 -1 -7 36 1 49
6 7 3 -1 3 4 1 9 16
3 4 4 -1 -1 0 1 1 0
8 8 7 0 0 1 0 1 1
Total (Σd²) 64 18 94

• Rank correlation between Judges 1 and 2 is given by

$$ \rho_{12} = 1 - \frac{6\sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 64}{10(10^2 - 1)} = 0.6121 $$

• Rank correlation between Judges 1 and 3 is given by

$$ \rho_{13} = 1 - \frac{6\sum d_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 18}{10(10^2 - 1)} = 0.8909 $$

• Rank correlation between Judges 2 and 3 is given by

$$ \rho_{23} = 1 - \frac{6\sum d_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 94}{10(10^2 - 1)} = 0.4303 $$

Since $\rho_{13}$ is the highest, Judges 1 and 3 have a common taste in respect of beauty.
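When the data are already ranks, as with the three judges, ρ can be computed for every pair of judges and the largest value picked out. This sketch uses a hypothetical helper `spearman_from_ranks` and `itertools.combinations` to form the three pairs.

```python
from itertools import combinations


def spearman_from_ranks(r1, r2):
    """ρ = 1 - 6 Σd² / (n(n² - 1)) for two lists of ranks (no repeated ranks)."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


judges = {
    1: [4, 7, 2, 1, 5, 10, 9, 6, 3, 8],
    2: [6, 5, 1, 2, 9, 10, 3, 7, 4, 8],
    3: [6, 8, 2, 1, 5, 9, 10, 3, 4, 7],
}
rho = {(i, j): spearman_from_ranks(judges[i], judges[j])
       for i, j in combinations(sorted(judges), 2)}
best_pair = max(rho, key=rho.get)
for pair, value in rho.items():
    print(pair, round(value, 4))
print("closest agreement:", best_pair)   # closest agreement: (1, 3)
```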
Homework Problem

Calculate the rank correlation coefficient from the following data.
x 56 42 72 36 63 47 55 49 38 43 68 60
y 147 125 160 118 149 128 150 145 115 140 152 155

Answer: 0.9371
Thank You
