Module III Correlation and Regression

The document provides an overview of correlation and regression, explaining the concepts of correlation as a statistical tool to measure relationships between variables, and detailing types of correlation including positive and negative. It discusses various methods for calculating correlation coefficients, such as Karl Pearson’s and Spearman’s coefficients, and highlights the importance of understanding the nature and degree of correlation. Additionally, it includes examples and problems to illustrate the application of these concepts in real-world scenarios.

Uploaded by

sajiniossajini0
Copyright
© All Rights Reserved
Module III

CORRELATION & REGRESSION

CORRELATION

• Correlation is a statistical tool that helps to measure and analyze the degree of relationship between two variables.
• Correlation analysis deals with the association between two or more variables.

        X     Y     XY     X²
        6     82    492    36
        2     63    126    4
        1     57    57     1
        5     88    440    25
        3     68    204    9
        2     75    150    4
Total   19    433   1469   79

n = 6
b = 5.2
a = 55.69
y = 5.2x + 55.69
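
The totals in this table are enough to recover the fitted line. The sketch below assumes the usual least-squares formulas, b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) and a = ȳ - b·x̄, which the slide does not show explicitly; the variable names are illustrative. Without intermediate rounding it gives b ≈ 5.19 and a ≈ 55.72, so the slide's a = 55.69 appears to come from rounded intermediate values.

```python
# Least-squares line y = a + b*x for the six (X, Y) pairs in the table.
# Assumption: the standard least-squares formulas (not printed on the slide).
xs = [6, 2, 1, 5, 3, 2]
ys = [82, 63, 57, 88, 68, 75]

n = len(xs)
sum_x = sum(xs)                                  # column total of X
sum_y = sum(ys)                                  # column total of Y
sum_xy = sum(x * y for x, y in zip(xs, ys))      # column total of XY
sum_x2 = sum(x * x for x in xs)                  # column total of X squared

# Slope, then intercept from the two means.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"b = {b:.2f}, a = {a:.2f}")
```
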
Types of Correlation

Positive Correlation
• Values change in the same direction
• As X is increasing, Y is increasing
• As X is decreasing, Y is decreasing
• Examples: Height & Weight; Income & Expenditure

Negative Correlation
• Values change in the opposite direction
• As X is increasing, Y is decreasing
• As X is decreasing, Y is increasing
• Examples: Price & Demand; Stress & Happiness

Degree of Correlation

• Perfect Positive Correlation
• High Positive Correlation
• Moderate Positive Correlation
• High Negative Correlation
• Moderate Negative Correlation
• Perfect Negative Correlation
• No Correlation

Correlation Coefficient

Measuring Correlation Coefficients

• Karl Pearson’s coefficient of correlation
    • This method is applicable for finding the correlation coefficient between two numerical attributes
• Charles Spearman’s coefficient of correlation
    • This method is applicable for finding the correlation coefficient between two ordinal attributes
• Chi-square coefficient of correlation
    • This method is applicable for finding the correlation coefficient between two categorical attributes

Measuring Correlation Coefficients

• Charles Spearman’s coefficient of correlation
• Coefficient of correlation by the method of least squares
• Coefficient of correlation using simple regression coefficients
• Karl Pearson’s coefficient of correlation
• Scatter Diagram
• Concurrent Deviation Method

• Correlation can be either positive or negative
• Correlation can be either linear or non-linear
• Correlation can be simple, partial or multiple

Change in Independent Variable   Change in Dependent Variable   Nature of Correlation
Increase (+)                     Increase (+)                   Positive (+)
Decrease (-)                     Decrease (-)                   Positive (+)
Increase (+)                     Decrease (-)                   Negative (-)
Decrease (-)                     Increase (+)                   Negative (-)


Karl Pearson’s Correlation Coefficient

• The most widely used measure of the relationship between two variables
• Also called Pearson’s Product Moment Correlation

Definition: Karl Pearson’s correlation coefficient

Steps
Pearson Correlation
• Can be read as a standardized measure of covariance

Variance, Covariance and Standard deviation

Correlation Coefficient

• Pearson’s Product Moment Correlation
• Symbolized by r
• Covariance ÷ (product of the 2 SDs)
• Correlation is a standardized covariance

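Pearson Product Moment

$r = \dfrac{S_{xy}}{S_x S_y}$

r = Sample correlation coefficient
Sxy = Sample covariance
Sx = Sample standard deviation of x
Sy = Sample standard deviation of y

Computing Formula for Pearson’s r

$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}}$
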
PROBLEM: The following sample data are for the stereo and sound equipment store.

Week   # Commercials   Sales Volume
1      2               50
2      5               57
3      1               41
4      3               54
5      4               54
6      1               38
7      5               63
8      3               48
9      4               59
10     2               46

PROBLEM: The following sample data are for the stereo and sound equipment store (x̄ = 3, ȳ = 51).

Week    x    y    (x - x̄)   (y - ȳ)   (x - x̄)(y - ȳ)
1       2    50   -1        -1        1
2       5    57   2         6         12
3       1    41   -2        -10       20
4       3    54   0         3         0
5       4    54   1         3         3
6       1    38   -2        -13       26
7       5    63   2         12        24
8       3    48   0         -3        0
9       4    59   1         8         8
10      2    46   -1        -5        5
Total             0         0         99

Variance, Covariance and Standard deviation

Sxy = 99 / 9 = 11
Sx = √(20 / 9) ≈ 1.49
Sy = √(566 / 9) ≈ 7.93
r = Sxy / (Sx Sy) = 11 / (1.49 × 7.93) ≈ 0.93
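
As a check on the 0.93 result, here is a minimal sketch that computes r as the sample covariance divided by the product of the two sample standard deviations. It assumes the n - 1 divisor throughout, and the function name `sample_correlation` is illustrative rather than taken from the slides.

```python
import math

# Stereo and sound equipment store data: commercials (x) and sales volume (y).
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]


def sample_correlation(x, y):
    """Pearson's r as covariance / (sd_x * sd_y), using the n - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance: sum of products of deviations over n - 1.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations of x and y.
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


r = sample_correlation(commercials, sales)
print(round(r, 2))
```

The n - 1 divisors cancel in the ratio, so the same r results if n is used consistently in all three quantities.
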
PROBLEM

Student   Hrs studied (x)   Marks obtained, % (y)   x²    y²      xy
A         6                 82                      36    6724    492
B         2                 63                      4     3969    126
C         1                 57                      1     3249    57
D         5                 88                      25    7744    440
E         3                 68                      9     4624    204
F         2                 75                      4     5625    150
Total     19                433                     79    31935   1469

r = 0.86

Computational Example of r for the relationship between Shyness and Speeches

Compute r for the relationship between Shyness (X) and Speeches (Y).

        Shyness (X)   Speeches (Y)   XY    X²    Y²
        0             8              0     0     64
        2             10             20    4     100
        3             4              12    9     16
        6             6              36    36    36
        9             1              9     81    1
        10            3              30    100   9
Total   30            32             107   230   226

$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}}$

$r = \dfrac{(6 \times 107) - (30)(32)}{\sqrt{[6(230) - 30^2]\,[6(226) - 32^2]}} = \dfrac{-318}{\sqrt{(480)(332)}}$

r = -.797
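
The same arithmetic can be sketched in code. The function below applies the computing formula (raw sums, with a square root over the product of the two bracketed terms) to the Shyness and Speeches data; the name `pearson_r` is an illustrative choice, not something defined in the slides.

```python
import math


def pearson_r(x, y):
    """Pearson's r from raw sums (the 'computing formula')."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


shyness = [0, 2, 3, 6, 9, 10]
speeches = [8, 10, 4, 6, 1, 3]

r = pearson_r(shyness, speeches)
print(round(r, 3))
```

Working from raw sums avoids computing deviations from the mean, which is why this form suits hand calculation with a table of XY, X² and Y² columns.
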


Homework
1. A department of transportation’s study on driving speed and
mileage for midsize automobiles resulted in the following data.

Driving Speed   30   50   40   55   30   25   60   25   50   55
Mileage         28   25   25   23   30   32   21   35   26   25

Compute and interpret the sample correlation coefficient.

In Case of Grouped Data

Audit Time (days)   Frequency (fi)   Midpoint (Mi)   fiMi
10-14               4                12              48
15-19               8                17              136
20-24               5                22              110
25-29               2                27              54
30-34               1                32              32
Total               20                               380

Mean x̄ = 380 / 20 = 19

Audit Time (days)   fi   Mi   fiMi   (Mi - x̄)   (Mi - x̄)²   fi(Mi - x̄)²
10-14               4    12   48     -7         49          196
15-19               8    17   136    -2         4           32
20-24               5    22   110    3          9           45
25-29               2    27   54     8          64          128
30-34               1    32   32     13         169         169
Total               20        380                           570
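
The grouped-data tables can be reproduced with a few lines of code. This is a sketch under one assumption the slides do not state: that the sum of 570 is divided by n - 1 (a sample variance) rather than by n. Variable names are illustrative.

```python
import math

# Class midpoints and frequencies from the audit-time table.
midpoints = [12, 17, 22, 27, 32]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)  # total number of observations

# Grouped mean: sum of f_i * M_i divided by n.
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sum of f_i * (M_i - mean)^2, the last column of the second table.
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))

# Assumption: sample variance with the n - 1 divisor.
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

print(mean, sum_sq, variance, round(std_dev, 2))
```
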
Pearsonian Correlation: Assumptions

1. There is a linear relationship between the variables. The coefficient
describes that relationship but does not explain why the two variables
are related, so it is not proof of a cause-and-effect relationship.
2. The value of the correlation can be affected by the range of
scores.
3. One or two extreme points (outliers) can have a dramatic effect
on the value of a correlation.
4. The method assumes a cause-and-effect relationship between the
variables; as point 1 notes, the coefficient alone cannot prove one.
Pearsonian Coefficient: Merits and Limitations
• Merits
    • Most commonly used
    • Both the value and the direction can be obtained from r
• Limitations
    • Always assumes a linear relationship
    • Very often the result is misinterpreted
    • r is unduly affected by extreme values in the series
    • Takes more time to compute
Charles Spearman’s Correlation Coefficient
• This correlation measurement is also called Rank correlation.
• This technique is applicable for determining the degree of correlation between two variables in the case of ordinal data.
• We can assign a rank to the different values of a variable with an ordinal data type.
• Example:
    Height:    [VS, S, T, VT]          ranks 1, 2, 3, 4
    T-shirt:   [XS, S, L, XL, XXL]     ranks 11, 12, 13, 14, 15
Spearman Rank Correlation Coefficient (rs)
▪ It is a non-parametric measure of correlation.
▪ This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.
▪ The Spearman Rank correlation coefficient can be computed in the following cases:
    ▪ Both variables are quantitative.
    ▪ Both variables are qualitative ordinal.
    ▪ One variable is quantitative and the other is qualitative ordinal.
Procedure:
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
4. Square each di and compute ∑di², the sum of the squared values.
5. Apply the following formula:

$r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$

Entrance Rank   Final Rank   |d|   d²
1               10           9     81
2               7            5     25
3               6            3     9
4               4            0     0
5               8            3     9
6               3            3     9
7               1            6     36
8               9            1     1
9               2            7     49
10              5            5     25
Total                              244

rs = 1 - 6(244) / (10 × 99) = -0.4788
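
The entrance-rank example can be checked with a short sketch. It assumes the standard untied-ranks formula rs = 1 - 6Σd² / (n(n² - 1)); the function name `spearman_from_ranks` is illustrative.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation for two lists of ranks without ties."""
    n = len(rank_x)
    # Sum of squared rank differences; the sign of d does not matter.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


entrance_rank = list(range(1, 11))
final_rank = [10, 7, 6, 4, 8, 3, 1, 9, 2, 5]

r_s = spearman_from_ranks(entrance_rank, final_rank)
print(round(r_s, 4))
```

To four decimals the unrounded value is -0.4788, a moderate negative association between entrance rank and final rank.
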
Workers   J1   J2   J3
A         7    4    10
B         4    1    2
C         10   9    8
D         5    10   5
E         9    7    7
F         8    3    6
G         6    2    9
H         2    5    1
I         1    6    4
J         3    8    3

Which two judges have the nearest approach?
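
One way to approach this question is to compute the rank correlation for each pair of judges and pick the pair with the largest coefficient. The sketch below assumes the untied Spearman formula rs = 1 - 6Σd² / (n(n² - 1)); the dictionary layout and function name are illustrative.

```python
from itertools import combinations

# Ranks given to workers A..J by each judge.
ranks = {
    "J1": [7, 4, 10, 5, 9, 8, 6, 2, 1, 3],
    "J2": [4, 1, 9, 10, 7, 3, 2, 5, 6, 8],
    "J3": [10, 2, 8, 5, 7, 6, 9, 1, 4, 3],
}


def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation for two lists of ranks without ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Coefficient for every pair of judges.
results = {
    (a, b): spearman_from_ranks(ranks[a], ranks[b])
    for a, b in combinations(ranks, 2)
}

# The pair with the highest coefficient agrees most closely.
closest = max(results, key=results.get)

for pair, value in results.items():
    print(pair, round(value, 3))
print("closest pair:", closest)
```
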
Entrance Rank   Final Marks   Final Rank
1               70%           2
2               83%           1
3               60%           4
4               60%           4
5               50%           6
6               60%           4

Problem

In a study of the relationship between level of education and income,
the following data were obtained. Find the relationship between
them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60

Sample   X                 Y    Rank (X)   Rank (Y)   d      d²
A        Preparatory (3)   25   5          3          2      4
B        Primary (4)       10   6          5.5        0.5    0.25
C        University (1)    8    1.5        7          -5.5   30.25
D        Secondary (2)     10   3.5        5.5        -2     4
E        Secondary (2)     15   3.5        4          -0.5   0.25
F        Illiterate (5)    50   7          2          5      25
G        University (1)    60   1.5        1          0.5    0.25

∑d² = 64, so rs = 1 - 6(64) / (7 × 48) ≈ -0.14

Comment:
There is a weak inverse (negative) correlation between level of education
and income.
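
When values tie, each tied item receives the average of the rank positions it occupies. The sketch below reproduces the ranks in this problem and then applies the simplified formula rs = 1 - 6Σd² / (n(n² - 1)). Two assumptions are worth flagging: education is encoded with the numeric codes shown in the table (1 = University up to 5 = Illiterate), and the simplified formula is only an approximation when ties are present. The helper name `average_ranks` is illustrative.

```python
def average_ranks(values, descending=True):
    """Rank values, giving tied values the average of their positions."""
    ordered = sorted(values, reverse=descending)
    rank_of = {}
    for v in set(values):
        # 1-based positions this value occupies in the sorted order.
        positions = [i + 1 for i, o in enumerate(ordered) if o == v]
        rank_of[v] = sum(positions) / len(positions)
    return [rank_of[v] for v in values]


# Samples A..G. Education codes: 1 = University, 2 = Secondary,
# 3 = Preparatory, 4 = Primary, 5 = Illiterate (from the table).
education = [3, 4, 1, 2, 2, 5, 1]
income = [25, 10, 8, 10, 15, 50, 60]

rank_x = average_ranks(education, descending=False)  # code 1 ranks first
rank_y = average_ranks(income, descending=True)      # highest income ranks first

n = len(income)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_x)
print(rank_y)
print(sum_d2, round(r_s, 2))
```
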
To summarize

• Spearman’s Rank Correlation has two types of problems:
    • When ranks are given
        • Unique ranks
        • Equal ranks (averaged)
    • When ranks are not given
Rank Correlation

• Merits:
    • Simpler to understand and easier to apply
    • Can be used for qualitative data
    • Can be used when ranks are given in place of actual data, and also when actual data are given
    • Useful when the data are not normally distributed

Rank Correlation

• Demerits:
    • Cannot be used when a grouped frequency distribution is given
    • As n increases, the calculations become complex and tedious

When to use Rank Correlation

• When the initial data are in the form of ranks
• When N is fairly small
Concurrent Deviation Method
• Simplest of all methods
• Based on the direction of change of the X and Y variables

$r_c = \pm\sqrt{\pm\dfrac{2C - n}{n}}$

C = number of concurrent deviations, i.e. the number of positive signs obtained after multiplying Dx with Dy
n = number of pairs of observations compared

Steps
1. Find the direction of change of the X variable compared with the previous value: increasing “+”, decreasing “-” and constant “0”. Denote this column by Dx.
2. Find the same for Y and denote the column as Dy.
3. Multiply Dx and Dy and determine the number of positive signs.
4. Apply C and n in the equation. The sign of (2C - n) is used both inside and outside the square root.
Calculate the coefficient of correlation by the concurrent deviation method

X   60   55   50   56   30   70   40   35   80   80   75
Y   65   40   35   75   63   80   35   20   80   60   60

X    Dx   Y    Dy   Dx.Dy
60        65
55   -    40   -    +
50   -    35   -    +
56   +    75   +    +
30   -    63   -    +
70   +    80   +    +
40   -    35   -    +
35   -    20   -    +
80   +    80   +    +
80   0    60   -    0
75   -    60   0    0
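
The sign table can be generated mechanically. The sketch below assumes the standard concurrent deviation formula rc = ±√(±(2C - n)/n), where the sign of (2C - n) is applied both inside and outside the root, and takes n as the number of compared pairs of deviations (one fewer than the number of observations). Function names are illustrative.

```python
import math


def direction_signs(values):
    """+1, -1 or 0 for each value compared with the one before it."""
    return [(curr > prev) - (curr < prev)
            for prev, curr in zip(values, values[1:])]


def concurrent_deviation(x, y):
    dx = direction_signs(x)
    dy = direction_signs(y)
    n = len(dx)  # number of compared pairs of deviations
    # Concurrent deviations: both series moved in the same direction.
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    ratio = (2 * c - n) / n
    sign = 1 if ratio >= 0 else -1
    return c, n, sign * math.sqrt(abs(ratio))


x = [60, 55, 50, 56, 30, 70, 40, 35, 80, 80, 75]
y = [65, 40, 35, 75, 63, 80, 35, 20, 80, 60, 60]

c, n, r_c = concurrent_deviation(x, y)
print(c, n, round(r_c, 4))
```

For this data the last two rows contribute no concurrent deviation, because one of the two series either stays constant or the product of signs is zero.
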
Regression Analysis
• Regression analysis is a statistical method for formulating a mathematical model that depicts the relationship among variables. The model can be used to predict the values of the dependent variable, given the values of the independent variables.
• Classification of Regression Analysis Models
    • Linear regression models
        1. Simple linear regression
        2. Multiple linear regression
    • Non-linear regression models
SCATTER DIAGRAM
• A graph of observed plotted points, where each point represents the values of X and Y as a coordinate
• Portrays the relationship between these two variables graphically
• To determine the extent of association, look at the scatter of the various points
• The wider the scatter, the less close the relationship
• The closer the points come to falling on a line passing through them, the higher the degree of association
Problem
• The following data represent the money spent on advertising a product and the respective profits realized from each advertising period for the given product. The amounts are in thousands of dollars. Assume profit to be the dependent variable and advertising the independent variable.

Adv   Profit
5     8
6     7
7     9
8     10
9     13
10    12
11    13
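Simple Linear Regression Model

Slope of the Regression line
y = a + bx

Here b is the slope of the line and a is the intercept, the value of y when x = 0. By the method of least squares:

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \bar{y} - b\,\bar{x}$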
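
As an illustration, the line y = a + bx can be fitted to the advertising and profit data from the problem above. This sketch assumes the standard least-squares formulas b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b·x̄; the function names and the prediction at x = 12 are illustrative and not part of the original problem.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def predict(a, b, x_new):
    """Predicted y on the fitted line for a new x."""
    return a + b * x_new


# Amounts in thousands of dollars.
advertising = [5, 6, 7, 8, 9, 10, 11]   # independent variable
profit = [8, 7, 9, 10, 13, 12, 13]      # dependent variable

a, b = fit_line(advertising, profit)
print(f"y = {a:.2f} + {b:.4f}x")
print(round(predict(a, b, 12), 2))  # hypothetical advertising spend of 12
```

The slope is read as the estimated change in profit for each additional thousand dollars of advertising, within the observed range of the data.
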