
Correlation - and - Regression - Analysis

The document discusses correlation and regression analysis techniques. It defines correlation analysis and degrees of correlation. It also describes the Pearson correlation coefficient and Spearman correlation coefficient methods. Additionally, it provides examples to calculate coefficients. Finally, it introduces regression analysis and the simple linear regression model.

Uploaded by

iha8le
Copyright
© © All Rights Reserved

Correlation and regression analysis

Statistics and Probability, MTH 281

Correlation analysis

Correlation analysis is a basic tool of statistical analysis. It measures the strength and direction of the relationship between two variables.

Degrees of correlation

The following rule-of-thumb categories classify a correlation by the sign and size of the coefficient r; the exact cutoffs are conventions and vary between textbooks:

• Perfect Positive Correlation (r = 1): Data points lie exactly on a positively sloped straight line.

• Strong Positive Correlation (0.7 ≤ r < 1): A strong direct relationship.

• Moderate Positive Correlation (0.3 ≤ r < 0.7): A moderate direct relationship.

• Weak Positive Correlation (0 < r < 0.3): A weak direct relationship.

• No Correlation (r = 0): No linear relationship.

• Weak Negative Correlation (−0.3 < r < 0): A weak inverse relationship.

• Moderate Negative Correlation (−0.7 < r ≤ −0.3): A moderate inverse relationship.

• Strong Negative Correlation (−1 < r ≤ −0.7): A strong inverse relationship.

• Perfect Negative Correlation (r = −1): Data points lie exactly on a negatively sloped straight line.
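Expressed in code, the categories above amount to a threshold lookup on |r| plus the sign of r. The following is a minimal sketch, assuming the symmetric cutoffs 0.3 and 0.7 used in the list; the function name `classify_correlation` is illustrative and not part of the notes.

```python
def classify_correlation(r: float) -> str:
    """Label r with the rule-of-thumb cutoffs 0.3 and 0.7 (assumed; cutoffs vary)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude == 0.0:
        return "no correlation"
    if magnitude == 1.0:
        strength = "perfect"
    elif magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    # The sign of r gives the direction of the relationship
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(classify_correlation(0.93))   # strong positive correlation
print(classify_correlation(-0.24))  # weak negative correlation
```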

Methods of correlation

The most common approaches to measure the correlation are:

• Pearson correlation coefficient

• Spearman correlation coefficient

1 Pearson correlation coefficient

Definition: The Pearson correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two continuous variables. The mathematical formula of the Pearson correlation coefficient is given by

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}, $$

where r is the Pearson correlation coefficient, n is the number of paired observations, x_i is the i-th value of the variable x and y_i is the i-th value of the variable y. The value of r always lies between −1 and 1.
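The formula translates directly into code. Below is a minimal sketch of a hypothetical helper (`pearson_r` is an illustrative name) that computes the five sums and applies the formula. It assumes both samples have the same length and that neither variable is constant, because the denominator is zero in that case.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient from the computational (raw-sums) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        # A constant variable has zero spread, so r is undefined
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


print(round(pearson_r([1, 5, 7, 9, 11, 13, 15, 5], [4, 7, 9, 11, 13, 15, 17, 12]), 2))  # 0.93
```

With very large values or nearly constant data, the raw-sums form can lose precision to cancellation; subtracting the means first is numerically safer.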

Example 1

Consider the following paired values of x and y:

x 1 5 7 9 11 13 15 5
y 4 7 9 11 13 15 17 12

Find the Pearson correlation coefficient.

Solution

The formula for the Pearson correlation coefficient is

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}. $$

Here, n = 8, and the required sums are

$$ \sum x_i = 66, \quad \sum y_i = 88, \quad \sum x_i^2 = 696, \quad \sum y_i^2 = 1094, \quad \sum x_i y_i = 854. $$

Substituting these values:

$$ r = \frac{8 \times 854 - 66 \times 88}{\sqrt{8 \times 696 - 66^2}\,\sqrt{8 \times 1094 - 88^2}} = \frac{1024}{\sqrt{1212}\,\sqrt{1008}} \approx 0.93. $$

The relationship here is strong and direct.
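The five sums used in Example 1 can be checked mechanically. This sketch simply recomputes them from the table and applies the same formula step by step.

```python
from math import sqrt

# Example 1 data from the table above
x = [1, 5, 7, 9, 11, 13, 15, 5]
y = [4, 7, 9, 11, 13, 15, 17, 12]
n = len(x)

sums = {
    "x": sum(x),
    "y": sum(y),
    "x2": sum(v * v for v in x),
    "y2": sum(v * v for v in y),
    "xy": sum(a * b for a, b in zip(x, y)),
}
print(sums)  # {'x': 66, 'y': 88, 'x2': 696, 'y2': 1094, 'xy': 854}

numerator = n * sums["xy"] - sums["x"] * sums["y"]  # 1024
left = n * sums["x2"] - sums["x"] ** 2              # 1212
right = n * sums["y2"] - sums["y"] ** 2             # 1008
r = numerator / (sqrt(left) * sqrt(right))
print(round(r, 2))  # 0.93
```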

Example 2

Consider the following paired values of x and y:

x 6 8 10 13 14 15
y 3 2 5 7 4 9

Find the Pearson correlation coefficient.

Solution

Here, n = 6, and the required sums are

$$ \sum x_i = 66, \quad \sum y_i = 30, \quad \sum x_i^2 = 790, \quad \sum y_i^2 = 184, \quad \sum x_i y_i = 366. $$

Substituting these values:

$$ r = \frac{6 \times 366 - 66 \times 30}{\sqrt{6 \times 790 - 66^2}\,\sqrt{6 \times 184 - 30^2}} = \frac{216}{\sqrt{384}\,\sqrt{204}} \approx 0.77. $$

The relationship here is strong and positive.
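An algebraically equivalent way to write r centres the data first: r = S_xy / sqrt(S_xx · S_yy), where S_xy = Σ(x − x̄)(y − ȳ), S_xx = Σ(x − x̄)² and S_yy = Σ(y − ȳ)². The notes do not use this form; the sketch below applies it to the Example 2 data as an independent cross-check, including the raw sum Σxy, which is linked to S_xy by S_xy = Σxy − (Σx)(Σy)/n.

```python
from math import sqrt

# Example 2 data
x = [6, 8, 10, 13, 14, 15]
y = [3, 2, 5, 7, 4, 9]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n                        # 11.0, 5.0
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # 36.0
s_xx = sum((a - x_bar) ** 2 for a in x)                      # 64.0
s_yy = sum((b - y_bar) ** 2 for b in y)                      # 34.0
r = s_xy / sqrt(s_xx * s_yy)

# The raw sum of products links the two forms: s_xy = sum(xy) - sum(x) * sum(y) / n
sum_xy = sum(a * b for a, b in zip(x, y))
print(sum_xy, sum_xy - sum(x) * sum(y) / n, round(r, 2))     # 366 36.0 0.77
```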

2 Spearman correlation coefficient

Definition: The Spearman rank correlation is a non-parametric statistical measure that assesses the strength and direction of the monotonic relationship between two variables. Unlike the Pearson correlation, the Spearman correlation does not assume a specific distribution of the data or a linear relationship. Instead, it operates on the ranks of the data, making it suitable for ordinal or ranked variables. Mathematically, the Spearman correlation coefficient is given by

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, $$

where r is the Spearman correlation coefficient, d is the difference in ranks for each pair of observations and n is the number of data points or observations. This formula is exact when there are no tied values. When ties occur, the tied values receive the average of the ranks they occupy, and the formula becomes an approximation (the exact value is then Pearson's r computed on the ranks).
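Ranking with averaged ties and the shortcut formula can be sketched as follows. The helper names `average_ranks` and `spearman_shortcut` are illustrative, not from the notes, and the shortcut is only exact when neither variable has ties.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank from smallest (rank 1) to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; use their mean
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_shortcut(xs: Sequence[float], ys: Sequence[float]) -> float:
    """1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when neither variable has ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(average_ranks([1, 5, 7, 9, 11, 13, 15, 5]))  # [1.0, 2.5, 4.0, 5.0, 6.0, 7.0, 8.0, 2.5]
```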

Example 1

Consider the following paired values of x and y:

x 10 8 12 7 5 9 13 15
y 7 4 2 10 8 11 5 9

Find the Spearman correlation coefficient.

Solution

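To find the Spearman correlation coefficient, first rank each variable separately (rank 1 for the smallest value), then compute the rank difference d = (rank of x) − (rank of y) and its square for each pair: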

x y Rank of x Rank of y d d2
10 7 5 4 1 1
8 4 3 2 1 1
12 2 6 1 5 25
7 10 2 7 -5 25
5 8 1 5 -4 16
9 11 4 8 -4 16
13 5 7 3 4 16
15 9 8 6 2 4

Substituting Σd² = 104 and n = 8 into the formula:

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 104}{8(8^2 - 1)} = 1 - \frac{624}{504} \approx -0.24. $$

The correlation here is weak and negative (inverse).
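Because all values in this example are distinct, ranking can be done with a simple position lookup in the sorted list. This sketch rebuilds the d column and the sum of d² (the helper `simple_ranks` is illustrative and would give wrong ranks if there were ties).

```python
x = [10, 8, 12, 7, 5, 9, 13, 15]
y = [7, 4, 2, 10, 8, 11, 5, 9]


def simple_ranks(values):
    """Rank 1 = smallest value; valid only when all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_x = simple_ranks(x)  # [5, 3, 6, 2, 1, 4, 7, 8]
rank_y = simple_ranks(y)  # [4, 2, 1, 7, 5, 8, 3, 6]
d = [a - b for a, b in zip(rank_x, rank_y)]
sum_d2 = sum(v * v for v in d)
n = len(x)
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(d, sum_d2, round(r, 2))  # [1, 1, 5, -5, -4, -4, 4, 2] 104 -0.24
```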

Example 2

Consider the following paired values of x and y:

x 1 5 7 9 11 13 15 5
y 4 7 9 11 13 15 17 12

Find the Spearman correlation coefficient.

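Solution

The value 5 appears twice in x, so these two tied values share the average of ranks 2 and 3, that is, 2.5.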

x y Rank of x Rank of y d d2
1 4 1 1 0 0
5 7 2.5 2 0.5 0.25
7 9 4 3 1 1
9 11 5 4 1 1
11 13 6 6 0 0
13 15 7 7 0 0
15 17 8 8 0 0
5 12 2.5 5 -2.5 6.25

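Substituting Σd² = 8.5 and n = 8 into the formula:

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8.5}{8(8^2 - 1)} = 1 - \frac{51}{504} \approx 0.90. $$

Because x contains a tie, this shortcut value is approximate; computing Pearson's r on the ranks gives the same value to two decimal places.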

The correlation here is direct and strong.
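With a tie in x, the shortcut formula and the exact Spearman coefficient (Pearson's r computed on the average ranks) differ slightly. The sketch below compares them for this example; `average_ranks` here uses a compact O(n²) counting rule, which is fine for small tables but is an illustrative choice, not the notes' procedure.

```python
from math import sqrt


def average_ranks(values):
    """Rank 1 = smallest; tied values share the average of their ranks (O(n^2) sketch)."""
    # rank = (number of smaller values) + (number of equal values + 1) / 2
    return [sum(u < v for u in values) + (sum(u == v for u in values) + 1) / 2 for v in values]


def pearson(a, b):
    """Pearson's r in mean-centred form."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)


x = [1, 5, 7, 9, 11, 13, 15, 5]
y = [4, 7, 9, 11, 13, 15, 17, 12]
rank_x, rank_y = average_ranks(x), average_ranks(y)
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # 8.5
shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
exact = pearson(rank_x, rank_y)
print(round(shortcut, 4), round(exact, 4))  # 0.8988 0.8982
```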

Regression analysis
Regression analysis is a powerful statistical method that allows us to explore relationships
between variables and make predictions based on observed data.

Note: There are several types of regression analysis, such as simple linear regression, multiple linear regression and logistic regression. Here, we discuss only simple linear regression.

Simple linear regression

Definition: Simple linear regression is a statistical method that predicts one outcome (dependent) variable from a single predictor (independent) variable, assuming a straight-line relationship between the two variables.

The equation of the regression line

In simple linear regression, the equation of the regression line is

$$ \hat{y} = b_0 + b_1 x. $$

Here,

• ŷ represents the predicted value of the dependent variable (y),

• x is the independent variable,

• b0 is the intercept (where the line intersects the y-axis),

• b1 is the slope of the line (indicating the change in y for a unit change in x).

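To calculate the slope b1, the formula is

$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, $$

and the intercept b0 can be calculated using the formula

$$ b_0 = \bar{y} - b_1 \bar{x}, $$

where

$$ \bar{y} = \frac{\sum y}{n}, \qquad \bar{x} = \frac{\sum x}{n}. $$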
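The two formulas above can be sketched as a small fitting routine. The names `fit_simple_linear_regression` and `predict` are hypothetical; the sketch assumes at least two points and that the x values are not all equal, since the slope's denominator is zero in that case.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (b0, b1) for y-hat = b0 + b1*x using the raw-sums formulas above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    b0 = sum_y / n - b1 * (sum_x / n)  # b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0: float, b1: float, x: float) -> float:
    """Predicted value y-hat on the fitted line."""
    return b0 + b1 * x


b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 2x + 1
print(b0, b1, predict(b0, b1, 10))  # 1.0 2.0 21.0
```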

Example 1

Consider the following paired values of x and y:

x 6 8 10 13 14 15
y 3 2 5 7 4 9

Find:

1. The equation of the regression line.

2. The predicted value of y when x = 19.

Solution

1. We make the following table to find the regression line.

x y xy x²
6 3 18 36
8 2 16 64
10 5 50 100
13 7 91 169
14 4 56 196
15 9 135 225
Total 66 30 366 790

Now, to calculate b1:

$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{6 \times 366 - 66 \times 30}{6 \times 790 - 66^2} = \frac{216}{384} = 0.5625. $$

To find b0, we first compute the means:

$$ \bar{x} = \frac{\sum x}{n} = \frac{66}{6} = 11, \qquad \bar{y} = \frac{\sum y}{n} = \frac{30}{6} = 5. $$

Substituting these values, we have

$$ b_0 = \bar{y} - b_1 \bar{x} = 5 - 0.5625 \times 11 = -1.1875. $$

The equation of the regression line is

$$ \hat{y} = b_0 + b_1 x = -1.1875 + 0.5625x \approx -1.19 + 0.56x. $$

2. At x = 19, the predicted value of y is

$$ \hat{y} = -1.1875 + 0.5625 \times 19 = 9.5. $$
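Two standard properties of a least-squares line with an intercept give a quick check of this example: the residuals y − ŷ sum to zero, and the line passes through (x̄, ȳ). The sketch below computes the slope in the mean-centred form b1 = S_xy / S_xx, which is algebraically equivalent to the formula above but is a choice of this sketch rather than the notes' method, then checks both properties and the prediction at x = 19.

```python
# Example 1 (regression) data
x = [6, 8, 10, 13, 14, 15]
y = [3, 2, 5, 7, 4, 9]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # 36.0
s_xx = sum((a - x_bar) ** 2 for a in x)                      # 64.0
b1 = s_xy / s_xx              # slope
b0 = y_bar - b1 * x_bar       # intercept

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
print(b1, b0)                 # 0.5625 -1.1875
print(sum(residuals))         # residuals of a least-squares line sum to zero
print(b0 + b1 * x_bar == y_bar, b0 + b1 * 19)  # True 9.5
```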

Example 2

Consider the following paired values of x and y:

x 6 8 9 8 7 6 5 6 5 5
y 10 13 15 14 9 7 6 6 5 5

Find:

1. The equation of the regression line.

2. The predicted value of y when x = 16.

Solution

1.

x y xy x²
6 10 60 36
8 13 104 64
9 15 135 81
8 14 112 64
7 9 63 49
6 7 42 36
5 6 30 25
6 6 36 36
5 5 25 25
5 5 25 25
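Total 65 90 632 441

$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10 \times 632 - 65 \times 90}{10 \times 441 - 65^2} = \frac{470}{185} \approx 2.54. $$

$$ \bar{x} = \frac{\sum x}{n} = \frac{65}{10} = 6.5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{90}{10} = 9. $$

$$ b_0 = \bar{y} - b_1 \bar{x} = 9 - 2.54 \times 6.5 = -7.51. $$

The equation of the regression line is

$$ \hat{y} = b_0 + b_1 x = -7.51 + 2.54x. $$

2. At x = 16, the predicted value of y is

$$ \hat{y} = -7.51 + 2.54 \times 16 = 33.13. $$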
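Because x = 16 lies far from x̄ = 6.5, rounding the slope early noticeably shifts b0 and the prediction. The sketch below uses Python's standard `fractions.Fraction` to carry this example out exactly, then repeats it with the slope rounded to two and to one decimal place for comparison.

```python
from fractions import Fraction

# Example 2 (regression) data
x = [6, 8, 9, 8, 7, 6, 5, 6, 5, 5]
y = [10, 13, 15, 14, 9, 7, 6, 6, 5, 5]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Exact slope, means and intercept as fractions (no rounding at all)
b1_exact = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
x_bar, y_bar = Fraction(sum_x, n), Fraction(sum_y, n)
b0_exact = y_bar - b1_exact * x_bar
y16_exact = b0_exact + b1_exact * 16
print(b1_exact, b0_exact, round(float(y16_exact), 2))  # 94/37 -278/37 33.14

# Repeat with the slope rounded before b0 and the prediction are computed
for digits in (2, 1):
    b1 = round(float(b1_exact), digits)
    b0 = float(y_bar) - b1 * float(x_bar)
    print(digits, b1, round(b0, 2), round(b0 + b1 * 16, 2))
```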
