Correlation and Regression Analysis
Correlation analysis
Degrees of correlation
• Perfect Positive Correlation (r = 1): Data points form a positively sloped straight line.
• Perfect Negative Correlation (r = −1): Data points form a negatively sloped straight line.
Methods of correlation
Definition: The Pearson correlation coefficient is a statistical measure that quantifies the
strength and direction of a linear relationship between two continuous variables. The
mathematical formula of the Pearson correlation coefficient is given by
\[
r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}},
\]
where $r$ is the Pearson correlation coefficient, $x_i$ is the $i$-th value of the variable $x$,
$y_i$ is the $i$-th value of the variable $y$, and $n$ is the number of paired observations.
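As a minimal sketch, the formula can be written as a short Python function. The name `pearson_r` and the two small data sets are illustrative assumptions, not part of the original notes; the points are chosen to lie exactly on a straight line, so the results reproduce the perfect positive and perfect negative cases.

```python
import math

def pearson_r(x, y):
    """Pearson correlation via the sum-based formula (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data lying exactly on a line (for illustration only).
x = [1, 2, 3, 4]
y_up = [3, 5, 7, 9]        # y = 2x + 1
y_down = [-2, -4, -6, -8]  # y = -2x

print(round(pearson_r(x, y_up), 6))    # 1.0
print(round(pearson_r(x, y_down), 6))  # -1.0
```

If either variable is constant, the denominator is zero and the function raises `ZeroDivisionError`; the coefficient is undefined in that case.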
Example 1
x 1 5 7 9 11 13 15 5
y 4 7 9 11 13 15 17 12
Solution
Here, n = 8, and we need to do the following calculations:
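\[
\sum x_i = 66, \quad \sum y_i = 88, \quad \sum x_i^2 = 696, \quad \sum y_i^2 = 1094, \quad \sum x_i y_i = 854.
\]
\[
r = \frac{8 \times 854 - 66 \times 88}{\sqrt{8 \times 696 - 66^2}\,\sqrt{8 \times 1094 - 88^2}} \approx 0.93.
\]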
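The five sums for Example 1 can be recomputed from the raw data. The following is a sketch only; the variable names are illustrative.

```python
import math

# Data from Example 1.
x = [1, 5, 7, 9, 11, 13, 15, 5]
y = [4, 7, 9, 11, 13, 15, 17, 12]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 66 88 696 1094 854

# Substitute the sums into the Pearson formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(round(r, 2))  # 0.93
```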
Example 2
x 6 8 10 13 14 15
y 3 2 5 7 4 9
Solution
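Here, n = 6, and the required sums are:
\[
\sum x_i = 66, \quad \sum y_i = 30, \quad \sum x_i^2 = 790, \quad \sum y_i^2 = 184, \quad \sum x_i y_i = 366.
\]
\[
r = \frac{6 \times 366 - 66 \times 30}{\sqrt{6 \times 790 - 66^2}\,\sqrt{6 \times 184 - 30^2}} \approx 0.77.
\]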
2 Spearman correlation coefficient
Definition: The Spearman rank correlation is a non-parametric statistical measure that
assesses the strength and direction of the monotonic relationship between two variables. Unlike
the Pearson correlation, the Spearman correlation does not assume a specific distribution of
the data or a linear relationship. Instead, it operates on the ranks of the data, making it
suitable for ordinal or ranked variables. Mathematically, the Spearman correlation coefficient
is given by
\[
r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)},
\]
where $r$ is the Spearman correlation coefficient, $d$ is the difference in ranks for each pair of
observations and $n$ is the number of data points or observations.
Example 1
x 10 8 12 7 5 9 13 15
y 7 4 2 10 8 11 5 9
Solution
To find the Spearman correlation coefficient, we must calculate the ranks of x and y as
follows.
x    y    Rank of x   Rank of y   d    d^2
10   7    5           4           1    1
8    4    3           2           1    1
12   2    6           1           5    25
7    10   2           7           -5   25
5    8    1           5           -4   16
9    11   4           8           -4   16
13   5    7           3           4    16
15   9    8           6           2    4
Now, applying the formula of the Spearman correlation coefficient with $\sum d^2 = 104$ and $n = 8$, we get
\[
r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 104}{8(8^2 - 1)} \approx -0.24.
\]
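The ranking step and the rank-difference formula can be sketched in Python as follows. The helper `ranks_no_ties` is a hypothetical name and assumes all values are distinct, which holds for this example; the script recomputes the ranks, the sum of squared differences and r from the raw data.

```python
def ranks_no_ties(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Data from Example 1.
x = [10, 8, 12, 7, 5, 9, 13, 15]
y = [7, 4, 2, 10, 8, 11, 5, 9]
n = len(x)

rank_x = ranks_no_ties(x)
rank_y = ranks_no_ties(y)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(v * v for v in d)

r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_x)       # [5, 3, 6, 2, 1, 4, 7, 8]
print(rank_y)       # [4, 2, 1, 7, 5, 8, 3, 6]
print(sum_d2)       # 104
print(round(r, 2))  # -0.24
```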
Example 2
x 1 5 7 9 11 13 15 5
y 4 7 9 11 13 15 17 12
Solution
x    y    Rank of x   Rank of y   d      d^2
1    4    1           1           0      0
5    7    2.5         2           0.5    0.25
7    9    4           3           1      1
9    11   5           4           1      1
11   13   6           6           0      0
13   15   7           7           0      0
15   17   8           8           0      0
5    12   2.5         5           -2.5   6.25
Now, applying the formula of the Spearman correlation coefficient with $\sum d^2 = 8.5$ and $n = 8$, we get
\[
r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8.5}{8(8^2 - 1)} \approx 0.9.
\]
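In this example the value x = 5 occurs twice, so both occurrences share the average of ranks 2 and 3, which is 2.5. A minimal sketch of that tie-handling rule is shown here; the helper name `average_ranks` is an assumption for illustration.

```python
def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first 1-based position of v in sorted order
        count = order.count(v)       # how many times v occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

# Data from Example 2.
x = [1, 5, 7, 9, 11, 13, 15, 5]
y = [4, 7, 9, 11, 13, 15, 17, 12]
n = len(x)

rank_x = average_ranks(x)
rank_y = average_ranks(y)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rank_x)       # [1.0, 2.5, 4.0, 5.0, 6.0, 7.0, 8.0, 2.5]
print(sum_d2)       # 8.5
print(round(r, 2))  # 0.9
```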
Regression analysis
Regression analysis is a powerful statistical method that allows us to explore relationships
between variables and make predictions based on observed data.
Note: There are several types of regression analysis, such as simple linear regression, multiple linear
regression and logistic regression. Here, we discuss only simple linear regression.
Definition: Simple linear regression analysis is a statistical method used to predict
one outcome from a single predictor, assuming a straight-line relationship between the two
variables.
In simple linear regression, the equation of the regression line is
\[
\hat{y} = b_0 + b_1 x.
\]
Here,
• b0 is the intercept of the line (the predicted value of y when x = 0).
• b1 is the slope of the line (indicating the change in y for a unit change in x).
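To calculate $b_1$ and $b_0$, the formulas are:
\[
b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
\]
where
\[
\bar{y} = \frac{\sum y}{n}, \qquad \bar{x} = \frac{\sum x}{n}.
\]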
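These two formulas translate directly into code. The sketch below uses a hypothetical helper `fit_line` and made-up points that lie exactly on the line y = 1 + 2x, so the expected slope and intercept are known in advance.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n   # b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical points on the line y = 1 + 2x (for illustration only).
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]

b0, b1 = fit_line(x, y)
print(b0, b1)  # 1.0 2.0
```

The denominator is zero when all x values are equal, in which case no unique slope exists and the function raises `ZeroDivisionError`.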
Example 1
x 6 8 10 13 14 15
y 3 2 5 7 4 9
Solution
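x    y    xy    x^2
6    3    18    36
8    2    16    64
10   5    50    100
13   7    91    169
14   4    56    196
15   9    135   225
Here, n = 6, $\sum x = 66$, $\sum y = 30$, $\sum xy = 366$ and $\sum x^2 = 790$. Now, to calculate $b_1$ and $b_0$:
\[
b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
    = \frac{6 \times 366 - 66 \times 30}{6 \times 790 - 66^2}
    = \frac{216}{384} \approx 0.56,
\]
\[
b_0 = \bar{y} - b_1 \bar{x} = 5 - 0.5625 \times 11 \approx -1.19.
\]
\[
\hat{y} = b_0 + b_1 x = -1.19 + 0.56x.
\]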
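The column totals and the coefficients for this example can be recomputed from the raw data with a few lines of Python. This is a sketch with illustrative variable names.

```python
# Data from Example 1.
x = [6, 8, 10, 13, 14, 15]
y = [3, 2, 5, 7, 4, 9]
n = len(x)

xy = [a * b for a, b in zip(x, y)]   # the xy column
x2 = [a * a for a in x]              # the x^2 column
print(sum(x), sum(y), sum(xy), sum(x2))  # 66 30 366 790

b1 = (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x2) - sum(x) ** 2)
b0 = sum(y) / n - b1 * sum(x) / n
print(b1, b0)  # 0.5625 -1.1875
```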
Example 2
x 6 8 9 8 7 6 5 6 5 5
y 10 13 15 14 9 7 6 6 5 5
Solution
x    y    xy    x^2
6    10   60    36
8    13   104   64
9    15   135   81
8    14   112   64
7    9    63    49
6    7    42    36
5    6    30    25
6    6    36    36
5    5    25    25
5    5    25    25
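Here, n = 10, $\sum x = 65$, $\sum y = 90$, $\sum xy = 632$ and $\sum x^2 = 441$, so
\[
b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
    = \frac{10 \times 632 - 65 \times 90}{10 \times 441 - 65^2}
    = \frac{470}{185} \approx 2.54.
\]
\[
\bar{x} = \frac{\sum x}{n} = \frac{65}{10} = 6.5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{90}{10} = 9.
\]
\[
b_0 = \bar{y} - b_1 \bar{x} = 9 - 2.54 \times 6.5 \approx -7.51.
\]
\[
\hat{y} = b_0 + b_1 x = -7.51 + 2.54x.
\]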
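As a cross-check, the following sketch recomputes the coefficients for this example from the raw data. Because b0 is defined as ȳ − b1 x̄, the fitted line must pass through the point (x̄, ȳ), and the last line tests that property. Variable names are illustrative.

```python
# Data from Example 2.
x = [6, 8, 9, 8, 7, 6, 5, 6, 5, 5]
y = [10, 13, 15, 14, 9, 7, 6, 6, 5, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 470 / 185
x_bar, y_bar = sum_x / n, sum_y / n
b0 = y_bar - b1 * x_bar

print(round(b1, 2), round(b0, 2))  # 2.54 -7.51

# The fitted line passes through the point of means (x_bar, y_bar).
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-9)  # True
```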