
L5 Correlation & Regression - 082913

The document discusses correlation and regression, explaining how to find relationships between quantitative variables using statistical techniques. It covers scatter diagrams, Pearson's correlation coefficient, Spearman rank correlation, and linear regression models, providing examples for clarity. The document emphasizes the importance of understanding the nature and strength of relationships between variables for predictive analysis.

Uploaded by

Ahmed Khaled

Correlation & Regression

Dr. Mahmud Ismail, 2025

Correlation
Correlation is a statistical technique used to determine the degree to
which two quantitative (numerical) variables are related.
It describes the relationship between the variables but does not allow
causal relationships to be inferred.

Scatter diagram
• Plotted on rectangular coordinates
• Shows two quantitative variables
• One variable is called independent (X) and the second
is called dependent (Y)
• Points are not joined
Example: scatter plot of weight (Wt) against blood pressure

Scatter plots
The pattern of the data indicates the type of relationship between
the two variables:
• positive relationship
• negative relationship
• no relationship

[Figure: three scatter plots showing a positive relationship, a negative relationship, and no relationship]


Pearson's Simple Correlation coefficient (r)
A statistic showing the degree of relation between two variables.
It is also called Pearson's correlation or the product-moment correlation
coefficient.
It measures the nature and strength of the association between two
variables of the quantitative type.
The sign of r denotes the nature of the association,
while the magnitude of r denotes the strength of the association.
If the sign is positive, the relation is direct (an increase in one
variable is associated with an increase in the other variable, and a
decrease in one variable is associated with a decrease in the other
variable).
If the sign is negative, the relation is inverse, or indirect
(an increase in one variable is associated with a decrease
in the other).

The value of r ranges between -1 and +1.


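The value of r denotes the strength of the association as illustrated
by the following scale.

Value of r          Interpretation
r = -1              perfect indirect correlation
-1 to -0.75         strong indirect
-0.75 to -0.25      intermediate indirect
-0.25 to 0          weak indirect
r = 0               no relation
0 to 0.25           weak direct
0.25 to 0.75        intermediate direct
0.75 to 1           strong direct
r = +1              perfect direct correlation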
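
The strength scale can be written as a small lookup. This is a minimal Python sketch: the function name describe_r is hypothetical, and assigning a cut-off value (0.25 or 0.75) to the stronger band is an assumption, since the scale does not say which band a boundary value belongs to.

```python
def describe_r(r):
    """Return (nature, strength) for a correlation coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return ("none", "no relation")
    # The sign gives the nature of the association.
    nature = "direct" if r > 0 else "indirect"
    # The magnitude gives the strength of the association.
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:   # assumption: the boundary counts as strong
        strength = "strong"
    elif size >= 0.25:   # assumption: the boundary counts as intermediate
        strength = "intermediate"
    else:
        strength = "weak"
    return (nature, strength)


print(describe_r(0.759))
print(describe_r(-0.1))
```

With these cut-offs, a value such as 0.759 is read as a strong direct relation.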
Example:
A sample of 6 children was selected, and their age in years and
weight in kilograms were recorded as shown in the following table. It is
required to find the correlation between age and weight.

Serial No   Age (years)   Weight (kg)
1           7             12
2           6             8
3           8             12
4           5             10
5           6             11
6           9             13

These 2 variables are of the quantitative type. One variable (age) is
called the independent variable and denoted (X), and the other
(weight) is called the dependent variable and denoted (Y).
To find the relation between age and weight, compute the simple
correlation coefficient using the following formula:
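
The formula itself did not survive in this copy. A computational form consistent with the column totals used in the calculation for this example (∑x, ∑y, ∑xy, ∑x², ∑y²) would be the following; treat it as a reconstruction, not necessarily the exact notation of the original slide.

```latex
r = \frac{\sum xy - \dfrac{\sum x \, \sum y}{n}}
         {\sqrt{\left(\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right)
                \left(\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right)}}
```

Here n is the number of (x, y) pairs, which is 6 in this example.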

You need to make the following table for the calculations:

Serial n   Age x (years)   Weight y (kg)   xy         x²         y²
1          7               12              84         49         144
2          6               8               48         36         64
3          8               12              96         64         144
4          5               10              50         25         100
5          6               11              66         36         121
6          9               13              117        81         169
Total      ∑x = 41         ∑y = 66         ∑xy = 461  ∑x² = 291  ∑y² = 742

r = 0.759
strong direct relation
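
The result can be checked numerically. This is a minimal Python sketch that assumes the computational (sums-based) form of Pearson's formula; the function name pearson_r is hypothetical, and the data are the six age and weight pairs from the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the column totals of the calculation table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


ages = [7, 6, 8, 5, 6, 9]            # x, in years
weights = [12, 8, 12, 10, 11, 13]    # y, in kg
r = pearson_r(ages, weights)
print(round(r, 4))
```

The computed value is about 0.7596, which agrees with the reported r = 0.759 up to rounding in the last digit.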

EXAMPLE: Relationship between Anxiety and Test Scores

Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129

Indirect strong correlation
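
The computed value of r is not shown in this copy. Substituting the column totals into the computational form of Pearson's formula (assumed here to be the form the lecture uses), with n = 6, gives approximately:

```latex
r = \frac{\sum XY - \dfrac{\sum X \, \sum Y}{n}}
         {\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{n}\right)
                \left(\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right)}}
  = \frac{129 - \dfrac{32 \times 32}{6}}
         {\sqrt{\left(230 - \dfrac{32^2}{6}\right)\left(204 - \dfrac{32^2}{6}\right)}}
  = \frac{-41.667}{\sqrt{59.333 \times 33.333}}
  \approx -0.94
```

The sign is negative and the magnitude is above 0.75, which is consistent with the stated indirect strong correlation.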


Spearman Rank Correlation Coefficient (rs)
❑ It is a non-parametric measure of correlation.
❑ This procedure makes use of the two sets of ranks that may be
assigned to the sample values of X and Y.
❑ The Spearman rank correlation coefficient can be computed in the
following cases:
   • Both variables are quantitative.
   • Both variables are qualitative ordinal.
   • One variable is quantitative and the other is qualitative ordinal.
❑ Procedure:

• Rank the values of X from 1 to n, where n is the number of pairs of values of X
and Y in the sample.
• Rank the values of Y from 1 to n.
• Compute the value of di for each pair of observations by subtracting the rank of Yi
from the rank of Xi.
• Square each di and compute ∑di², which is the sum of the squared values.
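
The procedure as listed stops at the sum of squared differences; the closing step is not shown in this copy. The usual final step, assumed here to be the one intended, substitutes that sum into the standard Spearman formula:

```latex
r_s = 1 - \frac{6 \sum d_i^2}{n\,(n^2 - 1)}
```

Like r, the value of rs lies between -1 and +1, and its sign and magnitude are read in the same way.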

Example
Find the correlation between the marks of English and Math.

Solution
1- Give a ranking for each subject (English and Math).
2- Since we have ten marks (n = 10), rank from 1 to 10.
3- Give rank 1 to the highest mark, 2 to the second highest, and so on.
4- Do this for both subjects.

Note: this method applies when there are no repeated values.


Solution: Make the following table;

Example
In a study of the relationship between level of education and
income, the following data were obtained. Find the
relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60

Important note: X now has repeated values (University and Secondary).
In this case each repeated value is given the average of the ranks it
occupies, as shown in the table below.

Level of education (X)   Initial rank   Rank X
Preparatory              5              5
Primary                  6              6
University               2              (1+2)/2 = 1.5
Secondary                3              (3+4)/2 = 3.5
Secondary                4              (3+4)/2 = 3.5
Illiterate               7              7
University               1              (1+2)/2 = 1.5

Y also has a repeated value (income 10), so the same
procedure must be applied.

Income (Y)   Initial rank   Rank Y
25           3              3
10           5              (5+6)/2 = 5.5
8            7              7
10           6              (5+6)/2 = 5.5
15           4              4
50           2              2
60           1              1
Answer:

Sample   X             Y    Rank (X)   Rank (Y)   di     di²
A        Preparatory   25   5          3          2      4
B        Primary       10   6          5.5        0.5    0.25
C        University    8    1.5        7          -5.5   30.25
D        Secondary     10   3.5        5.5        -2     4
E        Secondary     15   3.5        4          -0.5   0.25
F        Illiterate    50   7          2          5      25
G        University    60   1.5        1          0.5    0.25

∑di² = 64

Comment:
There is an indirect weak correlation between level of education and income.
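
The ranking with averaged ties and the resulting coefficient can be checked with a short Python sketch. Several things here are assumptions and not part of the slides: the numeric codes for the education levels (0 for illiterate up to 4 for university) exist only so the levels can be ordered, the helper names are hypothetical, and the closing formula rs = 1 - 6∑di²/(n(n²-1)) is the standard Spearman formula, which this copy does not show.

```python
def average_ranks(values):
    """Rank from 1 (largest) to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rs(xs, ys):
    """Return (sum of squared rank differences, rs) for paired data."""
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    n = len(xs)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Assumed ordinal codes for education level (higher means more education).
LEVEL = {"illiterate": 0, "primary": 1, "preparatory": 2, "secondary": 3, "university": 4}

education = ["preparatory", "primary", "university", "secondary",
             "secondary", "illiterate", "university"]   # samples A to G
income = [25, 10, 8, 10, 15, 50, 60]

sum_d2, rs = spearman_rs([LEVEL[e] for e in education], income)
print(sum_d2, round(rs, 3))   # 64.0 -0.143
```

A value of about -0.14 is negative and smaller than 0.25 in magnitude, which matches the comment that the correlation is indirect and weak.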

Linear Regression
Whereas correlation measures the relation between two variables, linear regression
is used in predictive analysis to find the equation that best relates the dependent and
independent variables.
For example, we may want to relate the weights of individuals to their heights using a
linear regression model.
There are several regression analyses available to the researcher:
Simple linear regression
   One dependent variable
   One independent variable
Multiple linear regression
   One dependent variable
   Two or more independent variables
Logistic regression
   One dependent variable (binary)
   Two or more independent variable(s)
Ordinal regression
   One dependent variable (ordinal)
   One or more independent variable(s) (nominal)
Solved example
Example 2

1- Find the regression model equation
2- Predict the BP of someone who does 6 hr of exercise
3- Validate the model and calculate the coefficient of determination
(R square)
1- Regression model

Slope b = -2.427, intercept a = 143.106
Y = 143.106 - 2.427 X

2- Prediction
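
The worked prediction is not shown in this copy. Substituting X = 6 into the fitted equation Y = 143.106 - 2.427 X is a one-line calculation, sketched below in Python. The function name predict_bp is hypothetical, and the units of BP are not stated in the example.

```python
A = 143.106   # intercept a, as reported in the example
B = -2.427    # slope b, as reported in the example


def predict_bp(exercise_hours):
    """Predicted BP from the fitted line Y = a + b X."""
    return A + B * exercise_hours


print(round(predict_bp(6), 3))   # 128.544
```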

3- Model validation

Note: Values of R² between 0.9 and 1.00 indicate that the
regression model fits the data well.
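
The exercise and BP data for this example did not survive in this copy, so the validation step cannot be reproduced here. As a stand-in, the following Python sketch fits a least-squares line and computes R² for the age and weight data from the Pearson example. The function names are hypothetical, and the least-squares formulas used (b = Sxy/Sxx, a = mean of Y minus b times mean of X) are the standard ones, assumed to match the lecture's method.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + b X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


ages = [7, 6, 8, 5, 6, 9]            # X, in years
weights = [12, 8, 12, 10, 11, 13]    # Y, in kg
a, b = fit_line(ages, weights)
r2 = r_squared(ages, weights, a, b)
print(f"Y = {a:.3f} + {b:.3f} X, R^2 = {r2:.3f}")
```

For these data R² is about 0.577, which equals the square of r = 0.7596 and falls below the 0.9 threshold given in the note.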
