Correlation & Regression
Pratap L Jadhav
Assistant professor Statistics and Demography
Department of Community Medicine
Seth G. S. Medical College & KEM Hospital, Mumbai
Introduction: Correlation
➢Sometimes two continuous variables are measured in the same person,
such as height and weight, temperature and pulse rate, age and
weight, or age and height.
➢At other times the same characteristic (variable) is measured in two
related groups, such as tallness in parents and tallness in children, or
intelligence quotient (IQ) in brothers and in their corresponding sisters
(siblings).
➢The relationship or association between two quantitatively measured
or continuous variables is called correlation.
➢Correlation is the statistical measure of the degree (strength)
of association between two variables.
➢By “association” we mean the tendency of the variables to move
together.
• If two variables X and Y are so related that movements (or
variation) in one tend to be accompanied by corresponding
movements (or variation) in the other, then X and Y are said to be
correlated.
• The movements (variation) may be in the same direction (i.e. both
X and Y increase or both decrease, called a direct relationship)
or in opposite directions (i.e. X increases while Y decreases,
called an inverse relationship).
• Correlation is said to be positive or negative according to whether these
movements are in the same or in opposite directions.
• If Y is unaffected by any change in X, then X and Y are said to be
uncorrelated.
• Correlation may be linear or non-linear.
• If the amount of variation in X bears a constant ratio to the
corresponding amount of variation in Y, then the correlation
between X and Y is said to be linear; otherwise it is non-linear.
• The degree or extent of the linear relationship between two variables
is measured by Karl Pearson’s coefficient of correlation,
simply called the correlation coefficient and denoted by “r”.
• The correlation coefficient lies between -1 and +1,
i.e. -1 ≤ r ≤ +1
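The bounds on r can be checked numerically. The following is a minimal Python sketch, not a prescribed method: the function name `pearson_r` and the small data set are assumptions made only for illustration. It computes r from paired observations using deviations from the means and shows that exactly linear data give r = +1 or r = -1.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # Y rises linearly with X
r_down = pearson_r(x, [10 - 3 * v for v in x])  # Y falls linearly with X
print(r_up, r_down)
```

Any other paired data set gives a value between these two extremes.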
Types of correlation
➢There are five types of correlation, depending on its extent and
direction:
1) Perfect positive correlation: the two variables X and Y are
directly proportional and fully correlated with each other (i.e. r =
+1). It exists only under hypothetical conditions. In nature there is not a
single example of perfect positive correlation, but some variables
approach it (e.g. height and weight up to a certain age).
2) Perfect negative correlation: the two variables X
and Y are inversely proportional to each other and r = -1. Like
perfect positive correlation, it exists only in hypothetical situations (e.g.
pressure applied and volume of a gas).
3) Partial positive correlation: X and Y move in the
same direction, but not perfectly; the value of r lies
between 0 and 1, i.e. 0 < r < +1 (e.g. weight and cholesterol level).
4) Partial negative correlation: the two variables
move in opposite directions to some extent, i.e. the value of r lies
between -1 and 0 (-1 < r < 0) (e.g. working man-hours and
duration to complete a task).
5) Absolutely no correlation: there is no movement in a specific
direction; the movements are haphazard, indicating that no linear
relationship exists between the two variables, i.e. r = 0.
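The five types can be written as a simple rule on the value of r. This is only an illustrative Python sketch; the function name `correlation_type` and the tolerance `tol` (used because computed values of r are rarely exactly 0 or ±1) are assumptions, not part of the notes.

```python
def correlation_type(r, tol=1e-9):
    """Name the type of correlation for a given coefficient r."""
    if r < -1 - tol or r > 1 + tol:
        raise ValueError("r must lie between -1 and +1")
    if abs(r - 1) <= tol:
        return "perfect positive"
    if abs(r + 1) <= tol:
        return "perfect negative"
    if abs(r) <= tol:
        return "no correlation"
    # Anything else is partial; the sign gives the direction.
    return "partial positive" if r > 0 else "partial negative"

for value in (1.0, -1.0, 0.8, -0.3, 0.0):
    print(value, "->", correlation_type(value))
```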
➢Correlation between two variables may be determined by any of the
following methods:
1) Scatter diagram
2) Covariance method or Karl Pearson method
3) Rank method / Spearman (beyond scope)
Scatter Diagram
[Figure: Scatter diagram showing correlation between height and weight; vertical axis: Weight (kg)]
• The existence of correlation can be shown graphically by means of a scatter
diagram. Statistical data relating to the simultaneous movement
(variation) of two variables can be represented by dots.
• One of the two variables, X (the independent variable), is taken ...
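$r = \pm\sqrt{b_{yx} \times b_{xy}}$
where $b_{yx}$ is the regression coefficient of y on x and $b_{xy}$ is the regression coefficient of x on y.
➢r is positive when $b_{yx}$ is positive, and r is negative when $b_{yx}$ is
negative.
➢r, $b_{yx}$ and $b_{xy}$ always have the same sign.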
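The relation r = ±√(b_yx × b_xy) can be checked with a short Python sketch. The function name `r_from_regression` is hypothetical. The check values use r = 0.8, σx = 8 and σy = 10 from Exercise 14.3, together with the standard results b_yx = r·σy/σx and b_xy = r·σx/σy (the second of these is assumed here; the notes state only the first).

```python
import math

def r_from_regression(b_yx, b_xy):
    """Recover r from the two regression coefficients.

    r is the square root of their product and takes their common sign.
    """
    if b_yx * b_xy < 0:
        raise ValueError("b_yx and b_xy must have the same sign")
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Values from Exercise 14.3: r = 0.8, sd of x = 8, sd of y = 10.
b_yx = 0.8 * 10 / 8   # regression coefficient of y on x
b_xy = 0.8 * 8 / 10   # regression coefficient of x on y (assumed formula)
r = r_from_regression(b_yx, b_xy)
print(round(r, 4))    # recovers the original r of 0.8
```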
Exercise no 14.2, page no 88
➢Using the usual notation, given
N = 10, $\sum (x - \bar{x})(y - \bar{y}) = 72$,
$\sum (x - \bar{x})^2 = 64$, $\sum (y - \bar{y})^2 = 100$.
Find the correlation coefficient (r) and its significance.
H0: The correlation coefficient (r) between the two variables X and Y = 0
H1: The correlation coefficient (r) between the two variables X and Y ≠ 0
The correlation coefficient (r) is calculated using the formula
$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
• $r = \dfrac{72}{\sqrt{64 \times 100}} = \dfrac{72}{80} = 0.9$
(There exists a strong positive correlation.)
• The test of significance for “r” is given by
$t_{N-2} = r \sqrt{\dfrac{N - 2}{1 - r^2}} = 0.9 \times \sqrt{\dfrac{8}{1 - 0.81}} = 5.84$
• t tabulated at 8 degrees of freedom and 5 % level of significance = 2.31
• Since t calculated > t tabulated, reject the null hypothesis and accept the
alternative hypothesis: there is a significant correlation between X and Y.
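The arithmetic of Exercise 14.2 can be reproduced with a short Python sketch. The function names are hypothetical, and the critical value 2.31 is simply the tabulated value quoted in the exercise, not computed here.

```python
import math

def correlation_from_sums(sxy, sxx, syy):
    """r from the sum of products and sums of squares of deviations."""
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for testing H0: r = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Given values from Exercise 14.2.
n, sxy, sxx, syy = 10, 72, 64, 100
r = correlation_from_sums(sxy, sxx, syy)   # 72 / 80 = 0.9
t = t_statistic(r, n)                      # about 5.84
t_tabulated = 2.31                         # 8 df, 5 % level, as quoted in the exercise
reject_h0 = abs(t) > t_tabulated
print(round(r, 2), round(t, 2), reject_h0)
```

Comparing |t| with the tabulated value makes the same check usable when r is negative.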
Exercise no 14.3 Page no:
➢The correlation coefficient between age (x) in years and systolic blood
pressure (y) in mm of Hg is 0.8. Mean age is 50 years and mean systolic
blood pressure is 130 mm of Hg. Standard deviation of age is 8 yrs and
standard deviation of blood pressure is 10 mm of Hg. Find the regression
equation of Y on X and estimate the systolic blood pressure for a person
whose age is 55 yrs.
➢Given r = 0.8, $\bar{x} = 50$, $\bar{y} = 130$, $\sigma_x = 8$ and $\sigma_y = 10$, we have to
calculate $b_{yx}$ and estimate the value of Y for x = 55.
➢$b_{yx} = r \times \dfrac{\sigma_y}{\sigma_x} = 0.8 \times \dfrac{10}{8} = 1$ ……….(1)
• The estimate of Y for a given X using the linear regression equation of y on x
is $Y - \bar{Y} = b_{yx} (X - \bar{X})$
Substituting the means and (1): (Y - 130) = 1 × (X - 50), i.e. Y = X + 80
For X = 55: (Y - 130) = 1 × (55 - 50)
Y = 135
So for a person whose age is 55 yrs, the estimated value of systolic blood
pressure = 135 mm of Hg.
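The same calculation can be sketched in Python. The function names `slope_y_on_x` and `predict_y` are hypothetical; the numbers are those given in Exercise 14.3.

```python
def slope_y_on_x(r, sd_x, sd_y):
    """Regression coefficient of y on x: b_yx = r * sd_y / sd_x."""
    return r * sd_y / sd_x

def predict_y(x, mean_x, mean_y, b_yx):
    """Estimate Y from the line Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

# Given values from Exercise 14.3.
b_yx = slope_y_on_x(r=0.8, sd_x=8, sd_y=10)                      # 1.0
estimated_bp = predict_y(55, mean_x=50, mean_y=130, b_yx=b_yx)   # 135 mm of Hg
print(b_yx, estimated_bp)
```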
Chapter no 15 & 16
Corrections in the formulae in chapter 15
• Corrected formulae for Chapter no 15 (page no 92 )
1) Total Fertility Rate: $TFR = \dfrac{\sum_{w_1}^{w_2} ASFR}{1000} \times h$
Where h = length of class interval, ASFR = age-specific fertility rate
2) Gross Reproduction Rate: $GRR = \dfrac{\sum_{w_1}^{w_2} ASFR \text{ for female live births}}{1000} \times h$
Where h = length of class interval, ASFR = age-specific fertility rate
Additional formula for Example 15.2
3) Standardized Death Rate: $SDR = \dfrac{\sum P_s \times D_x}{\sum P_s}$
Where $P_s$ = population of the standard locality, $D_x$ = death rate of the comparable population
4) SDR of standard population = crude death rate of standard population
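The TFR and GRR formulae can be sketched in Python as below. The function names and all ASFR figures are invented purely for illustration (they are not from the textbook), assuming seven 5-year age groups covering ages 15-49 with rates expressed per 1000 women.

```python
def total_fertility_rate(asfr_per_1000, h):
    """TFR = (sum of ASFR / 1000) * h, in births per female."""
    return sum(asfr_per_1000) * h / 1000

def gross_reproduction_rate(female_asfr_per_1000, h):
    """GRR = (sum of ASFR for female live births / 1000) * h."""
    return sum(female_asfr_per_1000) * h / 1000

# Hypothetical rates per 1000 women for the groups 15-19, ..., 45-49.
asfr_all_births = [50, 200, 180, 100, 50, 15, 5]
asfr_female_births = [24, 98, 88, 49, 24, 7, 3]
h = 5  # length of each class interval in years

tfr = total_fertility_rate(asfr_all_births, h)            # births per female
grr = gross_reproduction_rate(asfr_female_births, h)      # female births per female
print(tfr, grr)
```

With unequal class intervals, each ASFR would have to be multiplied by its own interval length before summing; the sketch assumes a common h as in the formulae above.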
Example 15.2 Page no 95
• The following table gives the populations of localities A and B by age
group, together with their age-specific death rates.
• Taking locality A as the standard population, find the standardized death rate
of localities A and B separately, and hence determine which of the two
localities is healthier. Also find the crude death rate of locality B.
Age group      Locality A                   Locality B
               Population   Deaths/1000     Population   Deaths/1000
               (Ps)         (Ds)            (Px)         (Dx)
0-10           600          30              400          40
10-20          1000         5               1500         4
20-60          3000         8               2400         10
60 and above   400          50              700          30
• Crude death rate (CDR) = $\dfrac{\text{No. of deaths during the year}}{\text{Mid-year population}} \times 1000$
CDR (A) = $\dfrac{67}{5000} \times 1000$ = 13.4/1000 population
CDR (B) = $\dfrac{67}{5000} \times 1000$ = 13.4/1000 population
SDR (A) = CDR (A) = 13.4/1000 population
SDR (B) = $\dfrac{\sum P_s \times D_x}{\sum P_s} = \dfrac{70000}{5000}$ = 14/1000 population
• Since SDR (A) < SDR (B), population A is healthier compared to population B.
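The figures of Example 15.2 can be reproduced with a short Python sketch. The function and variable names are hypothetical; the populations and death rates are those in the table, with locality A as the standard population.

```python
def crude_death_rate(populations, rates_per_1000):
    """CDR per 1000 = (total deaths / total population) * 1000."""
    deaths = sum(p * d / 1000 for p, d in zip(populations, rates_per_1000))
    return deaths / sum(populations) * 1000

def standardized_death_rate(standard_pops, rates_per_1000):
    """SDR per 1000 = sum(Ps * Dx) / sum(Ps)."""
    weighted = sum(p * d for p, d in zip(standard_pops, rates_per_1000))
    return weighted / sum(standard_pops)

# Age groups: 0-10, 10-20, 20-60, 60 and above.
pop_a, rate_a = [600, 1000, 3000, 400], [30, 5, 8, 50]
pop_b, rate_b = [400, 1500, 2400, 700], [40, 4, 10, 30]

cdr_a = crude_death_rate(pop_a, rate_a)           # 67 deaths in 5000 -> 13.4
cdr_b = crude_death_rate(pop_b, rate_b)           # 67 deaths in 5000 -> 13.4
sdr_a = standardized_death_rate(pop_a, rate_a)    # equals CDR (A)
sdr_b = standardized_death_rate(pop_a, rate_b)    # 70000 / 5000 = 14
print(cdr_a, cdr_b, sdr_a, sdr_b)
```

The two crude rates are equal, so only standardization on a common age structure separates the localities.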
Unsolved exercises
➢From Exercise no 15.3 onwards, use the appropriate formula for
the given problem and state the answer with the appropriate unit.
➢The unit of the indicator should be the unit of the quantity in the
denominator.
➢For Total Fertility Rate (TFR) the unit is ____ births/female (it can be
described as the child-bearing capacity of a female during her reproductive
age group).
➢For Gross Reproduction Rate (GRR) the unit is ____ female births/female
(described as the number of female births per female during her reproductive age
group).
➢In Chapter no 16, all the statements are statistical fallacies, and we have to
disagree with them, giving logical reasoning and using
appropriate statistical measures.