Chapt04 BPS
Scatterplots and
Correlation
04/07/24 Chapter 4 1
Explanatory Variable
and Response Variable
• Correlation describes linear relationships
between quantitative variables
• X is the quantitative explanatory variable
• Y is the quantitative response variable
• Example: The correlation between per
capita gross domestic product (X) and life
expectancy (Y) will be explored
Data (data file = gdp_life.sav)
Country          Per Capita GDP (X)   Life Expectancy (Y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37
Scatterplot: Bivariate points (x_i, y_i)
[Figure: scatterplot of LIFE_EXP (vertical axis) against GDP (horizontal axis)]
Interpreting Scatterplots
• Form: Can relationship be described by
straight line (linear)? ..by a curved line? etc.
• Outliers?: Any deviations from overall
pattern?
• Direction: the relationship is either:
– Positive association (upward slope)
– Negative association (downward slope)
– No association (flat)
• Strength: Extent to which points adhere to
imaginary trend line
Example: Interpretation
Here is the scatterplot we saw earlier:
[Figure: the same scatterplot, with the data point for Switzerland (23.8, 78.99) labeled]
Interpretation:
• Form: linear (straight)
Example 2
Interpretation
• Form: linear
• Outliers: none
• Direction: positive
• Strength: difficult to
judge by eye (looks
strong)
Example 3
• Form: linear
• Outliers: none
• Direction: negative
• Strength: difficult to
judge by eye (looks
moderate)
Example 4
• Form: linear(?)
• Outliers: none
• Direction: negative
• Strength: difficult to
judge by eye (looks
weak)
Interpreting Scatterplots
• Form: curved
• Outliers: none
• Direction: U-shaped
• Strength: difficult to
judge by eye (looks
moderate)
Correlational Strength
• It is difficult to judge
correlational strength by
eye alone
• Here are identical data
plotted on differently
scaled axes
• First relationship seems
weaker than second
• This is an artifact of the
axis scaling
• We use a statistic
called the correlation
coefficient to judge
strength objectively
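To illustrate why r is an objective measure while the eye is not, the sketch below computes r for the GDP and life expectancy data from this chapter, then again after changing the units of both variables. The helper name pearson_r and the particular unit changes (multiplying X by 1000, converting Y from years to months) are assumptions made for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data from the chapter (gdp_life.sav)
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

r_original = pearson_r(gdp, life_exp)

# Hypothetical unit changes: stretch X by 1000 and express Y in months
r_rescaled = pearson_r([x * 1000 for x in gdp], [y * 12 for y in life_exp])

print(round(r_original, 3), round(r_rescaled, 3))
```

Stretching or shrinking an axis changes how steep or scattered the plot looks, but both calls should return the same value of r.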
Correlation coefficient (r)
• r ≡ Pearson’s correlation coefficient
• Always between −1 and +1 (inclusive)
r = +1 all points on upward sloping line
r = -1 all points on downward line
r = 0 no line or horizontal line
The closer r is to +1 or –1, the stronger the
correlation
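The extreme values of r can be checked with a few made-up data sets. This is a minimal sketch: the helper pearson_r and all of the numbers below are hypothetical and chosen only to land exactly on each case.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

# All points on an upward-sloping line
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# All points on a downward-sloping line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])

# Four corners of a square: no linear trend in either direction
r_none = pearson_r([1, 1, 3, 3], [1, 3, 1, 3])

print(r_up, r_down, r_none)
```

This helper divides by zero when every y value is identical, so the horizontal-line case is not run here.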
Interpretation of r
• Direction: positive, negative, ≈0
• Strength: the closer |r| is to 1, the stronger the
correlation
0.0 ≤ |r| < 0.3 weak correlation
0.3 ≤ |r| < 0.7 moderate correlation
0.7 ≤ |r| < 1.0 strong correlation
|r| = 1.0 perfect correlation
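The cutoffs above translate directly into a small lookup. The function name describe_correlation and the label "none" for r = 0 are assumptions for this sketch; the thresholds are the ones listed on this slide.

```python
def describe_correlation(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    return direction, strength

for r in (0.94, -0.94, 0.809, 0.2):
    print(r, describe_correlation(r))
```

These cutoffs are rules of thumb, so a value near a boundary is best reported along with r itself.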
More Examples of
Correlation Coefficients
• Husband’s age / Wife’s age
• r = .94 (strong positive correlation)
• Husband’s height / Wife’s height
• r = .36 (moderate positive correlation)
• Distance of golf putt / percent success
• r = -.94 (strong negative correlation)
Calculating r by hand
• Calculate mean and standard deviation of X
• Turn all X values into z scores
• Calculate mean and standard deviation of Y
• Turn all Y values into z scores
• Use formula on next page
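The first four steps can be sketched with Python's standard statistics module, using the GDP and life expectancy data from this chapter. The helper name z_scores is an assumption for illustration; stdev is the sample standard deviation, which divides by n - 1.

```python
from statistics import mean, stdev

# Data from the chapter (gdp_life.sav)
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]

z_x = z_scores(gdp)
z_y = z_scores(life_exp)

print("x-bar, s_x:", round(mean(gdp), 2), round(stdev(gdp), 3))
print("y-bar, s_y:", round(mean(life_exp), 3), round(stdev(life_exp), 3))
print("z scores for X:", [round(z, 3) for z in z_x])
print("z scores for Y:", [round(z, 3) for z in z_y])
```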
Correlation coefficient r
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} z_X z_Y $$
where
$$ z_X = \frac{x_i - \bar{x}}{s_x}, \qquad z_Y = \frac{y_i - \bar{y}}{s_y} $$
Example: Calculating r
X      Y       z_X      z_Y      z_X ∙ z_Y
21.4   77.48   -0.078   -0.345    0.027
23.2   77.53    1.097   -0.282   -0.309
20.0   77.32   -0.992   -0.546    0.542
22.7   78.63    0.770    1.102    0.849
20.8   77.17   -0.470   -0.735    0.345
18.6   76.39   -1.906   -1.716    3.271
21.5   78.51   -0.013    0.951   -0.012
22.0   78.15    0.313    0.498    0.156
23.8   78.99    1.489    1.555    2.315
21.2   77.37   -0.209   -0.483    0.101
Sum                               7.285
Notes: x-bar = 21.52, s_x = 1.532; y-bar = 77.754, s_y = 0.795
Example: Calculating r
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{1}{10-1}(7.285) = 0.809 $$
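The same arithmetic can be reproduced in a few lines, as a check on the hand calculation. This is a sketch using Python's standard statistics module; the variable names are assumptions for illustration.

```python
from statistics import mean, stdev

# Data from the chapter (gdp_life.sav)
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
mean_x, s_x = mean(gdp), stdev(gdp)            # stdev divides by n - 1
mean_y, s_y = mean(life_exp), stdev(life_exp)

# Product of the two z scores for each country
products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
            for x, y in zip(gdp, life_exp)]

total = sum(products)
r = total / (n - 1)

print("sum of products:", round(total, 3))
print("r:", round(r, 3))
```

Working from unrounded z scores can shift the last digit of an individual product relative to the hand table, but the sum and r should agree with the slide to three decimals.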
Calculating r
Check calculations with calculator or applet.
[Figure: TI two-variable calculator]
[Figure: data entry screen of the two-variable applet that comes with the text]
Beware!
• r applies to linear relations only
• Outliers have large influences on r
• Association does not imply
causation
Nonlinear relationships
• Figure shows “miles per gallon” versus “speed” (“car data”, n = 10)
• r ≈ 0, but this is misleading because there is a strong nonlinear relationship
[Figure: scatterplot of miles per gallon (vertical axis) against speed (horizontal axis)]
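A small numerical illustration of this warning follows. The numbers are hypothetical, not the car data from the figure: they are constructed so that mileage rises and then falls symmetrically around a speed of 55. The helper name pearson_r is also an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: mileage follows an exact upside-down U in speed
speed = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mpg = [30 - ((s - 55) / 10) ** 2 for s in speed]

r = pearson_r(speed, mpg)
print(r)
```

Here mileage is completely determined by speed, yet r comes out at zero because the rising and falling halves cancel. Plotting the data first is what reveals the pattern.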
Outliers Can Have a Large
Influence
[Figure: scatterplot with one point labeled “Outlier”]
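The effect of a single outlier on r can be seen with a tiny made-up data set. Everything below is hypothetical: four points with no linear association, then the same four points plus one extreme point. The helper name pearson_r is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical cloud of four points with no linear association
xs = [1, 1, 3, 3]
ys = [1, 3, 1, 3]
r_without = pearson_r(xs, ys)

# Same points plus one hypothetical outlier at (10, 10)
r_with = pearson_r(xs + [10], ys + [10])

print(round(r_without, 3), round(r_with, 3))
```

One added point moves r from zero to a value that would be called strong, which is why the scatterplot should be inspected for outliers before r is interpreted.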
Additional Practice : IQ and grades
(a) Positive or negative
association?
(b) Is form linear?
(c) Is the correlation
strong?
(d) What are the IQ and
GPA of the outlier
at the bottom of the plot?