Ch. 7: Scatterplots, Association, and Correlation
Correlation:
Measures the strength of the association between two
QUANTITATIVE variables
Between -1 and +1 (positive association: between 0 and +1;
negative association: between -1 and 0)
NO UNITS! Data is standardized: $(z_x, z_y) = \left(\frac{x - \bar{x}}{s_x}, \frac{y - \bar{y}}{s_y}\right)$
Standardizing data makes the origin the new center of the
scatterplot and the scales on both axes the same
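A minimal Python sketch of the standardizing step, assuming the sample standard deviation (dividing by n - 1), which is the one used in the formula for r. The function name `standardize` and the height data are hypothetical, for illustration only. After standardizing, the values have mean 0 and standard deviation 1, and the same measurements in different units give the same z-scores.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    center = mean(values)
    spread = stdev(values)  # sample SD (divides by n - 1)
    return [(v - center) / spread for v in values]


# Hypothetical heights in centimeters
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)

print([round(v, 3) for v in z])
print(mean(z), stdev(z))  # mean 0 and SD 1, up to floating-point rounding

# The same heights in meters standardize to the same z-scores (no units left)
z_m = standardize([1.50, 1.60, 1.70, 1.80, 1.90])
```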
Correlation coefficient:
o See pg. 148-149
o $r = \frac{\sum z_x z_y}{n - 1}$
o Summarizes both strength and direction of a LINEAR
association between two QUANTITATIVE variables
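The formula $r = \frac{\sum z_x z_y}{n - 1}$ can be computed directly from z-scores. This is a minimal sketch, assuming sample standard deviations; the helper names `z_scores` and `correlation` and the study-time data are hypothetical. Python 3.10+ also offers `statistics.correlation`, but writing it out shows where each piece of the formula comes from.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 78]

r = correlation(hours, score)
print(round(r, 3))  # positive sign = positive direction; near 1 = strong
```

Note that a variable with no spread (every value identical) has standard deviation 0, so its z-scores, and therefore r, are undefined; the sketch raises `ZeroDivisionError` in that case.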
Conditions…
o Quantitative Variables Condition: Correlation applies only
to quantitative variables. Check that you know the
variables’ units and what they measure.
o Straight Enough Condition: The association must be linear; check
that the scatterplot looks reasonably straight before using correlation
o Outlier Condition: Outliers can distort the correlation
dramatically. When you see an outlier, it’s often a good
idea to report the correlations with and without the point
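To illustrate the Outlier Condition, here is a minimal sketch that reports r with and without a suspected outlier. The data are a hypothetical toy set: eight points lying exactly on a line plus one point far below it. The helper names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical toy data: points on y = 2x, plus one outlier at (9, 0)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 4, 6, 8, 10, 12, 14, 16, 0]

r_with = correlation(x, y)
r_without = correlation(x[:-1], y[:-1])  # drop the last (outlying) point

# Report both values, as the Outlier Condition suggests
print(f"r with the outlier:    {r_with:.3f}")
print(f"r without the outlier: {r_without:.3f}")
```

Here one point pulls r from 1 down to 0.4, so reporting only one of the two numbers would hide what the scatterplot shows.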
Correlation Properties:
The sign of a correlation coefficient gives the direction of the
association
Correlation is always between -1 and +1. Correlation can be
exactly -1 or +1, but these values are unusual in real data
because they mean that all the data points fall exactly on a
single straight line
Correlation treats x and y symmetrically. The correlation of x
with y is the same as the correlation of y with x
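The first few properties (sign gives direction, r stays between -1 and +1, exactly ±1 only when every point is on one line, and symmetry in x and y) can be checked numerically. A minimal sketch with hypothetical data and helper names:

```python
from statistics import mean, stdev


def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


x = [1, 2, 3, 4, 5]
on_rising_line = [3 * v + 2 for v in x]      # every point exactly on y = 3x + 2
on_falling_line = [10 - 0.5 * v for v in x]  # every point exactly on y = 10 - 0.5x
scattered = [2, 5, 3, 8, 6]                  # hypothetical, not on one line

print(correlation(x, on_rising_line))   # +1, up to floating-point rounding
print(correlation(x, on_falling_line))  # -1, up to floating-point rounding
# Swapping the roles of x and y gives the same r
print(correlation(x, scattered), correlation(scattered, x))
```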
Correlation has no units.
Correlation is not affected by changes in the center or scale of
either variable (rescaling by a negative number flips the sign of r
but not its size). Changing the units or baseline of either
variable has no effect on the correlation coefficient. Correlation
depends only on the z-scores, and they are unaffected by
changes in center or scale
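A minimal sketch of unit invariance, assuming hypothetical temperature and sales data: converting °C to °F changes both the center (+32) and the scale (×9/5), yet r is unchanged. The last line also shows the one exception to keep in mind, multiplying by a negative constant, which flips the sign of r.

```python
from statistics import mean, stdev


def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical data: daily high temperature (°C) vs. cold drinks sold
temp_c = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0]
sold = [40, 48, 47, 60, 66, 70]

temp_f = [c * 9 / 5 + 32 for c in temp_c]  # new units and new baseline

r_c = correlation(temp_c, sold)
r_f = correlation(temp_f, sold)
r_flipped = correlation([-c for c in temp_c], sold)  # negative rescaling

print(r_c, r_f)    # same value, up to floating-point rounding
print(r_flipped)   # same size as r_c, opposite sign
```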
Correlation measures the strength of the LINEAR association
between the two variables. Variables can be strongly
associated but still have a small correlation if the association
isn’t linear
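A minimal sketch of a strong but non-linear association, using hypothetical data where y = x² exactly. y is completely determined by x, yet r is 0: the falling left half and the rising right half cancel in $\sum z_x z_y$.

```python
from statistics import mean, stdev


def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # a perfect curved (U-shaped) relationship

r = correlation(x, y)
print(round(r, 6))  # essentially 0, despite the perfect association
```

This is why the Straight Enough Condition asks you to look at the scatterplot first: a small r alone does not mean "no association".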
Correlation is sensitive to outliers. A single outlying value can
make a small correlation large or make a large one small
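The reverse effect is also easy to produce: one extreme point can make a weak correlation look strong. A minimal sketch with hypothetical data, where five points show almost no linear pattern until a single point far to the upper right is added:

```python
from statistics import mean, stdev


def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical data with a weak pattern
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_before = correlation(x, y)

# Add one outlying point far from the rest
r_after = correlation(x + [20], y + [20])

print(f"r without the outlier: {r_before:.3f}")
print(f"r with the outlier:    {r_after:.3f}")
```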