Correlation and Regression: Modeling Bivariate Relationships
Modeling bivariate relationships
Bivariate relationships
Both variables are numerical
Response variable
a.k.a. y, dependent
Explanatory variable
Something you think might be related to the response
a.k.a. x, independent, predictor
Graphical representations
Put response on vertical axis
Put explanatory on horizontal axis
Scatterplot
> ggplot(data = possum, aes(y = totalL, x = tailL)) +
    geom_point()
Scatterplot
> ggplot(data = possum, aes(y = totalL, x = tailL)) +
    geom_point() +
    scale_x_continuous("Length of Possum Tail (cm)") +
    scale_y_continuous("Length of Possum Body (cm)")
Bivariate relationships
Can think of boxplots as scatterplots, but with a discretized explanatory variable
cut() function discretizes
Choose appropriate number of "boxes"
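To see what cut() returns before it is used on a plot axis, here is a minimal sketch in base R. The vector tail_cm holds made-up illustration values, not measurements from the possum data, and the names tail_cm and tail_bins are assumptions introduced for this example.

```r
# Hypothetical tail lengths in cm (illustration only, not from possum)
tail_cm <- c(32.0, 34.5, 36.0, 37.5, 38.0, 39.5, 41.0, 43.0)

# breaks = 5 splits the observed range into 5 equal-width intervals
tail_bins <- cut(tail_cm, breaks = 5)

# Each value is now a factor level naming the interval it falls in
nlevels(tail_bins)

# Number of observations per interval, i.e. per "box"
table(tail_bins)
```

Changing breaks changes how many "boxes" appear: too few hides the trend, too many leaves some boxes with very few points.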
Scatterplot
> ggplot(data = possum, aes(y = totalL, x = cut(tailL, breaks = 5))) +
    geom_point()
Scatterplot
> ggplot(data = possum, aes(y = totalL, x = cut(tailL, breaks = 5))) +
    geom_boxplot()
Let's practice!
Characterizing bivariate relationships
Sign legibility
NIST
NIST 2
Non-linear
Fan shape
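A fan shape means the vertical spread of the points grows (or shrinks) along the horizontal axis. The sketch below simulates such data so the pattern can be reproduced. Everything here is an assumption for illustration: the data frame sim, the slope of 2, and the noise level are invented, not taken from any data set in these slides.

```r
set.seed(1)  # fixed seed so the simulated data are reproducible

# Simulated data (assumption): the noise sd grows in proportion to x
x <- runif(200, min = 1, max = 100)
y <- 2 * x + rnorm(200, mean = 0, sd = 0.3 * x)
sim <- data.frame(x = x, y = y)

# Compare the spread around the true line for small vs. large x
noise <- sim$y - 2 * sim$x
spread_low  <- sd(noise[sim$x <= 50])
spread_high <- sd(noise[sim$x > 50])

# Draw the scatterplot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(data = sim, aes(x = x, y = y)) + geom_point())
}
```

Comparing spread_low with spread_high gives a rough numerical check of what the plot shows visually: the points fan out as x increases.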
Let's practice!
Outliers
Outliers
> ggplot(data = mlbBat10, aes(x = SB, y = HR)) +
    geom_point()
Add transparency
> ggplot(data = mlbBat10, aes(x = SB, y = HR)) +
    geom_point(alpha = 0.5)
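Transparency helps because count data such as SB and HR put many observations on exactly the same coordinates, so a single opaque dot can hide several players. The sketch below counts that overlap in base R. The data frame bat is a made-up stand-in with invented values, not rows from mlbBat10, and the names bat and overlap are assumptions.

```r
# Hypothetical counts (illustration only, not from mlbBat10)
bat <- data.frame(
  SB = c(0, 0, 0, 0, 1, 1, 2, 68),
  HR = c(0, 0, 0, 1, 0, 0, 5, 5)
)

# Count how many observations sit on each (SB, HR) coordinate
overlap <- aggregate(n ~ SB + HR, data = transform(bat, n = 1), FUN = sum)

# Most heavily overplotted coordinates first
overlap[order(-overlap$n), ]
```

With alpha = 0.5, coordinates holding several observations are drawn darker than isolated ones, so dense regions and lone outlying points become easier to tell apart.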
Let's practice!