Correlation and Regression: Modeling Bivariate Relationships

This document discusses modeling bivariate relationships through correlation and regression analysis. It describes representing bivariate data using scatterplots with the response variable on the y-axis and explanatory variable on the x-axis. Methods for characterizing bivariate relationships are discussed, including analyzing the form, direction, strength, and presence of outliers in the relationship. Practical examples are provided using R code to visualize bivariate data and identify outliers.

Uploaded by yohnnis
Copyright © All Rights Reserved

CORRELATION AND REGRESSION

Modeling bivariate relationships

Bivariate relationships
Both variables are numerical
Response variable (a.k.a. y, dependent)
Explanatory variable (a.k.a. x, independent, predictor): something you think might be related to the response

Graphical representations
Put response on vertical axis
Put explanatory on horizontal axis

Scatterplot
> library(ggplot2)  # provides ggplot(), aes(), geom_point()
> ggplot(data = possum, aes(y = totalL, x = tailL)) +
    geom_point()

Scatterplot
> ggplot(data = possum, aes(y = totalL, x = tailL)) +
    geom_point() +
    scale_x_continuous("Length of Possum Tail (cm)") +
    scale_y_continuous("Length of Possum Body (cm)")

Bivariate relationships
Can think of boxplots as scatterplots, but with a discretized explanatory variable
The cut() function discretizes a numeric variable into intervals
Choose an appropriate number of "boxes"
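The following base-R sketch shows what cut() does. It uses a small made-up vector, not the possum data. With `breaks = 3`, the range is split into three equal-width intervals, and table() counts how many observations land in each "box". Comparing the smallest bin count for a few candidate numbers of boxes is one rough way to judge whether there are too many.

```r
# Made-up tail lengths (cm), for illustration only
tail_len <- c(32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43)

# Split the observed range into 3 equal-width intervals
tail_bins <- cut(tail_len, breaks = 3)
levels(tail_bins)  # the interval labels
table(tail_bins)   # observations per interval

# Smallest bin size for several candidate numbers of boxes;
# very small counts suggest too many boxes
min_counts <- sapply(c(2, 3, 6), function(k) min(table(cut(tail_len, breaks = k))))
min_counts
```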

Scatterplot
> ggplot(data = possum, aes(y = totalL, x = cut(tailL, breaks = 5))) +
    geom_point()

Boxplot
> ggplot(data = possum, aes(y = totalL, x = cut(tailL, breaks = 5))) +
    geom_boxplot()
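Each box summarizes the response within one interval of the explanatory variable. This is a rough numeric counterpart, sketched with simulated data rather than the possum measurements. tapply() with fivenum() returns the minimum, lower hinge, median, upper hinge and maximum for each bin. ggplot2 computes its hinges with quantile() and draws whiskers using a 1.5 * IQR rule, so the plotted boxes and whiskers may not match these numbers exactly.

```r
set.seed(1)
# Simulated explanatory (x) and response (y) values, for illustration only
x <- runif(100, min = 30, max = 45)
y <- 40 + 1.2 * x + rnorm(100, sd = 3)

# Discretize x into 5 equal-width bins, as in the boxplot above
x_bins <- cut(x, breaks = 5)

# One five-number summary (min, lower hinge, median, upper hinge, max) per bin
box_stats <- tapply(y, x_bins, fivenum)
box_stats
```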
Let's practice!

Characterizing bivariate relationships
Form (e.g., linear, quadratic, non-linear)
Direction (e.g., positive, negative)
Strength (how much scatter/noise?)
Outliers
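For a roughly linear form, direction and strength can be summarized with the correlation coefficient. The sketch below uses simulated data (assumed, not from the slides) and base R's cor(). The sign of the result tracks direction, and values closer to -1 or 1 indicate less scatter around a line. Correlation only describes linear form, so the scatterplot should still be checked first.

```r
set.seed(42)
x <- runif(200, 0, 10)

# Simulated relationships, for illustration only
y_strong_pos <- 2 * x + rnorm(200, sd = 1)   # positive, little scatter
y_weak_pos   <- 2 * x + rnorm(200, sd = 10)  # positive, lots of scatter
y_neg        <- -2 * x + rnorm(200, sd = 1)  # negative, little scatter

cor(x, y_strong_pos)  # close to 1
cor(x, y_weak_pos)    # positive but noticeably smaller
cor(x, y_neg)         # close to -1
```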
Example scatterplots:
Sign legibility
NIST
NIST 2
Non-linear
Fan shape
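Patterns like the non-linear and fan-shaped examples can be simulated to see how they behave numerically. The model formulas in this sketch are assumptions chosen for illustration. In a quadratic relationship, the correlation is near zero even though y depends strongly on x. A fan shape shows up as scatter around the trend that grows with x.

```r
set.seed(7)

# Non-linear form: y depends on x, but not linearly
x <- runif(300, -3, 3)
y_quad <- x^2 + rnorm(300, sd = 0.5)
cor(x, y_quad)  # near 0 despite the strong relationship

# Fan shape: the noise standard deviation grows with x
x_pos <- runif(300, 0, 10)
y_fan <- 3 * x_pos + rnorm(300, sd = 0.5 * x_pos)

# Spread of the points around the true line, for small vs large x
resid_fan <- y_fan - 3 * x_pos
sd_small_x <- sd(resid_fan[x_pos < 5])
sd_large_x <- sd(resid_fan[x_pos >= 5])
c(sd_small_x, sd_large_x)  # second value larger: the points fan out
```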
Let's practice!

Outliers

> ggplot(data = mlbBat10, aes(x = SB, y = HR)) +  # SB = stolen bases, HR = home runs
    geom_point()

Add transparency
> ggplot(data = mlbBat10, aes(x = SB, y = HR)) +
    geom_point(alpha = 0.5)  # overlapping points show up darker
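SB and HR are small whole-number counts, so many players can share exactly the same (SB, HR) pair. Their points are then drawn on top of one another. The sketch below uses a made-up data frame (hypothetical values, not mlbBat10). It counts how many points are hidden by exact overlap and how many players sit at each location. Transparency makes those stacks visible.

```r
# Hypothetical stolen-base (SB) and home-run (HR) counts
bat <- data.frame(
  SB = c(0, 0, 0, 1, 0, 2, 1, 0),
  HR = c(0, 0, 0, 0, 1, 0, 0, 0)
)

# Rows whose (SB, HR) pair already appeared earlier: drawn on top of another point
n_hidden <- sum(duplicated(bat[, c("SB", "HR")]))
n_hidden

# Number of players at each distinct (SB, HR) location
overlap_counts <- aggregate(list(n = rep(1, nrow(bat))),
                            by = bat[, c("SB", "HR")], FUN = sum)
overlap_counts
```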

Add some jitter
> ggplot(data = mlbBat10, aes(x = SB, y = HR)) +
    geom_point(alpha = 0.5, position = "jitter")
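Jittering adds a little random noise to each coordinate so that points with identical values separate. Base R's jitter() shows the idea on its own. With `amount = 0.2`, every value moves by at most 0.2 in either direction. In ggplot2, `position = "jitter"` does this at plot time, and `position_jitter(width, height)` sets the amount explicitly. The 0.2 and the counts below are arbitrary, hypothetical choices.

```r
set.seed(2010)
hr <- c(0, 0, 0, 1, 1, 2)  # hypothetical home-run counts with ties

hr_jittered <- jitter(hr, amount = 0.2)
hr_jittered

# The noise is small relative to the spacing of the counts (1),
# so rounding recovers the original values
round(hr_jittered)
```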

Identify the outliers
> library(dplyr)  # provides %>%, filter(), select()
> mlbBat10 %>%
    filter(SB > 60 | HR > 50) %>%
    select(name, team, position, SB, HR)

##         name team position SB HR
## 1   J Pierre  CWS       OF 68  1
## 2 J Bautista  TOR       OF  9 54
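The filter() call uses thresholds (SB > 60, HR > 50) chosen by looking at the plot. The same logical condition also works in base R with subset(). This sketch uses a small hypothetical data frame (made-up players and teams, not mlbBat10) so that it runs on its own.

```r
# Hypothetical batting data, for illustration only
players <- data.frame(
  name     = c("Player A", "Player B", "Player C", "Player D"),
  team     = c("AAA", "BBB", "CCC", "DDD"),
  position = c("OF", "1B", "OF", "C"),
  SB       = c(70, 5, 12, 0),
  HR       = c(2, 55, 20, 8)
)

# Keep rows that are extreme on either variable (| means "or")
outliers <- subset(players, SB > 60 | HR > 50,
                   select = c(name, team, position, SB, HR))
outliers
```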
Let's practice!