0% found this document useful (0 votes)

63 views31 pages

Psyteachr Github Io Stat-Models-V1 Correlation-And-Regression Html#exercises

Correlation matrices summarize relationships between multiple measurements from the same individuals. They show the strength and direction of relationships between pairs of variables on a scale of -1 to 1. Higher positive values indicate stronger positive relationships, while more negative values represent stronger negative relationships. The document demonstrates how to calculate and visualize correlations using the starwars dataset in R, including removing outliers to better understand the relationships.

Uploaded by

Zaul De los santos

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

63 views31 pages

Psyteachr Github Io Stat-Models-V1 Correlation-And-Regression Html#exercises

Uploaded by

Zaul De los santos

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 31

2 Correlation and regression

2.1 Correlation matrices

You may be familiar with the concept of a correlation matrix from reading papers in
psychology. Correlation matrices are a common way of summarizing relationships
between multiple measurements taken from the same individual.

Let's say you've measured psychological well-being using multiple scales. One
question is the extent to which these scales are measuring the same thing. Often you
will look at a correlation matrix to explore all the pairwise relationships between
measures.

Recall that a correlation coefficient quantifies the strength and direction of a

relationship between two variables. It is usually represented by the symbol r or ρ
(Greek letter "rho"). The correlation coefficient ranges between -1 and 1, with 0
corresponding to no relationship, positive values reflecting a positive relationship (as
one variable increases, so does the other), and negative values reflecting a negative
relationship (as one variable increases, the other decreases).

Loading [MathJax]/jax/output/CommonHTML/jax.js
PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.1: Different types of bivariate relationships.

If you have n measures, how many pairwise correlations can you compute? You can
figure this out either by the formula in the info box below, or more easily you can
computed it directly through the choose(n, 2) function in R. For instance, to get the
number of possible pairwise correlations between 6 measures, you'd type choose(6,
2) , which tells you that you have 15 combinations.

n!
For any n measures, you can calculate 2 ( n − 2 ) ! pairwise correlations
between measures. The ! symbol is called the factorial operator, defined
as the product of all numbers from 1 to n. So, if you have six
measurements, you have

6! 1×2×3×4×5×6 720
= = = 15
2(6 − 2) ! 2(1 × 2 × 3 × 4) 2(24)
PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
You can create a correlation matrix in R using base::cor() or
corrr::correlate() . We prefer the latter function because cor() requires that
your data is stored in a matrix, whereas most of the data we will be working with is
tabular data stored in a data frame. The corrr::correlate() function takes a data
frame as the first argument, and provides "tidy" output, so it integrates better with
tidyverse functions and pipes ( %>% ).

Let's create a correlation matrix to see how it works. Start by loading in the packages
we will need.

library("tidyverse")
library("corrr") # install.packages("corrr") in console if missing

We will use the starwars dataset, which is a built-in dataset that becomes available
after you load the tidyverse package. This dataset has information about various
characters that have appeared in the Star Wars film series. Let's look at the
correlation between

starwars %>%
select(height, mass, birth_year) %>%
correlate()

## Correlation computed with

## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

## # A tibble: 3 × 4
## term height mass birth_year
## <chr> <dbl> <dbl> <dbl>
## 1 height NA 0.134 -0.400

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
## 2 mass 0.134 NA 0.478
## 3 birth_year -0.400 0.478 NA

You can look up any bivariate correlation at the intersection of any given row or
column. So the correlation between height and mass is .134, which you can find in
row 1, column 2 or row 2, column 1; the values are the same. Note that there are only
choose(3, 2) = 3 unique bivariate relationships, but each appears twice in the
table. We might want to show only the unique pairs. We can do this by appending
corrr::shave() to our pipeline.

starwars %>%
select(height, mass, birth_year) %>%
correlate() %>%
shave()

## Correlation computed with

## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

## # A tibble: 3 × 4
## term height mass birth_year
## <chr> <dbl> <dbl> <dbl>
## 1 height NA NA NA
## 2 mass 0.134 NA NA
## 3 birth_year -0.400 0.478 NA

Now we've only got the lower triangle of the correlation matrix, but the NA values are
ugly and so are the leading zeroes. The corrr package also provides the
fashion() function that cleans things up (see ?corrr::fashion for more
options).

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
starwars %>%
select(height, mass, birth_year) %>%
correlate() %>%
shave() %>%
fashion()

## Correlation computed with

## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

## term height mass birth_year

## 1 height
## 2 mass .13
## 3 birth_year -.40 .48

Correlations only provide a good description of the relationship if the relationship is

(roughly) linear and there aren't severe outliers that are wielding too strong of an
influence on the results. So it is always a good idea to visualize the correlations as
well as to quantify them. The base::pairs() function does this. The first argument
to pairs() is simply of the form ~ v1 + v2 + v3 + ... + vn where v1 , v2 , etc.
are the names of the variables you want to correlate.

pairs(~ height + mass + birth_year, starwars)

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.2: Pairwise correlations for the starwars dataset

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
We can see that there is a big outlier influencing our data; in particular, there is a
creature with a mass greater than 1200kg! Let's find out who this is and eliminate
them from the dataset.

starwars %>%
filter(mass > 1200) %>%
select(name, mass, height, birth_year)

## # A tibble: 1 × 4
## name mass height birth_year
## <chr> <dbl> <int> <dbl>
## 1 Jabba Desilijic Tiure 1358 175 600

OK, let's see how the data look without this massive creature.

starwars2 <- starwars %>%

filter(name != "Jabba Desilijic Tiure")

pairs(~height + mass + birth_year, starwars2)

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.3: Pairwise correlations for the starwars dataset after removing outlying mass
value.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Better, but there's a creature with an outlying birth year that we might want to get rid
of.

starwars2 %>%
filter(birth_year > 800) %>%
select(name, height, mass, birth_year)

## # A tibble: 1 × 4
## name height mass birth_year
## <chr> <int> <dbl> <dbl>
## 1 Yoda 66 17 896

It's Yoda. He's as old as the universe. Let's drop him and see how the plots look.

starwars3 <- starwars2 %>%

filter(name != "Yoda")

pairs(~height + mass + birth_year, starwars3)

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.4: Pairwise correlations for the starwars dataset after removing outlying mass and
birth_year values.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
That looks much better. Let's see how that changes our correlation matrix.

starwars3 %>%
select(height, mass, birth_year) %>%
correlate() %>%
shave() %>%
fashion()

## Correlation computed with

## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

## term height mass birth_year

## 1 height
## 2 mass .74
## 3 birth_year .45 .24

Note that these values are quite different from the ones we started with.

Sometimes it's not a great idea to remove outliers. Another approach to dealing with
outliers is to use a robust method. The default correlation coefficient that is
computed by corrr::correlate() is the Pearson product-moment correlation
coefficient. You can also compute the Spearman correlation coefficient by changing
the method() argument to correlate() . This replaces the values with ranks before
computing the correlation, so that outliers will still be included, but will have
dramatically less influence.

starwars %>%
select(height, mass, birth_year) %>%
correlate(method = "spearman") %>%

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
shave() %>%
fashion()

## Correlation computed with

## • Method: 'spearman'
## • Missing treated using: 'pairwise.complete.obs'

## term height mass birth_year

## 1 height
## 2 mass .75
## 3 birth_year .16 .15

Incidentally, if you are generating a report from R Markdown and want your tables to
be nicely formatted you can use knitr::kable() .

starwars %>%
select(height, mass, birth_year) %>%
correlate(method = "spearman") %>%
shave() %>%
fashion() %>%
knitr::kable()

term height mass birth_year

height

mass .75

birth_year .16 .15

2.2 Simulating bivariate data

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
You have already learned how to simulate data from the normal distribution using the
function rnorm() . Recall that rnorm() allows you to specify the mean and
standard deviation of a single variable. How do we simulate correlated variables?

It should be clear that you can't just run rnorm() twice and combine the variables,
because then you end up with two variables that are unrelated, i.e., with a correlation
of zero.

The package MASS provides a function mvrnorm() which is the 'multivariate' version
of rnorm (hence the function name, mv + rnorm , which makes it easy to remember.

The MASS package comes pre-installed with R. But the only function you'll probably
ever want to use from MASS is mvrnorm() , so rather than load in the package using
library("MASS") , it is preferable to use MASS::mvrnorm() , especially as MASS and
the dplyr package from tidyverse don't play well together, due to both packages
having the function select() . So if you load in MASS after you load in tidyverse ,
you'll end up getting the MASS version of select() instead of the dplyr version. It
will do your head in trying to figure out what is wrong with your code, so always use
MASS::mvrnorm() without loading library("MASS") .

MASS before dplyr, clashes not dire;

dplyr before MASS, pain in the ass. #rstats pic.twitter.com/vHIbGwSKd8

— Dale Barr ((???)) September 30, 2014

Have a look at the documentation for the mvrnorm() function (type ?

MASS::mvrnorm in the console).

There are three arguments to take note of:

arg description

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
n the number of samples required

mu a vector giving the means of the variables

Sigma a positive-definite symmetric matrix specifying the covariance matrix of the

variables.

The descriptions for n and mu are understandable, but what is a "positive-definite

symmetric matrix specifying the covariance"?

When you have multivariate data, the covariance matrix (also known as the variance-
covariance matrix) specifies the variances of the individual variables and their
interrelationships. It is like a multidimensional version of the standard deviation. To
fully describe a univariate normal distribution, you need to know only the mean and
standard deviation; to describe a bivariate normal distribution, you need the means
of each of the two variables, their standard deviations, and their correlation; for a
multivariate distribution with more the two variables you need the means for all of
the variables, their standard deviations, and all of the possible pairwise correlations.
These concepts become very important once we start talking about mixed-effects
modelling.

You can think of a covariance matrix as something like the correlation matrix that
you saw above; indeed, with a few calculations you can turn a covariance matrix into
a correlation matrix.

What's all this talk about the Matrix? Wasn't that a sci-fi film series from
the 1990s?

In mathematics, matrices are just generalizations of the concept of a

vector: a vector can be thought of as having one dimension, whereas a
matrix can have any number of dimensions.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
So the matrix

( )
1 4 7
2 5 8
3 6 9

( )( )
1 4
is a 3 (row) by 3 (column) matrix containing the column vectors 2 , 5 ,
3 6

()
7
and 8 . Conventionally, we refer to matrices in i by j format, with i being
9
the number of rows and j being the number of columns. So a 3x2 matrix
has 3 rows and 2 columns, like so.

( )
a d
b e
c f

A square matrix is a matrix where the number of rows is equal to the

number of columns.

You can create the above matrix in R using the matrix() function (see
below) or by binding together vectors using the base R cbind() and
rbind() , which bind vectors together column-wise and row-wise,
respectively. Try cbind(1:3, 4:6, 7:9) in your console.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Now what is all this about the matrix being "positive-definite" and "symmetric"?
These are mathematical requirements about the kinds of matrices that can represent
possible multivariate normal distributions. In other words, the covariance matrix you
supply must represent a legal multivariate normal distribution. At this point, you don't
really need to know much more about this than that.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
plot math code help about

5 y
4

1
x
0
−5 −4 −3 −2 −1 0 1 2 3 4 5
−1

−2

−3

−4

−5

new sample reset

+0.00
1.00
1.00
+0.00
+0.00

Let's start by simulating data representing hypothetical humans and their heights
and weights. We know these things are correlated. What we need to be able to
simulate data are means and standard deviations for these two variables and their
correlation.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
I found some data here which I converted into a CSV file. If you want to follow along,
download the file heights_and_weights.csv. Here's how the scatterplot looks:

handw <- read_csv("data/heights_and_weights.csv", col_types = "dd")

ggplot(handw, aes(height_in, weight_lbs)) +

geom_point(alpha = .2) +
labs(x = "height (inches)", y = "weight (pounds)")

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.5: Heights and weights of 475 humans (including infants)

Now, that's not quite a linear relationship. We can make it into one by log
transforming each of the variables first.

handw_log <- handw %>%

mutate(hlog = log(height_in),
wlog = log(weight_lbs))

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.6: Log transformed heights and weights.

The fact that there is a big cluster of points in the top right tail of the cloud probably
indicates that there were more adults than kids in this sample, since adults are taller
and heavier.

The mean log height is 4.11 (SD = 0.26), while the mean log weight is 4.74 (SD =
0.65). The correlation between log height and log weight, which we can get using the
cor() function, is very high, 0.96.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
We now have all the information we need to simulate the heights and weights of, let's
say, 500 humans. But how do we get this information into MASS::mvrnorm() ? We
know the first part of the function call will be MASS::mvrnorm(500, c(4.11, 4.74),
...) , but what about Sigma , the covariance matrix? We know from above that
σ̂ x = 0.26 and σ̂ y = 0.65, and ρ̂ xy = 0.96.

A covariance matrix representating Sigma (Σ) for bivariate data has the following
format:

Σ=
( σx2

ρ yxσ yσ x
ρ xyσ xσ y

σy2 )
The variances (squared standard deviations, σ x 2 and σ y 2) are in the diagonal, and the
covariances (the correlation times the two standard deviations, ρ xyσ xσ y) are in the off-
diagonal. It is useful to remember that the covariance is just the correlation times the
product of the two standard deviations. As we saw above with the correlation
matrices, there is redundant information in the table; namely, the covariance appears
in the top right cell as well as the bottom left cell of the matrix.

So plugging in the values we got above, our covariance matrix should be

( .26 2
(.96)(.65)(.26)
(.96)(.26)(.65)
.65 2 )( =
.067
.162
.162
.423 )
OK, how do we form Sigma in R so that we can pass it to the mvrnorm() function?
We will use the matrix() function, as shown below.

First let's define our covariance and store it in the variable my_cov .

my_cov <- .96 * .26 * .65

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Now we'll use matrix() to define our Sigma , my_Sigma .

my_Sigma <- matrix(c(.26^2, my_cov, my_cov, .65^2), ncol = 2)

my_Sigma

## [,1] [,2]
## [1,] 0.06760 0.16224
## [2,] 0.16224 0.42250

Confused about the matrix() function?

The first argument is a vector of values, which we created above using

c() . The ncol argument specifies how many columns the matrix should
have. There is also an nrow argument that you could use, but if you give it
one of the dimensions, it can infer the size of the other using the length of
the vector supplied as the first argument.

You can see that matrix() fills in the elements of the matrix column by
column, rather than row by row by running the following code:

matrix(c("a", "b", "c", "d"), ncol = 2)

If you want to change this behavior, set the byrow argument to TRUE .

matrix(c("a", "b", "c", "d"), ncol = 2, byrow = TRUE)

Great. Now that we've got my_Sigma , we're ready to use MASS::mvrnorm() . Let's
test it out by creating 6 synthetic humans.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
set.seed(62) # for reproducibility

# passing the named vector c(height = 4.11, weight = 4.74)

# for mu gives us column names in the output
log_ht_wt <- MASS::mvrnorm(6,
c(height = 4.11, weight = 4.74),
my_Sigma)

log_ht_wt

## height weight
## [1,] 4.254209 5.282913
## [2,] 4.257828 4.895222
## [3,] 3.722376 3.759767
## [4,] 4.191287 4.764229
## [5,] 4.739967 6.185191
## [6,] 4.058105 4.806485

So MASS::mvrnorm() returns a matrix with a row for each simulated human, with
the first column representing the log height and the second column representing the
log weight. But log heights and log weights are not very useful to us, so let's
transform them back by using exp() , which is the inverse of the log() transform.

exp(log_ht_wt)

## height weight
## [1,] 70.40108 196.94276
## [2,] 70.65632 133.64963
## [3,] 41.36254 42.93844
## [4,] 66.10779 117.24065
## [5,] 114.43045 485.50576
## [6,] 57.86453 122.30092
PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
So our first simulated human is 70.4 inches tall (about 5'5" or X) and weighs 196.94
pounds (89.32 kg). Sounds about right! (Note also that it will generate observations
outside of our original data; we'll get super tall humans, like observation 5, but at
least the weight/height relationship will be preserved.)

OK, let's randomly generate a bunch of humans, transform them from log to inches
and pounds, and plot them against our original data to see how we're doing.

## simulate new humans

new_humans <- MASS::mvrnorm(500,
c(height_in = 4.11, weight_lbs = 4.74),
my_Sigma) %>%
exp() %>% # back-transform from log to inches and pounds
as_tibble() %>% # make tibble for plotting
mutate(type = "simulated") # tag them as simulated

## combine real and simulated datasets

## handw is variable containing data from heights_and_weights.csv
alldata <- bind_rows(handw %>% mutate(type = "real"),
new_humans)

ggplot(alldata, aes(height_in, weight_lbs)) +

geom_point(aes(colour = type), alpha = .1)

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.7: Real and simulated humans.

You can see that our simulated humans are much like the normal ones, except that
we are creating humans outside the normal range of heights and weights.

2.3 Relationship between correlation and regression

OK, we know how to estimate correlations, but what if we wanted to predict the
weight of someone given their height? This might sound like an impractical problem,
but in fact, emergency medical technicians can use this technique to get a quick
PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
estimate of people's weights in emergency situations where they need to administer
drugs or procedures whose safety depends on the patient's weight, and don't have
time to weigh them.

Recall that the GLM for a simple regression model is

Y i = β 0 + β 1X i + e i.

Here, we are trying to predict the weight (Y i) of person i from their observed height (
X i). In this equation, β 0 and β 1 are the y-intercept and slope parameters respectively,
and the e is are the residuals. It is conventionally assumed that the e i values are from
a normal distribution with mean of zero and variance σ 2; the math-y way of saying
this is e i ∼ N(0, σ 2), where ∼ is read as "distributed according to" and N(0, σ 2)
means "Normal distribution (N) with mean of 0 and variance of σ 2".

It turns out that if we have estimates of the means of X and Y (denoted by μ x and μ y
respectively), of their standard deviations (σ̂ x and σ̂ y), and the correlation between X
and Y (ρ̂), we have all the information we need to estimate the parameters of the
regression equation β 0 and β 1. Here's how.

First, the slope of the regression line β 1 equals the correlation coefficient ρ times the
ratio of the standard deviations of Y and X.

σY
β1 = ρ
σX

Given the estimates above for log height and weight, can you solve for β 1?

b1 <- .96 * (.65 / .26)

Learning Statistical Models On this page

Through Simulation in R
PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
2 Correlation and
oug S u at o 2 Correlation and
## [1] 2.4 regression
Search
2.1 Correlation
The next thing to note is that for mathematical reasons, the regression line is matrices
Table of contents
guaranteed to go through the point corresponding to the mean of X and the mean of
Overview 2.2 Simulating
Y, i.e., the point (μ x, μ y). (You can think of the regression line "pivoting" around that bivariate data
1 Introduction point depending on the slope). You also know that β 0 is the y-intercept, the point
2.3 Relationship
2 Correlation and regression where the line crosses the vertical axis at X = 0. From this information, and the between correlation
3 Multiple regression estimates above, can you figure out the value of β 0? and regression
4 Interactions 2.4 Exercises
Here is the reasoning by which you can solve for β 0.
5 Introducing linear mixed-effects
models The β 1 value tells you that for each change in X you have a corresponding change of View source 
6 Linear mixed-effects models with 2.4 for Y, and you know that the line goes through the points (μ x, μ y) as well as the y- Edit this page 
one random factor
intercept (0, β 0).
7 Linear mixed-effects models with
crossed random factors Think about stepping back unit-by-unit from X = μ x to X = 0. At X = μ x, Y = 4.74. Each
8 Generalized linear mixed-effects unit step you take backward in the X dimension, Y will drop by β 1 = 2.4 units. When
models
you get to zero, Y will have dropped from μ y to μ y − μ xβ 1.
9 Modeling Ordinal Data
So the general solution is: β 0 = μ y − μ xβ 1.
Appendix

A Symbols Since β 1 = 2.4, μ x = 4.11, and μ y = 4.74, β 0 = − 5.124. Thus, our regression equation is:
B Bibliography
Y i = − 5.124 + 2.4X i + e i.

View book source  To check our results, let's first run a regression on the log-transformed data using
lm() , which estimates parameters using ordinary least squares regression.

summary(lm(wlog ~ hlog, handw_log))

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
##
## Call:
## lm(formula = wlog ~ hlog, data = handw_log)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.63296 -0.09915 -0.01366 0.09285 0.65635
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.26977 0.13169 -40.02 <2e-16 ***
## hlog 2.43304 0.03194 76.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1774 on 473 degrees of freedom
## Multiple R-squared: 0.9246, Adjusted R-squared: 0.9245
## F-statistic: 5802 on 1 and 473 DF, p-value: < 2.2e-16

Looks pretty close. The reason that it doesn't match exactly is only because we've
rounded off our estimates to two decimal places for convenience.

As another check, let's superimpose the regression line we computed by hand on the
scatterplot of the log-transformed data.

ggplot(handw_log, aes(hlog, wlog)) +

geom_point(alpha = .2) +
labs(x = "log(height)", y = "log(weight)") +
geom_abline(intercept = -5.124, slope = 2.4, colour = 'blue')

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Figure 2.8: Log transformed values with superimposed regression line.

Looks right.

To close, here are a few implications from the relationship between correlation and
regression.

β 1 = 0 is the same as ρ = 0.

β 1 > 0 implies ρ > 0, since standard deviations can't be negative.

β 1 < 0 implies ρ < 0, for the same reason.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
Rejecting the null hypothesis that β 1 = 0 is the same as rejecting the null
hypothesis that ρ = 0. The p-values that you get for β 1 in lm() will be the same
as the one you get for ρ from cor.test() .

2.4 Exercises

Consider the variance-covariance matrix below.

Enter your answers to two decimal places.

check my answers

new problem

« 1 Introduction 3 Multiple regression »

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!
"Learning Statistical Models Through Simulation in R" was written by
Dale J. Barr. It was last built on 2022-10-06.

PDF hecho con ❤ en https://www.htmlapdf.com. ¿Quieres convertir un sitio web completo a PDF? Mira el tutorial aquí!

Dating - After Heartbreak Matthew - Hussey TSW
100% (4)
Dating - After Heartbreak Matthew - Hussey TSW
16 pages
PerarEC-EB 0 PDF
100% (1)
PerarEC-EB 0 PDF
21 pages
2019 OPCR TARGETS-New Form
100% (2)
2019 OPCR TARGETS-New Form
8 pages
Graphic Designing Course Syllabus Online
50% (2)
Graphic Designing Course Syllabus Online
3 pages
L6 - Biostatistics - Linear Regression and Correlation
No ratings yet
L6 - Biostatistics - Linear Regression and Correlation
121 pages
Lecture 7
No ratings yet
Lecture 7
65 pages
Chapter 6 Booklet - Bivariate Data
No ratings yet
Chapter 6 Booklet - Bivariate Data
12 pages
Correlation 2025
No ratings yet
Correlation 2025
90 pages
Pearson'S Product-Moment Correlation Coefficient: Statistics and Probability
No ratings yet
Pearson'S Product-Moment Correlation Coefficient: Statistics and Probability
18 pages
12.15.14 Lines of Best Fit and Correlation Coefficients
100% (1)
12.15.14 Lines of Best Fit and Correlation Coefficients
4 pages
Q4 Week 6 Statistics and Probability
No ratings yet
Q4 Week 6 Statistics and Probability
21 pages
Pearson and Spearman Correlation
No ratings yet
Pearson and Spearman Correlation
50 pages
082 FM 200 System Preventative Maintenance Checklist 1
No ratings yet
082 FM 200 System Preventative Maintenance Checklist 1
23 pages
L3 Correlation
No ratings yet
L3 Correlation
101 pages
Correlation and Regression Analysis
No ratings yet
Correlation and Regression Analysis
37 pages
R-Unit 5
No ratings yet
R-Unit 5
76 pages
Surveying & Levelling: Lecture # 1
No ratings yet
Surveying & Levelling: Lecture # 1
37 pages
R Record-1
No ratings yet
R Record-1
57 pages
Topic4 Linear Models
No ratings yet
Topic4 Linear Models
72 pages
Stats10 - Chapter+4 2
No ratings yet
Stats10 - Chapter+4 2
54 pages
Reg Lin
No ratings yet
Reg Lin
73 pages
Measures of Relationship
No ratings yet
Measures of Relationship
11 pages
生物统计方法与应用9-Regression and Correlation
No ratings yet
生物统计方法与应用9-Regression and Correlation
42 pages
Welding Valve Rebuild Repair
No ratings yet
Welding Valve Rebuild Repair
12 pages
Introduction of Correlation
No ratings yet
Introduction of Correlation
39 pages
Buenavista CBDRP Mop 8.15x13
No ratings yet
Buenavista CBDRP Mop 8.15x13
126 pages
YBS Third Party Risk Management Policy Overview
No ratings yet
YBS Third Party Risk Management Policy Overview
10 pages
Pearson Correlation
No ratings yet
Pearson Correlation
59 pages
12 - The Correlational Research Strategy Short
No ratings yet
12 - The Correlational Research Strategy Short
44 pages
Week 3 - Similarity Distance Measures
No ratings yet
Week 3 - Similarity Distance Measures
42 pages
Jan 30 - Correlation III
No ratings yet
Jan 30 - Correlation III
31 pages
Module 8-2
No ratings yet
Module 8-2
25 pages
L3 Bivariate Worksheet
No ratings yet
L3 Bivariate Worksheet
25 pages
Q4 Week 6 - Statistics and Probability
No ratings yet
Q4 Week 6 - Statistics and Probability
22 pages
Introduction To Machine Learning
No ratings yet
Introduction To Machine Learning
48 pages
3 Bivariate Data
No ratings yet
3 Bivariate Data
33 pages
Chapter 07
No ratings yet
Chapter 07
40 pages
L3 Bivariate Worksheet Answers
No ratings yet
L3 Bivariate Worksheet Answers
25 pages
SHS Correlation and Regression Final
No ratings yet
SHS Correlation and Regression Final
79 pages
R Cheat Sheet
No ratings yet
R Cheat Sheet
9 pages
Regression Ex
No ratings yet
Regression Ex
13 pages
Correlation Practice
No ratings yet
Correlation Practice
18 pages
Chapt04 BPS
No ratings yet
Chapt04 BPS
26 pages
Correlation
No ratings yet
Correlation
26 pages
Correlation
No ratings yet
Correlation
48 pages
F141E - DT-Welding-LPMW-Terms and Conditions-Jan. 2020
No ratings yet
F141E - DT-Welding-LPMW-Terms and Conditions-Jan. 2020
5 pages
SSRN Id3512994
No ratings yet
SSRN Id3512994
34 pages
Correlation New
No ratings yet
Correlation New
37 pages
Handover To QAQC-Anand Kumar
No ratings yet
Handover To QAQC-Anand Kumar
6 pages
Correlation Diploma
No ratings yet
Correlation Diploma
10 pages
Chapter 8 - PSYC 284
No ratings yet
Chapter 8 - PSYC 284
7 pages
List of Functions
No ratings yet
List of Functions
7 pages
Correlation Coefficients
No ratings yet
Correlation Coefficients
9 pages
Ww2 Coastal Edu Kingw Statistics R Tutorials Simplelinear HT
No ratings yet
Ww2 Coastal Edu Kingw Statistics R Tutorials Simplelinear HT
15 pages
Hypothesis Testing Correlation
No ratings yet
Hypothesis Testing Correlation
15 pages
Business Statistics
No ratings yet
Business Statistics
19 pages
Nano Fluids PDF
No ratings yet
Nano Fluids PDF
22 pages
R Programming End Term
No ratings yet
R Programming End Term
4 pages
Covariance and Correlation
No ratings yet
Covariance and Correlation
6 pages
B.A. (Program) Sociology: Scheme of Courses and Syllabus
No ratings yet
B.A. (Program) Sociology: Scheme of Courses and Syllabus
46 pages
Statistics & Probability Q4 - Week 7-8
No ratings yet
Statistics & Probability Q4 - Week 7-8
15 pages
Do The Same For All The Questions
No ratings yet
Do The Same For All The Questions
6 pages
Statistics: Correlation: 2.1 Interpreting A Scatterplot
No ratings yet
Statistics: Correlation: 2.1 Interpreting A Scatterplot
8 pages
Day 6 - Orthophoto
No ratings yet
Day 6 - Orthophoto
33 pages
Measures of Relationship
No ratings yet
Measures of Relationship
11 pages
Package Corrplot': June 30, 2021
No ratings yet
Package Corrplot': June 30, 2021
22 pages
Chapter 3 Notes A
No ratings yet
Chapter 3 Notes A
3 pages
Lab Exercise 3
No ratings yet
Lab Exercise 3
6 pages
Exercise Sheet 5
No ratings yet
Exercise Sheet 5
2 pages
Correlation and Regression
No ratings yet
Correlation and Regression
5 pages
Summary of Chapter 4
No ratings yet
Summary of Chapter 4
3 pages
Chap 3 Worksheet
No ratings yet
Chap 3 Worksheet
4 pages
Module - 4 (R Training) - Basic Stats & Modeling
No ratings yet
Module - 4 (R Training) - Basic Stats & Modeling
15 pages
Week12 ExerciseSolutions
No ratings yet
Week12 ExerciseSolutions
5 pages
18 Graphics Creation
No ratings yet
18 Graphics Creation
27 pages
Combined Results (2023-10-29 160133)
No ratings yet
Combined Results (2023-10-29 160133)
302 pages
Yearly Prayer Times 2019 - IslamicFinder
No ratings yet
Yearly Prayer Times 2019 - IslamicFinder
13 pages
Surat Edaran Mahkamah Agung
No ratings yet
Surat Edaran Mahkamah Agung
5 pages
E-Government Presentation by Thet Pine Htoo EMBA - I - 20
100% (1)
E-Government Presentation by Thet Pine Htoo EMBA - I - 20
40 pages
Hot-Finished Circular Hollow Sections, Section Properties Dimensions & Properties
No ratings yet
Hot-Finished Circular Hollow Sections, Section Properties Dimensions & Properties
20 pages
Introduction To Anatomy Trans
No ratings yet
Introduction To Anatomy Trans
19 pages
Self-Configuration of Scrambling Codes For WCDMA Small Cell Networks
No ratings yet
Self-Configuration of Scrambling Codes For WCDMA Small Cell Networks
6 pages
Clock Divider and Pulse Counter
No ratings yet
Clock Divider and Pulse Counter
4 pages
An Analysis of Mathematics Anxiety Among First Year BSE Mathematics Students of Northwest Samar State University
No ratings yet
An Analysis of Mathematics Anxiety Among First Year BSE Mathematics Students of Northwest Samar State University
3 pages
# Check If The DRD Software Is Installed
No ratings yet
# Check If The DRD Software Is Installed
5 pages
01 PDFsam - King Keohane Verba 1994 Designing Social Inquiry
No ratings yet
01 PDFsam - King Keohane Verba 1994 Designing Social Inquiry
6 pages
Sociologia Ruralis - 2002 - Van Der Ploeg - Rural Development From Practices and Policies Towards Theory
No ratings yet
Sociologia Ruralis - 2002 - Van Der Ploeg - Rural Development From Practices and Policies Towards Theory
18 pages
Drawings of Reservoir
No ratings yet
Drawings of Reservoir
1 page
LOT#3 Equipment Equipment Name FAT Tests SAT Tests
No ratings yet
LOT#3 Equipment Equipment Name FAT Tests SAT Tests
2 pages
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet

Psyteachr Github Io Stat-Models-V1 Correlation-And-Regression Html#exercises

Uploaded by

Psyteachr Github Io Stat-Models-V1 Correlation-And-Regression Html#exercises

Uploaded by

2 Correlation and regression

2.1 Correlation matrices

Recall that a correlation coefficient quantifies the strength and direction of a

## Correlation computed with

## Correlation computed with

## Correlation computed with

## term height mass birth_year

Correlations only provide a good description of the relationship if the relationship is

pairs(~ height + mass + birth_year, starwars)

starwars2 <- starwars %>%

pairs(~height + mass + birth_year, starwars2)

starwars3 <- starwars2 %>%

pairs(~height + mass + birth_year, starwars3)

## Correlation computed with

## term height mass birth_year

## Correlation computed with

## term height mass birth_year

term height mass birth_year

birth_year .16 .15

2.2 Simulating bivariate data

MASS before dplyr, clashes not dire;

— Dale Barr ((???)) September 30, 2014

Have a look at the documentation for the mvrnorm() function (type ?

There are three arguments to take note of:

mu a vector giving the means of the variables

Sigma a positive-definite symmetric matrix specifying the covariance matrix of the

The descriptions for n and mu are understandable, but what is a "positive-definite

In mathematics, matrices are just generalizations of the concept of a

A square matrix is a matrix where the number of rows is equal to the

new sample reset

handw <- read_csv("data/heights_and_weights.csv", col_types = "dd")

ggplot(handw, aes(height_in, weight_lbs)) +

handw_log <- handw %>%

So plugging in the values we got above, our covariance matrix should be

my_cov <- .96 * .26 * .65

my_Sigma <- matrix(c(.26^2, my_cov, my_cov, .65^2), ncol = 2)

Confused about the matrix() function?

The first argument is a vector of values, which we created above using

matrix(c("a", "b", "c", "d"), ncol = 2)

matrix(c("a", "b", "c", "d"), ncol = 2, byrow = TRUE)

# passing the *named* vector c(height = 4.11, weight = 4.74)

## simulate new humans

## combine real and simulated datasets

ggplot(alldata, aes(height_in, weight_lbs)) +

2.3 Relationship between correlation and regression

Recall that the GLM for a simple regression model is

b1 <- .96 * (.65 / .26)

Learning Statistical Models On this page

summary(lm(wlog ~ hlog, handw_log))

ggplot(handw_log, aes(hlog, wlog)) +

β 1 > 0 implies ρ > 0, since standard deviations can't be negative.

β 1 < 0 implies ρ < 0, for the same reason.

Consider the variance-covariance matrix below.

Enter your answers to two decimal places.

« 1 Introduction 3 Multiple regression »

You might also like

# passing the named vector c(height = 4.11, weight = 4.74)