
Recitation Four

Agabek Kabdullin

2022-09-18

Contents

Counting Stars
  Fixing working directories
  A note on TinyTex
  Stargazer
  Plots in Base R
    Anscombe
    Simpson
  Try on your own

Counting Stars

Fixing working directories

Let’s set up a global working directory to resolve some of the issues from last week:

```{r setup, include=FALSE}
# root.dir is a knitr *package* option, so it is set via opts_knit (not
# opts_chunk$set); it makes all subsequent chunks run from this directory
knitr::opts_knit$set(root.dir =
  'D:/.../Fall/data_analysis/recitations/recitation (4)/r_code')
library(stargazer)
```

A note on TinyTex

Not everyone is a fan of HTML. Some people prefer PDF output. Luckily, R Markdown
allows us to produce nice-looking PDF output. (We will use stargazer a bit later.)
You will need the tinytex package to knit PDFs in R Markdown. Please install it using the code
below. You only need to run it once! Don't include it in all of your R Markdowns!
Also, please install the stargazer package. Again, you only need to install it once! But
you do need to call library(stargazer) every time you use it in R Markdown.

```{r, include=FALSE}
#knitr::opts_chunk$set(echo = TRUE)

install.packages('tinytex') # installing the 'tinytex' package,
#which allows us to knit PDFs in R Markdown
tinytex::install_tinytex() # install the TinyTeX distribution

install.packages('stargazer')
library(stargazer)
```

There are, however, some limitations that come with that. For one, we lose a lot of
interactivity as we move from HTML to PDF. After all, a PDF is just a document. Some default
options are also quite limited in R Markdown when opting for PDF. For instance, the only
available font sizes by default are 10pt, 11pt, and 12pt:
---
title: "Recitation Three"
author: "Agabek Kabdullin"
date: "2022-09-11"
output: pdf_document
geometry: margin=1in
fontsize: 12pt
---

If you want more fonts and font sizes, you'll need to jump through a couple of hoops.
See more on that here and here.
Still, some basic interactivity is there. You can add an interactive table of contents, for
instance:
NB: the "depth" of the table of contents goes up to 6 (that's the number of header levels in R
Markdown; don't worry, you won't ever produce a sub-sub-sub-sub-sub-section, and
if you do, you need to reconsider some of your writing choices)

---
title: "Recitation Three"
author: "Agabek Kabdullin"
date: "2022-09-11"
output:
  pdf_document:
    toc: true
    toc_depth: 2
geometry: margin=1in
fontsize: 12pt
---

Don’t forget to color your links!

---
title: "Recitation Three"
author: "Agabek Kabdullin"
date: "2022-09-11"
output:
  pdf_document:
    toc: true
    toc_depth: 2
geometry: margin=1in
fontsize: 12pt
urlcolor: blue
linkcolor: red
---

Stargazer

“we’ll be countin’ stars”

– OneRepublic
Let’s take a look at some real-world variables now.
Please open the first_experiment.csv and second_experiment.csv files (they should
be in the proper working directory!)
Remove the first column from both data frames (it’s a row name that made its way
into the file by accident)

## Read the data:

first_experiment <- read.csv("first_experiment.csv")
#data from the first experiment (2018)
#colnames(first_experiment)[-1]
first_experiment <- first_experiment[colnames(first_experiment)[-1]]
first_experiment$exp <- 1

second_experiment <- read.csv("second_experiment.csv")
#data from the second experiment (2019)
#colnames(second_experiment)[-1]
second_experiment <- second_experiment[colnames(second_experiment)[-1]]
second_experiment$exp <- 2

## Merge the data:


combined <- rbind(first_experiment, second_experiment)
# "rbind" stands for "row-binding"

## Some data cleaning:


combined$station1 <-
gsub(" ", "", combined$station1, fixed = TRUE)
#remove empty spaces from station names

# write data:
write.csv(combined, 'combined.csv', row.names = F)
# "row.names = F" makes sure that we do not write the row
# names into the file

Data come from the experiment by Choi, Poertner and Sambanis (2019):

“The experimental intervention itself proceeded as follows: a female confederate
approached a bench at a train station where other individuals were waiting
for their train and conducted a brief call addressing a friend regarding an
innocuous personal matter (step 1). During this call, the confederate dropped
fruit (oranges or lemons) from a paper bag that had seemingly torn at the bottom
(step 2). The fruit dispersed and the confederate appeared to be in need
of assistance to pick them up (step 3). We observed whether bystanders (German
natives) helped the confederate pick up the fruit (step 4)…

The key dimension of the intervention—the confederate’s perceived membership
in the ingroup (German natives) or outgroup (Muslim immigrants)—was
manipulated experimentally by randomly assigning a confederate with specific
ethno-religious attributes: a Middle-Eastern immigrant wearing a hijab or
a white German female. We used several different actors (15 immigrants and
17 natives across 11 teams) and chose similarly aged confederates of comparable
attractiveness and controlled for social class by having confederates
wear similar attire across iterations.”

[Pie charts: shares of bystanders who helped vs. did not help the confederate, shown separately for natives and non-natives, overall and at 25+°C, 30+°C, and 35+°C]

The pie charts above suggest that as temperature increases, people tend to provide less
help to perceived out-groups. Is that statistically significant?
The equation that we want to estimate is:

\begin{align*}
Help_i = \alpha + \beta_1 \cdot temperature_{i} +
\beta_2 \cdot hijab_{i} +
\beta_3 \cdot temperature_{i} \cdot hijab_{i} +
\gamma \cdot X + \varepsilon
\end{align*}
We could estimate this equation using R’s lm function (which stands for “linear model”).
That, however, would only give us the coefficient estimates without telling us anything
about their statistical significance:

lm(anyhelp ~ temp*treat, data = combined)

##
## Call:
## lm(formula = anyhelp ~ temp * treat, data = combined)
##
## Coefficients:
## (Intercept) temp treat temp:treat
## 0.672474 0.003890 0.140017 -0.008733

Let’s wrap our lm thing with the summary function:

summary(lm(anyhelp ~ temp*treat, data = combined))

##
## Call:
## lm(formula = anyhelp ~ temp * treat, data = combined)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8281 -0.6563 0.2303 0.3091 0.3880
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.672474 0.097012 6.932 5.78e-12 ***
## temp 0.003890 0.003564 1.091 0.2752
## treat 0.140017 0.132792 1.054 0.2918
## temp:treat -0.008733 0.004840 -1.804 0.0714 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4451 on 1782 degrees of freedom
## (2024 observations deleted due to missingness)
## Multiple R-squared: 0.01343, Adjusted R-squared: 0.01177
## F-statistic: 8.084 on 3 and 1782 DF, p-value: 2.387e-05

Now we’re getting somewhere! You see that the coefficient for the intercept is statistically
significant at the 0.001 level (check out the significance codes at the bottom). The interaction
term (temp:treat) has a p-value of 0.0714. That means that there is roughly a 7% chance of
observing this estimate (or one even farther away from zero) if the null hypothesis
were true.
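As a quick sanity check (a sketch, not part of the original output), we can reproduce that p-value by hand from the estimate and standard error reported above:

```{r}
# t statistic = estimate / standard error for temp:treat, then a two-sided
# p-value from the t distribution with the residual degrees of freedom
t_stat <- -0.008733 / 0.004840
2 * pt(-abs(t_stat), df = 1782) # ~0.0714, matching the summary() output
```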
Suppose you wanted to present this table to your peers. You could, of course, copy and
paste each value into a text editor, but working on that table (or tables) would be as
pleasant as stabbing yourself with a fork. Fortunately, there is stargazer.

Originally, it was created to turn R output into text that LaTeX can read as a
ready-made table:

```{r}
stargazer(lm(anyhelp ~ temp*treat, data = combined))
```

##
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Sun, Sep 25, 2022 - 7:47:08 PM
## \begin{table}[!htbp] \centering
## \caption{}
## \label{}
## \begin{tabular}{@{\extracolsep{5pt}}lc}
## \\[-1.8ex]\hline
## \hline \\[-1.8ex]
## & \multicolumn{1}{c}{\textit{Dependent variable:}} \\
## \cline{2-2}
## \\[-1.8ex] & anyhelp \\
## \hline \\[-1.8ex]
## temp & 0.004 \\
## & (0.004) \\
## & \\
## treat & 0.140 \\
## & (0.133) \\
## & \\
## temp:treat & $-$0.009$^{*}$ \\
## & (0.005) \\
## & \\
## Constant & 0.672$^{***}$ \\
## & (0.097) \\
## & \\
## \hline \\[-1.8ex]
## Observations & 1,786 \\
## R$^{2}$ & 0.013 \\
## Adjusted R$^{2}$ & 0.012 \\
## Residual Std. Error & 0.445 (df = 1782) \\
## F Statistic & 8.084$^{***}$ (df = 3; 1782) \\
## \hline
## \hline \\[-1.8ex]
## \textit{Note:} & \multicolumn{1}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\
## \end{tabular}

## \end{table}

In R Markdown, however, if you use the results='asis' option in the
chunk with your stargazer code, you’ll get a neat table:

```{r, results='asis'}
stargazer(lm(anyhelp ~ temp*treat, data = combined))
```

% Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail:
marek.hlavac at gmail.com % Date and time: Sun, Sep 25, 2022 - 7:47:08 PM

Table 1:
Dependent variable:
anyhelp
temp 0.004
(0.004)

treat 0.140
(0.133)

temp:treat −0.009∗
(0.005)

Constant 0.672∗∗∗
(0.097)

Observations 1,786
R2 0.013
Adjusted R2 0.012
Residual Std. Error 0.445 (df = 1782)
F Statistic 8.084∗∗∗ (df = 3; 1782)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

How does stargazer do that?

Let’s assign our linear model to an object, model1. Like many other objects in R, model1
has certain named components (“values” of an lm).
For instance, model1 has coefficients:

model1 = lm(anyhelp ~ temp + treat + temp*treat, data = combined)
model1$coefficients

## (Intercept)        temp       treat  temp:treat
## 0.672473976 0.003889554 0.140017333 -0.008732651

It also stores the formula for our regression:

model1$call

## lm(formula = anyhelp ~ temp + treat + temp * treat, data = combined)

How about some standard errors? The variances and covariances of the coefficient estimates are stored in a matrix:

vcov(model1)

##              (Intercept)          temp         treat    temp:treat
## (Intercept)  0.0094112442 -3.410816e-04 -0.0094112442  3.410816e-04
## temp        -0.0003410816  1.270026e-05  0.0003410816 -1.270026e-05
## treat       -0.0094112442  3.410816e-04  0.0176337439 -6.344734e-04
## temp:treat   0.0003410816 -1.270026e-05 -0.0006344734  2.342818e-05

To get the standard errors out, we take the diagonal of that matrix (the variances) and take the square root of each value:

sqrt(diag(vcov(model1)))

## (Intercept)        temp       treat  temp:treat
## 0.097011567 0.003563743 0.132792108 0.004840266
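Putting these pieces together, here is a small sketch (not in the original handout) that rebuilds the familiar coefficient table from the model components; this is essentially the raw material stargazer works with:

```{r}
# estimates, standard errors, t values, and two-sided p-values,
# reconstructed by hand from the lm object
est <- model1$coefficients
se  <- sqrt(diag(vcov(model1)))
cbind(estimate = est, std.error = se,
      t.value  = est / se,
      p.value  = 2 * pt(-abs(est / se), df = model1$df.residual))
```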

stargazer pulls all of these details out of our linear model objects and plugs them into
pre-made tables that LaTeX can read.
stargazer also gives us quite a bit of control over how our tables look. For instance,
you can use the dep.var.labels argument to change the label of your dependent
variable, and the covariate.labels argument to change the names of your variables. Let's
also set header to FALSE so that we don’t get info on the author of
stargazer every time we use it (as much as we appreciate what Marek Hlavac has done
for R users).

stargazer((lm(anyhelp ~ temp + treat + temp*treat, data = combined)),
type = 'latex',
style = 'apsr',
dep.var.labels = 'Outcome: Did any bystanders offer help?',
covariate.labels = c('Temperature',
'Hijab vs native',
'Temperature x hijab versus native'),
header=FALSE)

Table 2:
Outcome: Did any bystanders offer help?
Temperature 0.004
(0.004)
Hijab vs native 0.140
(0.133)
Temperature x hijab versus native −0.009∗
(0.005)
Constant 0.672∗∗∗
(0.097)
N 1,786
R2 0.013
Adjusted R2 0.012
Residual Std. Error 0.445 (df = 1782)
F Statistic 8.084∗∗∗ (df = 3; 1782)

∗p < .1; ∗∗p < .05; ∗∗∗p < .01

We are using the APSR style from here on out, but if you’re interested, here is a full style list.
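One drafting tip (an aside, not in the tables above): stargazer can also print a plain-text version of the same table with type = 'text', which is handy for checking a table in the console before knitting to PDF. A minimal example:

```{r}
# type = 'text' renders the table as plain text instead of LaTeX
stargazer(model1, type = 'text', style = 'apsr', header = FALSE)
```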

We can combine several models into one table with stargazer.
Note that we are omitting a bunch of variables from the second model using the omit
argument.

model1 = lm(anyhelp ~ temp + treat + temp*treat, data = combined)
model2 = lm(anyhelp ~ temp*treat + station1 + rush, data = combined)

stargazer(model1, model2,
omit = c(paste("station1", unique(combined$station1), sep=""),
'rush'),
type = 'latex',
style = 'apsr',
dep.var.labels = 'Outcome: Did any bystanders offer help?',
covariate.labels = c('Temperature',
'Hijab vs native',
'Temperature x hijab versus native'),
header=FALSE,
title="There is some relation between
the attitudes towards the out-group and
the environment")

Table 3: There is some relation between the attitudes towards the out-group and the environment
Outcome: Did any bystanders offer help?
(1) (2)
Temperature 0.004 0.007∗
(0.004) (0.004)
Hijab vs native 0.140 0.137
(0.133) (0.133)
Temperature x hijab versus native −0.009∗ −0.008∗
(0.005) (0.005)
Constant 0.672∗∗∗ 0.570∗∗∗
(0.097) (0.108)
N 1,786 1,786
R2 0.013 0.043
Adjusted R2 0.012 0.025
Residual Std. Error 0.445 (df = 1782) 0.442 (df = 1752)
F Statistic 8.084∗∗∗ (df = 3; 1782) 2.400∗∗∗ (df = 33; 1752)

∗p < .1; ∗∗p < .05; ∗∗∗p < .01

Let’s add another model, this time also omitting the intercept estimates from the table.

model1 = lm(anyhelp ~ temp + treat + temp*treat,
            data = combined)
model2 = lm(anyhelp ~ temp*treat + station1 + rush,
            data = combined)
model3 = lm(anyhelp ~ temp*treat + station1 + rush + bystander,
            data = combined)

stargazer(model1, model2, model3,
omit = c(paste("station1", unique(combined$station1), sep=""),
'rush',
'bystander',
'Constant'),
type = 'latex',
style = 'apsr',
dep.var.labels = 'Outcome: Did any bystanders offer help?',
covariate.labels = c('Temperature',
'Hijab vs native',
'Temperature x hijab versus native'),
header=FALSE,
title="There is some relation between
the attitudes towards the out-group and
the environment")

Table 4: There is some relation between the attitudes towards the out-group and the environment
Outcome: Did any bystanders offer help?
(1) (2) (3

Temperature 0.004 0.007 0.00
(0.004) (0.004) (0.0
Hijab vs native 0.140 0.137 0.1
(0.133) (0.133) (0.1
Temperature x hijab versus native −0.009∗ −0.008∗ −0.0
(0.005) (0.005) (0.0
N 1,786 1,786 1,7
R2 0.013 0.043 0.0
Adjusted R2 0.012 0.025 0.0
Residual Std. Error 0.445 (df = 1782) 0.442 (df = 1752) 0.442 (df
F Statistic 8.084∗∗∗ (df = 3; 1782) 2.400∗∗∗ (df = 33; 1752) 2.341∗∗∗ (df

∗p < .1; ∗∗p < .05; ∗∗∗p < .01

We don’t always need all of the information on our results. R-squared, for instance,
wouldn’t tell us much in this case because our dependent variable is binary. So let’s
drop it and some other stats via omit.stat = c('adj.rsq', 'rsq', 'ser', 'f').
That also keeps the table from running off the page (notice how the third column of
Table 4 got clipped).
NB: See the list of statistic codes

model1 = lm(anyhelp ~ temp + treat + temp*treat,
            data = combined)
model2 = lm(anyhelp ~ temp*treat + station1 + rush,
            data = combined)
model3 = lm(anyhelp ~ temp*treat + station1 + rush + bystander,
            data = combined)

stargazer(model1, model2, model3,
omit = c(paste("station1", unique(combined$station1), sep=""),
'rush',
'bystander',
'Constant'),
omit.stat = c('adj.rsq', 'rsq', 'ser', 'f'),
type = 'latex',
style = 'apsr',
dep.var.labels = 'Outcome: Did any bystanders offer help?',
covariate.labels = c('Temperature',
'Hijab vs native',
'Temperature x hijab versus native'),
header=FALSE,
title="Relation between
the attitudes towards the out-group and
the environment")

Table 5: Relation between the attitudes towards the out-group and the environment
Outcome: Did any bystanders offer help?
(1) (2) (3)
Temperature 0.004 0.007∗ 0.006∗
(0.004) (0.004) (0.004)
Hijab vs native 0.140 0.137 0.138
(0.133) (0.133) (0.133)
Temperature x hijab versus native −0.009∗ −0.008∗ −0.009∗
(0.005) (0.005) (0.005)
N 1,786 1,786 1,786

∗p < .1; ∗∗p < .05; ∗∗∗p < .01

Our three models are different: the first is the baseline model, the second adds station
fixed effects and a variable on whether it was rush hour. The last model contains all
of these plus the number of bystanders. Let's make sure that we communicate that in our
table. I'm going to use the add.lines argument to add some lines to our table. Compare
Table 6 to the table in the original paper.

stargazer(model1, model2, model3,
omit = c(paste("station1", unique(combined$station1), sep=""),
'rush', 'bystander', 'Constant'),
omit.stat = c('adj.rsq', 'rsq', 'ser', 'f'),
type = 'latex',
style = 'apsr',
dep.var.labels = 'Outcome: Did any bystanders offer help?',
covariate.labels = c('Temperature',
'Hijab vs native',
'Temperature x hijab versus native'),
header=FALSE, title="Help behavior by temperature",
add.lines = list(
c("Constant", round(summary(model1)$coefficients[1,1], 3), '', ''),
c("", round(summary(model1)$coefficients[1,2], 3), '', ''),
c("Rush hour FE", "No", "Yes", "Yes"),
c("Station FE", "No", "Yes", "Yes"),
c("Number of bystanders FE", "No", "No", "Yes")
)
)

Table 6: Help behavior by temperature


Outcome: Did any bystanders offer help?
(1) (2) (3)
Temperature 0.004 0.007∗ 0.006∗
(0.004) (0.004) (0.004)
Hijab vs native 0.140 0.137 0.138
(0.133) (0.133) (0.133)
Temperature x hijab versus native −0.009∗ −0.008∗ −0.009∗
(0.005) (0.005) (0.005)
Constant 0.672
0.097
Rush hour FE No Yes Yes
Station FE No Yes Yes
Number of bystanders FE No No Yes
N 1,786 1,786 1,786

∗p < .1; ∗∗p < .05; ∗∗∗p < .01

Plots in Base R

Anscombe

When exploring data, one of your very first steps should be to plot it. Let’s load
the datasauRus package to demonstrate this point.

#install.packages('datasauRus')
library(datasauRus)

The datasauRus package has a datasaurus_dozen data frame, which is nothing but thirteen data
sets combined into one (a baker's dozen, hence the name).

unique(datasaurus_dozen$dataset)

## [1] "dino"       "away"       "h_lines"    "v_lines"    "x_shape"
## [6] "star"       "high_lines" "dots"       "circle"     "bullseye"
## [11] "slant_up"   "slant_down" "wide_lines"

Each set has x and y variables. Let’s examine the star and dino data sets:

subset_star = datasaurus_dozen[datasaurus_dozen$dataset == 'star' , ]
subset_dino = datasaurus_dozen[datasaurus_dozen$dataset == 'dino' , ]

model1 = lm(y ~ x, data = subset_star)
model2 = lm(y ~ x, data = subset_dino)

stargazer(model1, model2,
#omit.stat = c('adj.rsq', 'rsq', 'ser', 'f'),
type = 'latex',
style = 'apsr',
#dep.var.labels = 'Outcome: Did any bystanders offer help?',
column.labels = c('star', 'dino'),
header=FALSE,
title="Seemingly the relationship between x and y
is the same in both data sets"
)

If we were to look only at the linear regression results, we would conclude that the
relationships between x and y are quite similar in both data sets.
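In fact the resemblance goes beyond the regressions: the summary statistics are nearly identical too. A quick check (not in the original handout):

```{r}
# means, standard deviations, and the x-y correlation for each subset
sapply(list(star = subset_star, dino = subset_dino),
       function(d) c(mean_x = mean(d$x), mean_y = mean(d$y),
                     sd_x = sd(d$x), sd_y = sd(d$y),
                     cor_xy = cor(d$x, d$y)))
```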

Table 7: Seemingly the relationship between x and y is the same in both data sets
y
star dino
(1) (2)
x −0.101 −0.104
(0.135) (0.136)
Constant 53.327∗∗∗ 53.453∗∗∗
(7.692) (7.693)
N 142 142
R2 0.004 0.004
Adjusted R2 −0.003 −0.003
Residual Std. Error (df = 140) 26.973 26.975
F Statistic (df = 1; 140) 0.557 0.584

∗p < .1; ∗∗p < .05; ∗∗∗p < .01

Were you to plot the data, however, you’d see that the relationships are far from
similar:

par(mfrow = c(1, 2), # panel with one row and two columns
mai = c(0.5, 0.5, 0.25, 0.25)) # bottom, left, top and right margins
plot(subset_dino$x, subset_dino$y)
plot(subset_star$x, subset_star$y)
[Two-panel figure: scatterplots of the dino and star subsets, showing wildly different shapes]

NB: This is actually a (rather special) extension of Anscombe’s quartet.

Let’s make our panel of plots a tad prettier. We’ll add axis labels and a title; we’ll
also change the shape, color, and size of our points. Let’s remove the ticks from one of
our plots, just because we can.
Let’s also add our regression lines to further demonstrate why Anscombe disagreed that
“numerical calculations are exact, but graphs are rough”

par(mfrow = c(1, 2), # rows, columns
    mai = c(0.75, 0.5, 0.75, 0.1)) # bottom, left, top and right
plot(subset_dino$x, subset_dino$y,
xlab = 'x', ylab = 'y', main = 'dino data',
pch = 19, cex = 1.2, col = 'maroon',
xlim = c(0, 120), ylim = c(0, 120),
xaxt = 'n', yaxt = 'n')
abline(lm(y ~ x, data = subset_dino), lty=2, lwd=3, col='seagreen')
plot(subset_star$x, subset_star$y,
xlab = 'x', ylab = 'y', main = 'star data',
pch = 21, cex = 3, col = 'cornflowerblue',
xlim = c(0, 120), ylim = c(0, 120))
abline(lm(y ~ x, data = subset_star), lty=2, lwd=3, col='seagreen')

[Two-panel figure: "dino data" and "star data" scatterplots with dashed regression lines, both axes running from 0 to 120]

Simpson

Why else is it important to plot data?
Let’s explore another hypothetical example.

scores = read.csv('simpsons.csv')

These are simulated data on the relationship between the amount of time students prepare for
a test and their final score.

```{r, include=T, fig.cap='seemingly, the longer you prepare for a test, the worse you do'}
plot(scores$prep_time, scores$score,
     xlab = 'preparation time', ylab = 'score',
     main = 'relationship between preparation time and score')
```

[Scatterplot of score against preparation time; the overall trend slopes downward]

Figure 1: seemingly, the longer you prepare for a test, the worse you do

It looks like the more you study, the lower your grade gets. What happens if we break it
down by subject?

plot(scores$prep_time, scores$score,
xlab = 'preparation time', ylab = 'score',
main = 'relationship between preparation time and score',
col = ifelse(scores$subject == 'Physical Education', 'maroon', 'black'))

[Same scatterplot with the Physical Education points highlighted in maroon]

Figure 2: physical education is relatively easy

Well, you’ll notice two things. First, physical education is an easy subject. You don’t need
to prepare for 20 hours to get a good grade in it. Second, the more you prepare for a
physical education test, the better you get at it.

Let’s highlight the scores for English by using a slightly more complicated ifelse statement
in our col argument. Such ifelse statements are sometimes called “nested.”

plot(scores$prep_time, scores$score,
xlab = 'preparation time', ylab = 'score',
main = 'relationship between preparation time and score',
pch = 19,
col = ifelse(scores$subject == 'Physical Education', 'maroon',
ifelse(scores$subject == 'English', 'cornflowerblue',
'black')))

[Same scatterplot with Physical Education in maroon and English in cornflower blue]

Figure 3: English is harder

You see that the trend within a group (in our case, within a subject) is positive, but the trend
across the groups is negative. This phenomenon is known as Simpson’s paradox.
Unfortunately, you still need to study to do better in class.
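To see the paradox in numbers rather than colors, here is a short sketch (not in the original handout; it assumes the same subject and prep_time columns used in the plots) comparing the pooled slope with the within-subject slopes:

```{r}
# pooled slope: negative
coef(lm(score ~ prep_time, data = scores))["prep_time"]
# within-subject slopes: these should come out positive,
# matching the pattern visible in the plots
by(scores, scores$subject,
   function(d) coef(lm(score ~ prep_time, data = d))["prep_time"])
```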

Let’s add a legend to our plot so that our point comes across more effectively.
NB: Some guidance on legends here

plot(scores$prep_time, scores$score,
xlab = 'preparation time', ylab = 'score',
main = 'relationship between preparation time and score',
pch = 19,
col = ifelse(scores$subject == 'Physical Education', 'maroon',
ifelse(scores$subject == 'English', 'cornflowerblue',
'black')))
legend('topright', inset=0.1, legend=c("PhysEd", "English"),
col=c("maroon", "cornflowerblue"), pch=19, cex=1)

[Same scatterplot, now with a top-right legend identifying PhysEd and English]

Figure 4: English is harder

Let’s not forget about our colorblind folks and folks with bad printers:

plot(scores$prep_time, scores$score,
xlab = 'preparation time', ylab = 'score',
main = 'relationship between preparation time and score',
pch = ifelse(scores$subject == 'Physical Education', 19,
ifelse(scores$subject == 'English', 17, 21)),
col = ifelse(scores$subject == 'Physical Education', 'maroon',
ifelse(scores$subject == 'English', 'cornflowerblue',
'black')))
legend('topright', inset=0.1, legend=c("PhysEd", "English"),
col=c("maroon", "cornflowerblue"), pch=c(19, 17), cex=1)

[Same scatterplot with point shapes also varying by subject: circles for PhysEd, triangles for English]

Figure 5: English is harder

Let’s not forget about our colorblind folks and folks with bad printers (2):

plot(scores$prep_time, scores$score,
xlab = 'preparation time', ylab = 'score',
main = 'relationship between preparation time and score',
cex = ifelse(scores$subject == 'Physical Education', 1.2,
ifelse(scores$subject == 'English', 2, 1)),
pch = ifelse(scores$subject == 'Physical Education', 19,
ifelse(scores$subject == 'English', 17, 21)),
col = ifelse(scores$subject == 'Physical Education', 'maroon',
ifelse(scores$subject == 'English', 'cornflowerblue',
'black')))
legend('topright', inset=0.1, legend=c("PhysEd", "English"),
col=c("maroon", "cornflowerblue"), pch=c(19, 17), cex=c(1,1.2))

[Same scatterplot with point sizes also varying by subject]

Finishing touches:

plot(scores$prep_time, scores$score,
xlab = 'preparation time', ylab = 'score',
main = 'relationship between preparation time and score',
cex = ifelse(scores$subject == 'Physical Education', 1.2,
ifelse(scores$subject == 'English', 2, 1)),
pch = ifelse(scores$subject == 'Physical Education', 19,
ifelse(scores$subject == 'English', 17, 21)),
col = ifelse(scores$subject == 'Physical Education', 'maroon',
ifelse(scores$subject == 'English', 'cornflowerblue', 'black')))
legend('topright', inset=0.025, legend=c("PhysEd", "English"),
col=c("maroon", "cornflowerblue"), pch=c(19, 17), cex=c(1,1.2),
title = 'subject', text.font = 2, box.lty = 0, bg = 'cadetblue1')

[Final scatterplot with a styled legend: title 'subject', bold labels, no box, light blue background]

Try on your own

1. What is the hardest subject in the scores data set?

2. Could you color all five subjects on the last plot? Would it be easier with ggplot2?
