R Tutorial

The document provides an overview of relational operators in R, explaining how they evaluate relationships between values and return logical results. It also covers subsetting data frames using logical values, calculating callback rates, and creating conditional statements with the ifelse() function. Additionally, it introduces factor variables, frequency tables, and statistical tests like t-tests for analyzing differences in means.

Uploaded by

yuruoqianqy
Kosuke Imai - Quantitative Social Science:

2. Relational Operators
Relational operators evaluate the relationships between two values, and they return logical values
(TRUE/FALSE).
• > greater than
• >= greater than or equal to
• < less than
• <= less than or equal to
• == equal to (== tests equality, whereas a single = assigns a value)
• != not equal to

1 > 0

## [1] TRUE
1 == 2

## [1] FALSE
"HELLO" == "hello" # R is case sensitive

## [1] FALSE

• When applied to a vector, the operators evaluate each element of the vector.
• We can use square brackets [ ] to index a vector with a logical vector of the same length, which holds one
TRUE/FALSE value per element.
• The elements whose indexing value is TRUE are extracted.
x <- c(1,2,5,7)
x

## [1] 1 2 5 7

x>5

## [1] FALSE FALSE FALSE TRUE


x[c(FALSE,FALSE,FALSE,TRUE)] # extract the last element

## [1] 7
x[c(TRUE,FALSE,FALSE,TRUE)] # extract the first and the last elements

## [1] 1 7

y <- c(11,0,5,-2)
x==max(x) # a vector of logical values: only element where x is maximized is TRUE

## [1] FALSE FALSE FALSE TRUE


y[x==max(x)] # select the value of y where x is maximized

## [1] -2
y[c(FALSE,FALSE,FALSE,TRUE)] # alternative

## [1] -2
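
The same indexing works with any logical vector, including one built from several comparisons. The lines below are a small sketch that reuses the vectors x and y from above; combining conditions with & and comparing two vectors element by element are additions for illustration.

```r
x <- c(1, 2, 5, 7)
y <- c(11, 0, 5, -2)

# Keep the elements of x that are greater than 1 AND less than 7
x[x > 1 & x < 7]
## [1] 2 5

# Keep the elements of y where x is NOT at its maximum
y[x != max(x)]
## [1] 11  0  5

# Two vectors of the same length are compared element by element
x[x > y]
## [1] 2 7
```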

3. Subsetting using logical values


Suppose we want to calculate the callback rate for females (based on Marianne Bertrand and Sendhil
Mullainathan (2004) “Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on
labor market discrimination.”):
setwd("/Users/qianruoyu/Downloads/Causal Inference_ECON3284/tutorial")
resume <- read.csv("resume.csv")
mean(resume$call[resume$sex == "female"]) # callback rate for females

## [1] 0.08248799

Decomposition:
• resume$sex
– use the $ operator to access an individual variable in a data frame
– obtain individual variable “sex” in the “resume” data frame, which is a vector
• vec1 <- resume$sex == "female"
– “==” is the relational operator “equal to”
– “==” evaluates each element of the “sex” column to see if it is equal to “female”
– if the element is equal to “female”, return a logical value of “TRUE”; otherwise “FALSE”
– vec1 is a vector of logical values
• vec2 <- resume$call[resume$sex == "female"]
– the square brackets index the values in the “call” column using the corresponding logical value in
the “vec1” vector
– extract elements whose indexing value is TRUE
– subset of the “call” column: only females
• mean(resume$call[resume$sex == "female"])
– use the function mean( ) to calculate the sample mean of the subsetted vector
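
The decomposition can be traced step by step on a small example. This is only a sketch: the data frame toy and its values are made up for illustration and are not taken from resume.csv.

```r
# Hypothetical toy data standing in for resume.csv (values are invented)
toy <- data.frame(
  sex  = c("female", "male", "female", "female", "male"),
  call = c(1, 0, 0, 1, 0)
)

vec1 <- toy$sex == "female"  # logical vector, one element per row
vec2 <- toy$call[vec1]       # "call" values for the female rows only
rate <- mean(vec2)           # mean of a 0/1 vector = share of 1s

vec1
## [1]  TRUE FALSE  TRUE  TRUE FALSE
vec2
## [1] 1 0 1
```

Because call is coded 0/1, the mean of the subsetted vector is the callback rate in that subset (here 2 out of 3).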

Alternatively, the calculation of callback rate for females can be done in two steps.
• We first subset a data frame object so that it contains only the resumes of females (with all columns)
and then compute the callback rate.
• Notice that we use square brackets [ , ] to index the rows and columns of a data frame.
• Unlike in the case of indexing vectors, we use a comma to separate row and column indexes.
• This comma is important: without it, R treats the logical vector as a column index, which leads to an error here.
• Here, we do not specify a column index after the comma because we want to keep all columns.

# step1: subset only females, keep all columns (do not specify a column index)
resume_f <- resume[resume$sex == "female", ]

dim(resume)

## [1] 4870 4
dim(resume_f) # fewer rows/observations

## [1] 3746 4
table(resume$sex)

##
## female   male
##   3746   1124
# step2: callback rate for females
mean(resume_f$call)

## [1] 0.08248799

(optional) We can also use the subset() function to construct a data frame that contains just some of the
original observations and just some of the original variables.
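
A minimal sketch of subset() is shown below. It uses a made-up data frame toy rather than the real resume data; the subset argument selects rows and the select argument selects columns.

```r
# Hypothetical toy data (values are invented for illustration)
toy <- data.frame(
  sex  = c("female", "male", "female", "female", "male"),
  race = c("black", "white", "white", "black", "black"),
  call = c(1, 0, 0, 1, 0)
)

# Keep only the female rows, and only the "call" and "race" columns
toy_f <- subset(toy, subset = (sex == "female"), select = c("call", "race"))

dim(toy_f)
## [1] 3 2
mean(toy_f$call)  # callback rate among the female rows of the toy data
```

Inside subset(), column names such as sex can be written without the toy$ prefix.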

4. Simple Conditional Statements


In many situations, we want to perform different actions depending on whether a statement is true or
false. The ifelse() function does this.
• The function ifelse(X, Y, Z) takes three arguments.
– X: A logical condition or a logical vector that specifies the condition to be evaluated.
– Y: The value or expression to be returned when the condition is TRUE.
– Z: The value or expression to be returned when the condition is FALSE.
• For each element in X that is TRUE, the corresponding element in Y is returned (a single value such as 1
is reused for every element). In contrast, for each element in X that is FALSE, the corresponding element
in Z is returned.
• e.g., suppose that we want to create a new binary variable called BlackFemale in the resume data frame
that equals 1 if the job applicant’s name sounds black and female, and 0 otherwise.

resume$BlackFemale <- ifelse(resume$race == "black" &
                             resume$sex == "female", 1, 0)

Decomposition:
• resume$race == "black"
– a vector of logical values
– “==” evaluates each element of the “race” column to see if it is equal to “black”
– if the element is equal to “black”, return a logical value of “TRUE”; otherwise “FALSE”
• resume$race == "black" & resume$sex == "female"
– a vector of logical values
– the element is TRUE only when both of the objects have a value of TRUE: race is black and sex
is female
• ifelse(resume$race == "black" & resume$sex == "female", 1, 0)
– a vector of 0/1 values
– for elements in resume$race == "black" & resume$sex == "female" that are TRUE, return a
value of 1; for elements that are FALSE, return a value of 0
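
The three steps can be checked one at a time on a small example. This is a sketch on a made-up data frame toy; the intermediate names is_black, is_female, and both are introduced here only for illustration.

```r
# Hypothetical toy data covering all four race/sex combinations
toy <- data.frame(
  race = c("black", "black", "white", "white"),
  sex  = c("female", "male", "female", "male")
)

is_black  <- toy$race == "black"   # TRUE where race is black
is_female <- toy$sex == "female"   # TRUE where sex is female
both      <- is_black & is_female  # TRUE only where both are TRUE

toy$BlackFemale <- ifelse(both, 1, 0)

both
## [1]  TRUE FALSE FALSE FALSE
toy$BlackFemale
## [1] 1 0 0 0
```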

5. Factor Variables
A factor variable (or a categorical variable) takes a finite number of distinct values or levels.
• e.g., we wish to create a factor variable that takes one of the four values, i.e., BlackFemale, BlackMale,
WhiteFemale, and WhiteMale.
• We specify each type using the characteristics of the applicants.

resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale"


resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale"
resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale"
resume$type[resume$race == "white" & resume$sex == "male"] <- "WhiteMale"

resume$type <- as.factor(resume$type) # coerce "type" into a factor variable


table(resume$type) # obtain the number of observations for each level

##
## BlackFemale   BlackMale WhiteFemale   WhiteMale
##        1886         549        1860         575
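
The same steps can be tried on a small made-up data frame, as sketched below. Two things are added for illustration: the column is first created as NA so that it exists before any values are filled in, and levels() is used to list the levels of the resulting factor.

```r
# Hypothetical toy data (values are invented for illustration)
toy <- data.frame(
  race = c("black", "black", "white", "white", "black"),
  sex  = c("female", "male", "female", "male", "female")
)

toy$type <- NA  # optional: create the column before filling it in
toy$type[toy$race == "black" & toy$sex == "female"] <- "BlackFemale"
toy$type[toy$race == "black" & toy$sex == "male"]   <- "BlackMale"
toy$type[toy$race == "white" & toy$sex == "female"] <- "WhiteFemale"
toy$type[toy$race == "white" & toy$sex == "male"]   <- "WhiteMale"

toy$type <- as.factor(toy$type)  # coerce the character column into a factor

levels(toy$type)  # the distinct values the factor can take
table(toy$type)   # number of observations for each level
```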

6. Other commands in ps2


1. rnorm(n, mean, sd) function: to generate a vector of normally distributed random numbers.
• n: The number of observations (sample size).
• mean: The mean of the normal distribution to draw from; its default value is 0.
• sd: The standard deviation of that distribution; its default value is 1.
2. mean(x, na.rm = TRUE) function: to calculate the mean of a vector.
• Sometimes, the vector contains missing values (NA). In this case, mean() returns NA by default.
• To drop the missing values from the calculation, we use na.rm = TRUE, which means “remove the NA
values”.
3. plot(density(x)): to plot the estimated density curve of a vector x
• density(x) estimates the probability density function f(x) of the variable from the sample. The height of
the curve shows how likely values near a given point are relative to other values.
• You can use a different color by setting the col argument.
• You can use the lines() function to add another curve to the existing plot.

x <- rnorm(n=50,mean=20,sd=1)
y <- rnorm(n=50,mean=21,sd=2)

plot(density(x), col="green")
lines(density(y))

[Figure: density plot titled "density(x = x)"; y-axis: Density; x-axis caption: N = 50, Bandwidth = 0.3599]
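
The behavior of rnorm() and of the na.rm argument described in items 1 and 2 can be checked with a short sketch. The seed value and the vectors z and v are arbitrary choices made for this illustration.

```r
set.seed(1)  # arbitrary seed so that the random draws are reproducible

# Five draws from a normal distribution with mean 20 and standard deviation 1
z <- rnorm(n = 5, mean = 20, sd = 1)
length(z)
## [1] 5

# A vector with one missing value
v <- c(2, 4, NA, 6)
mean(v)                # the NA propagates into the result
## [1] NA
mean(v, na.rm = TRUE)  # (2 + 4 + 6) / 3
## [1] 4
```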

4. table() function: to create frequency tables


• Frequency Table for One Variable x: table(x)
• Frequency Table for Two Variables x, y: table(x,y)
• We can access the table value using square brackets [ ].
t1 <- table(resume$sex)
t2 <- table(resume$sex, resume$race)
t2

##
##          black white
##   female  1886  1860
##   male     549   575

t2[1,1] # female & black (the first row and the first column)

## [1] 1886
t2[2,1] # male & black (the second row and the first column)

## [1] 549
sum(t2[1,]) # total number of females (the sum of the first row)

## [1] 3746
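
Table cells can also be accessed by row and column name instead of by position, which avoids having to remember the order of the levels. The sketch below uses a made-up data frame toy, not the resume data.

```r
# Hypothetical toy data (values are invented for illustration)
toy <- data.frame(
  sex  = c("female", "female", "male", "female", "male"),
  race = c("black", "white", "black", "black", "white")
)

tab <- table(toy$sex, toy$race)

tab[1, 1]                # by position: first row, first column
## [1] 2
tab["female", "black"]   # by name: the same cell
## [1] 2
sum(tab["female", ])     # total number of females in the toy data
## [1] 3
```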

5. t.test(x,y) function: to determine if there is a significant difference between the means of the two
groups x and y
• By default, t.test() runs Welch's two-sample t-test, which does not assume equal variances in the two groups.
• e.g., suppose we want to test if there is any difference between callback rate for blacks and callback rate
for whites.
• We can extract information from the t-test results.

t <- t.test(resume$call[resume$race=="black"],
            resume$call[resume$race=="white"])
t

##
## Welch Two Sample t-test
##
## data: resume$call[resume$race == "black"] and resume$call[resume$race == "white"]
## t = -4.1147, df = 4711.6, p-value = 3.943e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04729503 -0.01677067
## sample estimates:
## mean of x mean of y
## 0.06447639 0.09650924
names(t)

## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "stderr" "alternative" "method" "data.name"
t$estimate[1] # callback rate for blacks

## mean of x
## 0.06447639
t$estimate[2] # callback rate for whites

## mean of y
## 0.09650924
t$p.value # p-value: whether the difference is significant

## [1] 3.942942e-05
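
Other elements listed by names() can be extracted in the same way. The sketch below uses simulated data rather than the resume data, so the numbers are not comparable to the output above; the seed and the object names g1, g2, and res are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so that the simulated data are reproducible

# Two simulated groups with different means
g1 <- rnorm(n = 50, mean = 20, sd = 1)
g2 <- rnorm(n = 50, mean = 21, sd = 2)

res <- t.test(g1, g2)  # Welch two-sample t-test by default

res$method    # name of the test that was run
res$estimate  # sample means of the two groups
res$conf.int  # 95 percent confidence interval for the difference in means
res$p.value   # p-value of the test

# Estimated difference in means (group 1 minus group 2)
diff_means <- unname(res$estimate[1] - res$estimate[2])
diff_means
```

Storing the result under a name such as res, rather than t, avoids reusing the name of R's built-in t() function.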
