R Tutorial

Kosuke Imai - Quantitative Social Science:

2. Relational Operators
Relational operators evaluate the relationships between two values, and they return logical values
(TRUE/FALSE).
• > greater than
• >= greater than or equal to
• < less than
• <= less than or equal to
• == equal to (== compares two values, whereas = assigns a value, like <-)
• != not equal to

1 > 0

## [1] TRUE
1 == 2

## [1] FALSE
"HELLO" == "hello" # R is case sensitive

## [1] FALSE
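The remaining operators in the list work the same way. A minimal sketch with arbitrary numbers (the names a, b and z are hypothetical), which also contrasts assignment with = and comparison with ==:

```r
# Arbitrary example values, chosen only for illustration
a <- 3
b <- 5

a >= b    # FALSE: 3 is not greater than or equal to 5
a <= b    # TRUE
a != b    # TRUE: the two values differ

# `=` assigns (like `<-`); `==` compares and returns TRUE/FALSE
z = 10    # assignment: z now holds 10
z == 10   # comparison: TRUE
```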

• When applied to a vector, the operators compare each element of the vector and return one logical value per element.
• We can index a vector with square brackets [ ] by placing inside them a logical vector of the same length, with one TRUE/FALSE value for each element.
• The elements whose indexing value is TRUE are extracted.
x <- c(1,2,5,7)
x

## [1] 1 2 5 7

x>5

## [1] FALSE FALSE FALSE TRUE

x[c(FALSE,FALSE,FALSE,TRUE)] # extract the last element

## [1] 7
x[c(TRUE,FALSE,FALSE,TRUE)] # extract the first and the last elements

## [1] 1 7

y <- c(11,0,5,-2)
x==max(x) # a vector of logical values: only the element where x attains its maximum is TRUE

## [1] FALSE FALSE FALSE TRUE

y[x==max(x)] # select the value of y where x is maximized

## [1] -2
y[c(FALSE,FALSE,FALSE,TRUE)] # alternative

## [1] -2
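One caveat: x == max(x) marks every element equal to the maximum, so when the maximum is tied it selects more than one value of y. The base R function which.max() returns only the position of the first maximum. A sketch with hypothetical vectors x2 and y2 that contain a tie:

```r
x2 <- c(1, 7, 5, 7)     # hypothetical vector with a tie at the maximum
y2 <- c(11, 0, 5, -2)

x2 == max(x2)           # FALSE TRUE FALSE TRUE: both maxima are TRUE
y2[x2 == max(x2)]       # 0 -2: one y value per tied maximum
y2[which.max(x2)]       # 0: only the first maximum is used

# A condition can also be written directly inside [ ]
x2[x2 > 4]              # 7 5 7
```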

3. Subsetting using logical values

Suppose we want to calculate the callback rate for females (based on Marianne Bertrand and Sendhil
Mullainathan (2004) “Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on
labor market discrimination.”):
# set the working directory to the folder that contains resume.csv (replace with your own path)
setwd("/Users/qianruoyu/Downloads/Causal Inference_ECON3284/tutorial")
resume <- read.csv("resume.csv")
mean(resume$call[resume$sex == "female"]) # callback rate for females

## [1] 0.08248799

Decomposition:
• resume$sex
– use the $ operator to access an individual variable in a data frame
– obtain individual variable “sex” in the “resume” data frame, which is a vector
• vec1 <- resume$sex == "female"
– “==” is the relational operator “equal to”
– “==” evaluates each element of the “sex” column to see if it is equal to “female”
– if the element is equal to “female”, return a logical value of “TRUE”; otherwise “FALSE”
– vec1 is a vector of logical values
• vec2 <- resume$call[resume$sex == "female"]
– the square brackets index the values in the “call” column using the corresponding logical value in
the “vec1” vector
– extract elements whose indexing value is TRUE
– subset of the “call” column: only females

• mean(resume$call[resume$sex == "female"])
– use the function mean( ) to calculate the sample mean of the subsetted vector
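If resume.csv is not at hand, the same steps can be tried on a small made-up data frame. In this sketch the data frame toy and its values are hypothetical; only the column names sex and call mirror the resume data.

```r
# Hypothetical stand-in for the resume data: 6 applicants
toy <- data.frame(
  sex  = c("female", "male", "female", "female", "male", "female"),
  call = c(1, 0, 0, 1, 1, 0)
)

vec1 <- toy$sex == "female"   # logical vector: TRUE for females
vec2 <- toy$call[vec1]        # callback outcomes for females only
mean(vec2)                    # callback rate for females: 2/4 = 0.5

# The one-line version gives the same result
mean(toy$call[toy$sex == "female"])
```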

Alternatively, the calculation of callback rate for females can be done in two steps.
• We first subset a data frame object so that it contains only the resumes of females (with all columns)
and then compute the callback rate.
• Notice that we use square brackets [ , ] to index the rows and columns of a data frame.
• Unlike in the case of indexing vectors, we use a comma to separate row and column indexes.
• This comma is important and forgetting to include it will lead to an error.
• Here, we do not specify a column index after the comma because we want to keep all columns.

# step1: subset only females, keep all columns (do not specify a column index)
resume_f <- resume[resume$sex == "female", ]

dim(resume)

## [1] 4870 4
dim(resume_f) # fewer rows/observations

## [1] 3746 4
table(resume$sex)

##
## female male
## 3746 1124
# step2: callback rate for females
mean(resume_f$call)

## [1] 0.08248799

(optional) We can also use the subset() function to construct a data frame that contains just some of the
original observations and just some of the original variables.
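A sketch of subset() on a hypothetical data frame with the same column names as resume.csv (the rows are made up). The second argument is a condition evaluated inside the data frame, so column names need no $, and select keeps only the listed columns.

```r
# Hypothetical data with the same column names as resume.csv
toy <- data.frame(
  firstname = c("Emily", "Greg", "Lakisha", "Jamal"),
  sex       = c("female", "male", "female", "male"),
  race      = c("white", "white", "black", "black"),
  call      = c(1, 0, 0, 1)
)

# Rows for females, columns race and call only
toy_f <- subset(toy, sex == "female", select = c(race, call))
toy_f
dim(toy_f)   # 2 rows, 2 columns

# Equivalent bracket version: rows before the comma, columns after it
toy[toy$sex == "female", c("race", "call")]
```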

4. Simple Conditional Statements

In many situations, we would like to perform different actions depending on whether a statement is true or false. For vectors, the ifelse() function does this element by element.
• The function ifelse(X, Y, Z) takes three arguments.
– X: A logical condition or a logical vector that specifies the condition to be evaluated.
– Y: The value or expression to be returned when the condition is TRUE.
– Z: The value or expression to be returned when the condition is FALSE.
• For each element in X that is TRUE, the corresponding element in Y is returned. In contrast, for each
element in X that is FALSE, the corresponding element in Z is returned.
• e.g., suppose that we want to create a new binary variable called BlackFemale in the resume data frame
that equals 1 if the job applicant’s name sounds black and female, and 0 otherwise.

resume$BlackFemale <- ifelse(resume$race == "black" &
resume$sex == "female", 1, 0)

Decomposition:
• resume$race == "black"
– a vector of logical values
– “==” evaluates each element of the “race” column to see if it is equal to “black”
– if the element is equal to “black”, return a logical value of “TRUE”; otherwise “FALSE”
• resume$race == "black" & resume$sex == "female"
– a vector of logical values
– the & (AND) operator returns TRUE for an element only when both conditions are TRUE for that element: race is black and sex is female
• ifelse(resume$race == "black" & resume$sex == "female", 1, 0)
– a vector of 0/1 values
– for elements of resume$race == "black" & resume$sex == "female" that are TRUE, return a value of 1; for elements that are FALSE, return a value of 0
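The same construction on small hypothetical vectors (the values are made up), which also shows that the mean of a 0/1 variable is the share of 1s:

```r
# Hypothetical race and sex values for five applicants
race <- c("black", "black", "white", "white", "black")
sex  <- c("female", "male", "female", "male", "female")

both <- race == "black" & sex == "female"   # TRUE only where both conditions hold
both                                        # TRUE FALSE FALSE FALSE TRUE

BlackFemale <- ifelse(both, 1, 0)           # 1 where TRUE, 0 where FALSE
BlackFemale                                 # 1 0 0 0 1

mean(BlackFemale)                           # share of black females: 2/5 = 0.4
```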

5. Factor Variables
A factor variable (or a categorical variable) takes a finite number of distinct values or levels.
• e.g., we wish to create a factor variable that takes one of the four values, i.e., BlackFemale, BlackMale,
WhiteFemale, and WhiteMale.
• We specify each type using the characteristics of the applicants.

resume$type[resume$race == "black" & resume$sex == "female"] <- "BlackFemale"

resume$type[resume$race == "black" & resume$sex == "male"] <- "BlackMale"
resume$type[resume$race == "white" & resume$sex == "female"] <- "WhiteFemale"
resume$type[resume$race == "white" & resume$sex == "male"] <- "WhiteMale"

resume$type <- as.factor(resume$type) ## coerce "type" into a factor variable

table(resume$type) # obtain the number of observations for each level

##
## BlackFemale BlackMale WhiteFemale WhiteMale
## 1886 549 1860 575
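A shorter route, different from the four assignments above, pastes the two characteristics together and converts the result to a factor in one step. This sketch uses hypothetical vectors, assumes there are no missing values, and produces lower-case labels such as "blackfemale"; levels() lists the distinct values in alphabetical order.

```r
# Hypothetical race and sex values for five applicants
race <- c("black", "white", "black", "white", "black")
sex  <- c("female", "female", "male", "male", "female")

# Combine the two characteristics into one label per applicant
type <- factor(paste0(race, sex))

levels(type)    # "blackfemale" "blackmale" "whitefemale" "whitemale"
nlevels(type)   # 4
table(type)     # counts per level: 2 1 1 1
```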

6. Other commands in ps2

1. rnorm(n, mean, sd) function: to generate a vector of normally distributed random numbers.
• n: The number of observations (sample size).
• mean: The mean of the normal distribution from which the numbers are drawn; its default value is 0.
• sd: The standard deviation of that distribution; its default value is 1.
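Because rnorm() draws random numbers, the result changes on every run. Calling set.seed() first makes the draws reproducible (the seed 123 and the sample size below are arbitrary choices). The sample mean and sd will be close to, but not exactly equal to, the mean and sd arguments.

```r
set.seed(123)                            # arbitrary seed for reproducibility
a <- rnorm(n = 1000, mean = 20, sd = 1)

set.seed(123)                            # same seed again
b <- rnorm(n = 1000, mean = 20, sd = 1)

identical(a, b)                          # TRUE: same seed, same draws
mean(a)                                  # close to 20, not exactly 20
sd(a)                                    # close to 1

z <- rnorm(5)                            # defaults: mean = 0, sd = 1
length(z)                                # 5
```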
2. mean(x, na.rm = TRUE) function: to calculate the mean of a vector.
• If the vector contains missing values (NA), mean() returns NA by default.
• To drop the missing values from the calculation, we use na.rm = TRUE, which means “remove the NA values”.
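A minimal sketch with a hypothetical vector v containing one missing value:

```r
v <- c(4, NA, 8)           # hypothetical vector with one missing value

mean(v)                    # NA: a single NA makes the result NA
mean(v, na.rm = TRUE)      # 6: the mean of 4 and 8
sum(is.na(v))              # 1: number of missing values
```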
3. plot(density(x)): to plot the density curve of a vector x
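• density(x) computes a (kernel) estimate of the probability density function f(x) of the variable. f(x) describes how likely the variable is to take values near x; for a continuous variable, probabilities correspond to areas under f(x), not to its height.
• You can use a different color by setting the col argument.
• You can use the lines() function to add another curve to the existing plot.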

x <- rnorm(n=50,mean=20,sd=1)
y <- rnorm(n=50,mean=21,sd=2)

plot(density(x), col="green")
lines(density(y))

[Figure: plot titled "density(x = x)" showing the density curves of x (green) and y; y-axis: Density; x-axis label: N = 50, Bandwidth = 0.3599]
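The object returned by density() stores the evaluation grid in x and the estimated density in y, so we can check that the area under the curve is approximately 1. The sketch below simulates fresh data (seed 1 is arbitrary) and, when run interactively, draws both curves on a shared axis range; with plot(density(x)) alone, part of the wider curve for y can fall outside the plotted range.

```r
set.seed(1)                              # arbitrary seed
x <- rnorm(n = 50, mean = 20, sd = 1)
y <- rnorm(n = 50, mean = 21, sd = 2)

dx <- density(x)                         # kernel density estimate of x
dy <- density(y)                         # kernel density estimate of y

# Riemann-sum approximation of the area under the estimated curve
step <- diff(dx$x[1:2])                  # grid spacing (the grid is equally spaced)
area <- sum(dx$y) * step
area                                     # approximately 1

# Plot both curves on a common range so neither is cut off
if (interactive()) {
  plot(dx, col = "green", xlim = range(dx$x, dy$x), ylim = range(dx$y, dy$y))
  lines(dy)
}
```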

4. table() function: to create frequency tables

• Frequency Table for One Variable x: table(x)
• Frequency Table for Two Variables x, y: table(x,y)
• We can access individual entries of a table using square brackets [ ].
t1 <- table(resume$sex)
t2 <- table(resume$sex, resume$race)
t2

##
## black white
## female 1886 1860
## male 549 575

t2[1,1] # female & black (the first row and the first column)

## [1] 1886
t2[2,1] # male & black (the second row and the first column)

## [1] 549
sum(t2[1,]) # total number of females (the sum of the first row)

## [1] 3746
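Entries can also be selected by their row and column names, and base R offers helpers for margins and proportions. A sketch on hypothetical vectors (the table is named tab here to keep it separate from t2 above):

```r
# Hypothetical sex and race values for six applicants
sex  <- c("female", "female", "male", "female", "male", "female")
race <- c("black", "white", "black", "black", "white", "white")

tab <- table(sex, race)   # rows: sex, columns: race
tab

tab["female", "black"]    # 2: indexing by name instead of by position
rowSums(tab)              # totals per sex: female 4, male 2
colSums(tab)              # totals per race: black 3, white 3
prop.table(tab, 1)        # row proportions: each row sums to 1
```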

5. t.test(x,y) function: to test whether the means of two groups x and y differ significantly
• e.g., suppose we want to test whether the callback rate for blacks differs from the callback rate for whites.
• The result is a list, so we can extract individual components (e.g., the estimates and the p-value) with the $ operator.

t <- t.test(resume$call[resume$race=="black"],
resume$call[resume$race=="white"])
t

##
## Welch Two Sample t-test
##
## data: resume$call[resume$race == "black"] and resume$call[resume$race == "white"]
## t = -4.1147, df = 4711.6, p-value = 3.943e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04729503 -0.01677067
## sample estimates:
## mean of x mean of y
## 0.06447639 0.09650924
names(t)

## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"

## [6] "null.value" "stderr" "alternative" "method" "data.name"
t$estimate[1] # callback rate for blacks

## mean of x
## 0.06447639
t$estimate[2] # callback rate for whites

## mean of y
## 0.09650924
t$p.value # p-value: whether the difference is significant

## [1] 3.942942e-05
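To see where the reported statistic comes from, the sketch below computes the Welch t statistic by hand, t = (mean(x) - mean(y)) / sqrt(var(x)/n_x + var(y)/n_y), and compares it with t.test() on hypothetical 0/1 callback vectors (the data are made up).

```r
# Hypothetical 0/1 callback outcomes for two groups
x <- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
y <- c(0, 1, 1, 0, 0, 1, 0, 0, 1, 0)

# Welch t statistic: difference in means over its estimated standard error
se <- sqrt(var(x) / length(x) + var(y) / length(y))
t_manual <- (mean(x) - mean(y)) / se

res <- t.test(x, y)          # Welch test is the default (var.equal = FALSE)
t_manual
unname(res$statistic)        # same value as t_manual
res$stderr                   # same value as se
```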
