Econ117 ps1
Introduction
This problem set will be different from the other six problem sets in that it focuses on
learning the basics of the R programming language.
This problem set should be submitted as a PDF together with the .R (or .Rmd) file
used to run the analysis. The PDF should include all your work (code, output, and
handwritten or typed answers) and can be prepared in Word, LaTeX, or any other electronic
format (such as R Markdown). We uploaded another short video that walks you through
the steps to export the R code and output you obtain to Word. You can watch it here.
This is also described as “Option 1” in Problem Set Submission Guidelines on Canvas.
If you are unable to follow these steps, copying and pasting your code and output by
hand into a Word file and then creating a PDF is acceptable for this problem set. From
Problem Set 2 onwards it is recommended that you follow either Option 1 or 2 in the
guidelines to submit your answers.
Part 1. An online crash course in R.
Sign up and complete the tutorial Introduction to R using this link. You must
sign up with your Yale email address. This will make you part of our class
on DataCamp and will allow us to confirm that you completed the required work.
This is free and you do not need to sign up for a paid account.
Once signed up, you ONLY need to complete the assignment “Introduction to R”.
The other courses listed there are provided as additional learning
material and are not required for this problem set or course.
The “Introduction to R” course is highly interactive as it requires answering some
questions and modifying some code along the way. If you have prior exposure to
R, this should take less than 3 hours. For someone with no R or programming
experience it should take less than 5 hours.
Alternative: If you have experience using R (or do not want to create an
account on DataCamp), write a short statement (at most 3 sentences) either briefly
describing your past use of R or just stating that you did not want to sign up
for DataCamp.
NOTE: The software used in this course (R and RStudio) is open source
and requires no payment.
1. Install R: Visit https://cloud.r-project.org/ and download R (note the different
links on this site for Windows, Mac, and Linux).
2. Install RStudio: Visit https://posit.co/download/RStudio-desktop/ and
download the free version of RStudio (note the different links on this site for
Windows, Mac, and Linux).
REMARK: The Canvas page “Resources for learning R” contains additional
information to get you started with R.
Part 4. Opening an R Script in RStudio
Complete the six requested exercises contained at the bottom of the R script
econ117_pset1_scriptA.R. Please include the needed code below each comment
and include the requested output in the pset. These are simple exercises
to test your familiarity with basic data types in R. They do not use any external
sources of data.
For convenience, the six exercises are also included at the end of this pset.
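The basic data types these exercises rely on can be sketched as follows. This is a minimal illustration and not the contents of econ117_pset1_scriptA.R: the object names ("u", "idx", "m") and the vector length of 10 are assumptions chosen for the example.

```r
# Illustrative only: object names and sizes are made up for this sketch.
set.seed(1)                 # make the random draws reproducible

u   <- runif(10)            # 10 random numbers drawn uniformly between 0 and 1
idx <- 1:10                 # the integers 1 to 10; seq(1, 10) gives the same numbers

m <- cbind(u, idx)          # bind the two vectors together as columns of a matrix

head(u)                     # first 6 values of a vector (the default is n = 6)
tail(m, n = 3)              # last 3 rows of a matrix
dim(m)                      # number of rows and columns
```

Running each line on its own in the RStudio console and inspecting the result is a quick way to see what each command returns.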
Some basic organization is required when carrying out an R project (or any data
project!). You will be handling two main types of files: datasets, and R scripts.
It is usually good practice for each project to have its own folder where these
files are stored together.
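As a rough sketch of this kind of organization, the code below creates a project folder, makes it the working directory, and lists its contents. The folder and file names are hypothetical, and a temporary location stands in for your desktop or course folder.

```r
# Illustrative only: hypothetical folder and file names in a temporary location.
project_dir <- file.path(tempdir(), "example_project")
dir.create(project_dir, showWarnings = FALSE)

old_wd <- getwd()          # remember the current working directory
setwd(project_dir)         # make the project folder the working directory

writeLines("x,y", "example_data.csv")                          # placeholder dataset
writeLines("# analysis code goes here", "example_script.R")    # placeholder script

files_found <- list.files()    # names of the files in the working directory
print(files_found)

setwd(old_wd)              # restore the original working directory
```

Keeping the dataset and the script in the same folder means the script can refer to the data by file name alone once that folder is the working directory.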
Create a folder on your desktop or in your Econ117 course folder called econ117_pset1_folder.
Download econ117_pset1_scriptB.R from Canvas and save it in econ117_pset1_folder.
Open the file in RStudio and complete the eight exercises contained in it. Please
include the needed code below each comment and include the requested output
in the pset. In these exercises, we will show how to read data into R and make
time series plots. We will use tax data from the United States to study the share
of income going to the highest- and lowest-earning taxpayers over the course
of the last hundred years.
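The general pattern of reading a CSV file and drawing a time series plot can be sketched as below. The numbers are made up and are NOT the course data, and the file name "toy_shares.csv" and the column names "year" and "share" are assumptions for illustration only.

```r
# Illustrative only: the values below are made up and are NOT the course data.
toy <- data.frame(
  year  = 2000:2004,
  share = c(0.10, 0.12, 0.11, 0.13, 0.15)   # hypothetical shares
)

csv_path <- file.path(tempdir(), "toy_shares.csv")   # hypothetical file name
write.csv(toy, csv_path, row.names = FALSE)

shares <- read.csv(csv_path)   # read the CSV file into a data frame
names(shares)                  # variable (column) names
str(shares)                    # variable types and the first few values

# Time series plot: year on the horizontal axis, share on the vertical axis.
plot(shares$year, shares$share, type = "l",
     xlab = "Year", ylab = "Share")
```

The type = "l" argument draws a connected line instead of separate points, which is often easier to read for data ordered by year.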
Click on the “Ed Discussion” link on the left-hand side of the Canvas website or
use the following link: https://edstem.org/us/courses/49565/discussion/.
Please respond to the thread we posted with “Hello World”. Alternatively, ask
your own question or answer someone else’s question.
Questions from econ117_pset1_scriptA.R:
# --- 1 ----
# After you have run the code above, take a screen shot (or a photo)
# of your screen showing RStudio, this code, and the plot it produced.

# --- 2 ---
# Create a new vector "d" of 100 random numbers using the "runif()"
# function from above.
# Print the first 6 values using the head() command and include them
# in your pset.
# Create a new vector "e" that contains the numbers 1 to 100 (hint,
# use ":" or the "seq()" command from above).

# --- 3 ---
# Create a new matrix "draws2" containing the vectors "d" and "e" as
# columns.
# Print the last 6 rows using the tail() command and include them in
# your pset.

# --- 4 ---
# Plot draws2, save the figure and include it in your pset (hint, use
# the "plot()" command from above).
# Note you can save or copy the file using the "Export" button above
# the plot.

# --- 5 ---
# Replace the third entry in the vector "d" with the number "777"
# Print the first five values of d and include in your pset
# You can use the "head(d, n = 5)" command for this, where the n=5
# option tells the command to print the first 5 values.
# You can also use d[1:5] to show the first 5 values of d.

# --- 6 ---
# Save this R script with your answers to 1-5 above and submit it with
# pset 1.
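As a small illustration of the indexing syntax mentioned in exercise 5, the sketch below overwrites one entry of a vector and then shows two equivalent ways of selecting its first few values. The vector "v" and its values are made up for this example and are not part of the script.

```r
# Illustrative only: "v" is a made-up vector, not one of the pset objects.
v <- c(10, 20, 30, 40, 50, 60, 70)

v[2] <- -1            # overwrite the second entry in place
head(v, n = 4)        # first 4 values: 10 -1 30 40
v[1:4]                # the same 4 values, selected by position
```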
Questions from econ117_pset1_scriptB.R:
# --- 1 ----
# Even if you have not specified one yet, R has a default working
# directory.
# At any time you can figure out which is your current working
# directory using the getwd() command.
# Print your current working directory using the "getwd()" command and
# include it in your pset.

# --- 2 ----
# As you can see in the previous question, the working directory is
# indexed by a path. A path is a sequence of folders that lead up to
# a given folder.
# Using the setwd() command, make the folder econ117_pset1_folder your
# new working directory.
# [hint: when writing the path inside the parentheses of setwd() you
# should use quotation marks ""]
# [hint: In RStudio you can choose Session -> Set Working Directory ->
# Choose Directory to choose a working directory by hand. This will
# also run the corresponding setwd() command. We recommend doing
# this, and then copying the command!]

# --- 3 ----
# Download the data file income_shares_USA.csv from the Problem Set 1
# page under assignments.
# Move this file to the folder econ117_pset1_folder.
# The list.files() command allows you to list all the files in your
# working directory.
# Use the list.files() command to print the contents of your working
# directory and include it in your pset.

# --- 4 ----
# Load the income_shares_USA.csv dataset from Canvas using the
# read.csv() command.
# [Hint: this can also be done with the "Import Dataset" button in the
# "Environment" tab of RStudio, and will generate the command as well.]
# [Hint: you can assign a name to your dataset by typing dataset_name
# <- read.csv(), where "dataset_name" is a name of your choice].
# Print the list of variable names of the data set using the names()
# command and include it in your pset.
# [Hint: if your code used "data" for "dataset_name" above, you would
# type "names(data)"]
# DATASOURCE: https://wid.world/data/

# --- 5 ---
# Using the plot() command, produce a plot of the income share accounted
# for by the bottom 50% of the income distribution (variable p0p50)
# over years (variable year). Save the graph and include it in your
# pset.
# [Hint: you can plot variable Y of dataset "dataset_name" against
# variable X by typing plot(dataset_name$X,dataset_name$Y)].
# [Hint: you can save the graph by clicking the "Export" button located
# right above the graph.]
# Describe what happened to the share of income accounted for by the
# poorest Americans over the last 50 years (1 sentence).

# --- 6 ---
# Produce a plot of the income share accounted for by the top 1% of
# the income distribution (variable p99p100). Save the graph and
# include it in your pset.
# Describe what happened to the share of income accounted for by the
# richest Americans over the last 50 years (1 sentence).

# --- 7 ---
# In at most 3 sentences, describe what you think happened to income
# inequality in the US over the last 50 years. Refer to the results
# you obtained in (5) and (6) while answering this question.

# --- 8 ---
# Save this R script with your answers to 1-7 above and submit it with
# pset 1.
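The "Export" button is the route the hints describe for saving a graph. As an optional alternative, a plot can also be written to a file from code, sketched below with base R's pdf() device. The numbers and the output file name are made up for illustration and are not the course data.

```r
# Illustrative only: made-up numbers and a hypothetical output file name.
years  <- 1990:1994
values <- c(0.20, 0.21, 0.19, 0.22, 0.24)

out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file, width = 6, height = 4)   # open a PDF graphics device (size in inches)
plot(years, values, type = "l", xlab = "Year", ylab = "Value",
     main = "Example time series")
dev.off()                              # close the device so the file is written

file.exists(out_file)                  # TRUE once the file has been written
```

Saving from code makes the figure reproducible, since rerunning the script regenerates the same file without any manual steps.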