Homework RMD
# Preamble from R
**Important**: When you click the **Knit** button, a document is generated that
includes both the content and the output of any embedded R code chunks within
the document.
**Advice**: You can first run your code in a plain R script and, once it works
correctly, copy the code into this R Markdown file. You will be
evaluated on both your code and your interpretation.
**Support**: You can find support on how to code each task at these links:
[descriptive statistics](https://www.datacamp.com/doc/r/descriptives),
[loops in R](https://www.w3schools.com/r/r_for_loop.asp),
[linear regression](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm),
[regression output](https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf)
(to display the regression output of stargazer, insert an executable R code chunk
with this header: `{r NAME_OF_CODE, results = "asis"}`),
[data visualization](https://ggplot2.tidyverse.org/).
## First task
Remove [Name Surname] at the beginning of this file and replace it with the names
of the members of your group.
## Research Question
"In the social sciences, the term “peer effects” has been widely used to describe
the various ways in which individual behaviors and attitudes can be influenced by
friends, acquaintances, and the wider social environment"
([Source](https://oxfordre.com/education/display/10.1093/acrefore/9780190264093.001.0001/acrefore-9780190264093-e-849?d=%2F10.1093%2Facrefore%2F9780190264093.001.0001%2Facrefore-9780190264093-e-849&p=emailAwDWIpvnTJaIY#:~:text=Summary,and%20the%20wider%20social%20environment.)).
What is the relationship between peers and individual productivity? Do teams with
highly productive members boost the productivity of their members? What about
institutions or firms? These are **central** questions in economics and business.
If you are interested, see this paper:
[link](https://drive.google.com/file/u/0/d/1fQwh6KVsd-p-DG-i-l4ydGv-lDPXGHRa/view?usp=sharing&pli=1)
(Teams: Heterogeneity, Sorting and Complementarity).
## Data
At this [link](https://cran.r-project.org/web/packages/wooldridge/wooldridge.pdf),
you can find the **description** of each variable and the overall dataset.
If you have problems using the .csv file on Moodle, you can also load the R package
`wooldridge` and run `data('big9salary')`.
# Homework Tasks
Read each task carefully and check both your code and your interpretation of the
output.
## 1. Packages
```{r}
library(tidyverse)
library(ggplot2)
library(stargazer)
library(plm)
```
If these packages are not present in your library, you **should install** them. You
can find detailed descriptions of these packages online.
In this first section, provide a brief description of the dataset and include
summary statistics for the key variables. Additionally, visualize the data to
illustrate the correlation between these key variables.
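```{r}
# Load dataset big9salary.csv from your own path.
# HINT: replace
# "C:/Users/Lorenzo Navarini/Desktop/teaching/homework/phd_returns/"
# with the path of the folder where you placed your csv file
data <- read.csv("C:/Users/Lorenzo Navarini/Desktop/teaching/homework/phd_returns/big9salary.csv")
```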
- Obtain the summary statistics for the key variables of this analysis: pubindx
and lsalary. Then produce a histogram of pubindx and interpret it with reference
to the mean and the median of pubindx (as reported in the summary statistics).
```{r}
data$year95 <- data$year==95
data$year99 <- data$year==99
```
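For the summary statistics and histogram task, the workflow can be sketched on simulated data. The data frame `toy` and its values are made up for illustration and are not the real big9salary data; only the variable names `pubindx` and `lsalary` come from the homework.

```r
# Simulated stand-in for the homework data (NOT the real big9salary values)
set.seed(1)
toy <- data.frame(pubindx = rexp(200, rate = 1 / 10))  # right-skewed by construction
toy$lsalary <- 11 + 0.01 * toy$pubindx + rnorm(200, sd = 0.2)

# Summary statistics for the two key variables
summary(toy[, c("pubindx", "lsalary")])

# Histogram of pubindx with the mean (solid) and median (dashed) marked
pub_mean <- mean(toy$pubindx)
pub_median <- median(toy$pubindx)
hist(toy$pubindx, breaks = 30,
     main = "Histogram of pubindx (simulated)", xlab = "pubindx")
abline(v = pub_mean, lty = 1)
abline(v = pub_median, lty = 2)
legend("topright", legend = c("mean", "median"), lty = c(1, 2))
```

When the mean lies above the median, as in this simulated example, the distribution has a long right tail: a few very high values pull the mean up.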
- Produce a scatter plot with a fitted line between pubindx and lsalary.
Moreover, compute the correlation between pubindx10 and salary. Interpret it.
- Estimate two models: (1) regress lsalary on pubindx10 with a simple OLS
(pooled cross-sectional data), and (2) use a fixed effects estimator to regress
lsalary on pubindx10. Interpret the difference between these two outcomes. (The
code is given, but it is based on pubindx, not pubindx10. You can use the same
code structure to obtain the output for different regressions.)
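For the scatter plot and correlation task above, a minimal base R sketch on simulated data could look like this. The data frame `toy` is hypothetical; with the real data you would use your own data frame and variables.

```r
# Simulated stand-in for the homework data (NOT the real big9salary values)
set.seed(2)
toy <- data.frame(pubindx = rexp(200, rate = 1 / 10))
toy$lsalary <- 11 + 0.01 * toy$pubindx + rnorm(200, sd = 0.2)

# Scatter plot with the OLS fitted line
plot(toy$pubindx, toy$lsalary, xlab = "pubindx", ylab = "lsalary")
fit <- lm(lsalary ~ pubindx, data = toy)
abline(fit)

# Pearson correlation between the two variables
r <- cor(toy$pubindx, toy$lsalary)
r

# In a simple regression, R-squared equals the squared correlation
all.equal(summary(fit)$r.squared, r^2)
```

An equivalent plot with ggplot2 would combine `geom_point()` with `geom_smooth(method = "lm")`.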
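```{r, results = "asis"}
# HINT: lm for pooled cross-sectional data, plm for panel data fixed effects
# (1) Pooled OLS
lsalary_pubindx10 <- lm(lsalary ~ pubindx, data = data)
# (2) Fixed effects (within) estimator.
# Assumption: `id` identifies the person and `year` the period in your data.
lsalary_pubindx10_fe <- plm(lsalary ~ pubindx, data = data,
                            index = c("id", "year"), model = "within")
stargazer(lsalary_pubindx10, lsalary_pubindx10_fe,
          header = FALSE, float = FALSE,
          notes = "Numbers in parentheses are standard errors.",
          omit.stat = c("adj.rsq", "ser", "f"), no.space = TRUE)
```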
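To build intuition for the difference between pooled OLS and fixed effects, the following sketch uses simulated panel data and base R only. Everything here is an assumption made for illustration: the data frame `toy`, the unobserved variable `ability`, and the true coefficient of 0.05. It shows that the fixed effects slope is what you get by demeaning each variable within person, which is the same slope as OLS with one dummy per person.

```r
# Simulated panel: 50 people observed in 3 years (NOT the real data)
set.seed(3)
n_id <- 50
n_t <- 3
toy <- data.frame(id = rep(1:n_id, each = n_t),
                  year = rep(c(92, 95, 99), times = n_id))
ability <- rep(rnorm(n_id), each = n_t)  # unobserved and constant over time
toy$pubindx <- 5 + 2 * ability + rnorm(n_id * n_t)
toy$lsalary <- 11 + 0.05 * toy$pubindx + 0.3 * ability +
  rnorm(n_id * n_t, sd = 0.05)

# Pooled OLS ignores ability, so its slope picks up part of the ability effect
pooled <- lm(lsalary ~ pubindx, data = toy)

# Within transformation: subtract each person's own mean from each variable
toy$lsalary_dm <- toy$lsalary - ave(toy$lsalary, toy$id)
toy$pubindx_dm <- toy$pubindx - ave(toy$pubindx, toy$id)
fe_within <- lm(lsalary_dm ~ pubindx_dm - 1, data = toy)

# Same slope from OLS with a dummy for every person (LSDV)
lsdv <- lm(lsalary ~ pubindx + factor(id), data = toy)

c(pooled = coef(pooled)[["pubindx"]],
  within = coef(fe_within)[["pubindx_dm"]],
  lsdv = coef(lsdv)[["pubindx"]])
```

The demeaned regression reproduces the fixed effects slope, but its standard errors do not account for the estimated person means. For reporting, a dedicated routine such as `plm` with `model = "within"` is the safer choice.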
- Now add to both specifications controls for occupation (assoc, prof,
chair), gender (female), experience and experience squared, and time fixed effects.
Interpret the table. To put things into perspective, what is the salary premium,
ceteris paribus, for someone who goes from 0 to 3.5 (the mean) in the publication
index? Interpret the wage premium.
- In the dataset, you have dummy variables for each institution (osu, iowa,
indiana, purdue, ...). Create a single categorical variable, such that each
category is linked to one university dummy.
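For the wage premium question in the controls task above, recall that the dependent variable is the log of salary, so a coefficient translates into a percentage change. The sketch below uses a made-up coefficient `beta_hat`; replace it with the estimate from your own table.

```r
# Hypothetical coefficient on the publication index (NOT an estimate from the data)
beta_hat <- 0.05
delta <- 3.5  # change in the publication index, from 0 to the mean

# Approximate percentage change: 100 * beta * delta
approx_pct <- 100 * beta_hat * delta

# Exact percentage change implied by a log-level model: 100 * (exp(beta * delta) - 1)
exact_pct <- 100 * (exp(beta_hat * delta) - 1)

c(approximate = approx_pct, exact = exact_pct)
```

The approximation works well for small changes in log salary. For larger changes the exact formula gives a noticeably bigger premium.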
```{r}
# HINT: write a for loop that iterates over the list of universities (uni_list)
# and assigns each university's category to a single variable
# University list: osu iowa indiana purdue msu minn mich wisc illinois
uni_list <- c("osu", "iowa", "indiana", "purdue", "msu", "minn", "mich", "wisc",
              "illinois")
peers_effect <- list()
```
- Describe the new variable and check that you correctly constructed a
categorical variable (for instance, compare a tabulation of your categorical
variable with a tabulation of a single dummy variable, such as osu, and check the
number of observations).
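The loop and the tabulation check described above can be sketched on a tiny made-up data frame. The data frame `toy`, the shortened `uni_list`, and the new column name `university` are assumptions for illustration; in the homework you would use the full list of nine universities and your own data.

```r
# Toy data with three university dummies (NOT the real data)
uni_list <- c("osu", "iowa", "indiana")
toy <- data.frame(osu     = c(1, 0, 0, 1, 0),
                  iowa    = c(0, 1, 0, 0, 0),
                  indiana = c(0, 0, 1, 0, 1))

# Start with a missing value and fill in the category dummy by dummy
toy$university <- NA_character_
for (uni in uni_list) {
  toy$university[toy[[uni]] == 1] <- uni
}
toy$university <- factor(toy$university, levels = uni_list)

# Check: the count for "osu" must match the number of ones in the osu dummy
table(toy$university)
table(toy$osu)
table(toy$university, toy$osu)
```

Any remaining `NA` in the new variable would signal an observation with no university dummy equal to one, which is worth investigating before moving on.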
## 4. Peer effects
- Regress the publication index on university dummies and peer effects and create
a table where you present the results compared to the first model. Interpret the
results. Finally, explain what variation you are capturing with this measure of
peer effects relative to the university dummies.
- Check the variation within each university in the peer effects by producing a
box plot of peer effects over the university variable. Interpret it.
- Is there a way to compute peer effects differently? You can try the so-called
leave-one-out mean. You compute individual peer effects that are year-specific by
removing the individual's year-specific observation from the year-specific peer
effects computation. The code is given; interpret it and give the intuition behind
it. For instance, how is the individual removed from the peer effects
computation? Describe the summary statistics and the corresponding histogram.
(IMPORTANT: remove `, eval=FALSE` from the chunk header so that the code is executed
in the final file. Be careful about the names of the variables and the name of your data.)
```{r , eval=FALSE}
# Leave-one-out mean code (remove eval=FALSE)
data$peers_effect_loo <- 0
peers_effect_loo <- list()
hist(data$peers_effect_loo)
summary(data$peers_effect_loo)
```
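As an aid to the intuition, here is a minimal vectorized sketch of a leave-one-out mean on made-up data. It is not the code provided with the homework. It assumes that peers are defined as colleagues at the same university in the same year, and the data frame `toy` and the column `university` are hypothetical.

```r
# Toy data: two universities, two years (NOT the real data)
toy <- data.frame(
  university = c("osu", "osu", "osu", "iowa", "iowa",
                 "osu", "osu", "iowa", "iowa", "iowa"),
  year       = c(92, 92, 92, 92, 92, 95, 95, 95, 95, 95),
  pubindx    = c(1, 2, 6, 4, 8, 3, 5, 2, 4, 9)
)

# One group per university-year cell
grp <- interaction(toy$university, toy$year, drop = TRUE)
grp_sum <- ave(toy$pubindx, grp, FUN = sum)     # total pubindx in the cell
grp_n   <- ave(toy$pubindx, grp, FUN = length)  # number of people in the cell

# Plain group mean: includes the individual's own value
toy$peers_effect <- grp_sum / grp_n

# Leave-one-out mean: subtract own value, divide by the number of OTHER people
toy$peers_effect_loo <- (grp_sum - toy$pubindx) / (grp_n - 1)

toy
boxplot(peers_effect_loo ~ university, data = toy)
```

The plain group mean is identical for everyone in a cell, while the leave-one-out mean differs across individuals because each person's own value is excluded. A cell with a single person has no peers, so the formula divides by zero there and such cases need separate handling.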
- Generate a box plot of the leave-one-out peer effects. Interpret how it differs
from the average peer effects, and describe the variation within institutions.
- Now that you have a new peer effect measure (peers_effect_loo), regress
pubindx10 again on university dummies and leave-one-out peer effects (peers_effect_loo).
Interpret the results and give the intuition behind the differences from the
previous results.