
Homework RMD

This document outlines a homework assignment for an Introductory Econometrics course at the University of Vienna, focusing on peer effects in academic productivity. Students are required to analyze data related to publication indices and salaries, using R for statistical analysis and interpretation. The assignment includes specific tasks such as data description, regression analysis, and the construction of peer effect measures, with a deadline of January 31, 2025.

Uploaded by

hussainsaadeddin

---
title: "Homework - Introductory Econometrics 2024WS (BA)"
subtitle: "University of Vienna"
author: 'Group: [Name Surname], [Name Surname], [Name Surname]'
date: "31.01.2025"
output:
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Preamble from R

This is an R Markdown document. Markdown is a simple formatting syntax for
authoring HTML, PDF, and MS Word documents. For more details on using R Markdown
see <http://rmarkdown.rstudio.com>.

**Important**: When you click the **Knit** button a document will be generated that
includes both content as well as the output of any embedded R code chunks within
the document.

**Advice**: You can first run your code in a plain R script and, once it works
correctly, transfer it into this R Markdown file. You will be
evaluated on both your code and your interpretation.

**Support**: You can find support on how to code each task at these links:

- [descriptive statistics](https://www.datacamp.com/doc/r/descriptives)
- [loops in R](https://www.w3schools.com/r/r_for_loop.asp)
- [linear regression](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm)
- [regression output](https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf)
  (to visualize the regression output of stargazer, insert an executable R code
  chunk with this header: `{r NAME_OF_CODE, results = "asis"}`)
- [data visualization](https://ggplot2.tidyverse.org/)

**Rules**: You are allowed to work in groups of up to three members. To address
challenges and write the code, you are encouraged to seek help and support from
internet sources, blogs, websites, and AI tools such as ChatGPT, Claude or MS
Copilot. Simon and I will not respond to questions related to coding or its
interpretation. This assignment accounts for 20% of your total grade.

**Deadline**: Friday, January 31, 2025 (31.01.2025)

## First task

Remove [Name Surname] and replace it with the names of the members of your group
at the beginning of this file.

# Research Project: Peer Effects in Academic Productivity

## Research Question

"In the social sciences, the term “peer effects” has been widely used to describe
the various ways in which individual behaviors and attitudes can be influenced by
friends, acquaintances, and the wider social environment"
([Source](https://oxfordre.com/education/display/10.1093/acrefore/9780190264093.001.0001/acrefore-9780190264093-e-849?d=%2F10.1093%2Facrefore%2F9780190264093.001.0001%2Facrefore-9780190264093-e-849&p=emailAwDWIpvnTJaIY#:~:text=Summary,and%20the%20wider%20social%20environment.)).

What is the relationship between peers and individual productivity? Do teams with
highly productive members boost the productivity of their members? What about
institutions or firms? These are **central** questions in economics and business.
If interested, see this paper:
[link](https://drive.google.com/file/u/0/d/1fQwh6KVsd-p-DG-i-l4ydGv-lDPXGHRa/view?usp=sharing&pli=1)
(Teams: Heterogeneity, Sorting and Complementarity).

In this project, we examine the influence of peers within academic institutions
(universities in the United States) on individual academic productivity, measured
through a publication index (the higher the index, the higher the academic output).
Note that this framework can also be extended to other contexts: how productive
co-workers affect one's key performance metrics, how peers in school affect one's
grades or test scores, how neighbors or friends affect consumption or location
choices.

## Data

From Baser and Pema (2003):
[link](https://cran.r-project.org/web/packages/wooldridge/wooldridge.pdf) (p. 20)

At this [link](https://cran.r-project.org/web/packages/wooldridge/wooldridge.pdf),
you can find the **description** of each variable and the overall dataset.

If you have problems using the .csv file on Moodle, you can also load the R package
`wooldridge` and run `data('big9salary')`.

# Homework Tasks

Read each task carefully and evaluate both your code and the interpretation of the
output.

## 1. Packages

### 1.a Load packages

First, load the R packages required for this homework (code is given):

```{r}
library(tidyverse)
library(ggplot2)
library(stargazer)
library(plm)
```

If these packages are not present in your library, you **should install** them. You
can find detailed descriptions of these packages online.
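
A minimal sketch of one way to check which packages still need installing. The helper name `missing_packages` is an assumption introduced here for illustration and is not part of the assignment; `requireNamespace()` and `install.packages()` are standard R functions. The install line is left commented out so that nothing is downloaded unless you choose to.

```r
# Hypothetical helper (not part of the assignment): returns the packages
# in `pkgs` that cannot be found in the current library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "stargazer", "plm"))
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means every listed package is already available and the `library()` calls above should succeed.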

## 2. Describe the dataset

In this first section, provide a brief description of the dataset and include
summary statistics for the key variables. Additionally, visualize the data to
illustrate the correlation between these key variables.

### 2a. Load the dataset


- Load the dataset from your current working directory and visualize the first 5
lines:

```{r}
# Load dataset big9salary.csv from your own path.
# HINT: replace
# "C:/Users/Lorenzo Navarini/Desktop/teaching/homework/phd_returns/"
# with the path of the folder where you placed the CSV file

data <- read_csv('C:/Users/Lorenzo Navarini/Desktop/teaching/homework/phd_returns/big9salary.csv')

# Visualize the first lines of the dataset

```

### 2b. Summary statistics of important variables

- Obtain the summary statistics for the key variables for this analysis: pubindx
and lsalary. Moreover, produce a histogram of pubindx and interpret it by comparing
the mean and the median of pubindx (as included in the summary statistics).

- In order to aid the analysis and the interpretation, generate pubindx10, a
variable where pubindx is divided by 10, and show the resulting summary statistics.

- Generate year dummies (call them year95 and year99):

```{r}
data$year95 <- data$year==95
data$year99 <- data$year==99
```
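
To see why rescaling a regressor (as with pubindx10) helps interpretation without changing the analysis, here is a small sketch on made-up data. The variables `x`, `x10` and `y` are illustrative stand-ins, not columns of the homework dataset.

```r
# Toy data (made up for illustration, not the big9salary dataset)
set.seed(1)
x   <- runif(50, min = 0, max = 100)          # stand-in for a publication index
y   <- 10 + 0.002 * x + rnorm(50, sd = 0.1)   # stand-in for log salary
x10 <- x / 10                                 # rescaled regressor

fit_raw    <- lm(y ~ x)
fit_scaled <- lm(y ~ x10)

# The slope on x10 is ten times the slope on x; the fit itself is unchanged.
coef(fit_raw)["x"] * 10
coef(fit_scaled)["x10"]
summary(fit_raw)$r.squared
summary(fit_scaled)$r.squared
```

Dividing the regressor by 10 only changes the unit in which the slope is expressed: the coefficient now describes a 10-point change in the index, which avoids reading very small numbers.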

### 2c. Correlation between Academic Productivity and Salary

- Produce a scatter plot with a fitted line between pubindx and lsalary.
Moreover, compute the correlation between pubindx10 and salary. Interpret it.

- Estimate two models: (1) regress lsalary on pubindx10 with a simple OLS
(pooled cross-sectional data), and (2) use a fixed effects estimator to regress
lsalary on pubindx10. Interpret the difference between these two outcomes. (The
code is given, but it is based on pubindx, not pubindx10. You can use the same
code structure to obtain the output for different regressions.)

```{r}
# HINT: lm for pooled cross-sectional data, plm for panel data fixed effects
lsalary_pubindx10 <- lm(lsalary ~ pubindx, data = data)

lsalary_pubindx10_fe <- plm(lsalary ~ pubindx,
                            data = data,
                            index = c("id", "year"),
                            model = "within")

```

```{r, results = "asis"}

stargazer(lsalary_pubindx10, lsalary_pubindx10_fe,
header=FALSE, float=FALSE,
notes="Numbers in parentheses are standard errors.",
omit.stat = c("adj.rsq", "ser", "f"), no.space = TRUE)
```
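
As intuition for what `model = "within"` does, the following sketch reproduces a fixed effects slope by hand on a made-up panel, using base R only. All names (`panel`, `alpha`, `x_dm`, `y_dm`) and the data-generating numbers are assumptions chosen for illustration.

```r
# Toy balanced panel (made up): 4 individuals observed in 3 years
set.seed(2)
panel <- data.frame(id = rep(1:4, each = 3), year = rep(c(92, 95, 99), times = 4))
alpha <- c(1, 3, 5, 7)[panel$id]                # time-invariant individual effect
panel$x <- alpha + rnorm(12)                    # regressor correlated with alpha
panel$y <- 2 * alpha + 0.5 * panel$x + rnorm(12, sd = 0.1)

# Pooled OLS ignores the individual effect
pooled <- lm(y ~ x, data = panel)

# Within estimator: subtract each individual's own mean, then run OLS
panel$x_dm <- panel$x - ave(panel$x, panel$id)
panel$y_dm <- panel$y - ave(panel$y, panel$id)
within_fit <- lm(y_dm ~ x_dm - 1, data = panel)

# Equivalent slope: OLS with one dummy per individual
lsdv <- lm(y ~ x + factor(id), data = panel)

coef(pooled)["x"]
coef(within_fit)["x_dm"]
coef(lsdv)["x"]
```

The demeaned regression and the dummy-variable regression give the same slope, because both use only variation within each individual over time. The standard errors reported by `lm` on demeaned data do not account for the estimated individual means, which is one reason to rely on `plm` for the actual estimates.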

- Now, also include in both specifications controls for occupation (assoc, prof,
chair), gender (female), experience and experience squared, and time fixed effects.
Interpret the table. To put things into perspective, what is the salary premium,
ceteris paribus, for someone who goes from 0 to 3.5 (the mean) in the publication
index? Interpret the wage premium.
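
When the dependent variable is in logs, a coefficient converts into a proportional premium as sketched below. The value of `beta_hat` is a purely hypothetical number chosen for illustration, not an estimate from the data; plug in your own coefficient.

```r
# Hypothetical coefficient on pubindx in a log-salary regression
# (an assumed value for illustration, NOT an estimate from the data)
beta_hat      <- 0.01
delta_pubindx <- 3.5 - 0     # change in the publication index

premium_approx <- beta_hat * delta_pubindx            # log-point approximation
premium_exact  <- exp(beta_hat * delta_pubindx) - 1   # exact proportional change

# Both premiums expressed in percent
round(100 * c(approx = premium_approx, exact = premium_exact), 2)
```

If the regressor is pubindx10 instead of pubindx, the same move corresponds to a change of 0.35 in the regressor, so `delta_pubindx` must be rescaled accordingly.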

## 3. Clean the dataset

### 3a. From dummy to categorical

- In the dataset, you have dummy variables for each institution (osu, iowa,
indiana, purdue, ...). Create a single categorical variable, such that each
category is linked to one university dummy.

```{r}
# HINT: execute a for loop, where you loop
# over the list of universities (uni_list) and
# you assign each category to a single variable

# University list: osu iowa indiana purdue msu minn mich wisc illinois
uni_list <- c("osu", "iowa", "indiana", "purdue", "msu", "minn", "mich", "wisc",
              "illinois")
peers_effect <- list()

```

- Describe the new variable and check that you correctly constructed a
categorical variable (for instance, compare a tabulation of your categorical
variable with a tabulation of a single dummy variable, such as osu, and check the
number of observations).
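
The loop pattern and the tabulation check can be sketched on a made-up data frame as follows. The columns `a`, `b`, `c` and the names `toy`, `group_list` and `group` are hypothetical; the sketch assumes the dummies are mutually exclusive.

```r
# Toy data (made up): three mutually exclusive dummies
toy <- data.frame(a = c(1, 0, 0, 1), b = c(0, 1, 0, 0), c = c(0, 0, 1, 0))
group_list <- c("a", "b", "c")

# Start from an empty character column and fill it one dummy at a time
toy$group <- NA_character_
for (g in group_list) {
  toy$group[toy[[g]] == 1] <- g
}
toy$group <- factor(toy$group, levels = group_list)

# Check: the count for category "a" should match the number of ones in dummy a
table(toy$group)
table(toy$a)
```

Any row left as `NA` after the loop would signal an observation that belongs to none of the listed dummies.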

## 4. Peer effects

- First, compute an average publication index by **institution** and **year**
and use it as a proxy for peer effects. Give an interpretation to this proxy. (See
the code in the last step of this homework for inspiration.) Regress the
publication index on university dummies and interpret the results.

- Regress the publication index on university dummies and peer effects and create
a table where you present the results compared to the first model. Interpret the
results. Finally, explain what variation you are capturing with this measure of
peer effects relative to the university dummies.

- Check the variation within each university in the peer effects by producing a
box plot of peer effects over the university variable. Interpret it.

- Is there a way to compute peer effects differently? You can try the so-called
leave-one-out mean. You compute individual peer effects which are year-specific, by
removing the individual's own year-specific observation from the year-specific peer
effects computation. The code is given; interpret it and give the intuition behind
it. For instance, how do you remove the individual from the peer effects
computation? Describe the summary statistics and the corresponding histogram.
(IMPORTANT: remove " , eval=FALSE" from the code to execute the code in the final
file. Be careful about the names of the variables and the name of your data.)

```{r , eval=FALSE}
# Leave-one-out mean code (remove eval=FALSE)
data$peers_effect_loo <- 0
peers_effect_loo <- list()

for (uni in uni_list) {
  for (i in 1:nrow(data)) {
    data$temp <- data$pubindx10
    data$temp[i] <- NA
    tempy <- data$year[i]
    peers_effect_loo[[uni]] <- mean(data$temp[data[[uni]] == 1 & data$year == tempy],
                                    na.rm = TRUE)
    if (data[[uni]][i] == 1 & data$year[i] == tempy) {
      data$peers_effect_loo[i] <- peers_effect_loo[[uni]]
    }
  }
}

hist(data$peers_effect_loo)
summary(data$peers_effect_loo)

```
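
The arithmetic behind a leave-one-out mean can be seen on a tiny made-up example: within each group-year cell, subtract the person's own value from the cell total and divide by the number of remaining members. This is an alternative, vectorized sketch of the same idea, not the code given above; all names (`toy`, `cell`, `peer_mean`, `peer_loo`) are hypothetical.

```r
# Toy data (made up): one group-year cell with three people, one with two
toy <- data.frame(
  uni  = c("a", "a", "a", "b", "b"),
  year = c(95, 95, 95, 95, 95),
  x    = c(1, 2, 6, 4, 8)
)

# One cell per combination of group and year
cell     <- interaction(toy$uni, toy$year, drop = TRUE)
cell_n   <- ave(toy$x, cell, FUN = length)   # members in the cell
cell_sum <- ave(toy$x, cell, FUN = sum)      # total of x in the cell

toy$peer_mean <- cell_sum / cell_n                   # plain cell mean (includes oneself)
toy$peer_loo  <- (cell_sum - toy$x) / (cell_n - 1)   # mean of everyone else in the cell
toy
```

The plain cell mean is identical for everyone in a cell, whereas the leave-one-out mean differs across individuals in the same cell. A cell with a single member has no peers, so the leave-one-out value is undefined (`NaN`) there.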

- Generate a box plot of the leave-one-out peer effects. Interpret the difference
with the average peer effects and the relative variation within institutions.

- Now that you have a new peer effect measure (peers_effect_loo), again regress
pubindx10 on university dummies and leave-one-out peer effects (peers_effect_loo).
Interpret the results and give the intuition about the differences with the
previous results.

## 5. Compute the peer effect on publication index

- Again regress pubindx10 on leave-one-out peer effects (peers_effect_loo), but
now include controls, such as gender, role, experience and experience
squared, and year fixed effects. Moreover, estimate the same specification using a
fixed effects estimator. Interpret the results and the relative difference for
**each** coefficient. Also give a detailed description of the difference between a
pooled cross-sectional estimator and the fixed effects estimator.
