
Annotated-Lab 1 Spring 2025 Assignment - RMD

The document is an R Markdown file for a lab assignment that involves data analysis using R. It includes tasks such as reporting variable names from a COVID dataset, summarizing wage statistics from CPS data, and visualizing participation in girls' sports. The document also contains R code chunks for data manipulation and analysis, along with instructions for submission.

Uploaded by

warnertrey07
---

title: "Lab 1"
author: "Trey Warner"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
---

```{r}
# Global chunk defaults: run every chunk, but hide code and output unless a chunk overrides this
knitr::opts_chunk$set(echo = FALSE, eval = TRUE, include = FALSE)
```

```{r}
library(tidyverse)
library(haven)
library(kableExtra)
library(lubridate)
```

```{r}
girls_participation <- read.csv("~/downloads/girls_participation")
```

```{r}
cps_2018 <- read_dta("~/downloads/cepr_org_2018.dta")
```

```{r}
cps_2019 <- read_dta("~/downloads/cepr_org_2019.dta")
```

```{r}
covid_data <- read.csv("~/downloads/United_States_COVID-19_Cases_and_Deaths_by_State_over_Time_-_ARCHIVED.csv")
```

Create an R Markdown file that completes each of the listed tasks. **Submit both
the markdown file and the output on Carmen**. Submit the markdown file first so
that you can see the output document when you look at the most recent submission.

Each question is worth 10 points, and 10 more points are awarded for getting the
entire document to knit. If you cannot get a code chunk to run, mark the chunk
as `echo=TRUE` and `eval=FALSE`, which shows the code without running it.

## 1. Report a list of variable names for the Covid data from the class 1 exercises.

```{r echo=TRUE, include=TRUE}
names(covid_data)
```

## 2. Find the summary statistics of the four wage variables in the CPS data (cepr_org_2019).

```{r echo=TRUE, include=TRUE}
# Assumes the four wage variables are named wage1 through wage4
cps_2019 %>% select(num_range("wage", 1:4)) %>% summary()
```

## 3. What is the average number of weekly new cases in Ohio according to the CDC Covid data?

```{r echo=TRUE, include=TRUE}
# Mean of new_case over all Ohio rows, ignoring missing values
covid_data %>% filter(state == "OH") %>% pull(new_case) %>% mean(na.rm = TRUE)
```
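
If the CDC file is reported daily, the mean of `new_case` is a daily average rather than a weekly one. The sketch below shows one possible way to total cases by week first and then average the weekly totals. It runs on a small made-up data frame; the column name `submission_date` and its month/day/year format are assumptions about the CDC file, and `toy_daily` and its numbers are purely illustrative.

```r
library(dplyr)
library(lubridate)

# Toy daily data for one state (hypothetical numbers): 14 days = two full Monday-to-Sunday weeks
toy_daily <- data.frame(
  submission_date = format(as.Date("2021-01-04") + 0:13, "%m/%d/%Y"),
  state = "OH",
  new_case = c(rep(10, 7), rep(20, 7))
)

weekly_avg <- toy_daily %>%
  filter(state == "OH") %>%
  # Parse the date string and round each day down to the Monday of its week
  mutate(week = floor_date(mdy(submission_date), unit = "week", week_start = 1)) %>%
  group_by(week) %>%
  summarize(weekly_cases = sum(new_case, na.rm = TRUE)) %>%
  # Average the weekly totals
  summarize(avg_weekly_cases = mean(weekly_cases)) %>%
  pull(avg_weekly_cases)

weekly_avg
```

With the real data, partial weeks at the start or end of the series would pull the average down, so it may be worth checking how many days each week contains.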

## 4. Use the CPS data to find the average wage4 by race (wbhaom). You will need to add the remove-missing option (`na.rm = TRUE`) to your commands. For example:

```{r echo=TRUE, include=TRUE}
cps_2019 %>% group_by(wbhaom) %>% summarize(mean_wage4 = mean(wage4, na.rm = TRUE))
```
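
To see what the remove-missing option does, here is a minimal sketch on a made-up vector (`toy_wage` and its values are illustrative, not taken from the CPS data). Without `na.rm = TRUE`, a single `NA` makes the whole mean `NA`.

```r
# Toy wage vector with one missing value (hypothetical numbers)
toy_wage <- c(10, NA, 20)

# Default behaviour: any NA propagates to the result
mean_with_na <- mean(toy_wage)

# With na.rm = TRUE the NA is dropped before averaging
mean_without_na <- mean(toy_wage, na.rm = TRUE)

print(mean_with_na)     # NA
print(mean_without_na)  # 15
```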

## 5. Do all states have the same number of observations in the Covid data?

```{r echo=TRUE, include=TRUE}
# TRUE only if the per-state row counts take a single distinct value
covid_data %>% count(state) %>% pull(n) %>% n_distinct() == 1
```
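
The same kind of check can be tried on a small made-up data frame, which also makes the per-state counts visible. This is only a sketch: `toy_states` and its values are hypothetical, and one state is deliberately given fewer rows so the check fails.

```r
library(dplyr)

# Toy data (hypothetical): OH and MI have 3 rows each, PA has only 2
toy_states <- data.frame(
  state = c("OH", "OH", "OH", "MI", "MI", "MI", "PA", "PA"),
  new_case = c(5, 7, 3, 2, 4, 6, 1, 8)
)

# Number of rows per state
obs_per_state <- toy_states %>% count(state, name = "n_obs")

# TRUE only if every state has the same number of rows
all_equal <- n_distinct(obs_per_state$n_obs) == 1

obs_per_state
all_equal
```

Printing `obs_per_state` alongside the logical answer shows which states differ when the result is `FALSE`.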

## 6. What was the most popular girls' sport in 2018?

```{r echo=TRUE, include=TRUE}
girls_participation %>%
  filter(year == 2018) %>%
  select(-year) %>%
  # Total each numeric (sport) column across states, then pick the largest
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "sport", values_to = "total") %>%
  slice_max(total, n = 1) %>% pull(sport)
```

## 7. The Tidyverse lab starts with the code necessary to create a bar graph of football participation in 2010. Use that code as a template to make a graph of volleyball participation for girls in 2015 in the 5 states that have the most participants.

```{r echo=TRUE, include=TRUE}
girls_participation %>%
  filter(year == 2015) %>%
  select(X, volleyball) %>%
  arrange(desc(volleyball)) %>%
  head(5) %>%
  ggplot(aes(X, volleyball)) +
  geom_col()
```

## 8. Find a dataset on <https://data.fivethirtyeight.com/> or <http://www.masteringmetrics.com/resources/>. Import the data, list variable names, and find basic summary statistics.

```{r}
pres <- read.csv("~/downloads/polls/president_polls.csv")
```

```{r echo=TRUE, include=TRUE}


names(pres)
```
```{r echo=TRUE, include=TRUE}
pres %>% as_tibble %>% select(where(is.numeric)) %>% summary
```

## 9. Find a package at <https://cran.r-project.org/web/packages/available_packages_by_name.html> that contains some data or code to scrape data from another source. Open some data and create a summary of at least 1 variable. (There are packages that can retrieve Census data, World Bank data, Covid data, sports data, and if you want to be boring you can even use the wooldridge data.)

Get the average S&P 500 daily return.

```{r}
library(quantmod)
getSymbols('^GSPC')
```

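```{r echo=TRUE, include=TRUE}
GSPC %>%
  as_tibble() %>%
  # Simple daily return: today's close over yesterday's close, minus 1
  mutate(ret = GSPC.Close / dplyr::lag(GSPC.Close) - 1) %>%
  summarize(mean_ret = mean(ret, na.rm = TRUE)) %>%
  pull(mean_ret)
```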
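
A daily simple return is the ratio of consecutive closing prices minus one; the ratio on its own is a gross return that sits near 1 rather than near 0. The sketch below works through that arithmetic on made-up prices (`toy_prices` is hypothetical and is not real S&P 500 data).

```r
library(dplyr)

# Hypothetical closing prices: up 10 percent, then down 10 percent
toy_prices <- data.frame(close = c(100, 110, 99))

toy_returns <- toy_prices %>%
  mutate(
    gross_ret = close / dplyr::lag(close),  # ratio of consecutive closes
    ret = gross_ret - 1                     # simple return
  )

# The first row has no previous close, so its return is NA and is dropped
avg_ret <- mean(toy_returns$ret, na.rm = TRUE)
avg_gross <- mean(toy_returns$gross_ret, na.rm = TRUE)

toy_returns
avg_ret
```

Here the average simple return is approximately 0 while the average gross return is approximately 1, which is the difference the subtraction accounts for.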
