Data-Analysis-using-R
2025-04-14
R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF,
and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the
output of any embedded R code chunks within the document. You can embed an R code chunk like this:
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that
generated the plot.
Data Source
library(wooldridge)
library(tidyverse)
library(stargazer)
data("wage1")
Data Overview
head(wage1)
summary(wage1)
str(wage1)
Descriptive Statistics
summary(wage1 %>% select(wage, educ, exper, tenure))
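summary() reports quartiles and means but not standard deviations or sample sizes. A minimal sketch of one way to add them, assuming the same wage1 data and using base R's sapply(); the helper name describe_var is only an illustration and not part of any package:

```r
library(wooldridge)
data("wage1")

vars <- c("wage", "educ", "exper", "tenure")

# describe_var is a hypothetical helper: non-missing count, mean, and
# standard deviation of one numeric vector
describe_var <- function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# One column per variable, one row per statistic
desc_stats <- sapply(wage1[, vars], describe_var)
round(desc_stats, 2)
```

The result is a small matrix, so it can be passed to knitr::kable() if you want a formatted table in the knitted document.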
Data Visualization
Histogram of Wage
ggplot(wage1, aes(x = wage)) +
  geom_histogram(aes(y = ..count..), binwidth = 1, color = "black", fill = "blue") +
  labs(title = "Histogram of Wage", x = "Wage", y = "Frequency")
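In ggplot2 3.4.0 and later, the ..count.. notation triggers a deprecation warning. A sketch of the same histogram written with after_stat(count), assuming a recent ggplot2 version; the object name p_hist is only for illustration:

```r
library(wooldridge)
library(ggplot2)
data("wage1")

# after_stat(count) refers to the per-bin counts computed by the histogram stat
p_hist <- ggplot(wage1, aes(x = wage)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = 1,
                 color = "black", fill = "blue") +
  labs(title = "Histogram of Wage", x = "Wage", y = "Frequency")

p_hist
```

Because counts are already the default y value for geom_histogram(), dropping the inner aes() call should give the same plot.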
Scatterplot of Wage vs. Education
ggplot(wage1, aes(x = educ, y = wage)) +
  geom_point() +
  labs(title = "Scatterplot of Wage vs. Education", x = "Education (Years)", y = "Wage") +
  geom_smooth(method = "lm", se = FALSE, color = "red")
Regression Analysis
Bivariate Regression: Wage on Education
reg1 <- lm(wage ~ educ, data = wage1)
summary(reg1)
Bivariate Regression: Wage on Experience
reg2 <- lm(wage ~ exper, data = wage1)
summary(reg2)
Multivariate Regression: Wage on Education, Experience, and Tenure
reg3 <- lm(wage ~ educ + exper + tenure, data = wage1)
summary(reg3)
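To write up these models you typically need individual estimates rather than the full printout. A minimal sketch, assuming the wage1 data as above, that pulls the coefficient on educ, its 95% confidence interval, and the R-squared out of the first model; the names b_educ, ci_educ, and r2 are only for illustration:

```r
library(wooldridge)
data("wage1")

reg1 <- lm(wage ~ educ, data = wage1)

# Estimated change in wage associated with one more year of education
b_educ <- coef(reg1)["educ"]

# 95% confidence interval for that slope (a 1 x 2 matrix: lower, upper)
ci_educ <- confint(reg1, "educ", level = 0.95)

# Share of the variation in wage explained by the model
r2 <- summary(reg1)$r.squared

round(c(estimate = unname(b_educ), lower = ci_educ[1],
        upper = ci_educ[2], r.squared = r2), 3)
```

The same calls work on reg2 and reg3; for the multivariate model, each coefficient is read as the association holding the other regressors fixed.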
Regression Table
stargazer(reg1, reg2, reg3, title = "Regression Results", type = "text",
          dep.var.labels = "Wage",
          covariate.labels = c("Education", "Experience", "Tenure"))
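If the document may be knitted to more than one format, the table type can be chosen at knit time instead of being hard-coded. A sketch of one way to do this, assuming knitr's is_html_output() and is_latex_output() helpers and a chunk with results = 'asis' for the HTML and LaTeX cases; table_type is an illustrative name:

```r
library(wooldridge)
library(stargazer)
data("wage1")

reg1 <- lm(wage ~ educ, data = wage1)
reg2 <- lm(wage ~ exper, data = wage1)
reg3 <- lm(wage ~ educ + exper + tenure, data = wage1)

# Pick the stargazer output type from the format being knitted;
# outside of knitting both checks are FALSE, so plain text is used
table_type <- if (knitr::is_html_output()) {
  "html"
} else if (knitr::is_latex_output()) {
  "latex"
} else {
  "text"
}

stargazer(reg1, reg2, reg3, title = "Regression Results", type = table_type,
          dep.var.labels = "Wage",
          covariate.labels = c("Education", "Experience", "Tenure"))
```

With this approach, knitting to Word falls through to plain text, which is a conservative default rather than a formatted table.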
Conclusion
R Code Explanation and Important Considerations
• R Markdown/Quarto Setup:
– The YAML header (the part between the ---) sets the title, author, date, and output format of
the document.
– knitr::opts_chunk$set(echo = TRUE) ensures that R code is shown in the output document.
• Loading Packages and Data:
– library(wooldridge) loads the wooldridge package.
– library(tidyverse) loads the tidyverse package, which includes ggplot2 for plotting.
– library(stargazer) loads the stargazer package for creating regression tables.
– data("wage1") loads the wage1 dataset.
• Data Overview:
– head(), summary(), and str() provide initial information about the data.
• Descriptive Statistics:
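– summary(wage1 %>% select(wage, educ, exper, tenure)) calculates descriptive statistics (minimum,
quartiles, median, mean, and maximum) for the selected variables. The %>% is the magrittr pipe
operator, made available by the tidyverse; it passes wage1 to select() as its first argument, which
makes the code more readable.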
• Data Visualization:
– ggplot2 is used to create the histogram and scatterplot. It’s part of the tidyverse.
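– In the histogram, aes(y = ..count..) makes the y-axis show the number of observations in each bin
(the frequency). Since ggplot2 3.4.0 this notation is deprecated in favor of aes(y = after_stat(count)).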
– In the scatterplot, geom_smooth(method = "lm", se = FALSE, color = "red") adds a linear
regression line.
• Regression Analysis:
– lm(wage ~ educ, data = wage1) performs a linear regression of wage on educ.
– summary(reg1) displays the regression results (coefficients, standard errors, t-statistics, p-values,
R-squared); the same call is used for reg2 and reg3.
– Each model’s output should be interpreted in the text next to it (sign, size, and statistical
significance of the coefficients). This is crucial!
• Regression Table:
– stargazer() creates a formatted regression table. The type = "text" argument used here gives
plain text output. You can change it to "html" for display in a web browser or an HTML document,
or to "latex" for LaTeX output (if you’re knitting to PDF); for those two types, set the chunk
option results = 'asis' so the table markup is rendered rather than printed verbatim.
covariate.labels relabels the variables in the table.
• Inline R Code:
– I’ve used inline R code (e.g., `r mean(wage1$wage)`, written between backticks) to insert calculated
values directly into the text. This makes the report dynamic and ensures that the numbers are
consistent with the analysis.
• Interpretation: I’ve provided detailed interpretations of the statistics and regression results. This is
essential for the assignment.
• Reproducibility: The R Markdown/Quarto document is reproducible because it contains both the
code and the narrative. If you run the document with the same data and package versions, you’ll get
the same results.
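For the inline R code mentioned above, the values are usually computed once in a chunk and then referenced in the text. A minimal sketch, assuming the wage1 data; the names mean_wage and n_obs are only for illustration:

```r
library(wooldridge)
data("wage1")

# Values to reference from the text with inline R code
mean_wage <- round(mean(wage1$wage), 2)
n_obs <- nrow(wage1)

# The same sentence built in the console, to check the wording
sprintf("The average wage across %d workers is %.2f.", n_obs, mean_wage)
```

In the document text, the two values would then be written as `r n_obs` and `r mean_wage`, so they update automatically if the data or the rounding changes.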
1. Save: Save the code as an R Markdown file (e.g., assignment.Rmd) or a Quarto file (assignment.qmd).
2. Install Packages: Make sure you have the necessary packages installed:
install.packages(c("wooldridge", "tidyverse", "stargazer", "knitr"))
3. Run the Document: In RStudio, open the R Markdown/Quarto file and click the “Knit” button
(“Render” for a Quarto file), or call rmarkdown::render() or quarto::quarto_render() in the
console, to generate the output document (PDF, Word, or HTML).
4. Adapt:
• Dataset: If you want to use a different dataset, change the data("wage1") line and adjust the
variable names in the code accordingly. Use data(package = "wooldridge") to see a list of the
datasets.
• Variables: Select different variables for your descriptive statistics, visualizations, and regressions.
• Interpretations: Modify the interpretations to match your chosen dataset and variables.
• Output Format: Change the output format in the YAML header if you want a different one
(e.g., output: word_document in R Markdown, or format: docx in Quarto).
• Title/Author/Date: Update the title, author, and date.
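Steps 2 and 3 can also be run from the console. A sketch, assuming the file was saved as assignment.Rmd in the working directory; missing_packages is a hypothetical helper, not part of any package:

```r
required <- c("wooldridge", "tidyverse", "stargazer", "knitr")

# Return the packages from pkgs that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

# Install only what is missing, and only in an interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Render only if the file exists (the file name is an assumption)
if (file.exists("assignment.Rmd")) {
  rmarkdown::render("assignment.Rmd")
}
```

Checking for missing packages first avoids reinstalling everything each time the script is run.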
This comprehensive example should give you a very strong starting point for your assignment! Remember
to adapt it carefully to your chosen dataset and provide thorough interpretations.