
Problem Set #1

The document contains instructions for completing an econometrics problem set analyzing a dataset of US labor market statistics related to women's wages. 1. The dataset was analyzed using histograms, z-scores and linear regression models to examine the relationship between wages, education, experience and race. 2. A multiple linear regression model found that years of education, experience and being black all significantly impacted wages, with black women earning less on average than non-black women. 3. However, it is noted that other undisclosed factors beyond just race could also contribute to the observed wage differences between black and non-black women.

Uploaded by

cflores48
Copyright © All Rights Reserved

Econometrics: Problem Set

ESADE Fall 2023

Completed by: Cristina Flores, Chiara Sartori, and Cezembre de Lesquen

Data: Dataset is a representative sample of US labor market statistics related to women.


lwage = logarithm of wage
yrs_school = years of schooling
ttl_exp = total work experience
black = dummy variable that takes the value 1 if the woman is black and 0 otherwise

Instructions:
1. Load the file nlsw88.csv into R.
> nlsw88 <- read.csv("nlsw88.csv")   # read the data into a data frame
> View(nlsw88)                       # open a spreadsheet-style viewer to inspect the data

2. Make a histogram of the lwage variable. Do you have any outliers?

To create the histogram we used the following code.


> hist(nlsw88$lwage,
+      xlab = "lwage",
+      main = "Histogram of Logarithm of Wage Distribution",
+      breaks = sqrt(nrow(nlsw88)))  # suggest roughly sqrt(n) bins
> hist(nlsw88$lwage)                 # default number of bins (Sturges' rule), for comparison
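A note on the `breaks` argument: when it is a single number, `hist()` treats it only as a suggestion and chooses "pretty" bin edges near that count. The sketch below uses simulated data. The vector `x` is a hypothetical stand-in for `nlsw88$lwage`, not the real variable. It passes `plot = FALSE` so you can inspect the bins each rule actually produces.

```r
# Hypothetical stand-in for nlsw88$lwage (assumption: 2246 normal draws)
set.seed(1)
x <- rnorm(2246, mean = 1.9, sd = 0.5)

# Default rule (Sturges) versus the square-root rule used above
h_default <- hist(x, plot = FALSE)
h_sqrt    <- hist(x, breaks = sqrt(length(x)), plot = FALSE)

# Number of bins actually used (one fewer than the number of break points)
n_bins_default <- length(h_default$breaks) - 1
n_bins_sqrt    <- length(h_sqrt$breaks) - 1

# Suggested bin counts from the two rules, for comparison
nclass.Sturges(x)          # ceiling(log2(n) + 1)
ceiling(sqrt(length(x)))

cat("Sturges bins:", n_bins_default, " sqrt(n) bins:", n_bins_sqrt, "\n")
```

Inspecting `h_sqrt$counts` this way is a quick check, without plotting, of how many observations fall in the sparse tail bins.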

To check for outliers, we computed z-scores with the following code.


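> z_score <- (nlsw88$lwage - mean(nlsw88$lwage)) / sd(nlsw88$lwage)  # standardize lwage
> outliers <- abs(z_score) > 1.96     # TRUE for values more than 1.96 SDs from the mean
> print(sum(outliers))                # number of flagged values

> ncol(nlsw88) * nrow(nlsw88)         # number of cells in the whole data frame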

These are the two histograms:

Image 1 Image 2

In the histograms, potential outliers appear as small, isolated bars in the tails, far from where most observations are concentrated. We also experimented with the number of bins. Finer bins highlight detail, while coarser bins emphasize the overall shape of the distribution, which can be seen clearly in Image 2.

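The z-score method gave us the following output:

[1] 106

[1] 8984

Therefore, 106 lwage values lie more than 1.96 standard deviations from the mean. Note, however, that 8984 is the number of cells in the whole data frame (rows × columns), not the number of lwage observations. The relevant denominator is nrow(nlsw88). If the file contains only the four variables listed above, this is 8984 / 4 = 2246 observations, so about 4.7% of the lwage values are flagged. A 1.96 cutoff flags roughly 5% of observations even in perfectly normal data, so a stricter cutoff such as |z| > 3 is often used when screening for outliers.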
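The z-score screen above can be sketched on simulated data to show why the cutoff matters. The data are a hypothetical stand-in that is normally distributed by construction, so by assumption it contains no genuine outliers. The 3-SD cutoff is shown only as a common alternative, not as the assignment's rule.

```r
# Hypothetical stand-in for nlsw88$lwage (assumption: normal, n = 2246)
set.seed(42)
x <- rnorm(2246, mean = 1.9, sd = 0.5)

# Standardize: z = (x - mean) / sd, the same as as.vector(scale(x))
z <- (x - mean(x)) / sd(x)

# Count flagged values under two cutoffs
n_flag_196 <- sum(abs(z) > 1.96)
n_flag_3   <- sum(abs(z) > 3)

# The denominator is the number of observations of this one variable
share_196 <- n_flag_196 / length(x)
share_3   <- n_flag_3 / length(x)

cat(sprintf("|z| > 1.96: %d of %d (%.1f%%)\n", n_flag_196, length(x), 100 * share_196))
cat(sprintf("|z| > 3:    %d of %d (%.1f%%)\n", n_flag_3, length(x), 100 * share_3))
```

With normal data, the 1.96 rule flags close to 5% of the values, while the 3-SD rule flags only a handful.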

3. Estimate the following Model 1 using Ordinary Least Squares (OLS):

$$\text{lwage}_i = \beta_0 + \beta_1\,\text{yrs\_school}_i + u_i$$

To estimate Model 1 we used the following code.
> model_1 <- lm(lwage ~ yrs_school, data = nlsw88)
> summary(model_1)
> coefficients(model_1)  # model coefficients
(Intercept)  yrs_school 
 0.65257774  0.09291988 
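OLS has a closed form, beta_hat = (X'X)^(-1) X'y, so you can reproduce the `lm()` estimates by hand. The minimal sketch below uses simulated data with column names that mirror `lwage` and `yrs_school`. The data-generating values 0.65 and 0.09 are assumptions, chosen only to resemble the estimates above.

```r
# Hypothetical simulated data (not the nlsw88 file)
set.seed(123)
n <- 500
yrs_school <- sample(8:18, n, replace = TRUE)
lwage <- 0.65 + 0.09 * yrs_school + rnorm(n, sd = 0.5)
dat <- data.frame(lwage, yrs_school)

# OLS via lm()
fit <- lm(lwage ~ yrs_school, data = dat)

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
X <- cbind(1, dat$yrs_school)          # design matrix with an intercept column
y <- dat$lwage
beta_hat <- solve(crossprod(X), crossprod(X, y))

print(cbind(lm = coef(fit), normal_eq = as.vector(beta_hat)))
```

`solve(A, b)` is used instead of inverting `X'X` explicitly, which is the numerically preferred way to solve the normal equations.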

4. What is your OLS point estimate of beta_1 hat? Construct a 99% confidence interval for beta_1.

To construct the 99% confidence interval for beta_1 hat we used the following code.
> conf_interval<-confint(model_1,level = 0.99)
> print(conf_interval)

The 0.5 % and 99.5 % columns give the lower and upper bounds of the 99% confidence interval for each coefficient. The interval for beta_1 is in the yrs_school row.

                 0.5 %    99.5 %
(Intercept) 0.50364239 0.8015131
yrs_school  0.08174972 0.1040900

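Therefore, the OLS point estimate is beta_1 hat = 0.09292. One additional year of schooling is associated with a log wage about 0.093 higher, or roughly 9.3% higher wages. The 99% confidence interval for beta_1 is

0.08174972 <= beta_1 <= 0.1040900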
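For an `lm` fit, `confint()` computes estimate ± t critical value × standard error. It takes the t quantile at (1 + level) / 2 and uses the model's residual degrees of freedom. The sketch below reproduces it by hand on simulated data, with hypothetical values rather than the nlsw88 file.

```r
# Hypothetical simulated data
set.seed(123)
n <- 500
yrs_school <- sample(8:18, n, replace = TRUE)
lwage <- 0.65 + 0.09 * yrs_school + rnorm(n, sd = 0.5)
fit <- lm(lwage ~ yrs_school)

# 99% CI: beta_hat +/- t_{0.995, n - k} * se(beta_hat)
b  <- coef(summary(fit))["yrs_school", "Estimate"]
se <- coef(summary(fit))["yrs_school", "Std. Error"]
t_crit <- qt(0.995, df = df.residual(fit))
ci_manual <- c(b - t_crit * se, b + t_crit * se)

# Same interval from confint()
ci_confint <- confint(fit, "yrs_school", level = 0.99)
print(rbind(manual = ci_manual, confint = as.vector(ci_confint)))
```

As a sanity check on the reported interval, its midpoint, (0.08174972 + 0.1040900) / 2, is about 0.09292, the point estimate.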

5. Compute the covariance between the lwage and yrs_school variables. Compute the variance of the yrs_school variable. Estimate the beta_1 hat coefficient using the statistical measures you have computed in this step.

To compute the covariance we used the following code.

> covariance <- cov(nlsw88$lwage, nlsw88$yrs_school)
The covariance between lwage and yrs_school is 0.6043267.

To calculate the variance of the variable yrs_school we used the following code.
> var(nlsw88$yrs_school)
The variance of the yrs_school variable is 6.50374.

To estimate beta_1 hat from these statistics, we divided the covariance by the variance:
> cov(nlsw88$lwage, nlsw88$yrs_school) / var(nlsw88$yrs_school)

The estimated beta_1 hat coefficient is 0.09291988, identical to the OLS estimate from step 3.
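You can check the moment-based formula from step 5 on simulated data. The n - 1 denominators in `cov()` and `var()` cancel, so their ratio equals the OLS slope exactly, and the intercept then follows from the means. The simulated data are hypothetical. The last lines only redo the division with the values reported above.

```r
# Hypothetical simulated data
set.seed(7)
n <- 500
yrs_school <- sample(8:18, n, replace = TRUE)
lwage <- 0.65 + 0.09 * yrs_school + rnorm(n, sd = 0.5)

# Slope from sample moments: beta1_hat = Cov(x, y) / Var(x)
beta1_moments <- cov(lwage, yrs_school) / var(yrs_school)
# Intercept from the means: beta0_hat = mean(y) - beta1_hat * mean(x)
beta0_moments <- mean(lwage) - beta1_moments * mean(yrs_school)

fit <- lm(lwage ~ yrs_school)
print(c(beta0_moments, beta1_moments))
print(coef(fit))

# Same division with the covariance and variance reported in step 5
beta1_reported <- 0.6043267 / 6.50374
print(beta1_reported)
```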

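6. “For any simple linear regression, the model forecast for mean value of the regressor is the mean value of y variable”. Is this statement TRUE or FALSE? Explain briefly.
We know that the simple linear regression model is

$$y_i = \beta_0 + \beta_1 x_i + u_i,$$

and that its OLS fitted values are $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$. The model forecast for the mean value of the regressor is therefore

$$\hat{y}(\bar{x}) = \hat\beta_0 + \hat\beta_1 \bar{x}.$$

To determine whether this matches the mean value of y, substitute the OLS expression for the intercept, $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$:

$$\hat{y}(\bar{x}) = \bar{y} - \hat\beta_1 \bar{x} + \hat\beta_1 \bar{x} = \bar{y}.$$

The intercept formula comes from the OLS first-order condition for $\hat\beta_0$, which forces the residuals to sum to zero, $\sum_i \hat{u}_i = 0$. This is the sample counterpart of the assumption that the error term u has a mean of zero, E[u] = 0. In other words, the OLS regression line always passes through the point $(\bar{x}, \bar{y})$.

Therefore, the statement is TRUE for any simple linear regression estimated by OLS with an intercept.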
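Here is a quick numerical check of the result in step 6 on simulated data with hypothetical values. When the model has an intercept, the forecast at the mean of x equals the mean of y. The second fit drops the intercept to show that the property then generally fails.

```r
# Hypothetical simulated data
set.seed(11)
n <- 300
x <- runif(n, 8, 18)
y <- 0.65 + 0.09 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

# Forecast at the mean of the regressor equals mean(y) when there is an intercept
y_hat_at_xbar <- predict(fit, newdata = data.frame(x = mean(x)))
print(c(forecast = unname(y_hat_at_xbar), mean_y = mean(y)))

# Counter-example: regression through the origin (no intercept)
fit_no_int <- lm(y ~ x - 1)
gap_no_int <- unname(predict(fit_no_int, newdata = data.frame(x = mean(x)))) - mean(y)
print(gap_no_int)
```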


7. Compute residuals and report the average of the residuals.
To compute the residuals and report the average of the residuals we used this code.
> resid <- residuals(model_1) # residuals
> avg_residual<- sum(resid)/nrow(nlsw88)
> print(avg_residual)

The average of the residuals is 5.535513e-18. This is zero up to floating-point rounding error, as expected for an OLS regression with an intercept.
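The same normal equations imply two properties: OLS residuals average to zero, and in the sample they are uncorrelated with the regressor. The sketch below shows both on simulated data with hypothetical values. Each holds up to floating-point error.

```r
# Hypothetical simulated data
set.seed(5)
n <- 400
x <- sample(8:18, n, replace = TRUE)
y <- 0.65 + 0.09 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)
res <- residuals(fit)

mean(res)        # zero up to floating-point rounding
sum(res * x)     # residuals are orthogonal to the regressor
```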

8. Estimate a multiple linear regression (Model 2): regress the logarithm of wage on years of schooling (yrs_school), total work experience (ttl_exp), and the dummy variable black:

$$\text{lwage}_i = \beta_0 + \beta_1\,\text{yrs\_school}_i + \beta_2\,\text{ttl\_exp}_i + \beta_3\,\text{black}_i + u_i$$

To estimate Model 2 we used the following code.
> model_2 <- lm(lwage ~ yrs_school + ttl_exp + black, data = nlsw88)
> summary(model_2)

The estimated multiple linear regression is:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.560314   0.101986 -25.105  < 2e-16 ***
yrs_school   0.132494   0.007284  18.189  < 2e-16 ***
ttl_exp      0.069832   0.003984  17.527  < 2e-16 ***
black       -0.188827   0.041584  -4.541  5.9e-06 ***

9. Interpret all the regression coefficients.

In the multiple linear regression, lwage is predicted from years of schooling (yrs_school), total work experience (ttl_exp), and a dummy variable indicating whether a woman is black (black). The regression coefficients are interpreted as follows:

● Intercept: The expected logarithm of wage is 0.397540 when all independent variables equal 0, that is, for a non-black woman with no schooling and no work experience.
● yrs_school: Holding all other variables constant, one additional year of schooling raises the expected logarithm of wage by 0.076128 on average, or roughly 7.6% higher wages.
● ttl_exp: Holding all other variables constant, a one-unit increase in total work experience raises the expected logarithm of wage by 0.040124 on average, or roughly 4.0% higher wages.
● black: Holding all other variables constant, the expected logarithm of wage of a black woman (black = 1) is 0.108496 lower than that of a non-black woman. Her wage is therefore roughly 10% lower on average.
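In a log-wage model, 100 × beta is only an approximation to the percentage change in wages. The usual exact figure is 100 × (exp(beta) - 1), and the difference matters more for larger coefficients such as the dummy. The sketch below applies both formulas to the coefficients quoted in the bullets above.

```r
# Log-point coefficients quoted in the interpretation above
b <- c(yrs_school = 0.076128, ttl_exp = 0.040124, black = -0.108496)

approx_pct <- 100 * b              # approximation: 100 * beta
exact_pct  <- 100 * (exp(b) - 1)   # conventional exact effect: 100 * (exp(beta) - 1)

print(round(cbind(approx_pct, exact_pct), 2))
```

For black, the exact effect is about -10.3%, compared with -10.8% from the approximation.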

All four coefficients are statistically significant. Their t-statistics are far from zero and their p-values are tiny (at most 5.9e-06). Consequently, for each coefficient we can reject, even at the 1% level, the null hypothesis that its true value is zero.
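You can reproduce the t value and p-value columns of the regression output from the estimates and standard errors. The t statistic is estimate / standard error, and the two-sided p-value is 2 × P(T <= -|t|) with the residual degrees of freedom. The sketch fits a model to simulated data, where `school`, `exper` and `d` are hypothetical stand-ins. It then checks the t value for black reported in the table.

```r
# Hypothetical simulated data
set.seed(99)
n <- 500
school <- sample(8:18, n, replace = TRUE)   # hypothetical years of schooling
exper  <- runif(n, 0, 25)                   # hypothetical work experience
d      <- rbinom(n, 1, 0.25)                # hypothetical 0/1 dummy
y <- 0.4 + 0.08 * school + 0.04 * exper - 0.1 * d + rnorm(n, sd = 0.5)
fit <- lm(y ~ school + exper + d)

tab <- coef(summary(fit))                   # Estimate, Std. Error, t value, Pr(>|t|)
t_manual <- tab[, "Estimate"] / tab[, "Std. Error"]
p_manual <- 2 * pt(-abs(t_manual), df = df.residual(fit))
print(cbind(t_manual, p_manual))

# t value for black from the table: estimate / standard error
t_black <- -0.188827 / 0.041584
print(round(t_black, 3))
```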

10. Could you claim that there is a racial discrimination based on women race?
In the multiple linear regression above, the dummy variable black has a negative coefficient of -0.108496. This implies that, holding years of schooling and total work experience constant, black women have on average a lower logarithm of wages than non-black women. The coefficient is statistically significant even at a very low significance level. It therefore provides initial evidence that black women earn less than non-black women with the same schooling and experience.

However, factors other than race may contribute to this observed difference. It would be informative to add further explanatory variables to the model and check whether the black dummy remains statistically significant. Candidate controls include occupation, the type or quality of education (years of schooling are already included), and the cost of living at the place of employment. This would help determine whether the lower average wage of black women is driven mainly by race or whether other variables play a significant role.

A more conclusive assessment of the impact of race on earnings requires a model that includes variables beyond years of education and total experience. Only if such a model still yields a statistically significant coefficient for the black dummy does the evidence become stronger.
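The check described above asks whether the dummy's coefficient survives once more controls are added. You can sketch it by fitting nested specifications and comparing the dummy's estimate and p-value across them. The data below are simulated. The extra control `z` is hypothetical and stands in for any of the additional factors listed above. By construction, the dummy here has no direct effect, so the example only shows the mechanics of omitted-variable bias. It says nothing about the real data.

```r
# Hypothetical simulated data
set.seed(2023)
n <- 2000
d <- rbinom(n, 1, 0.3)                          # hypothetical 0/1 group dummy
z <- 0.8 * d + rnorm(n)                         # hypothetical control, correlated with d
school <- sample(8:18, n, replace = TRUE)       # hypothetical years of schooling
# By construction, d has no direct effect on y; it matters only through z
y <- 0.5 + 0.08 * school + 0.2 * z + rnorm(n, sd = 0.4)

short <- lm(y ~ school + d)        # z omitted
long  <- lm(y ~ school + d + z)    # z included

# Estimate, std. error, t value and p-value of the dummy in each specification
est <- rbind(short = coef(summary(short))["d", ],
             long  = coef(summary(long))["d", ])
print(round(est, 4))

# F-test for nested models: does adding z improve the fit?
print(anova(short, long))
```

In the short model the dummy absorbs part of the effect of the omitted control. Once `z` is included, its coefficient moves toward zero. With real data, the direction and size of any change are an empirical question.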
