Problem Set

EC331 - Introduction to Econometrics

Homework 2
Please submit the problem set to [email protected] no later than 17:00 on
Wednesday, 2nd December. Failure to do so will result in a 50% reduction in your assignment
grade. To gain full credit, please attempt all parts of all questions to the best of your ability. You
can work in groups of up to 6 people. Each group should return only 1 assignment with all group
members’ names written on it. Good luck!

1. The first question aims to help you get familiar with a data set. On Moodle, you
can find the Titanic data. Please use R to answer the questions and turn in your R code
together with your answers.

a) describe the data set. How many observations and how many variables does it have?

b) summarize the variable age. What are the mean and standard deviation of this
variable? What are the min and the max?

c) summarize the variable survived. What is the mean of this variable? Why does it
have more observations than age? (Hint: try to list the variable if you are puzzled by
the last question.)

d) tabulate the variable pclass. What fraction of passengers were travelling in each
class?

e) Cross-tabulate class and survived (tab pclass survived). Which type of
passengers had better chances of surviving?

f) Generate a new variable for passengers younger than 18 (you might call it children). Note
that we do not observe the age for all passengers, therefore you need to be careful when
generating this variable (i.e. dealing with missing values). What fraction of passengers
were children? What can you say about the survival rates of children compared to
older passengers?

h) What proportion of male and female passengers survived in each class? (tab pclass
gender, summarize(survived))
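
The steps in this question can be sketched in base R as follows. This is only an illustrative outline, not a model answer: the data frame below is a small invented stand-in for the Titanic file on Moodle, and the column names (pclass, survived, age, gender) are assumed from the question text. The main point is the handling of missing ages: in R a comparison with NA returns NA, so the children indicator stays missing where age is missing.

```r
# Toy stand-in for the Titanic data; every value here is invented for illustration.
titanic <- data.frame(
  pclass   = c(1, 1, 2, 2, 3, 3, 3, 3),
  survived = c(1, 1, 1, 0, 0, 0, 1, 0),
  age      = c(29, NA, 8, 40, 17, NA, 25, 60),
  gender   = c("female", "male", "female", "male",
               "male", "female", "female", "male")
)

# a) Number of observations (rows) and variables (columns).
dim(titanic)
str(titanic)

# b), c) Summaries. na.rm = TRUE drops missing ages before computing.
mean(titanic$age, na.rm = TRUE)
sd(titanic$age, na.rm = TRUE)
range(titanic$age, na.rm = TRUE)
sum(!is.na(titanic$age))    # number of non-missing ages
mean(titanic$survived)

# d) Fraction of passengers in each class.
prop.table(table(titanic$pclass))

# e) Cross-tabulation with row proportions (survival rate within each class).
prop.table(table(titanic$pclass, titanic$survived), margin = 1)

# f) Children indicator. age < 18 is NA when age is NA, so missing stays missing.
titanic$children <- as.integer(titanic$age < 18)
mean(titanic$children, na.rm = TRUE)               # fraction among known ages
tapply(titanic$survived, titanic$children, mean)   # survival rate by group

# h) Survival rate by class and gender.
tapply(titanic$survived, list(titanic$pclass, titanic$gender), mean)
```

Reporting the fraction of children among passengers with a known age, rather than among all passengers, is one possible choice; whichever denominator is used should be stated in the answer.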

2. For this question we will use a data set from a randomized experiment conducted by
Marianne Bertrand (University of Chicago) and Sendhil Mullainathan (Harvard), who sent 4,870
fictitious resumes out to employers in response to job adverts in Boston and Chicago in 2001.
The resumes differ in various attributes including the names of the applicants, and different
resumes were randomly allocated to job openings. Some of the names are distinctly
white-sounding and some distinctly black-sounding. The researchers collecting these data were
interested to learn whether black-sounding names obtain fewer callbacks for interview than
white-sounding names.
Download the data set bm.xls from Moodle.

a) The data set contains a dummy variable (0-1 variable) for whether the applicant has
computer skills (computerskills). Tabulate this variable by black to find the cross-
tabulation of computerskills and race, and display the percentages of computer skills in
each race group. Do computer skills look balanced across race groups?
b) Another way to calculate the fraction of black and white resumes with computer skills
is simply to compute the mean in these subgroups. Carry out a t-test for whether mean
computer skills are the same for blacks and whites.
Report
1. the mean of the variable for whites.
2. the mean of the variable for blacks.
3. the difference in the means.
4. the t-statistic.
5. the p-value for the null hypothesis that the two means are the same.
Do you find any evidence that this variable differs for whites and blacks in the resume
data?
c) The data set contains a variable education, which takes on four values (high school
dropouts – “HSD”; high school graduates – “HGS”, some college – “some col”, and
college degree – “col +”). It also contains a category for resumes not reporting any
education. Use the education variable to create a new dummy for resumes indicating
some college or more (i.e., those in the some college category plus those in the college
and more category). What fraction of respondents has at least some college education?
Carry out a t-test for the equality of means of your some college or more variable. Does
education look balanced across race groups in the resumes?
d) What do you make of the overall results on resume characteristics? Why do we care
about whether these variables look similar across race groups?
e) The variable of interest in the data set is the variable call, which indicates a call back
for interview. Carry out a t-test for the equality of the call back rates by race. Do you
find differences in call back rates by race?
f) What do you conclude from the results of the Bertrand and Mullainathan experiment?
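
A balance check of this kind might look as follows in base R. This is a hedged sketch on invented data: the data frame bm below is a toy stand-in for bm.xls, the education labels are taken from the question text and may not match the actual coding in the file, and somecol is a hypothetical name for the new dummy. Note that t.test() in R uses Welch's unequal-variance test by default; var.equal = TRUE gives the pooled-variance version.

```r
# Toy stand-in for the resume data; every value here is invented for illustration.
bm <- data.frame(
  black          = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  computerskills = c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0),
  education      = c("col +", "some col", "HSD", "col +", "HSD",
                     "col +", "some col", "col +", "HSD", "some col"),
  call           = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
)

# a) Row percentages: share with computer skills within each race group.
round(100 * prop.table(table(bm$black, bm$computerskills), margin = 1), 1)

# b) Two-sample t-test for equal means across race groups (Welch by default).
tt <- t.test(computerskills ~ black, data = bm)
tt$estimate                 # group means (black = 0, then black = 1)
unname(diff(tt$estimate))   # mean for black = 1 minus mean for black = 0
tt$statistic                # t-statistic
tt$p.value                  # two-sided p-value

# c) Dummy for some college or more. Any other category, including resumes
#    that report no education, is coded 0 here; that is a modelling choice.
bm$somecol <- as.integer(bm$education %in% c("some col", "col +"))
mean(bm$somecol)
t.test(somecol ~ black, data = bm)

# e) The same test applied to the outcome of interest.
t.test(call ~ black, data = bm)
```

With only ten invented rows the p-values carry no meaning; the sketch only shows where the five requested numbers can be read off the returned object.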

3. In the previous question, we looked at the experimental Bertrand and Mullainathan resume
data. For this question download the data set cps.dta, which comes from the response to
the monthly US Current Population Survey (CPS) in 2001, a large labour market survey.
This data set contains data on 8,891 individuals living in Boston and Chicago. We want to
use these data to compare the skills of real live blacks and whites, and their employment
outcomes and see how they differ from the resume findings in question 2.

a) The data set contains a variable education, which takes on four values (high school
dropouts, high school graduates, some college, and college degree and more). Use the
education variable to create a new dummy variable indicating some college or more (i.e.,
those in the some college category plus those in the college and more category). What
fraction of respondents have at least some college education?

b) Carry out a t-test for whether the mean of your some college or more variable is the
same for blacks and whites. You can do this with the ttest command. E.g., if your
variable is called somecol, you would type,

Report
1. the mean of the variable for whites.
2. the mean of the variable for blacks.
3. the difference in the means.
4. the t-statistic.
5. the p-value for the null hypothesis that the two means are the same.

Do you find any evidence that this variable differs for whites and blacks in the CPS data?

c) Calculate the t-statistic for the difference in means for the years of experience variable
(yearsexp). Do you find evidence that this variable differs significantly for whites and
blacks?

d) Discuss your results from (b) and (c). Why do your conclusions for the education and
experience variable differ? Why do we care about whether these variables look similar
by race?

e) Calculate the t-statistic for the difference in means for whether the individual has a job
(employed). Do you find any evidence that this variable differs significantly for whites
and blacks?

f) In light of your results from (d) and (e), what do you think we can conclude about
racial discrimination in employment from the CPS data? Why might we compare the
call back results from the resume data to the employment results in the CPS data to
draw conclusions about discrimination?
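
For parts that ask for the t-statistic itself, it can help to see how it is built from the group means and variances. The sketch below computes the unequal-variance (Welch) statistic by hand and checks it against R's t.test(). The function name welch_t and the two data vectors are hypothetical and invented for illustration; they are not taken from cps.dta.

```r
# Hypothetical helper: Welch two-sample t-statistic, degrees of freedom,
# and two-sided p-value, computed from first principles.
welch_t <- function(x, y) {
  x <- x[!is.na(x)]                 # drop missing values in each group
  y <- y[!is.na(y)]
  vx <- var(x) / length(x)          # squared standard error of mean(x)
  vy <- var(y) / length(y)          # squared standard error of mean(y)
  d  <- mean(x) - mean(y)           # difference in means
  t  <- d / sqrt(vx + vy)           # t-statistic
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p  <- 2 * pt(-abs(t), df)         # two-sided p-value
  c(diff = d, t = t, df = df, p = p)
}

# Invented years-of-experience values for two groups.
yearsexp_white <- c(12, 15, 9, 20, 14, 11)
yearsexp_black <- c(10, 13, 8, 16, 9, 12)

res <- welch_t(yearsexp_white, yearsexp_black)
ref <- t.test(yearsexp_white, yearsexp_black)   # Welch test by default

res
all.equal(unname(res["t"]), unname(ref$statistic))
all.equal(unname(res["p"]), ref$p.value)
```

The statistic is the difference in means divided by its standard error, so a large sample such as the CPS can produce a large t-statistic even when the difference in means is modest.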
