GMU Econ535-Applied Econometrics Problem Set3 (PS3) Solutions Spring 2024

The document outlines solutions for Problem Set 3 in an Applied Econometrics course, focusing on building a small dataset, exploring income patterns, and analyzing regression results. It includes tasks such as generating summary statistics, running regressions with robust standard errors, and interpreting coefficients related to income determinants. The document emphasizes the importance of understanding omitted variable bias and the limitations of drawing causal conclusions from cross-sectional data.

Uploaded by

AmRonPaulian

Econ 535, Applied Econometrics
Problem Set 3 - SOLUTIONS
Due 7pm, March 25

Part 1: Building a Small Data Set and Revisiting Omitted Variable Bias

There are multiple sources of data available online (e.g.
https://ourworldindata.org/, https://www.worldvaluessurvey.org/wvs.jsp,
https://data.worldbank.org/, etc.). In this part of the problem set you will find data
yourself on one of these sites, or on some other site. The total number of
observations can vary greatly depending on your choice of data, but should be no
less than N=50. You will build a small data set that should include at least three
variables, selected as follows:

• Start by identifying two variables where you would hypothesize that one is
causally related to the other: one outcome variable, Y, and one independent
variable, X1.
• Then identify a second independent variable, X2. X2 should also be such that
your hypothesis is that it is related to the outcome variable Y.

Once you have put together your data set, start by getting to know your data set by
reporting summary statistics for your (at least three) variables: min, max, mean,
standard deviation, etc.

Thereafter, run three regressions (using robust standard errors): one where you
regress Y on X1, one where you regress Y on X2, and one where you regress Y on
both X1 and X2. Report the results either using the Stata output, or in a table
similar to what you would see in an academic paper. Interpret all coefficients and
discuss their statistical significance. Observe how the coefficients on X1 and X2
change (or not) between the binary and the multiple regressions, and discuss the
change in terms of the existence (or not) of omitted variable bias.

(5 points) Answers will vary depending on the data set chosen.
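
As a rough illustration of the omitted variable bias comparison asked for above, the sketch below uses a tiny made-up data set (the values of x1 and x2 and the "true" coefficients are assumptions chosen for illustration, not data from the problem set). It shows that the slope from regressing Y on X1 alone equals the coefficient on X1 in the full model plus the coefficient on X2 times the slope from regressing X2 on X1.

```python
# Omitted variable bias identity on made-up data (all numbers are illustrative).

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """OLS slope from a regression of y on x (with a constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]   # positively correlated with x1
b1, b2 = 2.0, 3.0                      # hypothetical "true" coefficients
y = [1.0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

short = slope(x1, y)     # coefficient on x1 when x2 is omitted
delta = slope(x1, x2)    # slope from regressing x2 on x1

print(f"short regression slope: {short:.4f}")
print(f"b1 + b2 * delta:        {b1 + b2 * delta:.4f}")
```

Because x2 is positively correlated with x1 and has a positive coefficient, the short regression overstates the coefficient on x1 in this example; with a negative correlation or a negative coefficient the bias would go the other way.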

Part 2: Explaining Income Patterns

In this assignment, you will explore the determinants of income using data from
the National Longitudinal Survey of Youth. Download the file PS3_Data.dta, which
contains a sample of about 1000 respondents. This data set began with roughly
12,000 American teenagers about 20 years ago and has been following them since.
Respondents are classified as white, black or Hispanic.

1. First, get to know your data set.

a) (1 point) First show summary statistics by using the command “sum”
(summarize). What fraction of the sample is female? What is the average
age? What are the minimum, maximum and average monthly incomes in the
sample? What fraction is black and what fraction is Hispanic?

. sum

    Variable |        Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
         age |        995     20.2603    1.576226         16         23
       black |        995    .0854271    .2796568          0          1
    hispanic |        995    .0572864     .232506          0          1
      income |        995    887.7364    344.2031   258.9612   2618.175
      single |        995    .7366834    .4406542          0          1
-------------+---------------------------------------------------------
     married |        995    .2633166    .4406542          0          1
    yrs_educ |        995    11.95678     1.50557          4         17
       urban |        995     .801005     .399445          0          1
        male |        995    .5366834    .4989033          0          1

46.3% of the sample is female (1 - 0.5367, where 0.5367 is the mean of male). The
average age is 20.26, and the minimum, maximum and average monthly incomes in the
sample are $258.96, $2,618.18 and $887.74, respectively. 8.5% of the sample is
black, and 5.7% of the sample is Hispanic.

b) (1 point) Now generate a new variable called “minority” that is equal to 1 if
a person is black or Hispanic, and 0 otherwise. What fraction of the sample
is made up of whites?

. gen minority=black==1|hispanic==1

. sum minority

    Variable |        Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
    minority |        995    .1427136    .3499564          0          1

85.7% of the sample is made up of whites.
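
The shares reported in question 1 follow directly from the means of the 0/1 dummy variables. A minimal sketch of that arithmetic, with the means copied from the `sum` output (the variable names in the code are illustrative):

```python
# Means of 0/1 dummies are sample shares (values copied from the `sum` output).
mean_male = 0.5366834
mean_black = 0.0854271
mean_hispanic = 0.0572864
mean_minority = 0.1427136

share_female = 1 - mean_male
share_white = 1 - mean_minority

print(f"female: {share_female:.1%}, white: {share_white:.1%}")

# The black and Hispanic shares add up to the minority share, which is
# consistent with no respondent being coded as both black and Hispanic.
print(abs(mean_black + mean_hispanic - mean_minority) < 1e-6)
```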

2. (2 points) Regress income on the variables male, minority and years of
education. For this and all subsequent regressions, use robust standard errors.
Report your results. Which coefficients are statistically significant?

. reg income male minority yrs_educ, robust

Linear regression                               Number of obs     =        995
                                                F(3, 991)         =      50.50
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1329
                                                Root MSE          =        321

------------------------------------------------------------------------------
             |               Robust
      income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        male |    233.9662   20.32987    11.51   0.000     194.0717   273.8608
    minority |   -33.09229   29.04318    -1.14   0.255    -90.08548    23.9009
    yrs_educ |    44.10299   6.514053     6.77   0.000     31.32007   56.88591
       _cons |    239.5634   79.93118     3.00   0.003     82.70959   396.4172
------------------------------------------------------------------------------

The coefficients on male and yrs_educ are significant at the 1% level (at least). The
coefficient on minority is not statistically significant.

3. (2 points) Now regress income on the variables male, minority, years of
education and married. Which coefficients are significant and at what level? Also,
what is the estimated change in income associated with an additional year of
education for males, and for females?

. reg income male minority yrs_educ married, robust

Linear regression                               Number of obs     =        995
                                                F(4, 990)         =      39.10
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1397
                                                Root MSE          =     319.89

------------------------------------------------------------------------------
             |               Robust
      income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        male |    237.9132    20.4348    11.64   0.000     197.8127   278.0137
    minority |   -24.59293   29.00056    -0.85   0.397    -81.50256    32.3167
    yrs_educ |    46.32538   6.546817     7.08   0.000     33.47815   59.17261
     married |    65.27789   23.16271     2.82   0.005     19.82424   110.7315
       _cons |    192.4708   81.68977     2.36   0.019     32.16577   352.7758
------------------------------------------------------------------------------

Now the coefficients on male, married, and years of education are significant at the
1% level (p = 0.000, 0.005 and 0.000, respectively). The coefficient on “minority”
is not significant at any conventional level (p = 0.397).
The estimated change in monthly income associated with an additional year of
education is $46.33 for both men and women, because this specification restricts
the education coefficient to be the same for both groups. (We cannot know from this
regression whether the return to a year of schooling differs between men and
women. To find out, we would have to create an interaction variable (gen male_educ
= male * yrs_educ) and add it to the list of explanatory variables, but you are not
asked to do that here.)
4. (3 points) Now generate a new variable that is equal to the interaction of the
dummies for “male” and “married” (call it male_married). Run the same regression
as in 3, but include your new variable among the independent variables. Interpret
the coefficient on married. Interpret the coefficient on male. Interpret the
coefficient on the interaction of male and married. Which of these three
coefficients are statistically significant, and at what level?

. gen male_married=male*married

. reg income male minority yrs_educ married male_married, robust

Linear regression                               Number of obs     =        995
                                                F(5, 989)         =      32.03
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1459
                                                Root MSE          =      318.9

------------------------------------------------------------------------------
             |               Robust
      income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        male |    205.1602   23.37881     8.78   0.000     159.2824   251.0379
    minority |   -26.09316   28.87601    -0.90   0.366    -82.75845   30.57213
    yrs_educ |    46.11461   6.578909     7.01   0.000     33.20438   59.02483
     married |    2.281599   22.58367     0.10   0.920    -42.03582   46.59902
male_married |    122.9917   44.75587     2.75   0.006     35.16428    210.819
       _cons |    213.3017   81.60683     2.61   0.009     53.15929   373.4441
------------------------------------------------------------------------------

The coefficient on married decreases from about 65.3 to 2.3, and is no longer
statistically significant (p = 0.920). This coefficient (2.3) can be interpreted as
the predicted difference in income between married women and single women in our
sample, holding minority status and education constant.
The coefficient on male is 205.2 and can be interpreted as the predicted difference
in income between single men and single women. It is statistically significant at
the 1% level.
The coefficient on male_married is 123.0, and is statistically significant at the
1% level (p = 0.006). It tells us that the predicted change in income going from
single to married is about $123 larger for men than for women: married men are
predicted to earn about $125.3 (2.3 + 123.0) more than single men, while married
women are predicted to earn only about $2.3 more than single women.
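
The interpretation of the three coefficients can be checked by computing the predicted income gap for each gender and marital status combination relative to single women. This is a minimal sketch using the coefficients from the output above; the function name is illustrative, and minority status and education are held fixed, so they drop out of the comparison.

```python
# Coefficients copied from the regression that includes male_married.
b_male = 205.1602
b_married = 2.281599
b_male_married = 122.9917

def gap_vs_single_women(male, married):
    """Predicted income gap relative to single women, holding
    minority status and education constant."""
    return b_male * male + b_married * married + b_male_married * male * married

for male in (0, 1):
    for married in (0, 1):
        label = ("male" if male else "female") + ", " + ("married" if married else "single")
        print(f"{label:16s} {gap_vs_single_women(male, married):8.2f}")

# Marriage "premium" within each gender.
marriage_gap_women = gap_vs_single_women(0, 1) - gap_vs_single_women(0, 0)
marriage_gap_men = gap_vs_single_women(1, 1) - gap_vs_single_women(1, 0)
print(f"women: {marriage_gap_women:.2f}, men: {marriage_gap_men:.2f}")
```

The difference between the two marriage gaps is exactly the coefficient on male_married, which is why that coefficient is read as the extra premium for men.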

5. (2 points) A policy maker comes across your results and notes that married
young people earn more than unmarried young people. She therefore suggests that
it would be good policy to promote marriage among young members of the labor
market. Is this a correct conclusion to draw? Why or why not?

This is probably not a causal relationship. It could, for example, be the case that
young people wait to get married until they have enough income to support a
partner or a family. If this is the case, then the causality would run in the opposite
direction, i.e. higher income would cause marriage and not the other way around.
There are several examples like this that would make for good answers. In general,
because this is simply cross-sectional, observational data, we can’t establish
causality, so it would be irresponsible to encourage policies that don’t have a
demonstrated causal effect on our outcome of interest (here, income).

6. (2 points) Generate a variable called min_educ, the interaction between minority
and educ. Regress income on minority, educ and min_educ. What is the estimated
return to a year of schooling for whites? For minorities? Is there evidence to
suggest that this return is different for whites than for minorities in this sample?

. gen min_educ= minority*yrs_educ

. reg income minority yrs_educ min_educ, robust

Linear regression                               Number of obs     =        995
                                                F(3, 991)         =      10.31
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0213
                                                Root MSE          =     341.03

------------------------------------------------------------------------------
             |               Robust
      income | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    minority |   -18.62319   147.1104    -0.13   0.899    -307.3069   270.0606
    yrs_educ |    31.04021   7.919749     3.92   0.000     15.49881   46.58161
    min_educ |   -1.972122    12.9715    -0.15   0.879    -27.42689   23.48264
       _cons |    522.4759    94.9886     5.50   0.000      336.074   708.8778
------------------------------------------------------------------------------

The estimated “return” to a year of schooling for whites is 31.04, while the
“return” for minorities is 29.07 (= 31.04 - 1.97). However, based on this
regression we cannot reject the null hypothesis that the coefficient on min_educ is
equal to zero (p=0.879). Thus, we don’t have any evidence to suggest that the
education premium differs between whites and minorities in this sample.
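
The two “returns” and the t-statistic on the interaction can be reproduced from the reported output. A minimal sketch, with coefficient and standard error copied from the regression above (the variable names in the code are illustrative):

```python
# Values copied from the regression of income on minority, yrs_educ and min_educ.
b_educ = 31.04021        # slope for whites (minority = 0)
b_min_educ = -1.972122   # difference in slope for minorities
se_min_educ = 12.9715    # robust standard error of the interaction

return_white = b_educ
return_minority = b_educ + b_min_educ
t_stat = b_min_educ / se_min_educ

print(f"return for whites:     {return_white:.2f}")
print(f"return for minorities: {return_minority:.2f}")
print(f"t-statistic on min_educ: {t_stat:.2f}")
```

A t-statistic this close to zero is far inside any conventional critical value, which matches the large p-value in the Stata output.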

Part 3: Explore the determinants of ln(income).

1. (3 points) Continue using the same data. Make a regression table like those
usually presented in academic papers. The dependent variable is always ln(income)
(this requires you to generate a new variable) and all of these regressions should
use robust standard errors. The independent variables of interest are male, black
and years of education; add these sequentially across the 3 models (i.e. your
first model should just be a regression of ln(income) on male, the second should
include male and black as the independent variables and the third should include
all 3 independent variables). Standard errors should be placed below coefficients in
parentheses, along with stars for statistical significance (* p<.1 ** p<.05 ***
p<.01). The R-squared and sample size for each regression should be reported at the
bottom of each column.

. gen lincome=ln(income)

. eststo clear

.
. eststo: reg lincome male, robust

Linear regression Number of obs = 995

F(1, 993) = 107.57
Prob > F = 0.0000
R-squared = 0.0950
Root MSE = .33408

------------------------------------------------------------------------------
| Robust
lincome | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
male | .2169101 .0209135 10.37 0.000 .1758704 .2579498
_cons | 6.608175 .0137173 481.74 0.000 6.581257 6.635093
------------------------------------------------------------------------------
(est1 stored)

.
. eststo: reg lincome male black, robust

Linear regression Number of obs = 995

F(2, 992) = 54.99
Prob > F = 0.0000
R-squared = 0.0991
Root MSE = .3335

------------------------------------------------------------------------------
| Robust
lincome | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
male | .2176794 .0209102 10.41 0.000 .1766461 .2587128
black | -.0799085 .0362088 -2.21 0.028 -.1509632 -.0088538
_cons | 6.614588 .0139254 475.00 0.000 6.587262 6.641915
------------------------------------------------------------------------------
(est2 stored)

. eststo: reg lincome male black yrs_educ, robust

Linear regression Number of obs = 995

F(3, 991) = 49.46
Prob > F = 0.0000
R-squared = 0.1424
Root MSE = .32556

------------------------------------------------------------------------------
| Robust
lincome | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
male | .2428386 .0210778 11.52 0.000 .2014764 .2842009
black | -.0677965 .0348131 -1.95 0.052 -.1361124 .0005195
yrs_educ | .0492704 .0071684 6.87 0.000 .0352034 .0633373
_cons | 6.010936 .0889598 67.57 0.000 5.836365 6.185507
------------------------------------------------------------------------------
(est3 stored)

. esttab, se r2 star(* .1 ** .05 *** .01)

------------------------------------------------------------
                      (1)             (2)             (3)
                  lincome         lincome         lincome
------------------------------------------------------------
male                0.217***        0.218***        0.243***
                  (0.0209)        (0.0209)        (0.0211)

black                             -0.0799**       -0.0678*
                                  (0.0362)        (0.0348)

yrs_educ                                           0.0493***
                                                 (0.00717)

_cons               6.608***        6.615***        6.011***
                  (0.0137)        (0.0139)        (0.0890)
------------------------------------------------------------
N                     995             995             995
R-sq                0.095           0.099           0.142
------------------------------------------------------------
Standard errors in parentheses
* p<.1, ** p<.05, *** p<.01

2. (2 points) Using column 3, interpret each of the three coefficients on male,
black, and educ, and discuss whether each is statistically significant at the 5%
level.
The coefficient on male is equal to 0.243. This implies that on average being male
is associated with about 24.3 percent higher earnings than being female, holding
race and education constant. The coefficient is statistically significant at the 1%
level, and therefore also at the 5% level.
The coefficient on black is equal to -0.0678. This implies that on average being
black is associated with about 6.78 percent lower earnings than being non-black,
holding gender and education constant. The coefficient is statistically significant
at the 10% level but not at the 5% level (p = 0.052).
The coefficient on years of education is equal to 0.0493. This implies that on
average an additional year of education is associated with an increase in earnings
of about 4.93 percent, holding race and gender constant. The coefficient is
statistically significant at the 1% level, and therefore also at the 5% level.
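
Reading a coefficient b in a log-income regression as "100*b percent" is an approximation that works well for small b. The exact percent difference implied by the model is 100*(exp(b) - 1). The sketch below compares the two for the column 3 coefficients; it is an illustration added here, not part of the required answer.

```python
import math

# Column 3 coefficients copied from the regression output.
coefs = {"male": 0.2428386, "black": -0.0677965, "yrs_educ": 0.0492704}

approx_pct = {name: 100 * b for name, b in coefs.items()}
exact_pct = {name: 100 * (math.exp(b) - 1) for name, b in coefs.items()}

for name in coefs:
    print(f"{name:9s} approx {approx_pct[name]:6.2f}%   exact {exact_pct[name]:6.2f}%")
```

The gap between the two readings is largest for male, where the coefficient is furthest from zero; for yrs_educ the approximation is off by only about a tenth of a percentage point.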

3. (3 points) Is the return to education different for blacks and non-blacks? If so, by
how much? Explain in detail how you would answer this question, then answer it
by running a fourth regression.
We want to assess whether the relationship between one independent variable
(education) and the outcome variable (lincome) depends on the level of another
independent variable (black). This is an ideal case for an interaction term.
We first generate an interaction variable that is equal to black*yrs_educ and add
it to the regression from column 3. If the coefficient on this variable is
statistically different from zero, then we could argue that the returns to
education are different for blacks and non-blacks.

. gen black_ed = black*yrs_educ

. reg lincome male black yrs_educ black_ed, robust

Linear regression Number of obs = 995
F(4, 990) = 37.89
Prob > F = 0.0000
R-squared = 0.1431
Root MSE = .32558

------------------------------------------------------------------------------
| Robust
lincome | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
male | .2430082 .0210995 11.52 0.000 .2016033 .2844132
black | -.3642078 .2752017 -1.32 0.186 -.9042534 .1758378
yrs_educ | .0475133 .0074795 6.35 0.000 .0328357 .0621909
black_ed | .0252565 .0232488 1.09 0.278 -.0203661 .0708791
_cons | 6.031893 .092684 65.08 0.000 5.850014 6.213773

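Since the coefficient on black_ed is not statistically significant (p = 0.278), we
fail to reject the null that this coefficient is equal to zero. The point estimates
imply a return of about 4.75 percent per year of education for non-blacks and about
7.28 percent (0.0475 + 0.0253) for blacks, but the difference of about 2.5
percentage points is imprecisely estimated. Thus, we cannot conclude that the
returns to education for blacks and non-blacks are different.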
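
One way to see why the interaction is not significant is to rebuild an approximate 95% confidence interval for it from the coefficient and its standard error. The sketch below uses a normal critical value, which is an assumption made for simplicity: Stata uses the t distribution with 990 degrees of freedom, so its reported interval is very slightly wider.

```python
from statistics import NormalDist

# Values copied from the regression that includes black_ed.
b_educ = 0.0475133
b_black_ed = 0.0252565
se_black_ed = 0.0232488

return_nonblack = b_educ
return_black = b_educ + b_black_ed

# Normal approximation to the 95% critical value (assumption; Stata uses t).
z = NormalDist().inv_cdf(0.975)
ci_low = b_black_ed - z * se_black_ed
ci_high = b_black_ed + z * se_black_ed

print(f"return, non-blacks: {return_nonblack:.4f}")
print(f"return, blacks:     {return_black:.4f}")
print(f"approx. 95% CI for the difference: [{ci_low:.4f}, {ci_high:.4f}]")
print("interval contains zero:", ci_low < 0 < ci_high)
```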

4. (1 point) Why might it make sense to use ln(income) as a dependent variable
rather than income, as you did in part 1?
Using the log of income means that the explanatory variables are associated with
percent changes (or differences) in income, rather than absolute changes in income.
It may make more sense that an additional year of education would be associated
with a 5 percent increase in monthly income rather than a particular dollar amount
(such as $200). The dollar value of an additional year of education can thus vary
by income level.
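
To make the last point concrete, the sketch below applies the same 5 percent return to three hypothetical monthly income levels (the income values are made up for illustration, and 5 percent is a rounded version of the yrs_educ coefficient in the log model).

```python
# A constant percent return implies a dollar change that grows with income.
pct_return = 0.05                      # rounded, for illustration
incomes = (500, 1000, 2000)            # hypothetical monthly incomes in dollars

dollar_change = {income: income * pct_return for income in incomes}

for income, change in dollar_change.items():
    print(f"income ${income:>5}: one more year of education ~ ${change:.2f} more per month")
```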

Part 4: The impact of education on income.

1. (1 point) Continue using the same data. Regress income on education and
education squared (this requires you to generate a new variable). Report your
results.

. gen ed2 = yrs_educ^2

. reg income yrs_educ ed2, robust

Linear regression Number of obs = 995
F(2, 992) = 12.95
Prob > F = 0.0000
R-squared = 0.0196
Root MSE = 341.15

------------------------------------------------------------------------------
| Robust
income | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
yrs_educ | 43.85142 45.62291 0.96 0.337 -45.67708 133.3799
ed2 | -.5104324 2.016529 -0.25 0.800 -4.467584 3.446719
_cons | 437.5441 258.0703 1.70 0.090 -68.88224 943.9704
------------------------------------------------------------------------------

2. (1 point) Can you interpret the coefficient on yrs_educ? Why or why not? If you
can interpret it, do so.
You cannot directly interpret the magnitude of the coefficient on yrs_educ, since we
can’t change yrs_educ while holding ed2 constant. In order to see the relationship
between educ and income, we must look at a particular level of education. We could
create a table using sample education values, the regression constant and the
coefficients on yrs_educ and ed2 to calculate predicted incomes, and then calculate
the changes in income associated with an additional year of education at different
levels of education.
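
The table described above can be sketched as follows, using the unrounded coefficients from the regression output (the function names and the chosen education levels are illustrative). Hand calculations that round the coefficients to two decimals give values that differ by a cent or two.

```python
# Coefficients copied from the regression of income on yrs_educ and ed2.
b0, b1, b2 = 437.5441, 43.85142, -0.5104324

def predicted_income(educ):
    """Predicted monthly income at a given number of years of education."""
    return b0 + b1 * educ + b2 * educ ** 2

def change_from_one_more_year(educ):
    """Predicted change in income going from educ to educ + 1 years."""
    return predicted_income(educ + 1) - predicted_income(educ)

print("educ  predicted income  change from one more year")
for educ in (8, 11, 12, 15, 16):
    print(f"{educ:4d}  {predicted_income(educ):16.2f}  {change_from_one_more_year(educ):10.2f}")

# Education level at which the fitted curve would peak.
turning_point = -b1 / (2 * b2)
print(f"turning point: {turning_point:.1f} years")
```

The fitted curve peaks at roughly 43 years of education, far outside the sample range of 4 to 17 years, so predicted income rises with education throughout the sample, only by a smaller amount at each additional year.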
3. (1 point) What's the predicted difference in income between people with 11 and
12 years of educ?

$(437.54 + 43.85 \times 12 - 0.51 \times 12^2) - (437.54 + 43.85 \times 11 - 0.51 \times 11^2) = 32.12$

4. (1 point) What's the predicted difference in income between people with 15 and
16 years of educ?

$(437.54 + 43.85 \times 16 - 0.51 \times 16^2) - (437.54 + 43.85 \times 15 - 0.51 \times 15^2) = 28.04$

5. (1 point) Are the answers to 3 and 4 the same? Why or why not? Discuss.
They are different because the change in income associated with a change in
education now depends on the level of education. Since the coefficient on
education is positive and the coefficient on the squared term is negative, we know
that predicted income is increasing in education at a decreasing rate. Therefore,
in this sample, going from 15 to 16 years of education is associated with a smaller
increase in monthly income ($28.04) than going from 11 to 12 years of education
($32.12).

6. (1 point) Why might it make sense to include education squared as an explanatory
variable in addition to education?
We might believe that the predicted change in income associated with a change in
education depends on the level of education one receives. Including ed2 in our
regression allows for a nonlinear relationship between education and income. In
other words, we might believe that the return to education is greater for an
additional year of education at the high school level than for an additional year
after graduate school – that would yield coefficients like those that we see, where
the coefficient on education is positive and that on education squared is negative.

7. (2 points) You should find that the coefficient on “educ” is statistically
insignificant. Does this mean that education isn't a significant predictor of income?

. test yrs_educ ed2

( 1) yrs_educ = 0
( 2) ed2 = 0

F( 2, 992) = 12.95
Prob > F = 0.0000

No, this does not necessarily mean that education is not a significant predictor of
income. Because yrs_educ and ed2 are highly correlated, they may not appear to be
individually significant even though they are jointly significant. To determine
whether education is a significant predictor of income, we must test the joint
hypothesis that the coefficients on yrs_educ and ed2 are both zero. When we run an
F-test for this purpose, we find that education is, in fact, a statistically
significant predictor of income (F = 12.95, p = 0.0000). Hence, we reject the null
hypothesis that both coefficients are zero. (Note that this test is already
reported in the header of the regression output from question 1, so you don’t have
to run the test separately; you can also just refer to that Stata output.)

Part 5: Before answering the five questions below, please read the following
(slightly edited) excerpt from a science blog about the concept p-hacking:

“Most scientists are careful and scrupulous in how they collect data and carry out
statistical tests. However, there are ways in which statistical techniques can be
misused and abused to show effects which are not really there. To avoid reporting
spurious results as fact and giving air to bad science, we must be able to recognize
when such methods may be in use. This piece introduces one such technique
known as ‘p-hacking’. It is one of the most common ways in which data analysis is
misused to generate statistically significant results where none exists, and is one
which we should remain vigilant against.

A scrupulous scientist should go in with a well-motivated hypothesis which she puts
to the test in an experiment or using observational data. This is a baseline
assumption of scientific testing: that the scientist forms a prior hypothesis (ideally
based on a theory) which they then put to the test. Suppose, however, a scientist
took the opposite approach. Suppose they started off with the conclusion they want
to reach, and were not particularly concerned with scientific ethics. In this case,
they could use statistical testing to manufacture this result through selective
reporting.

To take a toy example, suppose you wanted to establish a link between chocolate
and baldness. You could then get a group of 10,000 men (a pretty big sample size
by all accounts) to report on their consumption of M&Ms, Twix and Mars Bars over
a period of time. In addition, you record the rate of going bald in the group over
time. Once you have your chocolate and baldness data, you run tests on everything
you can think of. Do men who eat only M&Ms go bald younger? Do young men who
eat both Mars and M&Ms but not Twix go bald on top more often than the front?
Do older, unmarried men who don’t exercise and eat non chocolate bars have a
lower incidence of baldness? Run enough of these tests and you are eventually
bound to get a result that is ‘statistically significant’.

A p-value of for example 0.01 indicates the probability of a result occurring
randomly just 1 in 100 times. This is generally judged to be pretty highly
significant as it’s rather unlikely the association came about by chance. This is
based on the assumption, of course, that you are not running hundreds of tests in
order to find the 1 in a 100 occurrence. P-hacking is particularly insidious because
it can be hard to detect. With a plausible explanation for why the ‘hypothesis’ was
proposed, results generated by torturing the data in this way can be hard to
distinguish from genuine studies.”
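
The excerpt's point about running many tests can be made concrete with a short calculation. The sketch below assumes that every test is independent and that no real effect exists in any of them (both are simplifying assumptions, and the function name is illustrative). Under those assumptions, the chance of at least one "significant" result among n tests at threshold alpha is 1 - (1 - alpha)^n.

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance of one or more false positives among n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

for alpha in (0.05, 0.01, 0.001):
    for n_tests in (1, 20, 100):
        p = prob_at_least_one_false_positive(alpha, n_tests)
        print(f"alpha={alpha:<6} tests={n_tests:<4} P(at least one) = {p:.3f}")
```

With 100 tests at the 0.01 level, the chance of at least one spurious finding is about 63 percent. A stricter threshold lowers this probability for a given number of tests, but running more tests pushes it back up.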

1. (1 point) What is p-hacking according to the text above?

A) A misuse of data sampling to find patterns in data where no real
underlying effect exists.
B) A misuse of data analysis to find patterns in data where no real
underlying effect exists.
C) A misuse of experimental design to find patterns in data where no real
underlying effect exists.
D) All of the above.

2. (1 point) How should a scientist who is genuinely interested in the
relationship between chocolate consumption and baldness go about
their research?
A) It isn’t possible, so they should not attempt this.
B) They should limit their sample size to at most 1000 men.
C) They should go into their research with a clear, motivated
hypothesis that can be tested with a limited number of tests.
D) A misuse of data sampling and analysis, and of experimental design, to
find patterns in data where no real underlying effect exists.

3. (1 point) Which of the following is a red flag for potential p-hacking?
A) A sensationalized finding.
B) A result which goes against the majority of existing research on a topic.
C) A higher than expected number of studies reporting p-values just
below 0.05.
D) All of the above.

4. (1 point) Pre-registration of scientific hypotheses before
experiments are run and data are analyzed is seen as one weapon in
the quest against p-hacking. Which is the main reason why this is?
A) It forces the scientist to formulate clear, testable hypotheses.
B) It makes it possible for those evaluating the research to make sure that
the scientists have not run an inappropriate number of tests in search of
significant results.
C) The pre-registration can set threshold for statistical significance ahead of
the analysis being conducted.
D) All of the above.

5. (1 point) Can more stringent standards for at what level a result is
considered statistically significant eliminate the problem of p-hacking?
A) Yes, because a really low p-value of for example 0.001 is not possible to
generate through p-hacking.
B) Yes, especially if all scientists agree to the new level of statistical
significance.
C) No, but it may make it harder to p-hack.
D) No, p-hacking is as easy regardless of whether the p-value needs to be
0.05 or 0.01 for a result to count as statistically significant.

Regn Lect 5
No ratings yet
Regn Lect 5
9 pages
Topic 1 Class Exercises
No ratings yet
Topic 1 Class Exercises
5 pages
DRAFT - 15.5 Exercises For Exam 2
No ratings yet
DRAFT - 15.5 Exercises For Exam 2
14 pages
GMU Econ535-Applied Econometrics Problem Set2 (PS2) Solutions Spring 2024
No ratings yet
GMU Econ535-Applied Econometrics Problem Set2 (PS2) Solutions Spring 2024
14 pages
Ansprac 2
No ratings yet
Ansprac 2
6 pages
A5 Final Hussein: E M Se M .
No ratings yet
A5 Final Hussein: E M Se M .
9 pages
HMW - 3 Causal
No ratings yet
HMW - 3 Causal
4 pages
Agent-Based Axelrod Tournament Introduction
No ratings yet
Agent-Based Axelrod Tournament Introduction
12 pages
Chapter - 5 - Panel Data Analysis
No ratings yet
Chapter - 5 - Panel Data Analysis
53 pages
AE6207 - Solution 1 - 2024
No ratings yet
AE6207 - Solution 1 - 2024
8 pages
Ques7 Output Shared
No ratings yet
Ques7 Output Shared
1 page
283
No ratings yet
283
7 pages
Interpreting Linear Regression
No ratings yet
Interpreting Linear Regression
3 pages
Math Assignment Unit 7
No ratings yet
Math Assignment Unit 7
5 pages
Assignment 3
No ratings yet
Assignment 3
1 page
Examec605d19 20
No ratings yet
Examec605d19 20
2 pages
Mock Test Econ
No ratings yet
Mock Test Econ
2 pages
Solutions Manual to accompany Introduction to Linear Regression Analysis
From Everand
Solutions Manual to accompany Introduction to Linear Regression Analysis
Douglas C. Montgomery
1/5 (1)
Key Key and the Spider on Economics and Strategic Moves
From Everand
Key Key and the Spider on Economics and Strategic Moves
Hurdis V Davis
No ratings yet
Student Solutions Manual to Accompany Loss Models: From Data to Decisions, Fourth Edition
From Everand
Student Solutions Manual to Accompany Loss Models: From Data to Decisions, Fourth Edition
Stuart A. Klugman
4/5 (1)

GMU Econ535-Applied Econometrics Problem Set3 (PS3) Solutions Spring 2024

Uploaded by

GMU Econ535-Applied Econometrics Problem Set3 (PS3) Solutions Spring 2024

Uploaded by

(5 points) Answers will vary depending on the data set chosen.

Part 2: Explaining Income Patterns

1. First, get to know your data set.

a) (1 point) First show summary statistics by using the command “sum”

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         age |        995     20.2603    1.576226         16         23
       black |        995    .0854271    .2796568          0          1
    hispanic |        995    .0572864     .232506          0          1
      income |        995    887.7364    344.2031   258.9612   2618.175
      single |        995    .7366834    .4406542          0          1
     married |        995    .2633166    .4406542          0          1
    yrs_educ |        995    11.95678     1.50557          4         17
       urban |        995     .801005     .399445          0          1
        male |        995    .5366834    .4989033          0          1

b) (1 point) Now generate a new variable called “minority” that is equal to 1 if

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    minority |        995    .1427136    .3499564          0          1

The mean of minority is 0.1427, so 14.3% of the sample is minority and the remaining 85.7% (1 - 0.1427 = 0.8573) is white.

Linear regression                               Number of obs     =        995
                                                F(3, 991)         =      50.50
                                                Prob > F          =     0.0000
                                                Root MSE          =        321

      income | Coefficient   std. err.       t    P>|t|     [95% conf. interval]
-------------+-------------------------------------------------------------------
        male |    233.9662    20.32987   11.51    0.000     194.0717    273.8608
    minority |   -33.09229    29.04318   -1.14    0.255    -90.08548     23.9009
    yrs_educ |    44.10299    6.514053    6.77    0.000     31.32007    56.88591
       _cons |    239.5634    79.93118    3.00    0.003     82.70959    396.4172
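As a consistency check on the table above, each t statistic is the coefficient divided by its robust standard error, and each interval is the coefficient plus or minus a critical value times that standard error. The notation and the critical value of roughly 1.962 (t distribution with 991 degrees of freedom) are assumptions added here for illustration, not part of the original output.

```latex
\begin{align*}
t_{\text{male}} &= \frac{233.9662}{20.32987} \approx 11.51, \\
t_{\text{minority}} &= \frac{-33.09229}{29.04318} \approx -1.14, \\
\text{95\% CI}_{\text{male}} &\approx 233.9662 \pm 1.962 \times 20.32987
  \approx [194.1,\ 273.9].
\end{align*}
```

The interval for minority contains zero, which agrees with its p-value of 0.255.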

3. (2 points) Now regress income on the variables male, minority, years of

. reg income male minority yrs_educ married, robust

Linear regression                               Number of obs     =        995
                                                F(4, 990)         =      39.10
                                                Prob > F          =     0.0000

      income | Coefficient   std. err.       t    P>|t|     [95% conf. interval]
-------------+-------------------------------------------------------------------
        male |    237.9132     20.4348   11.64    0.000     197.8127    278.0137
    minority |   -24.59293    29.00056   -0.85    0.397    -81.50256     32.3167
    yrs_educ |    46.32538    6.546817    7.08    0.000     33.47815    59.17261
     married |    65.27789    23.16271    2.82    0.005     19.82424    110.7315
       _cons |    192.4708    81.68977    2.36    0.019     32.16577    352.7758

. reg income male minority yrs_educ married male_married, robust

Linear regression                               Number of obs     =        995
                                                F(5, 989)         =      32.03
                                                Prob > F          =     0.0000
                                                Root MSE          =      318.9

      income | Coefficient   std. err.       t    P>|t|     [95% conf. interval]
-------------+-------------------------------------------------------------------
        male |    205.1602    23.37881    8.78    0.000     159.2824    251.0379
    minority |   -26.09316    28.87601   -0.90    0.366    -82.75845    30.57213
    yrs_educ |    46.11461    6.578909    7.01    0.000     33.20438    59.02483
     married |    2.281599    22.58367    0.10    0.920    -42.03582    46.59902
male_married |    122.9917    44.75587    2.75    0.006     35.16428     210.819
       _cons |    213.3017    81.60683    2.61    0.009     53.15929    373.4441
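With the interaction term included, the coefficient on married applies to women only, and the marriage effect for men is the sum of the married and male_married coefficients. The worked arithmetic below uses the point estimates from the table above. The beta notation is introduced here for illustration, and it assumes male_married is the product of male and married, which the variable name suggests but this excerpt does not show.

```latex
\begin{align*}
\text{Marriage effect, women:} \quad & \hat\beta_{\text{married}} = 2.28, \\
\text{Marriage effect, men:} \quad & \hat\beta_{\text{married}} + \hat\beta_{\text{male\_married}}
   = 2.28 + 122.99 = 125.27, \\
\text{Male--female gap, single:} \quad & \hat\beta_{\text{male}} = 205.16, \\
\text{Male--female gap, married:} \quad & \hat\beta_{\text{male}} + \hat\beta_{\text{male\_married}}
   = 205.16 + 122.99 = 328.15.
\end{align*}
```

All four quantities are in the units of income, holding minority status and years of education fixed.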

6. (2 points) Generate a variable called min_educ, the interaction between minority

. reg income minority yrs_educ min_educ, robust

Linear regression                               Number of obs     =        995
                                                F(3, 991)         =      10.31
                                                Prob > F          =     0.0000
                                                Root MSE          =     341.03

      income | Coefficient   std. err.       t    P>|t|     [95% conf. interval]
-------------+-------------------------------------------------------------------
    minority |   -18.62319    147.1104   -0.13    0.899    -307.3069    270.0606
    yrs_educ |    31.04021    7.919749    3.92    0.000     15.49881    46.58161
    min_educ |   -1.972122     12.9715   -0.15    0.879    -27.42689    23.48264
       _cons |    522.4759     94.9886    5.50    0.000      336.074    708.8778
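The interaction min_educ lets the slope on years of education differ by minority status. A short worked calculation from the point estimates above follows. The notation is added here for illustration, and the 12-year evaluation point is an arbitrary choice close to the sample mean of 11.96.

```latex
\begin{align*}
\text{Return to one year of education, non-minority:} \quad & 31.04, \\
\text{Return to one year of education, minority:} \quad & 31.04 - 1.97 = 29.07, \\
\text{Predicted minority gap at 12 years:} \quad & -18.62 + 12 \times (-1.97) \approx -42.29.
\end{align*}
```

Neither minority (p = 0.899) nor min_educ (p = 0.879) is statistically significant, so the data do not distinguish the two slopes.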

Part 3: Explore the determinants of ln(income).

Linear regression Number of obs = 995

Linear regression Number of obs = 995

. eststo: reg lincome male black yrs_educ, robust

Linear regression Number of obs = 995

. esttab *, se r2 star(* .1 ** .05 *** .01)

black -0.0799** -0.0678*

_cons 6.608*** 6.615*** 6.011***
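Because the dependent variable is ln(income), a coefficient on a dummy variable is approximately a proportional change in income, and the exact percentage change is 100(e^b - 1). The sketch below applies that formula to the two black coefficients visible in the table. Which columns they belong to is not recoverable from this excerpt, so the values are treated simply as two reported estimates.

```latex
\begin{align*}
100\,\bigl(e^{-0.0799} - 1\bigr) &\approx -7.68\%, \\
100\,\bigl(e^{-0.0678} - 1\bigr) &\approx -6.56\%.
\end{align*}
```

For coefficients this small, the exact figures are close to the usual approximation of 100 times the coefficient (-7.99% and -6.78%).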

2. (2 points) Using column 3, interpret each of the three coefficients on male,

. reg lincome male black yrs_educ black_ed, robust

4. (1 point) Why might it make sense to use ln(income) as a dependent variable

Part 4: The impact of education on income.

. reg income yrs_educ ed2, robust

. esttab *, se r2 star(* .1 ** .05 *** .01)

_cons 6.608*** 6.615*** 6.011***