Assignment 03 AK

Uploaded by

neilarora6969
ECON 333 D100 Statistical Analysis of Economic Data

Spring 2024

Assignment 03

1. A researcher carefully selected a simple random sample of 176 men and got the following estimate
for his linear regression model (W is the wage rate, in $/hour, and A is the worker’s age, in years):

$$\hat{W} = 12.21 + 0.24A$$
$$R^2 = 0.038$$

a. What does that number 0.24 in front of A mean?

It’s the marginal effect of $A$ on $\hat{W}$, that is, the estimated regression predicts that the average wage will be 24¢/hr higher for a man who is one year older.
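The slope interpretation can be checked by plugging two adjacent ages into the estimated equation. Below is a minimal R sketch. `predict_wage` is a hypothetical helper, and the ages 30 and 31 are arbitrary illustrative values:

```r
# Coefficients of the estimated wage equation from Question 1
b0 <- 12.21  # intercept, $/hour
b1 <- 0.24   # slope, $/hour per additional year of age

# Hypothetical helper: predicted average wage ($/hour) at a given age (years)
predict_wage <- function(age) b0 + b1 * age

# One extra year of age raises the predicted wage by b1 dollars, i.e. 24 cents/hour
w30 <- predict_wage(30)
w31 <- predict_wage(31)
w31 - w30

# At age 0 the prediction collapses to the intercept
predict_wage(0)
```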

b. What does that number 12.21 mean?

Short answer: Nothing, really…

Formally, it’s the predicted average wage for a man aged A = 0, i.e. a newborn before his umbilical cord falls off. That is, of course, not meaningful in the real world, where nobody signs a work agreement before turning one year old (age 0 is far outside the range of ages in any sample of workers).

One could try to assign it some meaning if one wants. For instance, one could think of this number as determining the overall level of wages (how high, vertically, the regression line sits) and then use that idea to organize one’s thinking about the labour market. Or something like it :)

c. What does that number 0.038 in the second line mean? What are its units of measurement?
The coefficient of determination (aka ‘overall fit’ or simply ‘R-squared’). The common interpretations of $R^2$ (many people take these as one idea expressed in different ways to suit different questions) are:

• It measures the part of the variation in the dependent variable that is predictable
from the independent variable.
• It measures how well the estimated regression line fits the data.
• It measures how accurate the predictions of the estimated model (based on the estimated regression line) will be.

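$R^2$ has no units of measurement: it is a ratio of two sums of squares that are both measured in the same units (dollars per hour, squared), so the units cancel, and it always lies between 0 and 1. Here, age accounts for only about 3.8% of the sample variation in wages.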
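Both claims in this answer can be checked numerically: $R^2$ is the share of explained variation, and it has no units. The minimal R sketch below uses a small made-up data set. The ages and wages are hypothetical and are not the researcher’s sample of 176 men. It computes $R^2$ by hand as 1 - SSR/TSS, compares it with the value `summary(lm())` reports, and shows that measuring wages in cents or age in months leaves $R^2$ unchanged.

```r
# Hypothetical toy data (NOT the researcher's sample): age in years, wage in $/hour
age  <- c(22, 25, 31, 38, 44, 52, 60)
wage <- c(14.1, 18.0, 16.2, 25.5, 17.3, 30.2, 22.8)

# R-squared by hand: 1 - (sum of squared residuals) / (total sum of squares)
r_squared <- function(y, x) {
  fit <- lm(y ~ x)
  ssr <- sum(residuals(fit)^2)
  tss <- sum((y - mean(y))^2)
  1 - ssr / tss
}

r2_base   <- r_squared(wage, age)
r2_cents  <- r_squared(wage * 100, age)  # wages in cents/hour instead of $/hour
r2_months <- r_squared(wage, age * 12)   # age in months instead of years

all.equal(r2_base, summary(lm(wage ~ age))$r.squared)  # TRUE: matches R's own value
all.equal(r2_base, r2_cents)                            # TRUE: unchanged by rescaling y
all.equal(r2_base, r2_months)                           # TRUE: unchanged by rescaling x
```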

2. Use the spreadsheet Growth (see the assignment instructions), which contains data on average
growth rates from 1960 through 1995 for 63 countries, along with variables that are potentially
related to growth. A detailed description of the original data is given in Growth_Description (PDF
file in the assignment instructions). In this exercise, you will investigate the relationship between
growth and trade.

a. Construct a scatterplot of the average annual growth rate (growth) on the average trade share
(tradeshare) - copy & paste your R output here. Does there appear to be a relationship
between the variables? Answer in 1-2 sentences.

> plot(Asmt03$tradeshare, Asmt03$growth, main="Rate of Growth and Trade",
+      xlab="Trade Share", ylab="Growth Rate")

It seems that there is a (weak) positive relationship between the two variables. Others may look at it and say, “I see no relationship, especially if I ignore an obvious outlier in the top-right corner.”

The relationship you see depends on how hard you look, and in my case the view is biased: I had already heard from international trade and globalization researchers that trade is good for growth, and I had already estimated the regression line. The truth is that when we have preconceptions about what the patterns should be, we look for them (and so we see the expected patterns even if they are not really there). One should be careful making judgments about patterns when a relationship is weak.

b. Using all observations, run a regression of growth on tradeshare. Copy & paste your R
output here (see page 3 if you are not sure what I ask for here).

> print(summary(lm(Asmt03$growth~Asmt03$tradeshare)))
Call:
lm(formula = Asmt03$growth ~ Asmt03$tradeshare)

Residuals:
    Min      1Q  Median      3Q     Max
-4.2305 -0.8643  0.1445  1.0649  3.4251

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.5272     0.4353   1.211  0.23058
Asmt03$tradeshare   2.2352     0.6871   3.253  0.00186 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.584 on 61 degrees of freedom
Multiple R-squared:  0.1478,  Adjusted R-squared:  0.1339
F-statistic: 10.58 on 1 and 61 DF,  p-value: 0.001863
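Most of the numbers in this output are linked by simple formulas, so they make a useful consistency check on a copy-pasted summary. The short R sketch below uses only the printed values. Small discrepancies in the last digit come from rounding in the printed output. For example, the adjusted R-squared works out to about 0.1338 instead of 0.1339.

```r
# Numbers copied from the summary() output for growth ~ tradeshare
est <- 2.2352   # slope estimate
se  <- 0.6871   # its standard error
r2  <- 0.1478   # Multiple R-squared
n   <- 63       # countries
k   <- 1        # regressors (tradeshare only)
resid_df <- n - k - 1  # residual degrees of freedom: 61

t_val  <- est / se                            # t value of the slope
p_val  <- 2 * pt(-abs(t_val), df = resid_df)  # two-sided p-value, Pr(>|t|)
f_stat <- t_val^2                             # with a single regressor, F = t^2
adj_r2 <- 1 - (1 - r2) * (n - 1) / resid_df   # Adjusted R-squared
r_xy   <- sqrt(r2)                            # size of the growth-tradeshare correlation

round(c(t = t_val, p = p_val, F = f_stat, adj_r2 = adj_r2, r = r_xy), 4)
```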

c. Use the regression to predict the growth rate for a country with a trade share of 0.4 and for
another with a trade share equal to 0.8.

$$\widehat{\text{Growth}} = 0.5272 + 2.2352 \times \text{TradeShare}$$
$$\widehat{\text{Growth}} = 0.5272 + 2.2352 \times 0.4 \approx 1.4\%$$
$$\widehat{\text{Growth}} = 0.5272 + 2.2352 \times 0.8 \approx 2.3\%$$
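The same arithmetic in R, using the coefficients copied from the output in part (b):

```r
# Estimated coefficients from part (b)
b0 <- 0.5272   # intercept
b1 <- 2.2352   # slope on tradeshare

trade_share <- c(0.4, 0.8)
growth_hat  <- b0 + b1 * trade_share   # predicted average annual growth, in percent
round(growth_hat, 1)

# Doubling the trade share from 0.4 to 0.8 changes the prediction by b1 * 0.4
diff(growth_hat)
```

With the data loaded, one could instead fit `fit <- lm(growth ~ tradeshare, data = Asmt03)` and call `predict(fit, newdata = data.frame(tradeshare = c(0.4, 0.8)))`. This only works with the `data =` form of the formula. With `Asmt03$growth ~ Asmt03$tradeshare`, `predict()` cannot match `newdata` to the regressor, so it returns the 63 in-sample fitted values instead (with a warning).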

d. Make a histogram for growth (the dependent variable in your regression). Copy & paste your
R output (command line & graph) here. Does it appear roughly normally distributed?

It’s on the next page (it did not fit here).

It is roughly normally distributed: the bulk of the growth rates are in the middle (between 1% and 3%), there are tails that go off to both sides, one can imagine a rough bell shape enveloping the histogram, etc.

hist(Asmt03$growth)
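The eyeball check above can be supplemented by two standard tools. The first is a normal Q-Q plot. The second is a formal test; the Shapiro-Wilk test is used here, although the assignment does not ask for it. The R sketch below wraps both in a hypothetical helper, `check_normality`, which also overlays a fitted normal curve on the histogram. Because the Growth data are not reproduced here, it is demonstrated on simulated numbers. With the data loaded, the call would be `check_normality(Asmt03$growth, "growth")`.

```r
# Hypothetical helper (not from the assignment): draw a histogram with a fitted
# normal curve and a normal Q-Q plot, then return the Shapiro-Wilk p-value
check_normality <- function(v, label = "variable") {
  m <- mean(v)
  s <- sd(v)
  op <- par(mfrow = c(1, 2))  # two panels side by side
  on.exit(par(op))            # restore the previous layout afterwards
  # Histogram on the density scale with the normal density N(m, s^2) on top
  hist(v, freq = FALSE, main = paste("Histogram of", label), xlab = label)
  curve(dnorm(x, mean = m, sd = s), add = TRUE)
  # Q-Q plot: points close to the line suggest a roughly normal shape
  qqnorm(v, main = paste("Normal Q-Q plot of", label))
  qqline(v)
  # Shapiro-Wilk test: a small p-value is evidence against normality
  shapiro.test(v)$p.value
}

# Demo on simulated data (hypothetical: 63 draws from a normal distribution)
set.seed(333)
sim_growth <- rnorm(63, mean = 2, sd = 1.7)
p_sim <- check_normality(sim_growth, "simulated growth")
p_sim
```

With only 63 observations, the test has limited power, so the plots remain the main evidence.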

e. I think that if one wanted to run a regression with the data in assignment 2 (1229 records of average hourly earnings, degrees, sex, and age), one would not bother to check whether the dependent variable there is normally distributed, but here, in the Growth data set, we do want to look at a histogram. Why is it important to check that distribution here, while one could safely ignore it there (in CPS96_15)?

Because the sample size in Growth is relatively small (n = 63).

The sample size in CPS96_15 was definitely large (n = 1229), so we could rely on the Central Limit Theorem to accept the regression estimates (their t-values, p-values, F-statistic, etc.) and use them for hypothesis tests, confidence intervals, and so on. Ordinarily, we would also be fine treating n = 63 as a ‘large enough’ sample. Remember, I suggested as a rule of thumb in class that n > 50 is large enough to rely on the Central Limit Theorem unless there is something ‘special’ about your data. Well, there is something special here: the sample is rather large compared to the population. There are fewer than 200 countries in the world, so the sample is about 1/3 of the population. Think about what that means if you are into thinking deeper :)
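The Central Limit Theorem argument can be illustrated by simulation. The sketch below does not use the Growth or CPS96_15 data. Instead, it assumes a hypothetical, strongly skewed (exponential) population and measures how much skewness is left in the sampling distribution of the sample mean for n = 63 versus n = 1229. For exponential data, the theoretical value is 2/sqrt(n), so the leftover skewness shrinks as n grows. The sketch does not capture the second point above, namely a sample that is a large fraction of the population.

```r
set.seed(333)

# Skewness of the simulated sampling distribution of the mean for samples of
# size n drawn from a skewed (exponential) population
mean_skewness <- function(n, reps = 5000) {
  means <- replicate(reps, mean(rexp(n, rate = 1)))
  z <- (means - mean(means)) / sd(means)
  mean(z^3)
}

skew_63   <- mean_skewness(63)     # Growth-sized sample
skew_1229 <- mean_skewness(1229)   # CPS96_15-sized sample

# Theory for exponential data: skewness of the mean is 2 / sqrt(n)
round(c(sim_63 = skew_63, theory_63 = 2 / sqrt(63),
        sim_1229 = skew_1229, theory_1229 = 2 / sqrt(1229)), 3)
```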

f. Construct a scatterplot of the average annual growth rate (growth) on the measure of
education (yearsschool) - copy & paste your R output here. Does there appear to be a
relationship between the variables? Answer in 1-2 sentences.

> plot(Asmt03$yearsschool, Asmt03$growth, main="Rate of Growth and Education",
+      xlab="Average number of years of schooling in 1960", ylab="Growth Rate")

There seems to be a relationship, and it appears stronger than the one we saw in part (a) between the growth rate and trade share. What exactly would you see as a relationship? That depends :)

‘Mechanically’ (I call it that because that’s what R will do by default), one would probably detect a positive correlation (yellow line).

It seems to me, however, that the relationship is probably more interesting than that. It’s non-linear (orange line), and it suggests that we should think about stuff :) For instance, one obvious (to an economist) interpretation of this non-linear relationship is that there are diminishing marginal returns to education (with respect to economic growth). In fact, I have heard from Economic Development people that they attribute the high growth rates in Southeast Asia over the last several decades to ‘smart’ decisions by governments there to invest heavily in education, and especially to their decisions to invest in primary and secondary education rather than post-secondary education (like universities).
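A non-linear pattern like the one described above can be captured within ordinary least squares by adding a squared term. The minimal sketch below assumes a quadratic specification, which is one choice among several (a log or square-root term would also work). Because the Growth data are not reproduced here, it uses simulated data with diminishing returns built in. In the quadratic model, the marginal effect of schooling is b1 + 2·b2·school, so a negative b2 means each extra year adds less than the previous one.

```r
set.seed(42)

# Hypothetical data with diminishing returns built in (NOT the Growth data set):
# growth rises with schooling, but by less and less
school <- runif(63, min = 0, max = 10)
growth <- 0.5 + 1.5 * sqrt(school) + rnorm(63, sd = 0.25)

linear    <- lm(growth ~ school)                # the 'mechanical' straight line
quadratic <- lm(growth ~ school + I(school^2))  # one simple way to allow curvature

b <- coef(quadratic)
# Marginal effect of one more year of schooling: b1 + 2 * b2 * school
marginal_effect <- function(s) unname(b["school"] + 2 * b["I(school^2)"] * s)
c(at_2_years = marginal_effect(2), at_8_years = marginal_effect(8))
```

With the data loaded, the analogous fit would be `lm(growth ~ yearsschool + I(yearsschool^2), data = Asmt03)`. This sketch does not answer whether the squared term is significant there; that is an empirical question.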

g. Using all observations, run a regression of growth on yearsschool. Copy & paste your R
output here (see page 3 if you are not sure what I ask for here).

> print(summary(lm(Asmt03$growth~Asmt03$yearsschool)))
Call:
lm(formula = Asmt03$growth ~ Asmt03$yearsschool)

Residuals:
    Min      1Q  Median      3Q     Max
-3.6777 -1.1330 -0.1487  0.9534  4.4342

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.72247    0.36790   1.964  0.05412 .
Asmt03$yearsschool  0.26529    0.07737   3.429  0.00109 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.571 on 61 degrees of freedom
Multiple R-squared:  0.1616,  Adjusted R-squared:  0.1478
F-statistic: 11.76 on 1 and 61 DF,  p-value: 0.001093

It may be interesting to compare this to the regression in part (b) and think about what the differences (in the coefficients’ standard errors, p-values, and the regression R-squared) mean :)
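One way to make that comparison concrete is to put the two slopes side by side with their t-values and 95% confidence intervals. Each interval is the estimate ± the t critical value × the standard error, with 61 residual degrees of freedom. The sketch below uses only numbers copied from the two outputs.

```r
# Slope estimates and standard errors copied from the two summary() outputs
slopes <- data.frame(
  regressor = c("tradeshare", "yearsschool"),
  estimate  = c(2.2352, 0.26529),
  std_error = c(0.6871, 0.07737)
)
resid_df <- 61                        # 63 countries minus 2 estimated coefficients
t_crit   <- qt(0.975, df = resid_df)  # two-sided 95% critical value

slopes$t_value <- slopes$estimate / slopes$std_error
slopes$ci_low  <- slopes$estimate - t_crit * slopes$std_error
slopes$ci_high <- slopes$estimate + t_crit * slopes$std_error
print(slopes, digits = 3)
```

The two slopes themselves are not directly comparable, because tradeshare is a fraction while yearsschool is measured in years. The t-values, p-values, and R-squared values are unit-free and can be compared.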

=====================================================

This is approximately what your R regression output (copy-pasted from R into this document) will look like:
