Assignment A (Hand In)
a.) EViews:
Completed table:
          Age < 30    Age >= 30
Male        411         443
Female      270         292
b.) EViews:
$SE(\overline{AHE}_m) = \frac{13.07690}{\sqrt{854}} = 0.4474823059\ldots$
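The arithmetic above can be checked with a minimal Python sketch. The variable names are illustrative and do not come from the EViews output; the inputs are the sample standard deviation (13.07690) and sample size (854) for male workers used in the working.

```python
import math

# Sample standard deviation and sample size for male workers (from the working above).
sd_male = 13.07690
n_male = 854

# Standard error of the sample mean: s / sqrt(n).
se_male = sd_male / math.sqrt(n_male)

print(f"SE(AHE_m) = {se_male:.10f}")
```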
Therefore, using the critical value 1.645, the 90% confidence interval for the population mean of wages
for female workers is 19.36897 ± 1.645 × 11.23198/√562 = [18.590, 20.148] to 3 decimal places.
f.) Null & alternative hypotheses: Δ = gender wage gap (difference between AHE for males & females)
H0: Δ = 2
H1: Δ > 2
g.) T-statistic:
$t = \frac{22.55427 - 19.36897}{\sqrt{\frac{13.07690^2}{854} + \frac{11.23198^2}{562}}} = 4.887639031\ldots$
= 4.888 (3dp)
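The calculation can be reproduced with a short Python sketch using the unpooled (unequal-variance) standard error. The function name and the `hypothesized_gap` parameter are assumptions introduced for illustration; leaving it at its default of 0 evaluates the expression exactly as it is written above.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, hypothesized_gap=0.0):
    """t-statistic for a difference in means with an unpooled standard error."""
    # Standard error of the difference: sqrt(s_a^2 / n_a + s_b^2 / n_b).
    se_gap = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b - hypothesized_gap) / se_gap

# Male sample first, female sample second (values from the working above).
t_stat = two_sample_t(22.55427, 13.07690, 854, 19.36897, 11.23198, 562)
print(round(t_stat, 3))  # 4.888
```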
h.) Is H0 rejected?
In a one-sided right-tail test, we reject H0 if t ≥ t*, where t* is the 5% critical value:
4.888 ≥ 1.64
I.e., 4.888 lies in the right-hand side purple rejection region in the graph.
Therefore, we reject the null hypothesis at the 5% significance level.
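The decision rule can be sketched in Python as follows. This assumes the large-sample standard normal approximation for the critical value, which gives about 1.645 (the 1.64 used above is the same value truncated); the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05    # significance level for the one-sided test
t_stat = 4.888  # t-statistic reported in (g)

# Right-tail critical value from the standard normal distribution.
critical_value = NormalDist().inv_cdf(1 - alpha)

# Reject H0 when the statistic falls in the right-tail rejection region.
reject_h0 = t_stat >= critical_value
print(f"critical value = {critical_value:.3f}, reject H0: {reject_h0}")
```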
Question 2:
a.)
EViews:
The above table indicates that bytest scores (in points) are positively associated with the
expected value of ed (in years). E.g., the lowest bytest scores correspond to an expected value
equal to the minimum years of education, and the highest correspond to an expected value just
over 1 year below the maximum.
c.) EViews:
OLS Regression (compact format):
$\widehat{ed} = 8.836 + 0.098 \cdot bytest$
se:      (0.133)  (0.003)
t-stat:  (66.478) (36.752)
p-value: (0.000)  (0.000)
R² = 0.227
n = 3796
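To illustrate how the fitted line is used, the sketch below evaluates it in Python. The function name and the test score of 50 points are hypothetical choices for illustration; only the intercept (8.836) and slope (0.098) come from the regression output above.

```python
def predict_ed(bytest, intercept=8.836, slope=0.098):
    """Predicted years of education from the fitted line ed_hat = intercept + slope * bytest."""
    return intercept + slope * bytest

# Hypothetical test score, chosen only for illustration.
score = 50
prediction = predict_ed(score)

# Change in the prediction for a one-point increase in bytest (equals the slope).
one_point_change = predict_ed(score + 1) - predict_ed(score)

print(f"predicted ed at bytest = {score}: {prediction:.3f} years")
print(f"change for one extra point: {one_point_change:.3f} years")
```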
d.) Interpretation of the coefficients: the estimated intercept ($\hat{\beta}_0$) tells us that if the high school test
score (bytest) is 0, then the estimated years of education (ed) will be 8.836 years (3dp). Additionally, the
estimated slope ($\hat{\beta}_1$) tells us that if the high school test score (bytest) increases by one point, the
estimated years of education (ed) will increase on average by 0.098 years (3dp).
e.) Interpretation of the R-squared measure: R² measures the proportion of the variation in ed that is
explained by the model. In the above model R² = 0.227 (3dp). This indicates that the point score
obtained in the high school test (bytest) accounts for approximately 22.7% of the variation in the years of
education (ed).
Even though this percentage seems somewhat low, it is still a meaningful share of the variance in ed,
which is likely contingent on several different variables, e.g., socio-economic status. Additionally, as
the p-values in (c) show, the intercept and slope coefficients are statistically significant. Therefore, this
model is still useful, as it allows us to draw important conclusions about the relationships between the
variables.
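Another way to read the R² figure: in a regression with a single regressor and an intercept, R² equals the squared sample correlation between the two variables. The sketch below backs out the implied correlation between ed and bytest. It uses the rounded values reported in (c), so the result is approximate, and the variable names are illustrative.

```python
import math

r_squared = 0.227  # R-squared reported in (c), rounded to 3dp
slope = 0.098      # estimated slope reported in (c)

# In a one-regressor model the correlation is sqrt(R^2) with the sign of the slope.
correlation = math.copysign(math.sqrt(r_squared), slope)

print(f"implied correlation between ed and bytest: {correlation:.3f}")
```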
g.) Method: to complete this question, I substituted the test scores for the term ‘bytest’ in
$\widehat{ed} = 9.070 + 0.097 \cdot bytest$ when black = 1, and in
$\widehat{ed} = 8.571 + 0.102 \cdot bytest$ when black = 0.
Discussion of results: for both black and non-black individuals there exists a positive relationship
between ed^ and bytest. The slope is slightly smaller when black = 1 (0.097 versus 0.102), but the
intercept is higher (9.070 versus 8.571), so below the point where the two fitted lines cross (a bytest
score of roughly 100) the same test scores (in points) are associated with higher predicted levels of
education (in years) for black individuals (compared to their non-black counterparts).
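The comparison of the two fitted lines in (g) can be sketched in Python as follows. The test scores used here are hypothetical (the scores actually substituted are not reproduced in this section), and the function name is illustrative; the coefficients are the ones given in the method above. The sketch also computes the score at which the two lines cross.

```python
def predict_ed(bytest, black):
    """Predicted years of education from the group-specific fitted lines in (g)."""
    if black == 1:
        return 9.070 + 0.097 * bytest
    return 8.571 + 0.102 * bytest

# Hypothetical test scores, chosen only for illustration.
scores = [40, 50, 60]
for s in scores:
    print(f"bytest = {s}: black = 1 -> {predict_ed(s, 1):.3f}, "
          f"black = 0 -> {predict_ed(s, 0):.3f}")

# Score at which the two fitted lines give the same prediction:
# 9.070 + 0.097 * x = 8.571 + 0.102 * x.
crossover = (9.070 - 8.571) / (0.102 - 0.097)
print(f"lines cross at bytest = {crossover:.1f}")
```

Below the crossing point the black = 1 line predicts more years of education for the same score; above it the ordering reverses.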
h.) Is the relationship between test scores & completed education causal? Causality means that a specific
action leads to a specific, measurable consequence, i.e., a causal relation between two events exists if
the occurrence of the first causes the other. In considering whether test scores and completed
education have a causal relationship, we must bear in mind that an association (i.e., correlation)
between two variables does not mean that one causes the other. We have determined that these
variables are positively correlated; now we must consider whether they are causally related.
However, this is not the case in the given data, as the factors that impact ed also impact bytest. For
example, the variable black explains 1.081% of variation in ed, but also explains 9.902% of variation
in bytest. This indicates that the variable black, which impacts ed, is not unrelated to bytest,
therefore violating this first assumption.
2. (Xi, Yi) should be independent and identically distributed across observations. I.e., the (bytest, ed)
pairs should be i.i.d. This assumption is fulfilled by random sampling, and therefore is met in this case.
3. Large outliers are unlikely. This involves an assumption of finite kurtosis for both Xi (bytest) and Yi
(ed), which is plausible in this case. For example, bytest is capped, as the best you can do in a
standardised test is full marks, and the worst is no marks. Additionally, ed is capped within the
range of possibility, which in this case, is 12-18 years. Because these variables have finite ranges,
they also both adhere to the assumption that Xi and Yi have nonzero finite fourth moments.
Even though this data adheres to the second and third assumptions of causality, we cannot conclude that the
relationship between test scores and completed education uncovered in (c) and (f) is causal. This is due to the
fact that this relationship appears to violate the first assumption.
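One way to make this concern concrete is the large-sample omitted-variable bias expression. This assumes the first assumption is the standard conditional-mean-zero condition $E(u_i \mid X_i) = 0$, which is not stated explicitly above. Here $X$ is bytest, $u$ collects the other determinants of ed (such as black), and $\rho_{Xu}$, $\sigma_u$ and $\sigma_X$ are notation introduced only for this illustration:

```latex
\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
```

If $\rho_{Xu} \neq 0$, i.e., if the omitted determinants of ed are correlated with bytest, the OLS slope does not converge to the causal effect $\beta_1$, however large the sample.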