Data Analysis Exam
$R^2 = 0.321$
a- Interpret the estimated regression model and the value of the R-squared
Intercept: $\hat{\beta}_0 = 12.6$: The average number of absenteeism days per year in
this company is 12.6 when workers are totally unsatisfied.
Slope: $\hat{\beta}_1 = -1.2$: Absenteeism falls by 1.2 days on average when the
satisfaction level at work increases by 1. When the worker is completely satisfied,
that is, when $x = 10$, the model predicts that she will be absent 0.6 days.
R-squared: work satisfaction helps explain 32.1% of the variability of absenteeism
days for this sample of workers.
b- Test the null hypothesis that work satisfaction does not produce any significant effect
on labour absenteeism at a 1% significance level.
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
$$t = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} \sim t_{45-2}$$
$$t = \frac{-1.2}{0.088} = -13.63$$
The critical value of the test at the 1% significance level is 2.58 (the standard
normal approximation).
DATA ANALYSIS FOR ECONOMICS: PS3
As a result, |−13.63| > 2.58, so we reject the null hypothesis at the 1%
significance level. We conclude that work satisfaction is a statistically significant
determinant of absenteeism (at the 1% significance level).
c- The level of work satisfaction of a different worker is 6. Find the predicted labour
absenteeism days per year for this worker.
Using the OLS regression function: a worker from that company who reports a
satisfaction level of 6 is predicted to be absent from work 5.4 days per year.
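The prediction in (c) is just the fitted line from (a) evaluated at x = 6. A minimal Python sketch, using the estimates quoted above; the function and constant names are illustrative and not part of the original solution:

```python
# Estimated coefficients from part (a): absenteeism = 12.6 - 1.2 * satisfaction
B0_HAT = 12.6   # intercept: predicted days absent at satisfaction 0
B1_HAT = -1.2   # slope: change in days absent per one-point rise in satisfaction


def predict_absenteeism(satisfaction):
    """Predicted absenteeism days per year for a given satisfaction level."""
    return B0_HAT + B1_HAT * satisfaction


# Worker in part (c) and the fully satisfied worker mentioned in part (a)
for level in (6, 10):
    print(f"satisfaction = {level}: {predict_absenteeism(level):.1f} days")
```

Evaluating the function at 6 and at 10 reproduces the 5.4 and 0.6 days reported in the text.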
d- In your opinion, explain one application of the above model from the perspective of
the Human Resources department of the company.
Check work satisfaction frequently in order to try to motivate those workers who
report being unsatisfied, and give a premium to those who are already satisfied.
CURIOSITY: HENRY FORD CASE STUDY
Perhaps Henry Ford was the first to make full use of efficiency-wage theory.
The Ford Motor Company began to pay its workers $5.00 per day in 1914,
when the average wage was between $2.00 and $3.00 per day. This
significantly increased the number of people waiting in line for a job
at the company. Henry Ford believed that paying above the equilibrium wage
would secure the business for the future.
He seemed to think that paying his workers a higher wage would inevitably
lower costs. And evidence shows that this has been the case in production for the
Ford Motor Company ever since. Worker productivity increased across the board
because workers knew they would not find that level of pay
anywhere else. It created an incentive for them to stay with the Ford company and
work hard.
2 Consider a SLRM relating the annual number of crimes on college campuses (crime)
to student enrollment (enroll) with the following estimation results:
b- Carry out a two-tailed test to determine whether the variable enroll should be
included in the regression model (at the 1% significance level).
$$t_{\hat{\beta}_1} = \frac{1.27}{0.11} = 11.54$$
We reject the null hypothesis since 11.54 > 2.58 at 1% significance level and
therefore, enrollment is statistically significant in explaining the behavior of crimes.
In the SLRM, this test coincides with the individual significance test in (b). We showed
in (b) that the variable is significant at the 1% level (and hence also at 5%), which is
equivalent to saying that the model is useful.
Since $F_{1,95}(0.05) = 3.94$, we reject the null hypothesis. Therefore, the model is
useful at the 5% significance level.
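With a single regressor, the overall F statistic equals the square of the slope's t-ratio, which is why the two tests agree. A minimal Python sketch using the figures quoted above; the variable names are illustrative, and the F value is computed here rather than taken from the text:

```python
# Slope estimate and standard error for enroll, as quoted in part (b)
b1_hat = 1.27
se_b1 = 0.11

t_enroll = b1_hat / se_b1   # individual significance t-ratio
f_overall = t_enroll ** 2   # in a simple regression, F = t^2

F_CRIT_5PCT = 3.94          # F(1, 95) critical value at 5%, from the text

print(f"t = {t_enroll:.3f}, F = {f_overall:.2f}")
print("Reject H0 (model is useful):", f_overall > F_CRIT_5PCT)
```

The implied F statistic is far above 3.94, in line with the conclusion drawn from the t-test.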
3 Consider the model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i$
a- $\beta_2 = 0$
I would perform an individual two-tailed t-test for the second explanatory variable
because it is about testing the individual significance of the second explanatory
variable.
a- Interpret the OLS slope coefficient of the SLRM. Is size statistically significant?
When property size increases by one square meter, price increases by 1.2% on
average.
The t-ratio of the variable floorm2 is 0.012/0.0005 = 24, which is much
greater than the standard normal critical value at the 1% significance level (2.58).
Size is therefore statistically significant in explaining property prices in London.
When there is a 1% increase in the distance from the property to the city center, on
average price decreases by 0.291%, keeping size constant.
The t-ratio of the variable log(dholborn) is equal to -17.12 which in absolute value is
much greater than the critical value of the standard normal with 1% significance level
(2.58), which implies that distance is statistically significant to explain property prices
in London.
c- Predict the price of a 120 square meter property that is 7 km away from the Holborn
station using Model 2.
$$\widehat{\log(price)} = 11.68 + 0.0104 \cdot 120 - 0.291 \cdot \log(7)$$
$$\widehat{\log(price)} = 11.68 + 1.248 - 0.566 = 12.36$$
$$\widehat{price} = e^{12.36} = 233{,}281.23 \text{ pounds}$$
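The same prediction can be reproduced numerically. This is a minimal Python sketch using the Model 2 coefficients quoted above; the function name is illustrative, and it rounds the predicted log price to two decimals before exponentiating, as the worked solution appears to do:

```python
import math


def predict_log_price(size_m2, dist_km):
    """Predicted log(price) from Model 2: size in levels, distance in logs."""
    return 11.68 + 0.0104 * size_m2 - 0.291 * math.log(dist_km)


log_price = predict_log_price(120, 7)

# Round to two decimals first to match the hand calculation in the text;
# exponentiating the unrounded value gives a slightly higher price.
price_rounded = math.exp(round(log_price, 2))
price_unrounded = math.exp(log_price)

print(f"predicted log(price): {log_price:.4f}")
print(f"price from rounded log: {price_rounded:,.2f} pounds")
print(f"price from unrounded log: {price_unrounded:,.2f} pounds")
```

The gap between the two prices shows how sensitive an exponentiated prediction is to rounding in the log scale.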
d- How does the effect of size on property prices change with respect to the estimation
result in the SLRM (section a)? Why?
The unrestricted model is Model 3 (k = 4), and the restricted model is Model 2. Thus:
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} = \frac{(117.803 - 113.431)/2}{113.431/(1199 - 4 - 1)} = 23.01$$
Since $23.01 > F_{2,1194}(0.01) = 4.62$, we reject the null hypothesis at the 1% significance
level and therefore, age and buyage are jointly statistically significant. We prefer the
unrestricted version of the model.
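The F statistic for these exclusion restrictions can be checked with a few lines of Python. This is a sketch under the figures quoted above; the function name and argument names are illustrative, with `q` the number of restrictions and `k` the number of regressors in the unrestricted model:

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for q exclusion restrictions.

    n is the sample size and k the number of regressors in the
    unrestricted model (so its residual degrees of freedom are n - k - 1).
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


# Model 2 (restricted) against Model 3 (unrestricted), values from the text
f_stat = f_statistic(117.803, 113.431, q=2, n=1199, k=4)
F_CRIT_1PCT = 4.62  # F(2, 1194) critical value at 1%, from the text

print(f"F = {f_stat:.2f}")
print("Reject H0 at 1%:", f_stat > F_CRIT_1PCT)
```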
Since $F_{4,1194}(0.01) = 3.33$, we reject the null hypothesis. Therefore, the model is
useful at the 1% significance level.
g- If you were part of the team in the consultancy firm and had to choose a model,
which one would you choose and why?
I would choose Model 3, given the answers in (e) and (f). All the tests are consistent
with choosing Model 3; in addition, if we compute the adjusted R-squared, it will be the
highest among the three models.
a- Interpret the slope coefficient in Model 1 and test its individual significance.
If the unemployment rate increases by one percentage point, the poverty rate increases
by 0.731 percentage points on average. This is a realistic result, as the poverty rate
tends to increase when many individuals are without a job.
$$t_U = \frac{0.731 - 0}{0.092} = 7.945$$
Critical value: $c = 2.58$ (two-tailed t-test with 56 degrees of freedom at the 1% level).
We reject the null hypothesis since 7.945 > 2.58 at 1% significance level and
therefore, unemployment rate variable is statistically significant in explaining the
behavior of poverty rate.
$$t_U = \frac{0.721}{0.106} = 6.801$$
We reject the null hypothesis since 6.801 > 2.58 at 1% significance level and
therefore, unemployment rate variable is an individually significant variable
explaining the behavior of poverty rate.
$$t_F = \frac{0.305}{1.742} = 0.175$$
We fail to reject the null hypothesis since 0.175 < 2.58 at 1% significance level and
therefore, family size is not statistically significant at 1% level to explain poverty rate.
First we need to obtain the R-squared, using the relationship between the adjusted
R-squared and the R-squared:
$$0.510 = 1 - \frac{(1 - R^2)(58 - 1)}{58 - 2 - 1}$$
$$0.510 = 1 - (1 - R^2) \cdot 1.0364$$
$$(1 - R^2) \cdot 1.0364 = 0.49$$
$$R^2 = 0.527$$
Since $F_{2,55}(0.01) = 5.01$, we reject the null hypothesis. Therefore, $U$ and $F$ are
jointly statistically significant at the 1% level (the model is useful).
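The two steps (recovering R-squared from the adjusted R-squared, then testing overall significance) can be sketched in Python. The function names are illustrative, and the overall F value printed here is computed from the recovered R-squared using the standard R-squared form of the F statistic; it is not a figure quoted in the text:

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adj_R2 = 1 - (1 - R2)(n - 1)/(n - k - 1) to recover R2."""
    return 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)


def overall_f(r2, n, k):
    """Overall significance F statistic written in terms of R2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Model with two regressors (U and F), n = 58, adjusted R2 = 0.510
r2 = r2_from_adjusted(0.510, n=58, k=2)
f_stat = overall_f(r2, n=58, k=2)
F_CRIT_1PCT = 5.01  # F(2, 55) critical value at 1%, from the text

print(f"R2 = {r2:.3f}")
print(f"F = {f_stat:.2f}, reject H0 at 1%: {f_stat > F_CRIT_1PCT}")
```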
The effect is positive because more individuals in a household, holding everything else
constant, means fewer resources and, consequently, a higher probability of
poverty. It is not significant because this effect may not be very
relevant for understanding the variability of the poverty rate. Indeed, the
adjusted coefficients of determination of Model 1 and Model 2 are the same. That is,
adding the famsize variable does not increase the explanatory power of the second model
compared with the first one. This is consistent with the famsize variable not being
individually significant.
d- In Model 3 we add two new explanatory variables: EDU and URBAN. Test whether
this inclusion helps to improve the quality of the model. Is model 3 the best in terms
of goodness-of-fit? Why?
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} = \frac{(421.457 - 374.675)/2}{374.675/(58 - 4 - 1)} = 3.308$$
Since $3.308 > F_{2,53}(0.05) = 3.17$, we reject the null hypothesis at the 5% significance
level; the two new explanatory variables introduced in the third model
are therefore jointly statistically significant. That is, including them
helps us to better understand the behavior of the dependent variable.
Our preferred specification is therefore the third model.
Yes, Model 3 is the best in terms of explanatory power because its adjusted
coefficient of determination is larger than those of Model 1 and Model 2.
This is consistent with the result of the F-test above.
e- Are the effects of these two new variables the expected ones?
The effects are the expected ones. More education means a better chance of having a job
and therefore of avoiding poverty (education has a negative effect on the poverty
rate). Living in an urban area may imply more opportunities and therefore a lower
poverty rate than in rural areas (urban has a negative effect on the poverty rate).
$$t_U = \frac{0.424}{0.166} = 2.554$$
We fail to reject the null hypothesis since 2.554 < 2.58 at 1% significance level and
therefore, unemployment rate is not statistically significant in Model 3.
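To see how the individual significance results compare across the three models, the t-ratios can be recomputed side by side. This is a minimal Python sketch; the estimates and standard errors are those quoted in the text, while the labels (in particular, assigning the second pair of tests to Model 2) are inferred from context and should be read as assumptions:

```python
CRITICAL_VALUE = 2.58  # two-tailed 1% critical value used throughout the text

# (label, coefficient estimate, standard error); labels are inferred from context
estimates = [
    ("unemployment, Model 1", 0.731, 0.092),
    ("unemployment, Model 2", 0.721, 0.106),
    ("family size, Model 2", 0.305, 1.742),
    ("unemployment, Model 3", 0.424, 0.166),
]


def t_ratio(estimate, std_error, null_value=0.0):
    """t statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / std_error


results = {}
for label, b_hat, se in estimates:
    t = t_ratio(b_hat, se)
    reject = abs(t) > CRITICAL_VALUE
    results[label] = (t, reject)
    print(f"{label}: t = {t:.3f}, reject H0 at 1%: {reject}")
```

The unemployment coefficient in Model 3 falls just short of the 2.58 threshold, so the conclusion there depends on the chosen significance level.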