Group Assignment

Debre Birhan University
College of Business and Economics
Department of Management, (MBA) Extension Program
Basic Econometrics (MBA631)
Group Assignment
Prepared by: - Section B, Group 4 Members

1) Semon Worku
2) Bisrat Demere
3) Rahel Zewudu
4) Tizitawu Lema
5) Tamirat Fikre
6) Girum Lakewu
7) Tadelech Dejene
Submitted to – Dr. Muhammed Siraj

Submission Date – 19 March 2022
1. Suppose you are interested in investigating the factors that determine the fertility rate,
particularly among women living in the developing world. After reviewing the literature, you come
up with the following model:
$Children_i = \beta_0 + \beta_1 educ_i + \beta_2 age_i + \beta_3 income_i + \varepsilon_i$
Where:
Children: the number of children women have.
educ: level of education of women
age: age of women
income: annual income of women in US dollar
A. (5 points) Discuss what sign you expect for the relationship between the number of children a
woman has (fertility) and each of the specified independent variables. What shape of the
relationship do you expect? What other factors might influence women’s fertility, and why?
Answer
A) Getting a better education takes time and effort, so having children at the same time is
difficult because handling both is very hard. We are not saying it is unmanageable, but only
a few women have the endurance to do it, and in this model those cases are absorbed by the
error term. The literature shows that the higher a woman’s educational attainment, the
fewer children she is likely to bear. Fewer children per woman, together with delayed
marriage and childbearing, can also mean more resources per child and better health and
survival rates for mothers and children. So we expect the coefficient on education to have
a negative sign when the model is estimated on the data we are given.
B) As age goes up, women have a better chance of having children. Of course there is a
limit beyond which they can no longer have children, but we are focusing on the range in
which they can conceive, and as they mature their bodies become more capable of bearing
children. So we expect the coefficient on age to have a positive sign when the model is
estimated on the data we are given.
C) The relationship between income and fertility is the association between monetary gain
on the one hand and the tendency to produce offspring on the other. There is generally an
inverse correlation between income and the total fertility rate within and between nations.
The higher the degree of education and GDP per capita of a human population,
subpopulation or social stratum, the fewer children are born in any developed country.
Many studies find that fertility is lower among better-educated women and is often higher
among women whose families own more land and assets. So we expect the coefficient on
income to have a negative sign when the model is estimated on the data we are given.
There are many other factors that influence women’s fertility. Some of them are presented
below.
Alcohol
Some studies report that drinking more than 5 units of alcohol a week may reduce female fertility
but others state that low to moderate alcohol consumption may be associated with higher
pregnancy rates than non-drinkers. Once pregnant, excessive alcohol consumption may lead to
birth defects and developmental delay. The Royal College of Obstetricians and Gynecologists
and the Department of Health recommend that women trying to get pregnant should avoid
alcohol because there is no ‘safe’ limit.
Smoking
Women who smoke are 3 times more likely to experience a delay in getting pregnant than non-
smokers. Even passive smoking can be harmful. Smoking reduces a woman’s ovarian reserve
(so her ovaries will have fewer eggs in them than a woman of the same age who does not smoke)
and damages the cilia inside the fallopian tube (which are important for transporting the egg
and/or embryo along the fallopian tube into the uterus). In men, smoking may reduce sperm
quantity and quality.
Previous Pregnancy
Couples are more likely to get pregnant if they have previously achieved a pregnancy together
(irrespective of whether or not that pregnancy resulted in the birth of a baby) compared to
couples that have never been pregnant.
Duration of subfertility
The longer couples have been trying to get pregnant, the less likely they are to be successful. If a
couple have been trying to get pregnant for less than 3 years they are almost twice as likely to get
pregnant than couples who have been trying for more than 3 years.
Over-the-counter and Recreational Drugs
Non-steroidal anti-inflammatory drugs such as ibuprofen can interfere with ovulation. Aspirin
may interfere with implantation. Recreational drugs such as marijuana and cocaine may interfere
with ovulation and/or the function of the fallopian tube. The fallopian tube is important for
transporting the egg from the ovary where it is released, to the womb (uterus) where an embryo
will hopefully implant. Fertilization occurs in the fallopian tube. Anabolic steroids, which are
abused by some body-builders, inhibit the production of sperm and this may be permanent even
if the drug is stopped.
Medical Conditions
Some women may have medical conditions that can affect their fertility. These may or may not
be known about when starting to try for a family. Some of these conditions may be more
general, for example thyroid disease and vitamin D deficiency whilst others may be more
specific, for example, polycystic ovary syndrome and endometriosis.
B. (10 points) Suppose one of your colleagues provides you with data that were collected in
Botswana. The data consist of children, education, age and living place (place) of sample
women. Using the data ‘fertile’:
i. Estimate the model and interpret the coefficients.
ii. Test each estimated coefficient using t-test at 5% significance level.
iii. Test the overall significance of the model.
iv. What could you say about the overall results?
v. Conduct the Breusch-Pagan test and White test for heteroscedasticity. Do the results justify the
use of robust standard errors? If yes, re-estimate the model using robust standard errors and
comment on the difference with respect to the model estimated by simple OLS.
vi. Test whether multicollinearity is present in the data and comment on the results.
Answers
For this part we used gretl to run the regression, and we interpret the results as follows.
Model 2: OLS, using observations 1-4246
Dependent variable: children

             Coefficient      Std. Error      t-ratio    p-value
const        −1.95380         0.0949597       −20.58     <0.0001   ***
educ         −0.0883125       0.00600781      −14.70     <0.0001   ***
age           0.172671        0.00272876       63.28     <0.0001   ***
income       −1.45419e-07     2.91661e-07     −0.4986     0.6181

Mean dependent var     2.259303      S.D. dependent var     2.190434
Sum squared resid      9074.411      S.E. of regression     1.462594
R-squared              0.554466      Adjusted R-squared     0.554151
F(3, 4242)             1759.722      P-value(F)             0.000000
Log-likelihood        −7637.191      Akaike criterion       15282.38
Schwarz criterion      15307.80      Hannan-Quinn           15291.36
After obtaining the coefficients of our explanatory variables, we can write the estimated model:
Children_i = −1.95380 − 0.0883125 educ_i + 0.172671 age_i − 1.45419e-07 income_i
I) Interpretation
• The constant, −1.95380, is the predicted number of children when education, age and
income are all zero. Such a combination (for example, age zero) is not realistic, so the
constant has no meaningful interpretation on its own.
• If a woman’s education increases by one level, the number of children she has is expected
to decrease by 0.0883125, holding her age and annual income fixed.
• If a woman’s age increases by 1 year, the number of children she has is expected to
increase by 0.172671, holding her level of education and annual income fixed.
• If a woman’s annual income increases by $1, the number of children she has is expected to
decrease by 1.45419e-07 (about 0.00015 fewer children per additional $1,000), holding her
level of education and age fixed. This effect is very small, and the p-value of 0.6181 shows
it is not statistically significant.
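The interpretation of the Model 2 coefficients can be checked numerically. The following is a minimal sketch in Python, assuming the coefficients from the gretl output above; the example woman (8 levels of education, age 30, income $5,000) is hypothetical and not taken from the data set.

```python
# Coefficients copied from the Model 2 (simple OLS) output.
B0, B_EDUC, B_AGE, B_INCOME = -1.95380, -0.0883125, 0.172671, -1.45419e-07

def predict_children(educ, age, income):
    """Fitted number of children from the estimated linear equation."""
    return B0 + B_EDUC * educ + B_AGE * age + B_INCOME * income

# Hypothetical woman (illustrative values only).
base = predict_children(educ=8, age=30, income=5000)
more_educ = predict_children(educ=9, age=30, income=5000)
richer = predict_children(educ=8, age=30, income=6000)

print("fitted children:", round(base, 4))
print("one more level of education:", round(more_educ - base, 7))
print("$1,000 more income:", round(richer - base, 7))
```

Each difference reproduces the corresponding coefficient (times 1,000 in the case of income), which is what "holding the other variables fixed" means in a linear model.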
II) Test each estimated coefficient using t-test at 5% significance level

First we state our hypotheses before comparing tcrt and tcal.
H0: β1 = 0    HA: β1 ≠ 0
    β2 = 0        β2 ≠ 0
    β3 = 0        β3 ≠ 0
We now obtain tcrt from gretl and compare it with tcal for the coefficient of each
explanatory variable.
t(4242)
right-tail probability = 0.025
complementary probability = 0.975
two-tailed probability = 0.05
Critical value = 1.96052

So from gretl we have tcrt = 1.96052
For 𝜷1 which is level of education of women,
tcal = │−14.70│ > tcrt = 1.96052
So we should reject null hypothesis and conclude level of education of women has a significant
effect on the number of children women have. So, in our case, we can say that level of education
is statistically significant.
For 𝜷2 which is age of women,

tcal = 63.28 > tcrt = 1.96052
So we should reject the null hypothesis and conclude that age of women has a significant effect
on the number of children women have. So, in our case, we can say, age of women is statistically
significant.
For 𝜷3 which is annual income of women

tcal = │−0.4986│ < tcrt = 1.96052
So we should not reject the null hypothesis. With our sample data and α = 0.05 we are unable to
show that annual income has a significant effect on the number of children women have. So, in our
case, we can say annual income is statistically insignificant.
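The three t-tests above follow the same rule: divide each coefficient by its standard error and compare the absolute value with tcrt. A short Python sketch of that rule, assuming the coefficients and standard errors from the Model 2 output and the critical value quoted from gretl:

```python
# Critical value quoted in the report from gretl for t(4242), two-tailed 5%.
T_CRIT = 1.96052

# (coefficient, standard error) pairs copied from the Model 2 output.
estimates = {
    "educ": (-0.0883125, 0.00600781),
    "age": (0.172671, 0.00272876),
    "income": (-1.45419e-07, 2.91661e-07),
}

def t_ratio(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se

results = {}
for name, (coef, se) in estimates.items():
    t_cal = t_ratio(coef, se)
    reject = abs(t_cal) > T_CRIT  # True means H0 is rejected at the 5% level
    results[name] = (t_cal, reject)
    print(f"{name}: t = {t_cal:.4f}, reject H0: {reject}")
```

The computed ratios agree with the t-ratio column of the gretl table up to rounding.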
III) Test the Overall Significance of the Model

For the overall significance test we use the F-test and state our hypotheses again.
H0: β1 = β2 = β3 = 0    HA: at least one βi ≠ 0

From the gretl output we have Fcal = 1759.722. We now obtain Fcrt from gretl using α, n and k.
R2 = 0.554466, n = 4246, k = 4
F(3, 4242)
gretl gave a critical value of 3.1192, which corresponds to a right-tail probability of 0.025; at
the 5% level the critical value of F(3, 4242) is about 2.61. Since Fcal = 1759.722 is far above
either value, we reject the null hypothesis. Even though the annual income of women is not
statistically significant on its own, the explanatory variables taken together are significant
determinants of the number of children women have. Also, R2 = 0.554466 shows that the
explanatory variables explain about 55% of the variation in the number of children.
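The F statistic reported by gretl can be recovered from R2, n and k with the usual formula F = (R2/(k−1)) / ((1−R2)/(n−k)). A minimal Python sketch, assuming the values quoted above (the small gap to 1759.722 comes from R2 being rounded to six decimals):

```python
def f_from_r2(r2, n, k):
    """Overall-significance F statistic; k counts the constant as a parameter."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

R2, N, K = 0.554466, 4246, 4
F_CRIT_REPORTED = 3.1192  # critical value quoted in the report from gretl

f_cal = f_from_r2(R2, N, K)
print(f"F({K - 1}, {N - K}) = {f_cal:.3f}")
print("reject H0:", f_cal > F_CRIT_REPORTED)
```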
IV) What Could You Say About the Overall Results
Even though the annual income of women is not statistically significant, the other two variables
(age of women and level of education of women) are statistically significant. Overall, the
variables taken together are significant determinants of the dependent variable. Please bear in
mind that the large number of observations also has a considerable effect, especially on the
t-tests, because large samples produce small standard errors.
V) Conduct the Breusch-Pagan test and white test for heteroscedasticity. Does its result
justify the use of robust standard errors? If yes, re-estimate the model using the
robust standard errors and comment on the difference with respect to the model
estimated by simple OLS.
Let’s first run the Breusch-Pagan test in gretl and look at the results.
A) Breusch-Pagan Test for Heteroscedasticity

OLS, using observations 1-4246
Dependent variable: scaled uhat^2

             coefficient      std. error      t-ratio    p-value
------------------------------------------------------------------------
const        −1.29864         0.114386        −11.35     1.90e-029   ***
educ         −0.0714828       0.00723683      −9.878     9.15e-023   ***
age           0.0997277       0.00328698       30.34     3.61e-183   ***
income       −6.81308e-07     3.51327e-07     −1.939     0.0525      *

Explained sum of squares = 4128.73
Test statistic: LM = 2064.365571,
with p-value = P(Chi-square(3) > 2064.365571) = 0.000000
Since the p-value = 0.000000 < 0.05, the null hypothesis of homoscedasticity is rejected and
heteroscedasticity is assumed.
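The LM statistic above is half of the explained sum of squares from the auxiliary regression of the scaled squared residuals, which is the textbook form of the Breusch-Pagan statistic. A small Python sketch of that arithmetic follows; the 5% chi-square critical value for 3 degrees of freedom (7.815) is a standard table value and is an assumption added here, not part of the gretl output.

```python
ESS = 4128.73          # explained sum of squares from the auxiliary regression
CHI2_CRIT_3DF = 7.815  # standard 5% critical value of chi-square(3) (assumed from tables)

lm = ESS / 2.0         # Breusch-Pagan LM statistic
print(f"LM = {lm:.3f}")
print("reject homoscedasticity:", lm > CHI2_CRIT_3DF)
```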
B) White's Test for Heteroscedasticity
Dependent variable: uhat^2

             coefficient      std. error      t-ratio     p-value
------------------------------------------------------------------------
const        −1.29129         0.828709        −1.558      0.1193
educ          0.123213        0.0689513        1.787      0.0740     *
age           0.0344551       0.0496848        0.6935     0.4880
income       −5.93256e-08     3.98387e-06     −0.01489    0.9881
sq_educ       0.00131759      0.00316440       0.4164     0.6772
X2_X3        −0.00979254      0.00192334      −5.091      3.71e-07   ***
X2_X4        −4.90956e-08     1.93710e-07     −0.2534     0.7999
sq_age        0.00397821      0.000747963      5.319      1.10e-07   ***
X3_X4        −2.82541e-07     9.27295e-08     −3.047      0.0023     ***
sq_income     1.20290e-011    4.48308e-012     2.683      0.0073     ***

Unadjusted R-squared = 0.256997
Test statistic: TR^2 = 1091.209936

Since the p-value = 0.000000 < 0.05, we reject the null hypothesis and conclude that the model
exhibits heteroscedasticity.
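White's statistic is the number of observations times the unadjusted R-squared of the auxiliary regression. A brief Python sketch, assuming the values from the output above; the degrees of freedom equal the nine auxiliary regressors, and the 5% chi-square critical value for 9 degrees of freedom (16.919) is a standard table value added here as an assumption.

```python
N = 4246
R2_AUX = 0.256997       # unadjusted R-squared of the auxiliary regression
N_AUX_REGRESSORS = 9    # levels, squares and cross products, excluding the constant
CHI2_CRIT_9DF = 16.919  # standard 5% critical value of chi-square(9) (assumed from tables)

tr2 = N * R2_AUX        # White's TR^2 statistic
print(f"TR^2 = {tr2:.3f} with {N_AUX_REGRESSORS} degrees of freedom")
print("reject homoscedasticity:", tr2 > CHI2_CRIT_9DF)
```

The tiny difference from the gretl figure comes from the R-squared being rounded to six decimals.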
Both tests indicate heteroscedasticity. Robust standard errors do not remove the
heteroscedasticity itself, but they keep the standard errors, and therefore the t-tests and
F-test, valid without our having to know what form the heteroscedasticity takes. So their use is
justified here.
So now let’s re-estimate the model with heteroscedasticity-robust standard errors in gretl.

Model 8: OLS, using observations 1-4246
Heteroskedasticity-robust standard errors, variant HC1

             Coefficient      Std. Error      t-ratio    p-value
const        −1.95380         0.0952919       −20.50     <0.0001   ***
educ         −0.0883125       0.00618041      −14.29     <0.0001   ***
age           0.172671        0.00344708       50.09     <0.0001   ***
income       −1.45419e-07     2.41995e-07     −0.6009     0.5479

F(3, 4242)             965.2803      P-value(F)             0.000000
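One way to deal with heteroscedasticity is to apply robust standard errors. The coefficient
estimates are identical to those from simple OLS; only the standard errors, t-ratios and p-values
change. Under heteroscedasticity the simple OLS standard errors are not consistent, whereas the
robust standard errors are consistent even when the error term is heteroscedastic. Here the
robust standard error of age is noticeably larger (0.00344708 against 0.00272876) and that of
income is smaller, but the conclusions are the same: education and age remain significant and
income remains insignificant. This method is most reliable when the number of observations is
large, as in our case.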
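The difference between the two sets of results can be summarised by the ratio of robust to simple OLS standard errors. The sketch below is a plain Python comparison using the numbers from the two gretl tables; it does not recompute the HC1 estimator itself.

```python
# Coefficients are identical in both models; only the standard errors differ.
coef = {"const": -1.95380, "educ": -0.0883125, "age": 0.172671, "income": -1.45419e-07}
se_ols = {"const": 0.0949597, "educ": 0.00600781, "age": 0.00272876, "income": 2.91661e-07}
se_hc1 = {"const": 0.0952919, "educ": 0.00618041, "age": 0.00344708, "income": 2.41995e-07}

ratios = {}
t_robust = {}
for name in coef:
    ratios[name] = se_hc1[name] / se_ols[name]   # above 1 means OLS understated the SE
    t_robust[name] = coef[name] / se_hc1[name]   # t-ratio based on the robust SE
    print(f"{name}: SE ratio = {ratios[name]:.3f}, robust t = {t_robust[name]:.2f}")
```

A ratio well above 1, as for age, shows where the simple OLS standard error was too optimistic.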
VI) Test whether there is presence of multicollinearity in the data and comment the
results.
Collinearity is a linear association between two predictors. Multicollinearity is a situation where
two or more predictors are highly linearly related. In general, an absolute correlation coefficient
of >0.7 among two or more predictors indicates the presence of multicollinearity.
First let’s check with correlation matrix and see our explanatory variables relationship with one
another.
Correlation coefficients, using the observations 1 - 4246
5% critical value (two-tailed) = 0.0301 for n = 4246

    age       educ      income
    1.0000   -0.3043   -0.0229    age
              1.0000   -0.0014    educ
                        1.0000    income
In the table above we can clearly see that no pair of variables is highly linearly related: every
absolute correlation coefficient is below 0.7. So we can conclude that the data for this model do
not show multicollinearity.
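The screening rule used above (flag any pair whose absolute correlation exceeds 0.7) is easy to express in code. A minimal Python sketch, assuming the three pairwise correlations from the matrix above and the 0.7 threshold stated in the text:

```python
THRESHOLD = 0.7  # rule-of-thumb cut-off used in this report

# Off-diagonal entries of the correlation matrix for model B.
corr = {
    ("age", "educ"): -0.3043,
    ("age", "income"): -0.0229,
    ("educ", "income"): -0.0014,
}

flagged = [pair for pair, r in corr.items() if abs(r) > THRESHOLD]
largest_pair, largest_r = max(corr.items(), key=lambda item: abs(item[1]))

print("pairs above threshold:", flagged)
print("largest absolute correlation:", largest_pair, abs(largest_r))
```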
C. (10 points) Suppose you redefine the model as:

$Children_i = \beta_0 + \beta_1 educ_i + \beta_2 age_i + \beta_3 age_i^2 + \beta_4 \ln(income_i) + \beta_5 D\_place_i + \varepsilon_i$
Where:
$age_i^2$: woman’s age in squared form,
ln(income): woman’s annual income in logarithmic form
D_place: dummy variable equal to 1 for women living in urban areas and 0 for women living in rural areas.
i. Define the variables you need and estimate the equation.
ii. Interpret the coefficients
iii. Test each estimated coefficient using t-test at 5% significance level.
iv. Test the overall significance of the model.
v. What could you say about the overall results?
vi. Test the presence of heteroscedasticity and multicollinearity problems in the data.
vii. Provide overall comments on the estimated results in model (B) and model (C). Is there any
result difference?
Answers
I) Define the Variables You Need and Estimate the Equation
First let’s regress our data in gretl based on the model we are given; then we can interpret it.

Model 5: OLS, using observations 1-4246

             Coefficient      Std. Error      t-ratio    p-value
const        −2.28426         0.273799        −8.343     <0.0001   ***
educ         −0.0792902       0.00569582      −13.92     <0.0001   ***
age           0.313323        0.0160131        19.57     <0.0001   ***
age2         −0.00257161      0.000263032     −9.777     <0.0001   ***
l_income     −0.0909209       0.0164283       −5.534     <0.0001   ***
place        −0.935078        0.0485165       −19.27     <0.0001   ***

F(5, 4240)             1287.594      P-value(F)             0.000000
After obtaining the coefficients of our explanatory variables, we can write the estimated model:
Children_i = −2.28426 − 0.0792902 educ_i + 0.313323 age_i − 0.00257161 age_i^2
− 0.0909209 ln(income_i) − 0.935078 D_place_i
II) Interpretation
• The constant, −2.28426, is the predicted number of children when all explanatory variables
and the dummy variable are zero. Such a combination is not realistic, so the constant has
no meaningful interpretation on its own.
• If a woman’s education increases by one level, the number of children she has decreases
by 0.0792902, assuming the other explanatory variables and the dummy variable are fixed
(constant).
• Because age enters both linearly and in squared form, the effect of one more year of age is
0.313323 − 2(0.00257161)age, assuming the other explanatory variables and the dummy
variable are fixed (constant). The positive coefficient on age means the number of
children rises with age.
• The negative coefficient on age squared (−0.00257161) means that this increase slows down
as women get older: the number of children rises with age at a decreasing rate.
• Since annual income is in log form, a 1% increase in income is associated with about
0.0909209/100 ≈ 0.0009 fewer children, assuming the other explanatory variables and the
dummy variable are fixed (constant).
• Since our dummy (D_place) is coded as 1 for women living in urban areas and 0 for
women living in rural areas, its coefficient tells us that women living in urban areas have
0.935078 fewer children than women living in rural areas, assuming the other explanatory
variables are fixed (constant).
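With age entering both linearly and squared, and income entering in logs, the effects are easier to see with a few numbers. The Python sketch below assumes the Model C coefficients from the gretl output; the evaluation points (age 30, a 10% rise in income) are illustrative choices, not values from the assignment.

```python
import math

B_AGE, B_AGE2, B_LN_INCOME = 0.313323, -0.00257161, -0.0909209

def marginal_effect_of_age(age):
    """Derivative of fitted children with respect to age: b2 + 2 * b3 * age."""
    return B_AGE + 2.0 * B_AGE2 * age

# Age at which the fitted quadratic in age reaches its maximum.
turning_point = -B_AGE / (2.0 * B_AGE2)

# Change in fitted children for a 10% rise in income (exact log difference).
effect_10pct_income = B_LN_INCOME * math.log(1.10)

me_30 = marginal_effect_of_age(30)
print(f"marginal effect of age at 30: {me_30:.4f}")
print(f"turning point of the age profile: {turning_point:.2f} years")
print(f"effect of 10% higher income: {effect_10pct_income:.5f}")
```

The turning point of roughly 61 years lies beyond childbearing age, so over the relevant range the fitted number of children keeps rising with age, only more slowly.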
III) Test each estimated coefficient using t-test at 5% significance level
First we state our hypotheses before comparing tcrt and tcal.

Ho: 𝜷1 = 0 HA: 𝜷1 ≠ 0
𝜷2 = 0 𝜷2 ≠ 0
𝜷3 = 0 𝜷3 ≠ 0
𝜷4 = 0 𝜷4 ≠ 0
𝜷5 = 0 𝜷5 ≠ 0
We now obtain tcrt from gretl and compare it with tcal for the coefficient of each
explanatory variable.
t(4240)
two-tailed probability = 0.05
So from gretl we have tcrt = 1.96052. Now let’s find tcal for the explanatory variables.
For 𝜷1 which is level of education of women,
tcal = │−13.92│> tcrt = 1.96052
So, we should reject null hypothesis and conclude level of education of women has a significant
effect on the number of children women have. So, in our case, we can say that level of education
is statistically significant.
For 𝜷2 which is age of women,

tcal = 19.57 > tcrt = 1.96052
So, we should reject the null hypothesis and conclude age of women has a significant effect on
the number of children women have. So, in our case, we can say that age of women is
statistically significant.
For 𝜷3 which is squared form of age of women,

tcal = │−9.777│> tcrt = 1.96052
So, we should reject the null hypothesis and conclude that squared form of age of women has a
significant effect on the number of children women have. So, in our case, we can say that
squared form of age of women is statistically significant.
For 𝜷4 which is logarithmic form of annual income of women,

tcal = │−5.534│> tcrt = 1.96052
So, we should reject the null hypothesis and conclude that logarithmic form of annual income of
women has a significant effect on the number of children women have. So, in our case, we can
say that logarithmic form of annual income of women is statistically significant.
For β5, which is the dummy variable for place (1 for women living in urban areas and 0 for
women living in rural areas),
tcal = │−19.27│> tcrt = 1.96052
So, we should reject the null hypothesis and conclude that the dummy variable for the place
where women live has a significant effect on the number of children women have. So, in our
case, we can say that the dummy variable for place is statistically significant. Note that
D_place represents women living in urban areas explicitly, and women living in rural areas
are the reference group.
IV) Test the Overall Significance of the Model

For the overall significance test we use the F-test and state our hypotheses again.
H0: β1 = β2 = β3 = β4 = β5 = 0    HA: at least one βi ≠ 0

From the gretl output we have Fcal = 1287.594. We now obtain Fcrt from gretl using α, n and k.
R2 = 0.602921, n = 4246, k = 6.
F(5, 4240)
gretl gave a critical value of 2.56948, which corresponds to a right-tail probability of 0.025;
at the 5% level the critical value of F(5, 4240) is about 2.21. Since Fcal = 1287.594 is far above
either value, we reject the null hypothesis H0: β1=β2=β3=β4=β5=0 and conclude that level of
education, age, squared age, the logarithm of annual income and the dummy variable for place
are jointly significant determinants of the number of children women have. Also, R2 = 0.602921
shows that the five explanatory variables explain about 60% of the variation in the dependent
variable. So, we can deduce that the model is adequate.
V) What Could You Say about the Overall Results?

Overall, the second model contains explanatory variables that are all significant determinants
of the number of children women have. Each explanatory variable individually, and all of them
jointly, are statistically significant. The model also includes a dummy variable that tests
whether the area where a woman lives affects the number of children she has, and the regression
gives a clear result: women in urban areas have fewer children than women in rural areas.
VI) Test the Presence of Heteroscedasticity and Multicollinearity Problems in the Data
For heteroscedasticity, let’s use White’s test and look at the result.
Ho: our model is homoscedastic
HA: our model is heteroscedastic
White's test for heteroscedasticity

Dependent variable: uhat^2
Omitted due to exact collinearity: sq_age

             coefficient      std. error      t-ratio     p-value
------------------------------------------------------------------------
const         10.3229         7.12607          1.449      0.1475
educ           0.198344       0.181470         1.093      0.2745
age           −0.461700       0.850171        −0.5431     0.5871
age2           0.0221074      0.0426874        0.5179     0.6046
l_income      −1.63365        0.579510        −2.819      0.0048     ***
place         −2.39795        1.41665         −1.693      0.0906     *
sq_educ        0.000517171    0.00259712       0.1991     0.8422
X2_X3         −0.0145600      0.0106214       −1.371      0.1705
X2_X4          9.71392e-05    0.000167706      0.5792     0.5625
X2_X5          0.00488549     0.00939719       0.5199     0.6032
X2_X6         −0.0328336      0.0281863       −1.165      0.2441
X3_X4         −0.000152693    0.000947021     −0.1612     0.8719
X3_X5          0.0280652      0.0269200        1.043      0.2972
X3_X6         −0.0558723      0.0818270       −0.6828     0.4948
sq_age2        1.53985e-08    7.59735e-06      0.002027   0.9984
X4_X5         −0.000905417    0.000434345     −2.085      0.0372     **
X4_X6          0.00129852     0.00130627       0.9941     0.3203
sq_l_income    0.0597375      0.0201752        2.961      0.0031     ***
X5_X6          0.334951       0.0837428        4.000      6.45e-05   ***

Unadjusted R-squared = 0.262407
Test statistic: TR^2 = 1114.181091

Since the p-value = 0.000000 < 0.05, we reject the null hypothesis and conclude that the model
exhibits heteroscedasticity.
So now let’s re-estimate the model with gretl’s heteroskedasticity correction.
Model 11: Heteroskedasticity-corrected, using observations 1-4246

             Coefficient      Std. Error      t-ratio    p-value
const        −2.42249         0.181675        −13.33     <0.0001   ***
educ         −0.0657919       0.00419762      −15.67     <0.0001   ***
age           0.232143        0.0124751        18.61     <0.0001   ***
age2         −0.00119918      0.000238516     −5.028     <0.0001   ***
l_income     −0.00738102      0.0103147       −0.7156     0.4743
place        −0.454391        0.0363338       −12.51     <0.0001   ***

Statistics based on the weighted data:

F(5, 4240)             1440.066      P-value(F)             0.000000
Statistics based on the original data:
Now let’s check for multicollinearity in our model

Collinearity is a linear association between two predictors. Multicollinearity is a situation where
two or more predictors are highly linearly related. In general, an absolute correlation coefficient
of >0.7 among two or more predictors indicates the presence of multicollinearity.
First let’s check with correlation matrix and see our explanatory variables relationship with one
another.
Correlation coefficients, using the observations 1 - 4246
5% critical value (two-tailed) = 0.0301 for n = 4246

    educ      age       age2      l_income  place
    1.0000   -0.3043   -0.3020    0.0021    0.1690    educ
              1.0000    0.9881   -0.0256   -0.2910    age
                        1.0000   -0.0337   -0.2815    age2
                                  1.0000    0.0438    l_income
                                            1.0000    place
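In the table above, the correlation between age and age2 is 0.9881, well above the 0.7 threshold.
This is expected, because age2 is an exact function of age, and it is the only pair that exceeds
the threshold; every other absolute correlation coefficient is below 0.7. So the model does show
multicollinearity between age and its square, introduced by the functional form rather than by
the data, while the remaining variables are not highly linearly related. Both age terms are
still individually significant, so this collinearity has not prevented their estimation.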
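The correlation of 0.9881 between age and age2 can be translated into a variance inflation factor (VIF). The Python sketch below uses only that pairwise correlation, so it gives a lower bound: the full VIF, which regresses age on all other regressors, can only be larger. The cut-off of 10 is a common rule of thumb and is an assumption added here, not something used in the assignment.

```python
R_AGE_AGE2 = 0.9881   # correlation between age and age2 from the matrix above
VIF_RULE_OF_THUMB = 10.0  # commonly used cut-off (assumption, not from the assignment)

# VIF = 1 / (1 - R^2); with a single other regressor, R^2 is the squared correlation.
vif_lower_bound = 1.0 / (1.0 - R_AGE_AGE2 ** 2)

print(f"VIF for age (lower bound): {vif_lower_bound:.2f}")
print("above rule of thumb:", vif_lower_bound > VIF_RULE_OF_THUMB)
```

A value this high is typical whenever a variable and its square appear together, and it is usually accepted because the squared term is included on purpose.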
VII) Provide Overall Comments On the Estimated Results in Model (B) and Model
(C). Is There Any Result Difference?
In model B, the F-test showed that the model was adequate even though the annual income of
women was not statistically significant. The other two variables (level of education and age of
women) had significant effects on the dependent variable, the number of children women have.
The large number of observations also affected the t-tests of individual significance. Model C
adds the square of age, uses income in logarithmic form, and includes a dummy variable for the
place where women live. In this model all explanatory variables are individually significant
under OLS, and they are jointly significant as well. In the heteroskedasticity-corrected
estimates, however, the logarithm of income is no longer significant (p-value 0.4743). The
dummy variable shows that the area where a woman lives does affect the number of children she
has: women in urban areas have fewer children. Model C also has the higher R-squared (0.602921
against 0.554466). So, generally, model C fits better than model B in terms of explaining the
given data.
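Because model C has more regressors than model B, comparing adjusted R-squared is fairer than comparing R-squared. The Python sketch below computes it from the reported R-squared, n and k; the value for model B matches the gretl output, while the value for model C is computed here and does not appear in the output shown in this report.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared; k counts the constant as a parameter."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

N = 4246
adj_b = adjusted_r2(0.554466, N, 4)  # model B: educ, age, income
adj_c = adjusted_r2(0.602921, N, 6)  # model C: educ, age, age2, l_income, place

print(f"adjusted R-squared, model B: {adj_b:.6f}")
print(f"adjusted R-squared, model C: {adj_c:.6f}")
print("model C fits better:", adj_c > adj_b)
```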
Reference
https://fertilitynetworkuk.org/fertility-faqs/factors-affecting-fertility/
https://en.wikipedia.org/wiki/Income_and_fertility
https://blog.clairvoyantsoft.com/correlation-and-collinearity-how-they-can-make-or-break-a-model