Group Assignment
Answer
A) Education takes considerable time and effort, so a woman who pursues more education may find it difficult to have children at the same time, because handling both is very demanding. We are not saying it is unmanageable, but only a few women manage both or have the endurance to do so. The literature shows that the higher a woman's level of educational attainment, the fewer children she is likely to bear. Fewer children per woman, together with delayed marriage and childbearing, can also mean more resources per child and better health and survival rates for mothers and children. We therefore expect the coefficient for education to have a negative sign when the model is estimated on the data we were given.
B) As age increases, women are more likely to have had children. Of course, there is an age beyond which women can no longer have children, but we focus on the range in which they can conceive, and as they mature their bodies become more capable of having children. We therefore expect the coefficient for age to have a positive sign when the model is estimated on the data we were given.
C) The relationship between income and fertility is the association between monetary gain on the one hand and the tendency to produce offspring on the other. There is generally an inverse correlation between income and the total fertility rate, both within and between nations: the higher the level of education and GDP per capita of a human population, subpopulation or social stratum, the fewer children are born in any developed country. Many studies find that fertility is lower among better-educated women, although it is often higher among women whose families own more land and assets. We therefore expect the coefficient for income to have a negative sign when the model is estimated on the data we were given.
There are many other factors that influence women's fertility. We gathered some of them, and they are presented below.
Alcohol
Some studies report that drinking more than 5 units of alcohol a week may reduce female fertility, but others state that low to moderate alcohol consumption may be associated with higher pregnancy rates than in non-drinkers. During pregnancy, excessive alcohol consumption may lead to birth defects and developmental delay. The Royal College of Obstetricians and Gynaecologists and the Department of Health recommend that women trying to get pregnant should avoid alcohol because there is no 'safe' limit.
Smoking
Women who smoke are 3 times more likely to experience a delay in getting pregnant than non-
smokers. Even passive smoking can be harmful. Smoking reduces a woman’s ovarian reserve
(so her ovaries will have fewer eggs in them than a woman of the same age who does not smoke)
and damages the cilia inside the fallopian tube (which are important for transporting the egg
and/or embryo along the fallopian tube into the uterus). In men, smoking may reduce sperm
quantity and quality.
Previous Pregnancy
Couples are more likely to get pregnant if they have previously achieved a pregnancy together
(irrespective of whether or not that pregnancy resulted in the birth of a baby) compared to
couples that have never been pregnant.
Duration of subfertility
The longer couples have been trying to get pregnant, the less likely they are to be successful. If a couple have been trying to get pregnant for less than 3 years, they are almost twice as likely to get pregnant as couples who have been trying for more than 3 years.
Over-the-counter and Recreational Drugs
Non-steroidal anti-inflammatory drugs such as ibuprofen can interfere with ovulation. Aspirin
may interfere with implantation. Recreational drugs such as marijuana and cocaine may interfere
with ovulation and/or the function of the fallopian tube. The fallopian tube is important for
transporting the egg from the ovary where it is released, to the womb (uterus) where an embryo
will hopefully implant. Fertilization occurs in the fallopian tube. Anabolic steroids, which are
abused by some body-builders, inhibit the production of sperm and this may be permanent even
if the drug is stopped.
Medical Conditions
Some women may have medical conditions that can affect their fertility. These may or may not
be known about when starting to try for a family. Some of these conditions may be more
general, for example thyroid disease and vitamin D deficiency whilst others may be more
specific, for example, polycystic ovary syndrome and endometriosis.
B. (10 points) Suppose one of your colleagues provides you with data that were collected in Botswana. The data consist of children, education, age and living place (place)¹ of sample women. Using the data 'fertile':
i. Estimate the model and interpret the coefficients.
ii. Test each estimated coefficient using t-test at 5% significance level.
iii. Test the overall significance of the model.
iv. What could you say about the overall results?
v. Conduct the Breusch-Pagan test and the White test for heteroscedasticity. Does the result justify the use of robust standard errors? If yes, re-estimate the model using robust standard errors and comment on the difference with respect to the model estimated by simple OLS.
vi. Test whether multicollinearity is present in the data and comment on the results.
Answers
For this part we used gretl to run the regressions, and we interpreted the results as follows.
Schwarz criterion 15307.80 Hannan-Quinn 15291.36
Having obtained the coefficients of the explanatory variables, we can now write the estimated model.
I) Interpretation
The intercept (−1.95380) is the predicted number of children when education, age and annual income are all zero. Because a woman of age zero is outside the relevant range, the intercept has no meaningful practical interpretation.
If a woman's education level increases by one level, the number of children she has decreases by 0.0883125, holding age and annual income fixed.
If a woman's age increases by 1 year, the number of children she has increases by 0.172671, holding education level and annual income fixed.
If a woman's annual income increases by $1, the number of children she has decreases by 3.04710, holding education level and age fixed.
We now obtain tcrt from gretl and compare it with tcal for the coefficient of each explanatory variable.
t(4242): right-tail probability = 0.025, complementary probability = 0.975, two-tailed probability = 0.05, critical value = 1.96052
For 𝜷1, the level of education of women:
tcal = |−14.70| > tcrt = 1.96052
So we reject the null hypothesis and conclude that the level of education of women has a significant effect on the number of children women have. In our case, the level of education is statistically significant.
For 𝜷2, the age of women, |tcal| exceeds tcrt, so we reject the null hypothesis and conclude that the age of women has a significant effect on the number of children women have. In our case, age is statistically significant.
For 𝜷3, annual income, |tcal| does not exceed tcrt, so we do not reject the null hypothesis. With our sample and α = 0.05, we are unable to show that annual income has a significant effect on the number of children women have. In our case, annual income is statistically insignificant.
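A minimal sketch of the decision rule, written in Python for illustration only (the report's computations were done in gretl). `t_test` is a hypothetical helper, and the standard normal quantile from Python's `statistics` module is used only to show that, with 4242 degrees of freedom, the t critical value is practically the same as the normal one.

```python
from statistics import NormalDist

def t_test(t_cal: float, t_crit: float) -> bool:
    """Two-sided t-test: reject H0 (beta_j = 0) when |t_cal| exceeds t_crit."""
    return abs(t_cal) > t_crit

# With 4242 degrees of freedom the t distribution is almost normal, so the
# 5% two-sided critical value is close to the standard normal 97.5% quantile.
z_crit = NormalDist().inv_cdf(0.975)   # about 1.95996
T_CRIT_GRETL = 1.96052                 # t(4242) critical value reported by gretl

print(round(z_crit, 5))
print(t_test(-14.70, T_CRIT_GRETL))    # education coefficient in model B
```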
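F(3, 4242): right-tail probability = 0.025, complementary probability = 0.975
Since Fcrt = 3.1192 < Fcal = 1759.722, we reject the null hypothesis H0: 𝜷1 = 𝜷2 = 𝜷3 = 0. (The critical value above corresponds to a right-tail probability of 0.025; because the F-test is one-tailed, the 5% critical value, at a right-tail probability of 0.05, is smaller, roughly 2.6, so the conclusion is unchanged.) Even though annual income is not individually significant, the explanatory variables jointly help determine the number of children women have. R² = 0.554466 means that the explanatory variables explain about 55.4% of the variation in the dependent variable, the number of children women have.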
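The F statistic of an OLS model with an intercept can be recovered from its R². The Python sketch below uses a hypothetical helper (not gretl's own routine) and reproduces the value for model B up to the rounding of R²; the formula applies only to the ordinary OLS F-test, not to an F statistic computed with robust standard errors.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic computed from R-squared.

    n: number of observations; k: number of slope coefficients (intercept
    excluded). Numerator df = k, denominator df = n - k - 1.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_b = f_from_r2(0.554466, n=4246, k=3)   # model B: F(3, 4242)
print(round(f_b, 1))                     # close to gretl's Fcal = 1759.722
```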
IV) What Could You Say About the Overall Results
Even though annual income is not statistically significant, the other two variables (age and level of education of women) are. Overall, the explanatory variables jointly explain a significant share of the variation in the dependent variable. Bear in mind that the number of observations also strongly affects the t-tests: with 4246 observations the standard errors are small, so even modest effects can turn out to be statistically significant.
V) Conduct the Breusch-Pagan Test and White Test for Heteroscedasticity. Does the Result Justify the Use of Robust Standard Errors? If Yes, Re-estimate the Model Using Robust Standard Errors and Comment on the Difference with Respect to the Model Estimated by Simple OLS.
A) Breusch-Pagan Test for Heteroscedasticity
We first run the Breusch-Pagan test in gretl and examine the results.
Since the p-value (0.000000) < 0.05, the null hypothesis of homoscedasticity is rejected and we conclude that heteroscedasticity is present.
B) White's Test for Heteroscedasticity
OLS, using observations 1-4246
Dependent variable: uhat^2
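Since the p-value (0.000000) < 0.05, we reject the null hypothesis of homoscedasticity and conclude that the model exhibits heteroscedasticity.
Both tests therefore detect heteroscedasticity. Robust standard errors do not remove heteroscedasticity; instead, they keep the standard errors, and hence the t- and F-tests, valid in its presence without requiring us to know its exact form. Since we cannot be sure what form of heteroscedasticity we are facing, the test results justify re-estimating the model with robust standard errors.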
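To make the mechanics of both tests concrete, here is a pure-Python sketch on simulated data (an assumption, not the Botswana sample) whose error spread grows with the regressor. Both statistics are computed in the n·R² form from an auxiliary regression of the squared OLS residuals: on the regressors for Breusch-Pagan (Koenker's studentized version) and on the regressors and their squares for White. With several regressors, White's auxiliary regression would also include cross-products; this sketch has one regressor, so there are none. gretl's default Breusch-Pagan output may use the original, non-studentized version, so its numbers need not match this sketch exactly.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """OLS with an intercept; X is a list of regressor columns.

    Returns (coefficients, residuals, R-squared).
    """
    rows = [[1.0] + [col[i] for col in X] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sum(e * e for e in resid) / sst
    return beta, resid, r2

def breusch_pagan_lm(X, y):
    """Koenker (studentized) Breusch-Pagan LM = n * R^2 of u^2 on the regressors."""
    _, u, _ = ols(X, y)
    _, _, r2_aux = ols(X, [e * e for e in u])
    return len(y) * r2_aux, len(X)        # statistic and chi-square df

def white_lm(x, y):
    """White test for one regressor: auxiliary regressors are x and x^2."""
    _, u, _ = ols([x], y)
    _, _, r2_aux = ols([x, [v * v for v in x]], [e * e for e in u])
    return len(y) * r2_aux, 2

# Simulated data whose error standard deviation grows with x.
random.seed(1)
x = [random.uniform(15, 49) for _ in range(500)]
y = [2 + 0.1 * xi + random.gauss(0, 0.05 * xi) for xi in x]

CHI2_CRIT_1DF, CHI2_CRIT_2DF = 3.841, 5.991   # 5% chi-square critical values
lm_bp, df_bp = breusch_pagan_lm([x], y)
lm_w, df_w = white_lm(x, y)
print(lm_bp > CHI2_CRIT_1DF, lm_w > CHI2_CRIT_2DF)   # reject homoscedasticity?
```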
Sum squared resid 9074.411 S.E. of regression 1.462594
R-squared 0.554466 Adjusted R-squared 0.554151
F(3, 4242) 965.2803 P-value(F) 0.000000
Log-likelihood −7637.191 Akaike criterion 15282.38
Schwarz criterion 15307.80 Hannan-Quinn 15291.36
Robust standard errors are a standard way to deal with heteroscedasticity. The coefficient estimates and R² are identical to those of simple OLS; only the standard errors, and therefore the t-ratios, p-values and F statistic, change. Under heteroscedasticity the usual OLS standard errors are inconsistent, whereas robust standard errors remain consistent even when the error term is heteroscedastic. Because this consistency is a large-sample property, the method is best suited to large samples such as ours (n = 4246).
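The difference between classical and robust standard errors is easiest to see in simple regression, where both have closed forms. This Python sketch uses simulated heteroscedastic data (an assumption, not the report's data) and computes the HC1 variant, which rescales White's HC0 estimator by n/(n − k); the report does not say which variant gretl applied. The slope is the same under both; only its standard error changes.

```python
import random

def ols_simple_with_se(x, y):
    """Simple regression y = b0 + b1*x with classical and HC1 robust slope SEs."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Classical SE assumes one constant error variance s^2 for every observation.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = (s2 / sxx) ** 0.5
    # HC0 (White) sandwich: each squared residual weights its own observation.
    hc0_var = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2
    # HC1 rescales HC0 by n / (n - k), with k = 2 estimated parameters.
    se_hc1 = (hc0_var * n / (n - 2)) ** 0.5
    return b0, b1, se_classical, se_hc1

# Simulated data whose error standard deviation grows with x.
random.seed(7)
x = [random.uniform(15, 49) for _ in range(1000)]
y = [2 + 0.1 * xi + random.gauss(0, 0.05 * xi) for xi in x]

b0, b1, se_ols, se_robust = ols_simple_with_se(x, y)
print(f"slope = {b1:.4f}, classical SE = {se_ols:.4f}, robust SE = {se_robust:.4f}")
```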
VI) Test Whether Multicollinearity Is Present in the Data and Comment on the Results.
Collinearity is a linear association between two predictors. Multicollinearity is a situation where two or more predictors are highly linearly related. As a rule of thumb, an absolute correlation coefficient greater than 0.7 between two or more predictors indicates the presence of multicollinearity.
We first inspect the correlation matrix to see how the explanatory variables relate to one another.
The correlation matrix shows that no pair of explanatory variables is strongly linearly related: every absolute correlation coefficient is below 0.7. We therefore conclude that the data for this model do not exhibit multicollinearity.
i. Define the variables you need and estimate the equation.
ii. Interpret the coefficients
iii. Test each estimated coefficient using t-test at 5% significance level.
iv. Test the overall significance of the model.
v. What could you say about the overall results?
vi. Test the presence of heteroscedasticity and multicollinearity problems in the data.
vii. Provide overall comments on the estimated results in model (B) and model (C). Is there any
result difference?
Answers
I) Define the Variables You Need and Estimate the Equation
We first estimate the given model in gretl and then interpret it.
Having obtained the coefficients of the explanatory variables, we can write the estimated model as follows.
$$\widehat{\text{children}}_i = -2.28426 - 0.0792902\,\text{educ}_i + 0.313323\,\text{age}_i - 0.00257161\,\text{age}_i^2 - 0.0909209\,\ln(\text{income}_i) - 0.935078\,\text{D\_place}_i$$
II) Interpretation
The intercept (−2.28426) is the predicted number of children when all explanatory variables and the dummy variable are zero. Because a woman of age zero is outside the relevant range, the intercept has no meaningful practical interpretation.
If a woman's education level increases by one level, the number of children she has decreases by 0.0792902, holding the other explanatory variables and the dummy variable fixed (constant).
Because age enters both linearly and as a square, the effect of one more year of age is not a single number: it equals 0.313323 − 2 × 0.00257161 × age, holding the other explanatory variables and the dummy variable fixed (constant). The positive coefficient on age (0.313323) and the negative coefficient on age² (−0.00257161) mean that the number of children rises with age, but at a decreasing rate.
Because income enters in logarithmic form, a 1% increase in a woman's annual income changes the number of children she has by about −0.0909209/100 ≈ −0.000909, holding the other explanatory variables and the dummy variable fixed (constant).
Since the dummy (D_place) equals 1 for women living in urban areas and 0 for women living in rural areas, its coefficient tells us that women living in urban areas have, on average, 0.935078 fewer children than women living in rural areas, holding the other explanatory variables fixed (constant).
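The interpretations above can be checked numerically. This Python sketch plugs in the model C coefficients from the estimated equation; `predicted_children` and `age_marginal_effect` are hypothetical helper names, and the example woman (education level 8, age 30, income 500) is made up purely for illustration.

```python
import math

# Model C coefficients, copied from the estimated equation above.
B0, B_EDUC, B_AGE, B_AGE2, B_LNINC, B_URBAN = (
    -2.28426, -0.0792902, 0.313323, -0.00257161, -0.0909209, -0.935078)

def predicted_children(educ, age, income, urban):
    """Fitted number of children from model C (urban = 1, rural = 0)."""
    return (B0 + B_EDUC * educ + B_AGE * age + B_AGE2 * age ** 2
            + B_LNINC * math.log(income) + B_URBAN * urban)

def age_marginal_effect(age):
    """d(children)/d(age) = b_age + 2 * b_age2 * age."""
    return B_AGE + 2 * B_AGE2 * age

turning_point = -B_AGE / (2 * B_AGE2)      # age at which the age effect turns negative
income_1pct = B_LNINC * math.log(1.01)     # effect of a 1% rise in income
urban_gap = (predicted_children(8, 30, 500, 1)
             - predicted_children(8, 30, 500, 0))   # equals B_URBAN

print(round(age_marginal_effect(30), 4), round(turning_point, 1),
      round(income_1pct, 5), round(urban_gap, 6))
```

With these coefficients the estimated effect of age stays positive until the turning point of roughly 61 years, and the urban versus rural gap is the same whatever values the other variables take, as expected for an intercept dummy.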
We now obtain tcrt from gretl and compare it with tcal for the coefficient of each explanatory variable.
t(4240): right-tail probability = 0.025, complementary probability = 0.975, two-tailed probability = 0.05, critical value = 1.96052
So from gretl we have tcrt = 1.96052. Now let’s find tcal for the explanatory variables.
For 𝜷1, the level of education of women:
tcal = |−13.92| > tcrt = 1.96052
So we reject the null hypothesis and conclude that the level of education of women has a significant effect on the number of children women have. In our case, the level of education is statistically significant.
For 𝜷4, the logarithm of annual income of women, |tcal| exceeds tcrt, so we reject the null hypothesis and conclude that the logarithm of annual income has a significant effect on the number of children women have. In our case, the logarithm of annual income is statistically significant.
For 𝜷5, the dummy variable for place (1 for women living in urban areas, 0 for women living in rural areas):
tcal = |−19.27| > tcrt = 1.96052
So we reject the null hypothesis and conclude that the place where women live has a significant effect on the number of children they have. In our case, the place dummy is statistically significant. Note that D_place explicitly represents women living in urban areas, and women living in rural areas are the reference group.
From the gretl output we have Fcal = 1287.594. We now obtain Fcrt in gretl using α, n and k.
R² = 0.602921, n = 4246, k = 6 (five slope coefficients plus the intercept).
F(5, 4240): right-tail probability = 0.025, complementary probability = 0.975
Since Fcrt = 2.56948 < Fcal = 1287.594, we reject the null hypothesis H0: 𝜷1 = 𝜷2 = 𝜷3 = 𝜷4 = 𝜷5 = 0. (As in model B, this critical value corresponds to a right-tail probability of 0.025; the one-tailed 5% critical value is smaller, so the conclusion is unchanged.) We conclude that the level of education, age, age squared, the logarithm of annual income and the place dummy are jointly significant determinants of the number of children women have. R² = 0.602921 means that the five explanatory variables explain about 60.3% of the variation in the dependent variable, so we deduce that the model is adequate.
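As a cross-check, the F statistic can be recovered from R². Here k denotes the number of slope coefficients (5), so that n − k − 1 = 4240 matches the F(5, 4240) degrees of freedom; the k = 6 used above counts the intercept as well.

$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{0.602921/5}{(1-0.602921)/(4246-5-1)} = \frac{0.1205842}{0.0000936507} \approx 1287.6$$

This agrees with gretl's Fcal = 1287.594 up to the rounding of R².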
Since the p-value (0.000000) < 0.05, we reject the null hypothesis of homoscedasticity and conclude that the model exhibits heteroscedasticity.
             coefficient    std. error    t-ratio    p-value
age           0.232143      0.0124751     18.61      <0.0001   ***
age2         −0.00119918    0.000238516   −5.028     <0.0001   ***
l_income     −0.00738102    0.0103147     −0.7156     0.4743
place        −0.454391      0.0363338    −12.51      <0.0001   ***
We first inspect the correlation matrix to see how the explanatory variables relate to one another.
Correlation matrix (excerpt):
             age2      l_income    place
age2        1.0000    -0.0337     -0.2815
l_income              1.0000       0.0438
place                              1.0000
The correlation matrix shows that no pair of explanatory variables is strongly linearly related: every absolute correlation coefficient is below 0.7. We therefore conclude that the data for this model do not exhibit multicollinearity.
VII) Provide Overall Comments On the Estimated Results in Model (B) and Model
(C). Is There Any Result Difference?
In model B, the F-test showed that the model was adequate even though annual income was not statistically significant; the remaining variables (level of education and age of women) significantly influenced the dependent variable, the number of children women have. The large number of observations also affected the t-tests of individual significance. Model C adds age squared, replaces income with its logarithm, and adds a dummy variable for the place where women live. In this model all explanatory variables are statistically significant, both individually and jointly. The dummy variable shows that whether a woman lives in an urban or a rural area affects the number of children she has. Overall, model C explains and fits the given data better than model B (R² = 0.602921 versus 0.554466).
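Because R² can only rise when regressors are added, a fairer comparison of models B and C uses adjusted R², which penalizes extra regressors. This Python sketch uses a hypothetical helper; it reproduces the adjusted R² of 0.554151 that gretl reports for model B and applies the same formula to model C, with k counting slope coefficients only.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared; k is the number of slope coefficients (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj_b = adjusted_r2(0.554466, n=4246, k=3)   # model B
adj_c = adjusted_r2(0.602921, n=4246, k=5)   # model C
print(round(adj_b, 6), round(adj_c, 6))
```

Both models use the same 4246 observations and the same dependent variable, so their adjusted R² values are directly comparable.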
Reference
https://fertilitynetworkuk.org/fertility-faqs/factors-affecting-fertility/
https://en.wikipedia.org/wiki/Income_and_fertility
https://blog.clairvoyantsoft.com/correlation-and-collinearity-how-they-can-make-or-break-a-model