Econometric Stata Report and Example

How to use Stata to carry out an econometric analysis and write a report.

Uploaded by

南侠展昭

STATA

Project 2: Study the effect of being unemployed on health and wellbeing

This assignment requires an empirical analysis of the dataset British Household Panel Survey:
Waves 1-11, 1991-2002: Teaching Dataset (Work, Family and Health). A description of the data is
provided in the BHPS User Guide, each variable is described in the code book, and the
questionnaire is also provided for more detail about each variable. Hint: Look at the variable
ajbstat.

INSTRUCTIONS

• The due date for this assignment is ANYTIME before 19th January 2022, 12pm, midday

• Based on your data analysis, write a short report. There is a 3,500-word limit (tables and
figures excluded).
• Also upload your do-file so that we can see how you have prepared your data and the
details of your analysis.

The following steps might help you in the analysis.

• Identify the outcome or outcomes of interest. What variables can be considered
outcome variables?
• Identify your main explanatory variable or variable of interest.
• Which are the most important controls? Justify your choices. You might want to
consider, for example, the most important socio-demographic information and the
variables relevant for this specific analysis. Be careful about multicollinearity.
• Note: This is a real dataset, and real data implies missing data, coding errors, etc.
Use your common sense to pick and modify the variables if needed.
• Once your variables are ready, use a regression technique to find the effects of X on
Y(s).
• Interpret your results carefully. Your regressions need to make sense (for example,
do you find what you expected?).
• Can your results be interpreted in a causal way? If not, why? How can you obtain
causal effects? Note: If you cannot solve this empirically, provide the rationale for an
empirical model that would allow you to find causal estimates for this specific
research question.
• Write a report that contains a carefully done data analysis structured as follows:
Introduction, Data (a nice table with descriptive statistics on your main variables is
always welcome), Empirical Strategy (write down the model that you will estimate),
Results, and Conclusions.
• Find 2-3 papers in the literature that study a similar research question and use them
to complement your findings. Note: to search for a paper you can go to
scholar.google.com or search, for example, in Journal of Health Economics, Journal
of Development Economics, American Economic Review, Journal of Political
Economy, The Quarterly Journal of Economics, American Economic Journal:
Applied Economics, Journal of Development Studies, World Development, etc.
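• If you want to paste Stata output into Word, use Courier font size 8 to keep it legible. If
you want to include multiple regression outputs in the same table, you can use the
following code (eststo and esttab belong to the user-written estout package, which can
be installed with: ssc install estout):

eststo clear

eststo: reg y x

eststo: reg y x z

esttab, se r2 star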
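
As a starting point for the steps listed above, a minimal do-file sketch might look like the following. It is an illustration under stated assumptions, not a prescribed solution: the file name bhps_teaching.dta, the outcome name ahlstat and the coding of ajbstat (3 = unemployed) are not given in this brief and should be checked against the code book. The negative missing-value codes follow the usual BHPS convention and should also be confirmed.

```stata
* Hypothetical file name; replace with the actual teaching dataset
use "bhps_teaching.dta", clear

* BHPS usually stores missing or inapplicable answers as negative codes (-9 to -1).
* Confirm this in the code book before converting them to Stata missing values.
mvdecode _all, mv(-9/-1)

* Inspect the employment-status variable named in the hint
tabulate ajbstat, missing

* Assumed coding: ajbstat == 3 means unemployed (check the code book)
generate unemployed = (ajbstat == 3) if !missing(ajbstat)

* Hypothetical outcome: ahlstat as self-assessed health (verify name and scale)
summarize ahlstat unemployed
```

Tabulating with the missing option before recoding makes it easier to see how many observations are lost at each step, which is worth reporting in the Data section.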
Assessment Structure

In the summative, the data and the main research question will be provided. The summative
should follow this structure.

1) Introduction (10%)

Research question / motivation / previous literature


How are you going to answer the research questions? (brief)

2) Data (25%)

Explain the dataset:

- Define the dataset and what it collects


- Descriptive: number of individuals, descriptive of main outcomes and covariates with
tables and graphs.
- Missing values
- Transformations of variables (log, categories, square…)
- Potential censoring
- Endogeneity problems

3) Methods (20%)

Econometric models that will be used to answer the research question:

- Describe the models used (assumptions, equations and theoretical justification)

4) Results (25%)

Present the results with graphs and tables. Do not copy-paste the Stata output – prepare nice
tables (like the ones in papers).

Interpret the results (magnitude and significance of the coefficients) of the final model and
alternative specifications.

5) Discussion (20%)

Summary of the results and relate them to previous literature


Discuss alternative model specifications
Critical approach and originality.
Explain the data limitations
Conclusions – further research
Unveiling the relationship between health status and contemporaneous income

ABSTRACT

Objectives: To examine the relationship between an individual’s health status and contemporaneous
income and to investigate the dynamics of individual health by various personal characteristics.

Setting: The analysis uses the British Household Panel Survey (BHPS) teaching dataset from 1991 to
2002, which includes data on 9,912 individuals who were interviewed in wave one and followed for
the subsequent eleven waves.

Methods: A dichotomous variable was constructed to classify individuals as being in “good health”
or in “less than good health”. The relationship between health and income is estimated using pooled
logit and fixed effects logit models, separately for each gender. A random effects model was also
estimated so that the fixed effects model could be properly tested. Fixed effects is an appropriate
approach to control for unobserved heterogeneity among individuals. Two specifications were
developed to model the relationship between income and health. Model 1 includes basic socioeconomic
characteristics. Model 2 builds on model 1 and additionally includes variables that provide
objective information on health situations.

Results: Results from the pooled model suggest that there is a positive relationship between health
and income for both genders and that the relationship is slightly stronger for females. When
individual effects are included, results from fixed effects model 1 suggest that the positive
relationship is still present and significant, but once other information on health situations is
included, the relationship becomes insignificant. A sensitivity analysis using a different threshold
for defining “good health” showed that the results were consistent for females, but not for males.

Conclusion: The relationship between income and health is very complex. The analysis is performed
on the BHPS teaching dataset and attrition bias was not taken into account. In addition, the
analysis did not explore the impact that a previous health state may have on the current one, or
the relationship that permanent income may have with health due to its cumulative effect. Finally,
the chosen measure of health is based on self-reported health status, which may not capture health
status accurately and may therefore contain measurement error. To draw more reliable conclusions,
the analysis should be conducted on a dataset that is more representative of the UK population,
while taking into account the limitations mentioned above, because the results may have important
implications for the allocation of available public resources.

1. Introduction

There is an established positive association between health and socioeconomic status (SES). Deaton
(2002, pg. 14) explains that this relationship is usually referred to as a “gradient” in order to
emphasise the gradual relationship between the two. Thanks to economic development, advances in
technology and improved social conditions, health outcomes are improving over time. Wealthier
individuals are more likely to live in a better environment, have a healthier lifestyle and
generally tend to be in better health. It is not clear whether that is because they have more
income or because they are in better health, which gives them a greater ability to earn income.
Evans, Wolfe and Adler (2012) emphasise that a positive association between health and different
indicators of SES, particularly income, has been identified for individuals of all ages and across
all countries where this relationship has been examined.

The motivation for studying this relationship arises from the important policy implications it
may have for the allocation of available public resources. Population health is always one of
the ultimate goals of public policy. Establishing the relationship between health status and
income provides a signal of where and how to direct resources. If the relationship is positive and
significant, then stronger redistribution policies should be implemented to improve the aggregate
health of the population. In addition, if the health of the population is determined by income, it
is in the government’s interest to design policies aimed at people in deprived areas, who are more
likely to be in poor health. That way they will be able to increase their income levels and secure
a better living environment, which would reflect positively on their health.

The goal of this paper is to gain a better understanding of the relationship between health and
income as an indicator of SES, and of the possible pathways behind this relationship. Although the
primary focus is on the nature of the relationship between current income and health, the analysis
also includes other personal characteristics of individuals in order to assess it properly. The
paper is built on an analysis of the British Household Panel Survey (BHPS) teaching dataset from
1991 to 2002 and includes an econometric analysis of longitudinal data to assess the income-health
relationship and the dynamics of individual health.

2. Income-health relationship

Researchers aim to isolate causal effects, but disentangling causality in the income-health
relationship is challenging for several reasons. The Grossman (1972) model of the demand for health
provides a framework for analysing socioeconomic inequalities and the dynamics of individual
health. Firstly, there are a couple of reasons why income would be important for an individual’s
health. One pathway through which income could affect health is that having more income provides
more routes to good health, such as a safer environment via neighbourhood choice, easier access to
exercise facilities, improved access to health care and better nutrition, all of which reflect
positively on an individual’s health (Evans et al., 2012, pg. 23). For example, Deaton (2002)
argues that the probability of becoming disabled is much higher for less educated, poorer and lower
social class individuals. On the other hand, health can also be a determinant of income. Poor
health can limit someone’s ability to earn, and people in poor health may be discriminated against
by employers, who offer them lower pay because of perceived ability and productivity.
Grossman and Benham (1974) concluded that health affects productivity, which in turn could affect
an individual’s income. Smith (1999, 2004) studied the income-health relationship, also considered
the problem of reverse causation, and found that health events can affect individuals’ health and
wealth. The final scenario is that there could be factors that affect both income and health. For
instance, genetics and motivation can be determinants of both, but these factors are hard to
observe. Because of reverse causality and unobservable factors, it is hard to reveal the causal
effect of income on health. Evans (2002) acknowledges that although the association between health
and income is well known, the causal effect is less clear.
Possible strategies to address causality in the income-health relationship are to exploit a
natural experiment or to study children. Experiments are rare in the social sciences and very often
not allowed or impossible to conduct. Adams et al. (2002, pg. 3) indicate that even though there
have been many debates about the causality behind this association, only a few natural experiments
permit causal pathways to be definitively identified. Lindahl (2005) studied the effect of winning
the lottery to isolate the income effect and found that winners are in better health than losers,
so there is a positive effect of income on health, but there are issues of selection. Frijters et
al. (2005) examined the effect of German reunification on the health of people who lived in East
Germany and found a relatively small health improvement due to the increase in income. Despite
that, Jones et al. (2005) point out that there is a long-term and cumulative effect of income and
other indicators of SES from childhood on the health of adults. The logic behind studying the
relationship in children is that children do not affect household income but are affected by the
SES of their parents. Higher household income implies better living conditions, which improve
children’s health. This approach is not perfect, but it provides major gains in overcoming problems
of reverse causality. Case et al. (2001) found evidence of an income-health gradient in children
and Currie et al. (2007) obtained a similar result for children in England.
Although there is clear evidence of an income-health relationship, causality between health and
income is not so clear, and so far it has not been possible to fully explain existing differences.
Jones et al. (2006, pg. 4) emphasise that the causal mechanisms behind this are complex and
controversial. The available dataset is of limited use for revealing causality because only adults
are included, but it can be used to observe the relationship between income and health. The main
focus is on the relationship between health and current income.

3. BHPS teaching dataset

The analysis uses the BHPS teaching dataset from 1991 to 2002 for individuals aged 20 and above.
The BHPS is an annual survey of adults aged 16 and over in Great Britain, drawn from a nationally
representative sample of more than 5,000 households (Taylor et al., 2010, pg. A2-2). The teaching
dataset contains data for 9,912 individuals who were interviewed in wave one and followed for the
subsequent eleven waves. The dataset includes information on the socio-demographic, occupational
and health characteristics of individuals. Initial households were selected using a two-stage
stratified systematic method, so each household address had an equal probability of being included
in the sample (Taylor et al., 2010, pg. A4-1). The longitudinal dataset enables observation of
socio-economic dynamics at the household and individual level.

3.1. Self-assessed health status and observed health outcomes

The BHPS teaching dataset contains a variable that describes self-assessed health (SAH) status.
SAH is defined as the answer to: “Please think back over the last 12 months about how your health
has been. Compared to people of your own age, would you say that your health has on the whole been
excellent / good / fair / poor / very poor?” SAH is a subjective perception of health status in
relation to the individual’s concept of normal health for their age group (Contoyannis et al.,
2004, pg. 475). SAH has been used extensively as a health outcome in previous studies of the
relationship between health and socioeconomic status, for example Smith, 1999; Benzeval and Judge,
2001; Adams et al., 2003; Contoyannis et al., 2004; and Jones et al., 2006. SAH is a very simple
but powerful variable. Idler and Kasl (1995) showed that SAH is a good predictor of subsequent
mortality and Burström and Fredlund (2001) showed that this holds for all socioeconomic groups.
The BHPS dataset used here contains the SAH variable for all waves except wave 9, so that wave is
excluded from the analysis.
SAH is an ordinal categorical variable, and even if the categories are scored, for example, from
1 to 5, the differences between categories are not necessarily equal. One method of scaling SAH
suggested by O’Donnell et al. (2008) is to dichotomise it into categories that describe “good” and
“less than good” health. This approach avoids imposing a scale that is assumed to indicate how much
more health an individual enjoys in one category compared with another (O’Donnell et al., 2008,
pg. 58). Some information may be lost with such a transformation, and the cut-off point is set
arbitrarily. Despite that, Salomon et al. (2004) point out that if the threshold refers to a single
population, such a variable can be used to observe variation in the defined level of health. For
the purposes of this analysis the SAH variable is dichotomised into binary form. The variable that
describes “good health” (gdhl) equals 1 if an individual’s health state is “excellent” or “good”
and 0 otherwise, so all other responses describe health as “less than good”. This approach has been
used in many studies, such as Buckley et al. (2004) and Kuklys (2005). A sensitivity analysis of
the chosen cut-off point is in appendix 1.
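
The dichotomisation described above could be coded roughly as follows. This is a sketch only: the variable name sah and its coding (1 = excellent to 5 = very poor) are assumptions, and the alternative cut-off shown for a sensitivity check is hypothetical rather than the one used in appendix 1.

```stata
* Assumption: sah is coded 1 = excellent, 2 = good, 3 = fair, 4 = poor, 5 = very poor
generate gdhl = inlist(sah, 1, 2) if !missing(sah)
label define gdhl_lbl 0 "less than good" 1 "good"
label values gdhl gdhl_lbl

* Hypothetical alternative cut-off: only "excellent" counts as good health
generate gdhl_alt = (sah == 1) if !missing(sah)

* Cross-check the recoding against the original categories
tabulate sah gdhl, missing
```

The if !missing(sah) qualifier matters: without it, observations with missing SAH would be coded 0 and wrongly counted as “less than good” health.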

3.2. Other variables that describe individual characteristics

Income is measured by the eqinc variable, which is equivalised annual household income. A
logarithmic transformation is applied to eqinc to allow for the concavity present in the
relationship between health and income. As Lindley and Lorgelly (2005) suggest, there is a U-shaped
relationship between age and health, so variables for age and age squared are included.
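
A minimal sketch of these two transformations is given below. The names lneqinc and age2 are illustrative choices, and treating non-positive incomes as missing is an assumption about how such cases should be handled.

```stata
* Log of equivalised household income; zero or negative incomes become missing
generate lneqinc = ln(eqinc) if eqinc > 0

* Quadratic term to allow a non-linear age profile
generate age2 = age^2

* Check how many observations are lost through the log transformation
count if missing(lneqinc) & !missing(eqinc)
```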

Many socioeconomic variables are included in the model. Education can have a significant impact
on health and is therefore included. More educated individuals tend to have a healthier lifestyle
and a larger stock of health-related knowledge. The variable educ describes education and has 4
categories, each denoting the level of qualification attained by the end of the observed period.
The levels are degree, HND and A level, CSE and O level, and no qualification. Living alone or with
a partner can also affect health. Marital status is described by the mlstat variable, which has
categories for married, divorced and separated, widowed and never married. The variable hhsize
describes household size. The analysis also includes the variable econ for economic activity, which
has the following categories: employed, unemployed, retired, maternity leave and family care.
Unemployment in particular can put a lot of psychological pressure and stress on an individual,
which can have a detrimental effect on health. Other economic activity categories in the dataset
had very few observations, so they were excluded from the analysis. Ethnicity is described by the
variable race, a binary variable that denotes whether an individual is white or non-white.

Environmental factors can also affect health through the natural environment and the quality of
health care in a particular area, so the variable region is included in the model with categories
for London, England without London, Wales, Scotland and Northern Ireland, although the dataset
contains no observations for Northern Ireland.

Another set of included variables relates to health. I have incorporated variables that could
provide objective information on health situations. A variable for the number of GP visits, for
example, may not be a good option because some GP visits could be driven by the individual’s
perception that he or she needs to see a doctor rather than by objective need. The dummy variables
smoker and hllt, the latter indicating whether an individual’s health limits daily activities, are
included in the model. Additionally, a dummy variable for each year is included to control for
latent temporal changes in the environment that are not captured by the chosen regressors. Table 1
contains definitions of all variables in the model and table 2 presents summary statistics of the
variables used in the analysis.

4. Graphical analysis

Graphical representation of the data helps in understanding the relationship between health,
income and other characteristics that can affect health. The analysis is performed by gender.
Figure 1 shows the distribution of good and less than good health by wave for males and females.
For both genders, the majority of the population is in good health. Looking at the distribution
over time, we can see that after the initial wave, the proportion of individuals in good health and
less than good health has a slight upward trend with minor changes. From wave 1 to wave 2, the
proportion of individuals in good health increased, while the proportion of those in less than good
health decreased. The initial change is slightly larger for females. Although at first the results
seem misleading, because health should deteriorate over time, the chosen measure is based on
self-reported health status, which is reported relative to a representative individual of the
respondent’s own age. Because it is based on perception, there is potential for measurement error.
The observed sample is not adjusted for attrition, and Contoyannis et al. (2004), who studied the
income-health relationship in the BHPS, emphasise that attrition is negatively correlated with
initial health and is highest for individuals in the worst health. In addition, only 2 categories
are observed, so we cannot see, for example, how many individuals went from “excellent” to “good”
health or from “fair” to “poor” health. Further analysis will provide more evidence on the dynamics
of health.

Figure 2 shows the distribution of good and less than good health by income quintile for both
genders. There is a positive relationship between income and health, and an income-health gradient
can be observed, for both males and females. Moving from lower to higher income quintiles, the
proportion of individuals in good health increases. The upward trend is slightly stronger for males
than for females.

Education is an important determinant of health, so figure 3 shows how health is distributed among
both genders at different educational levels. There is a clear positive relationship between
education and health for both genders. Individuals with a higher level of educational attainment
are in better health.

Figure 4 shows the dynamics of health by age group for males and females. Each age group spans
10 years, starting at the age of 20. For males, the proportion of individuals in good health
decreases up to the 6th age group and, after stagnating in the 7th group, rises slightly in the 8th
age group. Correspondingly, the proportion of males in less than good health follows the inverse
trend. For females, the proportion of individuals in good health decreases up to the 7th age group
and then rises slightly in the 8th group. The proportion of females in less than good health
increases until the 7th group and then decreases in the 8th age group. In the 7th group there are
more females who perceive their health as less than good. When reading these graphs, it is
important to bear in mind that the outcome is based on perceived health and that censoring is not
taken into account.
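
Figures of the kind described in this section could be produced along the following lines. This is a rough sketch, not the code behind figures 1 to 4: the variable names wave and sex are assumptions, and the income quintiles here are computed over the pooled sample rather than wave by wave.

```stata
* Income quintiles from equivalised income (pooled over all waves)
xtile incq = eqinc, nquantiles(5)

* Ten-year age bands starting at 20 (20-29, ..., 90-99); ages of 100 or more become missing
egen agegrp = cut(age), at(20(10)100)

* Share in good health (mean of the 0/1 variable gdhl), split by sex
graph bar (mean) gdhl, over(wave) by(sex) name(g_wave, replace)
graph bar (mean) gdhl, over(incq) by(sex) name(g_income, replace)
graph bar (mean) gdhl, over(educ) by(sex) name(g_educ, replace)
graph bar (mean) gdhl, over(agegrp) by(sex) name(g_age, replace)
```

Because gdhl is binary, its mean within each group is the proportion in good health, so a single bar per group is enough to read off the gradient.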

5. Estimation of income-health relationship

5.1. Model specification

The relationship between health and income is estimated using pooled logit and fixed effects
logit models, separately for each gender. Initially, OLS was used, but because some fitted
probabilities fell outside the required range from 0 to 1, logistic regression was chosen.
Coefficients were also compared with a probit model and, because the sample size is large, the
values of the coefficients in both models were quite similar. Logistic regression was the better
option because it allows a fixed effects logit model to be estimated, which is an appropriate
method for modelling the income-health relationship with panel data. The fixed effects model makes
it possible to control for unobservable heterogeneity among individuals.

Following Wooldridge (2012), the model is specified as follows:

$$\Pr(gdhl_{it} = 1 \mid x_{it}, \alpha_i) = \Lambda(x_{it}'\beta + \alpha_i), \quad i = 1, 2, \dots, N, \quad t = 1, 2, \dots, T$$

where Λ(.) is the logistic cumulative distribution function, i denotes individuals in the sample
and t denotes the time period (wave). The probability that an individual is in good health is a
function of the time-varying and time-invariant regressors collected in the 1 x K vector $x_{it}$,
which affects the dependent variable. $\alpha_i$ is a time-invariant error component specific to
the individual. All variables except sex and ethnicity are time varying. The goal is to estimate
the parameter vector β to explore how the regressors affect health, with the coefficient on the
income variable of prime interest. The chosen regressors are assumed to be exogenous.

The estimation strategy is first to estimate a pooled logistic model by grouping all observations
together and analysing the dataset as a set of cross-sectional observations. Under the assumption
that the error terms are not correlated with the explanatory variables, this approach could provide
unbiased and consistent estimates. However, the assumptions imposed can be quite restrictive, as it
is assumed that all time-invariant unobserved characteristics are uncorrelated with the observed
characteristics. For instance, that would imply that innate ability, motivation or some genetic
characteristics are not correlated with income, which does not hold in practice. To make use of the
panel structure, a fixed effects logit model was applied. Fixed effects is an appropriate method to
control for unobserved heterogeneity across individuals because the individual effect is allowed to
be correlated with the observed characteristics. When such correlation is allowed, the estimator is
consistent but inefficient, as it uses up degrees of freedom, one for each individual in the sample
(Jones et al., 2007, pg. 213). Another disadvantage of the fixed effects approach is that it does
not provide estimates of the coefficients on time-invariant variables. In this case those would be
gender and ethnicity, and therefore the analysis is performed separately for males and females. A
random effects model was also estimated because it was used to estimate similar relationships
between health status and socioeconomic characteristics in the BHPS by Jones and Rice (2004),
Contoyannis et al. (2004) and Jones et al. (2006). After estimating the random effects model, a
Hausman test was applied to test whether the random effects estimates are efficient and consistent.
The results of the Hausman test are in appendix 3. Two models were estimated for each gender. Model
1 includes socioeconomic and environmental characteristics, while model 2 builds on model 1 by also
including the set of health variables. A likelihood ratio test was computed after the pooled and
fixed effects logit models to examine whether the larger model 2 is better than the simpler model
1. The results are presented in appendix 2. The idea is to see how the income-health relationship
changes when controlling for variables that could provide objective information on health
situations. The fixed and random effects models were estimated for individuals who had 5 or more
observations during the observed period to obtain more credible results.
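
The estimation strategy described above might be sketched in Stata as follows. This is an illustration under assumptions, not the do-file behind the reported tables: the identifiers pid, wave and sex (with 1 = male), the log-income name lneqinc and the exact covariate lists are assumed, and average marginal effects are shown where the report may have evaluated them differently.

```stata
* Declare the panel structure (pid and wave are assumed identifiers)
xtset pid wave

* Number of non-missing outcome observations per individual
bysort pid: egen nobs = count(gdhl)

* Assumed covariate lists; i. marks categorical variables
local m1 lneqinc age age2 i.educ i.mlstat hhsize i.econ i.region i.wave
local m2 `m1' smoker hllt

* Pooled logit for males with standard errors clustered by individual
logit gdhl `m1' i.race if sex == 1, vce(cluster pid)
margins, dydx(lneqinc)

* Fixed effects logit, models 1 and 2, individuals with 5 or more observations
xtlogit gdhl `m1' if sex == 1 & nobs >= 5, fe
estimates store fe1
xtlogit gdhl `m2' if sex == 1 & nobs >= 5, fe
estimates store fe2

* Random effects logit, model 1
xtlogit gdhl `m1' if sex == 1 & nobs >= 5, re
estimates store re1

* Hausman test: consistent (FE) estimator first, efficient (RE) estimator second
hausman fe1 re1

* Likelihood ratio test of model 1 against model 2;
* both models must be fitted on the same estimation sample
lrtest fe1 fe2
```

If smoker or hllt has missing values, the two fixed effects samples will differ and lrtest will refuse to run, so the estimation sample may first need to be restricted to observations that are complete on all model 2 variables.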

5.2. Estimation results

Table 3 summarises the results of model 1 and model 2 for males. Table 4 presents these results
for females. Marginal effects from the pooled logit and coefficients from the fixed effects and
random effects logit models, with corresponding p-values, are reported in the tables. Marginal
effects from the pooled logit give the change in the predicted probability that a person is in good
health for a change in a regressor, while the others are held constant at their values. In the
fixed and random effects models, the coefficients give the estimated effect of each regressor on
the log-odds of the dependent variable. These coefficients are not directly comparable with the
marginal effects, but we can observe the direction of the change. Partial effects from the fixed
effects logit model cannot be estimated unless a specific value of $\alpha_i$ is plugged in.
Following Wooldridge (2002, pg. 492), because the distribution of $\alpha_i$ is unrestricted,
$E(\alpha_i)$ is not necessarily zero and it is very hard to know which value to use. Estimating
partial effects in this case would require specifying the distribution of $\alpha_i$. Results from
the pooled model suggest that there is a positive relationship between health and income for both
genders, although the relationship is slightly stronger for females. In model 1, the estimated
marginal effect of log income on the probability of being in good health is 3.9 percentage points
for males and 4.6 percentage points for females, holding other variables constant. Because income
enters in logs, this is the change associated with a one-unit increase in log income; a 1% increase
in income corresponds to roughly one hundredth of that. Including the other variables that provide
information on health situations lowers the effect of income: in model 2 the corresponding marginal
effects are 2.8 percentage points for males and 3.7 percentage points for females. The likelihood
ratio test suggests that model 2 fits the data better than model 1. For males, age has a slightly
negative, and for females a slightly positive, impact on health. Although this may seem at odds
with theory, the result is not worrisome because the estimates are not adjusted for censoring. In
the pooled model we can also observe a health-education gradient, because lower levels of
educational attainment have a stronger negative impact on health. Again the effect of educational
attainment is slightly stronger for females. Being single for males and married for females has a
small negative effect on health, but it is significant only in model 1. Economic activity is
important, particularly being unemployed, which may put significant psychological pressure on an
individual and have a detrimental effect on health. Race is also important for health, while the
results for the effect of region on health are mixed. When information on health is included, the
relationship between health and income weakens.

Results from fixed effects and random effects model include individual effect. In fixed effects
model 1 income is still a significant predictor of health, but in model 2, when other information on
health situations is included, income is not significant at the 5% level. The LR test showed that
model 2 explains variation in the dependent variable better. Exponentiating the coefficients gives odds
ratios, which describe the multiplicative change in the odds of gdhl = 1 for a one-unit change in the
chosen regressor, holding other variables constant. Because income enters in logs, a one-unit increase
in lneqinc corresponds to income rising by a factor of e (about 2.72), not by 1%. For instance, for
males in model 1 a one-unit increase in lneqinc multiplies the odds of being in good health by
$\exp(\hat{\beta}_{lneqinc}) = \exp(0.053) = 1.054$, or increases them by 5.4%, holding all other
variables constant. For females in model 1, the odds of being in good health are multiplied by
$\exp(\hat{\beta}_{lneqinc}) = \exp(0.054) = 1.055$, or increased by 5.5%. For males in model 2, the
odds are multiplied by $\exp(\hat{\beta}_{lneqinc}) = \exp(0.047) = 1.048$, or increased by 4.8%.
Finally, for females in model 2, the odds are multiplied by
$\exp(\hat{\beta}_{lneqinc}) = \exp(0.052) = 1.053$, or increased by 5.3%, in each case holding all
other variables constant. Once individual heterogeneity is controlled for, education is not significant
in the fixed effects models for either males or females. That may sound surprising, but our sample
consists of individuals older than 20 years, most of whom have already finished their education, so
there is probably not enough variation in the dataset to capture the true effect of education. The
relationship between health and unemployment is significant in model 1, suggesting that stress caused
by being unemployed can have a major effect on perceived health status. The effect is stronger for
females than for males. In model 2 the smoking variable has a negative but insignificant coefficient.
The variable that describes whether an individual's health limits their daily activities is a strong
predictor of health: a person whose health does not limit their usual activities is significantly more
likely to be in good health. Results from model 2 are surprising. The income coefficient is positive
but insignificant, suggesting that unobserved heterogeneity such as genetics, as well as other personal
characteristics, can have a major effect on health status. Year dummy coefficients are positive,
probably reflecting inflation and similar changes in the environment that occurred over time.
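
The odds-ratio arithmetic above can be checked with a short sketch. It is an illustration only: the coefficients are the fixed effects estimates for lneqinc copied from Tables 3 and 4, while the function names and the dictionary layout are assumptions made for this example. Since the regressor is log income, a 1% rise in income is a change of ln(1.01) in lneqinc, so the sketch reports both the odds ratio per unit of lneqinc and the much smaller odds ratio per 1% of income.

```python
import math

# Fixed effects logit coefficients on lneqinc, copied from Tables 3 and 4.
COEFS = {
    ("males", "model 1"): 0.053,
    ("females", "model 1"): 0.054,
    ("males", "model 2"): 0.047,
    ("females", "model 2"): 0.052,
}


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds for a change of `delta` in the regressor."""
    return math.exp(beta * delta)


def odds_ratio_pct_income(beta, pct):
    """Odds multiplier when income itself rises by `pct` percent (regressor is log income)."""
    return odds_ratio(beta, math.log(1 + pct / 100))


for (sex, model), beta in COEFS.items():
    print(
        f"{sex:8s} {model}: OR per unit of lneqinc = {odds_ratio(beta):.3f}, "
        f"OR per 1% of income = {odds_ratio_pct_income(beta, 1):.5f}"
    )
```

The per-unit odds ratios reproduce the values quoted in the text; the per-1% figures show how small the implied effect of a marginal income change is.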

6. Discussion

This paper examines the contemporaneous relationship between health and income in the BHPS
teaching dataset. Results from the fixed effects model are considered the most reliable, as they
take individual effects into account. The random effects model was also reported and tested because it
was used in previous studies of the relationship between health and income, for example Jones and Rice
(2004), Contoyannis et al. (2004) and Jones et al. (2006). The Hausman test clearly showed that the
random effects coefficients are not appropriate. As a further check, a quadrature approximation check
was performed on the random effects model. The results changed considerably as the number of
quadrature points increased, which provided additional confidence that the random effects model is
not appropriate.
Although the positive relationship between health and income is well established in the literature,
for example in Benzeval and Judge (2001) or Buckley et al. (2004), results from fixed effects model 2
are quite surprising. It appears that once individual effects and objective information on health
situations are taken into account, income is not important in explaining differences in health status.
Although the marginal effects of most of the chosen variables are significant in the pooled model,
most of them are not significantly different from 0 once individual effects are included. Unobserved
factors have an important effect on present health status. Sensitivity analysis of the threshold used to
define good health suggests that the results are more sensitive for males. Under the alternative
threshold, the effect of income for males in fixed effects model 1 is positive but not significant. For
females, results for both model 1 and model 2 are consistent with the original specification.
Despite mixed results, some important messages can be extracted from this analysis. This analysis
used a fixed effects logit model, which was more appropriate for modelling these panel data than the
random effects approach. Previous studies that used random effects models to estimate similar
relationships between health status and socioeconomic characteristics in the BHPS dataset, such as
Jones and Rice (2004), Contoyannis et al. (2004) and Jones et al. (2006), did not clearly explain why
that approach was used, nor did they provide a comparison with the fixed effects model. Carro and
Traferri (2009) criticised some of these papers and estimated the income-health relationship with
fixed effects on the BHPS dataset. After taking into account different sources of bias in the dataset,
they found evidence that the relationship between health and income is positive and statistically
significant.
This study has some important limitations that may affect the final result. The analysis was
conducted on the teaching dataset; to obtain more credible results, it should be performed on the raw
dataset. The sample was not adjusted for attrition. Individuals who remain in the dataset tend to be
healthier, so without accounting for attrition the income coefficient is likely to be biased.
Contoyannis et al. (2004, pg. 485) point out that attrition was highest between waves 1 and 2, and
that attrition rates are highest for individuals in poor health. Many individuals had very few
observations in the observed period, so the analysis was limited to individuals with 5 or more
observations in the sample. Individuals who decided to participate in the survey differ from those who
dropped out, so the conclusions apply only to the sample analysed. In interpreting the results it is
important to emphasise that, although internally valid for the sample studied, they do not have
external validity and are not generalizable to the entire UK population. The analysis focused on the
contemporaneous relationship between health and income, but there is evidence, for instance in Buckley
et al. (2004) and Jones et al. (2006), that permanent income is a better determinant of health.
Buckley et al. (2004) emphasise that the income-health relationship is determined more by the
cumulative effect of income in earlier periods of life than by current annual income. A measure of
permanent income would give a better understanding of the individual's standard of living in earlier
periods of life, which affects their health status. The model did not take into account the
individual's health status in the previous period, which may also be important for current health
status. Contoyannis et al. (2004, pg. 495) reported a gradient across the estimated effects of previous
health status as individuals move from a previous health status of very poor to excellent. Finally, the
chosen health variable is based on self-reported health status, which may be prone to measurement
error. Individuals may not be the best judges of their own health, and this measure may not capture
health status accurately. Because self-reported health is not homogeneous among individuals, this can
lead to misestimating the income-health relationship. A more objective measure of health is needed,
but the available dataset is limited in reporting on individuals' actual health conditions. Among the
variables available in the dataset, hlstat, which was used to construct gdhl, was the best one for
approximately capturing an individual's health state.
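
The sample restriction to individuals with 5 or more observations can be sketched as follows. This is a minimal illustration, not the procedure actually used: the person identifiers, the (person, wave) record layout and the function name are hypothetical and are not taken from the BHPS files.

```python
from collections import Counter


def keep_min_observations(records, min_obs=5):
    """Keep only person-wave records of individuals observed at least `min_obs` times."""
    counts = Counter(pid for pid, _wave in records)
    return [(pid, wave) for pid, wave in records if counts[pid] >= min_obs]


# Toy person-wave pairs (hypothetical identifiers, not BHPS data):
# person 1 is observed in 7 waves, person 2 in only 3.
records = [(1, w) for w in range(1, 8)] + [(2, w) for w in range(1, 4)]
restricted = keep_min_observations(records)
print(len(records), "records before,", len(restricted), "after the restriction")
```

A restriction of this kind keeps the panel long enough for within-individual estimation, but it is also one reason the remaining sample is not representative.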

7. Conclusion

The aim of this analysis was to investigate the relationship between health and contemporaneous
income. Data from the BHPS teaching dataset from 1991 to 2002 were used in the analysis. The pooled
logit model provided clear evidence of a positive and significant association between income and
health for both genders. Results change to some extent when individual effects are taken into account.
In model 1, which included a basic set of socioeconomic characteristics (income, age, education,
marital status, household size and economic activity), income has a positive and significant effect on
health. Once information on health situations is included, the observed relationship becomes
statistically insignificant.
The relationship between income and health is very complex. This analysis focused only on the
association between income and health because the available dataset is not appropriate for addressing
causality in this relationship. In addition, the analysis has some important limitations, and the
conclusions reached apply only to the sample studied. Due to the restrictions imposed on the sample, it
is not representative of the UK population and the conclusions are not generalizable. To draw more
reliable conclusions, the present limitations should be properly addressed. The analysis should also
be conducted on a dataset that is more representative of the UK population so that the results are
externally valid. Studying this relationship is very important because the results may have important
implications for the design of health care and public policies and, subsequently, for the allocation
of the available public budget.

8. References
1. Deaton, A. (2002) Policy implications of the gradient of health and wealth. Health Affairs, 21(2),
13-30. Retrieved from http://content.healthaffairs.org/content/21/2/13.full.pdf+html
2. Evans, W., Wolfe, B., & Adler, N. (2012) The SES and health gradient: A brief review of the
literature. In B. Wolfe, W. Evans, & T. E. Seeman (Eds.), The biological consequences of
socioeconomic inequalities (pp. 1–37). New York: Russell Sage. Retrieved from
https://www.russellsage.org/sites/all/files/wolfe%20intro.pdf

3. Grossman, M. (1972) On the concept of health capital and the demand for health. Journal of
Political Economy, 80(1): 223–255. Retrieved from
http://www.jstor.org/discover/10.2307/1830580?sid=21106392708793&uid=2&uid=4
4. Grossman, M., L. Benham (1974) Health, hours and wages, in: M. Perlman (ed), The economics
of health and medical care. Macmillan and Co., London, 205-233.
5. Smith, J.P. (1999) Healthy bodies and thick wallets: the dual relation between health and economic
status. Journal of Economic Perspectives 13 (2), 145–166. Retrieved from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3697076/
6. Smith, J. P. (2004) Unravelling the SES health connection, IFS Working Papers, Institute for Fiscal
Studies (IFS), No. 04/02. Retrieved from
http://www.econstor.eu/bitstream/10419/71512/1/379753057.pdf
7. Evans, R. (2002) Interpreting and addressing inequalities in health: from Black to Acheson to Blair
to . . . ? In: Seventh OHE Annual Lecture (updated and expanded version). Office of Health
Economics. Retrieved from
http://courseweb.edteched.uottawa.ca/pop8910/PDF%20Files/Evans_report_2002.pdf
8. Adams, P., Hurd, M. D., McFadden, D., Merrill, A., Ribeiro, T. (2003) Healthy, wealthy, and wise?
Tests for direct causal paths between health and socioeconomic status. Journal of Econometrics,
112(1), 3-56. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.199.6328&rep=rep1&type=pdf
9. Lindahl, M. (2005) Estimating the Effect of Income on Health and Mortality Using Lottery Prizes
as an Exogenous Source of Variation in Income. Journal of Human Resources 40(1): 144–68.
Retrieved from http://www.econstor.eu/bitstream/10419/21514/1/dp442.pdf
10. Frijters, P., Haisken-DeNew J. P., Shields M. A. (2005) The Causal Effect of Income on Health:
Evidence from German Reunification. Journal of Health Economics. 24(5): 997–1017 Retrieved
from https://ideas.repec.org/a/eee/jhecon/v24y2005i5p997-1017.html
11. Jones, A. M., Doorslaer, E. V., Bago d’Uva, T., Balia, S., Gambin, L., Quevedo, C. H., Koolman,
X., Rice, N. (2006) Health and wealth: empirical findings and political consequences. Perspektiven
der Wirtschaftspolitik, 7(Supplement), 93-112. Retrieved from
http://www2.eur.nl/bmg/ecuity/public_papers/ECuity3wp32JonesHealthandWealth.pdf
12. Case, A., Lubotsky, D., Paxson, C. (2001) Economic status and health in childhood: The origins of
the gradient (No. w8344). National Bureau of Economic Research. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.401.6627&rep=rep1&type=pdf
13.

15. Freed Taylor, M., & Brice, J. (1995). British Household Panel Survey User Manual, Volume A:
Introduction, Technical Report and Appendices. Colchester: University of Essex. Retrieved from
http://discover.ukdataservice.ac.uk/Catalogue/?sn=5038&type=Data%20catalogue
16. Contoyannis, P., Jones, A. M., & Rice, N. (2004). The dynamics of health in the British Household
Panel Survey. Journal of Applied Econometrics, 19(4), 473-503. Retrieved from
http://onlinelibrary.wiley.com/doi/10.1002/jae.755/full
17. Benzeval, M., Judge, K. (2001) Income and health: the time dimension. Social Science and
Medicine 52 (9), 1371–1390. Retrieved from
https://www.iser.essex.ac.uk/files/conferences/bhps/2001/docs/pdf/papers/benzeval.pdf
18. Idler EL, Kasl SV. (1995) Self-ratings of health: do they also predict change in functional ability?
Journal of Gerontology 50B: S344–S353. Retrieved from
http://psychsocgerontology.oxfordjournals.org/content/50B/6/S344.short
19. Burström B, Fredlund P. (2001) Self-rated health: is it as good a predictor of subsequent mortality
among adults in lower as well as in higher social classes? Journal of Epidemiology and Community
Health 55: 836–840. Retrieved from http://jech.bmj.com/content/55/11/836.full
20. Buckley, N. J., Denton, F. T., Robb, A. L., & Spencer, B. G. (2004) The transition from good to
poor health: an econometric study of the older population. Journal of Health Economics, 23(5),
1013-1034. Retrieved from http://www.yorku.ca/nbuckley/papers/healthyagingjhe.pdf
21. O'Donnell, O. A., Wagstaff, A., Doorslaer, E., & Lindelow, M. (2008). Analysing health equity
using household survey data: a guide to techniques and their implementation. World Bank
Publications.
22. Kuklys, W. (2005). Amartya Sen's capability approach: theoretical insights and empirical
applications. Springer Science & Business Media.
23. Lindley, J., & Lorgelly, P. (2005). The relative income hypothesis: does it exist over time? Evidence
from the BHPS. University of Sheffield. Retrieved from
http://eprints.whiterose.ac.uk/9923/1/SERP2005013.pdf
24. Jones, A. M., Rice, N., d'Uva, T. B., Balia, S. (2007). Applied health economics. Routledge.
25. Jones, A. M., Rice, N. (2003) Can economics contribute to an understanding of socioeconomic
inequality in health? Retrieved from www.idep-fr.org/IMG/docannexe/fichier/601/JonesRice.pdf.
26. Wooldridge, J. (2012). Introductory econometrics: A modern approach. Cengage Learning.
27. Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT press.
28. Carro, J. M., & Traferri, A. (2009). Correcting the bias in the estimation of a dynamic ordered
probit with fixed effects of self-assessed health status. (Working paper). Retrieved from
http://www.bu.edu/econ/files/2012/11/dp205.pdf

9. Tables

Table 1: Variable definitions

Variable   Definition
gdhl       "Good health": equals 1 if SAH is excellent or good, 0 otherwise.
lgdhl      "Less than good health": equals 1 if SAH is fair, poor or very poor, 0 otherwise.
eqinc      Equivalized annual household income in pounds.
lneqinc    Natural logarithm of eqinc.
sex        1 if individual is female, 0 if male.
age        Age of individual at the date of interview.
age2       Age squared.
race       0 "White", 1 "Non-white"
educ       1 "Degree", 2 "HND/A", 3 "O/CSE", 4 "Noqual"
mlstat     1 "Married", 2 "Divorced/Separated", 3 "Widowed", 4 "Never married"
hhsize     Household size.
econ       1 "Employed", 2 "Unemployed", 3 "Retired", 4 "Maternity leave", 5 "Family care"
region     1 "London", 2 "England without London", 3 "Wales", 4 "Scotland", 5 "Northern Ireland"
smoker     1 if individual is not a smoker, 0 otherwise.
hllt       1 if individual said that health does not limit their daily activities, 0 otherwise.
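
The two derived regressors in Table 1 can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, and treating zero or missing income as a missing log is one plausible explanation for lneqinc having fewer observations than eqinc in Table 2, not a documented step of the analysis.

```python
import math


def log_income(eqinc):
    """Return ln(eqinc), or None when income is missing or non-positive (log undefined)."""
    if eqinc is None or eqinc <= 0:
        return None
    return math.log(eqinc)


def age_squared(age):
    """Return age squared, or None when age is missing."""
    return None if age is None else age ** 2


# Boundary values taken from Table 2 (minimum and maximum of eqinc and age).
for income in (0, 397319.8):
    print(income, "->", log_income(income))
for age in (20, 99):
    print(age, "->", age_squared(age))
```

The maximum of eqinc maps to the maximum of lneqinc reported in Table 2, which is consistent with a natural-log transformation.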

Table 2: Summary statistics

Variable      Obs      Mean        Std. Dev.   Min         Max
gdhl          92410    .768261     .4219454    0           1
lgdhl         92410    .231739     .4219454    0           1
eqinc         69822    8915.212    7451.793    0           397319.8
lneqinc       69752    8.863684    .7002895    -.6526941   12.8925
sex           92410    .5418245    .4982503    0           1
age           70552    49.47167    16.68576    20          99
age2          70552    2725.857    1780.595    400         9801
qfedhi
  2           66474    .3214941    .4670535    0           1
  3           66474    .2576797    .43736      0           1
  4           66474    .3205163    .4666786    0           1
mlstat
  2           70524    .1000227    .3000324    0           1
  3           70524    .0995122    .2993506    0           1
  4           70524    .145213     .3523179    0           1
hhsize        70554    2.723191    1.308863    1           14
econ
  2           67071    .0357979    .1857873    0           1
  3           67071    .2424893    .4285919    0           1
  4           67071    .0193675    .1378141    0           1
  5           67071    .0910229    .2876439    0           1
1.race        92230    .0370812    .1889619    0           1
region
  2           70484    .7601158    .4270157    0           1
  3           70484    .0558425    .2296189    0           1
  4           70484    .0886584    .2842521    0           1
1.smoker      68994    .73847      .4394711    0           1
1.hllt        70525    .8315916    .3742312    0           1

Table 3: Estimated regression results for model 1 and 2 for males

        Model 1                                                            Model 2
gdhl    Pooled logit (marginal effects)   Fixed effects   Random effects   Pooled logit (marginal effects)   Fixed effects   Random effects
lneqinc 0.039 (0.000) 0.053 (0.026) 0.174 (0.000) 0.028 (0.000) 0.047 (0.085) 0.169 (0.000)
age -0.003 (0.021) 0.223 (0.001) 0.032 (0.038) -0.002 (0.026) 0.184 (0.005) 0.011 (0.453)
age2 0.000 (0.032) -0.001 (0.000) 0.000 (0.003) 0.000 (0.011) -0.001 (0.000) 0.000 (0.226)
educ
2 -0.028 (0.000) -0.558 (0.074) -0.438 (0.002) -0.028 (0.000) -0.517 (0.108) -0.380 (0.003)
3 -0.046 (0.000) -0.771 (0.033) -0.607 (0.000) -0.034 (0.000) -0.700 (0.060) -0.477 (0.001)
4 -0.123 (0.000) -0.772 (0.053) -1.307 (0.000) -0.096 (0.000) -0.753 (0.066) -1.094 (0.000)
mlstat
2 -0.008 (0.427) 0.099 (0.478) -0.031 (0.790) 0.006 (0.512) 0.076 (0.594) 0.003 (0.979)
3 -0.001 (0.917) -0.198 (0.419) -0.037 (0.834) -0.006 (0.636) -0.226 (0.370) -0.087 (0.612)
4 -0.024 (0.006) -0.344 (0.034) -0.319 (0.005) -0.014 (0.081) -0.305 (0.065) -0.240 (0.024)
hhsize 0.014 (0.000) 0.003 (0.933) 0.050 (0.101) 0.014 (0.000) 0.004 (0.916) 0.064 (0.030)
econ
2 -0.067 (0.000) -0.232 (0.043) -0.447 (0.000) -0.024 (0.026) -0.151 (0.201) -0.292 (0.006)
3 -0.129 (0.000) -0.267 (0.038) -0.582 (0.000) -0.049 (0.000) -0.124 (0.357) -0.345 (0.002)
4 -0.181 (0.134) -1.188 (0.125) -1.652 (0.031) -0.129 (0.212) -1.102 (0.171) -1.459 (0.062)
5 -0.051 (0.093) -0.123 (0.668) -0.451 (0.085) -0.010 (0.671) 0.003 (0.993) -0.238 (0.373)
race -0.082 (0.000) - - -0.907 (0.001) -0.059 (0.000) - - -0.765 (0.001)
region
2 0.027 (0.003) -0.097 (0.736) 0.062 (0.667) 0.017 (0.044) -0.064 (0.827) 0.051 (0.698)
3 0.005 (0.711) 0.461 (0.424) -0.107 (0.643) 0.006 (0.661) 0.407 (0.489) -0.078 (0.711)
4 0.020 (0.112) -0.135 (0.873) -0.089 (0.670) 0.016 (0.177) -0.234 (0.789) -0.060 (0.751)
hllt - - - - - - 0.357 (0.000) 1.610 (0.000) 2.388 (0.000)
smoker - - - - - - 0.048 (0.000) -0.198 (0.057) 0.267 (0.000)
dyear1 0.084 (0.000) 2.052 (0.001) 0.920 (0.000) 0.062 (0.000) 1.793 (0.004) 0.744 (0.000)
dyear2 0.069 (0.000) 1.782 (0.001) 0.768 (0.000) 0.051 (0.000) 1.544 (0.006) 0.620 (0.000)
dyear3 0.060 (0.000) 1.495 (0.003) 0.606 (0.000) 0.041 (0.000) 1.301 (0.009) 0.471 (0.000)
dyear4 0.060 (0.000) 1.380 (0.002) 0.619 (0.000) 0.041 (0.000) 1.195 (0.007) 0.489 (0.000)
dyear5 0.046 (0.000) 1.130 (0.003) 0.479 (0.000) 0.028 (0.012) 0.970 (0.010) 0.362 (0.000)
dyear6 0.040 (0.001) 0.922 (0.004) 0.388 (0.000) 0.025 (0.023) 0.795 (0.013) 0.297 (0.001)
dyear7 0.035 (0.004) 0.776 (0.003) 0.346 (0.000) 0.026 (0.020) 0.695 (0.008) 0.295 (0.001)
dyear8 0.019 (0.117) 0.512 (0.012) 0.192 (0.029) 0.012 (0.267) 0.449 (0.029) 0.157 (0.082)
dyear9 -0.005 (0.656) 0.056 (0.601) -0.046 (0.600) -0.008 (0.485) 0.036 (0.742) -0.063 (0.488)

Table 4: Estimated regression results for model 1 and 2 for females

        Model 1                                                            Model 2
gdhl    Pooled logit (marginal effects)   Fixed effects   Random effects   Pooled logit (marginal effects)   Fixed effects   Random effects
lneqinc 0.046 (0.000) 0.054 (0.027) 0.168 (0.000) 0.037 (0.000) 0.052 (0.056) 0.176 (0.000)
age 0.005 (0.000) 0.215 (0.000) 0.087 (0.000) 0.003 (0.001) 0.168 (0.001) 0.055 (0.000)
age2 0.000 (0.000) -0.002 (0.000) -0.001 (0.000) 0.000 (0.008) -0.001 (0.000) -0.001 (0.000)
educ
2 -0.040 (0.000) -0.607 (0.049) -0.470 (0.001) -0.021 (0.016) -0.625 (0.048) -0.345 (0.007)
3 -0.036 (0.000) -0.640 (0.053) -0.450 (0.002) -0.026 (0.002) -0.628 (0.064) -0.339 (0.009)
4 -0.142 (0.000) -0.615 (0.093) -1.296 (0.000) -0.109 (0.000) -0.635 (0.091) -1.072 (0.000)
mlstat
2 -0.024 (0.004) -0.034 (0.741) -0.132 (0.128) -0.008 (0.297) -0.032 (0.763) -0.077 (0.357)
3 -0.001 (0.861) 0.030 (0.830) -0.006 (0.953) -0.008 (0.311) -0.024 (0.868) -0.060 (0.538)
4 0.004 (0.647) -0.021 (0.886) -0.009 (0.930) 0.007 (0.406) -0.052 (0.792) 0.001 (0.989)
hhsize 0.013 (0.000) -0.017 (0.569) 0.026 (0.323) 0.011 (0.000) -0.012 (0.700) 0.035 (0.158)
econ
2 -0.115 (0.000) -0.409 (0.001) -0.563 (0.000) -0.066 (0.000) -0.339 (0.006) -0.440 (0.000)
3 -0.076 (0.000) -0.007 (0.942) -0.229 (0.005) -0.029 (0.000) 0.018 (0.848) -0.160 (0.049)
4 -0.047 (0.001) 0.016 (0.895) -0.164 (0.145) -0.019 (0.150) 0.033 (0.782) -0.122 (0.289)
5 -0.077 (0.000) -0.140 (0.051) -0.314 (0.000) -0.035 (0.000) -0.109 (0.137) -0.245 (0.000)
race -0.130 (0.000) 0.000 - -1.056 (0.000) -0.086 (0.000) - - -0.784 (0.204)
region
2 0.013 (0.132) 0.256 (0.227) 0.091 (0.458) 0.001 (0.929) 0.239 (0.319) 0.024 (0.000)
3 -0.033 (0.011) 0.913 (0.036) -0.201 (0.293) -0.016 (0.179) 1.054 (0.019) -0.130 (0.823)
4 0.001 (0.947) 0.607 (0.163) 0.025 (0.881) 0.005 (0.620) 0.627 (0.159) 0.041 (0.443)
hllt - - - - - - 0.400 (0.000) 1.585 (0.000) 2.332 (0.782)
smoker - - - - - - 0.042 (0.000) -0.039 (0.067) 0.282 (0.000)
dyear1 0.088 (0.000) 1.519 (0.001) 0.769 (0.000) 0.063 (0.000) 1.252 (0.008) 0.602 (0.000)
dyear2 0.082 (0.000) 1.382 (0.001) 0.709 (0.000) 0.057 (0.000) 1.127 (0.016) 0.544 (0.000)
dyear3 0.068 (0.000) 1.155 (0.002) 0.567 (0.000) 0.040 (0.000) 0.923 (0.020) 0.404 (0.000)
dyear4 0.056 (0.000) 0.990 (0.003) 0.477 (0.000) 0.032 (0.000) 0.780 (0.018) 0.330 (0.000)
dyear5 0.050 (0.000) 0.864 (0.002) 0.430 (0.000) 0.028 (0.000) 0.682 (0.053) 0.303 (0.000)
dyear6 0.034 (0.002) 0.633 (0.008) 0.274 (0.000) 0.013 (0.001) 0.474 (0.053) 0.156 (0.038)
dyear7 0.029 (0.009) 0.503 (0.011) 0.223 (0.000) 0.015 (0.004) 0.390 (0.109) 0.146 (0.053)
dyear8 0.023 (0.042) 0.373 (0.017) 0.167 (0.029) 0.003 (0.748) 0.255 (0.159) 0.062 (0.409)
dyear9 -0.003 (0.760) 0.029 (0.733) -0.035 (0.629) -0.013 (0.210) -0.032 (0.716) -0.094 (0.216)

10. Figures

Figure 1: Distribution of good and less than good health by waves for males and females
[Bar charts; panels: Males, Females; vertical axis: Frequency; series: mean of gdhl, mean of lgdhl]

Figure 2: Distribution of good and less than good health by income quintiles for males and females
[Bar charts; panels: Males, Females; vertical axis: Frequency; horizontal axis: quintiles 1 to 5; series: mean of gdhl, mean of lgdhl]

Figure 3: Distribution of good and less than good health by education levels for males and females
[Bar charts; panels: Males, Females; vertical axis: Frequency; horizontal axis: Degree, HND/A, O/CSE, Noqual; series: mean of gdhl, mean of lgdhl]

Figure 4: Distribution of good and less than good health by age groups for males and females
[Bar charts; panels: Males, Females; vertical axis: Frequency; horizontal axis: age groups 1 to 8; series: mean of gdhl, mean of lgdhl]

11. Appendix

Appendix 1 – Sensitivity analysis of chosen threshold for defining “good health”

A sensitivity analysis of the cut-off point chosen for defining good health was performed. The
goal is to see how the results change when the definition of good health is changed. Variable gdhl2
defines good health slightly differently: it equals 1 when the individual's health status is
“excellent”, “good” or “fair” and 0 otherwise. Tables 5 and 6 present results from the fixed effects
models for males and females.
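
The two thresholds can be written out explicitly. The sketch below is illustrative: it assumes SAH is available as a text label, whereas the actual coding of hlstat in the dataset may be numeric, and the function names are hypothetical.

```python
# Self-assessed health (SAH) categories named in the text, best to worst.
SAH_LEVELS = ["excellent", "good", "fair", "poor", "very poor"]

GOOD_ORIGINAL = {"excellent", "good"}             # threshold behind gdhl
GOOD_ALTERNATIVE = {"excellent", "good", "fair"}  # threshold behind gdhl2


def gdhl(sah):
    """Original indicator: 1 if SAH is excellent or good, 0 otherwise, None if missing."""
    if sah is None:
        return None
    return 1 if sah in GOOD_ORIGINAL else 0


def gdhl2(sah):
    """Alternative indicator: 1 if SAH is excellent, good or fair, 0 otherwise, None if missing."""
    if sah is None:
        return None
    return 1 if sah in GOOD_ALTERNATIVE else 0


for level in SAH_LEVELS:
    print(f"{level:10s} gdhl={gdhl(level)} gdhl2={gdhl2(level)}")
```

Only the "fair" category changes sides between the two definitions, so any difference between the original and the sensitivity results comes from how those respondents are classified.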

Table 5: Fixed effects models 1 and 2 for males where the dependent variable is gdhl2
Model 1 Model 2
gdhl Fixed effects Fixed effects
lneqinc 0.035 (0.094) 0.027 (0.090)
age 0.298 (0.001) 0.251 (0.024)
age2 -0.002 (0.000) -0.002 (0.000)
educ
2 -0.181 (0.810) 0.381 (0.641)
3 -0.730 (0.390) -0.194 (0.833)
4 -0.337 (0.712) 0.217 (0.824)
mlstat
2 -0.339 (0.177) -0.345 (0.188)
3 0.564 (0.149) 0.317 (0.429)
4 -0.772 (0.021) -0.639 (0.068)
hhsize -0.074 (0.236) -0.033 (0.624)
econ
2 -0.435 (0.019) -0.276 (0.170)
3 -0.294 (0.159) 0.038 (0.863)
4 -2.241 (0.032) -1.962 (0.159)
5 -0.202 (0.676) 0.262 (0.609)
region
2 -0.234 (0.678) -0.348 (0.545)
3 1.531 (0.236) 1.279 (0.347)
4 0.918 (0.558) 0.799 (0.637)
hllt - - 1.915 (0.000)
smoker - - -0.663 (0.001)
dyear1 2.036 (0.018) 1.444 (0.170)
dyear2 1.876 (0.017) 1.435 (0.134)
dyear3 1.732 (0.014) 1.346 (0.116)
dyear4 1.212 (0.049) 0.868 (0.248)
dyear5 1.239 (0.020) 0.942 (0.146)
dyear6 0.930 (0.040) 0.740 (0.178)
dyear7 0.740 (0.048) 0.597 (0.185)
dyear8 0.204 (0.490) 0.071 (0.841)
dyear9 0.011 (0.949) -0.063 (0.743)

Table 6: Fixed effects models 1 and 2 for females where the dependent variable is gdhl2

Model 1 Model 2
gdhl Fixed effects Fixed effects
lneqinc 0.050 (0.041) 0.061 (0.069)
age 0.541 (0.000) 0.551 (0.000)
age2 -0.002 (0.000) -0.002 (0.000)

educ
2 -0.295 (0.578) -0.396 (0.485)
3 -0.838 (0.144) -0.816 (0.180)
4 -0.527 (0.388) -0.629 (0.334)
mlstat
2 -0.237 (0.114) -0.282 (0.075)
3 0.346 (0.073) 0.259 (0.208)
4 0.338 (0.156) 0.302 (0.221)
hhsize -0.008 (0.860) 0.009 (0.852)
econ
2 -0.105 (0.048) 0.085 (0.051)
3 -0.240 (0.085) -0.090 (0.538)
4 -0.122 (0.498) -0.075 (0.689)
5 -0.270 (0.018) -0.171 (0.152)
region
2 0.046 (0.911) -0.103 (0.810)
3 0.413 (0.534) 0.480 (0.482)
4 0.089 (0.891) -0.044 (0.948)
hllt - - 1.722 (0.000)
smoker - - -0.252 (0.085)
dyear1 4.132 (0.000) 4.068 (0.001)
dyear2 3.690 (0.000) 3.597 (0.001)
dyear3 3.414 (0.000) 3.321 (0.001)
dyear4 3.074 (0.000) 3.019 (0.000)
dyear5 2.604 (0.000) 2.545 (0.001)
dyear6 2.032 (0.001) 1.960 (0.001)
dyear7 1.608 (0.001) 1.581 (0.002)
dyear8 1.148 (0.002) 1.069 (0.005)
dyear9 0.271 (0.087) 0.189 (0.254)

The sensitivity analysis shows that the results are sensitive to the definition of good health for
males, while for females the results are comparable with the original model. For males, the
coefficient of income is positive but not statistically different from 0 in both models. For females in
model 1, the coefficient of income is significant and positive, but slightly lower than in the original
model. In model 2 the coefficient of income is not significant, as in the original model. The results
emphasise that it is important to know what the dependent variable describes in order to interpret
the results properly.

Appendix 2 – Results of Likelihood ratio test

The likelihood ratio test was used to compare the simpler model 1 with the larger model 2 for
the pooled and fixed effects logit models. The aim was to assess whether the larger model explains
variation in the dependent variable gdhl better. The null hypothesis is that the additional variables
in model 2 do not improve the fit, so that the simpler model is adequate, while the alternative
hypothesis is that the larger model fits better.

Results for pooled model are presented below. Results for males are:
Likelihood-ratio test LR chi2(2) = 3043.87
(Assumption: m1 nested in m2) Prob > chi2 = 0.0000

Results for females are:
Likelihood-ratio test LR chi2(2) = 5396.13
(Assumption: m3 nested in m4) Prob > chi2 = 0.0000

Results for fixed effects logit model are presented below. Results for males are:
Likelihood-ratio test LR chi2(2) = 492.14
(Assumption: m5 nested in m6) Prob > chi2 = 0.0000

Results for females are:


Likelihood-ratio test LR chi2(2) = 847.66
(Assumption: m7 nested in m8) Prob > chi2 = 0.0000

For all tests, the p-value is lower than 0.001, which indicates a very high level of significance. That
gives us strong evidence to reject the null hypothesis that model 1 explains variation in the dependent
variable gdhl as well as model 2.
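
The p-values reported above can be reproduced from the test statistics alone. The sketch below is an illustration: it uses the standard form of the likelihood ratio statistic, twice the difference in log-likelihoods, and the fact that a chi-squared variable with 2 degrees of freedom has upper-tail probability exp(-x/2). The 2 degrees of freedom correspond to the two regressors, hllt and smoker, that model 2 adds to model 1. The function names and dictionary keys are assumptions made for this example.

```python
import math


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic: twice the gain in log-likelihood from the larger model."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# LR chi2(2) statistics as reported in this appendix.
REPORTED = {
    "pooled, males": 3043.87,
    "pooled, females": 5396.13,
    "fixed effects, males": 492.14,
    "fixed effects, females": 847.66,
}

for name, stat in REPORTED.items():
    print(f"{name}: LR = {stat:.2f}, p = {chi2_sf_2df(stat):.3g}")
```

For statistics this large the p-value is numerically indistinguishable from zero, which is why the output shows Prob > chi2 = 0.0000.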

Appendix 3 – Results of Hausman test

Tables 7, 8, 9 and 10 present results from the Hausman test for models 1 and 2 for both genders.
The results clearly show that the coefficients estimated from the random effects model are biased. The
null hypothesis states that the random effects estimator is efficient and consistent. If that were the
case, there should be no major differences between the two estimators. Since the coefficients differ
substantially, the null hypothesis is strongly rejected and the random effects approach cannot be
adopted.
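
To make the logic of the test concrete, the sketch below applies it to a single coefficient. This is a simplified illustration, not the joint test reported in the tables: the joint statistic uses the full vector of differences and the matrix V_b - V_B, whereas here only the lneqinc row of Table 7 is used, with the reported standard error of the difference. The function names are hypothetical.

```python
import math


def hausman_single(b_fe, b_re, se_diff):
    """One-coefficient Hausman statistic: squared difference over its estimated variance."""
    return (b_fe - b_re) ** 2 / se_diff ** 2


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# lneqinc row of Table 7 (males, model 1): FE and RE coefficients, S.E. of the difference.
h = hausman_single(0.0528047, 0.1737314, 0.0259299)
print(f"H = {h:.2f}, p = {chi2_sf_1df(h):.2g}")
```

Even on this single coefficient the difference between the fixed and random effects estimates is far larger than its standard error, which is in line with the rejection of the null hypothesis in the joint test.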

Table 7: Hausman test for model 1 for males

Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
FE0 RE0 Difference S.E.

lneqinc .0528047 .1737314 -.1209268 .0259299


age .2228172 .0323692 .190448 .0627293
age2 -.0012531 -.0004399 -.0008132 .0001607
2bn.educ -.557592 -.4378048 -.1197873 .27851
3.educ -.7714234 -.607158 -.1642654 .3268465
4.educ -.7719725 -1.307426 .5354531 .3665184
2bn.mlstat .0993338 -.0312955 .1306293 .0759478
3.mlstat -.1981438 -.0373049 -.1608389 .1683395
4.mlstat -.3442498 -.3192066 -.0250432 .1173479
hhsize .0029904 .049765 -.0467746 .0187949
2bn.econ -.2317176 -.4467024 .2149848 .0424807
3.econ -.2671118 -.5821987 .3150869 .0636181
4.econ -1.18825 -1.652276 .4640256 .11527
5.econ -.1229822 -.4514041 .328422 .1168578
2bn.region -.0965614 .0615364 -.1580978 .2486981
3.region .4605219 -.1069146 .5674366 .5271239
4.region -.1353205 -.0889184 -.0464021 .8238451
dyear1 2.052334 .9204346 1.131899 .610221
dyear2 1.781553 .7680802 1.013472 .5519719
dyear3 1.494985 .6055615 .8894234 .4916604
dyear4 1.38023 .6188998 .7613306 .4288988
dyear5 1.129769 .4794694 .6502998 .3663088
dyear6 .9216443 .3878077 .5338366 .3064244
dyear7 .7762285 .3460108 .4302177 .2457575
dyear8 .5116756 .1917739 .3199018 .1844057
dyear9 .0564892 -.0459588 .102448 .0628842

b = consistent under Ho and Ha; obtained from xtlogit


B = inconsistent under Ha, efficient under Ho; obtained from xtlogit

Test: Ho: difference in coefficients not systematic

chi2(25) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 125.36
Prob>chi2 = 0.0000

Table 8: Hausman test for model 2 for males


Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
FE0 RE0 Difference S.E.

lneqinc .047175 .1692466 -.1220716 .0287393


age .1836052 .0110441 .1725611 .0631879
age2 -.0009372 -.0001588 -.0007784 .0001755
2bn.educ -.5165432 -.3801381 -.1364051 .2945131
3.educ -.6995822 -.4770803 -.2225018 .3432372
4.educ -.7531374 -1.094245 .3411074 .3834148
2bn.mlstat .0762215 .0030303 .0731912 .0856497
3.mlstat -.2262148 -.0870582 -.1391565 .1851796
4.mlstat -.3049398 -.2395845 -.0653553 .1262185
hhsize .0038426 .0641424 -.0602998 .0211359
2bn.econ -.1509446 -.2920744 .1411298 .0497979
3.econ -.1237525 -.3451292 .2213767 .0737535
4.econ -1.102164 -1.459174 .3570103 .1961419
5.econ .002612 -.2383467 .2409587 .1262819
2bn.region -.0642369 .0507262 -.1149631 .263876
3.region .406559 -.0776747 .4842337 .5497272
4.region -.2335337 -.0597674 -.1737663 .8400411
hllt 1.610076 2.3879 -.7778243 .0174957
smoker -.1975092 .2665956 -.4641048 .0723115
dyear1 1.792569 .7439771 1.048592 .610594
dyear2 1.544141 .6196057 .9245356 .5521796
dyear3 1.30097 .4713035 .8296665 .4920642
dyear4 1.194918 .4894366 .7054814 .4291932
dyear5 .9697001 .3621236 .6075765 .366438
dyear6 .7952252 .2973844 .4978409 .3066427
dyear7 .6952927 .2950824 .4002103 .2457643
dyear8 .4488277 .1574515 .2913762 .1845888
dyear9 .0362323 -.0626385 .0988708 .062627

b = consistent under Ho and Ha; obtained from xtlogit


B = inconsistent under Ha, efficient under Ho; obtained from xtlogit

Test: Ho: difference in coefficients not systematic

chi2(27) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 2680.49
Prob>chi2 = 0.0000

Table 9: Hausman test for model 1 for females
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
FE0 RE0 Difference S.E.

lneqinc .0539864 .1682215 -.1142351 .0206419


age .2151884 .0870433 .1281451 .046784
age2 -.0015195 -.0009728 -.0005467 .0001244
2bn.educ -.6069644 -.4698591 -.1371052 .2725928
3.educ -.6403227 -.4502093 -.1901134 .2959968
4.educ -.615082 -1.295944 .6808619 .3321544
2bn.mlstat -.0341802 -.1320161 .0978359 .0561008
3.mlstat .0302538 -.0060601 .036314 .0968516
4.mlstat -.0210254 -.0090925 -.011933 .103246
hhsize -.0172947 .0256016 -.0428963 .0158733
2bn.econ -.408967 -.5629955 .1540284 .0201281
3.econ -.0066002 -.2289121 .2223119 .0405773
4.econ .0155267 -.164236 .1797627 .0325915
5.econ -.1399167 -.3138025 .1738858 .0295982
2bn.region .2557565 .0908191 .1649374 .2011801
3.region .9126439 -.2010952 1.113739 .392236
4.region .6070054 .0250756 .5819299 .4018939
dyear1 1.519134 .7692621 .7498723 .4559916
dyear2 1.381759 .7092475 .6725112 .4120228
dyear3 1.155004 .5673997 .5876041 .3671548
dyear4 .989655 .4767641 .5128909 .3196658
dyear5 .8636559 .4302275 .4334284 .2734335
dyear6 .6325542 .2740357 .3585185 .2288278
dyear7 .5033159 .2229323 .2803835 .1830785
dyear8 .3725505 .1668584 .2056921 .1373082
dyear9 .029396 -.0351863 .0645823 .0456436

b = consistent under Ho and Ha; obtained from xtlogit


B = inconsistent under Ha, efficient under Ho; obtained from xtlogit

Test: Ho: difference in coefficients not systematic

chi2(25) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 103.15
Prob>chi2 = 0.0000
(V_b-V_B is not positive definite)

Table 10: Hausman test for model 2 for females


Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
FE0 RE0 Difference S.E.

lneqinc .051942 .1762872 -.1243453 .0239969


age .168338 .0547748 .1135632 .0480474
age2 -.0011163 -.0005532 -.000563 .0001389
2bn.educ -.6251594 -.344899 -.2802604 .28935
3.educ -.6281966 -.3394443 -.2887522 .3135348
4.educ -.6350467 -1.072344 .4372975 .3492509
2bn.mlstat -.0318589 -.0767966 .0449377 .0650815
3.mlstat -.0242949 -.0601812 .0358862 .1090832
4.mlstat -.0520087 .0013436 -.0533523 .1149614
hhsize -.0119834 .0353308 -.0473142 .0184087
2bn.econ -.3388099 -.440112 .1013021 .0287254
3.econ .0180071 -.1596464 .1776535 .0474642
4.econ .0332195 -.1221987 .1554182 .0383016
5.econ -.1092973 -.2452102 .1359129 .0352944
2bn.region .2386444 .0244345 .2142099 .2133794
3.region 1.053697 -.1299666 1.183664 .4174101
4.region .6273475 .0409521 .5863954 .4206252
hllt 1.584861 2.332085 -.7472245 .0141451
smoker -.0385881 .2816023 -.3201904 .0716723
dyear1 1.25247 .6024633 .6500067 .4648466
dyear2 1.126695 .5442285 .5824669 .419974
dyear3 .9227322 .4039466 .5187856 .3743003
dyear4 .7799192 .3304579 .4494613 .3258133
dyear5 .6822526 .3030124 .3792402 .2787977
dyear6 .4735011 .1561538 .3173472 .2333332
dyear7 .3901269 .1460596 .2440673 .1865647
dyear8 .2546178 .0620862 .1925316 .1400357
dyear9 -.0323601 -.0936488 .0612887 .046806

b = consistent under Ho and Ha; obtained from xtlogit


B = inconsistent under Ha, efficient under Ho; obtained from xtlogit

Test: Ho: difference in coefficients not systematic

chi2(27) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 3575.78
Prob>chi2 = 0.0000
