Kaambwa et al. BMC Research Notes 2012, 5:330
http://www.biomedcentral.com/1756-0500/5/330

RESEARCH ARTICLE Open Access

Do the methods used to analyse missing data really matter? An examination of data from an
observational study of Intermediate Care patients
Billingsley Kaambwa1*, Stirling Bryan2 and Lucinda Billingham3,4

Abstract
Background: Missing data is a common statistical problem in healthcare datasets from populations of older
people. Some argue that arbitrarily assuming the mechanism responsible for the missingness and therefore the
method for dealing with this missingness is not the best option—but is this always true? This paper explores what
happens when extra information that suggests that a particular mechanism is responsible for missing data is
disregarded and methods for dealing with the missing data are chosen arbitrarily.
Regression models based on 2,253 intermediate care (IC) patients from the largest evaluation of IC done and
published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for
dealing with missingness were utilised, each assuming a different mechanism as being responsible for the missing
data: complete case analysis (assuming missing completely at random—MCAR), multiple imputation (assuming
missing at random—MAR) and Heckman selection model (assuming missing not at random—MNAR). Differences in
results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated
standard errors.
Results: Extra information strongly suggested that missing cost data were MCAR. The results show that MCAR and
MAR-based methods yielded similar results with sizes of most coefficients and standard errors differing by less than
3.4% while those based on MNAR-methods were statistically different (up to 730% bigger). Significant variables in
all regression models also had the same direction of influence on costs. All three mechanisms of missingness were
shown to be potential causes of the missing EQ-5D and Barthel data. The method chosen to deal with missing data
did not seem to have any significant effect on the results for these data as they led to broadly similar conclusions
with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.
Conclusions: Arbitrary selection of methods to deal with missing data should be avoided. Using extra information
gathered during the data collection exercise about the cause of missingness to guide this selection would be more
appropriate.
Keywords: Missing data, Complete case analysis, Multiple imputation, Generalised linear model, Heckman selection,
Observational data

* Correspondence: [email protected]
1 Health Economics Unit, Public Health Building, University of Birmingham, Edgbaston, Birmingham B15 2TT,
United Kingdom
Full list of author information is available at the end of the article
© 2012 Kaambwa et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.

Background
Missing data is an unwanted reality in most evaluations of services for older people as it can lead to biased
results as well as threats to the generalisability and power of the results obtained from analysing such data
[1, 2]. Even under the best of conditions, missing data may result in a significant reduction in sample size,
leading to threats to external validity as a sample reduced in size may no longer be representative of the
target population [3–5]. This is more problematic in circumstances where the likelihood of response is related
to observed characteristics. Certain forms of missingness can reduce the statistical power of the analyses of
the available data and therefore compromise the internal validity of a study, which is more serious [3, 6, 7].
A situation that can potentially lead to reduced statistical power is when
the probability of response is associated with the values of the variable for which values are only partly
observed, which is a possibility in a lot of cases [8].
The three main mechanisms that lead to missing data are: missing completely at random (MCAR), missing at
random (MAR) and missing not at random (MNAR). If data are MAR or MCAR, they can also be referred to as
“ignorable” data while those MNAR are “non-ignorable” [8]. Missing data are said to be ignorable if the
parameters that are used to model the missing data process are not related to the parameters used to model the
observed data, while non-ignorability exists if there is a systematic difference between responders and
nonresponders even after accounting for all the observed data [7, 9]. Various methods have been proposed to
deal with missing data, each premised on a specific missing data mechanism [1, 10, 11]. Croninger and Douglas
[7] indicate that the choice of method used for coping with missing data is not crucial if there is not much
missing data and/or the sample is big. This is because most methods will yield similar results in such
circumstances. But as the level of missingness rises and/or the sample becomes smaller, the choice of method
becomes potentially more significant. In this paper, we do not provide a detailed discussion of the various
methods that can be used to deal with missing data. Interested readers can see Fielding et al. [12] for such a
discussion. In general though, complete case analysis (both listwise and pairwise deletion) can be performed
when data are MCAR [13]. Approaches for use when data are MAR include listwise deletion, various imputation
techniques, propensity adjustment strategy, raw maximum likelihood and expectation maximisation [1, 3, 6, 14].
When data are MNAR, panel selection models, including the Heckman, and pattern-mixture approaches can be used
[15–17].
Most of the time, the method chosen to deal with missing data is not based on concrete evidence of the
mechanism responsible for this missing data. It is consequently difficult to assess the accuracy of such
methods because the data are by definition ‘missing’ [12]. It is a recognised fact that data often provide
little or no information at all to help determine the correct mechanism behind missingness [3, 18]. In many
scenarios, therefore, it is difficult, or even impossible, to know what mechanism is responsible for the
missingness. Sometimes more than one mechanism may be responsible for different sets of missing data within
the same evaluation [7, 19]. This means that choosing among these alternative methods is not an easy task.
Curran et al. [19] suggest two approaches for determining the missing data mechanism: (1) hypothesis testing
and (2) collecting extra information, during the data collection process, about why missing data is missing.
In the absence of missing data being recovered and analysed, hypothesis testing can at best only rule out that
missing data are MCAR with no way of confirming that data are actually MCAR [20]. Provided enough data has
been collected, it therefore seems that, where missing data is irrecoverable, it is only the latter approach
that will give some fairly credible indication about whether data are MCAR, MAR or MNAR [8, 19].
This study explores what happens when extra information that suggests that a particular mechanism is
responsible for missing data is disregarded and methods for dealing with the missing data are chosen
arbitrarily. A dataset from the largest evaluation of intermediate care services done and published in the UK
to date is used [21]. Intermediate care (IC) services are tailored to prevent admission to acute care or
long-term care and also aid discharge from hospital for older people [21]. It is not usual practice for such
extra information to be gathered as part of the data collection process in an evaluation such as that for IC
and the presence of this information therefore presented a unique opportunity to empirically compare different
methods for dealing with missing data. As far as we are aware, this is the first time that this sort of
analysis has been done on a dataset of older people in the UK. Using this dataset, which had missing data on
several variables, the factors that explain variation in costs per patient, change in EQ-5D from admission to
discharge (ΔEQ-5D) and change in the Barthel index from admission to discharge (ΔBarthel) of IC patients were
explored in a regression modelling framework. These factors could be broadly divided into three groups: IC
episode characteristics, descriptors of IC services and descriptors of IC-related services. Three methods
incorporating techniques for dealing with missing data were used: (1) generalised linear models (GLMs) and
ordinary least squares (OLS) on complete cases (assuming that missing data were MCAR), (2) GLM and OLS models
on data obtained through multiple imputation (MI) (assuming missing data were MAR) and (3) Heckman selection
models (assuming that missing data were MNAR). We were interested in examining the signs of coefficients as
well as the sizes of both coefficients and associated standard errors in the regression model results
obtained.

Methods
Source of data
Data for this study were obtained from five anonymous case study sites in the UK which were part of the
National Evaluation of the Costs and Outcomes of IC for Older People (ICNET) [21]. These sites were ‘whole
systems’ of IC i.e. areas with a specific geographical boundary. Quantitative data were collected by staff
working for the IC services according to protocols set out by the
evaluation team. Service staff completed study pro forma, with or on behalf of their patients, at the point of
admission to the service, for all IC admissions over a defined period. They completed discharge questions on
the day of discharge, transfer to another IC service or as soon as possible following end of service
provision. In addition, extra information on the reasons why some data were missing was obtained from IC
coordinators, ICNET researchers’ observations as well as from preliminary statistical analyses done on the
ICNET dataset [21–23]. Data were collected between January 2003 and January 2004. Ethical approval was granted
by the Trent Multicentre Research Ethics Committee.

Missing data in the ICNET dataset
The variables that were collected in the ICNET dataset, based on a sample of 2,253 patients, are presented in
Table 1. Up to 42% of the data were missing for some variables in that dataset. Extra information about why
data were missing was available for all dependent variables (cost per patient, ΔEQ-5D and ΔBarthel) but not
available for nearly all of the independent variables. Because of this lack of information and for purposes of
comparing the methods for dealing with missing data, a decision was made to focus on missingness only in the
dependent variables. Therefore, 1,536 out of 2,253 observations were excluded from the cost analyses reported
in this paper (and 1,148 from the outcome analyses; see Figure 1) due to missing values in the independent
variables. There were therefore no missing values for any of the independent variables (and interaction terms
generated using these variables) used in the analyses. A flow chart showing how the samples used in the final
regression models were arrived at is shown in Figure 1. A sample of 717 individuals was therefore used for the
cost per patient models and 125 (17.4%) of these individuals had missing observations on the cost variable.
For the ΔEQ-5D and ΔBarthel models, a sample of 1,105 individuals was utilised. Of this sample, 417 (37.7%) and
392 (35.5%) had missing values on the ΔEQ-5D and ΔBarthel variables, respectively.

The dependent variables
The cost per patient variable was calculated by combining resource data with budget information for the
individual IC services.
The EQ-5D is an outcome measure whose construct validity when used on populations of older people has been well
documented [24–26]. It comprises five dimensions of health: mobility, self-care, usual activities,
pain/discomfort, and anxiety/depression. There are three levels of impairment in each domain: no,
some/moderate, and extreme problems in the relevant dimension of health. Using these responses, the EQ-5D is
able to distinguish between 243 (that is, 3^5) states of health [27, 28]. The UK-specific EQ-5D valuation
algorithm was used in order to convert the EQ-5D health description into a valuation. EQ-5D scores have a
range of −0.59 to 1: the maximum score of 1 represents perfect health and a score of 0 represents death [28].
Scores less than 0 represent health states that are worse than death [28–30]. Its generic nature makes it
comparable across patient populations.
The Barthel Index (BI) is a non-utility based conventional clinical scale of functional independence which has
been recommended by the Royal College of Physicians for routine use in the assessment of older people [31]. Its
validity when used on a general population of older people has also been shown [32]. To measure a person’s
level of functional independence, the BI uses 10 items, with each item carrying different weights [33]. Two
items (bathing and grooming) are rated on a two-point scale of 0 and 5, six (feeding, dressing, bowels,
bladder, toilet use and stairs) on a three-point scale of 0, 5 and 10 and the last two items (transfers and
mobility) are rated on a four-point scale of 0, 5, 10 and 15. The scores on each item are added to produce an
overall score which ranges from 0 to 100. To standardise them, the overall scores used in this paper were
divided by 5 and therefore ranged from 0 to 20 [34]. The higher the score recorded for an item, the greater the
level of independence. The reliability, sensitivity and suitability for proxy-assessment of the BI have been
shown elsewhere [33–35].

Reasons for missing data in the ICNET dataset
When data are MCAR, it implies that the probability of an item missing is unrelated to any measured or
unmeasured characteristic for that unit [36], while under MAR, the probability of an item having incomplete
data depends on other variables in the dataset [1]. MNAR is when the probability of missingness depends on the
unobserved values themselves, perhaps in addition to one or more other variables and/or the observed
variables [37].
Because of time constraints placed on the data collection process, it was not possible to collect all of the
cost data. No other reason was established as being responsible for the missing cost data. This suggests that
where cost data were missing, it would be reasonable to assume that these data were MCAR.
In terms of the missing data on the EQ-5D and Barthel, all three mechanisms (MCAR, MAR and MNAR) could be
assumed as the reason for this missingness.
Firstly, information obtained from the IC coordinators about some of the missing EQ-5D and Barthel data
indicated that some services did not routinely collect this information while some of the item non-responses
were ascribed to administrative errors [21]. This suggested

Table 1 Variables for use in economic analysis (with level of completeness)

Variable Description Missing (%)
Episode Characteristics
Age Age on 01/01/03 3
Gender 1 = Female, 0 = Male 2
Live alone 1 = Individual lives alone, 0 = Otherwise 9
Barthel – Start Barthel Score at start of IC episode 31
Barthel – End Barthel Score at end of IC episode 38
EQ5D – Start EQ-5D at start of IC episode 40
EQ5D – End EQ-5D at end of IC episode 41
Change in EQ-5D Difference between EQ-5D score at end and at start of IC episode 42
Change in Barthel Difference between Barthel score at end and at start of IC episode 41
Cost Cost per patient 38
Descriptors of IC Services
Type of service required 3
Admission Avoidance service 1 = Acute Admission Avoidance service, 0 = Otherwise
Supported Discharge service 1 = Supported discharge service, 0 = Otherwise
Other Service 1 = Other IC Services, 0 = Otherwise
Type of IC 1 = Residential IC, 0 = Non-Residential IC 0
Outcome of IC episode 13
Transfer 1 = Transferred before end of IC episode, 0 = Other outcome
Complete 1 = Completed IC episode, 0 = Otherwise
Died 1 = Patient Died, 0 = Otherwise
Other Outcome 1 = Alternative Outcome, 0 = Other outcome
Stay Duration Duration of service provision (number of days) 17
Descriptors of IC related services
Source of referral 3
Referral – primary 0 = Otherwise, 1 = Primary Care
Referral – hospital 0 = Otherwise, 1 = Hospital
Referral – social 0 = Otherwise, 1 = Social Services
Referral – other 0 = Otherwise, 1 = Other Sources
Alternatives to IC services 18
Alternative – Home 0 = Else, 1 = Home
Alternative – Hospital 0 = Else, 1 = Hospital
Alternative – other 0 = Else, 1 = Other alternative
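
As an illustration of why the assumed mechanism matters, the following minimal sketch simulates an outcome whose values are deleted under MCAR, MAR and MNAR and compares the mean of the remaining complete cases with the full-sample mean. It is not part of the ICNET analysis: the data are synthetic, and the function name `simulate_missingness`, the covariate `x`, the coefficients and the roughly 40% missingness rate are all assumptions chosen for illustration.

```python
import math
import random


def simulate_missingness(n=20000, seed=0):
    """Compare complete-case means of y under MCAR, MAR and MNAR (synthetic data)."""
    rng = random.Random(seed)

    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    sums = {"full": 0.0, "MCAR": 0.0, "MAR": 0.0, "MNAR": 0.0}
    counts = {"full": 0, "MCAR": 0, "MAR": 0, "MNAR": 0}

    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                   # fully observed covariate (hypothetical)
        y = 1.0 + 0.8 * x + rng.gauss(0.0, 1.0)   # outcome with true mean 1.0 (hypothetical)

        # Probability that y is missing under each mechanism (illustrative values).
        p_missing = {
            "MCAR": 0.4,                                # unrelated to x or y
            "MAR": logistic(-0.4 + 1.5 * x),            # depends only on the observed x
            "MNAR": logistic(-0.4 + 1.5 * (y - 1.0)),   # depends on the unobserved y itself
        }

        sums["full"] += y
        counts["full"] += 1
        for mechanism, p in p_missing.items():
            if rng.random() >= p:                       # y is recorded for this unit
                sums[mechanism] += y
                counts[mechanism] += 1

    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(f"{name:>5}: mean of observed y = {value:.3f}")
```

Under these assumptions the complete-case mean stays close to the full-sample mean only in the MCAR scenario; under MAR and MNAR the retained cases over-represent low values of the outcome, so the complete-case mean is pulled downwards.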

that it was plausible to assume that the missingness mechanism for such data was MCAR.
Secondly, the ΔEQ-5D and ΔBarthel scores were calculated by subtracting the scores at admission from those at
discharge. A number of individuals had however been transferred to other services before the end of their IC
episode. For some of these, it meant that their EQ-5D and Barthel scores at ‘discharge’ were not collected,
making it impossible to compute the ΔEQ-5D and ΔBarthel variables. This could be seen as a situation where the
missing data were MAR as the reason for the patients’ transfer was more often than not linked to their health
or functional status e.g. the more functionally independent an individual was, the more likely they were to be
transferred to a less intensive form of IC. Additional statistical analyses on the IC dataset [23] also
revealed that the Barthel scores were predictive of the missing EQ-5D values, further reinforcing the
plausibility of the missing EQ-5D data being MAR.
Thirdly, the mean Barthel scores for some individuals who had missing EQ-5D scores were on average lower than
those for individuals who did not have missing EQ-

2,253 obs/patients in the ‘National evaluation of costs and outcomes of Intermediate Care’ dataset

COST MODEL
1,536 obs with missing data on any of the independent variables used in the cost model excluded
717 obs of which 125 had missing data on the cost variable only
  Complete case analysis based on 592 obs: GLM on 592 obs (assuming MCAR)
  MI used to impute missing costs for 125 patients: GLM on 717 obs (assuming MAR)
  Selection model assuming 125 obs to be censored: Heckman model on 717 obs, of which 125 were ‘censored’ (assuming MNAR)

OUTCOME MODELS
1,148 obs with missing data on any of the independent variables used in the outcome models excluded
EQ-5D model: 1,105 obs of which 417 had missing data on the EQ-5D variable only
  Complete case analysis based on 688 obs: OLS on 688 obs (assuming MCAR)
  MI used to impute EQ-5D scores for 417 patients: OLS on 1,105 obs (assuming MAR)
  Selection model assuming 417 obs to be censored: Heckman model on 1,105 obs, of which 417 were ‘censored’ (assuming MNAR)
Barthel model: 1,105 obs of which 392 had missing data on the Barthel variable only
  Complete case analysis based on 713 obs: OLS on 713 obs (assuming MCAR)
  MI used to impute missing Barthel scores for 392 patients: OLS on 1,105 obs (assuming MAR)
  Selection model assuming 392 obs to be censored: Heckman model on 1,105 obs, of which 392 were ‘censored’ (assuming MNAR)

Figure 1 Flow chart showing the data used in the analyses. obs = observations; MI = multiple imputation; GLM = Generalised linear model; OLS
= Ordinary least squares; MCAR = missing completely at random; MAR = missing at random; MNAR = missing not at random.
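
The sample sizes in Figure 1 follow from simple subtraction, which the short sketch below checks. The counts are taken from the figure; the names `TOTAL_OBS`, `MODELS` and `sample_flow` are hypothetical and exist only for this illustration.

```python
TOTAL_OBS = 2253  # patients in the ICNET dataset (Figure 1)

# model: (obs excluded for missing independent variables, obs missing the dependent variable)
MODELS = {
    "cost": (1536, 125),
    "EQ-5D": (1148, 417),
    "Barthel": (1148, 392),
}


def sample_flow(total, excluded, missing_dependent):
    """Return (analysed sample, complete cases, % missing on the dependent variable)."""
    analysed = total - excluded
    complete_cases = analysed - missing_dependent
    pct_missing = round(100.0 * missing_dependent / analysed, 1)
    return analysed, complete_cases, pct_missing


if __name__ == "__main__":
    for name, (excluded, missing) in MODELS.items():
        analysed, complete, pct = sample_flow(TOTAL_OBS, excluded, missing)
        print(f"{name}: analysed={analysed}, complete cases={complete}, missing={pct}%")
```

The results match the figures reported in the text: 717 analysed and 592 complete cases (17.4% missing) for cost, and 1,105 analysed with 688 and 713 complete cases (37.7% and 35.5% missing) for the EQ-5D and Barthel models.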

5D information [22]. Since some individuals with missing EQ-5D data were associated with lower Barthel scores,
it means that, by virtue of the positive relationship between the two instruments [23], there is a possibility
that these individuals would also have had lower EQ-5D scores had these been collected. It was therefore
reasonable to assume that some of the missing data on the EQ-5D could also have been MNAR i.e. the poorer one’s
health status was, the more difficult it was for them to provide data on the EQ-5D. By the same token, some of
the missing Barthel data could have been MNAR.

Choice of regression families
In this exercise, it was important to compare both the signs and sizes of coefficients (and sizes of
associated standard errors) from the different regression models. Both costs per patient and outcome variables
were skewed and heteroscedastic in their residuals. We chose the GLM
as it is able to simultaneously deal with both problems [38, 39]. We also used log-transformation where the
natural log of the dependent variable was obtained [40] as another method for dealing with the skewed cost data
despite several limitations associated with this approach [41, 42]. For the cost models, therefore, a decision
was made for the GLM to be used for both the complete cases and the multiply imputed datasets while a log
transformed cost per patient was used in the Heckman regression model. As the exponentiated coefficients from
the GLM model have been shown to be easily comparable to the exponentiated counterparts obtained from a
log-transformed model [43], the results of all the cost per patient models are presented in terms of
exponentiated coefficients.
A different approach was taken for the health-outcome dependent variables (ΔEQ-5D and ΔBarthel). This was
because these variables also had negative values. As a result, log transformation of these variables would
have required the use of a shift factor and the transformed variables would then have had to be appropriately
retransformed once the results of the model had been obtained. However, for ease of analysis and comparison, a
decision was made to use the raw scale of these variables. As a result, OLS regressions were used for both the
ΔEQ-5D and ΔBarthel in the regression on complete cases and on multiply imputed datasets. Further, OLS
regressions on a raw scale have also been widely used for modelling such outcome data in the literature [44].
The raw scale of the two variables was also used in the Heckman selection models.

Approaches for dealing with the missing data
For our two samples (n = 717 and n = 1,105) obtained from the ICNET dataset, three methods, each assuming
either MCAR, MAR or MNAR, were used. A regression framework was employed in the analysis and in general, the
regression relationship between the outcomes of interest and the independent variables could be illustrated as
[45]:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \ldots + \beta_k X_{ki} + \mu_i$$
where $Y_i$ denotes the outcome of interest (cost per patient, ΔEQ-5D or ΔBarthel) for the ith individual,
$\beta_0, \ldots, \beta_k$ are the coefficients, $X_{1i}, \ldots, X_{ki}$ are the explanatory variables (both
single and interaction terms) for the ith individual and $\mu_i$ is the stochastic error term for the ith
individual.
A total of six sets of regression models (two for each method) were conducted:

Method 1 involved running regression models on complete cases (assuming that data were MCAR). A GLM was used to
explain variation in ‘cost per patient’ while OLS models were run for cases where the dependent variables were
ΔEQ-5D and ΔBarthel. Pairwise deletion, implying the use of all available data on the particular variables
specified in each model, was the method used to arrive at the samples modelled as complete cases. As a result,
disparate sample sizes of 592, 688 and 713 observations for the cost per patient, ΔEQ-5D and ΔBarthel models,
respectively, were used (please see Figure 1).
Method 2 involved running GLM and OLS regression models again to explain variation in costs per patient and
outcomes (ΔEQ-5D and ΔBarthel), respectively. Here, however, we used multiply imputed datasets (assuming that
data were MAR) based on a multivariate normal model [1]. Up to about 38% of the data were missing and multiply
imputed datasets were created to account for these missing data before running GLM and OLS regression models.
These analyses focussed on imputing values for the dependent variables where the independent variables were
not missing, thereby creating complete datasets i.e. 717 observations for the cost per patient model and 1,105
observations for the ΔEQ-5D and ΔBarthel models. The rationale for this particular imputation was to allow for
direct comparison between the results obtained using this method and those produced by method 3 (described
below), a comparison which required that essentially the same samples were analysed. Five sets of imputations
were created following conventional practice [11]. Since there was up to 38% data missing, these imputations
led to point estimates that were at least $(1 + 0.38/5)^{-1} \approx 0.93$, i.e. about 93%, as efficient as
those based on m = ∞ imputations [1].
In method 3, Heckman selection models (assuming that missing data were MNAR) were run on the log of ‘cost per
patient’, on ΔEQ-5D and ΔBarthel using ‘complete cases’. Whereas method 1 only considered cases where there was
no missing data for both the dependent variable and independent variables, method 3 considers all subjects
including those that had missing cost, EQ-5D or Barthel information. The sample selection used a dummy
variable equal to 1 if the dependent variable was not missing and equal to 0 if it was. Using this
classification, 125 out of 717 observations were censored (missing) for the cost per patient model while 417
and 392, out of 1,105 observations, were censored for the ΔEQ-5D and ΔBarthel models, respectively.

Multiple imputations were conducted in NORM [46] while the rest of the analyses were done in STATA version 8.2
[47].

Results
The results of the above analyses are presented in Table 2 for the costs per patient models and Tables 3 and 4
for the ΔEQ-5D and ΔBarthel models, respectively.
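
The relative-efficiency figure quoted in the Methods for five imputations can be checked with a short calculation. The sketch below evaluates the expression (1 + γ/m)^−1 given in the text, taking the 38% missingness as the fraction of missing information γ, as the text does; the function name `relative_efficiency` and the other values of m in the loop are assumptions added for illustration.

```python
def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many: (1 + gamma / m) ** -1."""
    if m <= 0:
        raise ValueError("m must be a positive number of imputations")
    return 1.0 / (1.0 + gamma / m)


if __name__ == "__main__":
    gamma = 0.38  # up to 38% of the data missing, as stated in the text
    for m in (3, 5, 10, 20):  # m = 5 is the value used in the paper; the rest are illustrative
        print(f"m = {m:2d}: relative efficiency = {relative_efficiency(gamma, m):.3f}")
```

With γ = 0.38 and m = 5 the expression evaluates to about 0.93, in line with the 93% reported above; the gain from adding further imputations shrinks quickly, which is consistent with the conventional choice of a small m.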

Table 2 Comparison of results from three methods of regression analysis of costs per patient
Columns: [1] GLM on complete cases, n = 592 (a); [2] GLM on MI dataset, n = 717 (b); [3] Heckman on complete cases, n = 717, 125 obs censored (c)
Variables Exp (Coeff) [1] S.E. [1] Exp (Coeff) [2] S.E. [2] Exp (Coeff) [3] S.E. [3]
Episode Characteristics
Age in 2003 0.996 0.003 0.997 0.002 1.000 0.012
Gender 0.982 0.063 1.009 0.060 1.085 0.281
Lives alone 1.052 0.059 1.047 0.056 1.106 0.275
Barthel score at admission 0.973 0.009** 0.984 0.008* 0.884 0.061*
EQ5D score at admission 0.973 0.090 0.935 0.087 1.400 0.435
Descriptors of IC Service
Acute Admission Avoidance Service 0.930 0.129 0.812 0.092* 6.723 0.960*
Type of IC 3.181 0.079** 3.150 0.070** 5.146 1.274
Transferred before end of IC episode 1.144 0.310 1.259 0.258 1.316 1.422
Completed IC episode 2.094 0.300* 2.396 0.248** 4.611 1.318
Other IC Outcome 2.703 0.337** 2.796 0.287** 4.374 1.475
Patient Died (Reference Group)
Descriptors of IC-related Services
Referral – Primary 0.777 0.123* 0.764 0.121* 0.936 0.576
Referral – Hospital 0.914 0.158 0.777 0.134 4.523 0.930
Referral – Other 1.001 0.212 0.935 0.195 2.240 0.984
Referral – Social Workers (Reference Group)
Alternative to IC – Other 1.053 0.079 1.058 0.077 0.508 0.451
Alternative to IC – Home 1.121 0.074 1.058 0.070 1.112 0.329
Alternative to IC – Hospital (Reference Group)
Interactions
Barthel score at admission*Type of IC 1.031 0.018 1.017 0.097 1.131 0.092
Acute Admission Avoidance Service*Type of IC 1.214 0.163 1.217 0.136 0.579 0.752
Transfer before IC end*Type of IC 1.145 0.185 1.176 0.169 1.145 0.825
Completed Episode*Type of IC 1.152 0.195 1.112 0.162 0.240 0.952
Other IC Outcome*Type of IC 0.717 0.708 0.583 0.534 0.773 2.846
Patient died*Type of IC (Reference Group)
_constant 1140.3 0.421** 951.5 0.360** 345.3 1.866**
N 592 717 717
Censored obs 125
R-Squared 0.359 0.634
Rho 0.950
* 5% level of significance, ** 1% level of significance; Dependent variable: cost per patient for GLM and log of cost per patient for Heckman Selection model;
IC = Intermediate care;
(a) Method 1 assumes that missing data are MCAR; (b) Method 2 assumes that missing data are MAR; (c) Method 3 assumes that missing data are MNAR.
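
The percentage differences between methods that are quoted for the cost models can be recomputed from the exponentiated coefficients in Table 2. The sketch below assumes the difference is expressed relative to the estimate it is compared against; the source does not state the exact formula, and the names `pct_difference` and `TABLE2_COEFFS` are hypothetical.

```python
def pct_difference(estimate, reference):
    """Percentage by which `estimate` differs from `reference`."""
    return 100.0 * (estimate - reference) / reference


# Exponentiated coefficients from Table 2: (method 1, method 2, method 3).
TABLE2_COEFFS = {
    "Completed IC episode": (2.094, 2.396, 4.611),
    "Type of IC": (3.181, 3.150, 5.146),
    "Acute Admission Avoidance Service": (0.930, 0.812, 6.723),
}

if __name__ == "__main__":
    for name, (m1, m2, m3) in TABLE2_COEFFS.items():
        print(
            f"{name}: method 2 vs 1 = {pct_difference(m2, m1):+.1f}%, "
            f"method 3 vs 2 = {pct_difference(m3, m2):+.1f}%"
        )
```

Under this assumption, ‘completed IC episode’ differs by about 14.4% between methods 1 and 2, and the Heckman coefficient for ‘acute admission avoidance service’ is roughly 730% larger than the method 2 estimate (6.723 versus 0.812).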

Cost per patient models
The results of the GLM regression model on complete cases (method 1) and GLM regression model on multiply
imputed datasets (method 2) are similar. As shown in Table 2, significant predictors of cost per patient were
the Barthel score at admission, IC function (acute admission avoidance service or not), type of IC
(residential or not), if one completed an IC episode, other IC outcome and if the source of referral was
primary care. All of the variables that were found to be significant in method (2) were also significant in
method (1) with the exception of one (acute admission avoidance service) which was significant in model (2)
only. Also, the size of coefficients for nearly all of these variables differed by less than 3.4% except the
one for ‘completed IC episode’ which differed by about 14.4%. The sizes of the standard errors were also
similar. Further, the variables significant in both models had the same direction of influence on costs per
patient. On the other hand, the results obtained from the Heckman selection regression model (method 3) were
much more different. A lot more variables were found to be insignificant with only two variables (Barthel
score at admission and acute admission avoidance service) shown to significantly influence

Table 3 Comparison of results from three methods of regression analysis (Change in EQ5D)
Method [1] (a): OLS on complete cases, n = 688
Method [2] (b): OLS on MI dataset, n = 1105 cases
Method [3] (c): Heckman on complete cases, n = 1105, 417 obs censored
Variables Coeff [1] S.E. [1] Coeff [2] S.E. [2] Coeff [3] S.E. [3]
Episode Characteristics
Age in 2003 0.000 0.001 0.000 0.001 0.000 0.001
Gender 0.046 0.022* 0.051 0.018** 0.054 0.024*
Lives alone 0.020 0.020 0.015 0.017 0.029 0.023
Barthel score at admission 0.017 0.003** 0.017 0.002** 0.016 0.003**
EQ5D score at admission −0.495 0.033** −0.479 0.026** −0.484 0.037**
Descriptors of IC Service
Acute Admission Avoidance Service −0.038 0.027 −0.017 0.021 0.156 0.042**
Duration of Service Provision 0.000 0.000 0.001 0.000* 0.000 0.000
Descriptors of IC-related Services
Referral – Primary −0.031 0.052 −0.044 0.043 −0.020 0.058
Referral – Hospital −0.098 0.051 −0.053 0.042 0.020 0.059
Referral – Other −0.003 0.078 0.059 0.065 0.013 0.086
Referral – Social Workers (Reference Group)
Alternative to IC – Other −0.063 0.031* −0.077 0.025** −0.077 0.030*
Alternative to IC – Home −0.045 0.023* −0.028 0.019 −0.046 0.022*
Alternative to IC – Hospital (Reference Group)
Interactions
Gender*Type of IC −0.048 0.053 −0.027 0.037 −0.057 0.053
Barthel score at admission*Type of IC 0.003 0.004 −0.002 0.003 0.004 0.004
EQ5D score at admission*Type of IC −0.098 0.083 0.061 0.057 −0.118 0.082
Acute Admission Avoidance Service*Type of IC 0.110 0.064 0.039 0.039 0.086 0.063
Alternative to IC – Other*Type of IC 0.137 0.084 0.133 0.059* 0.140 0.082
Alternative to IC – Home*Type of IC 0.086 0.106 −0.027 0.049 0.070 0.104
Alternative to IC – Hospital*Type of IC (Reference Group)
_constant 0.157 0.101 0.093 0.084 0.100 0.105
N 688 1,105 688
Censored obs 417
R-Squared 0.284 0.266 0.634
Rho 0.950
* 5% level of significance, ** 1% level of significance.
Dependent variable: change in EQ-5D; IC = Intermediate care.
(a) Method 1 assumes that missing data are MCAR; (b) Method 2 assumes that missing data are MAR; (c) Method 3 assumes that missing data are MNAR.
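The spread between methods in Table 3 can be checked directly from the tabulated coefficients. The sketch below is only an illustration: it takes the four variables that carry a significance star in all three columns of Table 3 and computes, for each, how far the largest coefficient (in absolute value) exceeds the smallest. The formula in `max_relative_spread` is an assumption, since the paper does not state how its percentage differences were calculated, and the inputs are the rounded values printed in the table.

```python
# Coefficients copied from Table 3 for the variables starred in all three
# columns, in the order (method 1, method 2, method 3).
TABLE3_COEFFS = {
    "Gender": (0.046, 0.051, 0.054),
    "Barthel score at admission": (0.017, 0.017, 0.016),
    "EQ5D score at admission": (-0.495, -0.479, -0.484),
    "Alternative to IC - Other": (-0.063, -0.077, -0.077),
}


def max_relative_spread(values):
    """Largest absolute value relative to the smallest, minus one.

    This is an assumed definition of "differed at most by X%"; the paper
    does not give the formula it used.
    """
    magnitudes = [abs(v) for v in values]
    return max(magnitudes) / min(magnitudes) - 1.0


spreads = {name: max_relative_spread(vals) for name, vals in TABLE3_COEFFS.items()}
widest = max(spreads, key=spreads.get)

for name, spread in spreads.items():
    print(f"{name}: {spread:.1%}")
print(f"Widest spread: {widest} ({spreads[widest]:.1%})")
```

Under this definition the widest spread comes from 'Alternative to IC – Other', at roughly 22%. The same function could be applied to the standard errors, although rounding to three decimals makes ratios of small standard errors unreliable.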

costs per patient. The sizes of the coefficients in the Heckman model were also different from those of the other two methods. For instance, the coefficient for ‘acute admission avoidance service’ was about 730 times larger than that obtained in method (2). The Mills ratios were −3.402 and −4.506 for the Heckman selection models with and without interactions, respectively. These were both statistically significant at the 5% level of significance.

Change in EQ-5D models
Here, the results from all three models/methods were broadly similar (Table 3). Significant predictors of ΔEQ-5D were gender, Barthel score at admission, EQ-5D score at admission, IC function, duration of service provision and the likely alternatives had IC not been available (home and other alternative). Nearly all of the variables that were significant in one model were also significant in the other models. The only exceptions were ‘duration of service provision’ and ‘Alternative to IC – Other*Type of IC’ (both significant only in method 2), ‘acute admission avoidance service’ (significant only in method 3) and ‘Alternative to IC – Home’ (significant only in methods 1 and 3). The sizes of the coefficients of variables significant in all models differed at most by about 22%, with the standard errors differing at most by 42% (Table 3). Further, the variables significant in all three models had the same direction of influence on the change in EQ-5D. The Mills ratios were −0.284 and −0.143 for the Heckman selection models with and

Table 4 Comparison of results from three methods of regression analysis (Change in Barthel)
Method [1] (a): OLS on complete cases, n = 712
Method [2] (b): OLS on MI dataset, n = 1105 cases
Method [3] (c): Heckman on complete cases, n = 1105, 392 obs censored
Variables Coeff [1] S.E. [1] Coeff [2] S.E. [2] Coeff [3] S.E. [3]
Episode Characteristics
Age in 2003 −0.010 0.009 −0.009 0.007 −0.011 0.009
Gender −0.007 0.208 0.097 0.164 0.037 0.218
Lives alone 0.225 0.190 0.181 0.150 0.320 0.202
Barthel score at admission −0.318 0.028** −0.325 0.022** −0.305 0.030**
EQ5D score at admission −0.343 0.312 −0.428 0.239 −0.216 0.328
Descriptors of IC Service
Acute Admission Avoidance Service 0.103 0.218 0.060 0.167 0.728 0.337*
Duration of Service Provision 0.008 0.003* 0.011 0.003** 0.006 0.003*
Descriptors of IC-related Services
Transfer before IC end 4.084 2.452 0.559 0.607 2.713 2.348
Completed Episode 7.438 2.440** 3.443 0.587** 4.926 2.478*
Other IC Outcome 6.640 2.477** 2.921 0.656** 4.727 2.432
Patient died (Reference group)
Alternative to IC – Other −1.130 0.291** −1.076 0.221** −1.267 0.291**
Alternative to IC – Home −0.709 0.223** −0.667 0.169** −0.669 0.219**
Alternative to IC – Hospital (Reference Group)
Interactions
Barthel score at admission*Type of IC −0.071 0.050 −0.035 0.027 −0.072 0.051
Acute Admission Avoidance Service*Type of IC 0.592 0.575 0.131 0.354 0.599 0.589
Duration of Service Provision*Type of IC −0.006 0.008 0.001 0.006 −0.006 0.008
Transfer before IC end*Type of IC −0.299 0.979 −0.160 0.411 −0.300 0.962
Completed Episode*Type of IC 1.053 0.830 0.374 0.424 1.055 0.816
Other IC Outcome*Type of IC 0.189 2.000 0.072 0.889 0.200 1.980
Patient died*Type of IC (Reference group)
Alternative to IC – Other*Type of IC 0.796 0.793 0.968 0.543 0.795 0.778
Alternative to IC – Home*Type of IC 2.261 1.025* 0.124 0.447 2.261 1.006*
Alternative to IC – Hospital*Type of IC (Reference Group)
_constant 0.046 2.536 3.888 0.843 2.687 2.592
N 713 1,105 713
Censored obs 392
R-Squared 0.278 0.634
Rho 0.950
* 5% level of significance, ** 1% level of significance.
Dependent variable: change in Barthel; IC = Intermediate care.
(a) Method 1 assumes that missing data are MCAR; (b) Method 2 assumes that missing data are MAR; (c) Method 3 assumes that missing data are MNAR.
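The Heckman column in Table 4 rests on the inverse Mills ratio, the correction term a selection model adds for the fact that outcomes are observed only for a selected subsample. As a minimal sketch using only the Python standard library, the block below computes the ratio phi(z) / Phi(z) for a standard normal error and checks it against a simulation of the mean of the error among selected cases. The function names, the selection rule `e > -z` and the simulation settings are illustrative assumptions and are not taken from the paper. The negative "mills ratios" quoted in the text are presumably estimated coefficients on this term rather than the ratio itself, which is always positive; that reading is also an assumption.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills_ratio(z):
    """Return phi(z) / Phi(z), which equals E[e | e > -z] for standard normal e."""
    return STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def simulated_truncated_mean(z, draws=200_000, seed=0):
    """Monte Carlo estimate of E[e | e > -z] from standard normal draws."""
    rng = random.Random(seed)
    kept = [e for e in (rng.gauss(0.0, 1.0) for _ in range(draws)) if e > -z]
    return sum(kept) / len(kept)


# Compare the closed form with the simulated mean of the selected errors.
for z in (-1.0, 0.0, 1.0):
    print(f"z={z:+.1f}  formula={inverse_mills_ratio(z):.3f}  "
          f"simulated={simulated_truncated_mean(z):.3f}")
```

In a two-step selection model, a term of this kind, evaluated at each case's fitted selection index, is added as a regressor to the outcome equation; a coefficient on it that differs significantly from zero is what is usually read as evidence of selection bias.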

without interactions, respectively. These were both statistically significant at the 5% level of significance.

Change in Barthel models
As in the ‘change in EQ-5D’ models, the results obtained from all three models/methods for the change in Barthel were broadly similar (Table 4). Significant predictors of ΔBarthel were the Barthel score at admission, IC function, outcome of IC episode (completed and other), the likely alternatives had IC not been available (home and other) and an interaction term between the likely alternatives had IC not been available and type of IC. All of the variables that were significant in one model were also significant in the other models, with the exception of ‘acute admission avoidance service’ (significant only in method 3), ‘Alternative to IC – Home*Type of IC’ (significant in methods 1 and 3 but not in method 2; Table 4) and ‘Other IC Outcome’ (significant only in methods 1 and 2). However, the differences in the sizes of coefficients and standard errors of variables significant in all methods were larger in these models than in the ‘change in EQ-5D’ models. They differed at most by about 54% and 322% for coefficients and standard errors, respectively. The variables significant in all three models had the same direction of

influence on the change in Barthel. The Mills ratios were −1.662 and −0.101 for the Heckman selection models with and without interactions, respectively. These were both statistically significant at the 5% level of significance.

Discussion
The ICNET dataset had up to 42% and 38% of the data on EQ-5D and Barthel scores, respectively, missing, while 31% of the sample had missing cost data. Among the other variables in the dataset, all but one (type of IC) had missing data, ranging from 3 to 18%. This situation is common to a vast number of health service research datasets for older people. If these missing data are simply ignored, there is a chance that biased and underpowered results may be obtained [1, 48]. The most appropriate method of dealing with this amount of missingness therefore had to be determined [19, 49]. The results of this analysis have shown that, in determining the methods to deal with missing data, using extra information gathered during the data collection exercise about the cause of missingness, rather than the arbitrary selection of such methods, is more appropriate. There is, however, a need to carry out similar analyses in datasets based on individuals with different characteristics in order to discount the effect that attributes specific to this dataset, such as the age of respondents, may have had on these results.
The evidence gathered concerning the missing cost data strongly suggested MCAR as the reason for this missingness. When MAR-based methods were used for these data, the results obtained were not significantly different from those based on the MCAR assumption. These findings seem to bear out the position held by Schafer and Graham [50] and David et al. [51] that in many realistic applications, departures from MAR are not big enough to effectively invalidate the results of an MAR-based analysis. A similar position was arrived at by Foster and Fang [8], who found that estimates based on listwise deletion (assuming MCAR) and those based on MI and ignorable maximum likelihood estimation (both assuming MAR) were comparable. The use of an MNAR-based method in the costs per patient model yielded results that were markedly different from those obtained when either MCAR or MAR was assumed. In particular, fewer significant variables were obtained in the MNAR-based method while, similar to the study by Foster and Fang [8], the sizes of the coefficients were larger. Therefore, different conclusions could potentially be reached if the MNAR assumption was made for the missing cost data. Care must therefore be taken not to apply MNAR-based methods when it is not absolutely clear that the missing data are MNAR, as MNAR approaches often require assumptions that cannot be validated from the data at hand [52]. MNAR-based approaches are best implemented as sensitivity analyses so as to assess how robust results are across different analytic approaches [53].
All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. When observations in the dependent variable are MAR while the independent variables are complete, Little [54] posits that the incomplete cases contribute no information to the regression where such a dependent variable is modelled. While some, as a consequence, have deleted cases with missing values on the dependent variable, an approach that effectively reduces to a complete case (regression) analysis [55], others have used imputed values of the dependent variable in subsequent regression analyses [16]. In this study, we did both despite the fact that we did not have outcome data that were purely MAR. The results from the ΔEQ-5D and ΔBarthel models show that the choice of mechanism did not have a substantial effect on the results. Despite the sizes of the coefficients and standard errors being somewhat different, the results from all three methods were broadly comparable in that similar conclusions could have been reached from any of the models. A possible explanation for this may have been the fact that the reason for missing data could be ascribed to any one of the three mechanisms of missingness or indeed a combination of these mechanisms. Croninger and Douglas [7] also assert that MCAR and MAR-based methods are relatively robust if the sample size is modestly large even when missing data are MNAR. While the extra information gathered during the data collection process supported the assertion that the missing data were either MCAR, MAR or MNAR, the significant Mills ratios lent additional support to the MNAR assumption, as their significance in the selection models indicated the presence of significant selection bias. However, selection models, even though identifiable, should be treated with caution, especially when data are possibly not MNAR [56].
In this study, there were limitations in terms of accurately determining the reasons for the missing data, as this determination relied on the views of IC coordinators, investigators’ observations and some statistical analyses carried out on the ICNET dataset. Determinations based on this extra information were not definitive. Further, this information was only available for dependent variables. A more formal way of collecting this extra information may include adding questions, within the main data collection instrument, about why these data are missing, and this should be done for both dependent and independent variables. A critical evaluation of the responses to these questions will help inform the process of identifying the missingness mechanism. In the absence of hypothesis testing, however, this extra information provided the best insights into why the missing data

were not collected. In addition, the exclusion of missing observations in the independent variables may have altered the missing data mechanisms. This, however, would mainly apply to cases where data were MAR. As the MAR mechanism was premised mainly on EQ-5D and Barthel, for which the missing observations were kept as low as possible, the probability of alterations in the missingness mechanisms was minimised. Finally, the use of untransformed OLS models for ΔEQ-5D and ΔBarthel in the presence of the skewed nature of the two variables could have potentially led to biased results. Tests of skewness performed on the variables, however, showed a low level of skewness (p values from the Shapiro-Wilk test for ΔEQ-5D and ΔBarthel were 0.047 and 0.042, respectively), implying that any bias resulting from the use of untransformed OLS models would be minimal.

Conclusions
Many studies have emphasised the importance of determining the mechanism behind missing data before deciding on the technique to use [19, 49, 57]. This paper considered three different mechanisms that may be responsible for missing data and then discussed approaches that can be used to deal with the missing data. The results from this analysis suggest that the methods used to analyse missing data really do matter, especially when one is considering whether or not to use MNAR-based methods. Dealing with missing data is not easy, especially as the hypothesis-based techniques for detecting the pattern of missingness are limited in that they can only be used to rule out MCAR but cannot confirm this mechanism. Further, there are no hypothesis-test-based techniques available for determining if data are MAR or MNAR in cases where the missing data are irrecoverable. This means that assumptions about missing data mechanisms should not be selected arbitrarily; using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate. In the absence of this extra information, one of the MAR-based methods could be considered, as these were shown in this study and elsewhere to be robust for use even in cases where data are not strictly MAR.

Competing interests
The authors declare that they have no competing interests.

Acknowledgments
We are grateful to colleagues from the Universities of Birmingham and Leicester who participated in the National Evaluation of Intermediate Care Services from which data used in this study were obtained. We are also thankful to the intermediate care-coordinators and the staff from the case-study sites that provided the quantitative data and clarified follow-up questions. The National Evaluation was funded by the Department of Health (Policy Research Programme) and the Medical Research Council. General research funding for BK is provided through UK Department of Health grants; LB is supported by Cancer Research UK and Medical Research Council (grant number G0800808). The funders were not involved in the study design, in the writing of the manuscript or in the decision to submit the manuscript for publication.

Author details
1 Health Economics Unit, Public Health Building, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom. 2 Centre for Clinical Epidemiology and Evaluation, University of British Columbia, Research Pavilion 702-828 West 10th Ave, Vancouver, Canada. 3 Cancer Research UK Clinical Trials Unit (CRCTU), University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom. 4 MRC Midland Hub for Trials Methodology Research, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom.

Authors’ contributions
BK undertook the econometric analyses and wrote the first draft of the paper. Subsequent drafts were contributed to by SB and LB who have also approved the final version. BK will act as guarantor. All authors read and approved the final manuscript.

Received: 10 April 2012 Accepted: 27 June 2012
Published: 27 June 2012

References
1. Schafer JL: Analysis of Incomplete Multivariate Data. London: Chapman & Hall; 1997.
2. Biglan A, Severson H, Ary D, Faller C, Gallison C, Thompson R, et al: Do smoking prevention programs really work? Attrition and the internal and external validity of an evaluation of a refusal skills training program. J Behav Med 1987, 10:159–171.
3. Rubin DB: Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons; 1987.
4. Dow MM, Anthon Eff E: Multiple Imputation of Missing Data in Cross-Cultural Samples. Cross-Cultural Research 2009, 43:206–229.
5. Barry AE: How attrition impacts the internal and external validity of longitudinal research. J Sch Health 2005, 75:267–270.
6. Little RJA, Rubin DB: Statistical Analysis with Missing Data. New York: John Wiley; 1987.
7. Croninger RG, Douglas KM: Missing Data and Institutional Research. In Survey research. Emerging issues. New directions for institutional research #127. Edited by Umbach PD. San Francisco: Jossey-Bass; 2005:33–50.
8. Foster EM, Fang GY: Alternative methods for handling attrition: an illustration using data from the Fast Track evaluation. Eval Rev 2004, 28:434–464.
9. Kmetic A, Joseph L, Berger C, Tenenhouse A: Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis. Epidemiology 2002, 13:437–444.
10. Allison P: Missing data. Thousand Oaks, CA: Sage; 2000.
11. Schafer JL: Multiple imputation: a primer. Stat Methods Med Res 1999, 8:3–15.
12. Fielding S, Fayers P, Ramsay C: Predicting missing quality of life data that were later recovered: an empirical comparison of approaches. Clin Trials 2010, 7:333–342.
13. Raymond MR, Roberts DM: A comparison of methods for treating incomplete data in selection research. Educational and Psychological Measurement 1987, 47:13–26.
14. Allison PD: Multiple imputation for missing data: a cautionary tale. Sociological Methods and Research 2000, 28:301–309.
15. Hedeker D, Gibbons RD: Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods 1997, 2:64–78.
16. Schafer JL, Olsen MK: Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research 1998, 33:545–571.
17. Heckman JJ: The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 1976, 5:475–492.

18. Heitjan DF: Annotation: what can be done about missing data? Approaches to imputation. Am J Public Health 1997, 87:548–550.
19. Curran D, Bacchi M, Schmitz SF, Molenberghs G, Sylvester RJ: Identifying the types of missingness in quality of life data from clinical trials. Stat Med 1998, 17:739–756.
20. McKnight PE, McKnight KM, Sidani S, Figueredo AJ: Missing Data: A Gentle Introduction. New York: The Guilford Press; 2007.
21. ICNET: A National Evaluation of the Costs and Outcomes of Intermediate Care for Older People: Final Report. Leicester: The University of Leicester; 2005.
22. Kaambwa B, Bryan S, Barton P, Parker H, Martin G, Hewitt G, et al: Costs and health outcomes of intermediate care: results from five UK case study sites. Health Soc Care Community 2008, 16:573–581.
23. Kaambwa B, Billingham L, Bryan S: Mapping utility scores from the Barthel index. European Journal of Health Economics 2011 Nov 2 [Epub ahead of print].
24. Brazier JE, Walters SJ, Nicholl JP, Kohler B: Using the SF-36 and Euroqol on an elderly population. Qual Life Res 1996, 5:195–204.
25. Coast J, Peters TJ, Richards SH, Gunnell DJ: Use of the EuroQoL among elderly acute care patients. Qual Life Res 1998, 7:1–10.
26. Lyons RA, Crome P, Monaghan S, Killalea D, Daley JA: Health status and disability among elderly people in three UK districts. Age Ageing 1997, 26:203–209.
27. Brazier J, Roberts J, Tsuchiya A, Busschbach J: A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ 2004, 13:873–884.
28. Dolan P: Modeling valuations for EuroQol health states. Med Care 1997, 35:1095–1108.
29. Kind P, Hardman G, Macran S: UK population norms for EQ-5D. Discussion paper 172. York: University of York Centre for Health Economics; 1999.
30. Murphy R, Sackley CM, Miller P, Harwood RH: Effect of experience of severe stroke on subjective valuations of quality of life after stroke. J Neurol Neurosurg Psychiatry 2001, 70:679–681.
31. Sainsbury A, Seebass G, Bansal A, Young JB: Reliability of the Barthel Index when used with older people. Age Ageing 2005, 34:228–232.
32. Minosso JSM, Amendola F, Alvarenga MRM, de Campos Oliveira MA: Validation of the Barthel Index in elderly patients attended in outpatient clinics, in Brazil. Acta Paul Enferm 2010, 23:218–223.
33. Mahoney FI, Barthel D: Functional Evaluation: The Barthel Index. Md State Med J 1965, 14:61–65.
34. Wolfe CD, Taub NA, Woodrow EJ, Burney PG: Assessment of scales of disability and handicap for stroke patients. Stroke 1991, 22:1242–1244.
35. Shah S, Vanclay F, Cooper B: Improving the sensitivity of the Barthel Index for stroke rehabilitation. J Clin Epidemiol 1989, 42:703–709.
36. Musil CM, Warner CB, Yobas PK, Jones SL: A comparison of imputation techniques for handling missing data. West J Nurs Res 2002, 24:815–829.
37. Fielding S, Fayers PM, Ramsay CR: Investigating the missing data mechanism in quality of life outcomes: a comparison of approaches. Health Qual Life Outcomes 2009, 7:57.
38. McCullagh P, Nelder JA: Generalized linear models. 2nd edition. London: Chapman & Hall; 1989.
39. Manning WG, Mullahy J: Estimating log models: to transform or not to transform? J Health Econ 2001, 20:461–494.
40. Altman D: Practical statistics for medical research. 2nd edition. London: Chapman & Hall; 1991.
41. Cantoni E, Ronchetti E: A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures. J Health Econ 2006, 25:198–213.
42. Duan N: Smearing estimate: a nonparametric retransformation method. J Amer Statist Assoc 1983, 78:605–610.
43. Kilian R, Matschinger H, Loeffler W, Roick C, Angermeyer MC: A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment. J Ment Health Policy Econ 2002, 5:21–31.
44. Brazier JE, Yang Y, Tsuchiya A, Rowen DL: A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. Eur J Health Econ 2010, 11:215–225.
45. Gujarati D: Basic Econometrics. 3rd edition. New York: McGraw-Hill, Inc; 1995.
46. Schafer JL: NORM: Multiple imputation of incomplete multivariate data under a normal model version 2 Software for Windows 95/98/NT. [http://www.stat.psu.edu/jls/misoftwa.html].
47. StataCorp LP: Intercooled Stata 8.2 for Windows. College Station, TX, US: StataCorp LP; 2004.
48. Roderick P, Low J, Day R, Peasgood T, Mullee MA, Turnbull JC, et al: Stroke rehabilitation after hospital discharge: a randomized trial comparing domiciliary and day-hospital care. Age Ageing 2001, 30:303–310.
49. Cohen J, Cohen P: Applied multiple regression/correlation analysis for the behavioral sciences. 2nd edition. Hillsdale, NJ: Erlbaum; 1983.
50. Schafer JL, Graham JW: Missing data: our view of the state of the art. Psychological Methods 2002, 7:147–177.
51. David M, Little RJA, Samuhel ME, Triest RK: Alternative Methods for CPS Income Imputation. Journal of the American Statistical Association 1986, 81:29–41.
52. Verbeke G, Molenberghs G: Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.
53. Mallinckrodt CH, Sanger TM, Dube S, DeBrota DJ, Molenberghs G, Carroll RJ, et al: Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biol Psychiatry 2003, 53:754–760.
54. Little RJA: Regression with missing X’s: a review. Journal of the American Statistical Association 1992, 87:1227–1237.
55. Von Hippel PT: Regression with missing Ys: An improved strategy for analyzing multiply imputed data. Sociological Methodology 2007, 37:83–117.
56. Glynn RJ, Laird NM, Rubin DB: Selection modelling versus mixture modelling with nonignorable nonresponse. In Drawing Inferences from Self-selected Samples. Edited by Wainer H. New York: Springer; 1986:115–142.
57. Orme JG, Reis J: Multiple regression with missing data. Journal of Social Service Research 1991, 9:61–91.

doi:10.1186/1756-0500-5-330
Cite this article as: Kaambwa et al.: Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients. BMC Research Notes 2012 5:330.

No ratings yet
20ma402 Ps Unit III DCM
77 pages

Background may result in a significant reduction in sample size lead-

Table 1 Variables for use in economic analysis (with level of completeness)

2,253 obs/patients in the ‘National evaluation of costs

COST MODEL OUTCOME MODELS

1,536 obs with missing 1,148 obs with missing data

Complete MI used Selection Complete MI used Selection Complete MI used Selection

GLM on GLM on Heckman OLS OLS Heckman OLS OLS Heckman

Assuming Assuming Assuming Assu- Assu- Assuming Assu- Assu- Assuming
