Health and Quality of Life Outcomes
Address: 1Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK, 2Department of Rehabilitation Medicine, Faculty of Medicine
and Health, The University of Leeds, Leeds General Infirmary, St George St, Leeds, LS1 3EX, UK, 3Coventry Teaching Primary Care Trust,
Christchurch House, Greyfriars Lane, Coventry, CV1 2GQ, UK, 4Community Health Sciences, School of Clinical Sciences & Community Health,
University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, UK and 5NHS Health Scotland, Elphinstone House, 65 West Regent Street, Glasgow,
G2 2AF, UK
Email: Sarah Stewart-Brown* - [email protected]; Alan Tennant - [email protected];
Ruth Tennant - [email protected]; Stephen Platt - [email protected]; Jane Parkinson - [email protected];
Scott Weich - [email protected]
* Corresponding author
Abstract
Background: The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) was developed to meet demand for instruments
to measure mental well-being. It comprises 14 positively phrased Likert-style items and fulfils classic criteria for scale
development. We report here the internal construct validity of WEMWBS from the perspective of the Rasch measurement
model.
Methods: The model was applied to data collected from 779 respondents in Wave 12 (Autumn 2006) of the Scottish Health
Education Population Survey. Respondents were aged 16–74 (average 41.9) yrs.
Results: Initial fit to model expectations was poor. The items 'I've been feeling good about myself', 'I've been interested in new
things' and 'I've been feeling cheerful' all showed significant misfit to model expectations, and were deleted. This led to a marginal
improvement in fit to the model. After further analysis, more items were deleted and a strict unidimensional seven item scale
(the Short Warwick Edinburgh Mental Well-Being Scale (SWEMWBS)) was resolved. Many items deleted because of misfit with
model expectations showed considerable bias for gender. Two retained items also demonstrated bias for gender but, at the
scale level, cancelled out. One further retained item 'I've been feeling optimistic about the future' showed bias for age. The
correlation between the 14 item and 7 item versions was 0.954.
Given fit to the Rasch model, and strict unidimensionality, SWEMWBS provides an interval scale estimate of mental well-being.
Conclusion: A short 7 item version of WEMWBS was found to satisfy the strict unidimensionality expectations of the Rasch
model, and be largely free of bias. This scale, SWEMWBS, provides a raw score-interval scale transformation for use in
parametric procedures. In terms of face validity, SWEMWBS presents a more restricted view of mental well-being than the 14
item WEMWBS, with most items representing aspects of psychological and eudemonic well-being, and few covering hedonic
well-being or affect. However, robust measurement properties combined with brevity make SWEMWBS preferable to
WEMWBS at present for monitoring mental well-being in populations. Where face validity is an issue there remain arguments
for continuing to collect data on the full 14 item WEMWBS.
Health and Quality of Life Outcomes 2009, 7:15 http://www.hqlo.com/content/7/1/15
ysis (see below) bases person estimates upon the information that is available, estimates can be given where missing values are present. However, the precision of the estimate is reduced to an extent depending on the number of missing items.

The Rasch model
In satisfying the axioms of conjoint measurement [20], the Rasch model shows what is expected of responses to items in a scale if measurement (at the metric level) is to be achieved. Dichotomous [14] and polytomous versions of the model are available [21,22]. The model assumes that the probability of a given respondent affirming an item is a logistic function of the relative distance between the item location and the respondent location on a linear scale. In other words, the probability that a person will affirm an item is a logistic function of the difference between the person's level of, for example, mental well-being, and the level of well-being expressed by the item. The model can be expressed in the form of a logit model:

\[ \ln\left(\frac{P_{ni}}{1 - P_{ni}}\right) = \theta_n - b_i \]

where ln is the natural log, \(P_{ni}\) is the probability of person n affirming item i, \(\theta_n\) is the person's level of mental well-being, and \(b_i\) is the level of mental well-being expressed by the item.

The process of Rasch analysis is described in detail elsewhere [23,24]. Briefly, the analysis is concerned with how far the observed data match those expected by the model, using a number of fit statistics. In this paper, three overall fit statistics are considered. Two are item-person interaction statistics transformed to approximate a z-score, representing a standardised normal distribution. Therefore, if the items and persons fit the model, a mean of approximately zero and a standard deviation of 1 would be expected. A third is an item-trait interaction statistic reported as a Chi-Square, reflecting the property of invariance across the trait. A significant Chi-Square indicates that the hierarchical ordering of the items varies across the trait, so compromising the required property of invariance.

In addition to these overall summary fit statistics, individual person- and item-fit statistics are presented, both as residuals (a summation of individual person and item deviations) and as a Chi-Square statistic. In the former case, residuals between ± 2.5 are deemed to indicate adequate fit to the model. To take account of multiple testing, Bonferroni corrections are applied to adjust the Chi-Square p value [25]. The same fit statistics are available to detect person deviation, as a few respondents significantly deviating from model expectation may cause significant misfit at the item level.

The proper ordering of response categories is also evaluated. Failure to follow an expected increase in response option consistent with an underlying increase in mental well-being would show disordered thresholds across categories within an item. The term threshold refers to the point between two response categories where either response is equally probable. For a given item the number of thresholds is always one less than the number of response options.

Within the framework of Rasch measurement, the scale should also work in the same way irrespective of which group (e.g. gender) is being assessed [26]. For example, in the case of measuring mental well-being, males and females should have the same probability of affirming an item (in the dichotomous case), at the same level of mental well-being. Thus the probability is conditioned on the trait. If for some reason one gender did not display the same probability of affirming the item, then this item would be deemed to display differential item functioning (DIF), and would run the risk of biasing results. For example, if items were biased for gender, then gender could not be used as a predictor variable for mental well-being, as the measurement of mental well-being would be confounded by gender bias. It is important to note that the detection of, and if necessary the adjustment for, DIF does not remove the effect of gender, but rather ensures that there is no gender bias in the scale so that the effect of gender can be properly understood. In practice adjustments for such bias can be made post-hoc in most circumstances, but items displaying DIF would be prime candidates for removal in any scale revision [27]. Sometimes bias may cancel out in the test; for example, one item may favour males, another females, and their effects may be nullified [28]. In the current analysis, DIF was tested for age, gender, and the presence or not of a long-standing illness.

Strict tests of unidimensionality are undertaken at every stage of analysis [29]. A Principal Component Analysis (PCA) of the residuals is undertaken, where the residuals are the standardised person-item differences between the observed data and what is expected by the model for every person's response to every item. After extracting the 'Rasch factor' there should be no further pattern in the data. This is formally tested by allowing the factor loadings on the first residual component to determine 'subsets' of items and then testing, by an independent t-test, whether the person estimates (the logit of person 'ability' or, in this case, 'mental well-being') derived from these subsets differ significantly from each other [29,30]. If more than 5% of independent t-tests are found to be significant, allowing for a Binomial confidence interval for a proportion, this would indicate a breach of the assumption of unidimensionality.

An estimate of the internal consistency reliability of the scale is also available, based on the Person Separation Index (PSI) where the estimates on the logit scale for each person are used to calculate reliability. This is equivalent to Cronbach's Alpha [10].

In order to obtain robust estimates of the internal construct validity of the scale, the total data set is randomised into two further sets of approximately 50% of cases. Final results concerning the validity of the scale should be robust over the full data set, and each random sample.

The Rasch analysis was undertaken with the RUMM2020 software package [31].

Results
The 779 cases initially displayed no floor or ceiling effects, and thus all were entered into the analysis. The log Likelihood test Chi Square was 143.75 (df 38) with a probability < 0.0001, indicating that the partial credit version of the Rasch model was appropriate. All thresholds were found to be ordered (Figure 1). That is, within each item, the transition from one category to the next represents an increase in the underlying trait of mental well-being.

Initial fit to model expectations was poor (Table 1 – Analysis 1). The items 'I've been feeling good about myself', 'I've been interested in new things' and 'I've been feeling cheerful' all showed significant misfit to model expectations, and were deleted. This led to a marginal improvement in fit (Analysis 2). A further two items, 'I've been feeling interested in other people' and 'I've had energy to spare', were deleted, resulting in further improvement (Analysis 3).

Local dependency was then observed for two more items and, after further analysis, a strict unidimensional seven item scale was resolved (Analysis 4), comprising:

Item 1 – I've been feeling optimistic about the future
Item 2 – I've been feeling useful
Item 3 – I've been feeling relaxed
Item 6 – I've been dealing with problems well
Item 7 – I've been thinking clearly
Item 9 – I've been feeling close to other people
Item 11 – I've been able to make up my own mind about things

We have named this shortened scale SWEMWBS (Short Warwick-Edinburgh Mental Well-being Scale) (see additional file 2).

Five out of the seven items discarded showed significant DIF for gender (Table 2). For example, the item 'I've been feeling confident' (item 10) showed that, at any level of mental well-being, males were more likely to report a higher score than females (Figure 2).

In the final seven item scale two items also showed DIF for gender, but these were found to cancel out at the test level, and fit improved further (Analysis 5). One further item (item 1), 'I've been feeling optimistic about the future', still displayed marginal DIF for age. None of the items in the 14 item WEMWBS showed DIF by the presence or absence of a long-standing condition. As might be expected with a shorter scale, the level of reliability had fallen from 0.906 (Analysis 1) to 0.845 (Analysis 5), although the original 14 item version is compromised by multidimensionality caused by gender bias.
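The unidimensionality criterion described in the methods (no more than 5% of the independent t-tests significant, allowing for a binomial confidence interval) can be illustrated with a short sketch. It assumes a normal approximation to the binomial interval, which is a choice made here for illustration and not necessarily what RUMM2020 computes. The function name and the counts used in the usage note are hypothetical and are not results from this study.

```python
import math


def unidimensionality_check(n_significant, n_tests, limit=0.05, z=1.96):
    """Summarise the proportion of significant person-level t-tests.

    Returns (proportion, lower bound of the approximate 95% interval,
    whether the lower bound is at or below the limit).
    The normal approximation is an assumption made for this sketch.
    """
    if n_tests <= 0 or not 0 <= n_significant <= n_tests:
        raise ValueError("counts must satisfy 0 <= n_significant <= n_tests, n_tests > 0")
    proportion = n_significant / n_tests
    # Standard error of a proportion under the normal approximation
    standard_error = math.sqrt(proportion * (1.0 - proportion) / n_tests)
    lower = max(0.0, proportion - z * standard_error)
    # Unidimensionality is not rejected if the interval reaches down to the limit
    return proportion, lower, lower <= limit
```

With hypothetical counts, `unidimensionality_check(45, 779)` gives a proportion just above 5% whose lower bound falls below 5%, so unidimensionality would not be rejected, whereas `unidimensionality_check(80, 779)` would indicate a breach.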
each gender. Neither the males (Analysis 8) nor the females (Analysis 9) demonstrated fit to model expectations, suggesting that the disturbance to the scale was more than just gender DIF.

Discussion
Increasingly, scales used for measuring health and medical outcomes are being developed to meet the strict criteria associated with additive conjoint measurement as operationalised through the Rasch measurement model [14,20]. Providing a scientific basis for the construction of linear measurement, this approach is now widely used in the health and social sciences [32,33]. It remains true, however, that the majority of scales commonly used to measure mental health in trials and population surveys have not been shown to meet these strict criteria.

Our analysis has shown that seven of the original 14 items of WEMWBS, which we have called SWEMWBS (Short Warwick-Edinburgh Mental Well-being Scale), conform to Rasch model expectations and provide a valid raw score – interval level transformation with a correlation of 0.954 to the full scale. Furthermore, SWEMWBS has been shown to be largely free of item bias, and its polytomous response structure has been shown to work as intended, with higher scores within an item reflecting greater overall mental well-being.
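The polytomous response structure referred to here can be sketched with the partial credit form of the Rasch model [22], which the Results report as the appropriate version for these data. The sketch below is illustrative only: the function names and the threshold values are hypothetical, not estimates from this study. It shows two properties described in the methods: an item with five response options has four thresholds, and at a threshold the two adjacent categories are equally probable.

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of affirming an item,
    where theta is the person location and b the item location (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pcm_probabilities(theta, thresholds):
    """Partial credit model: probability of each response category.

    thresholds holds the m threshold locations (logits) of one item,
    giving m + 1 categories. Category x has log-weight equal to the sum
    of (theta - threshold_k) over the first x thresholds; category 0 has
    an empty sum of zero.
    """
    log_weights = [0.0]
    for tau in thresholds:
        log_weights.append(log_weights[-1] + (theta - tau))
    weights = [math.exp(value) for value in log_weights]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical ordered thresholds for one five-category item
example_thresholds = [-2.0, -0.5, 0.5, 2.0]
example_probs = pcm_probabilities(-0.5, example_thresholds)
```

At a person location equal to the second threshold (-0.5 in this hypothetical item), `example_probs[1]` and `example_probs[2]` coincide, which is the definition of a threshold given in the methods. With a single threshold the function reduces to the dichotomous logit model.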
Emboldened probabilities show significant DIF. Shaded items are those that were deleted.
Item numbers correspond to the order of items in WEMWBS (additional file 1)
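Whether a tabulated probability counts as significant depends on the adjustment for multiple testing. The methods state that Bonferroni corrections were applied to the Chi-Square p value [25]; the sketch below shows the arithmetic of that correction, assuming a nominal 5% level. The function names and the p-values in the usage note are hypothetical, and the number of tests actually used for each table is not stated in the text.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value against the Bonferroni-adjusted level,
    treating the length of p_values as the number of tests."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]
```

For example, with 14 tests at a nominal level of 0.05 the adjusted level is 0.05 / 14, roughly 0.0036, so a p-value of 0.004 would not be flagged.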
Although confirmatory factor analysis (not shown) had indicated that WEMWBS was consistent with a single underlying factor [8], the scale did not meet the criteria required of the Rasch model. Most of the seven items excluded showed bias for gender. Perhaps because of this DIF (which can be a cause of multidimensionality), it was not possible to construct a second meaningful scale from the seven deleted items. Separate analyses of the 14 item set by gender showed lack of fit to model expectations on both occasions, suggesting an underlying problem over and above the disturbance caused by gender DIF. In order to satisfy the rules for constructing interval scaling, the Rasch model imposes the strictest measurement criteria. WEMWBS's lack of fit to model expectations may have arisen either because of dimensionality issues, or because of the additional requirements for interval scale measurement over and above those required for ordinal scales.

WEMWBS was developed, in part, to support the evaluation of mental well-being programmes. The latter involve a component of education about the nature of mental well-being, which for many members of the public is a new concept. For this reason it was considered important that WEMWBS presented a full picture of mental well-being including items relating to the majority of aspects proposed in the academic literature. Face validity studies with the general public and its popularity with those practicing mental health promotion and public mental health in the UK suggest that WEMWBS met this goal.

In terms of face validity, the 7 item scale (SWEMWBS) presents a more restricted view of mental well-being than the 14 item scale (WEMWBS), with most items representing aspects of psychological and eudemonic well-being, and few covering hedonic well-being or affect. In terms of measurement properties, however, the 7 item scale (SWEMWBS) was robust to Rasch model expectations, whereas the original 14 item scale (WEMWBS) was not. The lack of measurement validity shown by half the items in the original 14 item scale may be attributable to current levels of knowledge and self-awareness relating to mental well-being among the general public, resulting in responses which are not robust. As knowledge and self-awareness increase this situation may change.
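The raw score to metric score conversion given in Table 3 can be applied as a simple lookup. The sketch below is a minimal illustration: the dictionary and function names are hypothetical, and only the rows legible in this copy of Table 3 are included, so raw scores 17, 18 and 27 to 32 are deliberately left out rather than guessed or interpolated.

```python
# Rows transcribed from Table 3; missing raw scores are not filled in.
SWEMWBS_RAW_TO_METRIC = {
    7: 7.00, 8: 9.51, 9: 11.25, 10: 12.40, 11: 13.33, 12: 14.08,
    13: 14.75, 14: 15.32, 15: 15.84, 16: 16.36, 19: 17.98, 20: 18.59,
    21: 19.25, 22: 19.98, 23: 20.73, 24: 21.54, 25: 22.35, 26: 23.21,
    33: 30.70, 34: 32.55, 35: 35.00,
}


def raw_to_metric(raw_score):
    """Convert a SWEMWBS raw total (7 to 35) to its metric score."""
    if not isinstance(raw_score, int) or not 7 <= raw_score <= 35:
        raise ValueError("SWEMWBS raw scores are whole numbers from 7 to 35")
    if raw_score not in SWEMWBS_RAW_TO_METRIC:
        # Row not available in this transcription of Table 3
        raise KeyError(raw_score)
    return SWEMWBS_RAW_TO_METRIC[raw_score]
```

Converting before analysis, for example `raw_to_metric(20)`, gives the interval-level value intended for change scores and other parametric procedures. The conversion is only defined for complete responses to all seven items.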
Table 3: Raw score to metric score conversion table for SWEMWBS.

Raw Score    Metric Score
7            7.00
8            9.51
9            11.25
10           12.40
11           13.33
12           14.08
13           14.75
14           15.32
15           15.84
16           16.36
19           17.98
20           18.59
21           19.25
22           19.98
23           20.73
24           21.54
25           22.35
26           23.21
33           30.70
34           32.55
35           35.00

7 item scale for use when change scores and other parametric procedures are required.

Conclusion
Although providing a broader view of mental well-being than the shortened version (SWEMWBS), WEMWBS does not meet the strict criteria for measurement demanded by the Rasch model, demonstrating DIF and multidimensionality. The shortened scale, comprised of 7 items (SWEMWBS), satisfied all criteria, including strict unidimensionality. A linear transformation of the raw score from SWEMWBS (Table 3) can be used with confidence in parametric analyses, given appropriate distribution. Responses to mental well-being scales may change as knowledge and self-awareness increase at population level. There are, therefore, arguments for continuing to gather data on the 14 item scale (given the seven item scale is embedded) to examine measurement of mental well-being at the ordinal level, to explore item bias in different samples, and to further analyse potential dimensionality.

Authors' contributions
SSB conceived of the study, supported the study design, coordinated the development of the instrument and drafted the manuscript. AT carried out all the statistical analyses and produced the first draft of the manuscript. RT designed and coordinated the study. SP participated in the design and coordination of the study, and helped to draft the manuscript. JP commissioned the study, participated in its coordination and helped to draft the manuscript. SW participated in the coordination of the study and helped to draft the manuscript.

Additional material

Acknowledgements
NHS Health Scotland commissioned the HEPS which was carried out by BMRB International. Ruth Fishwick played an important role in the development and validation of the WEMWBS, a project which was supported by
Stephen Joseph and guided by an Expert Panel comprised of Jenny Secker, Glyn Lewis, Stephen Stansfeld, in addition to SS-B, RT, SP, JP and SW. We are very grateful to all those who have contributed in this way.

References
1. World Health Organisation: Promoting Mental Health; Concepts emerging evidence and practice. In Summary report Geneva; World Health Organisation; 2004.
2. World Health Organisation: Strengthening mental health promotion. Geneva; World Health Organisation; 2001.
3. Ryan RM, Deci EL: On happiness and human potential: a review of research on hedonic and eudaimonic well-being. Annual Review Psychology 2001, 52:141-166.
4. Huppert FA, Wittington JE: Positive mental health in individuals and populations. In The Science of Well-being Edited by: Huppert FA, Baylis N, Keverne B. Oxford: Oxford University Press; 2004:307-340.
5. Linley PA, Joseph S, Eds: Positive psychology in practice. Hoboken, NJ: Wiley; 2004.
6. Joseph S, Linley PA: Positive therapy: a meta-theory for positive psychological practice. Routledge 2006.
7. Hu Y, Stewart-Brown S, Twigg L, Weich S: Can the 12 item General Health Questionnaire be used to measure positive mental health? Psychological Medicine 2007, 37(7):1005-13.
8. Tennant Ruth, Hiller Louise, Fishwick Ruth, Platt Stephen, Joseph Stephen, Weich Scott, Parkinson Jane, Secker Jenny, Stewart-Brown Sarah: The Warwick-Edinburgh Mental Well-being Scale (WEMWBS): development and UK validation. Health and Quality of Life Outcomes 2007, 5:63.
9. Nunally JC: Psychometric theory. New York: McGraw-Hill; 1978.
10. Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika 1951, 16:297-334.
11. Green SB, Lissitz RW, Mulaik SA: Limitations of coefficient alpha as an index of test unidimensionality. Educational and Psychological Measurements 1977, 37:827-838.
12. McDonald RP, Ahlawat KS: Difficulty factors in binary data. British Journal of Mathematical and Statistical Psychology 1974, 27:82-99.
13. Pallant JF: SPSS Survival Manual. Second edition. Maidenhead: Open University Press; 2005.
14. Rasch G: Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press; 1960.
15. Guttman LA: The basis for Scalogram analysis. In Studies in social psychology in World War II: Measurement and Prediction Volume 4. Edited by: Stouffer SA, Guttman LA, Suchman FA, Lazarsfeld PF, Star SA, Clausen JA. Princeton: Princeton University Press; 1950:60-90.
16. Karabatos G: The Rasch model, additive conjoint measurement, and new models of probabilistic measurement theory. Journal of Applied Measurement 2001, 2:389-423.
17. Teresi JA, Kleinman M, Ocepek-Welikson K: Modern psychometric methods for detection of differential item functioning: application to cognitive assessment measures. Statistical Medicine 2000, 19:1651-83.
18. Wright BD, Stone G: Best test design. Chicago: MESA Press; 1979.
19. Svensson E: Guidelines to statistical evaluation of data from rating scales and questionnaires. Journal of Rehabilitation Medicine 2001, 33:47-48.
20. Luce RD, Tukey JW: Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology 1964, 1:1-27.
21. Andrich D: Rating formulation for ordered response categories. Psychometrika 1978, 43:561-573.
22. Masters G: Rasch model for partial credit scoring. Psychometrika 1982, 47:149-174.
23. Pallant JF, Tennant A: An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology 2007, 46:1-18.
24. Tennant A, Conaghan PG: The Rasch Measurement Model in Rheumatology: What is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheumatism 2007, 57:1358-1362.
25. Bland JM, Altman DG: Multiple significance tests: the Bonferroni method. British Medical Journal 1995, 310:170.
26. Holland PW, Wainer H: Differential Item Functioning. Hillsdale, New Jersey: Lawrence Erlbaum; 1993.
27. Tennant A, Penta M, Tesio L, Grimby G, Thonnard J-L, Slade A, Lawton G, Simone A, Carter J, Lundgren-Nilsson A, Tripolski M, Ring H, Biering-Sørensen F, Marincek C, Burger H, Phillips S: Assessing and adjusting for cross cultural validity of impairment and activity limitation scales through Differential Item Functioning within the framework of the Rasch model: the Pro-ESOR project. Medical Care 2004, 42:37-48.
28. Tennant A, Pallant JF: DIF matters: A practical approach to test if Differential Item Functioning (DIF) makes a difference. Rasch Measurement Transactions 2007, 20:1082-1084.
29. Smith EV: Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement 2002, 3:205-231.
30. Tennant A, Pallant JF: Multidimensionality matters. Rasch Measurement Transactions 2006, 20:1048-1051.
31. Andrich D, Lyne A, Sheridon B, Luo G: RUMM 2020. Perth: RUMM Laboratory; 2003.
32. Keenan A-M, Redmond A, Horton M, Conaghan P, Tennant A: The Foot Posture Index: Rasch analysis of a novel, foot specific outcome measure. Archives Physical Medicine and Rehabilitation 2007, 88:88-93.
33. Kyriakides L, Kaloyirou C, Lindsay G: An analysis of the Revised Olweus Bully/Victim Questionnaire using the Rasch measurement model. British Journal of Educational Psychology 2006, 76(4):781-801.