The Validity of Assessment Centres for the Prediction of Supervisory Performance Ratings

Eran Hermelin, Filip Lievens and Ivan T. Robertson
The current meta-analysis of the selection validity of assessment centres aims to update an earlier meta-analysis of assessment centre validity. To this end, we retrieved 26 studies and 27 validity coefficients (N = 5850) relating the Overall Assessment Rating (OAR) to supervisory performance ratings. The current study obtained a corrected correlation of .28 between the OAR and supervisory job performance ratings (95% confidence interval .24 ≤ ρ ≤ .32). It is further suggested that this validity estimate is likely to be conservative given that assessment centre validities tend to be affected by indirect range restriction.
Gaugler et al. (1987) still serve as the 'gold standard' of assessment centre validity estimation on the basis of the OAR. However, since 1987, new validity studies have been conducted that were obviously not included in the Gaugler et al. (1987) meta-analysis. Accordingly, this study meta-analysed studies that were not included in the Gaugler et al. (1987) meta-analysis (from 1985 onward). In keeping with the Gaugler et al. study, we focus on the selection validity of assessment centres (i.e., their ability to select the best candidates for a given job) instead of on the validity of assessment centre dimensions (see Arthur et al., 2003). Supervisory performance ratings served as the criterion measure in the current meta-analysis.

2. Method

2.1. Database

We used a number of strategies to identify validity studies potentially suited for inclusion in the current meta-analysis. First, a computerized search of various electronic databases was conducted (PsycInfo, Social Sciences Citation Index, etc.). Second, a computerized search of the British Psychological Society database of UK-based Chartered Psychologists was undertaken, and academics and practitioners were contacted to identify individuals who may have access to unpublished assessment centre validity data. Third, around 20 of the top companies in the FTSE 500 index and four of the United Kingdom's largest occupational psychology firms were contacted.

Third, we used Borman's (1991) definition of supervisory job performance ratings, which he defined as 'an estimate of individuals' performance made by a supervisor' (p. 280). This estimate could be either an overall or a multi-dimensional performance evaluation. Hence, ratings of potential, objective measures of performance, performance tests, and ratings made by peers were excluded.

Fourth, studies had to provide sufficient information to be coded. As most of the studies did not report all the necessary information, the first author attempted to contact the authors of these studies.

On the basis of these four inclusion criteria, 26 studies with 27 non-overlapping validity coefficients were included in the meta-analysis. The total N was 5850 (as compared with N = 4180 of Gaugler et al., 1987). Of these studies, 23 had been published, one was presented at an international conference, and two were unpublished. The earliest study included was published in 1985, whereas the most recent study was conducted in 2005.

The coding of the 27 validity coefficients which constituted the final meta-analytic dataset was conducted separately by the first and second authors. On the basis of a sample of studies coded by the authors, a reliability check revealed that when both authors entered a coding, their coding agreed in 85% of cases. The full coding scheme is available from the first author. At the end of this procedure, the separately coded datasets were compared and any disagreements were resolved between the two authors.
In the absence of sampling bias, plotting each study's validity coefficient against its sample size should create a scatter plot resembling a symmetrical inverted funnel. Should the sampling be biased, the funnel plot would be asymmetrical (Egger et al., 1997). For the studies included in this meta-analysis, the scatter plot of validities resembled an inverted funnel, with validity coefficients based on small samples showing considerable variation, whereas those based on larger samples tended to converge on the mean meta-analytic validity coefficient. There was, however, a tendency for validities not to be evenly distributed around the mean, with six coefficients located under the meta-analytic mean, one coefficient corresponding to the meta-analytic mean, and 20 coefficients located above the meta-analytic mean. There was a tendency for studies with larger sample sizes to be more evenly distributed around the meta-analytic mean than studies based on smaller sample sizes. The study contributing the largest sample size to the meta-analytic dataset (28% of total cases) was positioned in the middle of the distribution of validities and so did not skew the outcome of the meta-analysis.
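To make the funnel-plot check concrete, the short sketch below plots validity coefficients against sample size, in the spirit of Egger et al. (1997). It is a minimal illustration only: the (r, N) pairs are hypothetical placeholders, not the 27 coefficients from our dataset, and the use of matplotlib is an assumption of the sketch rather than part of the original analysis.

```python
# Illustrative funnel-plot check for publication/sampling bias (Egger et al., 1997).
# The (validity, sample size) pairs below are hypothetical placeholders.
import matplotlib.pyplot as plt

validities = [0.05, 0.31, 0.12, 0.22, 0.18, 0.25, 0.16, 0.19, 0.17]
sample_sizes = [60, 75, 120, 150, 210, 340, 520, 900, 1650]

# Sample-size-weighted mean validity, the line small-N studies should funnel toward.
mean_r = sum(n * r for r, n in zip(validities, sample_sizes)) / sum(sample_sizes)

plt.scatter(validities, sample_sizes)
plt.axvline(mean_r, linestyle="--", label=f"weighted mean r = {mean_r:.2f}")
plt.xlabel("Observed validity coefficient (r)")
plt.ylabel("Sample size (N)")
plt.title("Unbiased sampling should yield a symmetrical inverted funnel")
plt.legend()
plt.show()
```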
2.4. Meta-analytic procedure

The dataset was analysed by following the Hunter and Schmidt (1990) procedures for individually correcting correlations for experimental artifacts. In instances in which these procedures were deemed not to be sufficiently detailed for the purposes of the current study, advice was solicited directly from the book's second author (F. Schmidt, personal communication, 2002). We were able to obtain range restriction data for 20 of the 27 validity coefficients included in the dataset. These 20 coefficients were hence individually coded for range restriction. In the absence of specific information about the range restriction ratios of the remaining seven coefficients, they were assigned the mean of the range restriction ratios coded for the 20 individually coded coefficients (see Appendix A).

As reliabilities for supervisory performance ratings were typically not reported in the studies included in our meta-analytic dataset, we decided to use the best available reliability estimates for supervisory performance ratings. Two large-scale meta-analyses found .52 to be the average criterion reliability estimate for supervisory performance ratings (Salgado et al., 2003; Viswesvaran, Ones, & Schmidt, 1996). Hence, we used the value of .52 as the criterion reliability estimate for all 27 validity coefficients. Although there now exist procedures to correct for indirect range restriction (Hunter, Schmidt, & Le, 2006; Schmidt, Oh, & Le, 2006), we were not able to perform this correction, as indirect range restriction data were not available in the primary studies. We were therefore unable to go beyond the standard practice of correcting the magnitude of observed validities for the presence of direct range restriction and criterion unreliability.
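For readers who wish to trace the correction logic, the sketch below applies the two-step procedure described above: Thorndike's Case II formula for direct range restriction, followed by disattenuation for criterion unreliability. The range restriction ratio u = .83 is a hypothetical placeholder (the study-level ratios are listed in Appendix A, not reproduced here), chosen so that the sketch recovers the mean values reported in the Results section below.

```python
import math

def correct_direct_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction.
    u = restricted SD / unrestricted SD of the predictor (u < 1 under restriction)."""
    big_u = 1.0 / u
    return (r * big_u) / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))

def correct_criterion_unreliability(r: float, ryy: float) -> float:
    """Disattenuate for criterion unreliability only; the predictor is left uncorrected."""
    return r / math.sqrt(ryy)

observed_r = 0.17  # mean observed validity (see Results)
u_ratio = 0.83     # hypothetical mean range restriction ratio; actual ratios are in Appendix A
ryy = 0.52         # criterion reliability (Salgado et al., 2003; Viswesvaran et al., 1996)

r_rr = correct_direct_range_restriction(observed_r, u_ratio)  # ~= .20
rho = correct_criterion_unreliability(r_rr, ryy)              # ~= .28
print(f"after range restriction correction: {r_rr:.2f}; population estimate: {rho:.2f}")
```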
3. Results

3.1. Assessment centre validity

As shown in Table 1, the mean observed r based on a total sample size of 5850 was .17. Correcting this coefficient for direct range restriction in the predictor variable increased its value to .20. When the coefficient was also corrected for criterion unreliability, the population estimate for the correlation between OARs and supervisory performance ratings increased to .28 [95% confidence interval (CI) = .24 ≤ ρ ≤ .32]. Details of the distribution of artifacts used to individually correct observed validity coefficients are provided in Table 1, which shows that 84% of the variance in validity coefficients may be explicable in terms of sampling error. Consequently, once the variance theoretically contributed by sampling error was removed, little unexplained variance remained. Thus, the detection of potential moderator variables was unlikely. Nevertheless, we tested for various moderators (e.g., number of dimensions assessed, number of different selection methods, type of integration procedure used). As could be expected, none of these moderators was significant.

[Table 1 not reproduced. Its note defines, among other entries: Explained variance, the percentage of variance in validity coefficients explained by sampling error; SDρ, the standard deviation of corrected correlations.]
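The 84% figure follows from the Hunter and Schmidt (1990) variance decomposition, in which the variance expected from sampling error alone is compared with the observed variance of the validity coefficients. The sketch below illustrates the computation; the observed variance of .0052 is an assumed value (it is not reported above), chosen so the sketch reproduces the 84% figure.

```python
# Hunter & Schmidt (1990) decomposition: how much of the observed variance in
# validity coefficients is attributable to sampling error alone?
k = 27            # number of validity coefficients
total_n = 5850    # total sample size
mean_r = 0.17     # mean observed validity
obs_var = 0.0052  # assumed observed variance of the 27 rs (not reported in the excerpt)

n_bar = total_n / k
sampling_error_var = (1 - mean_r ** 2) ** 2 / (n_bar - 1)
pct_explained = 100 * sampling_error_var / obs_var

print(f"expected sampling error variance: {sampling_error_var:.5f}")
print(f"% of observed variance explained: {pct_explained:.0f}%")  # ~84% with these inputs
print(f"residual variance left for moderators: {obs_var - sampling_error_var:.5f}")
```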
4. Discussion

The typically used meta-analytic estimate of the validity of assessment centre OARs (Gaugler et al., 1987) is based on studies conducted prior to 1985, some of which are now over 50 years old. However, in the last 20 years, many new assessment centre validation studies have been conducted. Although Arthur et al. (2003) recently provided an updated estimate of the validity of assessment centre dimensions, it is also important to provide an updated estimate of assessment centre OAR validity, as the OAR is almost always used when assessment centres are used for selection purposes (as opposed to developmental purposes). Therefore, this study provides a meta-analytic update of the value obtained by Gaugler et al. (1987). The current investigation is also based on a larger sample size (N = 5850) than the sample size of 4180 used in the Gaugler et al. (1987) meta-analysis.

The mean population estimate of the correlation between assessment centre OARs and supervisory performance ratings in the current study was ρ = .28 (95% CI = .24 ≤ ρ ≤ .32). Our estimate is thus significantly lower than the value of ρ = .36 (95% CI = .30 ≤ ρ ≤ .42) reported by Gaugler et al. (1987), which lies outside the 95% CI fitted around our estimated population value. A possible explanation for this finding is that the participants of modern assessment centres are subject to more pre-selection (given that they are so costly) than was customary in earlier assessment centres. This would result in more indirect range restriction in the modern assessment centres and, consequently, in lower observed and corrected validities.

Unfortunately, we could not correct our data for indirect range restriction because the required indirect range restriction data were simply not reported in the primary studies. Nevertheless, in ancillary analyses we found some 'indirect' evidence of the impact of indirect range restriction on assessment centre data. Specifically, six studies within the meta-analytic dataset reported validities for cognitive ability tests that were used in the same selection stage within/alongside the assessment centre. The mean observed validity of these cognitive ability tests with respect to the criterion of job performance ratings was .10 (N = 1757). Thus, the validity of cognitive ability tests used within or alongside an assessment centre seemed to be much lower than the observed meta-analytic validities for cognitive ability tests as stand-alone predictors (.24 and .22) reported by Hunter (1983) and Schmitt, Gooding, Noe, and Kirsch (1984) for US data. It is also much lower than the observed meta-analytic validity for cognitive ability tests as stand-alone predictors (.29) on the basis of recent European data (Salgado et al., 2003). Although this comparison should be made with caution, it seems to indicate that the depressed validity of cognitive ability tests used within/alongside assessment centres might also result from considerable indirect range restriction on the predictor variable – most likely due to pre-selection on cognitive factors.
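As a purely illustrative complement to this argument, the Monte Carlo sketch below shows how pre-selection on a cognitive test that correlates with the OAR depresses the observed OAR–performance correlation among admitted candidates, even though no one is selected on performance directly. All population correlations and the 30% admission quota are arbitrary choices for the illustration, not estimates from our data.

```python
# Toy simulation of indirect range restriction: applicants are pre-selected on a
# cognitive test correlated with the OAR, which attenuates the OAR-performance
# validity among those admitted to the assessment centre.
import numpy as np

rng = np.random.default_rng(42)
n_applicants = 100_000

# Arbitrary population correlations among cognitive score, OAR, and job performance.
cov = np.array([[1.0, 0.5, 0.4],
                [0.5, 1.0, 0.3],
                [0.4, 0.3, 1.0]])
cognitive, oar, performance = rng.multivariate_normal(np.zeros(3), cov, n_applicants).T

# Pre-select the top 30% on the cognitive test before the assessment centre stage.
admitted = cognitive > np.quantile(cognitive, 0.70)

pop_validity = np.corrcoef(oar, performance)[0, 1]
restricted_validity = np.corrcoef(oar[admitted], performance[admitted])[0, 1]
print(f"unrestricted OAR validity: {pop_validity:.2f}")
print(f"validity after cognitive pre-selection: {restricted_validity:.2f}")  # noticeably lower
```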
On a broader level, these results show that the selection stage should always be taken into account when reporting the validity of predictors (Hermelin & Robertson, 2001; see also Roth, Bobko, Switzer, & Dean, 2001). Hence, we urge future assessment centre researchers to routinely report (1) the selection stage within which assessment centres are used, (2) the pre-selection ratio of assessment centre participants, and (3) the correlation between the predictor composite used in preliminary selection stages and the OAR used in later stages. Only when this information becomes available will it be possible to examine the indirect range restriction issue in assessment centres more fully and to perform the corrections for indirect range restriction [according to the procedures detailed in Hunter et al. (2006) and Schmidt et al. (2006)].

In regard to the potential presence of moderator variables, the current investigation suggests that very little variance remains unaccounted for once sampling error has been removed. This result contradicts the notion that assessment centres should show considerable variation in validities given the wide variations in their design and implementation. We believe that this result is more likely due to little chance variation in the validity coefficients included in the dataset than to a genuine absence of moderator effects.

The following directions deserve attention in future research on the predictive validity of assessment centres. First, the criterion measures for validating assessment centres should be broadened. Over the last decade, one of the major developments in criterion theory has been the distinction between task performance and citizenship performance (Borman & Motowidlo, 1993). To our knowledge, no studies have linked assessment centre ratings to citizenship behaviours. This is surprising because one of the key advantages of assessment centres is that they are able to measure interpersonally oriented dimensions. Second, it is of great importance that future studies examine the incremental validity of assessment centres over and above so-called low-fidelity simulations such as situational judgment tests (McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001). These tests have gained in popularity because of their ease of administration in large groups and their low costs. In addition, they seem to capture interpersonal aspects of the criterion space and have shown good predictive validities.
Acknowledgement

We would like to thank Frederik Anseel for his insightful comments on a previous version of this manuscript.
References

Arthur, W., Jr, Day, E.A., McNelly, T.L. and Edens, P.S. (2003) A Meta-Analysis of the Criterion-Related Validity of Assessment Center Dimensions. Personnel Psychology, 56, 125–154.

Borman, W.C. (1991) Job Behavior, Performance, and Effectiveness. In: Dunnette, M.D. and Hough, L.M. (eds), Handbook of Industrial and Organizational Psychology. Palo Alto, CA: Consulting Psychologists Press, pp. 269–313.

Borman, W.C. and Motowidlo, S.J. (1993) Expanding the Criterion Domain to Include Elements of Contextual Performance. In: Schmitt, N., Borman, W.C. and Associates (eds), Personnel Selection in Organizations. San Francisco: Jossey-Bass, pp. 71–98.

Egger, M., Smith, G.D., Schneider, M. and Minder, C. (1997) Bias in Meta-Analysis Detected by a Simple Graphical Test. British Medical Journal, 315, 629–634.

Gaugler, B.B., Rosenthal, D.B., Thornton, G.C. and Bentson, C. (1987) Meta-Analysis of Assessment Center Validity. Journal of Applied Psychology, 72, 493–511.

Hermelin, E. and Robertson, I.T. (2001) A Critique and Standardization of Meta-Analytic Validity Coefficients in Personnel Selection. Journal of Occupational and Organizational Psychology, 74, 253–277.

Hunter, J.E. (1983) Test Validation for 12,000 Jobs: An application of job classification and validity generalization analysis to the general aptitude test battery. USES Test Research Report No. 45, Division of Counseling and Test Development, Employment and Training Administration, US Department of Labor, Washington, DC.

Hunter, J.E. and Schmidt, F.L. (1990) Methods of Meta-Analysis: Correcting error and bias in research findings. Beverly Hills, CA: Sage.

Hunter, J.E., Schmidt, F.L. and Le, H. (2006) Implications of Direct and Indirect Range Restriction for Meta-Analysis Methods and Findings. Journal of Applied Psychology, 91, 594–612.

McDaniel, M.A., Morgeson, F.P., Finnegan, E.B., Campion, M.A. and Braverman, E.P. (2001) Use of Situational Judgment Tests to Predict Job Performance: A clarification of the literature. Journal of Applied Psychology, 86, 730–740.

Roth, P.L., Bobko, P., Switzer, F.S. and Dean, M.A. (2001) Prior Selection Causes Biased Estimates of Standardized Ethnic Group Differences: Simulation and analysis. Personnel Psychology, 54, 591–617.

Salgado, J.F., Anderson, N., Moscoso, S., Bertua, C., De Fruyt, F. and Rolland, J.P. (2003) A Meta-Analytic Study of General Mental Ability Validity for Different Occupations in the European Community. Journal of Applied Psychology, 88, 1068–1081.

Schmidt, F.L., Oh, I. and Le, H. (2006) Increasing the Accuracy of Corrections for Range Restriction: Implications for selection procedure validities and other research results. Personnel Psychology, 59, 281–305.

Schmitt, N., Gooding, R.Z., Noe, R.A. and Kirsch, M. (1984) Meta-Analyses of Validity Studies Published Between 1964 and 1982 and the Investigation of Study Characteristics. Personnel Psychology, 37, 407–422.

Thornton, G.C., III. (1992) Assessment Centers and Human Resource Management. Reading, MA: Addison-Wesley.

Viswesvaran, C., Ones, D.S. and Schmidt, F.L. (1996) Comparative Analysis of the Reliability of Job Performance Ratings. Journal of Applied Psychology, 81, 557–574.

References to articles included in meta-analytic dataset

Anderson, L.R. and Thaker, J. (1985) Self-Monitoring and Sex as Related to Assessment Center Ratings and Job Performance. Basic and Applied Social Psychology, 6, 345–361.

Arthur, W. and Tubre, T. (2001) The Assessment Center Construct-Related Validity Paradox: An investigation of self-monitoring as a misspecified construct. Unpublished manuscript.

Binning, J.F., Adorno, A.J. and LeBreton, J.M. (1999) Intraorganizational Criterion-Based Moderators of Assessment Center Validity. Paper presented at the Annual Conference of the Society for Industrial and Organizational Psychology, Atlanta, GA, April.

Bobrow, W. and Leonards, J.S. (1997) Development and Validation of an Assessment Center During Organizational Change. Journal of Social Behavior and Personality, 12, 217–236.

Burroughs, W.A. and White, L.L. (1996) Predicting Sales Performance. Journal of Business and Psychology, 11, 73–84.

Chan, D. (1996) Criterion and Construct Validation of an Assessment Centre. Journal of Occupational and Organizational Psychology, 69, 167–181.

Dayan, K., Kasten, R. and Fox, S. (2002) Entry-Level Police Candidate Assessment Center: An efficient tool or a hammer to kill a fly? Personnel Psychology, 55, 827–849.

Dobson, P. and Williams, A. (1989) The Validation of the Selection of Male British Army Officers. Journal of Occupational Psychology, 62, 313–325.

Feltham, R. (1988) Assessment Centre Decision Making: Judgmental vs. mechanical. Journal of Occupational Psychology, 61, 237–241.

Fleenor, J.W. (1996) Constructs and Developmental Assessment Centers: Further troubling empirical findings. Journal of Business and Psychology, 10, 319–333.

Fox, S., Levonai-Hazak, M. and Hoffman, M. (1995) The Role of Biodata and Intelligence in the Predictive Validity of Assessment Centres. International Journal of Selection and Assessment, 3, 20–28.

Goffin, R.D., Rothstein, M.G. and Johnston, N.G. (1996) Personality Testing and the Assessment Center: Incremental validity for managerial selection. Journal of Applied Psychology, 81, 746–756.

Goldstein, H.W., Yusko, K.P., Braverman, E.P., Smith, D.B. and Chung, B. (1998) The Role of Cognitive Ability in the Subgroup Differences and Incremental Validity of Assessment Center Exercises. Personnel Psychology, 51, 357–374.

Gomez, J.J. and Stephenson, R.S. (1987) Validity of an Assessment Center for the Selection of School-Level Administrators. Educational Evaluation and Policy Analysis, 9, 1–7.

Higgs, M. (1996) The Value of Assessment Centres. Selection and Development Review, 12, 2–6.

Hoffman, C.C. and Thornton, G.C., III. (1997) Examining Selection Utility Where Competing Predictors Differ in Adverse Impact. Personnel Psychology, 50, 455–470.

Jones, R.G. and Whitmore, M.D. (1995) Evaluating Developmental Assessment Centers as Interventions. Personnel Psychology, 48, 377–388.

McEvoy, G.M. and Beatty, R.W. (1989) Assessment Centres and Subordinate Appraisals of Managers: A seven year examination of predictive validity. Personnel Psychology, 42, 37–52.

Moser, K., Schuler, H. and Funke, U. (1999) The Moderating Effect of Raters' Opportunities to Observe Ratees' Job Performance on the Validity of an Assessment Centre. International Journal of Selection and Assessment, 7, 133–141.

Nowack, K.M. (1997) Congruence Between Self–Other Ratings and Assessment Center Performance. Journal of Social Behavior and Personality, 12, 145–166.

Pynes, J. and Bernardin, H.J. (1992) Entry-Level Police Selection: The assessment center is an alternative. Journal of Criminal Justice, 20, 41–55.

Robertson, I. (1999) Predictive Validity of the General Fast Stream Selection Process. Unpublished validity report, School of Management, UMIST.

Russell, C.J. and Domm, D.R. (1995) Two Field Tests of an Explanation of Assessment Centre Validity. Journal of Occupational and Organizational Psychology, 68, 25–47.

Schmitt, N., Schneider, J.R. and Cohen, S.A. (1990) Factors Affecting Validity of a Regionally Administered Assessment Center. Personnel Psychology, 43, 1–12.

Thomas, T., Sowinski, D., Laganke, J. and Goudy, K. (2005) Is the Assessment Center Validity Paradox Illusory? Paper presented at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA, April.

Tziner, A., Meir, E.I., Dahan, M. and Birati, A. (1994) An Investigation of the Predictive Validity and Economic Utility of the Assessment Center for the High-Management Level. Canadian Journal of Behavioral Science, 26, 228–245.