Meta Analysis of Big 5 Personality Traits and Job Performance
Prior meta-analyses investigating the relation between the Big 5 personality dimensions and job
performance have all contained a threat to construct validity, in that much of the data included within
these analyses was not derived from actual Big 5 measures. In addition, these reviews did not address the
relations between the Big 5 and contextual performance. Therefore, the present study sought to provide
a meta-analytic estimate of the criterion-related validity of explicit Big 5 measures for predicting job
performance and contextual performance. The results for job performance closely paralleled 2 of the
previous meta-analyses, whereas analyses with contextual performance showed more complex relations
among the Big 5 and performance. A more critical interpretation of the Big 5-performance relationship
is presented, and suggestions for future research aimed at enhancing the validity of personality predictors
are provided.
During the several decades prior to the 1990s, the use of personality testing in employee selection
was generally looked down on by personnel selection specialists. This was primarily due to
pessimistic conclusions drawn by researchers such as Guion and Gottier (1965) in their qualitative
review of the personality testing literature and by Schmitt, Gooding, Noe, and Kirsch (1984) in their
quantitative meta-analysis of various personnel selection techniques. The general conclusion drawn
by these researchers was that personality tests did not demonstrate adequate predictive validity to
qualify their use in personnel selection. In fact, Schmitt et al. (1984) found that personality tests
were among the least valid types of selection tests, with an overall mean sample-size weighted
correlation of .21 for predicting job performance, and concluded that "personality tests have low
validity" (p. 420).

Over the past several years, however, there has been an increased sense of optimism regarding the
utility of personality tests in personnel selection (Behling, 1998; Goldberg, 1993; Hogan, Hogan, &
Roberts, 1996; Hogan & Ones, 1997; Mount & Barrick, 1995). In recent years, researchers have
suggested that the true predictive validity of personality was obscured in earlier research by the
lack of a common personality framework for organizing the traits being used as predictors (Barrick &
Mount, 1991; Hough, 1992; Mount & Barrick, 1995; Ones, Mount, Barrick, & Hunter, 1994). With
increasing confidence in the robustness of the five-factor model of personality (Digman, 1990;
Goldberg, 1993; John, 1990), researchers in the early 1990s began to adopt this Big Five framework
for selection research (Barrick & Mount, 1991; Tett, Jackson, & Rothstein, 1991).

Early meta-analytic work by Barrick and Mount (1991) and Tett et al. (1991) provided evidence
suggesting that the Big Five might have some degree of utility for selecting employees into a variety
of jobs. In both of these reviews, the researchers used studies that provided correlations between
any type of personality variable and job performance, categorizing the various personality variables
into one of the Big Five dimensions to estimate the strength of these variables' correlation with job
performance. Although their results were not altogether consistent (see Ones et al., 1994, and Tett,
Jackson, Rothstein, & Reddon, 1994, for a discussion of reasons), the general consensus drawn by
researchers and practitioners was that personality does in fact hold some utility as a predictor of
job performance. The impact of these studies on raising the status of personality tests in employee
selection has been felt throughout the 1990s. Subsequent meta-analyses by Mount and Barrick (1995)
and Salgado (1997) have seemed to solidify this newfound status granted to personality, particularly
to Conscientiousness. Behling (1998), for example, recently claimed Conscientiousness as one of the
most valid predictors of performance for most jobs, second only to general intelligence.

Much of the recent enthusiasm for the Big Five in personnel selection has been based on this body of
meta-analytic work, especially the original work of Barrick and Mount (1991). In fact, on the basis
of this work, most researchers seem satisfied to
dominant personality framework for personnel selection, we feel it would be beneficial to
meta-analyze this body of research in which actual measures of the Big Five were correlated with job
performance. Third, given recent developments in the research explicating the job performance
criterion domain (e.g., Borman & Motowidlo, 1993, 1997; Motowidlo & Van Scotter, 1994; Van Scotter &
Motowidlo, 1996), we feel it would be beneficial to meta-analytically explore the relations between
the Big Five and these various dimensions of job performance.

Methodological and Statistical Issues in Past Reviews

With respect to prior meta-analytic work examining the utility of the Big Five in personnel
selection, we feel that there are two main weaknesses in these reviews that need to be addressed
prior to making conclusions about the use of personality for personnel selection. First, it appears
that all four major meta-analyses published up to this point (Barrick & Mount, 1991; Mount & Barrick,
1995; Salgado, 1997; Tett et al., 1991) contain a potential threat to construct validity resulting
from the methods the researchers used to derive their meta-analytic estimates of criterion-related
validity. This threat stems from the fact that these validity coefficients were largely based on
studies that used measures that were not designed to explicitly measure the Big Five personality
dimensions. Instead, all four of these reviews were based on data from a diverse collection of
non-Big Five measures that were classified post hoc into the Big Five categories. Although these were
gallant efforts at addressing the relation between the Big Five and job performance given the limited
data available in the literature at that time, this post hoc classification procedure has raised some
concern in the personnel selection research community over the validity of the results obtained in
these past reviews (Hogan et al., 1996; Ones et al., 1994; Salgado, 1997; Tett et al., 1994).

The central issues concerning this classification procedure are the suboptimal levels of interrater
agreement in the classification of the various personality scales into the Big Five dimensions and
the misclassification of some scales into these dimensions. An inspection of the methods reported by
both Barrick and Mount (1991) and Tett et al. (1991) reveals that the level of interrater agreement
achieved within each of these reviews is not entirely satisfactory. For example, Barrick and Mount
(1991) reported only 83% or better rater agreement on 68% of the classifications, suggesting less
than desirable interrater agreement. In light of such difficulties in agreeing on scale
classifications, it is not entirely unlikely that errors may have been made in these classifications.
As evidence of this problem, Hogan et al. (1996) found that a number of errors had been made in how
scales were classified in these early meta-analyses. Additionally, Salgado (1997) indicated that the
same scales had been classified into different categories by the different groups of researchers when
they conducted their separate meta-analyses. He suggested that this situation arose because there is
a degree of ambiguity about how several scales map onto the Big Five, making it difficult to assign
them exclusively to one dimension (Salgado, 1997). These facts raise some questions about the
accuracy of the classifications and about the degree to which the meta-analytic findings map onto the
actual Big Five constructs.

An issue that is related to the classification of scales is the methods used for aggregating validity
coefficients within dimensions. When faced with multiple scales categorized into the same dimension
from a single study, Barrick and Mount (1991) entered the average correlation across these scales
into their meta-analysis. Tett et al. (1991) entered the average absolute value correlation in such
instances. As Mount and Barrick (1995) noted, using the average correlation underestimates the
validity of the higher order construct to which these scales purportedly belong. Instead, a composite
score correlation should be computed to reflect the correlation between the sum of the lower order
constructs and the criterion. Mount and Barrick (1995) and Salgado (1997) used this composite-score
correlation procedure and demonstrated a resulting increase in the estimated validities of the Big
Five. However, the fact still remains that these are only estimates of the validities of actual Big
Five measures, because these researchers' studies did not exclusively include correlations from
actual Big Five measures. Thus, the degree to which these meta-analyses have provided accurate
estimates of the "true" validities of the actual Big Five remains to be seen.

If we accept these previous estimates of the relation between the Big Five and job performance, our
second concern then centers around the overwhelmingly positive interpretation of these estimates. As
we mentioned previously, Schmitt et al. (1984) suggested that a mean sample-size weighted observed
correlation of .21 for personality, averaged across various personality scales without a unified
framework, indicated that personality has low validity for predicting job performance. Consistent
with this conclusion, the selection community generally looked down on the use of personality as a
means of predicting job performance. We find it curious that a number of years later, after the Big
Five framework was adopted in subsequent meta-analyses, there have been such positive conclusions
concerning the criterion-related validity of Conscientiousness, given that the mean sample-size
weighted observed correlations for Conscientiousness were lower than that found by Schmitt et al.
(1984; the mean sample-size weighted observed rs for Conscientiousness ranged from .10, Salgado,
1997, to .18, Mount & Barrick, 1995, in these later meta-analyses). In fact, even Barrick and Mount's
(1991) estimate of the true correlation for Conscientiousness, after corrections for range
restriction and unreliability in both the predictors and criteria, was approximately equal to Schmitt
et al.'s uncorrected estimate. Despite these facts, these later reviews met with immediate enthusiasm
for the potentially valuable role of Conscientiousness in selection.

In our view, this enthusiasm has resulted from two forces. First, from a theoretical perspective, the
Conscientiousness construct does seem to be logically related to job performance. It makes intuitive
sense that individuals who have characteristic tendencies to be dependable, careful, thorough, and
hardworking should be better performers on the job. It is therefore understandable that so much
interest has arisen in this construct as it relates to employee selection. Schmitt et al. (1984), on
the other hand, had no specific construct to point to in their analysis, as their validity
coefficient was obtained by combining results across a variety of personality variables with no
attempt at categorization.

Second, we believe that these validity coefficients for the Big Five have often been interpreted in
relative rather than in absolute terms. That is, in these meta-analyses (with the exception of Tett
et al., 1991), Conscientiousness has emerged as the most valid of the Big Five, and this has often
been interpreted as indicating that Conscientiousness is valid in an absolute sense. On the contrary,
three of the meta-analyses present estimated true correlations for Conscientiousness ranging from
.15 to .22 (including statistical corrections for range restriction, predictor unreliability, and
criterion unreliability), correlations that do not fare extremely well when compared to absolute
standards that have been used in related research. A meta-analysis by Iaffaldano and Muchinsky
(1985), for example, obtained a correlation of .17 between job satisfaction and job performance; this
finding has been widely cited as indicating that there is no meaningful relationship between these
constructs. Similarly, Cohen (1988) suggested .20 as an approximate standard that should be met for
relationships between constructs to be considered meaningful. Furthermore, as we noted previously,
Schmitt et al. (1984) concluded that a correlation of .21 was too low to consider personality a
useful predictor of job performance. Finally, Mount and Barrick (1995) raised the standards even
further by suggesting that validities below .30 are questionable, given the wide range of more valid
predictors we have to choose from.

If we were to adopt this .30 standard, only Mount and Barrick (1995) have provided evidence that
Conscientiousness may be a valid predictor of job performance in an absolute sense. Whereas Barrick
and Mount (1991) and Salgado (1997) found the estimated true correlations between Conscientiousness
and job performance to be .22 and .25, respectively, Mount and Barrick (1995) found an estimated
overall true validity of .31. It is likely that this higher true validity is due to Mount and
Barrick's use of composite score correlations, as we discussed previously (Mount & Barrick, 1995).
However, Salgado's lower estimate of .25 was based on the use of composite score correlations as well
and was also based on corrections for predictor unreliability that Mount and Barrick did not perform.
Thus, in our view, these findings still do not give definitive estimates of the true validities of
explicit Big Five measures and do not allow for confident conclusions regarding the validity of
Conscientiousness in an absolute rather than a relative sense. At best, they indicate a low to
moderate criterion-related validity for Conscientiousness, despite recent enthusiasm that seems to
suggest a much stronger role for Conscientiousness in personnel selection (e.g., Behling, 1998).

Developments in the Explication of the Job Performance Criterion Domain

Another potential area in which the current body of meta-analytic work can be improved on is the
treatment of the criterion domain. Barrick and Mount (1991) performed a number of moderator analyses
for different types of criterion measures, and the clearest finding was that their indicators of
Conscientiousness had a somewhat greater impact on subjective ratings than on various types of
objective ratings. The results for the other Big Five dimensions were less clear. Salgado (1997)
split the criterion domain into subjective ratings, personnel data, and training criteria and again
found Conscientiousness to have a somewhat higher impact on subjective ratings than on objective
criteria. Mount and Barrick (1995) were more careful to separate out dimensions of performance
criteria that were theoretically meaningful with respect to their relation with Conscientiousness,
and they did find a pattern of differences showing Conscientiousness to relate to "will do" or
motivational factors more strongly than to "can do" or ability factors. However, Mount and Barrick
did not present analyses showing the relations between the other Big Five dimensions and those
various criteria. Finally, Tett et al. (1991) performed no moderator analyses for criterion types but
instead included only correlations computed between personality scales and the criterion dimensions
they were hypothesized to predict, and their results were more positive in terms of the impact of Big
Five factors other than Conscientiousness on job performance.

The findings of Tett et al. (1991) and Mount and Barrick (1995) do provide some evidence that the
link between the Big Five and job performance might be more complex than has recently been suggested,
in that their degrees of validity depend on careful selection of theoretically relevant criterion
dimensions. Recent work by Motowidlo and Van Scotter (1994; Van Scotter & Motowidlo, 1996) has
likewise indicated that the Big Five have differing relations with theoretically linked dimensions of
job performance within the task-versus-contextual distinction explicated by Borman and Motowidlo
(1993, 1997). This body of work has suggested that personality predictors should have their largest
impact on contextual dimensions of job performance. Van Scotter and Motowidlo (1996) showed further
that Extraversion and Agreeableness were more strongly related to the interpersonal facilitation
component of contextual performance than they were to task performance. Although the magnitudes of
these correlations were rather small, this finding does suggest that perhaps the Big Five dimensions
other than Conscientiousness take on importance for predicting certain dimensions of job performance,
a finding that may have been masked in the earlier meta-analyses. Thus, we feel that the body of
meta-analytic evidence relating the Big Five to job performance would benefit from an exploration of
their differential relations with task performance and the dimensions of contextual performance.

Summary and Purpose

In summary, we are suggesting that the current body of meta-analytic work investigating the Big Five
as predictors of job performance contains some deficiencies that can now be addressed. One major
deficiency, in our view, is that all four of the previous meta-analyses (i.e., Barrick & Mount, 1991;
Mount & Barrick, 1995; Salgado, 1997; Tett et al., 1991) suffer a potential threat to construct
validity in terms of the degree to which their predictors map onto the actual Big Five personality
dimensions. This methodological deficiency may have led to inaccurate estimates of the true relation
between the Big Five and job performance. The current body of meta-analytic work in this area has
provided general hypotheses about the strength of relation between the actual Big Five dimensions and
job performance, suggesting that actual Big Five measures of Conscientiousness can be expected to
produce criterion-related validities that are low to moderate in magnitude.

In addition to overcoming this deficiency, we believe an exploration of the criterion-related
validity of the Big Five for task versus contextual dimensions of job performance would aid in
furthering this area of research. Motowidlo and Van Scotter (1994; Van Scotter & Motowidlo, 1996)
have begun to present evidence in support of Big Five factors having differential validity with these
different components of job performance. Thus, the purpose of the current study is both to
meta-analytically summarize the body of research that has developed in recent years where actual
measures of the Big Five were used as predictors of job performance and to test the criterion-related
validities of the Big Five for theoretically relevant dimensions of job performance.

Method

Literature Search

We used four separate methods to obtain validity coefficients for the present review. First, we
conducted a computer-based literature search in PsycLit (1974-1996) and ERIC (1966-1996) using the
key words personality and job performance, personality and training performance, five factor model,
and the Big Five. Second, we conducted a manual search in the following journals for the period of
time from 1985 to 1998: Academy

classifiable into one of these categories because of mixed samples or inadequate information. These
studies were therefore excluded from this set of moderator analyses.

Criterion type. The type of criterion measure used when examining the predictive validity of the Big
Five was also coded as a potential moderator of the personality-job performance relationship. The
criterion domain was analyzed in two separate ways. First, a two-category classification scheme was
used, with the various criteria categorized as either measures of job proficiency or measures of
training proficiency. Approximately 93% (42 out of 45) of the correlations were based on job
proficiency criteria, and 37 of these were based on subjective ratings of job performance. Previous
meta-analyses have analyzed subjective and objective performance measures separately; in our data
set, the objective analysis would have consisted entirely of objective sales data, making it a subset
of studies from the moderator analysis of the sales occupation. We therefore decided to ex-
such information and then computing the mean sample-size weighted correlation across these studies.
For the separate analyses of task and contextual performance dimensions, we used the same procedure
for combining correlations from a single sample that were based on multiple rating scales classified
into a common dimension.

When conducting the actual meta-analysis, we used the Hunter-Schmidt validity generalization
framework (Hunter & Schmidt, 1990). Using this framework, we obtained the mean sample-size weighted
correlations, the estimated true or operational validities corrected for sampling error, range
restriction, and criterion unreliability, and the estimated true-score correlations with additional
corrections for predictor unreliability.

As in the previous meta-analyses we have just reviewed, these corrections had to be made by way of
artifact distributions because of a low rate of reporting the statistics that are necessary for
applying corrections to the individual coefficients. Two artifact distributions were created for the
criterion reliabilities. For analyses in which only subjective ratings of performance were involved,
we created a distribution by augmenting the few interrater reliability coefficients obtained from our
sample of studies with those presented in Rothstein (1990). This distribution had a mean criterion
reliability of .53 (SD = .15). For those analyses in which a combination of objective and subjective
criteria were used, we added to the previous distribution the reliabilities presented in our sample
for objective criteria and those presented by Hunter, Schmidt, and Judiesch (1990). Adding these
reliabilities created a distribution with a mean of .59 (SD = .19). Although this combined
distribution is weighted rather heavily with subjective ratings, this is entirely consistent with the
fact that approximately 90% of the criteria in our sample of studies were subjective in nature.

For corrections for predictor unreliability, we created separate artifact distributions for each of
the Big Five dimensions by augmenting the reliability estimates provided in our sample of studies
with those from the inventory manuals. This provided distributions with mean predictor reliabilities
ranging from .76 (SD = .08; Agreeableness) to .86 (SD = .04; Emotional Stability). For range
restriction corrections, we found very few unrestricted standard deviations reported in the studies
for computing the u values. Thus, we used two strategies for obtaining unrestricted standard
deviations. First, we attempted to contact the authors of the inventories to obtain standard
deviations from unrestricted samples of applicants. Second, following Salgado's (1997) strategy, we
used standard deviations provided in the inventory manuals as the unrestricted values. As we did not
have enough information to create reliable separate distributions for each of the five dimensions, we
created a single artifact distribution for use in all our analyses. This distribution of u values had
a mean of .92 (SD = .27). Overall, our artifact distributions were very similar to those used in the
previous meta-analyses. Corrections based on these distributions were conducted interactively using
software described by Hunter and Schmidt (1990), on the basis of the recommendations of Law, Schmidt,
and Hunter (1994).

Results

Overall Validity Coefficients

Table 1 presents the results of the omnibus meta-analysis across occupations and performance
criteria. These analyses were based on a range of 35-45 correlations and 5,525-8,083 job applicants
and incumbents. The mean sample-size weighted correlations (r) ranged from .04 to .14 across
dimensions and are substantially lower than the mean correlation of .21 found by Schmitt et al.
(1984) and very similar to those found by Barrick and Mount (1991; ranging .03-.13) and Salgado
(1997; ranging .01-.10). The estimated true validities (pv) for explicit measures of the Big Five
ranged from .06 to .20, and the estimated true-score correlations (pc) ranged from .07 to .22.
Consistent with Barrick and Mount (1991) and Salgado (1997), the highest validity of the Big Five
dimensions was that for Conscientiousness (pv = .20), which demonstrated a low to moderate level of
validity. The 90% credibility interval for this dimension did not include zero, suggesting the
absence of moderators in this estimate of the true validity (Hunter & Schmidt, 1990; Whitener, 1990).
Emotional Stability also had a credibility interval that was greater than zero, although its
estimated true validity was substantially lower (pv = .13).

Validity Coefficients by Occupation

Table 2 presents the results of the moderator analysis for the occupational categories. Despite the
lack of moderators indicated by the credibility intervals for Conscientiousness and Emotional
Stability in the omnibus analysis, we carried out all moderator analyses for each of the Big Five for
the sake of comparison. For all four of the occupational categories, Conscientiousness exhibited the
highest estimated true validity. It is interesting to note that despite the indication of no
moderators for Conscientiousness, the estimated true validity for this dimension ranged from .15 to
.26 across occupations. Its highest validities were for sales (pv = .26) and customer service
(pv = .25) jobs. The magnitudes of these validities are moderate, and those for the remaining Big
Five dimensions remained low across all occupations.

It is noteworthy, however, that some of the low validities for the other Big Five dimensions appear
to be rather stable, in that their credibility intervals fall above zero. For sales jobs, Emotional
Stability (pv = .13) and Extraversion (pv = .15) appear to have
Table 1
Overall Validity Coefficients by Personality Dimension

Dimension                 k      N     r    S2r    S2e  S2meas   S2res  % VE    pc    pv  SDpv    CV
Conscientiousness        45  8,083   .14  .0161  .0054   .0016   .0091    44   .22   .20   .14   .03
Emotional Stability      37  5,671   .09  .0084  .0065   .0007   .0013    85   .14   .13   .05   .06
Agreeableness            40  6,447   .07  .0108  .0062   .0005   .0041    62   .13   .11   .09  -.01
Extraversion             39  6,453   .06  .0111  .0060   .0004   .0047    57   .10   .09   .10  -.04
Openness to Experience   35  5,525   .04  .0093  .0064   .0002   .0028    70   .07   .06   .08  -.04

Note. k = number of validity coefficients; N = total sample size; r = sample-size weighted mean
observed validity; S2r = total observed variance in r; S2e = variance due to sampling error;
S2meas = variance due to measurement artifacts; S2res = residual variance; % VE = percentage of
variance accounted for by sampling error and measurement artifacts; pc = true-score correlation;
pv = true (operational) validity; SDpv = standard deviation of true validity;
CV = credibility value (lower bound of credibility interval for pv).
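
As a rough check on how the corrected columns of Table 1 relate to the observed correlations, the sketch below applies the mean artifact values reported in the Method section (a criterion reliability of .59 and a u of .92) to the omnibus Conscientiousness correlation of .14. It is a simplification and not the interactive artifact-distribution procedure used in the study. The order of the corrections, the reading of u as the ratio of restricted to unrestricted standard deviations, the function names, and the predictor reliability of .80 (a hypothetical value inside the reported .76 to .86 range) are all assumptions.

```python
import math


def correct_criterion_unreliability(r, ryy):
    """Disattenuate r for unreliability in the criterion (ryy = criterion reliability)."""
    return r / math.sqrt(ryy)


def correct_range_restriction(r, u):
    """Direct range restriction correction; u is assumed to be restricted SD / unrestricted SD."""
    big_u = 1.0 / u
    return big_u * r / math.sqrt(1.0 + (big_u ** 2 - 1.0) * r ** 2)


def correct_predictor_unreliability(r, rxx):
    """Disattenuate r for unreliability in the predictor (rxx = predictor reliability)."""
    return r / math.sqrt(rxx)


# Values taken from the text: observed r for Conscientiousness and mean artifact values.
r_observed = 0.14
mean_ryy = 0.59   # mean criterion reliability, combined distribution
mean_u = 0.92     # mean range restriction ratio
# Hypothetical predictor reliability, chosen only for illustration.
assumed_rxx = 0.80

# Operational validity: corrected for criterion unreliability and range restriction.
operational_validity = correct_range_restriction(
    correct_criterion_unreliability(r_observed, mean_ryy), mean_u
)
# True-score correlation: additionally corrected for predictor unreliability.
true_score_correlation = correct_predictor_unreliability(operational_validity, assumed_rxx)

print(f"operational validity   ~ {operational_validity:.3f}")
print(f"true-score correlation ~ {true_score_correlation:.3f}")
```

Under these assumptions the two estimates come out near .20 and .22, in line with the pv and pc values reported for Conscientiousness.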
Table 2
Validity Coefficients for Personality Dimensions by Occupational Category

Dimension                 k      N     r    S2r    S2e  S2meas   S2res  % VE    pc    pv  SDpv    CV
Sales
Conscientiousness        10  1,369   .18  .0117  .0069   .0026   .0021    82   .29   .26   .07   .17
Emotional Stability       7    799   .09  .0082  .0087   .0007   .0000   115   .15   .13   .00   .13
Agreeableness             8    959   .03  .0098  .0084   .0001   .0013    87   .06   .05   .05  -.02
Extraversion              8  1,044   .10  .0117  .0076   .0009   .0033    72   .16   .15   .08   .04
Openness to Experience    6    732   .03  .0150  .0083   .0001   .0067    55   .04   .04   .12  -.12
Customer service
Conscientiousness        12  1,849   .17  .0121  .0062   .0023   .0036    70   .27   .25   .09   .13
Emotional Stability      10  1,614   .08  .0052  .0062   .0006   .0000   129   .13   .12   .00   .12
Agreeableness            11  1,719   .11  .0038  .0063   .0011   .0000   193   .19   .17   .00   .17
Extraversion             10  1,640   .07  .0117  .0061   .0004   .0052    56   .11   .11   .11  -.03
Openness to Experience    9  1,535   .10  .0043  .0058   .0010   .0000   158   .17   .15   .00   .15
Managers
Conscientiousness         4    495   .11  .0451  .0079   .0011   .0361    20   .19   .17   .28  -.19
Emotional Stability       4    495   .08  .0088  .0080   .0006   .0002    98   .13   .12   .02   .10
Agreeableness             4    495  -.03  .0040  .0081   .0001   .0000   205  -.04  -.04   .00  -.04
Extraversion              4    495   .08  .0045  .0080   .0006   .0000   192   .13   .12   .00   .12
Openness to Experience    4    495  -.02  .0111  .0081   .0000   .0029    74  -.03  -.03   .08  -.13

Note. k = number of validity coefficients; N = total sample size; r = sample-size weighted mean
observed validity; S2r = total observed variance in r; S2e = variance due to sampling error;
S2meas = variance due to measurement artifacts; S2res = residual variance; % VE = percentage of
variance accounted for by sampling error and measurement artifacts; pc = true-score correlation;
pv = true (operational) validity; SDpv = standard deviation of true validity;
CV = credibility value (lower bound of credibility interval for pv).
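
The variance columns and the credibility value can be reproduced approximately from the other entries in a row. The sketch below does this for the customer service Conscientiousness row. It assumes the usual Hunter-Schmidt sampling error formula based on the average sample size per study and a multiplier of 1.28 for the lower credibility bound; both are assumptions that happen to match the tabled values to rounding, and the function names are illustrative only.

```python
def sampling_error_variance(mean_r, total_n, k):
    """Expected sampling error variance of r, using the average sample size per study."""
    average_n = total_n / k
    return (1.0 - mean_r ** 2) ** 2 / (average_n - 1.0)


def percent_variance_explained(var_observed, var_error, var_measurement):
    """Share of observed variance attributed to sampling error and measurement artifacts."""
    return 100.0 * (var_error + var_measurement) / var_observed


def residual_variance(var_observed, var_error, var_measurement):
    """Variance left after removing artifacts, floored at zero."""
    return max(var_observed - var_error - var_measurement, 0.0)


def credibility_value(true_validity, sd_true_validity, z=1.28):
    """Lower credibility bound; the multiplier z = 1.28 is an assumption."""
    return true_validity - z * sd_true_validity


# Customer service, Conscientiousness (values copied from the table).
k, total_n, mean_r = 12, 1849, 0.17
var_observed, var_measurement = 0.0121, 0.0023
true_validity, sd_true_validity = 0.25, 0.09

var_error = sampling_error_variance(mean_r, total_n, k)
pct_explained = percent_variance_explained(var_observed, var_error, var_measurement)
var_residual = residual_variance(var_observed, var_error, var_measurement)
lower_bound = credibility_value(true_validity, sd_true_validity)

print(f"sampling error variance: {var_error:.4f}")
print(f"% variance explained:    {pct_explained:.0f}")
print(f"residual variance:       {var_residual:.4f}")
print(f"credibility value:       {lower_bound:.2f}")
```

When sampling error and measurement artifacts account for more than 100% of the observed variance, the residual is floored at zero, which is why rows with % VE above 100 show an SDpv of .00 and a CV equal to pv.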
low but stable true validities. This same general pattern emerged for managerial jobs, although the
small number of studies (k = 4) located for estimating this true validity may render this finding
tenuous. Customer service jobs appear more complex in that Emotional Stability (pv = .12),
Agreeableness (pv = .17), and Openness to Experience (pv = .15) exhibited rather low but stable true
validities. This may indicate a somewhat more complex pattern of relationships between personality
and performance in jobs that involve interpersonal interactions than is captured solely by assessing
Conscientiousness. In contrast, the true validity estimates for skilled and semiskilled jobs, which
may often involve a smaller interpersonal component of performance, tended to be rather small across
all of the Big Five, and these validities appear rather unstable in light of their credibility
intervals.

Validity Coefficients by Criterion Type

Table 3 presents the results of the moderator analysis for the separate predictions of job
proficiency and training proficiency. For job proficiency, virtually the same pattern and magnitude
of validities emerged as was found in the omnibus analysis, which is not surprising given the fact
that over 90% of the individual correlations across dimensions involved job proficiency criteria. The
small number of correlations summarized for training proficiency renders interpretation of the true
validity estimates tenuous, although Extraversion (pv = .17) and Agreeableness (pv = .18) had the
highest validities.

Table 4 shows the separate analyses of the Big Five as predictors of task performance, job
dedication, and interpersonal facilitation. Recent research and theory explicating the dimensionality
of the job performance domain has suggested that personality should predict the contextual
performance dimensions of job dedication and interpersonal facilitation more strongly than it
predicts task performance (Borman & Motowidlo, 1993, 1997; Motowidlo & Van Scotter, 1994; Van Scotter
& Motowidlo, 1996). Our analyses show that Conscientiousness predicted all three criteria with
approximately the same level of true validity (pv = .15-.18), although the credibility intervals
indicate that this true validity was only stable for the interpersonal facilitation criterion.
Emotional Stability appeared to have a low but very stable true validity across these three criteria
(pv = .13-.16). For the interpersonal facilitation criterion, Agreeableness (pv = .17) rivaled both
Conscientiousness (pv = .16) and Emotional Stability (pv = .16) in its estimated true validity. This
supports Van Scotter and Motowidlo's finding that although Agreeableness does not influence task
performance, it does appear to influence ratings of interpersonal facilitation. It should be noted,
however, that none of these analyses for the task and contextual performance criteria revealed
stronger true validities than did the overall performance analyses.
Table 3
Validity Coefficients for Personality Dimensions by Criterion Type

Dimension                k      N     r    S²r    S²e  S²meas  S²res  % VE    ρc    ρv  SDρv    CV

Job performance
Conscientiousness       42  7,342   .15  .0148  .0055   .0019  .0074    50   .24   .22   .13   .06
Emotional Stability     35  5,027   .09  .0089  .0069   .0007  .0013    85   .15   .14   .05   .07
Agreeableness           38  5,803   .07  .0111  .0065   .0004  .0042    62   .12   .10   .10  -.02
Extraversion            37  5,809   .06  .0118  .0064   .0003  .0051    57   .09   .09   .11  -.05
Openness to Experience  33  4,881   .03  .0097  .0068   .0001  .0028    71   .06   .05   .08  -.05

Training performance
Conscientiousness        3    741   .02  .0145  .0041   .0000  .0104    28   .03   .03   .15  -.16
Emotional Stability      2    644   .06  .0030  .0031   .0003  .0000   111   .09   .08   .00   .08
Agreeableness            2    644   .12  .0049  .0030   .0013  .0006    88   .21   .18   .04   .13
Extraversion             2    644   .12  .0020  .0030   .0012  .0000   207   .19   .17   .00   .17
Openness to Experience   2    644   .08  .0042  .0031   .0007  .0005    88   .14   .13   .03   .08

Note. k = number of validity coefficients; N = total sample size; r = sample-size weighted mean observed
validity; S²r = total observed variance in r; S²e = variance due to sampling error; S²meas = variance
due to measurement artifacts; S²res = residual variance; % VE = percentage of variance accounted for by
sampling error and measurement artifacts; ρc = true-score correlation; ρv = true (operational)
validity; SDρv = standard deviation of true validity; CV = credibility value (lower bound of
credibility interval for ρv).
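
The variance columns defined in the note to Table 3 are related by simple arithmetic, which can be checked against any row. The following is a minimal sketch, not the authors' procedure. It assumes that % VE is the sum of the sampling-error and measurement-artifact variances divided by the total observed variance, and that the residual variance is whatever remains, floored at zero. The function name variance_decomposition is hypothetical.

```python
def variance_decomposition(s2_r, s2_e, s2_meas):
    """Return (% VE, residual variance) from the tabled variance columns.

    Assumption: % VE = 100 * (S2e + S2meas) / S2r, and the residual
    variance is S2r - S2e - S2meas, floored at zero.
    """
    explained = s2_e + s2_meas
    pct_ve = 100.0 * explained / s2_r
    s2_res = max(s2_r - explained, 0.0)
    return pct_ve, s2_res


# Conscientiousness, job performance row of Table 3.
pct_ve, s2_res = variance_decomposition(0.0148, 0.0055, 0.0019)
print(round(pct_ve), round(s2_res, 4))
```

For this row the sketch reproduces the tabled values of 50 for % VE and .0074 for the residual variance. Rows with a % VE above 100 (for example, Emotional Stability under training performance) are cases where the artifact variances exceed the observed variance, so the residual is set to zero.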
Discussion

The main purpose of this study was to provide a confirmatory meta-analysis of the relation between the
Big Five and job performance by including only scales that were explicitly designed to measure the Big
Five personality dimensions. Our overall results were highly consistent with the original work of
Barrick and Mount (1991), in that Conscientiousness was again found to have the highest validity of the
Big Five dimensions for overall job performance. Furthermore, our estimated true-score correlation of
.22 (and true validity of .20) was virtually identical in magnitude to Barrick and Mount's estimate.
This finding alleviates concern that Barrick and Mount's heavily cited results underestimated the
overall true validity of the Conscientiousness dimension as a result of their categorization
procedures. On the other hand, our findings indicate that at least for single-scale, global Big Five
measures, the
Table 4
Validity Coefficients for Personality Dimensions by Criterion Dimension

Dimension                k      N     r    S²r    S²e  S²meas  S²res  % VE    ρc    ρv  SDρv    CV

Task performance
Conscientiousness       12  2,197   .10  .0138  .0054   .0008  .0076    45   .16   .15   .13  -.02
Emotional Stability      8  1,243   .09  .0015  .0064   .0007  .0000   463   .14   .13   .00   .13
Agreeableness            9  1,754   .05  .0090  .0051   .0002  .0037    59   .08   .07   .09  -.05
Extraversion             9  1,839   .04  .0052  .0049   .0002  .0001    98   .07   .06   .02   .04
Openness to Experience   7  1,176  -.01  .0237  .0060   .0000  .0177    25  -.01  -.01   .20  -.26

Job dedication
Conscientiousness       17  3,197   .12  .0203  .0052   .0013  .0139    32   .20   .18   .17  -.04
Emotional Stability     15  2,581   .09  .0059  .0058   .0007  .0000   109   .14   .13   .00   .13
Agreeableness           17  3,197   .06  .0096  .0053   .0003  .0040    59   .10   .08   .09  -.03
Extraversion            16  3,130   .03  .0111  .0051   .0001  .0059    47   .05   .05   .11  -.10
Openness to Experience  14  2,514   .01  .0108  .0056   .0000  .0052    52   .01   .01   .11  -.13

Interpersonal facilitation
Conscientiousness       23  4,301   .11  .0083  .0053   .0010  .0020    76   .18   .16   .07   .07
Emotional Stability     21  3,685   .10  .0046  .0056   .0010  .0000   142   .17   .16   .00   .16
Agreeableness           23  4,301   .11  .0117  .0052   .0012  .0053    55   .20   .17   .11   .03
Extraversion            21  4,155   .06  .0105  .0050   .0004  .0051    52   .11   .10   .11  -.04
Openness to Experience  19  3,539   .03  .0075  .0054   .0001  .0020    73   .05   .05   .07  -.04

Note. k = number of validity coefficients; N = total sample size; r = sample-size weighted mean observed
validity; S²r = total observed variance in r; S²e = variance due to sampling error; S²meas = variance
due to measurement artifacts; S²res = residual variance; % VE = percentage of variance accounted for by
sampling error and measurement artifacts; ρc = true-score correlation; ρv = true (operational)
validity; SDρv = standard deviation of true validity; CV = credibility value (lower bound of
credibility interval for ρv).
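
The CV column of Table 4 can be approximated from the true validity and its standard deviation. The sketch below is illustrative only. The table note does not state the multiplier used for the credibility interval, so the value 1.28 (the conventional one-tailed 90% bound) is an assumption, and the function name credibility_value is hypothetical. Because the tabled inputs are rounded to two decimals, small discrepancies with the published CV are to be expected.

```python
def credibility_value(rho_v, sd_rho_v, z=1.28):
    """Lower bound of the credibility interval for true validity.

    Assumption: z = 1.28 (one-tailed 90% bound); the table note does not
    give the multiplier that the authors used.
    """
    return rho_v - z * sd_rho_v


# (true validity, SD of true validity, tabled CV) for the
# interpersonal facilitation rows of Table 4.
rows = {
    "Conscientiousness": (0.16, 0.07, 0.07),
    "Emotional Stability": (0.16, 0.00, 0.16),
    "Agreeableness": (0.17, 0.11, 0.03),
    "Extraversion": (0.10, 0.11, -0.04),
    "Openness to Experience": (0.05, 0.07, -0.04),
}

for trait, (rho_v, sd, tabled) in rows.items():
    cv = credibility_value(rho_v, sd)
    print(f"{trait}: computed {cv:.2f}, tabled {tabled:.2f}")
```

A positive CV means the credibility interval excludes zero. Among the three criterion dimensions in Table 4, Conscientiousness has a positive CV only for interpersonal facilitation.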
validity estimates for Conscientiousness provided by Mount and Barrick (1995) and Salgado (1997) appear
to be overestimates. We offer from our results an estimated true criterion-related validity of .20 for
actual Big Five measures of Conscientiousness.

It is also noteworthy that Emotional Stability shows rather consistent (although low) levels of
criterion-related validity. In addition, the separate analyses for the different occupational
categories provide a more complex picture of the validities of the Big Five than do prior reviews, in
that the dimensions beyond Conscientiousness begin to show low but rather stable validities for certain
occupations. In particular, for jobs involving customer service, Agreeableness, Openness to Experience,
and Emotional Stability had low levels of validity (ρvs ranging .12-.17) but zero

What degree of utility do these global Big Five measures offer for predicting job performance? Overall,
it appears that global measures of Conscientiousness can be expected to consistently add a small
portion of explained variance in job performance across jobs and across criterion dimensions. In
addition, for certain jobs and for certain criterion dimensions, certain other Big Five dimensions will
likely add a very small but consistent degree of explained variance. If the global Big Five measure is
uncorrelated with the other predictors that are currently used for a job (e.g., personality tends to be
uncorrelated with cognitive ability; Day & Silverman, 1989; Rosse, Miller, & Barnes, 1991), then even
this small incremental explained variance can, under certain circumstances, make a practically
significant contribution to predictive
the broad dimension level of the Big Five, the magnitude of these correlations might be enhanced if the
most relevant specific facets of these broad dimensions could be specified.

We suggest, then, that the Big Five framework and the patterns of small to moderate validities for
these broad dimensions that have begun to emerge should be used in future research to help guide the
selection back "downward" toward somewhat narrower personality facets with theoretical links to the
performance dimensions under investigation. If a broad, global performance criterion is of interest,
perhaps a global Conscientiousness scale will suffice with a moderate level of validity. However, if
multiple performance dimensions such as those distinguishing task performance from contextual
performance, or perhaps those consistent with

tently had equivalent or higher levels of criterion-related validity in comparison with employees'
self-reports. Although the practice of using rating sources other than oneself is not likely to be
adopted in personnel selection practice, such alternative measurement methods could help gain a better
understanding of the aspects of personality that affect performance.

Limitations

At least two limitations of the current meta-analysis should be pointed out. First, several of our
moderator analyses were based on a relatively small number of correlations, especially for the man-
