Military Classification
Michael G. Rumsey
Abstract
This chapter describes military selection and classification research in the United States from a
historical perspective. It describes the evolution of enlisted selection and classification measures from
Army Alpha and Beta in 1917 to current explorations into non-cognitive tools. Major developments,
such as the transition from service-specific test batteries to the joint service Armed Services
Vocational Aptitude Battery (ASVAB) and the joint service project to link enlistment standards to job
performance, are given special attention. Officer screening has evolved separately from enlisted
screening, and is given separate treatment in this chapter. Both enlisted testing and officer testing have
been characterized by a historical progression from fairly broad measures of cognitive ability to a more
comprehensive approach, involving not only an expansion of the cognitive components assessed, but
also an increasing attention to non-cognitive dimensions. While enlisted and officer testing have many
features in common, two themes that have received more emphasis in officer selection are the work in
identifying measures that predict aviation success, and the development of realistic assessment centers
to validate predictors of officer success. The success of the military in developing enlisted and officer
measures that predict important outcomes is a major chapter focus.
Keywords: Military, selection, classification, enlisted, officer, non-cognitive, ASVAB, validate, aviation,
performance
There are several characteristics of the military environment that have contributed to the development of the military selection and classification system as it has existed in the past and exists today. The first is the division, if we leave aside for the moment the special case of warrant officers, of the military population into two principal categories—commissioned officer and enlisted. With few exceptions, each individual remains in the category in which he or she is first placed. "Officers" do not become "enlisted," and the number penetrating the barrier from "enlisted" to "officer" is relatively small. Because these categories are so distinct, the selection and classification processes for each are also distinct. One of the major differences between these processes is that, while the enlisted selection process involves a direct transition from civilian to military status, the principal selection decision for officer candidates typically involves their selection into a pre-commissioning training program, which they must successfully complete before becoming officers.

The second characteristic is the sheer volume of accessions, from hundreds of thousands to millions of members a year for enlisted personnel alone (Waters, 1997). This has a number of ramifications. One is that the services have been able to invest considerable resources to ensure that the tools used in this process represent the state of the art in testing methodology. Another ramification is that the military has found it efficient to set up special-purpose testing centers across the country for the sole purpose of processing applicants for enlistment. The availability of these testing centers and the sizable quantity of applicants flowing through them daily make the use of these centers for much of the matching of persons to jobs, as well as selection, a somewhat natural occurrence.

The third significant characteristic of this environment is the link between accessions and training. Generally, applicants are not presumed to have any training in the job they will be performing when the selection decision is made. The military assumes the burden of training new enlistees and those who have been selected into a pre-commissioning program. By and large, applicants enter at the bottom rung. Lateral entry into a higher-status position is a rare event. These factors contribute to the significance of the selection process. The services are making a major commitment to the development of the individual selected, and any mistakes will be costly.

Scope of This Chapter
The current selection and classification procedures used by the military cannot be properly understood without appreciation of the key historical developments that led to their implementation. Thus, this chapter will be historical in orientation, describing enlisted and officer developments separately. As noted above, enlisted screening is a massive enterprise, conducted through the administration of a joint service selection and classification test battery at a variety of locations. Officer screening is a much more decentralized operation. Not only does each service have its own screening process, but also within each service there are multiple processes. In many cases, officer screening is dependent more upon a whole-person evaluation than on scores on a selected set of tests. As the screening processes for the two groups are divergent, so has been the historical evolution of these processes. The chapter will proceed first with a discussion of enlisted screening, followed by an examination of officer screening.

A full treatment of military selection and classification would incorporate the many significant developments that have occurred outside the United States as well as those that have occurred within. However, such a treatment is far beyond what can be accomplished in a single chapter. The focus here will be limited to the United States.

Certain themes will receive particular emphasis in this review, based on their relevance to the sophistication, maturity, and comprehensiveness of the research and the screening systems generated by the research. These themes relate to the measures developed and the procedures used to develop, assess, and administer those measures. In all these respects the military has frequently played a leading role, as discussed briefly below.

Development and Assessment
The measures used have often represented the highest standards in terms of quality and relevance. The Army Alpha and Beta tests, developed in World War I, have been lauded as pioneering efforts in group cognitive testing (Zeidner & Drucker, 1988). Later, the services developed highly sophisticated classification batteries, and have recently made significant contributions to the science of personality assessment. Developers have scrupulously applied advanced scientific principles to measurement development. The military has often demonstrated a greater appreciation of the need to validate its measures against some outcome of importance than have many civilian organizations. This is probably due, at least in part, to cost–benefit considerations. Validation is an expensive proposition, and for most civilian organizations, difficult to support for the number of individuals screened on an annual basis. As noted earlier, the military screens an unusually large number of applicants, so the cost of validation in this context is a necessary and justifiable expense, given the enormous difference between benefits associated with a highly valid test and those associated with a test of negligible validity. For a substantial period of time, the "outcome of interest" in military validation research was almost exclusively attrition or success in training.

job analysis
This brings us to the topic of job analysis. Job analysis is often used to help inform the selection of which individual difference dimensions to test, but it is a particularly critical step whenever job performance measures are used in the process of validating selection instruments. Although the military has played a leading role in the development of job analysis techniques, these techniques have generally been designed more for use in training development than in performance measurement.

Because of the early emphasis on the use of training success to validate selection and classification measures, for several decades little attention was given to comprehensive analyses of job requirements for purposes of developing performance measures. Describing how researchers identified appropriate content to predict performance in training, Maier (1993, p. 5) observed that "[r]esearchers typically observe and talk to workers in the area and visit training programs." While these activities may be considered a form of
officers, during the war (Zeidner & Drucker, 1988). A pictorial version of the test battery, known as the Army Beta, was also developed under Yerkes's direction. The development of the Army Alpha and Beta is generally viewed as the initial step in scientific screening of military personnel, as well as an historical development in cognitive testing, which until then had been conducted in an individualized manner. Despite the massive administration of the tests, their use in selection and assignment during this period was sporadic. However, Zeidner and Drucker noted (p. 11) that: "Although the test scores were not universally accepted by Army management, they were allowed to play an important role in many decisions. For example, almost 8,000 recruits were recommended for immediate discharge as mentally incompetent; another 8,000 were assigned to special labor duty." The Army Alpha continued in use for over 25 years (Staff, Personnel Research Section, 1945).

World War II stimulated the next major push forward in the development of enlisted screening tests. Initially, the emphasis was more on tests of general ability than on specific aptitudes related to job placement. The Army developed the Army General Classification Test (AGCT), a test of "general learning ability," for administration to every "literate inductee" beginning in 1940 (Staff, Personnel Research Section, 1945). The first form consisted of vocabulary, arithmetic, and block-counting items. Despite the rather limited range of content categories initially, the AGCT was used as a classification test in the sense that it could "sort new arrivals" (Staff, Personnel Research Section, 1945, p. 760) and help determine their qualifications for various types of training regimens. In 1942, the Marine Corps began using the AGCT and Mechanical Aptitude Tests for classifying their new recruits (Furer, 1959). In 1945, the Army developed new forms of the AGCT, which contained four subtests—reading and vocabulary, arithmetic computation, arithmetic reasoning, and pattern analysis. These forms provided the model for the Armed Forces Qualification Test (AFQT), which provided "an objective basis for the equitable distribution of human resources" (Zeidner & Drucker, 1988, p. 50) across the services. From 1950 through 1972, all services used a common AFQT for selection.

Development of Classification Batteries
Before World War II, aside from the limited use of specialized tests for a few select occupations, the concept of "classification" generally meant ensuring that individuals with a high level of general mental ability were matched with the jobs that were judged to require such ability. However, during and after the war, that approach began to change. Classification came to mean a matching of particular aptitudes with particular job requirements. Given the large number of military jobs, the challenge of determining the best combination of tests for each job was enormous. This problem was somewhat simplified by the concept of clustering. Jobs that were found to have similar requirements were clustered together and linked to the same set of tests.

In 1924, the Navy began using a verbal test, the General Classification Test (GCT), to screen candidates for enlistment. During World War II, as the Navy struggled with trying to classify nearly four million enlisted personnel, the Navy General Classification Test was not found to provide a sufficient basis for differentiating across specialties (Faulkner & Haggerty, 1947). Thus, it was supplemented in 1943 with a Basic Test Battery (BTB), which included more specialized tests, such as Mechanical Aptitude and Radio Code Aptitude (Odell, 1947). A version of the BTB consisting of the GCT, an Arithmetic Test, a Mechanical Test, a Clerical Test, and a Shop Practices Test was linked with final grades across 47 schools. When the tests were combined in a regression analysis, the level of prediction achieved by the BTB (.57, corrected) was found to exceed that observed for the Armed Services Vocational Aptitude Battery (ASVAB, .47, corrected), a test battery to be discussed later in this chapter (Thomas, 1970). The BTB continued in use into the 1970s (Maier, 1993, p. 37).
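The "corrected" coefficients reported here adjust for restriction in range: a validity computed only on people who passed a screen understates the validity that would be observed in the full applicant pool. The sketch below applies the standard Thorndike Case II correction for direct restriction on the predictor, with illustrative numbers; it is offered only to make the adjustment concrete, not as the specific procedure used in the studies cited above.

```python
import math

def correct_for_range_restriction(r_restricted: float,
                                  sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Thorndike's Case II correction for direct range restriction.

    Estimates the predictor-criterion correlation in the full applicant
    pool from the correlation observed in the selected (range-restricted)
    sample, given the predictor's standard deviation in both groups.
    """
    u = sd_unrestricted / sd_restricted  # u > 1 when the sample is restricted
    r = r_restricted
    return (r * u) / math.sqrt(1 + r ** 2 * (u ** 2 - 1))

# Illustrative numbers: a validity of .35 in a selected sample, where
# selection cut the predictor's standard deviation from 10 to 6,
# corrects to roughly .53 in the applicant pool.
print(round(correct_for_range_restriction(0.35, 10.0, 6.0), 2))
```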
Between 1946 and 1948, the newly formed Air Force developed its own Airman Classification Battery. Earlier research by the Army Air Force in World War II focusing on officer selection (to be described later in this chapter) provided the foundation for this battery, but new tests were added, focusing on enlisted jobs. Twelve aptitude tests, covering such general abilities as word knowledge and more specific abilities such as aviation information, were developed, along with a biographical inventory. Separate composites of tests, called Aptitude Indices (AI), were developed for separate clusters of Air Force jobs. These tests were found to be highly reliable, and had a median validity of .61 for predicting technical course grades when corrected for restriction in range (Weeks, Mullins, & Vitola, 1975). In 1958, the Airman Classification Battery was replaced with the Airman Qualifying Examination, which was similar but shorter in
and interest inputs are intended to be used by this software in making person–job matches. A new computer-administered interest measure, Job and Occupational Interest in the Navy (JOIN), has been developed to improve the quality and precision of the interest inputs (Farmer et al., 2003).

ASVAB and the Job Performance Measurement Project
In the 1970s, the use of separate test batteries by the different services was found to be administratively cumbersome and constraining. "To determine eligibility for enlistment in different Services, an applicant would have had to take the tests specific to each Service" (Maier, 1993, p. 37). Thus, the services combined their efforts to produce a joint service selection and classification battery: the Armed Services Vocational Aptitude Battery (ASVAB). Full administration of this battery began in 1976 (Sands & Waters, 1997) and has continued to this day. Content categories and test content have changed over time, but at the outset, the tests used were very much like those in the individual services' classification batteries, minus any interest inventory.

The AFQT was maintained as a composite of the ASVAB tests that measured general ability. As tests were revised, efforts were devoted to ensuring that score distributions in the new forms had roughly the same meaning as distributions in the old forms. Screening standards were linked to these score distributions. By 1980, it had become evident that something was very wrong. Complaints were received from the schoolhouses and the field about the performance of newly accessioned service members. Analyses revealed that the process of linking new scores to old had been flawed, and that the result was, in effect, to unintentionally lower entry standards. The scoring problem was corrected, but in the meantime hundreds of thousands of individuals had enlisted who, if the standards had been applied as intended, might have been considered ineligible (Laurence & Ramsberger, 1991; Waters, 1997).
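The flawed step was essentially a score-equating problem: expressing scores on a new form on the scale of an old reference form, so that the same standard screens out the same share of applicants. Below is a minimal sketch of one common approach, equipercentile linking, run on simulated samples; it illustrates the general idea only and is not the operational procedure whose miscalibration is described above.

```python
import numpy as np

def equipercentile_equate(new_form_scores, ref_form_scores, score):
    """Map a raw score on a new form to the reference form's scale by
    matching percentile ranks, so that equal reported scores carry equal
    meaning across forms. A bare-bones illustration: operational equating
    adds smoothing and careful sampling designs, and errors at this step
    shift entry standards for everyone screened."""
    new = np.sort(np.asarray(new_form_scores))
    ref = np.sort(np.asarray(ref_form_scores))
    # Percentile rank of the score within the new form's distribution.
    pct = np.searchsorted(new, score, side="right") / len(new)
    # The reference-form score holding that same percentile rank.
    return float(np.quantile(ref, pct))

# Hypothetical samples: the new form runs slightly easy, so a raw 42 on it
# maps to whatever reference-form score sits at the same percentile rank.
rng = np.random.default_rng(0)
new_sample = rng.normal(50, 10, 5000)
ref_sample = rng.normal(48, 10, 5000)
print(equipercentile_equate(new_sample, ref_sample, 42.0))
```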
When the situation was explained to Congress, it brought national attention to the enlistment process. Members wanted to know what the impact of enlistment standards was on job performance. Previous research had already demonstrated the link between military tests and training performance. However, assessing training performance was relatively easy compared to measuring job performance. To some extent, training grades were already accessible for analysis. The question of what constituted training performance was not particularly controversial. It soon became apparent that measures that could be relatively universally accepted as representing job performance were not readily available. Researchers would have to develop and administer such measures on their own. Once service members had graduated from training and settled into their job assignments, they were widely dispersed throughout the world. Locating them and assessing their performance would be an enormous effort.

Congress instructed the Department of Defense to link enlistment standards to job performance. An early concept guiding this research, known as the Job Performance Measurement project, was that job sample, or "hands-on," testing would be the "benchmark" performance measure. This approach was tied to the view that "the predictor battery of interest (the ASVAB) was intended to predict only job proficiency" (Knapp, 2006, p. 116). It was recognized that hands-on testing would not always be feasible. This consideration gave rise to the idea of alternate measures that could be used if they could be shown to represent reasonable "surrogates" to the hands-on benchmark. In fact, "[e]ach service was assigned a surrogate measurement method to evaluate as part of their research programs" (Knapp, 2006, p. 114).

Within this general guidance, there was considerable flexibility regarding how the services responded to Congress's instruction. The Air Force developed a procedure called "walk through performance testing," which combined an interview and hands-on approach. Whereas a strictly hands-on test would require the individual to actually perform a task, in an interview the individual would "describe in detail how he or she would perform" the task (Hedge & Teachout, 1992, p. 454). The administrator would then determine "whether the description of each step of the task was correct or incorrect" (Hedge & Teachout, 1992, p. 454). The Air Force supplemented the walk-through with job knowledge and rating measures. The Navy used a hands-on test, a job-knowledge simulation test, and a set of rating scales (Laabs & Baker, 1989). The Marine Corps used hands-on measures, job-knowledge tests, and file data (Carey, 1992).

The Army, like the other services, used multiple performance measures, but presented an alternative conceptualization of performance to the "benchmark" approach. The Army agreed that hands-on tests provided useful performance information, but did not perceive the value of other measures to depend only on the extent to which they could serve
individual's characteristics most accurately. Concerns about faking prevented the implementation of the ASP, or of the separate ABLE or ASAP measures. However, such concerns also stimulated research to develop and test safeguards against faking. In the Army, the successor to the ABLE was the Assessment of Individual Motivation, or AIM (White & Young, 1998). The AIM was designed as a forced-choice measure. The intent was to present options in such a way that the most desirable option was not obvious. The individual was "forced" to choose between two apparently desirable and two apparently undesirable options, so the ultimate score would be more likely to represent behavior related to job success than behavior seen as socially desirable.
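To make the format concrete, the sketch below scores one forced-choice item of this general kind. The statements, the dimensions they are keyed to, and the simple most/least scoring rule are all invented for illustration; they do not reproduce the operational AIM content or its scoring.

```python
# Illustrative forced-choice tetrad: four statements of similar judged
# desirability, each keyed to a different temperament dimension. The
# respondent marks one statement "most like me" and one "least like me".
# (Statements, keys, and scoring rule are hypothetical, not actual AIM items.)
TETRAD = [
    {"text": "I finish tasks even when they become tedious.", "dimension": "work_orientation"},
    {"text": "I stay calm when plans change suddenly.", "dimension": "adjustment"},
    {"text": "I push myself physically to reach a goal.", "dimension": "physical_conditioning"},
    {"text": "I prefer to let others take charge.", "dimension": "leadership"},
]

def score_tetrad(most_idx: int, least_idx: int, totals: dict) -> None:
    """Add +1 to the dimension endorsed as most self-descriptive and -1 to
    the dimension rejected as least self-descriptive; summing such partial
    scores over many tetrads yields the scale scores."""
    totals[TETRAD[most_idx]["dimension"]] = totals.get(TETRAD[most_idx]["dimension"], 0) + 1
    totals[TETRAD[least_idx]["dimension"]] = totals.get(TETRAD[least_idx]["dimension"], 0) - 1

totals: dict = {}
score_tetrad(most_idx=0, least_idx=3, totals=totals)
print(totals)  # {'work_orientation': 1, 'leadership': -1}
```

Because every option looks roughly as desirable as the others, a respondent cannot simply pick the "good" answer, which is the safeguard the text describes.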
Initial research findings suggested that the approach was successful—that it inhibited faking without sacrificing validity (Young et al., 2000). The Army saw a potential application of AIM as a predictor of attrition of non–high-school-diploma graduates. It has historically been observed that those without a high school diploma tend to wash out of the military at a far higher rate than those with a diploma (Department of Defense, 1985). Accordingly, the number of non-graduates to be accepted into the services has been limited by Department of Defense policy. One concern regarding this policy is that there are many reasons why someone may leave the military. Lack of persistence, which failure to graduate from high school may partially represent, may be one of those reasons. However, there may be a more direct way of measuring one's propensity to attrite that might allow the military to salvage some non-graduates who do not have a high propensity to attrite.

Accordingly, a trial program was initiated within the Army to identify promising non–high-school graduates using a combination of AIM, a body mass index, and two ASVAB subtests. This new program was known as the Tier Two Attrition Screen (TTAS). The initial results were disappointing. There was far greater evidence of faking on the AIM when used in this operational trial than there had been in a research context (Young et al., 2004). Researchers went back to the drawing board to see if the problem could be fixed. What they found was that, while some AIM items performed worse in the operational context, others worked just fine. Thus, with some fine-tuning, they were able to improve the AIM's effectiveness (Young et al., 2004; Young & White, 2006). The revised TTAS has now been used in a new trial program for several years, with considerable success.

Computer Adaptive Testing
When first developed, the ASVAB was administered in paper and pencil format. While this was all the available technology would support, it had many disadvantages. The primary disadvantage was the time required for administration, over three hours. Another disadvantage was that it was administratively cumbersome, requiring the printing and distribution of test booklets and answer sheets. A third disadvantage was that it was inefficient and imprecise. The same test questions were administered to everyone, regardless of ability level. Someone at a very high level would have to answer a number of very easy questions, which contributed little if any new information about that person's ability beyond the questions closer to that individual's ability level. Someone at a very low level would have to answer a number of very difficult questions, which similarly would provide very little in terms of new information.

Computerized adaptive testing (CAT) offered a way to counter these disadvantages. The key was the "adaptive" part. The basic concept behind CAT was that, for every test taker, there was a "true score" reflecting that person's true ability on whatever attribute was being measured, and that the object of the test was to pinpoint that true score. Each item administered provided some information about the individual's true score. Each item answered correctly led to a more difficult item; each item answered incorrectly led to an easier item. Ultimately, the individual's ability level and the item-difficulty level would converge and lead to an estimate of the individual's true score. Because a small number of carefully calibrated items could generate a good estimate of that score, an adaptive test could be completed in a much shorter time than a non-adaptive one (Sands & Waters, 1997). This procedure was more efficient and precise than a conventional one. Computerization made it possible to adapt item difficulty to individual ability, and also made printing and distribution of test materials and answer sheets unnecessary.
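The logic just described can be made concrete with a toy example. The sketch below uses a one-parameter (Rasch) item response model and a simple shrinking-step rule to chase the test taker's ability; the operational CAT-ASVAB item selection and scoring procedures are considerably more sophisticated, so treat this only as an illustration of the adaptive principle.

```python
import math
import random

def prob_correct(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) probability that a test taker of
    ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def adaptive_test(answer_item, item_bank, n_items=10):
    """Toy adaptive loop: administer the unused item whose difficulty is
    closest to the current ability estimate (where it is most informative),
    then nudge the estimate up after a correct answer and down after an
    incorrect one, with smaller nudges as the estimate converges."""
    theta, step = 0.0, 1.0
    remaining = sorted(item_bank)
    for _ in range(n_items):
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        theta += step if answer_item(b) else -step
        step *= 0.7
    return theta

# Simulate a test taker whose true ability is 1.2 on the theta scale.
random.seed(1)
true_theta = 1.2
bank = [i / 4 for i in range(-12, 13)]  # difficulties from -3 to +3
estimate = adaptive_test(lambda b: random.random() < prob_correct(true_theta, b), bank)
print(round(estimate, 2))  # should land near the simulated true ability
```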
In 1979, the Department of the Navy was named as Executive Agent for a project to develop a computerized adaptive version of the ASVAB (Martin & Hoshaw, 1997). Initially, a three-year project was envisioned, but it soon became apparent that this timeline was too ambitious for the challenges involved. The size and scope of military aptitude testing generated delivery system requirements and technical challenges that were beyond the existing state of the art. The types of microcomputers available at the beginning of the project were not
other than those currently evaluated on the ASVAB deserved particularly close attention. These included peer leadership, cognitive flexibility, and self-esteem (Knapp & Tremble, 2007).

The Department of Defense itself convened an ASVAB Review Panel to examine whether changes to this centerpiece of the existing selection and classification system were needed. The panel emerged with several recommendations. One was that "[n]oncognitive measures should be included in the battery of tests used for classification" (Drasgow et al., 2006, p. iv). The panel also recommended that a "test of information and communications technology literacy" (Drasgow et al., 2006, p. iii) and one or more nonverbal reasoning tests be developed and considered for inclusion in enlisted testing. Furthermore, the panel generated recommendations addressing job analysis and validation issues.

In conjunction with the services, the Department of Defense generated a plan for implementing the panel's recommendations (Sellman, 2007). Considerable work on non-cognitive measure development was already underway. Both the Navy and the Army were involved in the development of computer-adaptive personality tests, employing breakthroughs achieved by Stark, Chernyshenko, and Drasgow (2005) to build and score paired-comparison forced-choice measures. The Navy's version, the Navy Computer Adaptive Personality Scales (NCAPS), has been found to be resistant to faking in a research environment (Underhill, Lords, & Bearden, 2006) and to show promising levels of validity for predicting performance (Borman et al., 2009).

The Army's version has advanced to the operational testing level. In 2007, the success of the TTAS program stimulated Army leaders to examine whether non-cognitive measures might be used in screening high school students as well as the non–high school students included in TTAS testing. The U.S. Army Research Institute for the Behavioral and Social Sciences (ARI) initiated a trial of a collection of non-cognitive measures, including one known as the Tailored Adaptive Personality Assessment System, or TAPAS. The TAPAS, unlike the AIM, was designed to be administered in a computer adaptive format. It also incorporated paired items that were more closely matched in judged desirability, and thus, it was hoped, even less fakeable. A number of studies demonstrated that the TAPAS format was effective in reducing faking (Drasgow, Stark, & Chernyshenko, 2007). TAPAS was found to be an effective predictor of both proficiency and motivational types of criteria in an experimental administration (Heffner & White, 2009) and was approved for administration in the Military Entrance Processing Stations in an initial operational test and evaluation (IOT&E). That is, it was to be used in a limited way for selection decisions for a three-year period, after which time it would be evaluated for possible future use. This IOT&E began in May 2009.

Progress has also been made on the other panel recommendations. For example, the DOD sponsored research to review existing information and communications technology literacy tests (Russell & Sellman, 2008; Trippe & Russell, 2008) and nonverbal reasoning tests (Waters, Russell, & Sellman, 2007).

Recent developments in classification theory are also worth noting. Since 1976, all services have used the ASVAB, but each service has combined the ASVAB tests for classification purposes in a manner that best served its own needs. Zeidner and Johnson (1994), in advancing an approach they labeled "Differential Assignment Theory," argued that existing approaches for developing composites relied too much on determining which set of tests best predicted performance for a given set of jobs. They noted that differential validity, which focused on the extent to which a test or set of tests differentially predicted performance across jobs, deserved greater emphasis. They argued that the critical metric should be mean predicted performance, a concept originated by Brogden (1959) that was "a function of predictive validity, intercorrelationships among the least-square estimates of job performance, and the number of job families" (Zeidner & Johnson, 1994, p. 379). They developed a procedure for calculating mean predicted performance through a complex simulation process, and have demonstrated that their methodology can improve classification efficiency.
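The flavor of the metric can be conveyed with a deliberately simplified sketch. It assumes that predicted-performance scores for each person in each job family are already in hand, and it uses a greedy quota-constrained assignment rather than Zeidner and Johnson's simulation procedure; the point is only that a composite's classification payoff is judged by the average predicted performance of the assignments it supports.

```python
import numpy as np

def mean_predicted_performance(pred: np.ndarray, quotas: list) -> float:
    """Greedy illustration of the classification payoff metric: given a
    matrix of predicted performance (people x job families) and a quota of
    openings per family, assign each person in turn to the open family
    where his or her predicted performance is highest, then average the
    predicted performance of the resulting assignments. Comparing this
    average across alternative composites shows the classification gain
    that a purely selection-oriented analysis would miss."""
    quotas = list(quotas)
    total = 0.0
    for person in pred:
        j = max((f for f in range(len(quotas)) if quotas[f] > 0),
                key=lambda f: person[f])
        quotas[j] -= 1
        total += person[j]
    return total / len(pred)

# Hypothetical example: 6 recruits, 2 job families, 3 openings in each.
# Scores stand in for standardized least-squares predicted performance.
rng = np.random.default_rng(7)
scores = rng.normal(0, 1, size=(6, 2))
print(round(mean_predicted_performance(scores, [3, 3]), 2))
```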
Officer Screening
Aviation Screening
Each of the services has invested heavily in research on aircrew screening. Hunter (1989, p. 129) explained this situation by noting, "[p]ilot training is, almost without exception, the most expensive of the many training programs conducted by the military services."

The Air Force, Navy, and Marine Corps all use commissioned officers as pilots. The Army presents a special case. After the Air Corps was split off from the Army, the Army focused predominantly on
Test (ACT), another test of general cognitive ability that also contained elements specific to aviation tasks, replaced the Wonderlic. The ACT, like the Wonderlic, "was found to predict academic failures (ground-school training) fairly well, but to be of no value in predicting flight-training failures" (Ames & Older, 1948, p. 533). The FAR and the ACT were combined to constitute the Navy's aviation selection battery (Brown, 1989).

postwar aviation testing: air force and navy
For both the Air Force and the Navy, these early efforts formed the basis for later testing. The Air Force tests helped pave the way for the development of the Air Force Officer Qualifying Test (AFOQT) in 1951. The AFOQT contained many of the same kinds of content areas prevalent in the earliest Air Force tests, including current affairs, mathematics, reading comprehension, and biographical information. A total of 16 tests also included such areas as general science, aerial orientation, and visualization of maneuvers (Valentine & Craeger, 1961). This four-hour comprehensive battery has been in continuous use ever since, while undergoing several revisions (Waters, 1997). It yields five composite scores: Pilot, Navigator-Technical, Academic Aptitude, Verbal, and Quantitative. It has been used for both selection and classification (Waters, 1997). Across a multitude of investigations, validities generally in the range of .20 to .40 have been reported (Brown, 1989).

Interest in psychomotor testing and other "apparatus-based" testing for Air Force pilot selection continued even after such testing was dropped from the Aircrew Classification Battery in 1955. Many studies conducted in the 1970s showed promising results, and in 1981, a "project to develop and validate a computer-based test system known as the Basic Attributes Test (BAT)" (Carretta & Ree, 1993, p. 190) was launched. The BAT measured not only psychomotor abilities, but "cognitive abilities, personality, and attitudes toward risk" (Carretta & Ree, 1993, p. 192) as well. The BAT was implemented for pilot selection in 1993 and has been shown to be a valid predictor of pilot training success (Carretta, Zelenski, & Ree, 2000). It was replaced in 2007 by the Test of Basic Aviation Skills (TBAS), described as a measure of "psychomotor skills proven to be correlated to the completion of Specialized Undergraduate Pilot training, including hand-eye coordination and listening response" (Reimer, 2006, para. 6). The TBAS is combined with the AFOQT and flying hours to produce a Pilot Candidate Selection Method (PCSM) score that is used in pilot selection (Reimer, 2006).

The Air Force's Ernest Tupes and Raymond Christal were pioneers in identifying what has become a commonly accepted structure of personality dimensions. The structure, known as "the Big Five," consists of five dimensions: agreeableness, conscientiousness, extroversion, neuroticism, and openness (Tupes & Christal, 1961). Between 1993 and 2004, Christal led an effort to develop an instrument based on the Big Five. The result was the Self Description Inventory Plus, which added two dimensions to the Big Five: Service Orientation and Team Orientation. The Self Description Inventory Plus became part of the AFOQT in 2005, although it was not at that time used for operational selection or classification (Weissmuller & Schwartz, 2007).

In 1953, the Navy introduced the Aviation Selection Test Battery (ASTB), which added a spatial test but otherwise maintained the same general content categories as the FAR and ACT. The Aviation Classification Test was renamed the Aviation Qualification Test, and new forms were developed at this time. Further revisions were made in 1971, and a new test for non-aviation candidates was added (Brown, 1989). The ASTB remains in use today for Navy, Marine Corps, and Coast Guard aviation programs, and includes the following components: math skills, reading skills, mechanical comprehension, spatial apperception, aviation and nautical information, and an aviation supplemental test (Naval Aerospace Medical Institute, 2010b).

The Biographical Inventory, which was included among the tests adopted in World War II, was also initially included in the ASTB (Frank & Baisden, 1993). A factor analysis by Stricker (2005) identified five factors: (1) commissioned officer, (2) science and engineering interests, (3) flight experience, (4) masculine activities, and (5) school athletics. As Stricker wrote in 1993, when a revised version of the measure was still in use: "This device has consistently been one of the most valid predictors of retention vs. attrition in the battery, overshadowing tests of general ability, mechanical comprehension, spatial ability, and aviation information" (p. 7). However, the official website for the ASTB now explains: "Although the [Biographical Inventory] was initially a powerful predictor of attrition, its ability to predict which students will complete aviation training has essentially declined to zero over a period of years and thus, was suspended" (Naval Aerospace Medical Institute, 2010a, "What is the Biographical Inventory," para. 1).
and background, was developed in 1947 (Brogden & Burke, 1950). Its success in predicting important outcomes was limited, a shortcoming that was judged to be related to a tendency for respondents to respond in a way that would maximize their scores, rather than reflect their true characteristics. Developers then turned to a forced-choice approach, designed to counteract this tendency to "fake," and more positive linkages to ratings of leadership were found, with correlations ranging from .27 to .29 (Brogden, Burke, & Frankfeldt, 1952).

Also in the period following the war, a screening measure was developed for entrance into the Army ROTC Advanced Course. The ROTC Qualifying Examination, consisting of quantitative and verbal tests, was found to be a good predictor of academic grades. Similarly, the Officer Candidate Test, testing arithmetic reasoning, reading comprehension, and data interpretation, had been developed for Army OCS selection in 1942 (Parrish & Drucker, 1957).

There have been three major developments in the history of Army officer selection research since World War II. One was the Officer Prediction program, stimulated by a perception in the mid-1950s that ROTC selection procedures were deficient in their assessment of leader potential, particularly combat leadership potential. A wide variety of cognitive, physical, and non-cognitive measures were developed for administration to officers who participated in an assessment center consisting of integrated military exercises (e.g., inspecting vehicles, directing evacuation of an office) administered over a three-day period in an escalating-hostilities simulation (Helme, Willemin, & Day, 1971; Helme, Willemin, & Grafton, 1974).

The outcome of the Officer Prediction program was to identify measures that differentially predicted performance in three different types of scenarios: technical, administrative, and combat. Since the predictors for technical and administrative tasks were comparable, these were combined. On the cognitive side, tests of knowledge of tactics and practical skills were good predictors of combat leadership performance, while measures of knowledge of history, politics and culture, and math and physical science were good predictors of technical-managerial leadership. A number of measures of non-cognitive dimensions, including endurance and physical leader, predicted combat leadership, and measures of such non-cognitive constructs as verbal/social leader and scientific interest predicted technical-managerial leadership (Helme, Willemin, & Grafton, 1974). The results from this research, conducted in the late 1960s and early 1970s, led to the development of the Cadet Evaluation Battery in 1972 (Rumsey & Mohr, 1978). The technical-managerial cognitive subtest was used for selection into the ROTC Advanced Course beginning in 1978. This test, under a different name, was also used for selection into OCS beginning in 1979.

The second major historical event in Army officer selection research was stimulated by a recommendation by a 1977 Army study group for a more "performance-based" approach to Army pre-commissioning assessment (Department of the Army, 1978). Following this recommendation, the Army Research Institute (ARI) built three types of measures based on a job analysis conducted to identify critical officer performance dimensions. Situational exercises based on standard platoon-leader types of tasks were generated for an ROTC assessment center. A structured interview was a second assessment tool (Rogers et al., 1982); and a paper-and-pencil test assessing a variety of cognitive skills, the Officer Selection Battery (OSB), was a third (Fischl et al., 1986). The OSB was found to predict ratings of performance and potential by ROTC instructors with validities of .21 to .29, and final grade in post-commissioning training at an average level of .52 (Fischl et al., 1986). The OSB was incorporated into the ROTC selection system as part of a "whole person assessment," although the Scholastic Aptitude Test and American College Test eventually replaced it for that purpose.

The third major development is ongoing. It involves renewed interest in and application of non-cognitive measures in selection into precommissioning training programs. USMA has been exploring the potential predictive value of two personality dimensions in particular, hardiness and grit. Hardiness "refers to a specific set of attitudes and skills" that lead to "resilience and growth in stressful situations" (Maddi et al., 2010). In a project known as the Baseline Officer Longitudinal Data Set, or BOLDS, hardiness, a social judgment measure, and cognitive measures such as the SAT and ACT were all found to relate to performance of USMA cadets (Bartone, Snook, & Tremble, 2002). More recent investigations of the relationship between hardiness and performance of USMA cadets have also shown positive results (e.g., Bartone et al., 2009; Maddi et al., 2010).

Grit "entails working strenuously toward challenges . . . over years despite failure, adversity, and plateaus in progress" (Duckworth et al., 2007, pp. 1087–1088). Grit has been associated with
can be useful in identifying those who have the intangible qualities associated with effective leadership. Historically, there has been considerable investment in the development of non-cognitive officer selection measures, and recently there has been a revival of this approach.

The challenges of improving current methods of describing jobs and measuring performance are not unique to enlisted selection. To some extent, because there is less diversity and more generality in officer jobs, the challenges are somewhat less than on the enlisted side. However, officer jobs tend to entail more complexity, and in that respect are more difficult to define and present more performance measurement difficulties than enlisted jobs. Thus, efforts to develop new methods of job description and performance measurement for military applications will need to consider the unique characteristics of job requirements for both the enlisted and officer populations.

Final Words
The military today has a well-deserved reputation for the quality of its service members. Quality may be viewed as the product of potential as measured at entry, training, and experience. The "potential" component of this equation is not determined entirely by the selection tools employed. Unless a significant pool of applicants is available, the utility of any screening system will be limited. However, having an ample applicant pool does not guarantee that those chosen will meet the services' needs. Some means of separating those with high potential from the remainder is another essential requirement. The services have devoted substantial effort and resources over many decades to develop the best tools for that purpose. In an era of multiple threats and mounting personnel costs, the payoff for improved screening and assignment procedures will only get greater. This speaks to the need to identify and overcome those obstacles to an optimal selection and classification system that still remain.

Acknowledgment
The author wishes to express his great appreciation to Dorothy Young for her help in locating many of the articles referenced in this chapter.

References
Allen, M. T., Babin, N. E., Oliver, J. T., & Russell, T. L. (2011). Predicting leadership performance and potential in U.S. Army Officer Candidate School (OCS). In M. G. Rumsey (Chair), Predicting leader performance: Insights from Army officer research. Symposium conducted at the annual meeting of the American Psychological Association, Washington, DC.
Ambrose, S. E. (1966). Duty, honor, country: A history of West Point. Baltimore: The Johns Hopkins Press.
Ames, V. C., & Older, H. J. (1948). Chapter II: Aviation psychology in the United States Navy. Review of Educational Research, 18, 532–542.
Arabian, J. M., & Shelby, J. A. (2000). Policies, procedures, and people: The initial selection of U.S. military officers (pp. 1-1 to 1-7). In Officer selection. Cedex, France: Research and Technology Organization, NATO.
Bartone, P. T., Eid, J., Johnsen, B. H., Laberg, J. C., & Snook, S. A. (2009). Leadership and Organization Development Journal, 30(6), 498–521.
Bartone, P. T., Snook, S. A., & Tremble, T. R. (2002). Cognitive and personality predictors of leader performance in West Point cadets. Military Psychology, 14, 321–338.
Borman, W. C., Schneider, R. J., Houston, J. S., & Bearden, R. M. (2009). The Navy Computerized Adaptive Personality Scales: Evidence for validity (abstract, briefing slides). Paper presented at the 51st annual meeting of the International Military Testing Association, Tartu, Estonia.
Brogden, H. E. (1959). Efficiency of classification as a function of number of jobs, percent rejected, and the validity and intercorrelation of job performance estimates. Educational and Psychological Measurement, 19, 181–190.
Brogden, H. E., & Burke, L. (1950). Validation of the West Point Biographical Inventory, WPB-1, against first-year Aptitude for Service ratings (Rep. No. 829). Washington, DC: Personnel Research Section, Personnel Research and Procedures Branch, Adjutant General's Office (Army).
Brogden, H. E., Burke, L. K., & Frankfeldt, E. (1952). Validation of the West Point Personal Inventory (Rep. No. 882). Washington, DC: Personnel Research Section, Personnel Research and Procedures Branch, Adjutant General's Office (Army).
Brown, D. C. (1987). Military officers: Commissioning sources and selection criteria (Final Rep. No. 87-42). Alexandria, VA: Human Resources Research Organization.
Brown, D. C. (1989). Officer aptitude selection measures. In M. F. Wiskoff & G. M. Rampton (Eds.), Military personnel measurement: Testing, assignment, evaluation (pp. 97–127). New York: Praeger.
Brown, W. R., Dohme, J. A., & Sanders, M. G. (1982). Changes in the U.S. Army aviator selection and training program. Aviation, Space, and Environmental Medicine, 53, 1173–1176.
Bruskiewicz, K. T., Katz, L. C., Houston, J., Paulin, C., O'Shea, G., & Damos, D. (2007). Predictor development and pilot testing of a prototype selection instrument for Army flight training (Tech. Rep. No. 1195). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Campbell, J. P., & Knapp, D. J. (Eds.) (2001). Exploring the limits in personnel selection and classification. Mahwah, NJ: Erlbaum.
Carey, N. B. (1992). Does choice of a criterion matter? Military Psychology, 4, 103–117.
Carretta, T. R., & Ree, M. J. (1993). Basic Attributes Test (BAT): Psychometric equating of a computer-based test. The International Journal of Aviation Psychology, 3, 189–201.
Carretta, T. R., Zelenski, W. E., & Ree, M. J. (2000). Basic Attributes Test (BAT) retest performance. Military Psychology, 12, 221–232.
Maier, M. H., & Fuchs, E. F. (1969). Development of improved Aptitude Area composites for enlisted classification (Tech. Research Rep. No. 1159). Arlington, VA: U.S. Army Behavioral Science Research Laboratory.
Maier, M. H., & Fuchs, E. F. (1972). An improved differential Army classification system (Tech. Research Rep. No. 1177). Arlington, VA: Behavior and Systems Research Laboratory.
Martin, C. J., & Hoshaw, C. R. (1997). Policy and program management perspectives. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 11–20). Washington, DC: American Psychological Association.
Matthews, W. T. (1977). Marine Corps enlisted attrition (CRC No. 341). Arlington, VA: Center for Naval Analyses.
McBride, J. R. (1997). Technical perspective. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 29–44). Washington, DC: American Psychological Association.
Melton, A. W. (1947). Apparatus tests (Report No. 4). Washington, DC: U.S. Government Printing Office.
Mitchell, J. L., & Driskell, W. E. (1996). Military job analysis: A historical perspective. Military Psychology, 8, 119–142.
Naval Aerospace Medical Institute (2010a, Jan 22). ASTB frequently asked questions. Retrieved from http://www.med.navy.mil/sites/navmedmpte/nomi/nami/Pages/ASTBFrequentlyAskedQuestions.aspx
Naval Aerospace Medical Institute (2010b, Jan 22). ASTB information and sample questions. Retrieved from http://www.med.navy.mil/sites/navmedmpte/nomi/nami/Pages/ASTBOverview.aspx
Navy Personnel Research Studies and Technology (1998). Sailor 21: A research vision to attract, retain, and utilize the 21st century sailor. Millington, TN: Navy Personnel Research Studies and Technology.
North, R. A., & Griffin, G. R. (1977). Aviator selection 1919–1977 (Special Rep. No. 77-2). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Odell, C. E. (1947). Selection and classification of enlisted personnel. In D. B. Stuit (Ed.), Personnel research and test development in the Bureau of Naval Personnel (pp. 21–30). Princeton, NJ: Princeton University Press.
Olson, P. T. (1968). Use of Army school samples in estimating ACB test validity (Tech. Research Note No. 199). Washington, DC: U.S. Army Behavioral Science Research Laboratory.
Oppler, S. H., McCloy, R. A., Peterson, N. G., Russell, T. L., & Campbell, J. P. (2001). The prediction of multiple components of entry-level performance. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel selection and classification (pp. 349–388). Mahwah, NJ: Erlbaum.
Parrish, J. A., & Drucker, A. J. (1957). Personnel research for Officer Candidate School (Tech. Research Rep. No. 1107). Washington, DC: Personnel Research and Procedures Division, Personnel Research Branch, The Adjutant General's Office (Army).
Personnel Testing Division, Defense Manpower Data Center (2008). ASVAB Technical Bulletin No. 3: CAT-ASVAB Forms 5–9. Retrieved from http://www.official-asvab.com/catasvab_res.htm
Putka, D. J. (Ed.) (2009). Initial development and validation of assessments for predicting disenrollment of four-year scholarship recipients from the Reserve Officer Training Corps (Study Rep. No. 2009-06). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Putka, D. J., Kilcullen, R., Legree, P., & Wasko, L. (2011). Identifying the leaders of tomorrow: Validating predictors of leader potential and performance. In M. G. Rumsey (Chair), Predicting leader performance: Insights from Army officer research. Symposium conducted at the annual meeting of the American Psychological Association, Washington, DC.
Reimer, K. (2006, July 1). AETC deploys new pilot screening test for FY07. Retrieved from http://www.aetc.af.mil/news/story.asp?id=123023176/
Rogers, D. L., Roach, B. W., & Short, L. O. (1986). Mental ability testing in the selection of Air Force officers: A brief historical overview (AFHRL-TP-86-23). Brooks Air Force Base, TX: Air Force Human Resources Laboratory, Air Force Systems Command.
Rogers, R. W., Lilley, L. W., Wellins, R. S., Fischl, M. A., & Burke, W. P. (1982). Development of the pre-commissioning Leadership Assessment Program (Tech. Rep. No. 560). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Rumsey, M. G., & Mohr, E. S. (1978). Male and female factors on the Cadet Evaluation Battery (Tech. Paper No. 331). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Russell, T. L., & Sellman, W. S. (2008). Review of information and communications technology literacy tests. Paper presented at the 23rd Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, CA.
Sands, W. A., & Waters, B. K. (1997). Introduction to ASVAB and CAT. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 3–9). Washington, DC: American Psychological Association.
Schmidt, F. L. (1994). The future of personnel selection in the U.S. Army. In M. G. Rumsey, C. B. Walker, & J. H. Harris (Eds.), Personnel selection and classification (pp. 103–125). Hillsdale, NJ: Erlbaum.
Sellman, S. W. (2007). Research and implementation plan: Addressing recommendations for enhancing ASVAB and DOD enlisted personnel and job classification system (FR-07-46). Alexandria, VA: Human Resources Research Organization.
Staff, Personnel Research Section, Classification and Replacement Branch, The Adjutant General's Office (1945). The Army General Classification Test. Psychological Bulletin, 42, 760–768.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29, 184–203.
Stricker, L. J. (1993). The Navy's Biographical Inventory: What accounts for its success? In Proceedings, 35th Annual Conference of the Military Testing Association (pp. 7–12). Williamsburg, VA.
Stricker, L. J. (2005). The Biographical Inventory in naval aviation selection: Inside the black box. Military Psychology, 17, 55–67.
Thomas, P. J. (1970). A comparison between the Armed Services Vocational Aptitude Battery and the Navy Basic Test Battery in predicting Navy school performance (Tech. Bulletin No. STB 70-4). San Diego, CA: Navy Personnel and Training Research Laboratory.
Trent, T. (1993). The Armed Services Applicant Profile (ASAP). In T. Trent & J. H. Laurence (Eds.), Adaptability screening for the Armed Forces (pp. 71–99). Washington, DC: Department of Defense, Office of Assistant Secretary of Defense (Force Management and Personnel).