
CHAPTER 11

Military Selection and Classification in the United States

Michael G. Rumsey

Abstract
This chapter describes military selection and classification research in the United States from a historical perspective. It describes the evolution of enlisted selection and classification measures from Army Alpha and Beta in 1917 to current explorations into non-cognitive tools. Major developments, such as the transition from service-specific test batteries to the joint service Armed Services Vocational Aptitude Battery (ASVAB) and the joint service project to link enlistment standards to job performance, are given special attention. Officer screening has evolved separately from enlisted screening, and is given separate treatment in this chapter. Both enlisted testing and officer testing have been characterized by a historical progression from fairly broad measures of cognitive ability to a more comprehensive approach, involving not only an expansion of the cognitive components assessed, but also an increasing attention to non-cognitive dimensions. While enlisted and officer testing have many features in common, two themes that have received more emphasis in officer selection are the work in identifying measures that predict aviation success, and the development of realistic assessment centers to validate predictors of officer success. The success of the military in developing enlisted and officer measures that predict important outcomes is a major chapter focus.

Keywords: Military, selection, classification, enlisted, officer, non-cognitive, ASVAB, validate, aviation, performance

There are several characteristics of the military environment that have contributed to the development of the military selection and classification system as it has existed in the past and exists today. The first is the division, if we leave aside for the moment the special case of warrant officers, of the military population into two principal categories—commissioned officer and enlisted. With few exceptions, each individual remains in the category in which he or she is first placed. “Officers” do not become “enlisted,” and the number penetrating the barrier from “enlisted” to “officer” is relatively small. Because these categories are so distinct, the selection and classification processes for each are also distinct. One of the major differences between these processes is that, while the enlisted selection process involves a direct transition from civilian to military status, the principal selection decision for officer candidates typically involves their selection into a pre-commissioning training program, which they must successfully complete before becoming officers.

The second characteristic is the sheer volume of accessions, from hundreds of thousands to millions of members a year for enlisted personnel alone (Waters, 1997). This has a number of ramifications. One is that the services have been able to invest considerable resources to ensure that the tools used in this process represent the state of the art in testing methodology. Another ramification is that the military has found it efficient to set up special-purpose testing centers across the country for the sole purpose of processing applicants for enlistment. The availability of these testing centers and the sizable quantity of applicants flowing through them daily make the use of these centers for much of the matching of persons to jobs, as well as selection, a somewhat natural occurrence.

The third significant characteristic of this environment is the link between accessions and training. Generally, applicants are not presumed to have any training in the job they will be performing when the selection decision is made. The military assumes the burden of training new enlistees and those who have been selected into a pre-commissioning program. By and large, applicants enter at the bottom rung. Lateral entry into a higher-status position is a rare event. These factors contribute to the significance of the selection process. The services are making a major commitment to the development of the individual selected, and any mistakes will be costly.

Scope of This Chapter
The current selection and classification procedures used by the military cannot be properly understood without appreciation of the key historical developments that led to their implementation. Thus, this chapter will be historical in orientation, describing enlisted and officer developments separately. As noted above, enlisted screening is a massive enterprise, conducted through the administration of a joint service selection and classification test battery at a variety of locations. Officer screening is a much more decentralized operation. Not only does each service have its own screening process, but also within each service there are multiple processes. In many cases, officer screening is dependent more upon a whole-person evaluation than on scores on a selected set of tests. As the screening processes for the two groups are divergent, so has been the historical evolution of these processes. The chapter will proceed first with a discussion of enlisted screening, followed by an examination of officer screening.

A full treatment of military selection and classification would incorporate the many significant developments that have occurred outside the United States as well as those that have occurred within. However, such a treatment is far beyond what can be accomplished in a single chapter. The focus here will be limited to the United States.

Certain themes will receive particular emphasis in this review, based on their relevance to the sophistication, maturity, and comprehensiveness of the research and the screening systems generated by the research. These themes relate to the measures developed and the procedures used to develop, assess, and administer those measures. In all these respects the military has frequently played a leading role, as discussed briefly below.

Development and Assessment
The measures used have often represented the highest standards in terms of quality and relevance. The Army Alpha and Beta tests, developed in World War I, have been lauded as pioneering efforts in group cognitive testing (Zeidner & Drucker, 1988). Later, the services developed highly sophisticated classification batteries, and have recently made significant contributions to the science of personality assessment. Developers have scrupulously applied advanced scientific principles to measurement development. The military has often demonstrated a greater appreciation of the need to validate its measures against some outcome of importance than have many civilian organizations. This is probably due, at least in part, to cost–benefit considerations. Validation is an expensive proposition, and for most civilian organizations, difficult to support for the number of individuals screened on an annual basis. As noted earlier, the military screens an unusually large number of applicants, so the cost of validation in this context is a necessary and justifiable expense, given the enormous difference between benefits associated with a highly valid test and those associated with a test of negligible validity. For a substantial period of time, the “outcome of interest” in military validation research was almost exclusively attrition or success in training.

Job Analysis
This brings us to the topic of job analysis. Job analysis is often used to help inform selection of which individual difference dimensions to test, but is a particularly critical step whenever job performance measures are used in the process of validating selection instruments. Although the military has played a leading role in the development of job analysis techniques, these techniques have generally been designed more for use in training development than in performance measurement.

Because of the early emphasis on the use of training success to validate selection and classification measures, for several decades little attention was given to comprehensive analyses of job requirements for purposes of developing performance measures. Maier (1993, p. 5) observed that, in order to identify appropriate content to predict performance in training, “[r]esearchers typically observe and talk to workers in the area and visit training programs.”


While these activities may be considered a form of job analysis, they do not constitute a particularly systematic form.

Meanwhile, the military was moving forward with the development of job analysis tools that would, when the need was perceived, prove critically valuable in building new outcome measures. One was the job inventory approach, which relies on the identification of discrete tasks. The Air Force developed a sophisticated version of this approach over a ten-year period, from 1957 to 1967 (Christal, 1969), when it was implemented as a fully functioning occupational analysis system. This system involved generating task lists through group interviews and a conference composed of subject-matter experts, then collecting incumbent data on such dimensions as the percentage of time spent on each task and the number performing each. The Air Force’s system has often been referred to in terms of the programs used to analyze these data, the Comprehensive Occupational Data Analysis Programs (CODAP). Although the Air Force pioneered this system, the other services have adapted their own forms of it (Mitchell & Driskell, 1996).
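The two incumbent statistics just described, the share of members performing each task and the time spent by those who do, fall out of a simple task-by-incumbent response matrix. The Python sketch below uses invented tasks and responses purely for illustration; CODAP itself is a large suite of analysis programs, not this single calculation.

```python
import numpy as np

# Hypothetical job-inventory responses: rows are incumbents, columns are
# tasks, and each value is the percent of work time spent on the task
# (0 means the incumbent does not perform it at all).
tasks = ["inspect hydraulic lines", "complete maintenance forms",
         "calibrate test equipment"]
responses = np.array([[20, 10,  0],
                      [35,  5, 15],
                      [ 0, 25, 40],
                      [10, 15,  0]], dtype=float)

# Percentage of members performing each task.
pct_performing = (responses > 0).mean(axis=0) * 100

# Mean percent time spent, averaged over those who perform the task.
time_if_performed = np.where(responses > 0, responses, np.nan)
mean_time = np.nanmean(time_if_performed, axis=0)

for task, pct, t in zip(tasks, pct_performing, mean_time):
    print(f"{task:30s} performed by {pct:5.1f}%, mean time {t:4.1f}%")
```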
The Air Force also developed another valuable form of job analysis, the critical incident technique (Flanagan, 1954), in the course of developing a pilot selection program. This technique focuses on behaviors that led to a particularly successful or unsuccessful outcome in a particular situation. Behaviors thus generated can then form the foundation of dimensions used in rating scales.

Validation
Much of the discussion of validation in this chapter will concern validation against training performance or attrition. However, beginning in about 1970, a fair amount of research has been conducted linking selection measures to job performance. Many of the enlisted efforts were summarized in a review by Welsh, Kucinkas, and Curran (1990). One of these, to be discussed later, was particularly notable. This was known as the Job Performance Measurement Project, which linked selection measures to measures of job performance. What is worth mentioning here is how this project made use of job analysis data to help ensure that the validation was based on relevant criteria. “In the JPM project . . . all the services used task-based job analysis methods” (Knapp, 2006, p. 117), and the “Army and Navy collected critical incident data that were later used in the development of performance rating scales” (Knapp, 2006, pp. 117–118). Each of the services developed hands-on measures and other measures linked to the job analysis measures and used these to validate the selection measures.

While this was the most prominent example of an effort to link individual-difference measures to job performance, another notable effort, this time with officers, was the Officer Prediction Project conducted by the Army in the late 1950s and early 1960s. An elaborate assessment center was built to represent a variety of job duties, based on input from subject matter experts and technical reviews “at the appropriate branch schools” (Zeidner & Drucker, 1988, p. 118). An immense variety of diverse predictor measures were linked to performance at this center to develop a battery of tests later used in Reserve Officers’ Training Corps (ROTC) selection. This, too, will be discussed in more detail later in this chapter.

Test Administration
In test administration, the military has again played a leading role. Since 1976, enlisted testing has been a joint service function, conducted in numerous locations throughout the country. These testing locations include Military Entrance Processing Stations (MEPS), serving relatively large geographic areas, and the more localized Mobile Examining Team (MET) sites. The MEPS are also used for other enlistment processing activities, including physical screening. The cost of testing is related to the amount of time required to complete the tests. In 1997 the services accomplished a remarkable breakthrough—the initiation of computer adaptive testing at all major testing stations (Personnel Testing Division, 2008). The advantages of this procedure are many, including time savings, scoring efficiency, and an enhanced testing experience for examinees (Sands & Waters, 1997).

This chapter will now turn to an examination of the major historical developments in the domains of enlisted and officer screening.

Enlisted Screening
The Two World Wars
At the beginning of World War I, with the need for huge quantities of Army troops, the potential value of a screening instrument that could be group-administered was recognized. Robert Yerkes, an eminent psychologist, was given the responsibility of directing the development of the Army Alpha test battery. The Army Alpha had eight parts, including grammar, vocabulary, and arithmetic, among other content areas. These tests were ultimately administered to over 1.7 million men, including 42,000 officers, during the war (Zeidner & Drucker, 1988). A pictorial version of the test battery, known as the Army Beta, was also developed under Yerkes’s direction. The development of the Army Alpha and Beta is generally viewed as the initial step in scientific screening of military personnel, as well as an historical development in cognitive testing, which until then had been conducted in an individualized manner. Despite the massive administration of the tests, their use in selection and assignment during this period was sporadic. However, Zeidner and Drucker noted (p. 11) that: “Although the test scores were not universally accepted by Army management, they were allowed to play an important role in many decisions. For example, almost 8,000 recruits were recommended for immediate discharge as mentally incompetent; another 8,000 were assigned to special labor duty.” The Army Alpha continued in use for over 25 years (Staff, Personnel Research Section, 1945).

World War II stimulated the next major push forward in the development of enlisted screening tests. Initially, the emphasis was more on tests of general ability rather than specific aptitudes related to job placement. The Army developed the Army General Classification Test (AGCT), a test of “general learning ability,” for administration to every “literate inductee” beginning in 1940 (Staff, Personnel Research Section, 1945). The first form consisted of vocabulary, arithmetic, and block-counting items. Despite the rather limited range of content categories initially, the AGCT was used as a classification test in the sense that it could “sort new arrivals” (Staff, Personnel Research Section, 1945, p. 760) and help determine their qualifications for various types of training regimens. In 1942, the Marine Corps began using the AGCT and Mechanical Aptitude Tests for classifying their new recruits (Furer, 1959). In 1945, the Army developed new forms of the AGCT, which contained four subtests—reading and vocabulary, arithmetic computation, arithmetic reasoning, and pattern analysis. These forms provided the model for the Armed Forces Qualification Test (AFQT), which provided “an objective basis for the equitable distribution of human resources” (Zeidner & Drucker, 1988, p. 50) across the services. From 1950 through 1972, all services used a common AFQT for selection.

Development of Classification Batteries
Before World War II, aside from the limited use of specialized tests for a few select occupations, the concept of “classification” generally meant ensuring that individuals with a high level of general mental ability were matched with the jobs that were judged to require such ability. However, during and after the war, that approach began to change. Classification came to mean a matching of particular aptitudes with particular job requirements. Given the large number of military jobs, the challenge of determining the best combination of tests for each job was enormous. This problem was somewhat simplified by the concept of clustering. Jobs that were found to have similar requirements were clustered together and linked to the same set of tests.

In 1924, the Navy began using a verbal test, the General Classification Test (GCT), to screen candidates for enlistment. During World War II, as the Navy struggled with trying to classify nearly four million enlisted personnel, the Navy General Classification Test was not found to provide a sufficient basis for differentiating across specialties (Faulkner & Haggerty, 1947). Thus, it was supplemented in 1943 with a Basic Test Battery (BTB), which included more specialized tests, such as Mechanical Aptitude and Radio Code Aptitude (Odell, 1947). A version of the BTB consisting of the GCT, an Arithmetic Test, a Mechanical Test, a Clerical Test, and a Shop Practices Test was linked with final grades across 47 schools. When the tests were combined in a regression analysis, the level of prediction achieved by the BTB (.57, corrected) was found to exceed that observed for the Armed Services Vocational Aptitude Battery (ASVAB, .47, corrected), a test battery to be discussed later in this chapter (Thomas, 1970). The BTB continued in use into the 1970s (Maier, 1993, p. 37).
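The “corrected” validities cited throughout this chapter adjust for restriction in range: people admitted on the basis of a test show less score variance than the full applicant pool, which depresses the correlation observed in the selected group. The sources above do not spell out which correction they applied; a commonly used univariate formula (Thorndike’s Case II) is

$$ r_c \;=\; \frac{r\,(S/s)}{\sqrt{\,1 + r^2\left(S^2/s^2 - 1\right)}} $$

where r is the validity observed in the restricted group, s and S are the standard deviations of the predictor in the restricted group and in the full applicant population, and r_c is the estimated validity in the unrestricted population. For example, an observed r of .40 with S/s = 1.5 yields r_c of about .55.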
Between 1946 and 1948, the newly formed Air Force developed its own Airman Classification Battery. Earlier research by the Army Air Force in World War II focusing on officer selection (to be described later in this chapter) provided the foundation for this battery, but new tests were added, focusing on enlisted jobs. Twelve aptitude tests, covering such general abilities as word knowledge and more specific abilities such as aviation information, were developed, along with a biographical inventory. Separate composites of tests, called Aptitude Indices (AI), were developed for separate clusters of Air Force jobs. These tests were found to be highly reliable, and had a median validity of .61 for predicting technical course grades when corrected for restriction in range (Weeks, Mullins, & Vitola, 1975). In 1958, the Airman Classification Battery was replaced with the Airman Qualifying Examination, which was similar but shorter in length and the time required to complete it. This change coincided with a policy change to make Air Force recruiting more selective. Now, in addition to qualifying on the AFQT, Air Force applicants had to meet qualifying standards on at least one aptitude index of the Airman Qualifying Examination. A mean correlation of .63 was found for a 1960 version of the AQE against 41 sets of school grades. The AQE continued in use until replaced by the Armed Services Vocational Aptitude Battery in 1973 (Weeks, Mullins, & Vitola, 1975).

In 1949, the Army also developed a classification battery that incorporated more specialized tests than did the AGCT. The Army’s new battery, the Army Classification Battery (ACB), initially included ten tests, including tests of general ability (math, verbal, information); mechanical ability (electronics, mechanical, automotive, trade); perceptual ability (pattern analysis, auditory perception, attention to detail); and an inventory covering interests in four areas (combat, attentiveness, electronics, maintenance). The tests were combined into ten groupings, or Aptitude Areas, linked to ten job clusters (Zeidner, Harper, & Karcher, 1956).

As more data emerged, revisions were made to the ACB. These revisions were related to accumulated validation findings, typically against training data (e.g., Zeidner, Harper, & Karcher, 1956; Maier & Fuchs, 1969), and to an effort to better predict combat performance, which involved linking personality measures to on-the-job ratings. An interesting conclusion that emerged from the latter effort was that “well-adjusted good citizens” (Willemin & Karcher, 1958, p. 7) (e.g., high in self-confidence and emotional stability) and “men with masculine interests” (e.g., sports, motoring, hunting) made “good fighters” (Willemin & Karcher, 1958, p. 8). The last changes to the ACB were made in 1972 (Maier & Fuchs, 1972). The ACB continued in use for the Army up to 1975. The Marine Corps used a version of the ACB known as ACB-61 until 1976 (Matthews, 1977).

Assignment Systems
Linking tests to jobs is only one element in the assignment process. Olson (1968) noted that other factors included service priorities and preferences of the individuals. Somehow all this information must be combined to make the best decision for all concerned. Kroeker (1989, p. 44) noted that personnel assignment in the 1950s “in the armed services was accomplished monthly by large teams of classification technicians who sorted through cards containing recruit information (e.g., aptitude test scores, biographical data, etc.) and who used their best judgment as the basis for filling duty assignments and training school quotas. This manual procedure involved the human evaluation of trade-offs . . . and often yielded person–job mismatches. . . .”

Beginning with the Army in 1958 and continuing into the 1960s, all services developed automated assignment systems. These systems varied in several respects, perhaps most critically in the extent to which they were designed to optimize person–job match versus other objectives, such as filling high-priority jobs. Kroeker (1989) differentiated between two types of allocation systems. One was characterized by its emphasis on job filling. As an example, he identified the Computer Assisted Assignment System (COMPASS) II model then used by the Navy. Elements of the model were “ASVAB test scores, civilian job experience, educational background, and vocational objectives and preferences.” Although a number of objectives were considered in this model, “fill policy overshadow[ed] all other considerations . . .” (Kroeker, 1989, p. 53).

A second type of allocation system was more balanced between filling jobs and fitting persons to jobs. The flexibility to consider other factors besides immediate “fill” was associated with the shift from a conscription environment in which job assignments occurred after basic training, to a volunteer environment in which job guarantees were made at the time of enlistment. Given the time lapse between commitment to serve and initiation of specialty training, the emphasis on filling immediate service needs was reduced. As an example of this category, Kroeker (1989) offered the Air Force’s Procurement Management Information System (PROMIS) model, which sought to optimize fit with the jobs available at the time the applicant was ready to make a commitment.

The services continue to strive to improve their allocation processes. The Army has developed a system known as the Enlisted Personnel Allocation System (Lightfoot, Ramsberger, & Greenston, 2000), which would add an optimization component to a current system that places heavy emphasis on job fill. Implementation of this system has been impeded by concerns that its inclusion in the person–job match process might reduce the emphasis on filling high-priority jobs. The Navy has also developed “new classification decision support software, the Rating Identification Engine (RIDE)” to “provide greater utility in the operational classification system” (Farmer et al., 2003, p. 62). Both ability and interest inputs are intended to be used by this software in making person–job matches. A new computer-administered interest measure, Job and Occupational Interest in the Navy (JOIN), has been developed to improve the quality and precision of the interest inputs (Farmer et al., 2003).
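The contrast between fill-driven and fit-driven allocation can be illustrated with a small optimization sketch in Python. The scores, priorities, and weights below are invented, and real systems such as COMPASS, PROMIS, and EPAS handle many more constraints (quotas, timing, preferences) than this toy does.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n_people, n_jobs = 5, 9            # more openings than people, so the
                                   # choice of which jobs to fill matters
fit = rng.uniform(80, 120, size=(n_people, n_jobs))  # predicted performance
priority = rng.uniform(0, 1, size=n_jobs)            # per-job fill urgency

def assign(fill_weight):
    # Utility of placing person i in job j: predicted fit plus a bonus
    # for filling urgent jobs; the Hungarian algorithm then finds the
    # one-to-one assignment that maximizes total utility.
    utility = fit + fill_weight * 100.0 * priority[np.newaxis, :]
    people, jobs = linear_sum_assignment(utility, maximize=True)
    return fit[people, jobs].mean(), priority[jobs].mean()

for w in (0.0, 0.5, 2.0):
    mean_fit, mean_urgency = assign(w)
    print(f"fill weight {w:3.1f}: mean predicted fit {mean_fit:6.1f}, "
          f"mean urgency of filled jobs {mean_urgency:.2f}")
```

With a zero fill weight the assignment maximizes person–job fit alone; raising the weight trades fit away for coverage of high-priority openings, which is exactly the tension the automated systems described above were designed to manage.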
ASVAB and the Job Performance Measurement Project
In the 1970s, the use of separate test batteries by the different services was found to be administratively cumbersome and constraining. “To determine eligibility for enlistment in different Services, an applicant would have had to take the tests specific to each Service” (Maier, 1993, p. 37). Thus, the services combined their efforts to produce a joint service selection and classification battery: the Armed Services Vocational Aptitude Battery (ASVAB). Full administration of this battery began in 1976 (Sands & Waters, 1997) and has continued to this day. Content categories and test content have changed over time, but at the outset, the tests used were very much like those in the individual services’ classification batteries, minus any interest inventory.

The AFQT was maintained as a composite of the ASVAB tests that measured general ability. As tests were revised, efforts were devoted to ensuring that score distributions in the new forms had roughly the same meaning as distributions in the old forms. Screening standards were linked to these score distributions. By 1980, it had become evident that something was very wrong. Complaints were received from the schoolhouses and the field about the performance of newly accessioned service members. Analyses revealed that the process of linking new scores to old had been flawed, and that the result was, in effect, to unintentionally lower entry standards. The scoring problem was corrected, but in the meantime hundreds of thousands of individuals had enlisted who, if the standards had been applied as intended, might have been considered ineligible (Laurence & Ramsberger, 1991; Waters, 1997).
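The form-linking effort described above is, at bottom, score equating. A minimal sketch of one standard approach, equipercentile equating, appears below; the score distributions are simulated, and operational ASVAB equating involves large calibration samples, smoothing, and careful anchoring that this toy omits. It was a breakdown in this kind of linkage that produced the 1980 miscalibration.

```python
import numpy as np

# Equipercentile equating, sketched: a score on the new form is mapped
# to the old-form score with the same percentile rank, so cut scores
# retain their meaning across forms. All data here are simulated.
rng = np.random.default_rng(3)
old_form = rng.normal(50, 10, 5000)   # reference-form scores
new_form = rng.normal(46, 12, 5000)   # new form turned out harder

def equate_to_old_scale(new_score):
    """Map a new-form score onto the old form's scale by percentile rank."""
    pct = (new_form <= new_score).mean() * 100
    return np.percentile(old_form, pct)

for s in (30, 46, 60):
    print(f"new-form score {s} -> old-form scale {equate_to_old_scale(s):.1f}")
```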
When the situation was explained to Congress, it brought national attention to the enlistment process. Members wanted to know what the impact of enlistment standards was on job performance. Previous research had already demonstrated the link between military tests and training performance. However, assessing training performance was relatively easy compared to measuring job performance. To some extent, training grades were already accessible for analysis. The question of what constituted training performance was not particularly controversial. It soon became apparent that measures that could be relatively universally accepted as representing job performance were not readily available. Researchers would have to develop and administer such measures on their own. Once service members had graduated from training and settled into their job assignments, they were widely dispersed throughout the world. Locating them and assessing their performance would be an enormous effort.

Congress instructed the Department of Defense to link enlistment standards to job performance. An early concept guiding this research, known as the Job Performance Measurement project, was that job sample, or “hands-on,” testing would be the “benchmark” performance measure. This approach was tied to the view that “the predictor battery of interest (the ASVAB) was intended to predict only job proficiency” (Knapp, 2006, p. 116). It was recognized that hands-on testing would not always be feasible. This consideration gave rise to the idea of alternate measures that could be used if they could be shown to represent reasonable “surrogates” to the hands-on benchmark. In fact, “[e]ach service was assigned a surrogate measurement method to evaluate as part of their research programs” (Knapp, 2006, p. 114).

Within this general guidance, there was considerable flexibility regarding how the services responded to Congress’s instruction. The Air Force developed a procedure called “walk through performance testing” which combined an interview and hands-on approach. Whereas a strictly hands-on test would require the individual to actually perform a task, in an interview the individual would “describe in detail how he or she would perform” the task (Hedge & Teachout, 1992, p. 454). The administrator would then determine “whether the description of each step of the task was correct or incorrect” (Hedge & Teachout, 1992, p. 454). The Air Force supplemented the walk-through with job knowledge and rating measures. The Navy used a hands-on test, a job-knowledge simulation test, and a set of rating scales (Laabs & Baker, 1989). The Marine Corps used hands-on measures, job-knowledge tests, and file data (Carey, 1992).

The Army, like the other services, used multiple performance measures, but presented an alternative conceptualization of performance to the “benchmark” approach. The Army agreed that hands-on tests provided useful performance information, but did not perceive the value of other measures to depend only on the extent to which they could serve as surrogates for hands-on measures. The Army’s position was that no one measure could provide complete information about an individual’s performance. Hands-on tests could provide information about an individual’s maximum level of task proficiency, but not about that individual’s day-to-day performance. Ratings from peers and supervisors familiar with the individual’s performance and such administrative measures as letters of commendation could shed light on the individual’s day-to-day performance, which might be viewed as a combination of proficiency and motivation. Job-knowledge tests could provide information about proficiency on tasks not tested in a hands-on manner. Since part of an individual’s job in the Army is to know how to perform tasks that are typically not performed in peacetime, job knowledge is an important element of performance in its own right. Thus the Army used all of these measures, and all were viewed as potentially valuable sources of information.

Observing that other services developed measures of interpersonal as well as task-related behavior, Knapp (2006, p. 116) noted that, “It was not just the Army that was inclined to view performance with a broader lens than task proficiency.” However, the Army’s goals were particularly compatible with the use of a broad range of performance measures. As noted above, the development and administration of performance measures, particularly across a wide sample of jobs, is an extraordinarily resource-intensive exercise. Once the services were committed to doing it, it could be viewed not only as a challenge, but also as a unique, literally “once-in-a-lifetime” event. Why not take this opportunity to determine how the selection and classification system could be improved by the addition of new measures? The ASVAB could be viewed as a measure of general cognitive ability. It could not be viewed as a measure of “whole person” potential. Spatial, psychomotor, and personality measures were excluded from the ASVAB. Part of the Army’s approach was to develop and administer such measures to determine what they could add to the ASVAB in predicting performance.

This joint service effort demonstrated that the ASVAB predicted job proficiency, whether proficiency was measured by hands-on tests or job-knowledge tests. The Army portion of the effort, untethered from the “benchmark” concept, was able to show a clear differentiation between two major components of performance, “can do” as represented by the proficiency measures, and “will do” as represented by ratings and administrative measures. This differentiation allowed a more nuanced evaluation of the ASVAB’s strengths and weaknesses than an approach focusing on only one of these two components would have permitted. ASVAB’s relationship to can-do performance was powerful and undeniable, with multiple correlations in the .60s. ASVAB’s relationship to will-do performance was much more modest, opening the door for the addition of new predictor measures. A number of non-cognitive predictor measures were found to add considerably to ASVAB’s accuracy in predicting will-do performance. (See Oppler et al., 2001; and for a complete description of the Army’s effort to both validate existing measures and develop and validate new measures, which incorporated two projects, Project A and Building the Career Force, see Campbell & Knapp, 2001.)

Non-cognitive Testing
One outcome of the Job Performance Measurement effort was the demonstration that ASVAB does indeed predict job performance. This was important to supporting continued use of the measure. Another outcome was to stimulate further investigation of non-cognitive measures. The Army’s Assessment of Background and Life Experiences (ABLE) was a traditional self-report temperament measure, asking individuals to select from a set of alternatives which one best represented themselves, their beliefs, or their attitudes. The other services had developed similar measures that focused on individuals’ history. Because of this biographical orientation, they were known as “biodata” measures, although the difference between a biodata measure, and a temperament measure that includes items relating to activities or beliefs in the past, may be subtle or, in some cases, nonexistent. The Air Force began using a biodata measure, the History Opinion Questionnaire, in 1975 (Trent, 1993; Guinn, Johnson, & Kantor, 1975). The Navy developed the Armed Services Applicant Profile (ASAP), based on existing item sources, in the 1980s (Trent, 1993). A measure combining the ABLE and ASAP, known as the Adaptability Screening Profile (ASP), was considered for joint service use (Trent & Laurence, 1993).

A major concern about non-cognitive measures, whether they are classified as temperament, biodata, personality, or interest, is that of faking. Since these measures are self-reported, there is always the danger that the individual may take the opportunity to present himself or herself in the most positive light possible, rather than in terms that would reflect the individual’s characteristics most accurately. Concerns about faking prevented the implementation of the ASP, or of the separate ABLE or ASAP measures. However, such concerns also stimulated research to develop and test safeguards against faking. In the Army, the successor to the ABLE was the Assessment of Individual Motivation, or AIM (White & Young, 1998). The AIM was designed as a forced-choice measure. The intent was to present options in such a way that the most desirable option was not obvious. The individual was “forced” to choose between two apparently desirable and two apparently undesirable options, so the ultimate score would be more likely to represent behavior related to job success than behavior seen as socially desirable.
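A toy scoring routine can make the forced-choice idea concrete. Everything below is invented for illustration (the statements, the two dimensions, and the keying); AIM's actual items and keys are not reproduced here, and its operational scoring is more elaborate.

```python
# One hypothetical forced-choice tetrad: two positively worded and two
# negatively worded statements, matched in apparent desirability but
# keyed to different dimensions.
ITEM = [
    ("I double-check my work before turning it in", "conscientiousness"),
    ("I stay calm when plans change suddenly", "adjustment"),
    ("I rarely finish tasks ahead of schedule", "conscientiousness"),
    ("Unexpected setbacks tend to rattle me", "adjustment"),
]

def score_item(most_like_me: int, least_like_me: int) -> dict:
    """Score one tetrad: the endorsed statement adds to its keyed
    dimension, the rejected one subtracts, and the sign flips for the
    negatively worded statements (indices 2 and 3)."""
    scores = {"conscientiousness": 0, "adjustment": 0}
    for idx, delta in ((most_like_me, +1), (least_like_me, -1)):
        _statement, dim = ITEM[idx]
        negatively_worded = idx >= 2
        scores[dim] += -delta if negatively_worded else delta
    return scores

print(score_item(most_like_me=0, least_like_me=3))
# -> {'conscientiousness': 1, 'adjustment': 1}
```

Because every option looks comparably desirable, a respondent cannot simply pick the flattering answer; the pattern of choices, not their attractiveness, drives the score.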
Initial research findings suggested that the approach was successful—that it inhibited faking without sacrificing validity (Young et al., 2000). The Army saw a potential application of AIM as a predictor of attrition of non–high-school-diploma graduates. It has historically been observed that those without a high school diploma tend to wash out of the military at a far higher rate than those with a diploma (Department of Defense, 1985). Accordingly, the number of non-graduates to be accepted into the services has been limited by Department of Defense policy. One concern regarding this policy is that there are many reasons why someone may leave the military. Lack of persistence, which failure to graduate from high school may partially represent, may be one of those reasons. However, there may be a more direct way of measuring one’s propensity to attrite that might allow the military to salvage some non-graduates who do not have a high propensity to attrite.

Accordingly, a trial program was initiated within the Army to identify promising non–high-school graduates using a combination of AIM, a body mass index, and two ASVAB subtests. This new program was known as the Tier Two Attrition Screen (TTAS). The initial results were disappointing. There was far greater evidence of faking on the AIM when used in this operational trial than there had been in a research context (Young et al., 2004). Researchers went back to the drawing board to see if the problem could be fixed. What they found was that, while some AIM items performed worse in the operational context, others worked just fine. Thus, with some fine-tuning, they were able to improve the AIM’s effectiveness (Young et al., 2004; Young & White, 2006). The revised TTAS has been used in a new trial program now for several years, with considerable success.

Computer Adaptive Testing
When first developed, the ASVAB was administered in paper and pencil format. While this was all the available technology would support, it had many disadvantages. The primary disadvantage was the time required for administration, over three hours. Another disadvantage was that it was administratively cumbersome, requiring the printing and distribution of test booklets and answer sheets. A third disadvantage was that it was inefficient and imprecise. The same test questions were administered to everyone, regardless of their ability level. Someone at a very high level would have to answer a number of very easy questions, which contributed little if any new information about that person’s ability beyond the questions closer to that individual’s ability level. Someone at a very low level would have to answer a number of very difficult questions, which similarly would provide very little in terms of new information.

Computerized adaptive testing (CAT) offered a way to counter these disadvantages. The key was the “adaptive” part. The basic concept behind CAT was that, for every test taker, there was a “true score” reflecting their true ability on whatever attribute was being measured, and that the object of the test was to pinpoint that true score. Each item administered provided some information about the individual’s true score. Each item answered correctly led to a more difficult item; each item answered incorrectly led to an easier item. Ultimately, the individual’s ability level and the item-difficulty level would converge and lead to an estimate of the individual’s true score. Because a small number of carefully calibrated items could generate a good estimate of that score, an adaptive test could be completed in a much shorter time than a non-adaptive one (Sands & Waters, 1997). This procedure was more efficient and precise than a more conventional one. Computerization made it possible to adapt item difficulty to individual ability, and also made printing and distribution of test materials and answer sheets unnecessary.
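The adaptive loop just described can be sketched in a few lines of Python. This is a minimal illustration under a one-parameter (Rasch) item response model with an invented item bank and a fixed test length; CAT-ASVAB's actual item selection, exposure control, and scoring procedures are considerably more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

item_difficulty = np.linspace(-3, 3, 200)   # hypothetical calibrated bank
theta_grid = np.linspace(-4, 4, 161)        # candidate ability values
posterior = np.ones_like(theta_grid)        # flat prior over ability
posterior /= posterior.sum()

true_theta = 1.2                            # the examinee's "true score"
administered = set()

def p_correct(theta, b):
    """Rasch model: probability of answering an item of difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

for step in range(15):
    # Pick the unused item whose difficulty is closest to the current
    # ability estimate: a correct answer raises the estimate and draws
    # a harder item, an error draws an easier one.
    estimate = float(np.dot(theta_grid, posterior))
    unused = [i for i in range(len(item_difficulty)) if i not in administered]
    nxt = min(unused, key=lambda i: abs(item_difficulty[i] - estimate))
    administered.add(nxt)

    # Simulate the response, then update the posterior over ability.
    correct = rng.random() < p_correct(true_theta, item_difficulty[nxt])
    likelihood = p_correct(theta_grid, item_difficulty[nxt])
    posterior *= likelihood if correct else (1.0 - likelihood)
    posterior /= posterior.sum()

print("final ability estimate:", float(np.dot(theta_grid, posterior)))
```

After a handful of well-targeted items the estimate settles near the true ability, which is why an adaptive test can be much shorter than a fixed form.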
In 1979 the Department of the Navy was named as Executive Agent for a project to develop a computerized adaptive version of the ASVAB (Martin & Hoshaw, 1997). Initially, a three-year project was envisioned, but it soon became apparent that this timeline was too ambitious for the challenges involved. The size and scope of military aptitude testing generated delivery system requirements and technical challenges that were beyond the existing state of the art. The types of microcomputers available at the beginning of the project were not adequate to the need. As the Navy moved forward to meet the technical challenges, the sophistication of computer systems gradually advanced to the point where they could be incorporated into the project (McBride, 1997). A staged approach with more flexible timelines replaced the three-year plan.

However, in 1985, perceived urgency required the Navy lead laboratory, the Navy Personnel Research and Development Center (NPRDC), to implement an accelerated approach. In 1987 a review of the cost–benefit ratio of CAT-ASVAB failed to demonstrate that the new testing program could provide dollar savings in personnel budgets. The Director of Accession Policy in the Department of Defense concluded that “support for computerized testing could be strengthened by emphasizing the potential for use of new types of computerized cognitive tests” (Martin & Hoshaw, 1997, p. 18). Thus, the Enhanced Computerized Administered Test (ECAT) validity research investigation was approved in 1988. Each of the services had developed tests that could be adapted to a computerized battery, but were not at that time part of the enlisted accessions testing program. Several tests were selected for experimental administration, to be validated against measures of training success. The research was completed in 1992 (Martin & Hoshaw, 1997). Certain of the tests were found to improve upon the predictive validity of the ASVAB, although not by more than a few points. One test from the Army’s Project A, Assembling Objects, was approved for inclusion in the ASVAB (for a complete description of the ECAT project, see Wolfe, 1997).

By this time, the environment was more favorable to the CAT-ASVAB concept. In 1993, implementation of CAT-ASVAB across all Military Entrance Processing Stations (MEPS), beginning in 1995, was approved (Martin & Hoshaw, 1997).

Learning Abilities Measurement Program (LAMP)
One of the more ambitious efforts to improve enlisted selection and classification was the Air Force’s Project LAMP, a basic research effort that began in 1981 and continued until 1998 (Weissmuller & Schwartz, 2007). The program was designed with the goal to “identify and assess qualified applicants who failed to meet minimum cutoffs on standardized tests” (Weissmuller & Schwartz, 2007, The Learning Abilities Measurement Program [Project LAMP] section, para. 1). The principal focus of the program was to develop a test battery “constructed on the basis of cognitive theory” and to determine whether this battery could “predict success on learning tasks more accurately” than a battery such as the ASVAB (Kyllonen, 1994, p. 103).

The LAMP researchers developed a taxonomy that combined elements of information processing and basic content categories. The information processing elements were working memory, processing speed, declarative knowledge, declarative learning, and procedural learning. The cognitive categories were verbal, quantitative, and spatial.

These concepts led to the development of successive versions of an experimental test battery known as Cognitive Abilities Measurement (CAM). CAM was designed according to a matrix in which the information processing elements were represented as rows and the content categories as columns, and tests were designed for each cell. Thus, for example, there were verbal tests for each information processing category (Kyllonen, 1994).
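The cell structure is easy to picture as a crossing of the two facets. The sketch below simply enumerates the design; the generated labels are placeholders, not the actual CAM test names.

```python
# The CAM design crossed five information-processing elements with
# three content categories, one test per cell (labels here are
# illustrative placeholders, not actual CAM test names).
processes = ["working memory", "processing speed", "declarative knowledge",
             "declarative learning", "procedural learning"]
contents = ["verbal", "quantitative", "spatial"]

design = {(p, c): f"{c} {p} test" for p in processes for c in contents}
print(len(design), "cells; e.g.,", design[("working memory", "verbal")])
# -> 15 cells; e.g., verbal working memory test
```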
The LAMP project was a major theoretical achievement, although practical implementations within the military have thus far been limited. Kyllonen (1994) asserted that the CAM battery predicted performance on learning tasks more accurately than the ASVAB. However, its length was also greater than that of the ASVAB (Kyllonen, 1994), and Schmidt (1994) questioned if it could really provide additional validity beyond that which the ASVAB could provide.

Recent Developments
As the twenty-first century approached, each of the services was involved with advances in digital communication systems and technological advances in weaponry and other equipment, while recognizing that the military was confronting an unprecedented diversity of missions. Thus, each began considering what implications these changes might have for their personnel systems. The Navy’s vision for future personnel research was captured in the document “Sailor 21” (Navy Personnel Research Studies and Technology, 1998), which identified the need to expand “our view of the predictor and criterion space” (p. 30). An Air Force researcher noted, “tomorrow’s weapons . . . will require people to operate and maintain them who have the requisite skills, perhaps many of which we can only vaguely imagine” (Looper, 1997, p. 272). The Army launched a research project known as 21st Century Soldiers, in an attempt to determine what implications changing conditions might have for required characteristics for future success. The results, while not conclusive, suggested that certain characteristics other than those currently evaluated on the ASVAB deserved particularly close attention. These included peer leadership, cognitive flexibility, and self-esteem (Knapp & Tremble, 2007).

The Department of Defense itself convened an ASVAB Review Panel to examine whether changes to this centerpiece of the existing selection and classification system were needed. The panel emerged with several recommendations. One was that “[n]oncognitive measures should be included in the battery of tests used for classification” (Drasgow et al., 2006, p. iv). The panel also recommended that a “test of information and communications technology literacy” (Drasgow et al., 2006, p. iii) and one or more nonverbal reasoning tests be developed and considered for inclusion in enlisted testing. Furthermore, the panel generated recommendations addressing job analysis and validation issues.

In conjunction with the services, the Department of Defense generated a plan for implementing the panel’s recommendations (Sellman, 2007). Considerable work on non-cognitive measure development was already underway. Both the Navy and the Army were involved in the development of computer-adaptive personality tests, employing breakthroughs achieved by Stark, Chernyshenko, and Drasgow (2005) to build and score paired-comparison forced-choice measures. The Navy’s version, the Navy Computer Adaptive Personality Scales (NCAPS), has been found to be resistant to faking in a research environment (Underhill, Lords, & Bearden, 2006) and to show promising levels of validity for predicting performance (Borman et al., 2009).

The Army’s version has advanced to the operational testing level. In 2007, the success of the TTAS program stimulated Army leaders to examine whether non-cognitive measures might be used in screening high school students as well as the non–high school students included in TTAS testing. The U.S. Army Research Institute for the Behavioral and Social Sciences (ARI) initiated a trial of a collection of non-cognitive measures, including one known as the Tailored Adaptive Personality Assessment System, or TAPAS. The TAPAS, unlike the AIM, was designed to be administered in a computer adaptive format. It also incorporated paired items that were more closely matched in judged desirability, and thus, it was hoped, even harder to fake. A number of studies demonstrated that the TAPAS format was effective in reducing faking (Drasgow, Stark, & Chernyshenko, 2007). TAPAS was found to be an effective predictor of both proficiency and motivational types of criteria in an experimental administration (Heffner & White, 2009) and was approved for administration in the Military Entrance Processing Stations in an initial operational test and evaluation (IOT&E). That is, it was to be used in a limited way for selection decisions for a three-year period, after which time it would be evaluated for possible future use. This IOT&E began in May 2009.

Progress has also been made on the other panel recommendations. For example, the DOD sponsored research to review existing information and communications technology literacy tests (Russell & Sellman, 2008; Trippe & Russell, 2008) and nonverbal reasoning tests (Waters, Russell, & Sellman, 2007).

Recent developments in classification theory are also worth noting. Since 1976, all services have used the ASVAB, but each service has combined the ASVAB tests for classification purposes in a manner that served its own needs best. Zeidner and Johnson (1994), in advancing an approach they labeled “Differential Assignment Theory,” argued that existing approaches for developing composites relied too much on determining which set of tests best predicted performance for a given set of jobs. They noted that differential validity, which focused on the extent to which a test or set of tests differentially predicted performance across jobs, deserved greater emphasis. They argued that the critical metric should be mean predicted performance, a concept originated by Brogden (1959) that was “a function of predictive validity, intercorrelationships among the least-square estimates of job performance, and the number of job families” (Zeidner & Johnson, 1994, p. 379). They developed a procedure for calculating mean predicted performance through a complex simulation process, and have demonstrated that their methodology can improve classification efficiency.
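A toy version of such a simulation conveys the flavor of the metric. The composite intercorrelations, quotas, and sample size below are invented, and Zeidner and Johnson's actual procedure is far more elaborate; the point is only that assigning on the full profile of predicted scores raises mean predicted performance above what indifferent assignment achieves.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(7)
n_applicants, n_families = 300, 3

# Correlated least-squares estimates of performance in each job family;
# lower intercorrelations leave more room for classification gains.
cov = np.array([[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
predicted = rng.multivariate_normal(np.zeros(n_families), cov, n_applicants)

# Equal quotas per family, enforced by giving each family one column
# per available slot before solving the assignment problem.
quota = n_applicants // n_families
utility = np.repeat(predicted, quota, axis=1)        # applicants x slots
rows, cols = linear_sum_assignment(utility, maximize=True)
optimal_mpp = predicted[rows, cols // quota].mean()

random_mpp = predicted[np.arange(n_applicants),
                       rng.integers(0, n_families, n_applicants)].mean()

print(f"mean predicted performance, optimal assignment: {optimal_mpp:.3f}")
print(f"mean predicted performance, random assignment:  {random_mpp:.3f}")
```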
Officer Screening
Aviation Screening
Each of the services has invested heavily in research on aircrew screening. Hunter (1989, p. 129) explained this situation by noting, “[p]ilot training is, almost without exception, the most expensive of the many training programs conducted by the military services.”

The Air Force, Navy, and Marine Corps all use commissioned officers as pilots. The Army presents a special case. After the Air Corps was split off from the Army, the Army focused predominantly on rotary wing aircraft (helicopters), and used warrant officers to fly these. Thus, the discussion of aviation screening through World War II will focus on the Army Air Force and Navy, while the examination of postwar screening will have one section on Air Force and Navy research and a separate section on Army research.

Screening Through World War II
Beginning in 1920, special examinations on school subjects were used to determine qualifications for candidates for Army flight training who did not have the minimum educational requirements. In the Navy, when screening of flight candidates extended beyond physical assessment in the 1920s, it included a “psychological interview of sorts” which addressed the candidate’s character, motivation, and intelligence (Brown, 1989, p. 118).

Then came World War II, which was a particularly productive time for the development of officer aviation tests. “By the end of World War II, and certainly by the early 1950s, the present state-of-the-art had been achieved with respect to paper-and-pencil testing” (pp. 1, 3), reported North and Griffin in 1977. When World War II began, “the requirements for aircrew personnel increased dramatically” (Rogers, Roach, & Short, 1986, p. 4). A screening examination known as the Aviation Cadet Qualifying Examination was developed and instituted in 1942. It contained reading comprehension, mathematics, mechanical comprehension, and information items, as well as “questions presenting practical problems which might be met, not only in flying, but in everyday activities . . .” (Flanagan, 1948, p. 54). It was designed to predict success in training. In 1944, when the use of the test was expanded to screen candidates for the enlisted gunner job, it was renamed the Army Air Forces Qualifying Examination (Rogers, Roach, & Short, 1986).

During the course of the war, the Army Air Force conducted a comprehensive program to develop screening measures for three classes of jobs: pilot, navigator, and bombardier. The program began with job analyses for each type of job. These analyses involved the accumulation of various types of data, including faculty board proceedings and evaluation board reports, as well as formal job analyses that consisted of checklists of such things as job duties and the nature of one’s work. Twenty traits were identified, divided into four categories: intellectual, perceptual, temperamental, and psychomotor (Guilford & Lacey, 1947). This research led to the development of a number of iterations of the Aircrew Classification Battery that included a wide variety of both paper and pencil tests and psychomotor tests (Brown, 1989), which were administered between 1942 and 1947. This was followed by a “period of experimentation” (Rogers, Roach, & Short, 1986, p. 7) during which a number of measures were tried. Meanwhile, the Aircrew Classification Battery was discontinued in 1947 and reinstated in 1951 (Rogers, Roach, & Short, 1986).

The Aircrew Classification Battery represented a major step forward in terms of military classification tests. Considerable initial effort was made to identify tests that would be appropriate for particular types of jobs. Thus, for example, the initial battery “included four different types of mathematics tests believed to be especially important for the navigator; tests of dial and table reading also believed to be of primary importance in selecting navigators; and three tests involving speed of recognition of forms which were considered to be especially important to pilots and the bombardier” as well as various other types of tests, including five “apparatus” (psychomotor) tests (Flanagan, 1948, p. 64). The exploration of the use of psychomotor tests, particularly for pilots, began in 1941 in response to a concern that failures in flight training were sometimes related to “‘poor coordination’ and other categories of presumed psychomotor deficiency . . .” (Melton, 1947, p. 1). Psychomotor testing as part of the Classification Battery was discontinued in 1955 (Carretta & Ree, 1993), “largely due to the unreliability of the electromechanical apparatus” (Hunter, 1989, p. 146).
Test (ACT), another test of general cognitive ability Candidate Selection Method (PCSM) score that is
that also contained elements specific to aviation used in pilot selection (Reimer, 2006).
tasks, replaced the Wonderlic. The ACT, like the The Air Force’s Ernest Tupes and Raymond
Wonderlic, “was found to predict academic failures Christal were pioneers in identifying what has
(ground-school training) fairly well, but to be of no become a commonly accepted structure of personal-
value in predicting flight-training failures” (Ames & ity dimensions. The structure, known as “the Big
Older, 1948, p. 533). The FAR and the ACT were Five,” consists of five dimensions: agreeableness,
combined to constitute the Navy’s aviation selection conscientiousness, extroversion, neuroticism, and
battery (Brown, 1989). openness (1961). Between 1993 and 2004, Christal
postwar aviation testing: air force and navy
For both the Air Force and the Navy, these early efforts formed the basis for later testing. The Air Force tests helped pave the way for the development of the Air Force Officer Qualifying Test (AFOQT) in 1951. The AFOQT contained many of the same kinds of content areas prevalent in the earliest Air Force tests, including current affairs, mathematics, reading comprehension, and biographical information. A total of 16 tests also included such areas as general science, aerial orientation, and visualization of maneuvers (Valentine & Creager, 1961). This four-hour comprehensive battery has been in continuous use since then, while undergoing several revisions (Waters, 1997). It yields five composite scores: Pilot, Navigator-Technical, Academic Aptitude, Verbal, and Quantitative. It has been used for both selection and classification (Waters, 1997). Across a multitude of investigations, validities generally in the range of .20 to .40 have been reported (Brown, 1989).

Interest in psychomotor testing and other “apparatus-based” testing for Air Force pilot selection continued even after such testing was dropped from the Aircrew Classification Battery in 1955. Many studies conducted in the 1970s showed promising results, and in 1981, a “project to develop and validate a computer-based test system known as the Basic Attributes Test (BAT)” (Carretta & Ree, 1993, p. 190) was launched. The BAT measured not only psychomotor abilities, but “cognitive abilities, personality, and attitudes toward risk” (Carretta & Ree, 1993, p. 192) as well. The BAT was implemented for pilot selection in 1993 and has been shown to be a valid predictor of pilot training success (Carretta, Zelenski, & Ree, 2000). It was replaced in 2007 by the Test of Basic Aviation Skills (TBAS), described as a measure of “psychomotor skills proven to be correlated to the completion of Specialized Undergraduate Pilot training, including hand-eye coordination and listening response” (Reimer, 2006, para. 6). The TBAS is combined with the AFOQT and flying hours to produce a Pilot Candidate Selection Method (PCSM) score that is used in pilot selection (Reimer, 2006).

The Air Force’s Ernest Tupes and Raymond Christal were pioneers in identifying what has become a commonly accepted structure of personality dimensions. The structure, known as “the Big Five,” consists of five dimensions: agreeableness, conscientiousness, extroversion, neuroticism, and openness (Tupes & Christal, 1961). Between 1993 and 2004, Christal led an effort to develop an instrument based on the Big Five. The result was the Self Description Inventory Plus, which added two dimensions to the Big Five: Service Orientation and Team Orientation. The Self Description Inventory Plus became part of the AFOQT in 2005, although it was not at that time used for operational selection or classification (Weissmuller & Schwartz, 2007).

In 1953, the Navy introduced the Aviation Selection Test Battery (ASTB), which added a spatial test but otherwise maintained the same general content categories as in the FAR and ACT. The Aviation Classification Test was renamed the Aviation Qualification Test, and new forms were developed at this time. Further revisions were made in 1971, and a new test for non-aviation candidates was added (Brown, 1989). The ASTB remains in use today for Navy, Marine Corps, and Coast Guard aviation programs, and includes the following components: math skills, reading skills, mechanical comprehension, spatial apperception, aviation and nautical information, and an aviation supplemental test (Naval Aerospace Medical Institute, 2010b).

The Biographical Inventory, which was included among the tests adopted in World War II, was also initially included in the ASTB (Frank & Baisden, 1993). A factor analysis by Stricker (2005) identified five factors: (1) commissioned officer, (2) science and engineering interests, (3) flight experience, (4) masculine activities, and (5) school athletics. As Stricker wrote in 1993, when a revised version of the measure was still in use: “This device has consistently been one of the most valid predictors of retention vs. attrition in the battery, overshadowing tests of general ability, mechanical comprehension, spatial ability, and aviation information” (p. 7). However, the official website for the ASTB now explains: “Although the [Biographical Inventory] was initially a powerful predictor of attrition, its ability to predict which students will complete aviation training has essentially declined to zero over a period of years and thus, was suspended” (Naval Aerospace Medical Institute, 2010a, “What is the Biographical Inventory,” para. 1).
postwar aviation testing: army
The Army continued to develop aviation selection tests, following the creation of the Air Force as a separate branch, to meet its remaining needs for rotary-wing and fixed-wing (airplane, jet) pilots. The Army, unlike the other services, had training programs for both commissioned officer and warrant officer pilots. Warrant officer pilot selection, typically associated with helicopter assignments, created special challenges, because candidates could qualify with only a high school education. As it became clear that leader training for these pilots was needed, and that the “double hurdle of leader-pilot prerequisites was one many applicants could not negotiate” (Drucker & Kaplan, 1966, p. 30), the need to develop a new selection battery became evident.

After developing a number of interim batteries relying heavily on Air Force and Navy tests, the Army implemented the Flight Aptitude Selection Test (FAST) in 1966. The test had two forms—one for commissioned officers for fixed-wing training, and one for warrant officers for rotary-wing training. Four content areas were identified: “(1) biographical data and interest information, (2) spatial ability, (3) mechanical ability, and (4) aviation information” (Brown, Dohme, & Sanders, 1982, p. 1174). Validities for predicting flight grades were obtained in the range of .38 to .44. For a variety of job-related and test-related considerations, a shorter Revised FAST (RFAST) was developed and implemented in 1980. It also predicted training success well, with a validity of .33 (Brown, Dohme, & Sanders, 1982). This instrument was later replaced by the Alternate Flight Aptitude Selection Test (AFAST).

In 2004, the Army began work on the development of a new Selection Instrument for Flight Training (SIFT). A prototype battery was developed, which included “measures of task prioritization, perceptual speed and accuracy, motivation to become an Army aviator, and several personality traits” (Bruskiewicz et al., 2007, p. v).

Pre-commissioning Screening
The most common type of pre-commissioning screening program is based on a “whole person” concept, in which different types of indicators are combined to gauge the probability of the candidate’s success. An example is the formula used in West Point selection, where measures of academic potential (weighted 60%), leadership potential (30%), and physical proficiency (10%) are combined for screening purposes. This formula has been in use since 1958. Validity studies have demonstrated the utility of this composite for predicting academic performance at the United States Military Academy (USMA) (Brown, 1987; Davidson, 1977). These whole-person programs have historically incorporated data from such available sources as the Scholastic Aptitude Test (SAT), the American College Test (ACT), and high school rank. In fact, Arabian and Shelby (2000) reported that all service ROTC programs and all service academies made use of such data in their screening programs.
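The composite itself is simply a fixed-weight linear combination of the component scores. A minimal sketch of the arithmetic, assuming each component has already been rescaled to a common 0–100 metric (the rescaling and the candidate values below are illustrative, not West Point’s actual procedure):

```python
# Illustrative whole-person composite using the published 60/30/10 weights.
# Component scores are assumed pre-standardized to a common 0-100 scale,
# and the candidate's data are hypothetical.
WEIGHTS = {"academic": 0.60, "leadership": 0.30, "physical": 0.10}

def whole_person_score(components: dict[str, float]) -> float:
    """Weighted sum: academic potential 60%, leadership potential 30%,
    physical proficiency 10%."""
    return sum(WEIGHTS[name] * score for name, score in components.items())

candidate = {"academic": 88.0, "leadership": 74.0, "physical": 91.0}
print(round(whole_person_score(candidate), 1))  # 0.6*88 + 0.3*74 + 0.1*91 = 84.1
```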
While the services have derived much benefit from the use of scores from tests and measures obtained from outside sources, of more interest for this chapter are efforts to develop measures specifically for the purpose of officer selection. In the course of discussing the Air Force, Navy, and Marine Corps’ aviation screening programs, this chapter has also touched on a number of such developments that impact pre-commissioning screening. These include the development of the Navy’s ASTB and the Air Force’s AFOQT (Arabian & Shelby, 2000).

This review has not yet addressed developments in the Army’s history of pre-commissioning screening, which is separate and distinct from its aviation screening process. Thus, the rest of this section will be devoted to such developments. Separate research activities were associated with each of the three primary pre-commissioning programs: USMA, the Army Reserve Officers Training Corps (ROTC), and Officer Candidate School (OCS). As noted earlier, the Army Alpha test developed in World War I was administered to officers as well as enlisted soldiers (Zeidner & Drucker, 1988). Otherwise, there was little systematic screening of officer candidates prior to World War II. In response to a congressional mandate in 1812, West Point administered a fairly basic examination in reading, writing, and arithmetic until 1902, when high school graduation replaced the test as a requirement for entry (Ambrose, 1966).

Significant research on Army officer selection began with the Second World War. At USMA, the initial concern was predicting academic success (Ambrose, 1966). The Personnel Research Branch of the Adjutant General’s Office, later known as the Army Research Institute for the Behavioral and Social Sciences (ARI), developed a test with language and math components during the war. After the war, attention turned to the more elusive concept of leadership.
A West Point Biographical Inventory, composed of measures of personal history, personality, and background, was developed in 1947 (Brogden & Burke, 1950). Its success in predicting important outcomes was limited, a shortcoming that was judged to be related to a tendency for respondents to respond in a way that would maximize their scores, rather than reflect their true characteristics. Developers then turned to a forced-choice approach, designed to counteract this tendency to “fake,” and more positive linkages to ratings of leadership were found, with correlations ranging from .27 to .29 (Brogden, Burke, & Frankfeldt, 1952).
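The logic of the forced-choice format can be shown with a toy example. Rather than rating single statements, which invites uniformly favorable answers, the respondent must choose between statements matched on social desirability but keyed to different scales, so that “faking good” has no obvious path. The items and scoring below are invented for illustration and are not drawn from the West Point inventory:

```python
# Toy forced-choice item pool: each item pairs two roughly equally desirable
# statements, each keyed to a different scale. Choosing a statement adds one
# point to the scale it is keyed to.
ITEMS = [
    (("I set the pace when my group is under pressure", "leadership"),
     ("I double-check details that others overlook", "conscientiousness")),
    (("Others look to me for direction", "leadership"),
     ("I finish whatever I start", "conscientiousness")),
]

def score(choices: list[int]) -> dict[str, int]:
    """choices[i] is 0 or 1, indicating which statement was picked on item i."""
    totals: dict[str, int] = {}
    for item, pick in zip(ITEMS, choices):
        _, scale = item[pick]
        totals[scale] = totals.get(scale, 0) + 1
    return totals

print(score([0, 1]))  # {'leadership': 1, 'conscientiousness': 1}
```

Operational military measures score such items in far more sophisticated ways (e.g., the IRT-based pairwise-preference model of Stark, Chernyshenko, & Drasgow, 2005), but the rationale for resisting faking is the same.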
Also in the period following the war, a screening measure was developed for entrance into the Army ROTC Advanced Course. The ROTC Qualifying Examination, consisting of quantitative and verbal tests, was found to be a good predictor of academic grades. Similarly, the Officer Candidate Test, testing arithmetic reasoning, reading comprehension, and data interpretation, was developed for Army OCS selection in 1942 (Parrish & Drucker, 1957).

There have been three major developments in the history of Army officer selection research since World War II. One was the Officer Prediction program, stimulated by a perception in the mid-1950s that ROTC selection procedures were deficient in their assessment of leader potential, particularly combat leadership potential. A wide variety of cognitive, physical, and non-cognitive measures were developed for administration to officers who participated in an assessment center consisting of integrated military exercises (e.g., inspecting vehicles, directing evacuation of an office) administered over a three-day period in an escalating hostilities simulation (Helme, Willemin, & Day, 1971; Helme, Willemin, & Grafton, 1974).

The outcome of the Officer Prediction program was to identify measures that differentially predicted performance in three different types of scenarios: technical, administrative, and combat. Since the predictors for technical and administrative tasks were comparable, these were combined. On the cognitive side, tests of knowledge of tactics and practical skills were good predictors of combat leadership performance, while measures of knowledge of history, politics and culture, and math and physical science were good predictors of technical-managerial leadership. A number of measures of non-cognitive dimensions, including endurance and physical leader, predicted combat leadership, and measures of such non-cognitive constructs as verbal/social leader and scientific interest predicted technical-managerial leadership (Helme, Willemin, & Grafton, 1974). The results from this research, conducted in the late 1960s and early 1970s, led to the development of the Cadet Evaluation Battery in 1972 (Rumsey & Mohr, 1978). The technical-managerial cognitive subtest was used for selection into the ROTC Advanced Course beginning in 1978. This test, under a different name, was also used for selection into OCS beginning in 1979.

The second major historical event in Army officer selection research was stimulated by a recommendation by a 1977 Army study group for a more “performance-based” approach to Army pre-commissioning assessment (Department of the Army, 1978). Following this recommendation, the Army Research Institute (ARI) built three types of measures based on a job analysis conducted to identify critical officer performance dimensions. Situational exercises based on standard platoon-leader types of tasks were generated for an ROTC assessment center. A structured interview was a second assessment tool (Rogers et al., 1982); and a paper-and-pencil test assessing a variety of cognitive skills, the Officer Selection Battery (OSB), was a third (Fischl et al., 1986). The OSB was found to predict ratings of performance and potential by ROTC instructors with validities of .21 to .29, and final grade in post-commissioning training at an average level of .52 (Fischl et al., 1986). The OSB was incorporated into the ROTC selection system as part of a “whole person assessment,” although the Scholastic Aptitude Test and American College Test eventually replaced it for that purpose.

The third major development is ongoing. It involves renewed interest in and application of non-cognitive measures in selection into precommissioning training programs. USMA has been exploring the potential predictive value of two personality dimensions in particular, hardiness and grit. Hardiness “refers to a specific set of attitudes and skills” that lead to “resilience and growth in stressful situations” (Maddi et al., 2010). In a project known as the Baseline Officer Longitudinal Data Set, or BOLDS, hardiness, a social judgment measure, and cognitive measures such as the SAT and ACT were all found to relate to performance of USMA cadets (Bartone, Snook, & Tremble, 2002). More recent investigations of the relationship between hardiness and performance of USMA cadets have also shown positive results (e.g., Bartone et al., 2009; Maddi et al., 2010).

Grit “entails working strenuously toward challenges . . . over years despite failure, adversity, and plateaus in progress” (Duckworth et al., 2007, pp. 1087–1088).
Grit has been associated with completion of a USMA summer training program (Duckworth et al., 2007).

ARI has been exploring the link between “rational” biodata measures and performance in precommissioning programs. An item in a rational biodata inventory is developed to represent a particularly promising construct, rather than being selected on an arbitrary basis or because of a random observed relationship with some outcome. Building on some promising findings using this approach (e.g., Kilcullen et al., 1999), the Army has investigated its potential use in predicting success in precommissioning training. Measures incorporating rational biodata scales have been found to relate to attrition (Putka, 2009) and performance (Putka et al., 2011) in ROTC and to performance in OCS (Allen et al., 2011). The measure that performed so well in the ROTC context has now been implemented as part of the process of selecting recipients of four-year ROTC scholarships (Putka et al., 2011).

Conclusions and Future Directions
As noted earlier, enlisted and officer selection are very different worlds. The services can now claim a sophisticated, state-of-the-art selection and classification system for enlisted service members. Their tests are highly reliable, cover a wide range of content, and have been shown to be highly predictive of performance both in training and on the job. The computer adaptive delivery system is highly efficient and psychometrically sound. Tests are updated periodically to counteract obsolescence and compromise. Person–job match is achieved through a sophisticated system involving empirically based matching of groups of tests with groups of jobs.
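The mechanics of that matching can be sketched in simplified form: subtests are grouped into aptitude-area composites, each group of jobs is keyed to one composite with a minimum qualifying score, and an applicant qualifies for any job family whose composite score meets the cutoff. The composite definitions and cutoffs below are invented for illustration, not the operational ones:

```python
# Toy aptitude-area classification: composites are sums of subtest scores,
# and each job family qualifies applicants on one composite. All subtest,
# composite, and family names and cutoffs here are hypothetical.
COMPOSITES = {
    "mechanical": ("auto_shop", "mech_comp"),
    "electronics": ("math", "science"),
}
JOB_FAMILIES = {
    "vehicle_maintenance": ("mechanical", 100),
    "radio_repair": ("electronics", 105),
}

def qualified_families(scores: dict[str, int]) -> list[str]:
    """Return the job families whose keyed composite meets its cutoff."""
    out = []
    for family, (composite, cutoff) in JOB_FAMILIES.items():
        total = sum(scores[subtest] for subtest in COMPOSITES[composite])
        if total >= cutoff:
            out.append(family)
    return out

applicant = {"auto_shop": 55, "mech_comp": 50, "math": 48, "science": 52}
print(qualified_families(applicant))  # ['vehicle_maintenance']
```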
Yet the services are not likely to stand on the status quo. The demands on the testing system are great, and testing technology and policy must advance to meet the challenges ahead. Although the tests are soundly constructed, much of the work on which the selection of tested constructs is based was accomplished many decades ago. The services have conducted much research to explore new content areas. Some of this work, particularly in the non-cognitive realm, is now beginning to affect the accessioning process.

Some might argue that existing tests have proven to be so valid for the prediction of performance that further development is unnecessary. However, the strength of this argument depends partly on the comprehensiveness and representativeness of the performance criteria that have been used in the validation analyses. Project A demonstrated that the validity of the ASVAB was much greater for technical than for motivational criteria. This project incorporated, so far as is known, the most comprehensive set of criteria ever used in personnel research, but the pursuit of improved performance measures has not ended. The hands-on measures used in Project A represented discrete job tasks, but could not be supposed to represent the full complexity of performance, with the flow of events, interrelationships among activities, and unexpected interruptions characteristic of a realistic job environment. Other measures, such as rating scales, supplemented the hands-on measures, but these have limitations as well. Currently, research is being conducted on more complex simulations in an attempt to represent job performance more fully.

Although the joint service project to link enlistment standards to job performance demonstrated that the services can conduct validation research using job performance measures, there have been very few attempts to do so before or since. Generally, the effort is viewed as prohibitively expensive. Suitable performance measures are not available operationally, so they have to be developed in order for the validation effort to be conducted. The process of determining what should be measured, and then developing measures that sufficiently encompass critical requirements for a particular job, is a major endeavor. The more jobs that need to be covered, the greater the effort.

This criterion problem is a difficult, but hopefully not insurmountable, one. To the extent that the military’s job analysis techniques can be designed to efficiently distinguish the elements that are critical for selection and classification from those that are not, the effort of developing performance measures can be reduced. Insofar as similar requirements across different jobs can be recognized as interchangeable for selection and classification purposes, the necessity of developing separate measures for each job can be alleviated.

Officer Selection
Although there have been many important developments in officer selection research, officer selection has typically not received the same level of attention as enlisted selection. The requirement that officers have a college degree has greatly reduced the need for additional screening for them. However, considerable expense can be saved by screening out from costly pre-commissioning training programs those who have neither the inclination nor the ability to be effective military officers.
Non-cognitive measures can be useful in identifying those who have the intangible qualities associated with effective leadership. Historically, there has been considerable investment in the development of non-cognitive officer selection measures, and recently there has been a revival of this approach.

The challenges of improving current methods of describing jobs and measuring performance are not unique to enlisted selection. To some extent, because there is less diversity and more generality in officer jobs, the challenges are somewhat less than on the enlisted side. However, officer jobs tend to entail more complexity, and in that respect are more difficult to define and present more performance measurement difficulties than enlisted jobs. Thus, efforts to develop new methods of job description and performance measurement for military applications will need to consider the unique characteristics of job requirements for both the enlisted and officer populations.

Final Words
The military today has a well-deserved reputation for the quality of its service members. Quality may be viewed as the product of potential as measured at entry, training, and experience. The “potential” component of this equation is not determined entirely by the selection tools employed. Unless a significant pool of applicants is available, the utility of any screening system will be limited. However, having an ample applicant pool does not guarantee that those chosen will meet the services’ needs. Some means of separating those with high potential from the remainder is also essential. The services have devoted substantial effort and resources over many decades to develop the best tools for that purpose. In an era of multiple threats and mounting personnel costs, the payoff for improved screening and assignment procedures will only get greater. This speaks to the need to identify and overcome those obstacles to an optimal selection and classification system that still remain.

Acknowledgment
The author wishes to express his great appreciation to Dorothy Young for her help in locating many of the articles referenced in this chapter.

References
Allen, M. T., Babin, N. E., Oliver, J. T., & Russell, T. L. (2011). Predicting leadership performance and potential in U.S. Army Officer Candidate School (OCS). In M. G. Rumsey (Chair), Predicting leader performance: Insights from Army officer research. Symposium conducted at the annual meeting of the American Psychological Association, Washington, DC.
Ambrose, S. E. (1966). Duty, honor, country: A history of West Point. Baltimore: The Johns Hopkins Press.
Ames, V. C., & Older, H. J. (1948). Chapter II: Aviation psychology in the United States Navy. Review of Educational Research, 18, 532–542.
Arabian, J. M., & Shelby, J. A. (2000). Policies, procedures, and people: The initial selection of U.S. military officers. In Officer selection (pp. 1-1 to 1-7). Cedex, France: Research and Technology Organization, NATO.
Bartone, P. T., Eid, J., Johnsen, B. H., Laberg, J. C., & Snook, S. A. (2009). Big five personality factors, hardiness, and social judgment as predictors of leader performance. Leadership and Organization Development Journal, 30(6), 498–521.
Bartone, P. T., Snook, S. A., & Tremble, T. R. (2002). Cognitive and personality predictors of leader performance in West Point cadets. Military Psychology, 14, 321–338.
Borman, W. C., Schneider, R. J., Houston, J. S., & Bearden, R. M. (2009). The Navy Computerized Adaptive Personality Scales: Evidence for validity (abstract, briefing slides). Paper presented at the 51st annual meeting of the International Military Testing Association, Tartu, Estonia.
Brogden, H. E. (1959). Efficiency of classification as a function of number of jobs, percent rejected, and the validity and intercorrelation of job performance estimates. Educational and Psychological Measurement, 19, 181–190.
Brogden, H. E., & Burke, L. (1950). Validation of the West Point Biographical Inventory, WPB-1, against first-year Aptitude for Service ratings (Rep. No. 829). Washington, DC: Personnel Research Section, Personnel Research and Procedures Branch, Adjutant General’s Office (Army).
Brogden, H. E., Burke, L. K., & Frankfeldt, E. (1952). Validation of the West Point Personal Inventory (Rep. No. 882). Washington, DC: Personnel Research Section, Personnel Research and Procedures Branch, Adjutant General’s Office (Army).
Brown, D. C. (1987). Military officers: Commissioning sources and selection criteria (Final Rep. No. 87-42). Alexandria, VA: Human Resources Research Organization.
Brown, D. C. (1989). Officer aptitude selection measures. In M. F. Wiskoff & G. M. Rampton (Eds.), Military personnel measurement: Testing, assignment, evaluation (pp. 97–127). New York: Praeger.
Brown, W. R., Dohme, J. A., & Sanders, M. G. (1982). Changes in the U.S. Army aviator selection and training program. Aviation, Space, and Environmental Medicine, 53, 1173–1176.
Bruskiewicz, K. T., Katz, L. C., Houston, J., Paulin, C., O’Shea, G., & Damos, D. (2007). Predictor development and pilot testing of a prototype selection instrument for Army flight training (Tech. Rep. No. 1195). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Campbell, J. P., & Knapp, D. J. (Eds.) (2001). Exploring the limits in personnel selection and classification. Mahwah, NJ: Erlbaum.
Carey, N. B. (1992). Does choice of a criterion matter? Military Psychology, 4, 103–117.
Carretta, T. R., & Ree, M. J. (1993). Basic Attributes Test (BAT): Psychometric equating of a computer-based test. The International Journal of Aviation Psychology, 3, 189–201.
Carretta, T. R., Zelenski, W. E., & Ree, M. J. (2000). Basic Attributes Test (BAT) retest performance. Military Psychology, 12, 221–232.
Christal, R. E. (1969). Comments by the chairman. In R. E. Christal (Chair), Division of Military Psychology Symposium: Collecting, analyzing, and reporting information describing jobs and occupations. Proceedings, 77th Annual Convention of the American Psychological Association, 77–85.
Davidson, T. G. (1977). CEER/ACEER as a predictor of academic grade point average (Rep. No. 77-014). West Point, NY: Office of the Director of Institutional Research, United States Military Academy.
Department of the Army (1978). A review of education and training for officers (RETO). Washington, DC: Department of the Army.
Department of Defense (1985). Defense manpower quality, Vol. I. Washington, DC: Office of the Assistant Secretary of Defense (Manpower, Installations, and Logistics).
Drasgow, F., Embretson, S. E., Kyllonen, P. C., & Schmitt, N. (2006). Technical review of the Armed Services Vocational Aptitude Battery (Final Rep. No. 06-25). Alexandria, VA: Human Resources Research Organization.
Drasgow, F., Stark, S., & Chernyshenko, S. (2007, August). Developing TAPAS (Tailored Adaptive Personality Assessment System). Presentation to the Military Accession Policy Working Group, Monterey, CA.
Drucker, A. J., & Kaplan, H. (1966). Identifying successful pilots through research. U.S. Army Aviation Digest, 12, 29–32.
Duckworth, A. L., Peterson, C., Matthews, M. D., & Kelly, D. R. (2007). Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087–1101.
Farmer, W. L., Bearden, R. M., Eller, E. D., et al. (2003). JOIN: Job and Occupational Interest in the Navy. Proceedings, 45th Annual Conference of the Military Testing Association, 62–69.
Faulkner, R. N., & Haggerty, H. R. (1947). Personnel research and development in the Bureau of Naval Personnel: History and scope of the program. In D. B. Stuit (Ed.), Personnel research and test development in the Bureau of Naval Personnel (pp. 3–11). Princeton: Princeton University Press.
Fischl, M. A., Edwards, D. S., Claudy, J. G., & Rumsey, M. G. (1986). Development of Officer Selection Battery Forms 3 and 4 (Tech. Rep. No. 603). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Fiske, D. W. (1947). Validation of naval aviation cadet selection tests against training criteria. Journal of Applied Psychology, 31, 601–614.
Flanagan, J. C. (1948). The Aviation Psychology Program in the Army Air Forces (Rep. No. 1). Washington, DC: U.S. Government Printing Office.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327–358.
Frank, L. H., & Baisden, A. G. (1993). The 1992 Navy and Marine Corps Aviation Selection Test Battery development. Proceedings of the 35th Annual Conference of the Military Testing Association, 35, 14–19.
Furer, J. A. (1959). Administration of the Navy Department in World War II. Washington, DC: U.S. Government Printing Office.
Guilford, J. P., & Lacey, J. I. (Eds.) (1947). Printed classification tests: Report No. 5. Washington, DC: Government Printing Office.
Guinn, N., Johnson, A. L., & Kantor, J. E. (1975). Screening for adaptability to military service (Tech. Rep. No. 75-30). Lackland Air Force Base, TX: Personnel Research Division, Air Force Human Resources Laboratory.
Hedge, J. W., & Teachout, M. S. (1992). An interview approach to work sample criterion measurement. Journal of Applied Psychology, 4, 453–461.
Heffner, T. S., & White, L. A. (2009, September). Expanded Enlistment Eligibility Metrics (abstract). Paper presented at the U.S. Army Accessions Command Research Consortium, Hampton, VA.
Helme, W. H., Willemin, L. P., & Day, R. W. (1971). Psychological factors measured in the Differential Officer Battery (Tech. Research Rep. No. 1173). Arlington, VA: U.S. Army Behavior and Systems Research Laboratory.
Helme, W. H., Willemin, L. P., & Grafton, F. C. (1974). Prediction of officer behavior in a simulated combat situation (Research Rep. No. 1182). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Hunter, D. R. (1989). Aviator selection. In M. F. Wiskoff & G. M. Rampton (Eds.), Military personnel measurement: Testing, assignment, evaluation (pp. 129–167). New York: Praeger.
Kilcullen, R. N., Mael, F. A., Goodwin, G. F., & Zazanis, M. M. (1999). Predicting U.S. Army Special Forces field performance. Human Performance in Extreme Environments, 4(1), 53–63.
Knapp, D. (2006). The U.S. Joint-Service Job Performance Measurement Project. In W. Bennett, C. E. Lance, & D. J. Woehr (Eds.), Performance measurement: Current perspectives and future challenges (pp. 113–140). Mahwah, NJ: Erlbaum.
Knapp, D. J., & Tremble, T. R. (Eds.) (2007). Concurrent validation of experimental Army enlisted personnel selection and classification measures (Tech. Rep. No. 1205). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Kroeker, L. P. (1989). Personnel classification/assignment models. In M. F. Wiskoff & G. M. Rampton (Eds.), Military personnel measurement: Testing, assignment, evaluation (pp. 41–73). New York, NY: Praeger.
Kyllonen, P. C. (1994). Cognitive abilities testing: An agenda for the 1990s. In M. G. Rumsey, C. B. Walker, & J. H. Harris (Eds.), Personnel selection and classification (pp. 103–125). Hillsdale, NJ: Erlbaum.
Laabs, G. L., & Baker, H. G. (1989). Selection of critical tasks for Navy job performance measures. Military Psychology, 1, 3–16.
Laurence, J. H., & Ramsberger, P. F. (1991). Low aptitude men in the military: Who profits, who pays? New York, NY: Praeger.
Lightfoot, M. A., Ramsberger, P. F., & Greenston, P. M. (2000). Matching recruits to jobs: Enlisted Personnel Allocation System (Special Rep. No. 41). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Looper, L. T. (1997). Changing technology and military missions: Impact on U.S. military personnel systems. Proceedings, 39th Annual Conference of the International Military Testing Association, 270–275.
Maddi, S. R., Matthews, M. D., Kelly, D. R., Resurreccion, N., & Villarreal, B. J. (2010). Relationship between hardiness and performance in challenging environments. Paper presented at the annual meeting of the American Psychological Association.
Maier, M. H. (1993). Military aptitude testing: The past fifty years (Tech. Rep. No. 93-007). Monterey, CA: Personnel Testing Division, Defense Manpower Data Center.
Maier, M. H., & Fuchs, E. F. (1969). Development of improved Aptitude Area composites for enlisted classification (Tech. Research Rep. No. 1159). Arlington, VA: U.S. Army Behavioral Science Research Laboratory.
Maier, M. H., & Fuchs, E. F. (1972). An improved differential Army classification system (Tech. Research Rep. No. 1177). Arlington, VA: Behavior and Systems Research Laboratory.
Martin, C. J., & Hoshaw, C. R. (1997). Policy and program management perspectives. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 11–20). Washington, DC: American Psychological Association.
Matthews, W. T. (1977). Marine Corps enlisted attrition (CRC No. 341). Arlington, VA: Center for Naval Analyses.
McBride, J. R. (1997). Technical perspective. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 29–44). Washington, DC: American Psychological Association.
Melton, A. W. (1947). Apparatus tests (Rep. No. 4). Washington, DC: U.S. Government Printing Office.
Mitchell, J. L., & Driskell, W. E. (1996). Military job analysis: A historical perspective. Military Psychology, 8, 119–142.
Naval Aerospace Medical Institute (2010a, Jan 22). ASTB frequently asked questions. Retrieved from http://www.med.navy.mil/sites/navmedmpte/nomi/nami/Pages/ASTBFrequentlyAskedQuestions.aspx
Naval Aerospace Medical Institute (2010b, Jan 22). ASTB information and sample questions. Retrieved from http://www.med.navy.mil/sites/navmedmpte/nomi/nami/Pages/ASTBOverview.aspx
Navy Personnel Research Studies and Technology (1998). Sailor 21: A research vision to attract, retain, and utilize the 21st century sailor. Millington, TN: Navy Personnel Research Studies and Technology.
North, R. A., & Griffin, G. R. (1977). Aviator selection 1919–1977 (Special Rep. No. 77-2). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Odell, C. E. (1947). Selection and classification of enlisted personnel. In D. B. Stuit (Ed.), Personnel research and test development in the Bureau of Naval Personnel (pp. 21–30). Princeton, NJ: Princeton University Press.
Olson, P. T. (1968). Use of Army school samples in estimating ACB test validity (Tech. Research Note No. 199). Washington, DC: U.S. Army Behavioral Science Research Laboratory.
Oppler, S. H., McCloy, R. A., Peterson, N. G., Russell, T. L., & Campbell, J. P. (2001). The prediction of multiple components of entry-level performance. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel selection and classification (pp. 349–388). Mahwah, NJ: Erlbaum.
Parrish, J. A., & Drucker, A. J. (1957). Personnel research for Officer Candidate School (Tech. Research Rep. No. 1107). Washington, DC: Personnel Research and Procedures Division, Personnel Research Branch, The Adjutant General’s Office (Army).
Personnel Testing Division, Defense Manpower Data Center (2008). ASVAB Technical Bulletin No. 3: CAT-ASVAB Forms 5-9. Retrieved from http://www.official-asvab.com/catasvab_res.htm
Putka, D. J. (Ed.) (2009). Initial development and validation of assessments for predicting disenrollment of four-year scholarship recipients from the Reserve Officer Training Corps (Study Rep. No. 2009-06). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Putka, D. J., Kilcullen, R., Legree, P., & Wasko, L. (2011). Identifying the leaders of tomorrow: Validating predictors of leader potential and performance. In M. G. Rumsey (Chair), Predicting leader performance: Insights from Army officer research. Symposium conducted at the annual meeting of the American Psychological Association, Washington, DC.
Reimer, K. (2006, July 1). AETC deploys new pilot screening test for FY07. Retrieved from http://www.aetc.af.mil/news/story.asp?id=123023176/
Rogers, D. L., Roach, B. W., & Short, L. O. (1986). Mental ability testing in the selection of Air Force officers: A brief historical overview (AFHRL-TP-86-23). Brooks Air Force Base, TX: Air Force Human Resources Laboratory, Air Force Systems Command.
Rogers, R. W., Lilley, L. W., Wellins, R. S., Fischl, M. A., & Burke, W. P. (1982). Development of the pre-commissioning Leadership Assessment Program (Tech. Rep. No. 560). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Rumsey, M. G., & Mohr, E. S. (1978). Male and female factors on the Cadet Evaluation Battery (Tech. Paper No. 331). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Russell, T. L., & Sellman, W. S. (2008). Review of information and communications technology literacy tests. Paper presented at the 23rd Annual Conference of the Society for Industrial and Organizational Psychology, San Francisco, CA.
Sands, W. A., & Waters, B. K. (1997). Introduction to ASVAB and CAT. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 3–9). Washington, DC: American Psychological Association.
Schmidt, F. L. (1994). The future of personnel selection in the U.S. Army. In M. G. Rumsey, C. B. Walker, & J. H. Harris (Eds.), Personnel selection and classification (pp. 103–125). Hillsdale, NJ: Erlbaum.
Sellman, S. W. (2007). Research and implementation plan: Addressing recommendations for enhancing ASVAB and DOD enlisted personnel and job classification system (FR-07-46). Alexandria, VA: Human Resources Research Organization.
Staff, Personnel Research Section, Classification and Replacement Branch, The Adjutant General’s Office (1945). The Army General Classification Test. Psychological Bulletin, 42, 760–768.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29, 184–203.
Stricker, L. J. (1993). The Navy’s Biographical Inventory: What accounts for its success? Proceedings, 35th Annual Conference of the Military Testing Association (pp. 7–12). Williamsburg, VA.
Stricker, L. J. (2005). The Biographical Inventory in naval aviation selection: Inside the black box. Military Psychology, 17, 55–67.
Thomas, P. J. (1970). A comparison between the Armed Services Vocational Aptitude Battery and the Navy Basic Test Battery in predicting Navy school performance (Tech. Bulletin No. STB 70-4). San Diego, CA: Navy Personnel and Training Research Laboratory.
Trent, T. (1993). The Armed Services Applicant Profile (ASAP). In T. Trent & J. H. Laurence (Eds.), Adaptability screening for the Armed Forces (pp. 71–99). Washington, DC: Department of Defense, Office of Assistant Secretary of Defense (Force Management and Personnel).
Trent, T., & Laurence, J. H. (1993). Preface. In T. Trent & J. H. Laurence (Eds.), Adaptability screening for the Armed Forces (pp. v–vii). Washington, DC: Department of Defense, Office of Assistant Secretary of Defense (Force Management and Personnel).
Trippe, M. D., & Russell, T. L. (2008). Issues in information and communication technology test development: A literature review and summary of best practices—Delivery Order 3: Development and validation of similar instruments report (TO-08-16). Alexandria, VA: Human Resources Research Organization.
Tupes, E. C., & Christal, R. E. (1961). Recurrent personality factors based on trait ratings (Tech. Rep. No. 61-97). Lackland Air Force Base, TX: Aeronautical Systems Division, Personnel Laboratory.
Underhill, C. M., Lords, A. O., & Bearden, R. M. (2006). Fake resistance of a forced-choice paired-comparison personality measure. Paper presented at the 48th annual meeting of the International Military Testing Association, Kingston, Canada.
Valentine, L. D., & Creager, J. A. (1961). Officer selection and classification tests: Their development and use (ASD-TN-61-145). Lackland Air Force Base, TX: Personnel Laboratory, Aeronautical Systems Division, Air Force Systems Command.
Waters, B. K. (1997). Army Alphas to CAT-ASVAB: Four-score years of military personnel selection and classification testing. In R. F. Dillon (Ed.), Handbook on testing (pp. 187–203). Westport, CT: Greenwood Press.
Waters, D. D., Russell, T. L., & Sellman, S. W. (2007). Review of non-verbal reasoning tests. Alexandria, VA: Human Resources Research Organization.
Weeks, J. L., Mullins, C. J., & Vitola, B. M. (1975). Airman classification batteries from 1948 to 1975: A review and evaluation (Tech. Paper No. 75-78). Lackland Air Force Base, TX: Air Force Human Resources Laboratory.
Weissmuller, J. J., & Schwartz, K. L. (2007). Self-Description Inventory Plus Initiative: Assault on Occam’s Razor. Presented at the annual meeting of the International Military Testing Association, Queensland, Australia.
Welsh, J. R., Kucinkas, S. K., & Curran, L. T. (1990). Armed Services Vocational Aptitude Battery (ASVAB): Integrative review of validity studies (AFHRL-TR-90-22). Brooks Air Force Base, TX: Air Force Human Resources Laboratory, Air Force Systems Command.
White, L. A., & Young, M. C. (1998). Development and validation of the Assessment of Individual Motivation. Paper presented at the annual meeting of the American Psychological Association, San Francisco.
Willemin, L. P., & Karcher, E. K. (1958). Development of combat aptitude areas (PRB Tech. Research Rep. No. 1110). Washington, DC: Personnel Research Branch, Personnel Research and Procedures Division, The Adjutant General’s Office (Army).
Wolfe, J. H. (Ed.). (1997). Enhanced Computer-Administered Test (ECAT) battery (special issue). Military Psychology, 9(1).
Young, M. C., Heggestad, E. D., Rumsey, M. G., & White, L. A. (2000). Army pre-implementation research findings on the Assessment of Individual Motivation. Paper presented at the annual meeting of the American Psychological Association, Washington, DC.
Young, M. C., & White, L. A. (2006). Preliminary operational findings from the Army’s Tier Two Attrition Screen (TTAS). Paper presented at the Army Science Conference, Orlando, FL.
Young, M. C., White, L. A., Heggestad, E. D., & Barnes, J. D. (2004). Operational validation of the Army’s new pre-enlistment attrition screening measure. Paper presented at the annual meeting of the American Psychological Association, Honolulu, HI.
Zeidner, J., & Drucker, A. J. (1988). Behavioral sciences in the Army: A corporate history of the Army Research Institute. Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Zeidner, J., Harper, B. P., & Karcher, E. K. (1956). Reconstitution of the Aptitude Areas (PRB Tech. Research Rep. No. 1095). Washington, DC: Adjutant General’s Office (Army).
Zeidner, J., & Johnson, C. D. (1994). Is personnel classification a concept whose time has passed? In M. G. Rumsey, C. B. Walker, & J. H. Harris (Eds.), Personnel selection and classification (pp. 377–410). Hillsdale, NJ: Erlbaum.