Unit 1 - Introduction To Psychological Tests

1. The document discusses the history and meaning of psychological tests. It notes that psychological tests have existed for thousands of years, dating back to civil service exams in ancient China. 2. It describes how Charles Darwin's theory of evolution and individual differences influenced Sir Francis Galton to study human traits and abilities. Galton initiated research into quantifying individual differences through sensory and motor tasks. 3. The text outlines key characteristics of psychological tests, including that they provide standardized, quantitative or qualitative measures of behavior through a limited sample, with scores interpreted against a reference group. Testing involves administering and scoring, while assessment is a more comprehensive process.

Uploaded by

Manroop Kaur

Government College for Girls, Ludhiana

B.A. III (Hons) Sem V Psychology


Unit 1: Introduction to Psychological Tests: History of Psychological Testing;
Nature of Psychological Tests; Classification of Tests; Use of Tests

MEANING OF PSYCHOLOGICAL TESTS:

According to the dictionary, a ‘test’ is a series of questions on the basis of which
some information is sought. In psychology, the meaning of test goes beyond this. A
psychological test is a standardized procedure to measure quantitatively or qualitatively one or
more aspects of a trait by means of a sample of verbal or nonverbal behavior. The
purpose of a psychological test is twofold. First, it allows the same individual to be compared on
two or more aspects of a trait, and second, it allows two or more persons to be
compared on the same trait. Such a measurement may be either quantitative or qualitative. In the
words of Bean (1953), a test is “an organized succession of stimuli designed to measure
quantitatively or to evaluate qualitatively some mental process, trait or characteristic”. Likewise,
Anastasi and Urbina (1997) have defined a psychological test as “essentially an objective and
standardized procedure for sampling behavior and describing it with scores or categories”.
Kaplan and Saccuzzo (2001) have opined, “A psychological test or educational test is a set of
items designed to measure characteristics of human beings that pertain to behavior”. These
definitions reveal some important characteristics of a psychological test.

First, a test is an organized succession of stimuli, which means that the stimuli (popularly
known as items) in the test are arranged in a certain sequence and are based upon some
principles of test construction. Usually, the items of a test are refined through item analysis,
and the procedure of administration is standardized to ensure maximum objectivity. In fact,
standardized procedures are essential for ensuring that testing remains uniform
across different examiners and situations. A lack of standardization may change not only
the character of the test but also its difficulty level, which may ultimately reduce the validity of the
test.

Second, both quantitative and qualitative measurements are possible through
psychological tests. The reading ability of a child may be measured with the help of a test
specially designed for the purpose, which yields a score (a quantitative measurement). That
score may then be evaluated (or qualitatively measured) against the average reading
performance of other children of the same age or class. Thus, a test provides both
quantitative and qualitative measurement of a trait.

Third, a psychological test is based upon a limited sample of behavior. This means that
no psychological test assesses the totality of a person’s behavior. Rather, it focuses on
a limited aspect of that behavior. For example, when testing the vocabulary of a person, the test
constructor must settle for a sample of 40 to 50 words and make predictions about that person’s word
knowledge from this limited sample. In reality, the totality of the person’s word knowledge
might be poorer or stronger than the 40 or 50-word vocabulary test suggests. Obviously, then, the
implication of the test-as-sample concept is that the test score invariably contains some degree of
error. Such measurement errors can be minimized by means of careful test design, but they can
never be fully eliminated.

Miss Umang Bharti, Lecturer, Department of Psychology

Fourth, psychological tests usually provide scores or categories which are subsequently
interpreted with reference to a standardization sample. The standardization sample should be
representative of the population for whom the test is meant so that it is possible to evaluate
each person’s test score or results in comparison to the reference group. For example, knowing
that a college student scored 120 on a test of abstract reasoning conveys little meaning. But if we
know that the average score for college students was 110 and that only 2 percent of these
students scored 120 or above, we have a definite basis for making a prediction from the test; that is, we can
say that the examinee has good prospects in college.
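
The norm-referenced interpretation described above can be sketched in a few lines of Python. This is only an illustration: the function name `percentile_rank` and the 50-score norm sample are hypothetical, constructed so that the sample mean is 110 and one score in fifty (2 percent) reaches 120, matching the example in the text.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Return the percentage of the norm sample scoring strictly below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores lower than `score`
    return 100.0 * below / len(ordered)


# Hypothetical standardization sample (assumption, not data from any real test):
# 50 college students with a mean of 110 and a single score of 120.
NORM_SAMPLE = (
    [100] * 6 + [105] * 12 + [110] * 13 + [115] * 12
    + [118] * 4 + [119] * 2 + [120] * 1
)

mean_score = sum(NORM_SAMPLE) / len(NORM_SAMPLE)
rank = percentile_rank(120, NORM_SAMPLE)
print(f"mean = {mean_score}, percentile rank of 120 = {rank}")
```

A raw score of 120 means little by itself; the percentile rank against the reference group is what gives it meaning.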

Some psychological tests are norm-referenced, which means that their results
are interpreted with reference to the standardization sample. Others are
criterion-referenced: they are used to determine where an examinee stands with
reference to a tightly defined criterion or educational objective. On such tests, the comparison is
made with an objective standard rather than with the performance of other examinees. For
example, the results of a criterion-referenced arithmetic test might state that a student performs
simple arithmetic operations such as addition, subtraction, multiplication, and division of
three-digit numbers with 80% accuracy, whereas the goal of the school system is 90%. Here, the performance of
other students is irrelevant; what is relevant is whether the student meets the accepted
criterion.
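
A criterion-referenced decision can be sketched in the same way. The function name `meets_criterion` and the item counts are hypothetical; only the 80% and 90% figures come from the example above. Note that no other examinee's score enters the calculation.

```python
def meets_criterion(correct, attempted, required_accuracy=0.90):
    """Compare one examinee's accuracy with a fixed standard.

    Returns (accuracy, passed). The 0.90 default mirrors the school
    system's goal in the text; the item counts used below are assumptions.
    """
    if attempted <= 0:
        raise ValueError("attempted must be a positive number of items")
    accuracy = correct / attempted
    return accuracy, accuracy >= required_accuracy


# Hypothetical student: 40 of 50 three-digit arithmetic items correct.
accuracy, passed = meets_criterion(correct=40, attempted=50)
print(f"accuracy = {accuracy:.0%}, meets 90% criterion: {passed}")
```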

Finally, to make the meaning of a psychological test clear, it is also essential to
distinguish between testing and assessment. Testing is a more limited venture
which primarily consists of administering a test and scoring and interpreting the results. Assessment,
on the other hand, is a more comprehensive and wider term that includes the entire process of
compiling and synthesizing the information to make a prediction about the person.

HISTORY OF PSYCHOLOGICAL TESTING:

Early Antecedents

The origins of testing are neither recent nor American. Evidence suggests that the
Chinese had a relatively sophisticated civil service testing program more than 4000 years ago.
Every third year in China, oral examinations were given to help determine work evaluations and
promotion decisions.

By the Han Dynasty (206 B.C.E. to 220 C.E.), the use of test batteries (two or more tests
used in conjunction) was quite common. These early tests related to such diverse topics as civil
law, military affairs, agriculture, revenue, and geography. Tests had become quite well
developed by the Ming Dynasty (1368-1644 C.E.). During this period, a national multistage
testing program involved local and regional testing centers equipped with special testing booths.
Those who did well on the tests at the local level went on to provincial capitals for more
extensive essay examinations. After this second testing, those with the highest test scores went
on to the nation’s capital for a final round. Only those who passed this third set of tests were
eligible for public office.

The Western world most likely learned about testing programs through the Chinese.
Reports by British missionaries and diplomats encouraged the English East India Company in
1832 to copy the Chinese system as a method of selecting employees for overseas duty. Because
testing programs worked well for the company, the British government adopted a similar system
of testing for its civil service in 1855. After the British endorsement of a civil service testing
system, the French and German governments followed suit. In 1883, the U.S. government
established the American Civil Service Commission, which developed and administered
competitive examinations for certain government jobs. The impetus of the testing movement in
the Western world grew rapidly at that time.

Charles Darwin and Individual Differences

Tests are specifically designed to measure individual differences in ability and
personality among people. No two people are exactly alike in ability and typical behavior.
Although human beings realized long ago that individuals differ, developing tools for measuring
such differences was no easy matter. To develop a measuring device, we must understand what
we want to measure. An important step toward understanding individual differences came with
the publication of Charles Darwin’s highly influential book, The Origin of Species, in 1859.
According to Darwin’s theory, higher forms of life evolved partially because of differences
among individual forms of life within a species. Given that individual members of a species
differ, some possess characteristics that are more adaptive or successful in a given environment
than are those of other members. Darwin also believed that those with the best or most adaptive
characteristics survive at the expense of those who are less fit and that the survivors pass their
characteristics on to the next generation. Through this process, he argued, life has evolved to its
presently complex and intelligent levels.

Sir Francis Galton, a relative of Darwin, soon began applying Darwin’s theories to the
study of human beings. Given the concepts of survival of the fittest and individual differences,
Galton set out to show that some people possessed characteristics that made them more fit than
others, a theory he articulated in his book, Hereditary Genius, published in 1869. Galton (1883)
subsequently began a series of experimental studies to document the validity of his position. He
concentrated on demonstrating that individual differences exist in human sensory and motor
functioning, such as reaction time, visual acuity and physical strength. In doing so, Galton
initiated a search for knowledge concerning human individual differences, which is now one of
the most important domains of scientific psychology.
Galton’s work was extended by the U.S. psychologist James McKeen Cattell, who coined
the term mental test. Cattell’s doctoral dissertation was based on Galton’s work on individual
differences in reaction time. As such, Cattell perpetuated and stimulated the forces that
ultimately led to the development of modern tests.

Experimental Psychology and Psychophysical Measurement

A second major foundation of testing can be found in experimental psychology and early
attempts to unlock the mysteries of human consciousness through the scientific method. Before
psychology was practiced as a science, mathematical models of the mind were developed, in
particular those of J.E. Herbart. Herbart eventually used these models as the basis for educational
theories that strongly influenced 19th-century educational practices. Following Herbart, E.H.
Weber attempted to demonstrate the existence of a psychological threshold, the minimum
stimulus necessary to activate a sensory system. Then, following Weber, G.T. Fechner devised
the law that the strength of a sensation grows as the logarithm of the stimulus intensity.
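
Fechner's law as stated above can be written compactly. The symbols are introduced here only for illustration and are not from the original notes: S is the strength of the sensation, I the stimulus intensity, I_0 the threshold intensity (the minimum stimulus in Weber's sense), and k a constant that depends on the sensory system.

```latex
S = k \log\left(\frac{I}{I_0}\right), \qquad I \ge I_0
```

Under this form, a stimulus exactly at threshold gives S = 0, and each doubling of intensity adds the same fixed amount, k log 2, to the sensation.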

Wilhelm Wundt, who set up a laboratory at the University of Leipzig in 1879, is credited
with founding the science of psychology, following in the tradition of Weber and Fechner.
Wundt was succeeded by E.B. Titchener, whose student, G. Whipple, recruited L.L. Thurstone.
Whipple provided the basis for immense changes in the field of testing by conducting a seminar
at the Carnegie Institute in 1919 attended by Thurstone, E. Strong, and other early prominent
U.S. psychologists. From this seminar came the Carnegie Interest Inventory and later the Strong
Vocational Interest Blank.

Thus, psychological testing developed from at least two lines of inquiry: one based on the
work of Darwin, Galton, and Cattell on the measurement of individual differences and the other
(more theoretically relevant and probably stronger) based on the work of the German
psychophysicists Herbart, Weber, Fechner, and Wundt. Experimental psychology developed
from the latter.

The efforts of these researchers, however necessary, did not by themselves lead to the
creation of modern psychological tests. Such tests also arose in response to important needs, such
as classifying and identifying the mentally and emotionally handicapped. One of the earliest tests
resembling present-day procedures, the Seguin Form Board Test, was developed in an effort to
educate and evaluate the mentally disabled. Similarly, Kraepelin (1912) devised a series of
examinations for evaluating emotionally impaired people.

An important breakthrough in the creation of modern tests came at the turn of the 20th
century. The French minister of public instruction appointed a commission to study ways of
identifying intellectually subnormal individuals in order to provide them with appropriate
educational experiences. One member of that commission was Alfred Binet. Working in
conjunction with the French physician T. Simon, Binet developed the first major general
intelligence test. Binet’s early effort launched the first systematic attempt to evaluate individual
differences in human intelligence.

The Evolution of Intelligence and Standardized Achievement Tests

The history and evolution of Binet’s intelligence test are instructive. The first version of
the test, known as the Binet-Simon Scale, was published in 1905. This instrument contained 30
items of increasing difficulty and was designed to identify intellectually subnormal individuals.
Like all well-constructed tests, the Binet-Simon Scale of 1905 was augmented by a comparison
or standardization sample. Binet’s standardization sample consisted of 50 children who had been
given the test under standard conditions – that is, with precisely the same instructions and
format. Having obtained this standardization sample, the authors of the Binet test had norms with
which they could compare the results from any new subject. Without such norms, the meaning of
scores would have been difficult, if not impossible, to evaluate. However, by knowing such
things as the average number of correct responses found in the standardization sample, one could
at least state whether a new subject was below or above it.

Binet was aware of the importance of a standardization sample. Further development of
the Binet test involved attempts to increase the size and representativeness of the standardization
sample. A representative sample is one that comprises individuals similar to those for whom the
test is to be used. When the test is used for the general population, a representative sample must
reflect all segments of the population in proportion to their actual numbers.
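
One way to picture "in proportion to their actual numbers" is proportional allocation of a sample across population segments. The sketch below is an assumption-laden illustration, not a procedure taken from the Binet revisions: the function name `proportional_allocation`, the segment labels, and the population figures are all hypothetical, and leftover places are handed out by the largest-remainder rule.

```python
def proportional_allocation(population_counts, sample_size):
    """Split `sample_size` across segments in proportion to their population size.

    Uses integer arithmetic; any places left over after rounding down go to
    the segments with the largest remainders.
    """
    total = sum(population_counts.values())
    if total <= 0:
        raise ValueError("population must not be empty")

    allocation = {}
    remainders = {}
    for segment, count in population_counts.items():
        allocation[segment], remainders[segment] = divmod(sample_size * count, total)

    leftover = sample_size - sum(allocation.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for segment in ranked[:leftover]:
        allocation[segment] += 1
    return allocation


# Hypothetical population segments (assumption for illustration only).
population = {"urban": 6000, "suburban": 3000, "rural": 1000}
plan = proportional_allocation(population, sample_size=200)
print(plan)
```

With these made-up figures, the segment that makes up 60 percent of the population supplies 60 percent of the sample, and so on for the others.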

By 1908, the Binet-Simon Scale had been substantially improved. It was revised to
include nearly twice as many items as the 1905 scale. Even more significantly, the size of the
standardization sample was increased to more than 200. The 1908 Binet-Simon Scale also
determined a child’s mental age, thereby introducing a historically significant concept. In
simplified terms, you might think of mental age as a measurement of a child’s performance on
the test relative to other children of that particular age group. The mental age concept was one of
the most important contributions of the revised 1908 Binet-Simon Scale.

In 1911, the Binet-Simon Scale received a minor revision. By this time, the idea of
intelligence testing had swept across the world and was especially supported in the United States.
By 1916, L.M. Terman of Stanford University had revised the Binet test for use in the United
States. Terman’s revision, known as the Stanford-Binet Intelligence Scale (Terman, 1916), was
the only American version of the Binet test that flourished. It also characterizes one of the most
important trends in testing – the drive toward better tests.

Terman’s 1916 revision of the Binet-Simon Scale contained numerous improvements.
The standardization sample was increased to include 1000 people, original items were revised,
and many new items were added. Terman’s 1916 Stanford-Binet Intelligence Scale added
respectability and momentum to the newly developing testing movement.

World War I. The testing movement grew enormously in the United States because of
the demand for a quick, efficient way of evaluating the emotional and intellectual functioning of
thousands of military recruits in World War I. The war created a demand for large-scale group
testing, because relatively few trained personnel could evaluate the huge influx of military
recruits. However, the Binet test was an individual test.

Shortly after the United States became actively involved in World War I, the army
requested the assistance of Robert Yerkes, who was then the president of the American
Psychological Association. Yerkes headed a committee of distinguished psychologists who soon
developed two structured group tests of human abilities: the Army Alpha and the Army Beta.
The Army Alpha required reading ability, whereas the Army Beta measured the intelligence of
illiterate adults.

World War I fueled the widespread development of group tests. About this time, the
scope of testing also broadened to include tests of achievement, aptitude, interest, and
personality. Because achievement, aptitude, and intelligence tests overlapped considerably, the
distinctions proved to be more illusory than real. Even so, the 1916 Stanford-Binet Intelligence
Scale had appeared at a time of strong demand and high optimism for the potential of measuring
human behavior through tests. World War I and the creation of group tests had then added
momentum to the testing movement. Shortly after the appearance of the 1916 Stanford-Binet
Intelligence Scale and the Army Alpha test, schools, colleges, and industry began using tests. It
appeared to many that this new phenomenon, the psychological test, held the key to solving the
problems emerging from the rapid growth of population and technology.

Achievement Tests. Among the most important developments following World War I
was the development of standardized achievement tests. Such tests provide multiple-choice
questions that are standardized on a large sample to produce norms against which the results
of new examinees can be compared.

Standardized achievement tests caught on quickly because of the relative ease of
administration and scoring and the absence of the subjectivity or favoritism that can occur in essay or
other written tests. In school settings, standardized achievement tests allowed one to maintain
identical testing conditions and scoring standards for a large number of children. Such tests also
allowed a broader coverage of content and were less expensive and more efficient than essays. In
1923, the development of standardized achievement tests culminated in the publication of the
Stanford Achievement Test by T.L. Kelley, G.M. Ruch, and L.M. Terman. By the 1930s, it was
widely held that the objectivity and reliability of these new standardized tests made them
superior to essay tests.

Rising to the challenge. For every movement there is a countermovement, and the
testing movement in the United States in the 1930s was no exception. Critics soon became vocal
enough to dampen enthusiasm and to make even the most optimistic advocates of tests defensive.
Researchers, who demanded nothing short of the highest standards, noted the limitations and
weaknesses of existing tests. Not even the Stanford-Binet, a landmark in the testing field, was
safe from criticism. Although tests were used between the two world wars and many new tests
were developed, the accuracy and utility of tests remained under heavy fire.

Near the end of the 1930s, developers began to reestablish the respectability of tests.
New, improved tests reflected the knowledge and experience of the previous two decades. By
1937 the Stanford-Binet had been revised again. Among the many improvements was the
inclusion of a standardization sample of more than 3000 individuals. No other individual
intelligence test before or since has had a larger standardization sample. Only group tests such as
the Scholastic Assessment Test (SAT), which can be administered on a mass scale, have larger
samples.

A mere two years after the 1937 revision of the Stanford-Binet test, David Wechsler
published the first version of the Wechsler Intelligence scales, the Wechsler-Bellevue
Intelligence Scale (W-B) (Wechsler, 1939). The Wechsler-Bellevue scale contained several
interesting innovations in intelligence testing. Unlike the Stanford-Binet test, which produced
only a single score (the so-called IQ, or intelligence quotient), Wechsler’s test yielded several
scores, permitting an analysis of an individual’s pattern or combination of abilities.

Among the various scores produced by the Wechsler test was the performance IQ.
Performance tests do not require a verbal response; one can use them to evaluate intelligence in
people who have few verbal or language skills. The Stanford-Binet test had long been criticized
because of its emphasis on language and verbal skills, making it inappropriate for many
individuals, such as those who cannot speak or who cannot read. In addition, few people believed
that language or verbal skills play an exclusive role in human intelligence. Wechsler’s inclusion
of a nonverbal scale thus helped overcome some of the practical and theoretical weaknesses of
the Binet test. In 1986 the Binet test was drastically revised to include performance subtests.

Personality Tests: 1920-1940

Just before and after World War II, personality tests began to blossom. Whereas
intelligence tests measured ability or potential, personality tests measured presumably stable
characteristics or traits that theoretically underlie behavior. Traits are relatively enduring
dispositions (tendencies to act, think, or feel in a certain manner in any given circumstance) that
distinguish one individual from another. One of the basic goals of traditional personality tests is
to measure traits.

The earliest personality tests were structured paper-and-pencil group tests. These tests
provided multiple-choice and true/false questions that could be administered to a large group.
Because it provides a high degree of structure – that is, a definite stimulus and specific
alternative responses that can be unequivocally scored – this sort of test is a type of structured
personality test. The first structured personality test, the Woodworth Personal Data Sheet, was
developed during World War I and was published in final form just after the war.

The motivation underlying the development of the first personality test was the need to
screen military recruits. The first structured personality test was simple by today’s standards.
Interpretations of the Woodworth test depended on the now discredited assumption that the
content of an item could be accepted at face value.

The introduction of the Woodworth test was enthusiastically followed by the creation of a
variety of structured personality tests, all of which assumed that a subject’s response could be
taken at face value. However, researchers scrutinized, analyzed, and criticized the early
structured personality tests, just as they had done with the ability tests. Indeed, the criticism of
tests that relied on face value alone became so intense that structured personality tests were
nearly driven out of existence. The development of new tests based on more modern concepts
followed, revitalizing the use of structured personality tests. Thus, after an initial surge of
interest and optimism during most of the 1920s, structured personality tests declined by the late
1930s and early 1940s. Following World War II, however, personality tests based on fewer or
different assumptions were introduced, thereby rescuing the structured personality test.

During the brief but dramatic rise and fall of the first structured personality tests, interest
in projective tests began to grow. In contrast to structured personality tests, which in general
provide a relatively unambiguous test stimulus and specific alternative responses, projective tests
provide an ambiguous stimulus and unclear response requirements. Furthermore, the scoring of
projective tests is often subjective.

Unlike interest in the early structured personality tests, interest in the projective Rorschach inkblot
test grew slowly. The Rorschach test was first published by Hermann Rorschach of Switzerland in
1921. However, several years passed before the Rorschach came to the United States, where
David Levy introduced it. The first Rorschach doctoral dissertation written in a U.S. university
was not completed until 1932, when Sam Beck, Levy’s student, decided to investigate the
properties of the Rorschach test scientifically. Although initial interest in the Rorschach test was
lukewarm at best, its popularity grew rapidly after Beck’s work, despite suspicion, doubt, and
criticism from the scientific community.

Adding to the momentum for the acceptance and use of projective tests was the
development of the Thematic Apperception Test (TAT) by Henry Murray and Christina Morgan
in 1935. Whereas the Rorschach test contained completely ambiguous inkblot stimuli, the TAT
was more structured. Its stimuli consisted of ambiguous pictures depicting a variety of scenes
and situations. Unlike the Rorschach test, which asked the subject to explain what the inkblot
might be, the TAT required the subject to make up a story about the ambiguous scene. The TAT
purported to measure human needs and thus to ascertain individual differences in motivation.

The Emergence of New Approaches to Personality Testing

In 1943 the Minnesota Multiphasic Personality Inventory (MMPI) began a new era for
structured personality tests. The idea behind the MMPI – to use empirical methods to determine
the meaning of a test response – helped revolutionize structured personality tests. The problem
with early structured personality tests such as the Woodworth was that they made far too many
assumptions that subsequent scientific investigations failed to substantiate. The authors of the
MMPI, by contrast, argued that the meaning of a test response could be determined only by
empirical research. The MMPI, along with its updated companion the MMPI-2, is currently the
most widely used and referenced personality test. Its emphasis on the need for empirical data has
stimulated the development of tens of thousands of studies.

Just about the time the MMPI appeared, personality tests based on the statistical
procedure called factor analysis began to emerge. Factor analysis is a method of finding the
minimum number of dimensions, called factors, to account for a large number of variables. In the
early 1940s, J.R. Guilford made the first serious attempt to use factor analytic techniques in the
development of a structured personality test. By the end of that decade, R.B. Cattell had
introduced the Sixteen Personality Factor Questionnaire (16PF), which remains one of the most
well-constructed structured personality tests and an important example of a test developed with
the aid of factor analysis. Today, factor analysis is a tool used in the design or validation of just
about all major tests.

The Period of Rapid Changes in the Status of Testing

The 1940s saw not only the emergence of a whole new technology in psychological testing
but also the growth of applied aspects of psychology. The role and significance of tests used in
World War I were reaffirmed in World War II. By this time the U.S. government had begun to
encourage the continued development of applied psychological technology. As a result,
considerable federal funding provided paid, supervised training for clinically oriented
psychologists. By 1949 formal university standards had been developed and accepted, and
clinical psychology was born. Other applied branches of psychology – such as industrial,
counseling, educational, and school psychology – soon began to blossom.

One of the major functions of the applied psychologist was providing psychological
testing. The Shakow et al. (1947) report, which was the foundation of the formal training
standards in clinical psychology, specified that psychological testing was a unique function of
the clinical psychologist and recommended that testing methods be taught only to doctoral
psychology students. A position paper of the American Psychological Association published
seven years later (APA, 1954) affirmed that the domain of the clinical psychologist included
testing. It formally declared, however, that the psychologist would conduct psychotherapy only
in “true” collaboration with physicians. Thus, psychologists could conduct testing, but not
psychotherapy, independently. Indeed, as long as psychologists assumed the role of testers, they
played a complementary but often secondary role vis-à-vis medical practitioners. Though the
medical profession could have hindered the emergence of clinical psychology, it did not, because
as tester the psychologist aided the physician. Therefore, in the late 1940s and early 1950s,
testing was the major function of the clinical psychologist.

For better or worse, depending on one’s perspective, the government’s efforts to
stimulate the development of applied aspects of psychology, especially clinical psychology, were
extremely successful. Hundreds of highly talented and creative young people were attracted to
clinical and other applied areas of psychology. These individuals, who would use tests and other
psychological techniques to solve practical human problems, were uniquely trained as
practitioners of the principles, empirical foundations, and applications of the science of
psychology.

The Current Environment

During the 1980s and 1990s, several major branches of applied psychology emerged:
neuropsychology, health psychology, forensic psychology, and child psychology. Because each
of these important areas of psychology makes extensive use of psychological tests, psychological
testing again grew in status and use. Neuropsychologists use tests in hospitals and other clinical
settings to assess brain injury. Health psychologists use tests and surveys in a variety of medical
settings. Forensic psychologists use tests in the legal system to assess insanity, incompetency,
and emotional damages. Child psychologists use tests to assess childhood disorders. As in the
past, psychological testing in the 21st century remains one of the most important yet controversial
topics in psychology.

CLASSIFICATION OF TESTS:

1. On the basis of the criterion of administrative conditions

Tests have been classified on the basis of administrative conditions into two types –
individual tests and group tests. Individual tests are those tests that are administered to one
person at a time. Kohs Block Design Test is an example of the individual test. Individual tests
are often used by school psychologists and counselors to motivate children and to observe how
they respond. Some individually administered tests are given orally, and they require the
constant attention of the examiner. Individual tests, in general, have two limitations, i.e., such
tests are time-consuming and require the services of trained and experienced examiners. As such,
these tests are used only when a crucial decision is necessary.

Group tests are tests that can be administered to more than one person at a time.
Bell Adjustment Inventory is an example of the group test. Besides assessing adjustment, group
tests are adequate for measuring cognitive skills and for surveying the achievements, strengths and
weaknesses of the students in the classroom, etc.

2. On the basis of the criterion of scoring

Scoring is one of the vital parts of a test. Based upon this criterion, tests are classified into
two types – objective test and subjective test. Objective tests are those whose items are scored
by competent examiners or observers in such a way that no scope for subjective judgment or
opinion exists and thus, the scoring remains unambiguous. Tests having multiple-choice, true-false
and matching items are usually called objective tests. In such items the problem as well as
its answer is given along with the distractors. The problem is known as the stem of the item. A
distractor is an answer option that resembles the correct answer but is not actually correct.
Such tests are also known as new-type tests or limited-answer tests.

Subjective tests are tests whose items are scored by the competent examiners or observers in
a way in which there exists some scope for subjective judgment and opinion. As a consequence,
some elements of vagueness and ambiguity remain in their scoring. These are also called essay
tests. Such tests are intended to assess an examinee’s ability to organize a comprehensive
answer, recall and select important information, and present the same logically and effectively.
Since in these tests the examinee is free to write and organize the answer, they are also known as
free-answer tests.

3. On the basis of the criterion of time limit in producing the response

Another way of classifying tests is whether they emphasize time limit or not. On the basis of
this criterion, the tests are classified into power tests and speed tests. A power test is one which
has a generous time limit so that most examinees are able to attempt every item. Usually such
tests have items which are generally arranged in increasing order of difficulty. Most of the
intelligence tests and aptitude tests belong to the category of power tests. In fact, power tests
demonstrate how much knowledge or information the examinees have.

Speed tests are those that have severe time limits but whose items are comparatively easy and
of more or less the same degree of difficulty. Here, very few examinees
are expected to make errors. Speed tests, generally, reveal how rapidly the
examinees can respond within a given time limit. Most of the clerical aptitude tests belong to this
category.

In fact, whether a test is a power test or a speed test depends, in part, on the nature of the
examinees for whom it is meant. An arithmetical test for class VII students might emphasize
speed if it contained items that were easier for them, but the same test could be a power test for
class III or IV students or for less-prepared students. Today, a pure power test or pure speed test
is rare; a mixture of the two is more common.

4. On the basis of the criterion of the nature or contents of items

A test may be classified on the basis of the nature of the items or the contents used therein.
Important types of the test based on this criterion are:

(i) Verbal Test: A verbal test is one whose items emphasize reading, writing and oral
expression as the primary mode of communication. Herein instructions are printed or
written. These are read by the examinees and accordingly items are answered. Jalota
Group General Intelligence Test and Mehta Group Test of Intelligence are some
common examples. Verbal tests are also called paper-pencil tests because the
examinee has to write on a piece of paper while answering the test items.
(ii) Non-verbal Test: Nonverbal tests are those that de-emphasize but don’t altogether
eliminate the role of language by using symbolic materials like pictures, figures, etc.
Such tests use language in the instructions but not in the items. Test
items present the problem with the help of figures and symbols. Nonverbal tests are
commonly used with young children as an attempt to assess the nonverbal aspects of
intelligence such as spatial perception. Raven Progressive Matrices is a good example
of nonverbal test.
(iii) Performance Test: Performance tests are those that require the examinees to perform
a task rather than answer some questions. Such tests prohibit the use of language in
items. Occasionally, oral language is used to give instruction, or the instruction may
also be given through gesture and pantomime. Different kinds of performance tests are
available. Some tests require examinees to assemble a puzzle, place pictures in a
correct sequence, place pegs in the boards as rapidly as possible, point to a missing
part of the picture, etc. One feature of performance tests is that they are usually
administered individually so that the examiner can count the errors committed by the
examinee or the student and can assess how long it takes him to complete the given
task. Whatever may be the types of performance test, the common feature of all
performance tests is their emphasis on the examinee’s ability to perform a task rather
than answer some questions.
(iv) Non-language Test: Nonlanguage tests are those which don’t depend upon any form
of written, spoken or reading communication. Such tests remain completely
independent of the ability to use language in any way. Instructions are usually given
through gestures or pantomime and the examinees respond by pointing at or
manipulating objects such as pictures, blocks, puzzles, etc. Such tests are usually
administered to those persons or children who can’t communicate in any form of
ordinary language.

5. On the basis of the criterion of purpose or objective

Tests are also classified in terms of their objectives or purposes. Based upon this criterion,
tests are usually classified as intelligence tests, aptitude tests, personality tests,
neuropsychological tests and achievement tests. Intelligence tests intend to assess intelligence
of the examinees. Aptitude tests assess potentials or aptitudes of the persons. Personality tests
assess traits, adjustments, interests, values, etc., of the persons. Neuropsychological tests are
used in the assessment of persons with known or suspected brain dysfunction.
Achievement tests assess what the persons have acquired in the given area as a function of some
training or learning.

6. On the basis of the criterion of standardization

Tests are also classified on the basis of standardization. Based upon this criterion, tests are
classified into standardized tests and teacher-made tests. Standardized tests are those which
have been subjected to the procedure of standardization. However, the meaning of the term
‘standardization’ is controversial and includes at least the following conditions:

(i) The first condition for standardization is that there must be a standard manner of
giving instructions so that uniformity can be maintained in the evaluation of all those
who take the test.
(ii) The second condition for standardization is that there must be uniformity of scoring,
and an index of the fairness of the correct answers, obtained through the procedure of item analysis,
should be available.
(iii) The third condition is that reliability and validity of the test must be established and
the individuals for whom the test is intended should be explicitly mentioned.
(iv) The fourth condition, a controversial one, is that a standardized test should have
norms. However, according to Cronbach (1970) a test even without norms may be
called a standardized test. But the majority of psychologists favor the idea that a
standardized test should have norms as well.

Teacher-made tests are those that are constructed by teachers for use largely within their
classrooms. The effectiveness of such tests depends upon the skill of the teacher and his
knowledge of test construction. Items may come from any area of curriculum and they may be
modified according to the will of the teacher. Rules for administration and scoring are
determined by the teacher. Such tests are largely evaluated by the teachers themselves and no
particular norms are provided; however, they may be developed by the teacher for his own class.

CHARACTERISTICS OF A GOOD TEST:

For a test to be scientifically sound, it must possess the following characteristics:

(i) Objectivity: A test must have the trait of objectivity, i.e., it must be free from the
subjective element so that there is complete interpersonal agreement among experts
regarding the meaning of the items and scoring of the test. Obviously, objectivity,
here, relates to two aspects of the test – objectivity of the items and objectivity of the
scoring system. By objectivity of items is meant that the items should be phrased in
such a manner that they are interpreted in exactly the same way by all those who take
the test. For ensuring objectivity of items, items must have uniformity of order of
presentation (that is, either ascending or descending order). By objectivity of scoring
is meant that the scoring method of the test should be a standard one so that complete
uniformity can be maintained when the test is scored by different experts at different
times.
(ii) Reliability: A test must also be reliable. Reliability here refers to self-correlation of
the test. It shows the extent to which the results obtained are consistent when the test
is administered once or more than once on the same sample with a reasonable time
gap. Consistency in results obtained in a single administration is the index of internal
consistency of the test and consistency in results obtained upon testing and retesting
is an index of temporal consistency. Reliability, thus, includes both internal
consistency as well as temporal consistency. For a test to be called sound it must be
reliable because reliability indicates the extent to which the scores obtained in the test
are free from such internal defects of standardization which are likely to produce
errors of measurement.
(iii) Validity: Validity is another prerequisite for a test to be sound. Validity indicates the
extent to which the test measures what it intends to measure, when compared with
some outside independent criterion. In other words, it is the correlation of the test
with some outside criterion. The criterion should be an independent one and should
be regarded as the best index of trait or ability being measured by the test. Generally,
validity of the test is dependent upon the reliability because a test which yields
inconsistent results (poor reliability) is ordinarily not expected to correlate with some
outside independent criterion.
(iv) Norms: A test must also be guided by certain norms. Norms refer to the average
performance of a representative sample on a given test. There are four common types
of norms – age norms, grade norms, percentile norms and standard score norms.
Depending upon the purpose and use, a test constructor prepares any of these norms
for his test. Norms help in interpretation of the scores. In the absence of norms no
meaning can be added to the score obtained on the test.
(v) Practicability: A test must also be practicable from the point of view of the time
taken in its completion, length, scoring, etc. In other words, the test should not be
lengthy and the scoring method must not be difficult nor one which can only be done
by highly specialized persons.
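
The reliability, validity and norm ideas above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, the odd/even split with the Spearman-Brown correction is one commonly used way of estimating internal consistency (it is not named in the text above), and the percentile-rank rule that counts tied scores as half is one convention among several.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Used here for temporal consistency (test vs. retest scores) and for
    validity (test scores vs. an outside criterion).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    # Note: raises ZeroDivisionError if either list has no variation.
    return cov / (ss_x * ss_y) ** 0.5


def split_half_reliability(item_scores):
    """Internal consistency from a single administration (assumed method).

    item_scores holds one list of item scores per examinee. The 1st, 3rd,
    5th ... items form one half and the 2nd, 4th, 6th ... items the other.
    The half-test correlation is stepped up with the Spearman-Brown formula.
    """
    half_a = [sum(person[0::2]) for person in item_scores]
    half_b = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(half_a, half_b)
    return 2 * r_half / (1 + r_half)


def percentile_rank(score, norm_sample):
    """Percentile norm: percentage of the norm sample below the score.

    Tied scores are counted as half (an assumed convention).
    """
    below = sum(s < score for s in norm_sample)
    equal = sum(s == score for s in norm_sample)
    return 100 * (below + 0.5 * equal) / len(norm_sample)


def standard_score(score, norm_sample):
    """Standard score norm (z-score) relative to the norm sample."""
    return (score - mean(norm_sample)) / pstdev(norm_sample)


# Hypothetical scores, invented purely for illustration.
first_testing = [12, 15, 9, 20, 17, 11]
retesting = [13, 14, 10, 19, 18, 10]
outside_criterion = [55, 60, 48, 72, 66, 50]

temporal_consistency = pearson_r(first_testing, retesting)
validity_coefficient = pearson_r(first_testing, outside_criterion)
print(round(temporal_consistency, 2), round(validity_coefficient, 2))
print(percentile_rank(15, first_testing), round(standard_score(15, first_testing), 2))
```

In this sketch a raw score acquires meaning only when it is placed against the norm sample, which is the point made under Norms above; the correlation coefficients play the role of the reliability and validity indices.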

USES AND LIMITATIONS OF PSYCHOLOGICAL TESTS AND TESTINGS:

Psychological tests are widely used for many purposes. It is very convenient to
distinguish the following five uses of tests:

1. In classification: Psychological tests are popularly used in making classification of
persons, that is, for assigning the persons to one category rather than to another one.
There are different types of classification, each one giving emphasis upon a particular
purpose in assigning persons to categories. Important types of classification are placement,
screening, certification and selection, and psychological tests play a significant role in
each of these types.
Placement refers to the sorting of persons into appropriate programmes according
to their needs or skills. With the help of the appropriate psychological tests, teachers
often enroll some of the students into the science faculty and some of them into the
social science faculty. They enroll some students for Mathematics, Physics and
Chemistry programme whereas others are enrolled for Geography, History and Political
Science programme. Without the help of psychological tests it is not possible to do such
placement.
Screening refers to the procedures of identification of persons with special
characteristics or needs. With the help of psychological tests, psychometricians often
screen persons into creative persons and persons having exceptional talent in abstract
reasoning. They administer the tests and, on the basis of the scores obtained, are able to
screen them into the desired categories.
Certification and selection are also done with the help of psychological tests.
Certification implies that an individual has at least a minimum proficiency in some
discipline or activity. When a person passes a certification examination, it automatically
confers some privileges. For example, when a driver passes a driving examination, he gets
a license. This illustrates the process of certification. Selection is very similar to
certification because it also confers some privileges on the persons who have
been selected. These tasks are well accomplished with the help of psychological tests.
The persons who are selected on the basis of the test scores, for example, gain
admission into a certain course or gain employment in an organization.
2. In diagnosis and planning for treatment: Psychological tests play a significant role in
making diagnosis and in planning for treatment. Diagnosis means determining the nature
of the person’s abnormal behavior and classifying the behavior pattern within an accepted
system. Intelligence tests are considered important for diagnosis of mentally retarded
children. Likewise, through some psychological tests, a diagnosis of learning disability
can easily be done. With the help of the MMPI, a clinical psychologist can readily diagnose
persons with pathological traits. A proper diagnostic programme not only provides
assignment of a label, but also the choice for treatment. When a child is diagnosed as
mentally retarded or having learning disability, a planning for his treatment is
accordingly done so that the maximum help can be rendered.
3. In self-knowledge: Psychological tests are also useful in providing self-knowledge to the
test takers to the extent that such knowledge tends to change their career path. Every
administration of a psychological test gives a feedback to the test takers regarding the
level of trait/ability being assessed. As a consequence, they bring a change in the desired
direction and mould their path for betterment.

4. In evaluation of programmes: Psychological tests are often used in evaluation of
various types of educational and social programmes. In schools and colleges different
types of programmes for betterment of academic achievement are carried out and the
persons concerned want to know about their impact. Such impacts are easily assessed with the help of
various types of achievement and intelligence tests. Likewise, people in general and
political parties in particular want to assess the outcome of a social programme carried
out for the purpose of, say, verifying the IQ levels of a disadvantaged group. This is also
done with the help of various types of psychological tests.
5. In theoretical and applied branches of behavioral research: Psychological tests are
very useful in research. They are frequently used in both theoretical and applied
research. With the help of such tests, psychologists frequently investigate theoretical
matters that have no immediate or obvious practical applications. Here, we can cite the
example of Witkin (1949) who for analyzing perceptual field dependence, developed the
tilting-room-tilting-chair test (TRTC). In fact, TRTC encouraged a good deal of research
on personality development but was seldom applied to any practical problems of testing.
To take an example from an applied field, suppose neuropsychologists wish to test the
hypothesis that a low level of lead absorption produces a behavioral deficit in children. This
hypothesis can be easily tested by examining lead-burdened children and normal children
with the help of psychological tests. With the help of various types of psychological tests,
it has been reported that low-level lead absorption in children produces decrement in IQ,
impairment in reaction time and increase in undesirable classroom behavior. This
shows that psychological tests are useful in applied areas too, although the soundness of
testing-based research findings depends on the validity of the tests used.

Despite their various uses, psychological tests are not without limitations. Some of
the important limitations are as under:
1. Psychological tests represent an invasion of privacy: Psychological tests
may be an invasion of privacy if they are used without the permission of the
testees to obtain personal and sensitive information.
2. Psychological tests permanently categorize the persons: On the basis of their
performance on psychological tests, the testees or examinees are assigned to certain
categories like mentally retarded, gifted, brain-damaged, etc., and the
authority behaves accordingly, disregarding evidence of any further change.
This has a serious implication for the examinees. The examinees can
definitely change and great care should be taken in the interpretation and use
of the test results.
3. Psychological tests measure only limited and superficial aspects of
behavior: It is said that the psychological tests cannot measure the most
important human traits. They force the examinees to take decisions based on
superficial and relatively unimportant criteria.

4. Psychological tests create anxiety: Generally, it has been reported that when
the assessment is to be done through psychological tests, the examinees feel
anxious and this anxiety affects their performance. However, the examinees
who are familiar with specific types of tests are less anxious than those who
are unfamiliar with the test contents (Sax, 1974).
5. Psychological tests penalize bright and creative examinees: Psychological
tests are insensitive to atypical and creative responses. Such responses are not
given much credit thus providing a discrimination against the talented
examinees.
Thus psychological tests have some limitations.

ETHICAL ISSUES IN PSYCHOLOGICAL TESTING:

Psychological testing refers to all the possible uses, applications and underlying
important concepts of psychological tests. To maintain its proper uses and applications, the
American Psychological Association (APA) officially adopted a set of standards and rules in
1953, which have undergone continual review and refinement. The current version, called Ethical
Principles of Psychologists and Code of Conduct (APA, 1992), consists of a preamble and six
general principles which guide psychologists towards the highest ideals in their profession. In
addition, it also provides eight ethical standards with enforceable rules for psychologists who are
working in different contexts. The APA Committee on Psychological Tests and Assessment
(CPTA) is specifically charged with considering problems regarding sound testing and assessment
practices and with providing various types of technical advice.

The major ethical or moral issues relating to psychological testing can be described under
the following five headings:

1. Issues of human rights: Today the field of psychological testing has been heavily
influenced by recognition of various types of human rights. Among these rights is the
right not to be tested. In fact, persons who don’t want to subject themselves to
testing should not, and ethically can’t, be forced to accept it. Moreover, individuals
who finally decide to subject themselves to testing, have rights to know their test
scores, their interpretations as well as the basis of any decisions that affect their lives.
Likewise, these days other human rights such as the right to know who will have
access to the data of psychological testing, and the right to confidentiality of test
results are being popularly discussed. Test interpreters have an ethical obligation to
provide protection to these human rights whereas potential test takers are responsible
for demanding their rights. Such awareness of human rights today is casting a very
important influence on psychological testing and also shaping its future.
2. Issue of labeling: On the basis of psychological testing, a person is given a certain
label or diagnosed as having a certain psychiatric disorder. This labeling has many
harmful effects. For example, suppose a person has been diagnosed with chronic
schizophrenia, which, in fact, has little chance of being cured. In fact, labeling
someone a chronic schizophrenic may be a self-fulfilling prophecy. Since this
disorder is incurable, nothing can be done and when nothing can be done why should
one bother to provide help to such a person. Because no help is given, the person
remains a chronic case. Thus, labeling can stigmatize a person for life and it also
affects one’s access to help. Such a labeling creates additional problems. When a
person is labeled schizophrenic, it automatically implies that he is not responsible for
it because schizophrenia is a disease or illness and nobody can be blamed for
becoming ill. Therefore, such labeling will make him passive and will leave no
incentive to alter the negative conditions surrounding his life. Therefore, labeling will
not only stigmatize persons but it will also lower tolerance for stress and make
treatment difficult. In view of these potential negative effects and dangers of labeling,
a person should have the right not to be labeled.
3. Issues of invasion of privacy: When people respond to items of psychological tests,
they have little idea of what is being revealed by their responses but somehow, they
feel that their privacy has been invaded. Such a feeling is definitely detrimental for
people.
In fact, there are two sides to this issue. Dahlstrom (1969) investigated this issue in
detail and pointed out two related aspects of this issue. He has pointed out that this
issue of invasion of privacy is based on serious misunderstandings. In fact,
psychological tests have very limited and pinpointed aim and they can’t invade the
privacy of the persons. Another aspect of this issue, again pointed out by Dahlstrom
(1969), is the ambiguous nature of the notion of invasion of privacy itself. In reality
psychologists don’t consider it wrong, evil or even detrimental to find out or collect
information about the person. The person’s privacy is invaded when such information
is used inappropriately or wrongly. Psychologists are ethically and even legally bound
to maintain confidentiality and they don’t reveal any more information about a person
than is necessary to accomplish the purpose for which the testing was started. In fact,
the ethical code of APA (1992) has included confidentiality, which obviously dictates
that personal information obtained by the psychologist from any source is
communicated to others only with the person’s consent. Exception to this exists in
only those circumstances in which withholding information may cause danger to the
person or society as well as cases that have subpoenaed records.
4. Issues of divided loyalties: This is one of the vital issues of psychological testing and
was first pointed out by Jackson and Messick (1967) and still today remains a central
problem in the field of psychological testing. In fact, divided loyalties is today a
major dilemma for psychologists who use the test in different fields such as industry,
schools, clinics, government, military and so on. A psychologist has to face a conflict,
which arises when the welfare of the individual is at odds with that of
the institution that employs the psychologist. For example, suppose a
psychologist working for an industrial firm to identify individuals who are accident
prone, has the responsibility towards the institution to identify such persons as well as
the responsibility to protect the rights and welfare of the persons seeking
employment. Here, the psychologist’s loyalty stands divided. Likewise, a
psychologist has to maintain test security at any cost but also he must not violate the
person’s right to know the basis of an adverse decision. However, if this basis is
explained to one person, the information may reach other persons with the same
problem, who may then decide to outsmart the test. Again, the psychologist is
trapped between two opposing principles.
5. Responsibility of test constructors and test users: Ethical issues also put some
responsibilities on test constructors, or developers, and test users. In fact, the test
constructor is responsible for providing all the necessary information. Latest
standards for test use state that the test constructors must provide a test manual which
should clearly state the appropriate use of the test, include data relating to reliability,
validity, and norms, and clearly specify the scoring and administration standards.
There are some responsibilities which lie with test users too. According to APA
(1974) almost any test can be useful if it is used in the right circumstances but even
the best test can hurt the subject if it is used inappropriately. To minimize such
potential damage, APA (1974) makes users of the tests responsible for knowing the
reason for using the test, as well as the consequences of using the test. It also requires
test users to maximize the test’s effectiveness and to minimize unfairness, if any. In
other words, test users must possess sufficient and adequate knowledge to understand
the basic principles underlying the test construction and supporting research of any
test they administer. They must also be aware of the psychometric qualities of the test
being used as well as the relevant literature. At any cost, a test user cannot claim
ignorance. The test user is responsible for finding out relevant and pertinent
information before using any test.
These ethical issues, in fact, put psychologists on vigil so that they guard
both the interest of the test and the welfare of the test takers.

GENERAL STEPS OF TEST CONSTRUCTION:

Before the real work of test construction starts, certain broad decisions are taken by the
investigator. These preliminary decisions have far-reaching consequences. It is in this
preliminary stage that the test constructor outlines the major objectives of the test in general
terms, and specifies the populations for whom the test is intended. He also indicates the possible
conditions under which the test can be used and its important uses. For example, a test
constructor may decide to construct an intelligence test meant for students of the tenth grade
broadly aiming at diagnosing the manipulative and organizational ability of the pupils. Having
decided the above preliminary things, he must go ahead with the following steps.

1. Planning of the Test: The first step in the construction of a test is careful planning. At
this stage the test constructor specifies the broad and specific objectives of the test in
clear terms. He decides upon the nature of the content or items to be included, the type of
instructions to be included, the method of sampling, a detailed arrangement for the
preliminary administration and the final administration, a probable length and time limit
for the completion of the test, probable statistical methods to be adopted, etc. Planning
also includes the total number of reproductions of the test to be made and the preparation of
a manual.
2. Writing Items of the Test: The second step in test construction is the preparation of the
items of the test. According to Bean (1953), an item is defined as “a single question or
task that is not often broken down into any smaller units.” Item writing starts with the
planning done earlier. If the test constructor decides to prepare an essay test, the essay
items are written down. However, if he decides to construct an objective test, he writes
down the objective items such as the alternative response item, matching item, multiple-
choice item, completion item, short-answer item, pictorial form of item, etc. Depending
upon the purpose, he decides to write any of these objective types of items.
Item writing is essentially a creative art. There are no set rules to guide and
guarantee writing of good items. However, there are some essential prerequisites, which
must be met if the item writer wants to write good and appropriate items. These
requirements are enumerated as follows:
(i) The item writer must have a thorough knowledge and complete mastery of
the subject-matter. In other words, he must be fully acquainted with all
facts, principles, misconceptions, fallacies in a particular field so that he
may be able to write good and appropriate items.
(ii) The item writer must be fully aware of those persons for whom the test is
meant. He must be aware of the intelligence level of those persons so that
he may manipulate the difficulty level of the items for proper adjustment
with their ability level. He must also be able to avoid irrelevant clues to
the correct responses.
(iii) The item writer must be familiar with different types of items along with
their advantages and disadvantages. He must also be aware of the
characteristics of good items and the common probable errors in writing
items.
(iv) The item writer must have a large vocabulary. He must know the different
meanings of a word so that confusion in writing the items may be avoided.
He must be able to convey the meaning of the items in the simplest
possible language.
(v) Once written, the items must be submitted to a group of subject
experts for their criticisms and suggestions, and must then be duly
modified in the light of that feedback.

After the items have been written down, they are reviewed by
some experts or by the item writer himself and then arranged in the order
in which they are to appear in the final test. Generally, items are arranged
in an increasing order of difficulty, and those having the same form (say,
alternative form, matching, multiple-choice, etc.) and dealing with the
same contents are placed together.
3. Preliminary Administration (or the experimental try-out) of the Test: When the
items have been written down and modified in the light of the suggestions and criticisms
given by the experts, the test is said to be ready for its experimental try-out. The purpose
of the experimental try-out or preliminary administration of the test is manifold. According
to Conrad (1951), the main purposes of the experimental try-out of any psychological test
are as given below:
(i) Finding out the major weaknesses, omissions, ambiguities and
inadequacies of the items. In other words, try-out helps in identifying the
ambiguous and indeterminate items, non-functioning distractors in
multiple-choice items, very difficult or very easy items, and the like.
(ii) Determining the difficulty values of each item which, in turn, helps in
selecting items for their even and proper distribution in the final form.
(iii) Determining the validity of each individual item. The experimental try-out
helps in determining the discriminatory power of each individual item.
The discriminatory power here refers to the extent to which any given item
discriminates successfully between those who possess the trait in larger
amounts and those who possess the same trait in smaller amounts.
(iv) Determining a reasonable time limit of the test.
(v) Determining the appropriate length of the test. In other words, it helps in
determining the number of items to be included in the final form.
(vi) Determining the intercorrelations of items so that overlapping can be
avoided.
(vii) Identifying any weakness and vagueness in directions or instructions of
the test as well as in the fore-exercises or sample questions of the test.

For achieving these aims of experimental try-out, Conrad (1951) recommended at least
three preliminary administrations of the test. The aim of the first administration is to detect any
gross defects, ambiguities, and omissions in items and instructions. For the first administration,
the number of examinees should not be less than 100. He refers to the first try-out as the “Pre-
try-out”. The aim of the second preliminary administration is to provide data for item analysis,
and for this the number of examinees should be around 400. Conrad calls this second try-out “the
try-out proper”. The sample for this must be similar to those for whom the test is intended. Item
analysis is a technique of selecting discriminating items for the final composition of the test. It
aims at obtaining three kinds of information regarding the items: (i) the difficulty value of the
item, (ii) the discrimination index of the item, and (iii) the effectiveness of distractors. The third
preliminary administration is carried out to detect any minor defects that may not have been
detected by the first two preliminary administrations. Conrad calls this third try-out the ‘final
trial administration’. At this stage the items are selected after item analysis and they constitute
the test in the final form. The ‘final trial administration’ indicates how effective the test will
really be when it is administered to the sample for which it is really intended. Thus the
third preliminary administration is a kind of ‘dress rehearsal’ providing a sort of final
check on the procedure of administration of the test and its time limit. After the final trial
administration is over, no material change is ordinarily to be made to the test.
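
The three kinds of information sought in item analysis can be illustrated with a minimal Python sketch. It assumes multiple-choice items scored right or wrong, takes the difficulty value as the proportion of examinees answering the item correctly, and takes the discrimination index as the difference in that proportion between the highest-scoring and lowest-scoring 27 per cent of examinees. The 27 per cent cut-off, the function name item_analysis and the small data set are assumptions made for illustration; they are not taken from Conrad (1951).

```python
from collections import Counter


def item_analysis(responses, key, group_fraction=0.27):
    """Return difficulty, discrimination and option counts for each item.

    responses: one string per examinee, one chosen option per item.
    key: string of correct options, one per item.
    group_fraction: size of the upper and lower groups (assumed 27%).
    """
    n = len(responses)
    # Total score per examinee: one point for each correct answer.
    totals = [sum(a == k for a, k in zip(resp, key)) for resp in responses]
    # Rank examinees from highest to lowest total score.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    g = max(1, round(group_fraction * n))
    upper, lower = order[:g], order[-g:]

    report = []
    for j, correct in enumerate(key):
        p = sum(resp[j] == correct for resp in responses) / n
        p_upper = sum(responses[i][j] == correct for i in upper) / g
        p_lower = sum(responses[i][j] == correct for i in lower) / g
        report.append({
            "difficulty": p,                      # proportion passing the item
            "discrimination": p_upper - p_lower,  # upper minus lower group
            # Option counts show whether each distractor attracts anyone.
            "upper_choices": Counter(responses[i][j] for i in upper),
            "lower_choices": Counter(responses[i][j] for i in lower),
        })
    return report


# Hypothetical try-out data: ten examinees, three items, key "ABC".
key = "ABC"
responses = ["ABC", "ABC", "ABC", "ABD", "ABD",
             "ACC", "ACD", "ACD", "BCD", "BDD"]
report = item_analysis(responses, key)
for number, item in enumerate(report, start=1):
    print(number, round(item["difficulty"], 2), round(item["discrimination"], 2))
```

A distractor that is chosen by nobody in either group is non-functioning, and an item whose discrimination index is near zero or negative is a candidate for revision or removal.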

4. Reliability of the Final Test: When on the basis of the experimental or empirical try-out
the test is finally composed of the selected items, the final test is again administered to a
fresh sample in order to compute the reliability coefficient. The size of the sample for this
purpose should not be less than 100. Reliability is the self-correlation of the test and it
indicates the consistency of the scores on the test. There are three common ways of
calculating the reliability coefficient, namely the test-retest method, the split-half method, and the
equivalent-form method. Besides these, the Kuder-Richardson formulas and the Rulon
formula are also used in computing the reliability coefficient of the test.
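
The split-half method, the Rulon formula and one of the Kuder-Richardson formulas can be sketched in Python as follows. The sketch assumes items scored 0 or 1, an odd-even split of the items, and the Spearman-Brown correction to step the half-test correlation up to full length; the correction is standard practice but is not named in the text above. The function names and the six-examinee data set are hypothetical, and population variances are used throughout.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(scores):
    """Return (Spearman-Brown corrected split-half r, Rulon coefficient)."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    spearman_brown = 2 * r_half / (1 + r_half)
    # Rulon: 1 minus the variance of half-score differences over total variance.
    diff = [o - e for o, e in zip(odd, even)]
    totals = [o + e for o, e in zip(odd, even)]
    rulon = 1 - pvariance(diff) / pvariance(totals)
    return spearman_brown, rulon


def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


# Hypothetical item scores: six examinees (rows) by four items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
sb, rulon = split_half(scores)
kr = kr20(scores)
print(round(sb, 3), round(rulon, 3), round(kr, 3))
```

The three coefficients differ slightly because each rests on a different assumption about how the test is divided; a real computation would use the fresh sample of at least 100 examinees described above rather than six.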
5. Validity of the Final Test: Validity refers to what the test measures and how well it
measures. If a test measures a trait that it intends to measure well, we say that the test is a
valid one. After estimating the reliability coefficient of the test, the test constructor
validates the test against some outside independent criteria by comparing the test with the
criteria. Thus, validity may also be defined as the correlation of the test with some
outside independent criteria. Validity should be computed from the data obtained from
the samples other than those used in item analysis. This procedure is known as cross-
validation. There are three main types of validity: content validity, construct validity and
criterion-related validity. The usual statistical techniques employed in computing validity
coefficients are the Pearson r, biserial r, point-biserial r, chi-square, phi coefficient, etc.
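
Two of the techniques listed above can be shown in a short Python sketch: the Pearson r between test scores and an outside criterion, and the point-biserial r between a pass/fail variable and a continuous score. The function names and all numbers are hypothetical and serve only to show the arithmetic; the point-biserial formula uses the population standard deviation.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def point_biserial_r(dichotomy, scores):
    """Correlation between a 0/1 variable and a continuous score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean
    scores of the 1 and 0 groups, s is the population SD of all scores,
    p is the proportion coded 1 and q = 1 - p.
    """
    ones = [s for d, s in zip(dichotomy, scores) if d == 1]
    zeros = [s for d, s in zip(dichotomy, scores) if d == 0]
    p = len(ones) / len(scores)
    q = 1 - p
    return (mean(ones) - mean(zeros)) / pstdev(scores) * (p * q) ** 0.5


# Hypothetical data for five examinees.
test_scores = [1, 2, 3, 4, 5]      # scores on the new test
criterion = [2, 1, 4, 3, 5]        # e.g. ratings on an outside criterion
passed = [0, 0, 1, 1, 1]           # a dichotomous criterion (fail/pass)

validity = pearson_r(test_scores, criterion)
r_pb = point_biserial_r(passed, test_scores)
print(round(validity, 3), round(r_pb, 3))
```

The point-biserial r is algebraically the same as a Pearson r computed with the dichotomy coded 0 and 1, which offers a convenient check on the calculation.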
6. Preparation of Norms for the Final Test: Finally, the test constructor also prepares
norms of the test. Norms are defined as the average performance or score of a large
sample representative of a specified population. Norms are prepared so that the scores
obtained on the test can be interpreted meaningfully, because the raw scores by themselves
convey no meaning regarding the ability or trait being measured. But when these are
compared with the norms, a meaningful inference can immediately be drawn. The
common types of norms are the age norms, the grade norms, the percentile norms, and
the standard score norms. Not all of these types of norms are suited to every type of test.
Keeping in view the purpose and type of test, the test constructor develops a suitable
norm for the test. The preliminary considerations in developing norms are that the sample
must be representative of the true population; it must be randomly selected; and it should
preferably represent a cross-section of the population.
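
How a raw score acquires meaning when compared with norms can be shown with a minimal Python sketch of percentile norms and standard score norms. It assumes one common convention for the percentile rank (the percentage of the norm sample scoring below the raw score plus half of those scoring exactly at it) and uses the T-score (mean 50, SD 10) as an example of a standard score scale; both choices, the function names and the ten-person norm sample are assumptions made for illustration.

```python
from statistics import mean, pstdev


def percentile_rank(raw, norm_sample):
    """Percentage of the norm sample below `raw`, plus half of the ties."""
    below = sum(s < raw for s in norm_sample)
    equal = sum(s == raw for s in norm_sample)
    return 100 * (below + 0.5 * equal) / len(norm_sample)


def z_score(raw, norm_sample):
    """Distance of `raw` from the norm mean in SD units (population SD)."""
    return (raw - mean(norm_sample)) / pstdev(norm_sample)


def t_score(raw, norm_sample):
    """Standard score rescaled to a mean of 50 and an SD of 10."""
    return 50 + 10 * z_score(raw, norm_sample)


# Hypothetical norm sample; a real one would be large and representative.
norm_sample = [40, 45, 50, 50, 55, 60, 65, 70, 75, 90]

pr_60 = percentile_rank(60, norm_sample)
z_60 = z_score(60, norm_sample)
t_75 = t_score(75, norm_sample)
print(pr_60, round(z_60, 2), round(t_75, 1))
```

A raw score of 60 says little on its own; the same score expressed as a percentile rank or a standard score locates the examinee within the reference group.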
7. Preparation of Manual and Reproduction of the Test: The last step in test
construction is the preparation of a manual of the test. In the manual the test constructor
reports the psychometric properties of the test, norms and references. This gives a clear
indication regarding the procedures of the test administration, the scoring methods and
time limits, if any, of the test. It also includes instructions as well as the details of
arrangement of materials, that is, whether items have been arranged in random order or in
any other order. In general, the test manual should yield information about the
standardization sample, reliability, validity, scoring as well as practical considerations.
The test constructor, after considering the importance of and requirement for the test, finally
orders the printing of the test and the manual.
