Psychological Testing
Psychological testing, also called psychometrics, is the systematic use of tests to quantify
psychophysical behavior, abilities, and problems and to make predictions about psychological
performance.
The word “test” refers to any means (often formally contrived) used to elicit responses to
which human behavior in other contexts can be related. When intended to predict relatively
distant future behavior (e.g., success in school), such a device is called an aptitude test. When
used to evaluate the individual’s present academic or vocational skill, it may be called
an achievement test. In such settings as guidance offices, mental-health clinics, and psychiatric
hospitals, tests of ability and personality may be helpful in the diagnosis and detection of
troublesome behavior. Industry and government alike have been prodigious users of tests for
selecting workers. Research workers often rely on tests to translate theoretical concepts
(e.g., intelligence) into experimentally useful measures.
Characteristics of a good psychological test:
1. Objectivity
2. Reliability
3. Validity
4. Norms
5. Practicability
1. Objectivity:
The test should be free from subjective judgment regarding the ability, skill,
knowledge, trait or potentiality to be measured and evaluated.
2. Reliability:
This refers to the extent to which the obtained results are consistent or reliable.
When the test is administered to the same sample more than once with a
reasonable gap of time, a reliable test will yield the same or very similar scores. It means the test is trustworthy.
There are many methods of testing the reliability of a test.
3. Validity:
It refers to the extent to which the test measures what it intends to measure. For example,
when an intelligence test is developed to assess the level of intelligence, it should assess the
intelligence of the person, not other factors.
Validity tells us whether the test fulfils the objective of its development. There are many
methods to assess the validity of a test.
4. Norms:
Norms refer to the average performance of a representative sample on a given test. They
give a picture of the average standard of a particular sample in a particular aspect. Norms are the
standard scores developed by the person who develops the test. Future users of the test can
compare their scores with the norms to know the level of their sample.
5. Practicability:
The test must be practicable in terms of the time required for completion, the length, the number of items or
questions, scoring, etc. The test should not be too lengthy or too difficult to answer or to
score.
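The test-retest idea described under Reliability can be sketched in a few lines. This is only an illustration, assuming that reliability is summarized as the Pearson correlation between two administrations of the same test; the scores and the names `pearson_r`, `first_administration` and `second_administration` are hypothetical and do not come from any real test.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariation / (ss_x * ss_y) ** 0.5


# Hypothetical scores of the same five people, tested twice with a gap of time.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {test_retest_r:.2f}")
```

A coefficient close to 1 means people keep roughly the same rank order across the two administrations, which is what "consistent" means in this context.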
ROTTER INCOMPLETE SENTENCE BLANK
- Sentence Completion Test
- Semi structured projective technique in which the subject is asked to finish a sentence for
which the first word or words are supplied.
Introduction:
RISB was developed by Julian Rotter and Benjamin Willerman in the early 1940s as
a means of screening large groups of soldiers to evaluate adjustment and fitness to return to duty
and to obtain specific information for evaluation and treatment.
The original RISB was published in 1950, and the most recent revisions, including
separate forms for clients in high school, college and adulthood, were published in 1992.
Measuring both adjustment and maladjustment is a chief aim of the RISB, with the goal of
identifying both the presence and the relative absence of psychopathology. Therefore, the RISB
is intended to help guide an initial clinical interview, formulate a diagnosis and arrive at a treatment
plan, rather than provide a comprehensive evaluation of personality dynamics.
This over-all adjustment score is of particular value for screening purposes with college students
and in experimental studies. The ISB has also been used in a vocational guidance center to select
students requiring broader counseling than was usually given, in experimental studies of the
effect of psychotherapy and in investigations of the relationship of adjustment to a variety of
variables.
In the manual for the most recent edition of the RISB (1992), new norms were based on data
collected in three studies conducted between 1977 and 1988.
Compared to other projective tests, sentence completion tests (SCTs) have been described as among the
most valid. Among SCTs, the RISB has the most consistent evidence supporting its use in the
diagnosis and assessment of adjustment. Initial studies by Rotter and colleagues indicated that the
RISB was able to correctly identify 78% of the adjusted respondents and 59% of the maladjusted
respondents among women, and 89% of the adjusted and 52% of the maladjusted respondents among men.
In terms of face validity, the RISB was constructed with low face validity, a test attribute
in keeping with the SCT orientation towards uncovering latent personality characteristics of
which an individual may be unaware or which the individual may be unwilling to divulge directly.
An overall score of 145 is generally perceived as the cutoff score for identifying significant
adjustment issues. However, as Rotter points out, this cutoff score is not absolute as an index of
psychopathology; rather, it should be used as a guide in the clinical judgement process.
A formal scoring system exists for the RISB, but some assessors choose not to use it, and clinical
judgement typically plays a significant role in scoring. Thus, like that of the TAT, its scientific
standing is questioned by those who insist on tests with established reliability and validity. As
such, the RISB is often used to complement other personality measures and to provide more
personal details about the psychological problems of a particular client.
Advantages:
1. Freedom of Response
5. The method is extremely flexible in that new sentence beginnings can be constructed or tailor
made for a variety of clinical, applied and experimental purposes.
Disadvantages:
1. Susceptible to semi-objective scoring, it cannot be machine scored and requires general skill
and knowledge of personality analysis for clinical appraisal and interpretation.
C1 (4) = Typical of the C1 category are responses in which concern is expressed regarding such
things as the world state of affairs, financial problems, specific school difficulties, physical
complaints, identification with minority groups, and so on. In general it might be said that
subsumed under C1 are minor problems which are not deep-seated or incapacitating, and more or
less specific difficulties.
C2 (5) = More serious indications of maladjustment are found in the C2 category. On the whole
the responses refer to broader, more generalized difficulties than are found in C1. Included here
are expressions of inferiority feelings, psychosomatic complaints, concern over possible failure,
generalized school problems, lack of goals, feeling of inadequacy, concern over vocational
choice, and difficulty in heterosexual relationships as well as generalized social difficulty.
C3 (6) = Expression of severe conflict or indications of maladjustments are rated C3. Among the
difficulties found in this area are suicidal wishes, sexual conflicts, severe family problems, fear
of insanity, strong negative attitudes toward people in general, feelings of confusion, expression
of rather bizarre attitudes, and so forth.
“P” or positive responses are those indicating a healthy or hopeful frame of mind. These are
evidenced by humorous or flippant remarks, optimistic responses, and acceptance reactions.
Responses range from P1 to P3 depending on the degree of good adjustment expressed in the
statement. The numerical weights for the positive responses are:
P1 (2) = In the P1 class common responses are those which deal with positive attitudes toward
school, hobbies, sports, expression interest in people, expression of warm feeling toward some
individual and so on.
P2 (1) = Generally found under the heading of P2 are those replies which indicate a generalized
positive feeling toward people, good social adjustment, healthy family life, optimism and humor.
P3 (0) = Clear cut good natured humor, real optimism, and warm acceptance are types of
responses which are subsumed under the P3 group. The ISB deviates from the majority of
tests in that it scores humorous responses.
NEUTRAL RESPONSES
“N” or neutral responses are those not falling clearly into either of the above categories. They are
generally on a simple descriptive level. Two general types of responses account for a large
share of those that fall in the neutral category. One group includes those lacking emotional tone
or personal reference. The other group is composed of the many responses which are found as often
among maladjusted as among adjusted individuals and which, through clinical judgment, could not be
legitimately placed in either the C or the P group. All N responses are scored 3. For example, “Most
girls . . . are females” or “When I was a child . . . I spoke as a child”. Responses of this type
fall in the neutral category.
RELIABILITY
The RISB manual reports adequate internal consistency, stability and interrater agreement.
Because the RISB is designed to sample broad content areas, assessing the internal consistency
of the measure yields only a conservative estimate of its reliability. However, the RISB still yields
moderate reliability values for both split-half reliability estimates and Cronbach's alpha. Split-half
estimates for different forms of the RISB range from .74 to .84 for males and from .83 to .86 for females.
Cronbach's alpha was .69 for a sample of college men. Thus moderate internal consistency is
evident in spite of the RISB's diverse content.
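The two internal-consistency statistics named above can be sketched as follows. This is only an illustration: the score matrix is hypothetical (rows are respondents, columns are items), the odd/even split and the Spearman-Brown correction are one common way of forming a split-half estimate, and nothing here reproduces the data or the procedure of the RISB manual.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / (ss_x * ss_y) ** 0.5


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def split_half_reliability(rows):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    odd_half = [sum(row[0::2]) for row in rows]
    even_half = [sum(row[1::2]) for row in rows]
    r_halves = pearson_r(odd_half, even_half)
    return 2 * r_halves / (1 + r_halves)


# Hypothetical item scores (0-6 scale) for six respondents on four items.
item_scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 6, 5],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
    [3, 3, 2, 3],
]

alpha = cronbach_alpha(item_scores)
split_half = split_half_reliability(item_scores)
print(f"Cronbach's alpha = {alpha:.2f}, split-half = {split_half:.2f}")
```

Both statistics rise when items covary strongly, which is why a test that deliberately samples diverse content tends to produce conservative values.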
SCORING
In terms of inter-scorer reliability, the original validity study of the RISB found
coefficients of .91 for males and .96 for females. Since that time, such estimates have been
replicated in the literature, and the coefficients of agreement have ranged from as high as .99 to a
low of .72.
Interpretation of RISB
The raw score is a simple numerical count of responses, such as the number of correct
responses on an intelligence test.
Conflict responses:
Indicate an unhealthy or maladjusted frame of mind. Conflict responses range from C1 to C3
according to the severity of the conflict.
Conflict responses
Total responses of C1 0
Total responses of C2 7
Total responses of C3 1
Positive responses:
Indicate a hopeful and healthy frame of mind. Positive responses range from P1 to P3 according to
the degree of good adjustment expressed.
Positive responses
Total responses of P1 3
Total responses of P2 16
Total responses of P3 5
Neutral responses:
Means neither positive nor negative or equal to N=5
Neutral responses
Total responses of N 8
Omission
Means no answer given or incomplete thought
Quantitative Analysis
Category           C1   C2   C3   N    P1   P2   P3   Total
No. of responses   0    7    1    8    3    16   5    40
Weighted score     0    35   6    24   6    16   0    87
Total Responses (weighted scores by response type)
Positive responses   Neutral responses   Conflict responses   Total
22                   24                  41                   87
Predetermined cutoff score   Subject's score
135                          87
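The arithmetic behind these tables can be checked with a short sketch. The weights are the ones given with the category descriptions (C1 = 4, C2 = 5, C3 = 6, N = 3, P1 = 2, P2 = 1, P3 = 0), the counts are this client's, and 135 is the cutoff used in this report. The names `WEIGHTS`, `score_risb` and the rest are hypothetical helpers for illustration, not part of any official scoring tool.

```python
# Numerical weight of each response category, as listed in the scoring section.
WEIGHTS = {"C1": 4, "C2": 5, "C3": 6, "N": 3, "P1": 2, "P2": 1, "P3": 0}

# Number of responses this client gave in each category.
client_counts = {"C1": 0, "C2": 7, "C3": 1, "N": 8, "P1": 3, "P2": 16, "P3": 5}

CUTOFF = 135  # cutoff score used in this report


def score_risb(counts):
    """Return the weighted score per category and the overall adjustment score."""
    per_category = {cat: counts[cat] * weight for cat, weight in WEIGHTS.items()}
    return per_category, sum(per_category.values())


per_category, total_score = score_risb(client_counts)
conflict_score = sum(per_category[c] for c in ("C1", "C2", "C3"))
neutral_score = per_category["N"]
positive_score = sum(per_category[p] for p in ("P1", "P2", "P3"))

print(f"responses scored: {sum(client_counts.values())}")
print(f"conflict={conflict_score}, neutral={neutral_score}, positive={positive_score}")
print(f"total={total_score}, below cutoff: {total_score < CUTOFF}")
```

Because P3 responses carry a weight of 0, the most positive answers add nothing to the total; a higher total therefore always reflects more conflict-laden content.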
Analysis of RISB:
Her responses showed positive conditions in most areas, and she feels comfortable in society. The
client's score is 87, which is below the cutoff of 135 and indicates mental stability. Her
thoughts and feelings about the social world were positive.
Analysis of BDI:
The client's score on the BDI is 5, which indicates that the subject falls in the “Minimal
Depression” range.
Thematic Apperception Test
Thematic apperception test (TAT) is a projective psychological test developed during the
1930s by Henry A. Murray and Christiana D. Morgan at Harvard University. Proponents of the
technique assert that subjects' responses, in the narratives they make up about ambiguous
pictures of people, reveal their underlying motives, concerns, and the way they see the social
world. Historically, the test has been among the most widely researched, taught, and used of such
techniques.
History
The TAT was developed by American psychologist Murray and lay psychoanalyst
Morgan at the Harvard Clinic at Harvard University during the 1930s. Anecdotally, the idea for
the TAT emerged from a question asked by one of Murray's undergraduate students, Cecilia
Roberts. She reported that when her son was ill, he spent the day making up stories about images
in magazines and she asked Murray if pictures could be employed in a clinical setting to explore
the underlying dynamics of personality.
Murray wanted to use a measure that would reveal information about the whole person but found
the contemporary tests of his time lacking in this regard. Therefore, he created the TAT. The
rationale behind the technique is that people tend to interpret ambiguous situations in accordance
with their own past experiences and current motivations, which may be conscious or
unconscious. Murray reasoned that by asking people to tell a story about a picture, their defenses
to the examiner would be lowered as they would not realize the sensitive personal information
they were divulging by creating the story.
Murray and Morgan spent the 1930s selecting pictures from illustrative magazines and
developing the test. After 3 versions of the test (Series A, Series B, and Series C), Morgan and
Murray decided on the final set of pictures, Series D, which remains in use today. Although she
was given first authorship on the first published paper about the TAT in 1935, Morgan did not
receive authorship credit on the final published instrument. Reportedly, her role in the creation of
the TAT was primarily in the selection and editing of the images, but due to the primacy of the
name on the original publication the majority of written inquiries about the TAT were addressed
to her; since most of these letters included questions that she could not answer, she requested that
her name be removed from future authorship.
During the time Murray was developing the TAT he was also involved in Herman Melville
studies. The therapeutic technique originally came to him from the "Doubloon chapter" in Moby
Dick. In this chapter, multiple characters inspect the same image (a Doubloon), but each
character has vastly different interpretations of the imagery—Ahab sees symbols of himself in
the coin, while the religiously devout Starbuck sees the Christian Trinity. Other characters
provide interpretations of the image that give more insight into the characters themselves based
on their interpretations of the imagery. Crew members, including Ahab, project their self
perceptions onto the coin which was nailed to the mast. Murray, a lifelong Melvillist, often
maintained that all of Melville's oeuvre was for him a TAT.
After World War II, the TAT was adopted more broadly by psychoanalysts and clinicians to
evaluate emotionally disturbed patients. Later, in the 1970s, the Human Potential
Movement encouraged psychologists to use the TAT to help their clients understand themselves
better and stimulate personal growth.
Procedure
The TAT is popularly known as the picture interpretation technique because it uses a
series of provocative yet ambiguous pictures about which the subject is asked to tell a story. The
TAT manual provides the administration instructions used by Murray, although these procedures
are commonly altered. The subject is asked to tell as dramatic a story as they can for each picture
presented, including the following:
Psychometric characteristics
Thematic Apperception Tests are meant to evoke an involuntary
display of one’s subconscious. There is no standardization for evaluating one’s TAT responses;
each evaluation is completely subjective because each response is
unique. Validity and reliability are, consequently, the largest question marks of the TAT. There
are trends and patterns, which help identify psychological traits, but there are no distinct
responses to indicate different conditions a patient may or may not have. Medical professionals
most commonly use it in the early stages of patient treatment. The TAT helps professionals
identify a broad range of issues that their patients may suffer from. Even when individual scoring
procedures are examined, the absence of standardization or norms makes it difficult to compare
the results of validity and reliability research across studies. Specifically, even studies using the
same scoring system often use different cards, or a different number of cards. Standardization is
also absent amongst clinicians, who often alter the instructions and procedures. Murstein
explained that different cards may be more or less useful for specific clinical questions and
purposes, making the use of one set of cards for all clients impractical.
Reliability
Internal consistency, a reliability estimate focusing on how highly test items correlate to
each other, is often quite low for TAT scoring systems. Some authors have argued that internal
consistency measures do not apply to the TAT. In contrast to traditional test items, which should
all measure the same construct and be correlated to each other, each TAT card represents a
different situation and should yield highly different response themes. Lilienfeld and
colleagues countered this point by questioning the practice of compiling TAT responses to form
scores. Both inter-rater reliability (the degree to which different raters score TAT responses the
same) and test–retest reliability (the degree to which individuals receive the same scores over
time) are highly variable across scoring techniques. However, Murray asserted that TAT answers
are highly related to internal states such that high test-retest reliability should not be
expected. Gruber and Kreuzpointner (2013) developed a new method for calculating internal
consistency using categories instead of pictures. As they demonstrated in a mathematical proof,
their method provides a better fit for the underlying construction principles of TAT, and also
achieved adequate Cronbach's alpha scores up to .84.
Validity
The validity of the TAT, or the degree to which it measures what it is supposed to
measure, is low. Jenkins has stated that “the phrase ‘validity of the TAT’ is meaningless,
because validity is specific not to the pictures, but to the set of scores derived from the
population, purpose, and circumstances involved in any given data collection." That is, the
validity of the test would be ascertained by seeing how clinicians' decisions were assisted based
on the TAT. Evidence on this front suggests it is a weak guide at best. For example, one study
indicated that clinicians classified individuals as clinical or non-clinical at close to chance levels
(57% where 50% would be guessing) based on TAT data alone. The same study found that
classifications were 88% correct based on MMPI data. Using TAT in addition to the MMPI
reduced accuracy to 80%.
Alternate considerations
Despite the conflicting information about the psychometric
characteristics of the TAT, proponents have argued that the TAT should not be judged using
traditional standards of reliability and validity. According to Holt, “the TAT is a complex
method of assessing people, which does not lend itself to the standard rules of thumb about test
standards [. . .]” (p. 101). For example, it has been argued that the purpose of the TAT is to
reveal a wide range of personality characteristics and complex, nuanced patterns, as opposed to
traditional psychological tests that are designed to measure unitary and narrow
constructs. Hibbard and colleagues examined several considerations about traditional views of
reliability and validity as they apply to the TAT. First, they noted that traditional views of
reliability may limit the validity of a measure (such as occurs with multi-faceted concepts in
which characteristics are not necessarily related to each other, but are meaningful in
combination). Further, Cronbach's alpha, a commonly used measure of internal consistency, is
dependent on the number of items in the scale. For the TAT, most scales use only a small number of
cards (with each card treated like an item), so alphas would not be expected to be very high.
Many clinicians also discount the importance of psychometrics, believing that generalizability of
the findings to a given client’s situation is more important than generalizing findings to the
population.
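The dependence of alpha on the number of items can be illustrated with the standardized form of Cronbach's alpha, which depends only on the number of items k and the mean inter-item correlation. This is a sketch under stated assumptions: the mean correlation of 0.20 is an arbitrary illustrative value, not an estimate taken from any TAT study, and `standardized_alpha` is a hypothetical helper name.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha: k * r / (1 + (k - 1) * r)."""
    if k < 2:
        raise ValueError("alpha needs at least two items")
    return k * mean_r / (1 + (k - 1) * mean_r)


# Assumed (illustrative) mean inter-item correlation, held constant.
MEAN_INTER_ITEM_R = 0.20

# Alpha for scales of increasing length; each TAT card is treated as one item.
alpha_by_length = {k: standardized_alpha(k, MEAN_INTER_ITEM_R) for k in (5, 10, 20, 40)}

for k, alpha in alpha_by_length.items():
    print(f"{k:2d} items -> alpha = {alpha:.2f}")
```

With the correlation held fixed, alpha climbs steadily as items are added, so a scale built on a handful of cards starts at a disadvantage regardless of how well each card works.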
Scoring systems
When he created the TAT, Murray also developed a scoring system based on
his need-press theory of personality. Murray's system involved coding every sentence given for
the presence of 28 needs and 20 presses (environmental influences), which were then scored
from 1 to 5, based on intensity, frequency, duration, and importance to the plot. However,
implementing this scoring system is time-consuming, and it was not widely used. Rather, examiners
have traditionally relied on their clinical intuition to come to conclusions about storytellers.
Although not widely used in the clinical setting, several formal scoring systems have been
developed for analyzing TAT stories systematically and consistently. Three common methods
that are currently used in research are the:
Defense Mechanisms Manual (DMM)
This assesses three defense mechanisms: denial (least
mature), projection (intermediate), and identification (most mature). A person's thoughts and feelings
are projected onto the stories told.
Story Design measures the examinee's ability to identify and formulate a problem
situation.
Story Orientation assesses the examinee's level of personal control, emotional distress,
confidence and motivation.
Story Solutions assesses how impulsive the examinee is. In addition to evaluating the
types of problem solutions that are provided, the number of problem solutions that
examinees provide for each of the TAT cards is summed.
Story Resolution provides information on the examinee's ability to formulate problem
solutions that maximize both short and long-term goals.
Examiners are encouraged to explore information obtained from the TAT stories as hypotheses
for testing rather than concrete facts.
General Interpretation
Interpretation of the responses will vary depending on the examiner and
what type of scoring was used. It is common that the standard scoring systems are used more in
research settings than clinical settings. Individuals can select certain scoring systems if they have
the goal to evaluate a specific variable such as motivation, defense mechanisms, achievement,
problem-solving skills, etc. If a clinician selects not to use a scoring system, there are some
general guidelines that can be utilized. For example, the stories created by the individuals in
response to the TAT cards are a combination of three things: the card stimulus, the testing
environment, and the personality of the examinee. For each card, the individual must
subjectively interpret the pictures which involves the individual taking their own experiences and
feelings to create a story. Therefore, it is beneficial to look at the common themes in the stories’
content and structure to help make conclusions.
With interpretation of the responses, it is important for the clinician to consider some cautions to
verify the information is as accurate as possible. First, the examiner should always be
conservative when interpreting responses. It is important to always err on the side of caution
instead of making bold conclusions. The examiner should also consider all the data when using
the TAT in a testing or evaluative setting. One response should not be given more importance
over the other responses. Additionally, the examiner should take the individual’s developmental
status and cultural background into consideration when examining responses. All of these
cautions should be considered when an examiner is using the TAT.
Criticisms
Like other projective techniques, the TAT has been criticized on the basis of poor
psychometric properties (see above). Criticisms include that the TAT is unscientific because it
cannot be proved to be valid (that it actually measures what it claims to measure), or reliable
(that it gives consistent results over time). As stories about the cards are a reflection of both the
conscious and unconscious motives of the storyteller, it is difficult to disprove the conclusions of
the examiner and to find appropriate behavioral measures that would represent the personality
traits under examination. Characteristics of the TAT that make conclusions based on the stories
yielded from TAT cards hard to disprove have been termed "immunizing tactics." These
characteristics include the Walter Mitty effect (i.e., the assertion that individuals will exhibit high
levels of a given trait in TAT stories that do not match their overt behavior because TAT
responses may represent how a person wishes they were, not how they truly are) and the
inhibition effect (i.e., the assertion that individuals will not exhibit high levels of a trait in TAT
responses because they are repressing that trait). In addition, as the present needs of the
storyteller change over time, it is not expected that later stories will produce the same results.
The lack of standardization of the cards given and scoring systems applied is problematic
because it makes comparing research on the TAT very difficult. With a dearth of sound evidence
and normative samples, it is tough to determine how much useful information can be gathered in
this manner.
Some critics of the TAT cards have observed that the characters and environments are dated,
even 'old-fashioned', creating a 'cultural or psycho-social distance' between the patients and the
stimuli that makes identifying with them less likely. In specific situations it is even hard to
identify with people of opposite gender. Also, in researching the responses of subjects given
photographs versus the TAT, researchers found that the TAT cards evoked more 'deviant' stories
(i.e., more negative) than photographs, leading researchers to conclude that the difference was
due to the differences in the characteristics of the images used as stimuli.
In a 2005 dissertation, Matthew Narron, Psy.D., attempted to address these issues by reproducing
a Leopold Bellak 10-card set photographically and performing an outcome study. The study
concluded that the old TAT elicited answers that included many more specific time references
than the new TAT.
Contemporary applications
Despite criticisms, the TAT continues to be used as a tool for research into
areas of psychology such as dreams, fantasies, mate selection and what motivates people to
choose their occupation. Sometimes it is used in a psychiatric or psychological context to
assess personality disorders, thought disorders, in forensic examinations to evaluate crime
suspects, or to screen candidates for high-stress occupations. It is also commonly used in routine
psychological evaluations, typically without a formal scoring system, as a way to explore
emotional conflicts and object relations.
TAT is widely used in France and Argentina using a psychodynamic approach.
David McClelland and Ruth Jacobs conducted a 12-year longitudinal study of leadership using
TAT and found no gender differences in motivational predictors of attained management level.
The content analysis, however, "revealed 2 distinct styles of power-related themes that
distinguished the successful men from the successful women. The successful male managers
were more likely to use reactive power [that is, aggressive] themes while the successful female
managers were more likely to use resourceful [that is, nurturing] power themes. Differences
between the sexes in the power themes were less pronounced among the managers who had
remained in lower levels of management."
Popular culture
Due to the test's earlier popularity within psychology, the TAT has appeared in
a wide variety of media. For example, the Thomas Harris novel Red Dragon (1981) includes a
scene where the imprisoned psychiatrist and serial killer Dr. Hannibal Lecter mocks a previous
attempt to administer the test to him, while Michael Crichton included the TAT in the battery of
tests given to the disturbed main character Harry Benson in his novel The Terminal Man (1972).
The test is also given to the main characters in two widely differing tales about the human
mind: A Clockwork Orange (1962) and Daniel Keyes's Flowers for Algernon (1958–1966).
Italian poet Edoardo Sanguineti wrote a collection of poetry called T.A.T (1966–1968) that refers
to the Test.
References.
https://www.encyclopedia.com/medicine/psychology/psychology-and-psychiatry/thematic-apperception-test
https://www.mentalhelp.net/psychological-testing/thematic-apperception-test/
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental
disorders (Revised 4th ed.). Washington, DC: Author.
American Psychological Association. (2002). Ethical Principles of Psychologists and
Code of Conduct. Washington, DC: Author.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory Manual, 2nd
Edition. San Antonio, TX: Psychological Corporation.
Beck, A. T., Steer, R. A., & Garbin, G. M. (1988). Psychometric properties of the Beck
Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8,
77-100.
Beck, A. T., & Beamesderfer, A. (1974). Assessment of depression: The depression
inventory. Mod Probl, 7, 151-169.
Beck, J. S., Beck, A. T., & Jolly, J. B. (2001). Beck Youth Inventories. San Antonio, TX:
Psychological Association.
Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the
Beck Depression Inventory-II. Psychological Assessment, 10, 83-89.
Gallagher, D., Breckenridge, J., Steinmetz, J., et al. (1982). The Beck Depression Inventory
and Research Diagnostic Criteria: Congruence in an older population. J Consult Clin
Psychol, 51, 945-946.
Gregory, R. J. (2007). Psychological testing: History, principles, and applications (5th
ed.). Boston: Pearson Education, Inc.
Leigh, I. W., & Anthony-Tolbert, S. (2001). Reliability of the BDI-II with deaf persons.
Rehabilitation Psychology, 46, 195-202.
Osman, A., Kopper, B. A., Barrios, F., et al. (2007). Reliability and Validity of the Beck
Depression Inventory-II with Adolescent Psychiatric Inpatients. Psychological
Assessment, 16, 120-132.
Ward, L. C. (2006). Comparison of Factor Structure Models for the Beck Depression
Inventory-II. Psychological Assessment, 28, 81-88.