
Topic: INTRODUCTION TO PSYCHOLOGICAL TESTING

Handout #1: Psychological Testing (Lec)


Instructor: Joselito Miguel C. Sibayan III

BASIC CONCEPTS OF PSYCHOLOGICAL TESTING

Test. A measurement device or technique used to quantify behavior or aid in the understanding and prediction of behavior.

Item. A specific stimulus to which a person responds overtly; this response can be scored or evaluated (for example, classified, graded on a scale, or counted).

Scale. Refers to a group of items that pertain to a single variable and are arranged in order of
difficulty or intensity. The process of arriving at the sequencing of the items is called scaling.

Battery. A group of several tests or subtests that are administered at one time to one person. The term is often used in test titles.

Measurement. A process of measuring an individual's intelligence, achievement, personality, attitudes, values, and anything else that can be expressed quantitatively.

Evaluation. A continuous process of determining the extent to which instructional objectives are attained.

Standardization. Can refer to (1) the uniformity of procedure in all important aspects of the
administration, scoring, and interpretation of tests; and (2) the use of standards for
evaluating test results. The National Achievement Test, which is administered to thousands of high school students in the Philippines, provides a good example of standardization.

These standards are most often norms derived from a group of individuals, known as the
normative sample in the process of developing a psychological test.

Psychological tests are measurement instruments that have five defining elements:

1. Psychological tests are systematic procedures.

2. Psychological tests measure a sample of behavior.

3. The behaviors sampled by tests are relevant to cognitive or affective functioning, or both. The behaviors can be either overt or covert.

4. Test results are evaluated and scored.

5. To evaluate test results, it is necessary to have standards based on empirical data.

Psychological testing, therefore, refers to all the possible uses, applications, and underlying
concepts of psychological and educational tests. The main use of these tests is to evaluate
individual differences or variations among individuals.
PSYCHOLOGICAL TESTING VS PSYCHOLOGICAL ASSESSMENT

• The process of psychological assessment can occur in health care, counseling, or forensic
settings, as well as in educational and employment settings.

• Psychological assessment is a flexible process aimed at reaching a defensible determination concerning one or more psychological issues or questions, through the collection, evaluation, and analysis of data appropriate to the purpose at hand.

Examples of issues that require investigation through psychological assessment:

• Diagnostic questions, such as differentiating between depression and dementia.

• Making predictions, such as estimating the likelihood of suicidal or homicidal behaviors.

• Evaluative judgements, such as those involved in child custody decisions or in assessing the
effectiveness of programs or interventions.

Typical differences between Psychological Testing and Assessment

Basis | Psychological Testing | Psychological Assessment
Degree of complexity | Simpler; involves one uniform procedure. | More complex; each assessment involves various procedures and dimensions.
Duration | Shorter, lasting from a few minutes to a few hours. | Longer, lasting from a few hours to a few days or more.
Qualifications for use | Knowledge of tests and testing procedures. | Knowledge of testing and other assessment methods as well as of the area assessed.
Key to the process | Not the tester; anyone can be a tester without affecting the results. | The assessor.
Purpose | To measure an ability or attribute (usually numerical). | To answer a referral question, solve a problem, or arrive at a decision.
Degree of structure | Highly structured. | Entails both structured and unstructured aspects.
Evaluation of results | Relatively simple investigation of reliability and validity based on group results. | Very difficult due to variability of methods, assessors, nature of presenting questions, etc.

TYPES OF TESTS

• Individual Tests. Those that can be given to only one person at a time.

• Group Tests. Can be administered to more than one person at a time by a single examiner,
such as when an instructor gives everyone in the class a test at the same time.

• Tests of Ability. The faster or the more accurate the test taker’s responses, the better his/
her scores on a particular characteristic.
‣ Achievement Tests. Refer to a measure of previous learning.

‣ Aptitude Tests. Refer to the potential for learning or acquiring a specific skill.

‣ Intelligence Tests. Traditionally distinguished from achievement and aptitude tests, these refer to a person's general potential to solve problems, adapt to changing circumstances, think abstractly, and profit from experience.

• Tests of Typical Performance. Used to investigate not what a person can do, but what he
usually does.

‣ Personality Tests. Are related to the overt and covert dispositions of the individual.
For example, the tendency of a person to show a particular behavior or response in a
given situation. Personality tests usually measure typical behavior.

✓ Structured. Provide a statement, usually of the "self-report" variety, and require the subject to choose between two or more alternative responses such as "True" or "False."

✓ Projective. In this type of personality test, either the stimulus (test materials), the required response, or both are ambiguous or unclear. Unlike structured personality tests, projective types ask the individual to give a spontaneous response. One example is the famous Rorschach Inkblot Test.

‣ Interest Inventories. Measure an individual's preference for certain activities or topics, and thereby help determine occupational choices.

‣ Creativity Tests. They emphasize novelty and originality in the solution of problems
or in the production of artistic works.

• Speed Tests. Those in which the test taker must, within a limited amount of time, answer a series of questions or tasks of uniformly low difficulty.

• Power Tests. Contain more items, which are more difficult, with a time limit generous enough that a very large percentage of test takers have ample time to complete all of the items.

PARTICIPANTS IN THE TESTING PROCESS AND THEIR ROLES

• Test authors. They conceive, prepare, and develop tests. They also find a way to
disseminate their tests, by publishing them either commercially or through professional
publications such as books or periodicals.

• Test publishers. They publish, market, and sell tests, thus controlling their distribution.

• Test reviewers. They prepare evaluative critiques of tests based on their technical and
practical merits.

• Test users. They select or decide to use a specific test off the shelf for some purpose. They may also participate in other roles, e.g., as examiners or scorers.
• Test administrators. They administer the test either to one individual at a time or to groups. They are also referred to as examiners.

• Test takers. They take the test by choice or necessity.

• Test scorers. They tally the raw responses of the test taker and transform them into test
scores through objective or mechanical scoring or through the application of evaluative
judgments.

• Test score interpreters. They interpret test results to their ultimate consumers, who may be individual test takers or their relatives, other professionals, or organizations of various kinds.
Topic: HISTORY OF PSYCHOLOGICAL TESTING
Handout #2: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III

HISTORICAL PERSPECTIVE

• Early antecedents

‣ Han Dynasty (206 BCE - 220 CE). The use of test batteries was quite common.
These early tests related to such diverse topics as civil law, military affairs,
agriculture, revenue, and geography.

‣ Ming Dynasty (1368-1644 CE). Tests had become quite well developed. During this
period, a national multistage testing program involved local and regional testing
centers equipped with special testing booths. The purpose of this program was to select people eligible for public office.

‣ 1832. British missionaries and diplomats encouraged the English East India Company
to copy the Chinese system as a method of selecting employees for overseas duty.

‣ 1883. The US government established the American Civil Service Commission, which
developed and administered competitive examinations for certain government jobs.

• Charles Darwin and Individual Differences

‣ 1869. Sir Francis Galton published his book Hereditary Genius. It contained theories
in which he set out to show that some people possessed characteristics that made
them more fit than others.

‣ 1883. Galton started a series of experimental studies to document the validity of his
position, by demonstrating that individual differences exist in human sensory and
motor functioning.

‣ 1890. The term mental test was coined by James McKeen Cattell, which was inspired
by Galton’s efforts.

• Experimental Psychology and Psychophysical Measurement

‣ Johann Friedrich Herbart. He used mathematical models of the mind as the basis
for educational theories that strongly influenced 19th-century educational practices.

‣ Gustav Fechner. He devised the law that the strength of a sensation grows as the logarithm of stimulus intensity (S = k log I), thus coining the term psychophysics.

‣ Guy Montrose Whipple. He conducted a seminar at the Carnegie Institute of Technology in 1919-1920, where he provided the basis for immense changes in the field of psychological testing. From this seminar came the Carnegie Interest Inventory and later the Strong Vocational Interest Blank.

• World War I

‣ Robert Yerkes. Headed a committee of distinguished psychologists who soon developed two structured group tests of human abilities: the Army Alpha and the Army Beta.
‣ Ever since the Army Alpha and Army Beta, the war fueled the widespread
development of group tests.

‣ The scope of testing also broadened to include tests of achievement, aptitude, interest, and personality.

• Achievement Tests

‣ 1923. The development of standardized achievement tests culminated in the publication of the Stanford Achievement Test (SAT).

‣ 1930s. It was widely held that the objectivity and reliability of the new standardized
tests made them superior to essay tests.

• Personality Tests: 1920-1940

‣ Traits. Relatively enduring dispositions that distinguish one individual from another.
They are stable, and a collection of them form a psychological type. Optimism and
pessimism can be viewed as traits.

‣ Woodworth Personal Data Sheet. The first structured personality test, which was
developed during World War I and was published in final form after the war.

‣ Rorschach Inkblot Test. It was first published by Hermann Rorschach of Switzerland in 1921. The test came to the United States several years later, when David Levy introduced it.

‣ Thematic Apperception Test. A test developed by Henry Murray and Christina Morgan in 1935. It was more structured than the Rorschach and purported to measure human needs, and thus to ascertain individual differences in motivation.

• The Emergence of New Approaches to Personality Testing

‣ Minnesota Multiphasic Personality Inventory (MMPI). In 1943, this test initiated a new era for structured personality tests. It used empirical methods to determine the meaning of a test response, which helped revolutionize structured personality tests.

‣ Factor analysis. A method of finding the minimum number of dimensions (known as factors) to account for a large number of variables.

‣ 1940s. J. P. Guilford made the first serious attempt to use factor analysis in the development of a structured personality test. By the end of this decade, Raymond B. Cattell had introduced the Sixteen Personality Factor Questionnaire (16PF), which is considered one of the most well-constructed structured personality tests.
Topic: TESTING AND SOCIETY
Handout #3: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III

TYPES OF DECISIONS

• Individual decisions
‣ Tests can be used to counsel or advise the examinee; presumably, the results of these
tests have some influence on the actions or decisions of the examinee.
‣ An example is the use of vocational interest tests in order to aid high school or
college students in deciding for their career path/s.
• Institutional decisions
‣ In educational settings, these decisions include those concerning admission,
placement, in either advanced or remedial programs, and the advancement or
retention of students.
‣ In industrial settings, these decisions include those concerning personnel selection,
identification of fast-track employees, placement in training programs, and
evaluations of job performance and promotability.
• Comparative decisions
‣ Involve comparisons of two or more people, actions, objects, options, and so forth.
‣ Personnel selection is a good example of a comparative decision on the part of the
organization.
‣ Require less information than would be required for an absolute decision.
• Absolute decisions
‣ Involve decisions about a single person, option, or object; instead of having to choose
between two well-defined options.
‣ Often require more precise measurement than is required for comparative decisions.

SOCIETAL CONCERNS

• Ability testing
‣ Over the last 50 years, the driving issue of this debate has been the existence and meaning of race-based, ethnicity-based, and gender-based differences in test scores.
‣ Are the differences real?
✓ These differences may be due to bias in the tests. If this explanation were
true, then the tests themselves would be the cause of unfair outcomes, such
as the loss of jobs, scholarships, and other opportunities.
‣ Are the differences large?
✓ For instance, some researchers suggest that there are gender differences in
verbal ability, but these are so small as to be of little consequence.
✓ A variation on this same theme is the debate over whether cognitive ability
is really that important in the real world. It is widely accepted that cognitive
ability is important in school, but critics suggest that it has little relevance
in other settings.
‣ Do tests help or hurt?
✓ Researchers argue that tests provide opportunities for talented members of
underrepresented groups to demonstrate their abilities and that without
tests, it would be very difficult for any member of a disadvantaged group to
get ahead.
‣ Efficiency versus equity
✓ Tests generally contribute to the efficiency of the workforce, because they lead to the selection of more productive workers. On the other hand, tests may reduce equity in the assignment of jobs in the sense that they may lead to the hiring of fewer minorities.
• Invasion of privacy
‣ Confidentiality
✓ One potential concern is that the results of psychological tests might
become available to people who have no legitimate use for these results.
✓ It is important to realize that virtually all testing professionals accept the
idea that the results of a psychological test should never be broadcast
indiscriminately; this is especially true of tests whose content or results are
sensitive.
✓ Refer to Standard IV and VI B. of the Psychological Association of the
Philippines Code of Ethics.
✓ Sample case: When safekeeping data, it is important to remember that:
(a) Electronic passwords should not be used in case the safe keeper loses
access.
(b) Sealed cartons are better than steel cabinets because they do not
become rusty.
(c) Clients should be reminded of the danger of transmitting results
through e-mail.
(d) None of the above
‣ Informed Consent
✓ Some psychological tests and assessment procedures involve deception. For
example, some honesty and integrity tests appear to be nothing more than
surveys of beliefs and experiences.
✓ The subjects must be informed of the general purpose and nature of the
research, as well as of the possible dangers and threats involved.
✓ Refer to Standard III J. of the Psychological Association of the Philippines
Code of Ethics.
• Fair use of tests
‣ We can clarify the debate over test fairness by making a distinction between fairness
of the testing process and fairness of the testing outcomes.
‣ There might be obstacles that prevent some persons from performing well, and the
test administration will probably be viewed as unfair.
‣ Tests may be used for purposes that are inherently objectionable. Several reviewers have suggested that occupational and professional licensing examinations often serve no legitimate purpose and are used merely to restrict the number of persons in a job or an occupation.
Topic: RELIABILITY & VALIDITY
Handout #4: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III

DEFINING RELIABILITY

• According to Toplis, Dulewicz & Fletcher (2005), reliability refers to the stability and
consistency of test results obtained.

• Murphy & Davidshofer (2005) believe that the reliability or consistency of test scores is
critically important in determining whether a test can provide good measurement.

• For Kaplan & Saccuzzo (2010), tests that are relatively free of measurement error are deemed to be reliable.

HISTORY AND THEORY OF RELIABILITY

• 1733. Abraham De Moivre introduced the basic notion of sampling error.

• 1896. Karl Pearson developed the product moment correlation.

• 1904. British psychologist Charles Spearman worked out most of the basics of
contemporary reliability theory and published his work in an article entitled The Proof and
Measurement of Association between Two Things. He was responsible for the advanced
development of reliability assessment.

• 1937. Kuder and Richardson published an article that introduced several new reliability
coefficients.

Item Response Theory. Takes advantage of computer technology to advance psychological measurement significantly. It states that each item on a test has its own characteristic curve that describes the probability of getting each particular item right or wrong given the ability of each test taker.

Classical Test Theory. Assumes that each person has a true score that would be obtained if there were no errors in measurement. The difference between the true score (T) and the observed score (X) results from measurement error (E): X = T + E.

ESTIMATING RELIABILITY

• Test-retest reliability

‣ Considers the consistency of the test results when the test is administered on
different occasions or two different times.
‣ Tests that measure some constantly changing characteristic are NOT appropriate for this type of evaluation (e.g., the Rorschach Inkblot Test).
• Parallel form method (also Equivalent form method)
‣ Evaluating the test across different forms of the test (two independent tests with the
measure of the same attribute/s).
‣ Pearson product moment correlation coefficient (also Pearson r) is used as an
estimate.
• Internal Consistency. Refers to the consistency of test results, ensuring that the various items measuring the same construct deliver consistent scores; intercorrelations among items within the same test. (A computational sketch follows this list.)
‣ Split-half method
✓ A single test is given and divided into halves that are scored separately.
✓ Split using the odd-even system, fishbowl, or balancing the halves by difficulty.
✓ The Spearman-Brown formula is used as an estimate.
‣ Kuder-Richardson 20 (KR20)
✓ Used for tests with dichotomous items; should NOT be used with personality tests and attitude scales.
✓ Horst's Modification Formula can be used as an alternative.
‣ Cronbach's Alpha
✓ Averages the correlation between every possible combination of split-halves, and allows multi-level responses.
✓ 0.7 is generally accepted as a sign of good reliability.
‣ Average Inter-item Correlation
✓ Uses all of the items on the instrument that are designed to measure the same construct, where the correlation for each pair of items is computed.
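A computational sketch of two of these estimates for a tiny made-up data set of dichotomous item scores; the Spearman-Brown correction and Cronbach's alpha formulas are standard, but the data and names are hypothetical:

```python
# Rows = examinees, columns = items scored 0/1 (made-up data).
scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Split-half: odd vs. even items, corrected with Spearman-Brown.
odd = [sum(row[0::2]) for row in scores]
even = [sum(row[1::2]) for row in scores]
r_half = pearson_r(odd, even)
spearman_brown = 2 * r_half / (1 + r_half)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
k = len(scores[0])
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(f"Split-half (Spearman-Brown): {spearman_brown:.2f}")
print(f"Cronbach's alpha: {alpha:.2f}")
```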

• Inter-rater reliability
‣ Two different raters/observers with the same training/orientation should get the
same results in measuring reliability.
‣ It tests how similarly people categorize items and how similarly people score items.
‣ Cohen's kappa coefficient is used as an estimate (see the sketch below).
✓ If κ = 1, the raters are in complete agreement.
✓ κ > 0.75 = excellent agreement; 0.40 - 0.75 = fair; < 0.40 = poor
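A minimal sketch of Cohen's kappa for two hypothetical raters; the ratings are made up for illustration:

```python
# Two raters classify the same 10 cases into categories.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no",  "no", "yes", "no", "no", "yes", "yes", "yes", "yes"]

n = len(rater_a)
categories = set(rater_a) | set(rater_b)

# Observed agreement: proportion of cases where the raters match.
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement, from each rater's marginal proportions.
p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
          for c in categories)

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")  # 1.0 would mean complete agreement
```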

WHAT TO DO ABOUT LOW RELIABILITY?

• Increase the number of items.

• Remove items that hinder reliability by using factor analysis and discriminability analysis
(correlation between each item and the total score).

• Correction for attenuation: estimating the correlation between two measures as if there were no measurement error. The corrected correlation is rxy / sqrt(rxx × ryy), where rxx and ryy are the reliabilities of the two measures.

DEFINING VALIDITY

• According to Murphy & Davidshofer (2005), a test is valid if it can be used to make correct
or accurate decisions.

• To Kaplan & Saccuzzo (2010), validity can be defined as the agreement between a test score or measure and the quality it is believed to measure.

• “Does the test measure what it is supposed to measure?”


ASPECTS OF VALIDITY

• Face validity
‣ The mere appearance that a measure has validity.
‣ It is really not validity at all because it does not offer evidence to support conclusions drawn from test scores, though this does not mean that face validity is unimportant.

• Content Validity
‣ It is established by showing that the behaviors sampled by the test are a
representative sample of the attribute being measured.
‣ A content domain represents the total set of behaviors that could be used to measure
a specific attribute or characteristic of individuals that are to be tested.
‣ Factor analysis and expert judgement are some methods to assess content validity.
‣ Construct underrepresentation describes the failure to capture important
components of a construct.
‣ Construct-irrelevant variance occurs when scores are influenced by factors irrelevant
to the construct.

• Criterion Validity
‣ Tells how well a test corresponds with a particular criterion (known standard).
‣ Such evidence is provided by high correlations between a test and a particular
criterion.
‣ Predictive validity is known as a forecasting function of tests.
‣ Concurrent validity comes from assessments of the simultaneous relationship between the test and the criterion (such as between a learning disability test and school performance). Thus, it applies when the test and the criterion can be measured at the same time.

• Construct Validity
‣ Assembled evidence about what a test means; it is established by showing the relationship between a test and other tests and measures.
‣ It is established through a series of activities in which a researcher simultaneously
defines some construct and develops the instrumentation to measure it.
‣ A construct is something built by mental synthesis. It can be broken down into its component parts, known as domains.
‣ Convergent validity is obtained when a measure correlates well with other tests
believed to measure the same construct.
‣ Divergent validity occurs when a test should have low correlations with measures of
unrelated constructs, or evidence for what the test does not measure. It also
describes the uniqueness of a test and answers the question “Why should we create a
test if there is already one available to do the job?”

“A test can be reliable without being valid, but a test cannot be valid without being reliable.”
Topic: THEORIES OF INTELLIGENCE & THE BINET SCALES
Handout #5: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III

DEFINING INTELLIGENCE
• Alfred Binet: “The tendency to take and maintain a definite direction; the capacity to
make adaptations for the purpose of attaining a desired end, and the power of engaging in
self-criticism so that necessary adjustments in strategy can be made.”
• Charles Spearman: “The ability to educe either relations or correlates.”
• Freeman (1955): “Adjustment or adaptation of the individual to his total environment,”
“ability to learn,” and “the ability to carry on abstract thinking.”
• Das (1973): “The ability to plan and structure one’s behavior with an end in view.”
• Howard Gardner: “The ability to resolve genuine problems or difficulties as they are
encountered.”
• Sternberg (1986, 1988): “Mental activities involved in purposive adaptation to, shaping of, and selection of real-world environments relevant to one’s life.”
• Anderson (2001): “Intelligence is two-dimensional and based on individual differences in information-processing speed and executive functioning influenced largely by inhibitory processes.”

RESEARCH TRADITIONS USED TO STUDY INTELLIGENCE


• Psychometric approach. Examines the elemental structure of a test.
• Information-processing approach. Examines the processes that underlie how we learn and
solve problems.
• Cognitive approach. Focuses on how humans adapt to real-world demands.

BINET’S PRINCIPLES OF TEST CONSTRUCTION


• Principle 1: Age differentiation
‣ Refers to the simple fact that one can differentiate older children from younger
children based upon their mental capabilities.
‣ Most 9-year-olds can tell that 10 pesos is worth more than one peso, that a peso is worth more than 50 centavos, and so on, while most 4-year-olds cannot.
‣ Mental age. The equivalent age capabilities of a child regardless of his or her
chronological age.
• Principle 2: General Mental Ability
‣ Refers to the total product of the various separate and distinct elements of
intelligence.
‣ Binet freed himself from the burden of identifying each element or independent
aspect of intelligence, as well as finding the relations for each.

SPEARMAN’S MODEL OF GENERAL MENTAL ABILITY


• Charles Spearman advanced the notion of a general mental ability factor underlying all
intelligent behavior. According to his theory, intelligence consists of one general factor (g)
plus a large number of specific factors.
• Positive manifold. According to this phenomenon, when a set of diverse ability tests is administered to large samples of the population, almost all of the correlations are positive. According to Spearman, this results from the fact that all tests, no matter how diverse, are influenced by g.
• Recent theories of intelligence have suggested that human intelligence can best be conceptualized in terms of multiple intelligences rather than a single score. One such theory is called the gc-gf theory.
• Fluid intelligence. Abilities that allow us to reason, think, and acquire new knowledge.
• Crystallized intelligence. Represents the knowledge and understanding that we have acquired.

Note: A representative sample is one that comprises individuals similar to those for whom the test is to be used.
Note: An age scale is a scale where items are grouped according to age level rather than simply one set of items of increasing difficulty.
Note: Mental age was based on a subject's performance compared with the average performance of individuals in a specific chronological age group.
THE EARLY BINET SCALES
• 1905 Binet-Simon Scale
‣ 30 items in increasing order of difficulty.
‣ Used outdated terms such as idiot (severe), imbecile (moderate), and moron (mild).
‣ The first major measure of human intelligence.
‣ Lacked an adequate measuring unit to express results, lacked a normative sample (only 50 children), and had limited validity.
• The 1908 Scale
‣ Retained the principle of age differentiation.
‣ Used the age scale format.
‣ Introduced the concept of mental age.

Note: The concept of the intelligence quotient (IQ) used a subject's mental age in conjunction with his/her chronological age. To calculate IQ: (1) determine the subject's chronological age (CA) from his/her birthday; (2) determine the subject's mental age (MA) from his/her score on the scale; (3) divide MA by CA and multiply the result by 100 to eliminate fractions.
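A small Python sketch of the ratio IQ computation described in the note above; the ages are hypothetical:

```python
# Classic ratio IQ: (mental age / chronological age) * 100.
def ratio_iq(mental_age, chronological_age):
    return round(mental_age / chronological_age * 100)

# Hypothetical children:
print(ratio_iq(10, 8))   # 125 (MA ahead of CA)
print(ratio_iq(6, 8))    # 75  (MA behind CA)
print(ratio_iq(8, 8))    # 100 (MA equal to CA)
```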
TERMAN’S STANFORD-BINET INTELLIGENCE SCALE
• The 1916 Stanford-Binet Intelligence Scale (First Edition)
‣ The SB1 is a revised version of the Binet-Simon Scale done by Lewis M. Terman.
‣ Retained principles of age differentiation, general mental ability, age scale, and
mental age concept.
‣ Age range increased from only 3-14 years of age, up to average and superior adults.
‣ Provided the first significant application of the now outdated intelligence quotient.
‣ Increased size of the standardization sample, although all of them consisted of white
native-Californian children.
‣ People believed that mental age ceased to improve after 16 years of age, so 16 was
used as the maximum chronological age.
• The 1937 Scale (Second Edition)
‣ Maud Merrill was a student of Terman, and quickly became a professor at Stanford
University, where they started the revisions of the second edition together.
‣ Retained the same principles as the 1916 scale, along with the IQ concept.
‣ Extended the age range down to the 2-year-old level, and increased the maximum
possible mental age to 22 years, 10 months.
‣ Performance items were added. However, only some 25% of the items were
nonverbal, so the test was not balanced.
‣ The standardization sample came from 11 US states representing a variety of regions.
However, the sample only included whites and more urban subjects than rural ones.

• The 1960 and 1973 Stanford-Binet Revision (Third Edition)


‣ When Terman died in 1956, the revisions for the third edition went underway, and
Merrill was able to publish the final revision in 1960.
‣ Retained the same principles as the 1916 scale, but the ratio IQ concept was dropped.
‣ The IQ tables (maximum chronological age) were extended from 16 to 18.
‣ Added the concept of the deviation IQ, a standard score with a mean of 100 and a standard deviation of 16.
‣ In 1972, a new standardization group consisting of a representative sample of 2,100
children had been obtained for use with the 1960 revision. The 1972 norms contained
nonwhites (including black and Spanish-surnamed individuals) and whites.

THURSTONE’S MULTIDIMENSIONAL MODEL


• Louis Thurstone disagreed with Spearman's idea of a single g score, the one number that defined how intelligent we are. He did not think one number was enough to describe a person's intelligence.

THE MODERN BINET SCALE (Fourth and Fifth Editions)


• The 1986 Revision (Fourth Edition)
‣ Robert Thorndike was asked to take over after Merrill's retirement. With the help of Elizabeth Hagen and Jerome Sattler, Thorndike produced the SB4 in 1986.
‣ Retained the principles of age differentiation, evaluation of general mental ability, and the use of standard scores, while the concept of age scales was removed.
‣ A point scale is the currently widespread arrangement of tests into subtests, with all items of a given type administered together. An age scale was used to provide a direct translation of the child's performance to mental age. The point scale was used for the SB4.
‣ Contained 4 major content areas and 15 separate subtests:
1. Working Memory: Bead Memory, Memory for Sentences, Memory for Digits, Memory for Objects
2. Quantitative Reasoning: Quantitative, Number Series, Equation Building
3. Verbal Reasoning: Vocabulary, Comprehension, Absurdities, Verbal Relations
4. Abstract/Visual Reasoning: Pattern Analysis, Copying, Matrices, Paper Folding and Cutting
‣ Introduced the concept of adaptive testing in order to gauge the age group for the test. The Vocabulary subtest serves as a "routing" measure at the beginning of each assessment; performance on this subtest is used to determine the appropriate entry level for succeeding subtests.

Note: The basal level refers to the minimum number of correct responses that is obtained. The ceiling level refers to the number of incorrect responses that indicate that the items are too difficult.

Hierarchical structure of the SB4: General Intelligence at the top, branching into the four content areas (Verbal Reasoning, Abstract/Visual Reasoning, Quantitative Reasoning, and Working Memory).


• The Stanford-Binet Intelligence Scale: Fifth Edition (2003)
‣ The fifth edition of the SB scale was published by Gale H. Roid, who was also an
author/co-author of 7 other published tests, like the Tennessee Self-Concept Scale.
‣ This edition represents a good integration of the age scale and point scale formats.
‣ The examination begins with one of two routing measures: one verbal, one
nonverbal. The purpose of routing tests is to estimate the examinee’s level of ability.
‣ The routing tests are organized in a point scale format, which means that each
contains similar content of increasing difficulty. The remaining subtests are arranged
in an age scale format, which means that tasks of differing content are grouped on
the basis of difficulty.
‣ Examiners can compute scaled scores for each of the 5 nonverbal subtests (matrices tasks, etc.) and each of the 5 corresponding verbal subtests (analogies, etc.). These scaled scores have a mean of 10 and a standard deviation of 3.
‣ The standard scores for nonverbal IQ (NVIQ), verbal IQ (VIQ), full scale IQ (FSIQ), and
each of the five factors (FRIQ, KNIQ, QRIQ, VSIQ, and WMIQ) have a mean of 100 and
a standard deviation of 15.

Factor | Nonverbal task (scored as NVIQ) | Verbal task (scored as VIQ)
Fluid Reasoning (FR) | Matrices Tasks | Analogies
Knowledge (KN) | Recognize absurdities in pictures | Vocabulary
Quantitative Reasoning (QR) | Quantitative Reasoning | Verbal Quantitative Reasoning
Visual/Spatial Reasoning (VS) | Form Board | Positions and Directions
Working Memory (WM) | Block Pattern Memory | Sentence Memory

Verbal and nonverbal tasks of the SB5.


Topic: WRITING AND EVALUATING TEST ITEMS
Handout #1: Psychological Testing (Lab)
Instructor: Joselito Miguel C. Sibayan III

GUIDELINES IN ITEM WRITING


When a professor announces that there will be a test, one of the first questions is
“What kind of test?” Will it be true-false, multiple-choice, essay, or fill-in-the-blank? The test
constructor must determine the best format for getting these responses.
Here are some general considerations in writing test items:
• Carefully define your instructional objectives.
• Prepare a table of specifications and refer to it as you write the test items.
• Formulate well-defined questions. Avoid vague, ambiguous and too global questions.
• Prepare a scoring key and guide, preferably as the item is being written.
• Create an item pool, meaning that you should prepare more items than are actually
needed.
TABLE OF SPECIFICATIONS
The TOS is a test blueprint or guide which identifies what the test intends to measure (knowledge, attitude, skills) and to what extent. It guarantees an adequate sampling of instructional objectives in terms of learning tasks, equitably represented and distributed in the test. A sample TOS is shown below.

PARTS OF TABLE OF SPECIFICATIONS


• Course Content or General Topics
• Behavior or Learning Tasks
• General Objectives
• Specific Objectives
• Number of Items
• Percentage of Items

STEPS IN ITEM ANALYSIS

1. Arrange all the scores from highest to lowest.
2. Determine the top 27% of scorers (the upper group) and the bottom 27% (the lower group).
3. Compute the proportion passing for the upper group and the lower group:

pu = (number of correct responses in the upper group) / (total number of individuals in the upper group)
pl = (number of correct responses in the lower group) / (total number of individuals in the lower group)

4. Compute the difficulty index (p) of the item: p = (pu + pl) / 2
5. Determine the difficulty level of the item:
‣ p = 0.76 or higher (EASY ITEM)
‣ p = 0.25 to 0.75 (AVERAGE ITEM)
‣ p = 0.24 and below (DIFFICULT ITEM)
6. Determine the discrimination index (D) for each item, computed as D = pu - pl:
‣ 0.4 and above (VERY GOOD ITEM)
‣ 0.3 to 0.39 (REASONABLY GOOD ITEM)
‣ 0.2 to 0.29 (MARGINAL ITEM)
‣ 0.19 and below (POOR ITEM)

DECISIONS FOR EACH ITEM

D | p | Decision
Poor Item | Easy, Average, or Difficult | DISCARD
Marginal Item | Easy, Average, or Difficult | MODIFY
Reasonably Good Item | Easy, Average, or Difficult | RETAIN
Very Good Item | Easy, Average, or Difficult | RETAIN

A computational sketch of these steps follows.
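The sketch below works through the steps above for a single item in Python; the response patterns are made up, and the cutoffs are the ones listed above:

```python
# 1 = correct, 0 = incorrect; groups are the top and bottom 27% of scorers.
upper_group = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # high scorers
lower_group = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]   # low scorers

pu = sum(upper_group) / len(upper_group)   # proportion passing, upper
pl = sum(lower_group) / len(lower_group)   # proportion passing, lower

p = (pu + pl) / 2    # difficulty index
d = pu - pl          # discrimination index

difficulty = "easy" if p >= 0.76 else "average" if p >= 0.25 else "difficult"
if d >= 0.4:
    quality = "very good"
elif d >= 0.3:
    quality = "reasonably good"
elif d >= 0.2:
    quality = "marginal"
else:
    quality = "poor"

print(f"p = {p:.2f} ({difficulty}), D = {d:.2f} ({quality})")
```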
Virgen Milagrosa University Foundation
COLLEGE OF ARTS AND SCIENCES
Martin P. Posadas Avenue, San Carlos City, Pangasinan

TABLE OF SPECIFICATIONS FOR SCIENCE

Course/Subject: General Science          Date of Examination: March 2, 2016
Course/Year: Grade IX                    Number of Items: 10
Name of Examination: Achievement Test    Total Points/Score: 10

General Objective:
This test measures the knowledge of the students in the subject General Science, focused on the following topics: The Need to Respire, Punnett Square, Bond Formation, Avogadro's Number, Global Warming, Beyond Our Solar System, Biomimicry, and Force, Motion, and Energy.

Specific Objectives:

1. To recall the different phases of respiration.


2. To recognize the purpose or use of the Punnett Square.
3. To identify how some bond formations work.
4. To recall the pioneer behind Avogadro’s number.
5. To further understand the forces of nature, of biomimicry and global
warming.
6. To analyze and differentiate astronomy and astrology.
7. To illustrate some concepts of force, motion, and energy.

CONTENT AREA | K | C | AP | AN | S | E | TOTAL NUMBER OF ITEMS | PERCENTAGE OF ITEMS
The Need to Respire | 1 | | | | | | 1 | 10%
Punnett Square | 1 | | | | | | 1 | 10%
Bond Formation | 1 | | | | | | 1 | 10%
Avogadro's Number | 1 | | | | | | 1 | 10%
Global Warming | | 1 | | | | | 1 | 10%
Beyond Our Solar System | 1 | | | 1 | | | 2 | 20%
Biomimicry | | 1 | | | | | 1 | 10%
Force, Motion, and Energy | | | 1 | | 1 | | 2 | 20%
Total No. of Items | 5 | 2 | 1 | 1 | 1 | | 10 | 100%
SUGGESTIONS FOR PREPARING MULTIPLE-CHOICE ITEMS
• Select a concept or an idea that is important for the examinees to know or to understand.
• Structure the item around one central idea or problem that is clearly presented in the
stem and to which all the options relate in the same way.
• Make sure the item has one and only one correct answer.
• Use language that is simple, direct, and free of ambiguity. Do not make an item a test of
reading ability unless this is the purpose of the question.
• Use charts, tables, graphs, and diagrams freely. However, in using these devices, make sure that each item is independent of the other items in the set.
• Keep the purpose of the item clearly in mind.
• To promote fairness to the population being tested, consider the exposure of the
examinees to the material or subject matter being tapped.
• Aim for a distribution of items in terms of cognitive skills, knowledge, comprehension,
application, analysis, synthesis, and evaluation.
• If you intend for an item to be difficult, make certain it is difficult, because it requires
sophisticated reasoning or understanding of a high level concept, not because it tests
obscure or esoteric subject matter.
• Avoid items or phrases that might be offensive to any ethnic/minority group.
• Concentrate on positive statements and avoid the phrases “least," “not," or “except.”
Source: Behavioral Sciences Department, Don Mariano Marcos Memorial State University

• Use a question format rather than incomplete statements.


• Keep option lengths similar. Avoid making your correct answer the long or short answer.
• Balance the placement of the correct answer.
• Be grammatically correct. Grammatical clues to the answer can be avoided by using simple and unambiguous wording.
• Avoid clues to the correct answer.
• Try to avoid the “All of the above” or “None of the above” options. Students merely need
to recognize two correct options to get the answer correct.
Source: Brigham Young University (2001)
BEHAVIORAL OBJECTIVES
• Knowledge
‣ The knowledge objectives emphasize the processes of remembering. For the purposes of measurement, the recall situation involves little more than bringing to mind or recognizing the appropriate material.
• Comprehension
‣ The comprehension objectives emphasize the type of understanding that an
individual needs in order to know what is being communicated.
• Application
‣ Questions require the student to rearrange the material he/she has learned and then
apply it. Most problems involving substitution of data in a formula or equation that is
not stated are of this type.
• Analysis
‣ Emphasizes the breakdown of a communication into its constituent parts, the identification of the relationships among the parts, and detection of the way in which the parts are organized. The communication may be presented in the form of the details of an experiment, an array of data, steps in an argument, and so forth.
• Synthesis
‣ Emphasizes the putting together of two or more elements or parts in such a way as to constitute a pattern or structure not clearly present before.
• Evaluation
‣ Emphasizes the making of judgements requiring the use of two or more criteria
simultaneously. In a sense, the answering of any question requires the application of
at least one criterion.

TAXONOMY OF ACTION VERBS APPROPRIATE TO THE COGNITIVE DOMAIN

KNOWLEDGE: ACQUIRE, COUNT, DEFINE, DRAW, IDENTIFY, INDICATE, LABEL, LIST, MATCH, NAME, OUTLINE, POINT, QUOTE, READ, RECALL, RECOGNIZE, RECITE, RECORD, REPEAT, STATE, TABULATE, TRACE, WRITE

COMPREHENSION: ASSOCIATE, CLASSIFY, COMPARE, COMPUTE, CONTRAST, CONVERT, DESCRIBE, DIFFERENTIATE, DISCUSS, DISTINGUISH, ESTIMATE, EXTRAPOLATE, PREDICT, REWRITE, TRANSLATE

APPLICATION: APPLY, CALCULATE, CHANGE, CLASSIFY, COMPLETE, DEMONSTRATE, DISCOVER, EMPLOY, EXAMINE, ILLUSTRATE, MANIPULATE, OPERATE, PRACTICE, PREPARE, PRODUCE, RELATE, SOLVE, USE, UTILIZE

ANALYSIS: ANALYZE, CONSTRUCT, DETECT, DIAGRAM, DIFFERENTIATE, EXPLAIN, INFER, OUTLINE, SEPARATE, SUBDIVIDE, SUMMARIZE

SYNTHESIS: ARRANGE, CATEGORIZE, COMBINE, CONSTRUCT, CREATE, DESIGN, DEVELOP, EXPLAIN, FORMULATE, GENERATE, GENERALIZE, INTEGRATE, ORGANIZE, PLAN, PREPARE, PRESCRIBE, PRODUCE, PROPOSE, REARRANGE, RECONSTRUCT, SPECIFY, SUMMARIZE

EVALUATION: APPRAISE, ASSESS, COMPARE, CRITIQUE, DETERMINE, EVALUATE, GRADE, JUDGE, JUSTIFY, MEASURE, RANK, RECOMMEND, SELECT, SUPPORT, TEST
Topic: THE CESD-R SCALE
Handout #2: Psychological Testing (Lab)
Instructor: Joselito Miguel C. Sibayan III

ABOUT THE CESD-R


• The Center for Epidemiologic Studies Depression Scale (CESD or CES-D) was created in 1977 by Lenore Radloff, and was revised in 2004 by William Eaton and others.
• The CESD has been the workhorse of depression epidemiology since its first use in the Community Mental Health Assessment Surveys in the 1970s, and it has been used in the National Health and Nutrition Examination Surveys.
• The scale is well-known and remains as one of the most widely used instruments in the
field of psychiatric epidemiology.

USING THE CESD-R


• The 20 items in the CESD-R measure symptoms of depression in 9 different groups as defined by the American Psychiatric Association Diagnostic and Statistical Manual, fifth edition (DSM-V). These symptom groups are shown below, along with their associated scale item numbers:
‣ Sadness (dysphoria): Items 2, 4, 6
‣ Loss of interest (anhedonia): Items 8, 10
‣ Appetite: Items 1, 18
‣ Sleep: Items 5, 11, 19
‣ Thinking/concentration: Items 3, 20
‣ Guilt (worthlessness): Items 9, 17
‣ Tired (fatigue): Items 7, 16
‣ Movement (agitation): Items 12, 13
‣ Suicidal ideation: Items 14, 15

CALCULATING THE OVERALL SCORE


• The total CESD-R score is calculated as a sum of responses to all 20 questions. As in the
original CESD, the range of possible scores is between 0 and 60.
• The response values for each item are:

Response | Not at all or less than 1 day | 1-2 days | 3-4 days | 5-7 days | Nearly every day for 2 weeks
Items 6, 19, & 20 | 3 | 3 | 2 | 1 | 0
All other items | 0 | 1 | 2 | 3 | 3
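A scoring sketch based on the table above; the function name and the response coding (column index 0-4, in the order the columns appear on the form) are assumptions for illustration:

```python
# Column index 0 = "Not at all or less than 1 day" ... 4 = "Nearly
# every day for 2 weeks". Items 6, 19, and 20 are reverse-scored.
FORWARD = {0: 0, 1: 1, 2: 2, 3: 3, 4: 3}   # all other items
REVERSE = {0: 3, 1: 3, 2: 2, 3: 1, 4: 0}   # items 6, 19, and 20
REVERSED_ITEMS = {6, 19, 20}

def cesdr_total(responses):
    """responses: dict mapping item number (1-20) to column index (0-4)."""
    total = 0
    for item, resp in responses.items():
        table = REVERSE if item in REVERSED_ITEMS else FORWARD
        total += table[resp]
    return total   # possible range: 0-60

# Hypothetical respondent who endorses every item at "3-4 days".
answers = {item: 2 for item in range(1, 21)}
print(cesdr_total(answers))   # 40
```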

DETERMINING CATEGORIES
The determination of possible depressive symptom category is based upon an algorithm with
the following logic:
1. Meets criteria for major depressive episode: Anhedonia or dysphoria nearly every day for the past 2 weeks, plus symptoms in an additional 4 DSM symptom groups.
2. Probable major depressive episode: Anhedonia or dysphoria nearly every day for the past 2 weeks, or 5-7 days in the past week, plus symptoms in an additional 3 DSM symptom groups.
3. Possible major depressive episode: Anhedonia or dysphoria nearly every day for the past 2 weeks, or 5-7 days in the past week, plus symptoms in an additional 2 DSM symptom groups.
4. Sub-threshold depressive symptoms: People who have CESD-style score of at least 16,
but do not meet above criteria.
5. No clinical significance: People who have a total CESD-style score less than 16 across all
20 questions.
Center for Epidemiologic Studies Depression Scale – Revised (CESD-R)

Below is a list of ways you might have felt or behaved. Please check the boxes to tell us honestly how often you have felt this way in the past week or so.

Response options: Not at all or less than 1 day | 1-2 days | 3-4 days | 5-7 days | Nearly every day for 2 weeks

I did not feel like eating; my appetite was poor.

I felt like I could not shake off the blues, even with the help of my family and friends.

I had trouble keeping my mind on what I was doing.

I felt depressed.

My sleep was restless and uneasy.

I felt like I’m satisfied with life.

I could not “get going”.

Nothing made me happy.

I felt like a bad person.

I lost interest in my usual activities.

I slept more than usual.

I felt like I was moving too slowly.


I felt fidgety; my movements seemed like I was very nervous or restless.
I wished I were dead.

I wanted to hurt myself in some way.

I was tired all the time.

I did not like myself.

I lost a lot of weight without trying to.

I had no problem falling asleep.

I was able to focus on important things.

_______________________________________
Signature above printed name (optional)
Topic: THE BECK ANXIETY INVENTORY
Handout #3: Psychological Testing (Lab)
Instructor: Joselito Miguel C. Sibayan III

ABOUT THE BECK ANXIETY INVENTORY


• The BAI, created by Aaron T. Beck and colleagues, is a 21-question multiple-choice self-report inventory used for measuring the severity of anxiety in children and adults.
• It is designed for individuals 17 years of age or older and takes 5 to 10 minutes to complete. Several studies have found the BAI to be an accurate measure of anxiety symptoms in children and adults.

TWO FACTOR APPROACH TO ANXIETY


• Though anxiety can be thought of as having several components, including cognitive,
somatic, affective, and behavioral components, Beck included only two components in the
BAI’s original proposal (cognitive and somatic).
• Cognitive sub-scale: Provides a measure of fearful thoughts and impaired cognitive
functioning.
• Somatic sub-scale: Measures the symptoms of physiological arousal.

CLINICAL USE
• The BAI was specifically designed as “an inventory for measuring clinical anxiety” that
minimizes the overlap between depression and anxiety scales.
• While several studies have shown that anxiety measures, including the State-Trait Anxiety Inventory (STAI), are either highly correlated with or indistinguishable from depression, the BAI is shown to be less contaminated by depressive content.

USING THE BAI & SCORING


• The BAI contains 21 questions, each answered on a 4-point scale from 0 (not at all) to 3 (severely). Higher total scores indicate more severe anxiety symptoms. The standardized cutoffs are:
‣ Score of 0 - 21: low anxiety

‣ Score of 22 - 35: moderate anxiety

‣ Score of 36 and above: potentially concerning levels of anxiety

• The total score is calculated by finding the sum of the 21 items.

Response | Not at all | Mildly, but it didn't bother me much | Moderately - it wasn't pleasant at times | Severely - it bothered me a lot
All questions | 0 | 1 | 2 | 3
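A minimal scoring sketch using the cutoffs above; the function name and ratings are hypothetical:

```python
# Sum the 21 item ratings (each 0-3) and apply the standardized cutoffs.
def bai_interpret(ratings):
    assert len(ratings) == 21 and all(0 <= r <= 3 for r in ratings)
    total = sum(ratings)
    if total <= 21:
        label = "low anxiety"
    elif total <= 35:
        label = "moderate anxiety"
    else:
        label = "potentially concerning levels of anxiety"
    return total, label

# Hypothetical respondent rating every symptom "mildly" (1).
print(bai_interpret([1] * 21))   # (21, 'low anxiety')
```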

INTERPRETATION
• An indication of low anxiety is usually a good thing. However, it is possible that you are being unrealistic in your self-assessment (which would be denial), or that you have learned to “mask” the symptoms commonly associated with anxiety.
• In an indication of moderate anxiety, your body is trying to tell you something. Look for
patterns as to when and why you experience the symptoms described in the inventory. You
may have some conflict issues that need to be resolved.
• Persistent and high anxiety is not a sign of personal weakness or failure. It is, however, something that needs to be proactively treated, or there could be significant impacts on you mentally and physically. You may want to consult a physician or counselor if the feelings persist.
Beck Anxiety Inventory (BAI)
Below is a list of common symptoms of anxiety. Please carefully read each item in the list.
Indicate how much you have been bothered by that symptom during the past month, including
today, by circling the number in the corresponding space in the column next to each symptom.

Response options: Not at all (0) | Mildly, but it didn't bother me much (1) | Moderately - it wasn't pleasant at times (2) | Severely - it bothered me a lot (3)

Numbness or tingling 0 1 2 3

Feeling hot 0 1 2 3

Wobbliness in legs 0 1 2 3

Unable to relax 0 1 2 3

Fear of worst happening 0 1 2 3

Dizzy or lightheaded 0 1 2 3

Heart pounding / racing 0 1 2 3

Unsteady 0 1 2 3

Terrified or afraid 0 1 2 3

Nervous 0 1 2 3

Feeling of choking 0 1 2 3

Hands trembling 0 1 2 3

Shaky / unsteady 0 1 2 3

Fear of losing control 0 1 2 3

Difficulty in breathing 0 1 2 3

Fear of dying 0 1 2 3

Scared 0 1 2 3

Indigestion 0 1 2 3

Faint / lightheaded 0 1 2 3

Face flushed 0 1 2 3

Hot / cold sweats 0 1 2 3

_______________________________________
Signature above printed name (optional)
Topic: CORRELATION AND REGRESSION
Handout #6: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III

BASIC CONCEPTS
• The American Psychological Association’s Task Force on Statistical Inference has suggested
that visual inspection of data is an important step in data analysis.
• A scatter diagram is a picture of the relationship between two variables. An example of a
scatter diagram is shown in Figure 6.1, which relates scores on a measure of anger for
medical students to scores on the CESD-R.

[Figure 6.1. A scatter diagram. The horizontal axis shows Anger Inventory scores (X), from 10 to 50; the vertical axis shows CESD-R scores (Y), from 10 to 30. The circled point shows a person who had a score of 15 on X and 13 on Y.]

• The axes in the figure represent the scales for two variables. Values of X for the anger
inventory are shown in the horizontal axis, and values of Y for the CESD-R are on the
vertical axis.
• A correlation coefficient is a mathematical index that describes the direction and
magnitude of a relationship. Figure 6.2 shows three different types of relationships
between variables.
‣ A positive correlation means that when X increases, Y also increases.
✓ Higher scores on the CESD-R correlate with higher scores on the anger inventory.
✓ Number of basketball games and fatigue: Players grow more tired the more
they play basketball.
‣ A negative correlation means that as X increases, Y decreases.
✓ Barbiturates and amount of activity: The higher the drug dose, the less
active the patients are.
‣ No correlation occurs when X and Y are not related.
✓ Shoe size and IQ is an example of a relationship that would lack correlation.

[Figure 6.2. Three types of relationships between variables: a positive correlation, a negative correlation, and no correlation.]


• Regression, a related technique, is used to make predictions about scores on one variable
from knowledge of scores on another variable. These predictions are obtained from the
regression line, which is defined as the best-fitting straight line through a set of points in
a scatter diagram.
• The regression coefficient, or b, is used to calculate the slope of a regression line. The
slope describes how much change is expected in Y each time X increases by one unit.
• The intercept, a, is the value of Y when X is 0. It is the point at which the regression line
crosses the Y axis.
• The residual is known as the difference between the observed score and predicted score.
Because residuals can be positive or negative, the best-fitting line is most appropriately
found by squaring each residual. Thus, the best-fitting line is obtained by keeping these
squared residuals as small as possible. This is known as the principle of least squares.
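A short sketch of fitting the least-squares line; the data are made up, and the formulas for b and a are the standard least-squares estimates:

```python
# Fit the regression line Y' = a + bX to a small made-up data set.
xs = [10, 15, 20, 30, 40]   # e.g., anger inventory scores
ys = [11, 13, 18, 20, 25]   # e.g., CESD-R scores

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope b: expected change in Y for each one-unit increase in X.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
# Intercept a: the value of Y when X = 0.
a = mean_y - b * mean_x

predicted = [a + b * x for x in xs]
residuals = [y - yp for y, yp in zip(ys, predicted)]
print(f"Y' = {a:.2f} + {b:.2f}X")
print("sum of squared residuals:", round(sum(r * r for r in residuals), 2))
```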

TYPES OF VARIABLES INVOLVED IN CORRELATION


1. Continuous variables — they can take on any values over an infinite range of values.
‣ Examples: Weight, height, intelligence, grade point average (GPA), etc.
2. True Dichotomous variables — they naturally form only two categories.
‣ Examples: Male/female, yes/no, true/false
3. Artificial Dichotomous variables — they reflect an underlying continuous scale forced
into dichotomy.
‣ Examples: Passing/failing, diagnosis, etc.

Variable type | Continuous | Artificial dichotomous | True dichotomous
Continuous | Pearson r | Biserial r | Point biserial r
Artificial dichotomous | Biserial r | Tetrachoric r | Phi
True dichotomous | Point biserial r | Phi | Phi

CORRELATION COEFFICIENTS
• The Pearson product moment correlation, also Pearson r, is the most common measure of
correlation, and it measures the linear relationships between two continuous variables.
• Spearman rho is a method of correlation for finding the association between two sets of ranks. The rho coefficient (ρ) is easy to calculate and is often used when the individuals in a sample can be ranked on two variables but their actual scores are not known.
• The biserial correlation expresses the relationship between a continuous variable and an
artificial dichotomous variable.
‣ This might be used to assess a relationship between passing or failing the bar
examination (artificial dichotomous) and GPA in law school (continuous).
• The point biserial correlation expresses the relationship between a continuous variable
and a true dichotomous variable.
‣ Gender (true dichotomous) and intelligence (continuous).
• Phi coefficient is used when both variables are dichotomous and at least one of the
dichotomies is “true”.
‣ Relationship between passing or failing the bar exam (artificial) and gender (true).
• Tetrachoric correlation is used when both variables are artificial dichotomies.
‣ Relationship between passing or failing the bar exam and passing or failing a professional licensure exam (both artificial dichotomies).
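A sketch of how some of these coefficients can be computed with scipy.stats; the data are made up, and note that the phi coefficient is obtained here as a Pearson r on 0/1 codes, which is mathematically equivalent:

```python
from scipy import stats

iq     = [95, 110, 102, 120, 88, 105]   # continuous
anger  = [12, 20, 15, 25, 10, 18]       # continuous
gender = [0, 1, 0, 1, 0, 1]             # true dichotomy (coded 0/1)
passed = [0, 1, 1, 1, 0, 1]             # artificial dichotomy (pass/fail)

# Pearson r: two continuous variables.
r, _ = stats.pearsonr(iq, anger)

# Spearman rho: association between the ranks of two variables.
rho, _ = stats.spearmanr(iq, anger)

# Point biserial r: a true dichotomy vs. a continuous variable.
rpb, _ = stats.pointbiserialr(gender, iq)

# Phi: two dichotomies, computed as Pearson r on the 0/1 codes.
phi, _ = stats.pearsonr(gender, passed)

print(f"r={r:.2f}, rho={rho:.2f}, r_pb={rpb:.2f}, phi={phi:.2f}")
```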
TERMS & ISSUES IN THE USE OF CORRELATION
• The standard error of estimate is the standard deviation of the residuals, computed with N - 2 degrees of freedom rather than N - 1, since two constants (a and b) are estimated. It is a measure of the accuracy of prediction: prediction is more accurate when the value is smaller.
• The coefficient of determination tells us the proportion of the total variation in scores on Y that we know as a function of information about X. Square the correlation coefficient to obtain the coefficient of determination. For example, if the correlation coefficient between GPA and SAT scores is 0.42, the coefficient of determination is 0.42² ≈ 0.18. This suggests that 18% of the variance in GPA scores is explained by SAT scores, while the remaining 82% is not.
• The coefficient of alienation is a measure of nonassociation between two variables. It is calculated as √(1 - r²), where r is the correlation coefficient. For example, if the correlation coefficient between the CESD-R and the anger inventory is 0.67, then the coefficient of alienation is √(1 - 0.67²) ≈ 0.74.
• Shrinkage is the amount of decrease observed when a regression equation is created for
one population and then applied to another. Say a regression equation is developed to
predict first-year college GPAs on the basis of SAT scores. Although the proportion of
variance in GPA might be fairly high for the original group, we can expect a smaller
proportion when the equation is used to predict GPA in the next year’s class.
• A third variable refers to any other possible explanations for the observed relationship
between two variables. For instance, if you study the relationship between television
viewing and aggressive behavior, other possible explanations for aggressive behavior might
include poor social adjustment, or some childhood trauma.
• Multivariate analysis considers the relationship among combinations of three or more
variables. For example, the prediction of success in the 1st year of college from the linear
combination of SAT verbal and quantitative scores is a problem for multivariate analysis.
• Discriminant analysis is a technique with an objective to assess the adequacy of a
classification, given the group memberships; or to assign objects to one group among a
number of groups, and whether significant differences exist among them. It can be used to
understand the characteristics of a customer possessing store loyalty and a customer who
does not have store loyalty.

“Correlation is not the same as causation.”
