HANDOUTS
Item. A specific stimulus to which a person responds overtly; this response can be scored or
evaluated (for example, classified, graded on a scale, or counted)
Scale. Refers to a group of items that pertain to a single variable and are arranged in order of
difficulty or intensity. The process of arriving at the sequencing of the items is called scaling.
Battery. A group of several tests or subtests that are administered at one time to one person.
It is a term often used in test titles.
Standardization. Can refer to (1) the uniformity of procedure in all important aspects of the
administration, scoring, and interpretation of tests; and (2) the use of standards for
evaluating test results. The National Achievement Test, which is administered to thousands of
high school students in the Philippines, provides a good example of standardization.
These standards are most often norms derived from a group of individuals, known as the
normative sample in the process of developing a psychological test.
Psychological tests are measurement instruments that have five defining elements.
Psychological testing, therefore, refers to all the possible uses, applications, and underlying
concepts of psychological and educational tests. The main use of these tests is to evaluate
individual differences or variations among individuals.
PSYCHOLOGICAL TESTING VS PSYCHOLOGICAL ASSESSMENT
• The process of psychological assessment can occur in health care, counseling, or forensic
settings, as well as in educational and employment settings.
• Psychological assessment also involves evaluative judgments, such as those involved in child
custody decisions or in assessing the effectiveness of programs or interventions.
Duration. Psychological testing: shorter, lasting from a few minutes to a few hours.
Psychological assessment: longer, lasting from a few hours to a few days or more.
TYPES OF TESTS
• Individual Tests. Those that can be given to only one person at a time.
• Group Tests. Can be administered to more than one person at a time by a single examiner,
such as when an instructor gives everyone in the class a test at the same time.
• Tests of Ability. The faster or the more accurate the test taker’s responses, the better his/
her scores on a particular characteristic.
‣ Achievement Tests. Refer to a measure of previous learning.
‣ Aptitude Tests. Refer to the potential for learning or acquiring a specific skill.
• Tests of Typical Performance. Used to investigate not what a person can do, but what he
usually does.
‣ Personality Tests. Are related to the overt and covert dispositions of the individual.
For example, the tendency of a person to show a particular behavior or response in a
given situation. Personality tests usually measure typical behavior.
‣ Creativity Tests. They emphasize novelty and originality in the solution of problems
or in the production of artistic works.
• Speed Tests. Require the test taker to answer, within a limited amount of time, a series of
questions or tasks of uniformly low difficulty.
• Power Tests. Contain more difficult items, and the time limit is generous enough that a very
large percentage of test takers will have ample time to complete all of the items.
• Test authors. They conceive, prepare, and develop tests. They also find a way to
disseminate their tests, by publishing them either commercially or through professional
publications such as books or periodicals.
• Test publishers. They publish, market, and sell tests, thus controlling their distribution.
• Test reviewers. They prepare evaluative critiques of tests based on their technical and
practical merits.
• Test users. They select or decide to take a specific test off the shelf and use it for some
purpose. They may also participate in other roles, e.g., as examiners or scorers.
• Test administrators. They administer the test either to one individual at a time or to
groups. They are also referred to as examiners.
• Test scorers. They tally the raw responses of the test taker and transform them into test
scores through objective or mechanical scoring or through the application of evaluative
judgments.
• Test score interpreters. They interpret test results to their ultimate consumers who may
be individual test takers or their relatives, other professionals, or organizations of various
kinds.
Topic: HISTORY OF PSYCHOLOGICAL TESTING
Handout #2: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III
HISTORICAL PERSPECTIVE
• Early antecedents
‣ Han Dynasty (206 BCE - 220 CE). The use of test batteries was quite common.
These early tests related to such diverse topics as civil law, military affairs,
agriculture, revenue, and geography.
‣ Ming Dynasty (1368-1644 CE). Tests had become quite well developed. During this
period, a national multistage testing program involved local and regional testing
centers equipped with special testing booths. The purpose of this program was to
select people eligible for public office.
‣ 1832. British missionaries and diplomats encouraged the English East India Company
to copy the Chinese system as a method of selecting employees for overseas duty.
‣ 1883. The US government established the American Civil Service Commission, which
developed and administered competitive examinations for certain government jobs.
‣ 1869. Sir Francis Galton published his book Hereditary Genius. It contained theories
in which he set out to show that some people possessed characteristics that made
them more fit than others.
‣ 1883. Galton started a series of experimental studies to document the validity of his
position, by demonstrating that individual differences exist in human sensory and
motor functioning.
‣ 1890. The term mental test was coined by James McKeen Cattell, whose work was
inspired by Galton’s efforts.
‣ Johann Friedrich Herbart. He used mathematical models of the mind as the basis
for educational theories that strongly influenced 19th-century educational practices.
‣ Gustav Fechner. He devised the law that the strength of a sensation grows as the
logarithm of stimulus intensity, thus coining the term psychophysics.
• World War I
• Achievement Tests
‣ 1923. The development of standardized achievement tests culminated in the
publication of the Stanford Achievement Test (SAT).
‣ 1930s. It was widely held that the objectivity and reliability of the new standardized
tests made them superior to essay tests.
‣ Traits. Relatively enduring dispositions that distinguish one individual from another.
They are stable, and a collection of them forms a psychological type. Optimism and
pessimism can be viewed as traits.
‣ Woodworth Personal Data Sheet. The first structured personality test, which was
developed during World War I and was published in final form after the war.
‣ 1940s. J.P. Guilford made the first serious attempt to use factor analysis in the
development of a structured personality test. By the end of this decade, Raymond B.
Cattell had introduced the Sixteen Personality Factor Questionnaire (16PF), which is
considered one of the most well-structured personality tests.
Topic: TESTING AND SOCIETY
Handout #3: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III
TYPES OF DECISIONS
• Individual decisions
‣ Tests can be used to counsel or advise the examinee; presumably, the results of these
tests have some influence on the actions or decisions of the examinee.
‣ An example is the use of vocational interest tests to aid high school or college
students in deciding on their career paths.
• Institutional decisions
‣ In educational settings, these decisions include those concerning admission,
placement in either advanced or remedial programs, and the advancement or
retention of students.
‣ In industrial settings, these decisions include those concerning personnel selection,
identification of fast-track employees, placement in training programs, and
evaluations of job performance and promotability.
• Comparative decisions
‣ Involve comparisons of two or more people, actions, objects, options, and so forth.
‣ Personnel selection is a good example of a comparative decision on the part of the
organization.
‣ Require less information than would be required for an absolute decision.
• Absolute decisions
‣ Involve decisions about a single person, option, or object, rather than a choice
between two well-defined options.
‣ Often require more precise measurement than is required for comparative decisions.
SOCIETAL CONCERNS
• Ability testing
‣ Over the last 50 years, the driving issue of this debate has been the existence and
meaning of race-based, ethnicity-based, and gender-based differences in test scores.
‣ Are the differences real?
✓ These differences may be due to bias in the tests. If this explanation were
true, then the tests themselves would be the cause of unfair outcomes, such
as the loss of jobs, scholarships, and other opportunities.
‣ Are the differences large?
✓ For instance, some researchers suggest that there are gender differences in
verbal ability, but these are so small as to be of little consequence.
✓ A variation on this same theme is the debate over whether cognitive ability
is really that important in the real world. It is widely accepted that cognitive
ability is important in school, but critics suggest that it has little relevance
in other settings.
‣ Do tests help or hurt?
✓ Researchers argue that tests provide opportunities for talented members of
underrepresented groups to demonstrate their abilities and that without
tests, it would be very difficult for any member of a disadvantaged group to
get ahead.
‣ Efficiency versus equity
✓ Tests generally contribute to the efficiency of the workforce, because they
lead to the selection of more productive workers. On the other hand, tests
may reduce equity in the assignment of jobs in the sense that they may
lead to the hiring of fewer minorities.
• Invasion of privacy
‣ Confidentiality
✓ One potential concern is that the results of psychological tests might
become available to people who have no legitimate use for these results.
✓ It is important to realize that virtually all testing professionals accept the
idea that the results of a psychological test should never be broadcast
indiscriminately; this is especially true of tests whose content or results are
sensitive.
✓ Refer to Standard IV and VI B. of the Psychological Association of the
Philippines Code of Ethics.
✓ Sample case: When safekeeping data, it is important to remember that:
(a) Electronic passwords should not be used in case the safe keeper loses
access.
(b) Sealed cartons are better than steel cabinets because they do not
become rusty.
(c) Clients should be reminded of the danger of transmitting results
through e-mail.
(d) None of the above
‣ Informed Consent
✓ Some psychological tests and assessment procedures involve deception. For
example, some honesty and integrity tests appear to be nothing more than
surveys of beliefs and experiences.
✓ The subjects must be informed of the general purpose and nature of the
research, as well as of the possible dangers and threats involved.
✓ Refer to Standard III J. of the Psychological Association of the Philippines
Code of Ethics.
• Fair use of tests
‣ We can clarify the debate over test fairness by making a distinction between fairness
of the testing process and fairness of the testing outcomes.
‣ There might be obstacles that prevent some persons from performing well, and the
test administration will probably be viewed as unfair.
‣ Tests may be used for purposes that are inherently objectionable. Several reviewers
have suggested that occupational and professional licensing examinations often serve
no legitimate purpose and are used merely to restrict the number of persons in a job
or an occupation.
Topic: RELIABILITY & VALIDITY
Handout #4: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III
DEFINING RELIABILITY
• According to Toplis, Dulewicz & Fletcher (2005), reliability refers to the stability and
consistency of test results obtained.
• Murphy & Davidshofer (2005) believe that the reliability or consistency of test scores is
critically important in determining whether a test can provide good measurement.
• For Kaplan & Saccuzzo (2010), tests that are relatively free of measurement error are
deemed to be reliable.
• 1904. British psychologist Charles Spearman worked out most of the basics of
contemporary reliability theory and published his work in an article entitled The Proof and
Measurement of Association between Two Things. He was responsible for the advanced
development of reliability assessment.
• 1937. Kuder and Richardson published an article that introduced several new reliability
coefficients.
Classical Test Theory. Assumes that each person has a true score that would be obtained if
there were no errors in measurement. The difference between the true score (T) and the
observed score (X) results from measurement error (E).
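The model X = T + E can be illustrated with a short simulation (a sketch with invented numbers: a fixed true score plus random normal error). Averaging many error-laden administrations recovers something close to the true score, because the errors cancel out:

```python
import random

random.seed(0)

# Classical test theory sketch: each observed score X is a fixed true
# score T plus random measurement error E (hypothetical numbers).
true_score = 100              # T: the score obtainable with no measurement error
n_administrations = 10_000

observed = [true_score + random.gauss(0, 5) for _ in range(n_administrations)]

# With enough administrations the mean observed score approaches T,
# because the errors average out to zero.
mean_observed = sum(observed) / len(observed)
print(round(mean_observed))
```

The error standard deviation of 5 is arbitrary; a less reliable test would simply use a larger value, spreading the observed scores further from T.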
ESTIMATING RELIABILITY
• Test-retest reliability
‣ Considers the consistency of the test results when the test is administered on two
different occasions.
‣ Tests that measure some constantly changing characteristic are NOT appropriate for
this type of evaluation (e.g., the Rorschach Inkblot Test)
• Parallel form method (also Equivalent form method)
‣ Evaluating the test across different forms of the test (two independent tests that
measure the same attribute).
‣ Pearson product moment correlation coefficient (also Pearson r) is used as an
estimate.
• Internal Consistency. Refers to the consistency of test results, ensuring that the various
items measuring the same construct deliver consistent scores; intercorrelations among
items within the same test.
‣ Split-half method
✓ A single test is given and divided into halves that are scored separately.
✓ Split using odd-even system, fishbowl, or balancing the halves by difficulty.
✓ Spearman-Brown formula is used as an estimate.
‣ Kuder-Richardson 20 (KR20)
✓ Used for tests with dichotomous items, and should NOT be used with
personality tests and attitude scales.
✓ Horst’s Modification Formula can be used as an alternative.
‣ Cronbach’s Alpha
✓ Averages the correlation between every possible combination of split-halves,
and allows multi-level responses.
✓ 0.7 is generally accepted as a sign of good reliability.
‣ Average Inter-item correlation
✓ Uses all of the items on the instrument that are designed to measure the
same construct, where the correlation for each pair of items is computed.
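Two of the internal-consistency estimates above can be sketched in plain Python with invented item scores: a split-half correlation corrected by the Spearman-Brown formula, r_SB = 2r / (1 + r), and Cronbach's alpha, α = k/(k−1) × (1 − Σ item variances / variance of totals):

```python
import statistics

# Hypothetical item scores: 6 respondents x 4 items, invented for illustration.
scores = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
]

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Split-half using the odd-even system (items 1 & 3 vs items 2 & 4),
# then the Spearman-Brown correction to estimate full-length reliability.
odd = [row[0] + row[2] for row in scores]
even = [row[1] + row[3] for row in scores]
r_half = pearson_r(odd, even)
r_sb = 2 * r_half / (1 + r_half)

# Cronbach's alpha from item variances and the variance of the totals.
k = len(scores[0])
item_vars = [statistics.pvariance([row[i] for row in scores]) for i in range(k)]
total_var = statistics.pvariance([sum(row) for row in scores])
alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)

print(round(r_sb, 2), round(alpha, 2))
```

With these deliberately consistent items, both estimates land well above the 0.7 threshold mentioned above; scrambling one item's column would pull them down.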
• Inter-rater reliability
‣ Two different raters/observers with the same training/orientation should get the
same results in measuring reliability.
‣ It tests how similarly people categorize items and how similarly people score items.
‣ Cohen’s kappa coefficient is used as an estimate.
✓ If K = 1, raters are in complete agreement.
✓ > 0.75 = excellent agreement; 0.40 - 0.75 = fair; < 0.40 = poor
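Cohen's kappa compares the observed agreement with the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). A sketch with hypothetical ratings from two observers:

```python
from collections import Counter

# Hypothetical yes/no ratings by two observers over the same 10 cases.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # p_o: raw agreement

# p_e: probability both raters pick the same category by chance,
# based on each rater's marginal category frequencies.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))   # 0.58
```

Here the raters agree on 8 of 10 cases (p_o = 0.80) but chance alone predicts 0.52, giving κ ≈ 0.58, which falls in the 0.40 - 0.75 "fair" band listed above.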
• Remove items that hinder reliability by using factor analysis and discriminability analysis
(correlation between each item and the total score).
DEFINING VALIDITY
• According to Murphy & Davidshofer (2005), a test is valid if it can be used to make correct
or accurate decisions.
• To Kaplan & Saccuzzo (2010), validity can be defined as the agreement between a test score
or measure and the quality it is believed to measure.
• Face validity
‣ The mere appearance that a measure has validity.
‣ It is not really validity at all, because it offers no evidence to support conclusions
drawn from test scores; this does not mean, however, that face validity is unimportant.
• Content Validity
‣ It is established by showing that the behaviors sampled by the test are a
representative sample of the attribute being measured.
‣ A content domain represents the total set of behaviors that could be used to measure
a specific attribute or characteristic of individuals that are to be tested.
‣ Factor analysis and expert judgement are some methods to assess content validity.
‣ Construct underrepresentation describes the failure to capture important
components of a construct.
‣ Construct-irrelevant variance occurs when scores are influenced by factors irrelevant
to the construct.
• Criterion Validity
‣ Tells how well a test corresponds with a particular criterion (known standard).
‣ Such evidence is provided by high correlations between a test and a particular
criterion.
‣ Predictive validity is known as a forecasting function of tests.
‣ Concurrent validity comes from assessments of the simultaneous relationship
between the test and the criterion (such as between a learning disability test and
school performance). Thus, it applies when the test and the criterion can be
measured at the same time.
• Construct Validity
‣ Assembled evidence about what a test means; it is demonstrated by showing the
relationship between a test and other tests and measures.
‣ It is established through a series of activities in which a researcher simultaneously
defines some construct and develops the instrumentation to measure it.
‣ A construct is something built by mental synthesis. It can be broken down into its
component parts, known as domains.
‣ Convergent validity is obtained when a measure correlates well with other tests
believed to measure the same construct.
‣ Divergent validity is demonstrated when a test has low correlations with measures of
unrelated constructs; it is evidence for what the test does not measure. It also
describes the uniqueness of a test and answers the question “Why should we create a
test if there is already one available to do the job?”
“A test can be reliable without being valid, but a test cannot be valid without being reliable.”
Topic: THEORIES OF INTELLIGENCE & THE BINET SCALES
Handout #5: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III
DEFINING INTELLIGENCE
• Alfred Binet: “The tendency to take and maintain a definite direction; the capacity to
make adaptations for the purpose of attaining a desired end, and the power of engaging in
self-criticism so that necessary adjustments in strategy can be made.”
• Charles Spearman: “The ability to educe either relations or correlates.”
• Freeman (1955): “Adjustment or adaptation of the individual to his total environment,”
“ability to learn,” and “the ability to carry on abstract thinking.”
• Das (1973): “The ability to plan and structure one’s behavior with an end in view.”
• Howard Gardner: “The ability to resolve genuine problems or difficulties as they are
encountered.”
• Sternberg (1986, 1988): “Mental activities involved in purposive adaptation to, shaping of,
and selection of real-world environments relevant to one’s life.”
• Anderson (2001): “Intelligence is two-dimensional and based on individual differences in
information-processing speed and executive functioning influenced largely by inhibitory
processes.”
4. Compute the difficulty index (p) of the item by using the formula below:
p = (pu + pl) / 2
DECISIONS FOR EACH ITEM
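As a quick Python sketch of the difficulty index p = (pu + pl) / 2, with invented proportions (the discrimination index d = pu − pl is a common companion statistic, added here for illustration and not part of the formula above):

```python
# Hypothetical item analysis: proportion answering the item correctly in the
# upper-scoring group (pu) and the lower-scoring group (pl).
pu = 0.80   # 80% of the upper group got the item right
pl = 0.40   # 40% of the lower group got the item right

p = (pu + pl) / 2   # difficulty index: share of all examinees answering correctly
d = pu - pl         # discrimination index: how well the item separates the groups

print(round(p, 2), round(d, 2))   # 0.6 0.4
```

A p near 0.5 marks a moderately difficult item; values near 0 or 1 mark items that are too hard or too easy to discriminate among examinees.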
General Objective:
This test measures the knowledge of the students in the subject Science
focused on the following topics: The Need to Respire, Punnett Square, Bond
Formation, Avogadro’s number, Global Warming, Beyond Our Solar System,
Biomimicry, and Force, Motion, and Energy.
Specific Objectives:
KNOWLEDGE: REQUIRE, COUNT, DEFINE, DRAW, IDENTIFY, INDICATE, LABEL, LIST, MATCH, NAME,
OUTLINE, POINT, QUOTE, READ, RECALL, RECOGNIZE, RECITE, RECORD, REPEAT, STATE, TABULATE,
TRACE, WRITE
COMPREHENSION: ASSOCIATE, CLASSIFY, COMPARE, COMPUTE, CONTRAST, CONVERT, DESCRIBE,
DIFFERENTIATE, DISCUSS, DISTINGUISH, ESTIMATE, EXTRAPOLATE, PREDICT, REWRITE, TRANSLATE
APPLICATION: APPLY, CALCULATE, CHANGE, CLASSIFY, COMPLETE, DEMONSTRATE, DISCOVER, EMPLOY,
EXAMINE, ILLUSTRATE, MANIPULATE, OPERATE, PRACTICE, PREPARE, PRODUCE, RELATE, SOLVE, USE,
UTILIZE
ANALYSIS: ANALYZE, CONSTRUCT, DETECT, DIAGRAM, DIFFERENTIATE, EXPLAIN, INFER, OUTLINE,
SEPARATE, SUBDIVIDE, SUMMARIZE
SYNTHESIS: ARRANGE, CATEGORIZE, COMBINE, CONSTRUCT, CREATE, DESIGN, DEVELOP, EXPLAIN,
FORMULATE, GENERATE, GENERALIZE, INTEGRATE, ORGANIZE, PLAN, PREPARE, PRESCRIBE, PRODUCE,
PROPOSE, REARRANGE, RECONSTRUCT, SPECIFY, SUMMARIZE
EVALUATION: APPRAISE, ASSESS, COMPARE, CRITIQUE, DETERMINE, EVALUATE, GRADE, JUDGE,
JUSTIFY, MEASURE, RANK, RECOMMEND, SELECT, SUPPORT, TEST
Topic: THE CESD-R SCALE
Handout #2: Psychological Testing (Lab)
Instructor: Joselito Miguel C. Sibayan III
‣ Thinking/concentration: Items 3, 20
Scoring for items 6, 19, & 20: Not at all or less than 1 day = 3; 1-2 days = 3;
3-4 days = 2; 5-7 days = 1; Nearly every day for 2 weeks = 0.
DETERMINING CATEGORIES
The determination of possible depressive symptom category is based upon an algorithm with
the following logic:
1. Meets criteria for major depressive episode: Anhedonia or dysphoria nearly every day
for the past 2 weeks, plus symptoms in an additional 4 DSM symptom groups.
2. Probable major depressive episode: Anhedonia or dysphoria nearly every day for the
past 2 weeks, or 5-7 days in the past week, plus symptoms in an additional 3 DSM
symptom groups.
3. Possible major depressive episode: Anhedonia or dysphoria nearly every day for the
past 2 weeks, or 5-7 days in the past week, plus symptoms in an additional 2 DSM
symptom groups.
4. Sub-threshold depressive symptoms: People who have a CESD-style score of at least 16,
but do not meet the above criteria.
5. No clinical significance: People who have a total CESD-style score less than 16 across all
20 questions.
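The category algorithm above can be sketched as a small function (a simplification with hypothetical inputs; item-level scoring is assumed to have been done already):

```python
def cesdr_category(core_nearly_every_day, core_5_to_7_days,
                   other_symptom_groups, cesd_style_total):
    """Sketch of the CESD-R category logic described above.

    core_nearly_every_day / core_5_to_7_days: whether anhedonia or dysphoria
    was reported at that frequency; other_symptom_groups: number of additional
    DSM symptom groups with symptoms; cesd_style_total: total CESD-style score.
    """
    core_present = core_nearly_every_day or core_5_to_7_days
    if core_nearly_every_day and other_symptom_groups >= 4:
        return "Meets criteria for major depressive episode"
    if core_present and other_symptom_groups >= 3:
        return "Probable major depressive episode"
    if core_present and other_symptom_groups >= 2:
        return "Possible major depressive episode"
    if cesd_style_total >= 16:
        return "Sub-threshold depressive symptoms"
    return "No clinical significance"

print(cesdr_category(False, True, 3, 30))   # "Probable major depressive episode"
```

The branches are checked from the most to the least severe category, so each case falls into the first rule it satisfies, mirroring the numbered logic above.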
Center for Epidemiologic Studies Depression Scale – Revised (CESD-R)
[Questionnaire form: each statement (e.g., “I felt depressed.”) is rated for last week;
signature above printed name (optional).]
Topic: THE BECK ANXIETY INVENTORY
Handout #3: Psychological Testing (Lab)
Instructor: Joselito Miguel C. Sibayan III
CLINICAL USE
• The BAI was specifically designed as “an inventory for measuring clinical anxiety” that
minimizes the overlap between depression and anxiety scales.
• While several studies have shown that anxiety measures, including the State-Trait Anxiety
Inventory (STAI), are either highly correlated with or indistinguishable from measures of
depression, the BAI has been shown to be less contaminated by depressive content.
All questions are scored 0, 1, 2, or 3.
INTERPRETATION
• An indication of low anxiety is usually a good thing. However, it is possible that you are
being unrealistic in your self-assessment (denial), or that you have learned to “mask” the
symptoms commonly associated with anxiety.
• An indication of moderate anxiety means your body is trying to tell you something. Look for
patterns as to when and why you experience the symptoms described in the inventory. You
may have some conflict issues that need to be resolved.
• Persistent and high anxiety is not a sign of personal weakness or failure. It is, however,
something that needs to be proactively treated, or there could be significant mental and
physical impacts. You may want to consult a physician or counselor if the feelings
persist.
Beck Anxiety Inventory (BAI)
Below is a list of common symptoms of anxiety. Please carefully read each item in the list.
Indicate how much you have been bothered by that symptom during the past month, including
today, by circling the number in the corresponding space in the column next to each symptom.
Not at all (0); Mildly, but it didn’t bother me much (1); Moderately - it wasn’t pleasant
at times (2); Severely - it bothered me a lot (3)
Numbness or tingling 0 1 2 3
Feeling hot 0 1 2 3
Wobbliness in legs 0 1 2 3
Unable to relax 0 1 2 3
Dizzy or lightheaded 0 1 2 3
Unsteady 0 1 2 3
Terrified or afraid 0 1 2 3
Nervous 0 1 2 3
Feeling of choking 0 1 2 3
Hands trembling 0 1 2 3
Shaky / unsteady 0 1 2 3
Difficulty in breathing 0 1 2 3
Fear of dying 0 1 2 3
Scared 0 1 2 3
Indigestion 0 1 2 3
Faint / lightheaded 0 1 2 3
Face flushed 0 1 2 3
_______________________________________
Signature above printed name (optional)
Topic: CORRELATION AND REGRESSION
Handout #6: Psychological Testing (Lec)
Instructor: Joselito Miguel C. Sibayan III
BASIC CONCEPTS
• The American Psychological Association’s Task Force on Statistical Inference has suggested
that visual inspection of data is an important step in data analysis.
• A scatter diagram is a picture of the relationship between two variables. An example of a
scatter diagram is shown in Figure 6.1, which relates scores on a measure of anger for
medical students to scores on the CESD-R.
[Figure 6.1: Scatter diagram relating Anger Inventory scores (X, horizontal axis) to
CESD-R scores (Y, vertical axis).]
• The axes in the figure represent the scales for the two variables. Values of X for the anger
inventory are shown on the horizontal axis, and values of Y for the CESD-R on the
vertical axis.
• A correlation coefficient is a mathematical index that describes the direction and
magnitude of a relationship. Figure 6.2 shows three different types of relationships
between variables.
‣ A positive correlation means that when X increases, Y also increases.
✓ Higher scores on the CESD-R correlate with higher scores on the anger inventory.
✓ Number of basketball games and fatigue: Players grow more tired the more
they play basketball.
‣ A negative correlation means that as X increases, Y decreases.
✓ Barbiturates and amount of activity: The higher the drug dose, the less
active the patients are.
‣ No correlation occurs when X and Y are not related.
✓ Shoe size and IQ is an example of a relationship that would lack correlation.
[Figure 6.2: Three scatter plots illustrating a positive correlation, a negative correlation,
and no correlation.]
CORRELATION COEFFICIENTS
• The Pearson product moment correlation, also Pearson r, is the most common measure of
correlation; it measures the linear relationship between two continuous variables.
• Spearman rho is a method of correlation for finding the association between two sets of
ranks. The rho coefficient (ρ) is easy to calculate and is often used when the individuals in
a sample can be ranked on two variables but their actual scores are not known.
• The biserial correlation expresses the relationship between a continuous variable and an
artificial dichotomous variable.
‣ This might be used to assess a relationship between passing or failing the bar
examination (artificial dichotomous) and GPA in law school (continuous).
• The point biserial correlation expresses the relationship between a continuous variable
and a true dichotomous variable.
‣ Gender (true dichotomous) and intelligence (continuous).
• Phi coefficient is used when both variables are dichotomous and at least one of the
dichotomies is “true”.
‣ Relationship between passing or failing the bar exam (artificial) and gender (true).
• Tetrachoric correlation is used when both variables are artificial dichotomies.
‣ Relationship between passing or failing the bar examination and passing or failing a
licensure examination (both artificial).
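The first two coefficients above can be computed in plain Python with invented scores; the point-biserial is computationally just a Pearson r with the dichotomous variable coded 0/1, so the same function serves for it (the data below are hypothetical):

```python
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(values):
    # Rank from 1 upward; ties are not handled (fine for this illustration).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Hypothetical data, invented for illustration.
anger = [10, 20, 30, 40, 50]
cesdr = [8, 12, 15, 22, 25]
group = [0, 0, 1, 1, 1]   # a dichotomous variable coded 0/1

print(round(pearson_r(anger, cesdr), 2))                # Pearson r
print(round(pearson_r(ranks(anger), ranks(cesdr)), 2))  # Spearman rho (rank the data first)
print(round(pearson_r(cesdr, group), 2))                # point-biserial correlation
```

Because the invented data rise monotonically, Spearman rho comes out at exactly 1.0 even though Pearson r is slightly below it; the two coefficients answer subtly different questions.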
TERMS & ISSUES IN THE USE OF CORRELATION
• The standard error of estimate is the standard deviation of the residuals, computed with
N − 2 degrees of freedom, since two constants (a and b) are estimated. It is a measure of
the accuracy of prediction: prediction is more accurate when the value is smaller.
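A minimal sketch of the computation with invented paired scores: the regression constants a and b are estimated first, then the residuals, then their standard deviation on N − 2 degrees of freedom:

```python
import statistics

# Hypothetical paired scores, invented for illustration.
x = [10, 20, 30, 40, 50]
y = [12, 18, 33, 38, 52]

n = len(x)
mx, my = statistics.mean(x), statistics.mean(y)

# Least-squares slope (b) and intercept (a) for the line Y' = a + bX.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Residuals: observed Y minus predicted Y'.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Standard error of estimate on N - 2 degrees of freedom.
see = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5
print(round(see, 2))
```

Here the predictions miss the observed scores by a couple of points on average, so the standard error of estimate lands near 2.8 score units.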
• Coefficient of determination tells us the proportion of the total variation in the scores on
Y that we know as a function of information about X. Square the correlation coefficient to
get the coefficient of determination. For example, if the correlation coefficient between
GPA and SAT scores is 0.42, the coefficient of determination is 0.18, which suggests that
18% of the variance in GPA scores is explained by SAT scores while the remaining 82% is
not. This can be represented in a pie chart.
• The coefficient of alienation is a measure of non-association between two variables. It is
calculated as √(1 − r²), where r is the correlation coefficient. For example, if the
correlation coefficient between the CESD-R and the anger inventory is 0.67, then the
coefficient of alienation is about 0.74.
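Both coefficients can be checked quickly; note that with a correlation coefficient of r = 0.67, √(1 − r²) works out to about 0.74:

```python
import math

# Coefficient of determination: square the correlation coefficient.
r_gpa_sat = 0.42
determination = r_gpa_sat ** 2          # proportion of variance explained
print(round(determination, 2))          # 0.18

# Coefficient of alienation: sqrt(1 - r^2), the non-association measure.
r_cesdr_anger = 0.67
alienation = math.sqrt(1 - r_cesdr_anger ** 2)
print(round(alienation, 2))             # 0.74
```

Squaring shrinks moderate correlations considerably: an r of 0.42 explains only 18% of the variance, which is why the determination coefficient is often more sobering than r itself.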
• Shrinkage is the amount of decrease observed when a regression equation is created for
one population and then applied to another. Say a regression equation is developed to
predict first-year college GPAs on the basis of SAT scores. Although the proportion of
variance in GPA might be fairly high for the original group, we can expect a smaller
proportion when the equation is used to predict GPA in the next year’s class.
• A third variable refers to any other possible explanations for the observed relationship
between two variables. For instance, if you study the relationship between television
viewing and aggressive behavior, other possible explanations for aggressive behavior might
include poor social adjustment, or some childhood trauma.
• Multivariate analysis considers the relationship among combinations of three or more
variables. For example, the prediction of success in the 1st year of college from the linear
combination of SAT verbal and quantitative scores is a problem for multivariate analysis.
• Discriminant analysis is a technique whose objective is to assess the adequacy of a
classification given the group memberships, or to assign objects to one group among a
number of groups and to determine whether significant differences exist among them. It
can be used, for example, to understand the characteristics of customers who possess
store loyalty versus those who do not.