
B4T4 Utility Written Report

The document discusses the concept of utility as it relates to psychological testing. It defines utility as the usefulness or practical value of a test. Three key factors that affect a test's utility are discussed: psychometric soundness, cost, and benefits. The document also provides an overview of how utility analysis is conducted, which involves a cost-benefit analysis to evaluate the practical worth of an assessment tool and help determine where it can be most useful. Methods for conducting utility analysis, such as using expectancy data and Taylor-Russell tables, are also summarized.

PAMANTASAN NG LUNGSOD NG MAYNILA

(University of the City of Manila)

Gen. Luna cor. Muralla St., Intramuros, Manila

College of Science

DEPARTMENT OF PSYCHOLOGY

Written Report for the Subject:

PSY 3202 - PSYCHOLOGICAL TEST DEVELOPMENT AND MEASUREMENT

Topic:

UTILITY

Submitted to:

Prof. Ma. Dolores Correa-Patag

Submitted by:

Alonso, Christina Lee DC.


Comayas, Gwynette S.
Mallo, Bettina Alessandra B.
Nuñez, Ian Karlo B.
Umali, Frances Claire B.
PSY 3-4

March 2024
UTILITY

CONTENT OUTLINE

I. The Concept of Utility


● Factors that affect a Test’s Utility
○ Psychometric Soundness
○ Cost
○ Benefits
II. Utility Analysis
● Concept of Utility Analysis
● How is Utility Analysis Conducted?
● Some Practical Considerations
III. Methods for Setting Cut Scores
● The Angoff Method
● The Known Groups Method
● IRT-Based Methods
● Other Methods
Defining Utility
Discussant: Frances Claire B. Umali

I. THE CONCEPT OF UTILITY

What is utility?
● In the context of testing and assessment, utility refers to the usefulness or
practical value of a test; a test with high utility offers improved efficiency in decision making.

A. Factors that Affect a Test's Utility


1. Psychometric Soundness
● Psychometric soundness refers to the reliability and validity of a test.
● Difference between an index of utility and an index of reliability or validity.
○ Index of Reliability - how consistently the test measures what it measures
○ Index of Validity - how accurately the test measures what it claims to
measure.
○ Index of Utility - provides information about the practical value or
usefulness of the information obtained from the test score.
● A general rule holds that "the higher the criterion-related validity of test scores
for making a particular decision, the higher the utility of the test is likely to be."
However, exceptions must be considered, because many factors contribute to a
test's utility, including the behavior of the targeted test takers and the targeted
test users.
● An example of how a valid assessment tool may lack utility:
○ Researchers sought to monitor 63 opiate-dependent volunteers who were
seeking treatment by means of a continuously worn patch that could detect
cocaine use through sweat, as an alternative to the commonly used urine
test. They found a 92% level of agreement between a positive urine test and
a positive sweat-patch test. However, the researchers felt compelled to
conclude that the sweat patch had limited utility, since its results depend on
correct application of the patch.

2. Cost
● The cost is often referred to as the money and this is the most basic elements in
any utility analysis, specifically the financial cost of the selection of the device
used in the study. The term cost in the context of test utility is more on the side
of its disadvantages, losses or expenses in both economic and non-economic
terms.
● On the other hand, in terms of test utility decisions, cost can be interpreted
traditionally which is in an economic sense where it relates to the expenditure
associated with testing or not.

If a test is to be administered, funds must be allocated for its purchase,
including…

1. a particular test;
2. a supply of blank test protocols; and
3. computerized test processing, scoring and interpretation service.

Other costs of testing may include

1. payments to professional personnel and staff associated with test


administration, scoring, and interpretation;
2. facility rental, mortgage, and/or other charges related to the usage of the
test facility; and
3. insurance, legal, accounting, licensing, and other routine costs of doing
business.

○ For private clinics, however, these costs are covered by the fees charged to
test takers, while for research organizations they are paid from the test
user's funds, which may originate from private donations or government
grants.
● An example of economic costs is a commercial airline that, facing high fuel
costs, decides to save money through cost-cutting. Even though this reduces
expenditures, it may have serious consequences: significant losses of customer
trust and revenue, and potential safety-related incidents.
● For noneconomic costs, an example is the utility of four X-ray views, as
compared with two, in detecting fractured ribs in child abuse victims. According
to Hansen (2008), the four-view X-ray identifies fractures better than the
two-view X-ray. The researchers recommended adopting the enhanced protocol,
despite the additional financial cost, to better detect abuse.

3. Benefit
● The benefit is referred to as the profit, gains of advantages that can be viewed in
economic and noneconomic terms that are associated with testing.
● In economic terms, an example is the implementation of a new test to select
employees who are more productive resulting an increase productivity among the
employees that leads to the greater overall company profit
● Some of the noneconomic benefits in an industrial setting are…
○ Increase in the quality of workers' performance
○ Decrease in time needed to train workers
○ Reduction in the number of accidents
○ Reduction in worker turnover

Utility Analysis
Discussant: Bettina Alessandra B. Mallo, Gwynette S. Comayas, & Ian Karlo B. Nuñez

II. UTILITY ANALYSIS

A. The Concept of Utility Analysis

Utility analysis is a collection of techniques that involve a cost-benefit analysis, aimed at


producing information that is significant and relevant to a decision regarding the usefulness
and practical worth of an assessment tool. Utility analysis is not a single technique used for
only one particular goal, but rather it is a term for several potential methods, each having
different input data requirements and producing different results.

Some utility analyses are straightforward, providing easy-to-comprehend answers to
relatively simple questions, while others are quite complex, utilizing elaborate
mathematical models and intricate weighting schemes for the various variables being
considered. For example, while developing a new diagnostic test, researchers may employ
statistical techniques to evaluate large data sets and simulate the test's sensitivity, specificity,
and predictive values across different patient populations.
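As an illustration of the kind of output such a simulation reports, the core diagnostic metrics can be computed from a simple 2×2 table of test outcomes. The counts below are hypothetical, not drawn from any study:

```python
# Hypothetical 2x2 outcome counts for a diagnostic test.
tp, fn = 90, 10   # people with the condition: test positive / test negative
fp, tn = 20, 80   # people without it:         test positive / test negative

sensitivity = tp / (tp + fn)   # proportion of true cases the test detects
specificity = tn / (tn + fp)   # proportion of non-cases the test clears
ppv = tp / (tp + fp)           # chance a positive result is a true case
```

Here the test detects 90% of true cases (sensitivity 0.9) and correctly clears 80% of non-cases (specificity 0.8); predictive values additionally depend on how common the condition is in the population tested.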

Utility analysis may help make informed decisions regarding:


● Choosing between various testing options
● Determining where the test is most useful and/or beneficial
● Identifying where a test needs improvement

In simplified terms, utility analysis facilitates the selection of the best assessment tool
among various alternatives by considering both the costs and the benefits, answering the
basic question: Should we invest in this test?

B. How is a Utility Analysis Conducted?


1. Expectancy Data
● Expectancy table
○ Predicts the likelihood that test takers scoring within a particular range on
the predictor will perform successfully on the criterion.
○ Displays the probability of success or failure at different score levels,
usually categorized as "passing," "acceptable," or "failing".
○ Particularly useful where the emphasis is on whether or not individuals
have reached predetermined performance standards rather than how they
compare to others.
○ For example, the expectancy table helps university admissions officers
make decisions about which students to admit based on their CAT scores
and the likelihood of academic success associated with those scores.

Advantages:
➢ Easy to understand and apply.
➢ Provides a straightforward way to interpret scores and predict
outcomes.
➢ Significantly aids decision-making processes, particularly
concerning individuals or groups scoring within a specific range on
a predictor.
Limitations:
➢ Oversimplifies the evaluation as it dichotomizes performance into
success and failure.
➢ Focuses on predicting outcomes without considering the cost of
testing.
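A minimal sketch of how such a table can be tabulated from past data; the score ranges and the (predictor score, success) records below are hypothetical:

```python
# Hypothetical (predictor score, succeeded-on-criterion) records.
records = [(45, False), (52, False), (58, True), (61, True),
           (67, True), (71, True), (74, False), (80, True)]

# Score ranges forming the rows of the expectancy table.
bins = {"low (<60)": (0, 59), "mid (60-74)": (60, 74), "high (75+)": (75, 999)}

# For each range, the proportion of past test takers who succeeded; this
# proportion is read as the probability of success for future test takers
# scoring in that range.
table = {}
for label, (low, high) in bins.items():
    outcomes = [ok for score, ok in records if low <= score <= high]
    table[label] = sum(outcomes) / len(outcomes)
```

A decision maker would then read off, for example, that applicants in the middle range succeeded 75% of the time, which is exactly the dichotomized success/failure framing the limitation above points out.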

● Taylor-Russell tables
○ Developed by H.C. Taylor and J.T. Russell in 1939.
○ A method for evaluating the validity of a test in relation to the amount of
information it contributes beyond the base rate. In other words, the tables
provide an estimate of the percentage of employees recruited through the
use of a specific test who will demonstrate success in their respective jobs.
○ To use Taylor-Russell tables, the following information must be present:
■ Definition of success: Success must be clearly defined by
dichotomizing some outcome variable. For example, a general
weighted average of 2.0 or better may be defined as success in
college and those below 2.0 may be defined as failures.
■ Determination of base rate: Base rate is defined as the percentage
of current people hired who are considered successful.
■ Definition of selection ratio: Selection ratio is the percentage
indicating the relationship between the number of individuals
intended for hiring and the pool of available candidates for
employment.
● For example, 50 available positions and 500 applicants:
○ Selection ratio = number of available positions / number of applicants
= 50 / 500 = 0.1
■ Determination of validity coefficient: Validity coefficient is the
correlation of the test with some criterion. For example, measure of
work quality.
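The two ratios above are simple proportions; a short sketch using the figures from the example, plus a hypothetical base-rate count:

```python
# Selection ratio: 50 openings, 500 applicants (figures from the example).
openings, applicants = 50, 500
selection_ratio = openings / applicants        # 0.1

# Base rate: share of current hires judged successful (hypothetical counts).
successful_hires, total_hires = 60, 100
base_rate = successful_hires / total_hires     # 0.6
```

With the selection ratio, base rate, and validity coefficient in hand, the appropriate Taylor-Russell table can be consulted to estimate the proportion of successful hires expected under the test.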

Table 1. Taylor-Russell Table for a Base rate of .60

Advantages:
➢ Easy to utilize and provides valuable insight into the
relationships among selection ratio, criterion-related validity, and
existing base rate.
➢ Facilitates decision-making by quantifying the impact of different
selection procedures on hiring success.
➢ Compares the utility of various tests for the same purpose.
Limitations:
➢ Assumes a linear relationship between predictor scores and
criteria.
➢ Identifying a criterion value to separate successful from
unsuccessful performance can be challenging.
➢ Does not directly indicate the likely average increase in
performance.
➢ Similar to the expectancy table, it dichotomizes performance into
successful and unsuccessful categories.

● Naylor-Shine tables
○ Provides a logical way to assess the contribution of an assessment tool
within the context of established procedure for selection or evaluation.
This involves obtaining the difference between the means of the selected
and unselected groups. By identifying this difference, organizations can
determine additional value provided by the test beyond existing selection
methods.
○ In Naylor-Shine tables, utility is defined in terms of the increase in mean
criterion performance (e.g., final GPA), given the predictive validity of the
selection procedure and the selection ratio.
○ Purposes of the Naylor-Shine model:
■ Estimate the increase in mean criterion performance based on a
certain selection procedure
■ Determine a cutoff value for the predictor to meet a desired level
of mean criterion performance in the selected group.

Advantages:
➢ Provides information needed to use the Brogden-Cronbach-Gleser
utility formula.
➢ Avoids dichotomizing criterion performance and considers the
non-linear relationship between predictors and criteria.
➢ Offers a clear framework for communicating the meaning of test
scores to stakeholders, aiding in decision-making processes.
➢ Can be used for showing average performance gain or determining
the selection ratio needed to achieve a particular performance gain.
Limitations:
➢ Overestimates utility unless top-down selection is employed.
➢ These tables express utility in terms of standardized performance
gains.
➢ Does not address financial aspects such as testing costs.
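A minimal sketch of the quantity the Naylor-Shine tables tabulate: the expected increase in mean criterion performance (in z-score units) for the selected group, given a validity coefficient and a selection ratio. It assumes a normally distributed predictor and strict top-down selection; the function name and figures are illustrative, not taken from the tables themselves.

```python
from statistics import NormalDist

def mean_criterion_gain(validity, selection_ratio):
    """Expected mean criterion gain (z units) for the selected group:
    validity * (normal ordinate at the predictor cut / selection ratio)."""
    nd = NormalDist()
    z_cut = nd.inv_cdf(1 - selection_ratio)          # predictor cut score in z units
    return validity * nd.pdf(z_cut) / selection_ratio

# With a validity of .50 and a 10% selection ratio, the selected group is
# expected to average roughly 0.88 SD above the applicant mean on the
# criterion (the standardized gain the limitation above refers to).
gain = mean_criterion_gain(0.50, 0.10)
```

Note how the result is a standardized performance gain with no monetary component, which is exactly why these tables do not address testing costs.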

2. The Brogden-Cronbach-Gleser Formula


● It is referred to as the BCG formula, a pivotal tool used in decision-making
processes, particularly in selection procedures.
● Developed by Hubert E. Brogden and further refined by Cronbach and Gleser.
● In general, utility gain refers to an estimate of the benefit (monetary or otherwise)
of using a particular test or selection method.
○ utility gain = (N)(T)(rxy)(SDy)(Zm)−(N)(C)
○ where N is the number of applicants selected, T is the average length of
tenure of those selected, rxy is the criterion-related validity of the test,
SDy is the standard deviation of job performance in monetary terms, Zm is
the mean standardized test score of the selected applicants, and C is the
cost of testing one applicant.
● A modification of the BCG formula exists for researchers who prefer to express
their findings in terms of productivity gains rather than financial ones.
● Productivity gain refers to an estimated increase in work output.
○ The result is a formula that helps estimate the percent increase in output
expected through the use of a particular test.
● The revised formula is: productivity gain = (N)(T)(rxy)(SDp)(Zm)−(N)(C),
where SDp is the standard deviation of output expressed in percentage terms.
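A minimal sketch of the utility-gain calculation, using the document's form of the formula; every figure below is hypothetical.

```python
def bcg_utility_gain(N, T, r_xy, SD_y, Z_m, C):
    """utility gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C), where N is the number
    selected, T the average tenure in years, r_xy the test's validity,
    SD_y the SD of job performance in dollars, Z_m the mean standardized
    test score of those selected, and C the cost of testing one applicant."""
    return N * T * r_xy * SD_y * Z_m - N * C

# 10 hires expected to stay 2 years; validity .40; SD_y of $12,000;
# mean selected z-score of 1.0; $500 testing cost per person.
gain = bcg_utility_gain(N=10, T=2.0, r_xy=0.40, SD_y=12_000.0, Z_m=1.0, C=500.0)
# $96,000 of benefit minus $5,000 of testing cost = $91,000 estimated gain
```

The productivity-gain variant has the same shape, with SDp (output in percentage terms) substituted for SDy.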

3. Decision Theory and Test Utility


● Cronbach and Gleser's seminal work in Psychological Tests and Personnel
Decisions (1957, 1965) provided a framework for applying statistical decision
theory to questions of test utility.
● This involved the following key aspects in the domain of test utility:
a. Evaluation of Test Utility: Decision theory assesses how effectively
psychological tests achieve their goals by considering factors like
prediction accuracy, relevance to measured criteria, and overall benefit.

b. Optimal Decision-Making: Using test findings to inform decisions, such
as determining cutoff scores, requires careful consideration of acceptable
rules and techniques, which are guided by decision theory.

c. Costs and Trade-offs: In order to identify the most desirable result,
decision theory analyzes the costs and trade-offs of various error types,
such as false positives and false negatives.

d. Adaptive Strategies: In order to maximize test utility over time, decision
theory promotes adaptable strategies that can be modified in response to
new knowledge or conditions.

e. Risk Management: Decision theory helps identify and manage risks
related to test use, such as misclassification and biases, to mitigate
potential adverse consequences.
● Decision theory provides guidelines for setting optimal cutoff scores.
● Empirical studies, such as Schmidt et al. (1979), have demonstrated the tangible
benefits of using valid tests in selection processes.

Implications and Challenges:


● Despite the promise of decision theory approaches, widespread adoption in hiring
practices has been limited due to their complexity and potential legal challenges.

C. Some Practical Considerations

The pool of job applicants


● There are certain jobs, however, that require such unique skills or demand such
great sacrifice that there are relatively few people who would even apply, let alone
be selected.
● The pool of job applicants for a particular type of position may vary with the
economic climate.
● It may be that in periods of high unemployment there are significantly more
people in the pool of possible job applicants than in periods of high employment.
○ Closely related to issues concerning the available pool of job applicants is
the issue of how many people would actually accept the employment
position offered to them even if they were found to be a qualified
candidate.

The complexity of the job

● In general, the same sorts of approaches to utility analysis are put to work for
positions that vary greatly in terms of complexity.
● The same sorts of data are gathered, the same sorts of analytic methods may be
applied, and the same sorts of utility models may be invoked for corporate
positions ranging from assembly line worker to computer programmer.
● Yet as Hunter et al. (1990) observed, the more complex the job, the more people
differ on how well or poorly they do that job. Whether or not the same utility
models apply to jobs of varied complexity, and whether or not the same utility
analysis methods are equally applicable, remain matters of debate.

The cut scores in use


● Also called a cutoff score, we have previously defined a cut score as a (usually
numerical) reference point derived as a result of a judgment and used to divide a
set of data into two or more classifications, with some action to be taken or some
inference to be made on the basis of these classifications.
● In discussions of utility theory and utility analysis, reference is frequently made to
different types of cut scores.
● For example, a distinction can be made between a relative cut score and a fixed
cut score.
○ A relative cut score may be defined as a reference point, in a distribution
of test scores used to divide a set of data into two or more
classifications, that is set based on norm-related considerations rather than
on the relationship of test scores to a criterion.
○ Because this type of cut score is set with reference to the performance of a
group (or some target segment of a group), it is also referred to as a
norm-referenced cut score.
○ In contrast to a relative cut score is the fixed cut score, which we may
define as a reference point, in a distribution of test scores used to divide a
set of data into two or more classifications, that is typically set with
reference to a judgment concerning a minimum level of proficiency
required to be included in a particular classification. Fixed cut scores may
also be referred to as absolute cut scores.
● A distinction can also be made between the terms multiple cut scores and multiple
hurdles as used in decision making processes.
○ Multiple cut scores refers to the use of two or more cut scores with
reference to one predictor for the purpose of categorizing test takers.
○ At every stage in a multistage (or multiple hurdle) selection process, a
cut score is in place for each predictor used. The cut score used for each
predictor will be designed to ensure that each applicant possesses some
minimum level of a specific attribute or skill. In this context, multiple
hurdles may be thought of as one collective element of a multistage
decision-making process in which the achievement of a particular cut
score on one test is necessary in order to advance to the next stage of
evaluation in the selection process.
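The multiple-hurdle logic above can be sketched as a sequential screen; the predictors, cut scores, and applicant data below are hypothetical.

```python
# Each (predictor, cut score) pair is one hurdle; order matters.
hurdles = [("cognitive_ability", 70), ("work_sample", 60), ("interview", 75)]

def passes_all_hurdles(scores):
    """An applicant advances only while every cut score so far is met."""
    for predictor, cut in hurdles:
        if scores[predictor] < cut:
            return False   # eliminated at this stage; later stages never reached
    return True

applicant = {"cognitive_ability": 82, "work_sample": 64, "interview": 71}
passes_all_hurdles(applicant)   # False: fails at the interview stage
```

Note the contrast with a single multiple-cut-score scheme on one predictor: here each stage gates access to the next, so a shortfall at any hurdle ends the evaluation.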

Compensatory model of selection

● In what is referred to as a compensatory model of selection, an assumption is


made that high scores on one attribute can, in fact, "balance out" or compensate
for low scores on another attribute. According to this model, a person strong in
some areas and weak in others can perform as successfully in a position as a
person with moderate abilities in all areas relevant to the position in question.
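A minimal sketch of a compensatory composite, in which weighted scores are summed so that a strength can offset a weakness; the weights and scores below are hypothetical.

```python
# Attribute weights for the composite (relevant to the position in question).
weights = {"verbal": 0.4, "quantitative": 0.4, "mechanical": 0.2}

def composite(scores):
    """Weighted sum: a high score on one attribute can offset a low one."""
    return sum(weights[attr] * scores[attr] for attr in weights)

strong_and_weak = {"verbal": 95, "quantitative": 55, "mechanical": 80}
all_moderate = {"verbal": 76, "quantitative": 76, "mechanical": 76}
# Both applicants earn the same composite of 76, so under this model the
# uneven profile is judged as promising as the uniformly moderate one.
```

This is the opposite of the multiple-hurdle approach above, where the low quantitative score alone could have eliminated the first applicant.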

Methods for Setting Cut Scores


Discussant: Christina Lee D.C. Alonso

In order to determine the cut score for predictors, test developers make use of either
subject-matter experts or data from a representative sample, applied through whichever of
the following methods is most efficient for them or for the decision makers.

A. The Angoff Method (William Angoff, 1971)


● Experts estimate, for each item, the response expected of a test taker with the
least competence needed in the particular trait, attribute, or ability being scored.
The experts' estimates are averaged, and the experts then deliberate on whether
they agree on the resulting final cut score.

Advantage: Simple technique.


Limitation: Low inter-rater reliability; experts may disagree about the expected
responses of certain populations of test takers.
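The Angoff averaging step can be sketched as follows; the expert probability ratings below are hypothetical.

```python
# Each row: one expert's estimated probability that a minimally competent
# test taker would answer each of four items correctly.
expert_ratings = [
    [0.9, 0.7, 0.5, 0.8],   # expert 1
    [0.8, 0.6, 0.6, 0.9],   # expert 2
    [0.7, 0.7, 0.4, 0.7],   # expert 3
]

n_items = len(expert_ratings[0])
item_means = [sum(ratings[i] for ratings in expert_ratings) / len(expert_ratings)
              for i in range(n_items)]

# Provisional cut score: the expected raw score of a minimally competent
# test taker, which the experts then deliberate on before finalizing.
cut_score = sum(item_means)
```

Wide spread among the rows for any item is the inter-rater disagreement the limitation above warns about.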

B. The Known Groups Method (method of contrasting groups)


● The groups commonly used in this method are those that possess, and those that
do not possess, the particular trait, attribute, or ability of interest. The score at the
point of least difference between the two groups is used as the cut score.

Example: Determining the cut scores of incoming first year undergraduate


students that need to take remedial Biology before taking college-level
biology. The two groups to be used to determine the cut score are: (1)
students who have completed college-level biology, and (2) students who
have failed college-level biology.
Limitation: Absence of a set standard in determining contrasting groups,
including the degree of an attribute as complex as mental disorders.
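A minimal sketch of locating the point of least difference between two known groups, here operationalized as the candidate cut score that misclassifies the fewest students; all scores are hypothetical.

```python
# Placement-test scores of students who completed vs. failed
# college-level biology (the two contrasting groups from the example).
completed = [72, 78, 81, 85, 88, 90]
failed = [55, 60, 62, 68, 74, 76]

def misclassified(cut):
    """Completers screened into remediation plus failers screened out of it."""
    screened_out = sum(1 for s in completed if s < cut)
    let_through = sum(1 for s in failed if s >= cut)
    return screened_out + let_through

# The cut score at the point of least difference between the two groups is
# the candidate that misclassifies the fewest students.
cut_score = min(range(50, 95), key=misclassified)
```

With these data the best cut lands just above the bulk of the failing group; with real, overlapping distributions the choice of contrasting groups drives the result, which is the limitation noted above.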

C. IRT-Based Methods
These methods build on classical test score theory, in which test takers' total scores, and
the items they must answer correctly, determine whether they are regarded as possessing
the trait or attribute. They are further based on item response theory (IRT), which
attempts to uncover the relationship between unobservable traits, attributes, or abilities
and the responses of a subject; in setting cut scores, however, experts also associate a
certain level of difficulty with each item.

a. Item-mapping method
● Experts, trained in estimating the minimal competence required for the trait
or attribute being scored, arrange items in a histogram in which each column
contains items of equivalent difficulty, judged by whether future test takers
with minimal competence would be able to answer those items correctly. The
difficulty level so identified then becomes the cut score.
● The process can involve several rounds of judgment and feedback from
the same or different expert ratings until the appropriate difficulty level
has been selected.
b. Bookmark method
● Experts are first trained to be knowledgeable about the minimal competence
required for the knowledge, skill, and/or ability expected of future test takers.
They are then handed a book containing one item per page, arranged in
ascending order of difficulty, and are tasked to place a bookmark at the page
that separates future test takers according to the minimal competence required.
● Additional rounds of bookmarking and feedback can occur; however, the
difficulty level to use as the cut score ultimately rests on the decision of the
test developers themselves.

Concerns: Training received by experts, floor and ceiling effects, and the
optimal length of the item booklet.

D. Other Methods
a. Decision-theoretic approach (Hambleton & Novick, 1973)
Additional information: A theory commonly used in economics, in which a set of
actions and a loss function quantify value to the decision maker (Hirano, 2010).
In determining cut scores, miss rates are treated as losses, and the approach seeks
the cut score that minimizes miss rates and/or maximizes utility (de Gruijter &
Hambleton, 1984).
b. Method of predictive yield (R. L. Thorndike, 1949)
Makes use of a norm-referenced method that considers the following in personnel
selection: the number of positions available, estimates of the likelihood of offer
acceptance, and the score distribution of applicants.
c. Discriminant analysis or discriminant function analysis
Makes use of different yet related statistical techniques in order to identify the
relationship between scores (e.g., on a battery of tests) and membership in two
naturally occurring groups that contrast with one another.

REFERENCES

Cohen, R. J., & Swerdlik, M. E. (2017). Psychological testing and assessment (9th ed.).
McGraw-Hill Education.

de Gruijter, D. N. M., & Hambleton, R. K. (1984). On problems encountered using decision
theory to set cutoff scores. Applied Psychological Measurement, 8(1), 1–8.
https://doi.org/10.1177/014662168400800101

Hirano, K. (2010). Decision theory in econometrics. Palgrave Macmillan UK eBooks, 29–35.
https://doi.org/10.1057/9780230280816_5

Kaplan, R. M., & Saccuzzo, D. P. (2018). Psychological testing: Principles, applications, and
issues (9th ed.). Cengage Learning.
