Study Notes Psyc 469
Learning Objectives
In the early 20th century, Alfred Binet designed testing to help assign children to a class at their level. France was looking to improve its educational standards to live up to its inscription of "liberté, égalité et fraternité": education for all. Binet and Simon were commissioned by the state to formulate testing that could predict which students would require special education, so that they could receive intensified education from the start. These became known as intelligence (IQ) tests. This initiative then influenced other countries, including England, America, and others. Psychological testing was then produced for military use, to predict which individuals could properly deal with the stresses and reality of war (WWI and WWII). William Stern then wrote an ethical safeguard protocol for the use of assessments.
2. Explain the difference between a psychological test user and a psychological assessor.
A psychological tester is adept at administering a test and following all established protocols to ensure that the data collected are precise and accurate. A psychological assessor can administer testing but, beyond that, is trained to use the data to answer the referral question. Furthermore, the assessor will also use observation and background knowledge to synthesize, generalize, and draw conclusions about the client's general profile based on the data collected. The assessor will also direct the assessment by prescribing the required tests to be administered, and may add more testing during the assessment period based on the data and his/her professional judgement.
The process: a referral is made and referral question(s) are put forth; the assessor may meet with the assessee before the formal assessment; the assessor selects the tools and tests to use (at times, research is required to find the best tools possible); the tests are administered; a report is written to answer the referral question; and a feedback session is held with the assessee or an interested third party.
6. Explain the difference between traditional psychological evaluations and a therapeutic psychological
assessment.
In a traditional psychological assessment, the assessment is designed with a precise purpose: to answer and clarify the referral question. A therapeutic psychological assessment aims to support and help the client throughout the assessment process. Feedback and collaborative dialogue between the assessor and assessee are continuous, and they co-develop an interpretation of the data as they collaboratively decide on the treatment plan.
An interview captures what is said, how it is said, and what is not said, as well as non-verbal behaviour (body language, movements, facial expressions, eye contact, willingness to cooperate, reaction to the interview setting) and physical appearance.
Interviews help make informed decisions about employment (hiring, firing, advancement, placement). Motivational interviewing is used in clinical assessment to gather information about problematic issues or behaviours while trying to address them simultaneously. Given that the interview process is an interactive one, it is also an opportunity to position someone in multiple scenarios, depending on the interviewer's abilities.
10. Describe instances in which case history data may be useful.
11. List pros and cons of computer-assisted psychological assessment.
12. List some of the factors which may be affecting test-taker performance.
13. Describe the types of evaluations used in each of the settings mentioned in the textbook (i.e.,
educational, clinical, counselling, geriatric, business and military, governmental and organizational,
academic research settings).
Key Terms
o Psychological assessment
o Psychological testing
o Referral question
o Tool/instrument selection
o Cut score/Cutoff score
o Psychometrist
o Psychometrics
o Utility
o Test developer
o Test user
o Test taker
o Protocol
o Rapport
o Accommodation
o Psychological test
o Interview
o Portfolio
o Case history data
o Behavioural observation
o Role play
o Computers
Learning Objectives
1. Describe the significance of competition testing for civil service jobs in ancient China.
The Sui dynasty created the imperial examination system in order to screen candidates for government positions. The tests included the ability to read, write, and calculate; proficiency in geography, agriculture, and military strategy; and moral and physical prowess.
Galton became proficient in measurement and assessment. He aspired to classify people based on their talents and individuality, as well as their deviation from the average. He studied heredity in sweet peas and then in humans, and pioneered the concept underlying the coefficient of correlation technique.
3. Compare and contrast Galton’s and Wundt’s perspectives on the assessment of individuals.
In contrast to Galton, Wundt focused on how people were similar, not different. He treated individual differences as a nuisance, as they would confound his experiments.
James McKeen Cattell coined the term "mental test" and brought the idea of assessment to America. He was a founding member of the APA.
Binet researched the measurement of memory and social ability as a prelude to measuring intelligence. In collaboration with his colleague Simon, he eventually designed a 30-item assessment tool to measure intelligence (the IQ test).
Group intelligence tests derived from Binet's work (the Army Alpha and Beta) were used to recruit and assign positions to military personnel in the two World Wars.
Pros: people are the best positioned to speak about themselves. Cons: people can at times be unaware of, or choose not to divulge, information about themselves.
8. Understand how culture and language can affect individuals’ performance on psychological tests.
Culture and language influence what a society considers appropriate in behaviour and in thinking. Considering this, members of diverse societies may respond differently to questions on a standardized assessment. Psychologists have become more sensitive about the way they word tests because of this.
10. Explain the steps used by today’s test developers to ensure tests used are suitable for many
populations.
11. Explain how the nature-nurture debate applies to psychological assessment.
12. Describe the assessment-related issues of communication in a cultural context.
13. Describe how being from an individualist vs. a collectivist culture may affect test scores.
14. Distinguish between ethics, code of ethics, and standard of care.
15. Summarize the public’s opposition to standardized testing over the years.
16. Summarize the three levels of test-user qualifications.
17. Describe the challenges related to testing people with disabilities.
18. List and summarize the four rights of test-takers.
Key Terms
o Self-report
o Projective tests
o Culture
o Eugenics
o Culture-specific tests
o Individualist vs. collectivist culture
o Affirmative action
o Ethics
o Code of professional ethics
o Standard of care
o Hired guns
o Informed consent
o Confidentiality
Key People
Learning Objectives
A standardized test typically includes components like instructions, test items, answer sheets, scoring rubrics, and a
manual for administration and interpretation. It aims to ensure uniformity in testing conditions and evaluation
across all test takers.
Standardized testing will typically have norms by which to compare the results. These norms are developed based on
a sample of the population or the complete population itself.
A purposive sample is deliberately chosen based on specific characteristics or criteria relevant to the research
objective. In contrast, a convenience sample is selected based on ease of access, often resulting in a less deliberate or
systematic representation of the population.
4. Explain how a raw score is converted to a percentile and explain what a score at the 87th percentile
means.
The raw scores are put in order from smallest to largest and divided into 100 segments. Each segment is a percentile. A score at the 87th percentile means the test taker scored higher than 87% of the people in the normative sample.
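As a hedged illustration (not from the textbook), a minimal Python sketch of converting a raw score to a percentile rank; the normative scores and the raw score of 52 are invented values:

```python
# Hedged sketch: percentile rank as the percentage of normative scores
# falling below a given raw score. All values are invented examples.

def percentile_rank(raw_score, norm_scores):
    """Percentage of scores in the normative sample below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

norm_sample = [38, 41, 44, 47, 49, 50, 52, 55, 57, 60]  # hypothetical norms
print(percentile_rank(52, norm_sample))  # 60.0 -> the 60th percentile
```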
Do: be sensitive to the fact that culture impacts how concepts and words are understood and how an individual responds to a test item. In cases where culture can impact testing, it may be preferable to use a test with fewer cultural biases. At times, adjusting the language to an equivalent term in the other culture is necessary (e.g., tortilla for bread). Likewise, scoring and interpreting the data in cultural context is important.
A construct score is the measurement of the theoretical construct one is trying to measure (such as depression). A true score is tied to the measurement tool: it represents the score on a particular test, free of measurement error.
7. Provide two examples of both random error and systematic error.
Random error can come from the weather, the buzzing of a light, or a smell in the room. Systematic errors are more consistent, as they impact scores in the same way every time. They can include an error in administering the test or in the manner of scoring.
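A hedged sketch of this distinction under classical test theory (observed score = true score + error); the true score of 100, the error spread, and the +5 bias are invented values:

```python
import random

# Hedged sketch of classical test theory: observed = true + error.
# True score (100), error spread (SD 3), and bias (+5) are invented.
random.seed(1)
true_score = 100

# Random error fluctuates around zero, so it averages out over many
# measurements.
random_obs = [true_score + random.gauss(0, 3) for _ in range(1000)]
print(sum(random_obs) / len(random_obs))   # close to 100

# Systematic error (e.g., a consistent scoring mistake) shifts every
# score the same way, so averaging does not remove it.
biased_obs = [true_score + random.gauss(0, 3) + 5 for _ in range(1000)]
print(sum(biased_obs) / len(biased_obs))   # close to 105
```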
8. Discuss how test administration and test scoring/interpretation can affect reliability.
Reliability refers to the consistency with which a tool measures a defined trait, state, or other characteristic. The tool has been designed in a very specific manner and is meant to be used in that way only. If I treat a 1-inch mark as a 1-cm mark, the measures will all be wrong and inconsistent with the norms they are compared to.
Test-retest reliability is used to test the stability of a measure and requires two sessions; the statistical procedure used to estimate it is the Pearson r or Spearman rho. In contrast, internal consistency tests the extent to which items on a test relate to each other and are equivalent, and requires only one testing session. Its error variance is estimated with a Pearson r between equivalent test halves (with the Spearman-Brown correction), with the Kuder-Richardson formulas for dichotomous items, or with coefficient alpha for multipoint items.
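A minimal sketch of the split-half approach with the Spearman-Brown correction, assuming invented half-test scores for five test takers:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (statistics.pstdev(x) * statistics.pstdev(y) * len(x))

def spearman_brown(r_half):
    """Corrects a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)

# Hypothetical scores on odd- and even-numbered halves for five test takers.
odd_half  = [10, 14, 9, 16, 12]
even_half = [11, 13, 10, 15, 13]

r_hh = pearson_r(odd_half, even_half)
print(round(spearman_brown(r_hh), 3))  # estimated full-test reliability
```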
10. Define reliability, including potential sources of reliability coefficient and standard error of
measurement.
Reliability in testing refers to the consistency and stability of measurement results. The reliability coefficient indicates
the degree to which a test yields consistent scores. Potential sources of reliability include internal consistency, test-
retest stability, and inter-rater agreement. The standard error of measurement reflects the extent to which an
individual's true score may vary from their observed score due to measurement error.
11. Explain the difference between the reliability coefficient and standard error of measurement.
The reliability coefficient measures the consistency and stability of scores from a test, indicating how well it reliably
measures the construct. In contrast, the standard error of measurement quantifies the expected variability or margin
of error in an individual's observed score due to measurement inaccuracies, providing an estimate of score precision.
While the reliability coefficient is a measure of consistency, the standard error of measurement gauges the potential
imprecision in individual scores.
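The standard error of measurement is commonly computed as SEM = SD × √(1 − r); a quick sketch, where the SD of 15 and reliability of .91 are invented, IQ-like values:

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# SD = 15 and r = 0.91 are hypothetical values on an IQ-like scale.
sd, reliability = 15, 0.91
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))  # 4.5

# An approximate 95% confidence band around an observed score of 110:
observed = 110
print(observed - 1.96 * sem, observed + 1.96 * sem)  # ~101.2 to ~118.8
```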
12. Describe four ways in which the nature of the test can affect reliability.
1. The test items are homogeneous or heterogeneous. 2. The characteristic being measured is static or dynamic. 3. The range of scores is restricted or not. 4. The test is a power test or a speed test. 5. The test is criterion-referenced or not.
Key Terms
o Norms
o Normative sample
o Norming
o Standardization/test standardization
o Sample
o Sampling
o Stratified sample
o Purposive sample
o Convenience sample
o Percentile
o Criterion-referenced tests
o Reliability coefficient
o Measurement error
o Variance
o True variance
o Error variance
o Reliability
o Random error
o Systematic error
o Test-retest reliability
o Internal consistency
o Spearman-Brown formula
o Split-half reliability
o Inter-scorer reliability
o Classical test theory
o Standard error of measurement
Unit 4: Validity
Learning Objectives
Validity is the extent to which a test measures what it purports to measure. It is a judgement, based on evidence, about the inferences drawn from test results.
2. Compare and contrast three main types of validity evidence (content, criterion, and construct) and
identify examples of how each type is established.
Content validity involves ensuring that a test adequately covers the intended content. It is established through expert
reviews and subject matter analysis. Criterion validity assesses how well a test predicts a specific criterion, either
concurrently or predictively. For example, a hiring test's criterion validity might be established by correlating scores
with job performance. Construct validity evaluates whether a test measures the theoretical construct it claims to
measure. It's established through convergent and discriminant validity, where correlations with related and unrelated
measures are examined.
The term can be misleading, as a test is accepted as valid only for the construct it was designed to measure.
Face validity means the test appears, on its surface, to measure what it claims to measure; its intent is explicit. Users seem to feel more reassured administering a test with good face validity, as it makes them feel secure that it is measuring what they are looking to measure.
Concurrent validity and predictive validity are both types of criterion-related validity. Concurrent validity assesses the
relationship between a test and a criterion that are measured at the same point in time. In contrast, predictive validity
assesses how well a test predicts future performance on a criterion. So, concurrent validity involves simultaneous
measurement, while predictive validity involves forecasting future outcomes based on the test results.
Incremental validity refers to the extent to which a new test or measure adds valuable information beyond what
existing measures already provide. In other words, it assesses whether the new test contributes something unique to
predicting an outcome.
For example, consider a hiring process where interviews and reference checks are standard procedures. If a new
personality test for job candidates demonstrates incremental validity, it means it provides additional predictive power
for success on the job beyond what the interviews and reference checks already offer.
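A hedged sketch of checking incremental validity by comparing R² with and without the new measure; all predictors, the outcome, and the effect sizes below are simulated, not real hiring data:

```python
import numpy as np

# Hedged sketch of incremental validity: does adding a new predictor
# raise R-squared over the baseline predictors? All data are invented.
rng = np.random.default_rng(0)
n = 200
interview  = rng.normal(size=n)   # baseline predictor
references = rng.normal(size=n)   # baseline predictor
new_test   = rng.normal(size=n)   # candidate new measure
job_perf = (0.5 * interview + 0.3 * references + 0.4 * new_test
            + rng.normal(size=n))

def r_squared(X, y):
    """R-squared from an ordinary least-squares fit (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

base = r_squared(np.column_stack([interview, references]), job_perf)
full = r_squared(np.column_stack([interview, references, new_test]), job_perf)
print(round(full - base, 3))  # positive gap = incremental validity
```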
8. Describe the procedures which may be used to demonstrate evidence of construct validity.
-test scores increase or decrease as a function of age, the passage of time, or an experimental manipulation as
theoretically predicted;
-test scores obtained after some event or the mere passage of time (or, posttest scores) differ from pretest scores as
theoretically predicted;
-test scores obtained by people from distinct groups vary as predicted by the theory;
-test scores correlate with scores on other tests in accordance with what would be predicted from a theory that covers
the manifestation of the construct in question.
The multitrait-multimethod (MTMM) matrix is a research design used to assess the validity of a set of measurements.
It involves examining the relationships between multiple traits and multiple methods used to measure those traits.
The procedure includes measuring several traits using different methods and then correlating the scores. The matrix
typically has three types of correlations:
Convergent validity (same trait, different methods): correlations between different methods measuring the same trait; these should be high.
Discriminant validity (different traits, different methods): correlations between different methods measuring different traits; these should be low.
Method variance (same method, different traits): correlations between different traits measured by the same method, reflecting the influence of the measurement method itself.
By analyzing these correlations, researchers can evaluate the consistency of results within traits, distinguish between
traits, and identify potential biases introduced by measurement methods. The MTMM matrix provides a
comprehensive perspective on the validity and reliability of measurements.
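A minimal simulated sketch of the convergent and discriminant cells of an MTMM matrix; the traits (anxiety, depression), methods (self-report, clinician rating), and all data are invented:

```python
import numpy as np

# Hedged MTMM sketch: two traits, each measured by two methods.
# All scores are simulated; no real instruments are involved.
rng = np.random.default_rng(1)
n = 300
anxiety = rng.normal(size=n)
depress = rng.normal(size=n)

measures = {
    "anx_self": anxiety + 0.4 * rng.normal(size=n),
    "anx_clin": anxiety + 0.4 * rng.normal(size=n),
    "dep_self": depress + 0.4 * rng.normal(size=n),
    "dep_clin": depress + 0.4 * rng.normal(size=n),
}
names = list(measures)
R = np.corrcoef([measures[k] for k in names])

# Same trait, different methods (monotrait-heteromethod): should be high.
print("convergent:", round(R[0, 1], 2))   # anx_self vs anx_clin
# Different traits, different methods (heterotrait-heteromethod): should be low.
print("discriminant:", round(R[0, 3], 2)) # anx_self vs dep_clin
```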
10. Explain the advantages of using factor analysis for learning about convergent and discriminant validity.
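As a hedged, simulated illustration of the idea: items written for the same construct should load on the same factor (convergent), and items for different constructs should load on different factors (discriminant). The sketch below uses scikit-learn's FactorAnalysis; all names and data are invented:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hedged sketch: simulate four measures driven by two latent constructs,
# then check that the factor loadings cluster by construct.
rng = np.random.default_rng(2)
n = 500
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    f1 + 0.5 * rng.normal(size=n),  # measure A, construct 1
    f1 + 0.5 * rng.normal(size=n),  # measure B, construct 1
    f2 + 0.5 * rng.normal(size=n),  # measure C, construct 2
    f2 + 0.5 * rng.normal(size=n),  # measure D, construct 2
])
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
# Factor order and sign are arbitrary, but A/B and C/D should load together.
print(np.round(fa.components_, 2))
```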
11.
It is useful as it brings forth interesting data; however, it is very complex to use and is rarely used in research because of that.
In psychology, a bias is "a factor inherent in a test that systematically prevents accurate, impartial measurement." An example is an intelligence test that uses academic content some students have not been taught.
13. Distinguish between leniency error, severity error, central tendency error, and halo effect.
Leniency error: occurs when a rater consistently gives higher ratings or evaluations than warranted. This can result in inflated scores that may not accurately reflect the actual performance or attributes being assessed.
Severity error: in contrast to leniency error, severity error happens when a rater consistently gives lower ratings than deserved. It can lead to undervaluing performance or characteristics and may affect fairness in evaluations.
Central tendency error: involves a rater consistently assigning average or middle-of-the-scale ratings, avoiding extreme judgments. This error can mask individual differences and make it challenging to distinguish between high and low performers.
Halo effect: occurs when a rater's overall impression of an individual, either positive or negative, influences their evaluations of specific attributes or behaviors. It can lead to biased assessments where one characteristic unfairly influences judgments across the board.
These errors highlight the challenges in subjective evaluations and emphasize the importance of rater training and
awareness to minimize biases and enhance the accuracy of assessments.
Fairness, in the context of psychometric testing, refers to the use of a test in an impartial, equitable, and just way. At times, tests are used in unreasonable ways, such as governments "diagnosing" people as ill for deviating from the governmental vision.
Key Terms
o Inference
o Validity
o Validation
o Content validity
o Criterion-related validity
o Construct validity
o Ecological validity
o Face validity
o Criterion contamination
o Validity coefficient
o Incremental validity
o Convergent validity
o Discriminant validity
o Multitrait-multimethod matrix
o Factor analysis
o Rating error
o Leniency error
o Severity error
o Central tendency error
o Halo effect
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Analogue Behavioral Observation: Systematic observation of behavior in a setting designed to resemble the natural
environment.
Analogue Study: A research study conducted in an environment that simulates real-life situations.
Apperceive: To comprehend or interpret sensory information and integrate it with existing knowledge.
Behavioral Assessment: Evaluation of behavior through direct observation, measurement, and analysis.
Behavioral Observation: Systematic recording and analysis of observable behaviors in their natural settings.
Biofeedback: The use of electronic monitoring to provide individuals with information about physiological processes for self-
regulation.
Composite Judgment: A comprehensive evaluation formed by combining multiple judgments or assessments.
Comprehensive System (Exner): A scoring system for the Rorschach inkblot test developed by John E. Exner.
Contrast Effect: The influence of a preceding stimulus on the perception of a subsequent one.
Ecological Momentary Assessment: The collection of real-time data on a subject's behaviors, thoughts, and emotions in their
natural environment.
Figure Drawing Test: A projective test where individuals draw a human figure, analyzed for psychological insights.
Free Association: A psychoanalytic technique where individuals express thoughts without censorship to reveal unconscious
processes.
Functional Analysis of Behavior: An assessment method analyzing antecedents, behaviors, and consequences to understand and
modify behavior.
Implicit Motive: Unconscious motivational forces that influence behavior.
Inquiry (on the Rorschach): The phase of Rorschach administration in which the examiner asks the test taker to clarify what was seen and which features of the inkblot determined each response.
Leaderless Group Technique: A group assessment method where participants interact without a designated leader.
Need (Murray): A concept in Murray's theory of personality, representing a recurrent theme in an individual's experiences.
Objective Methods of Personality Assessment: Assessment techniques with standardized questions and clear scoring criteria.
Penile Plethysmograph: A device measuring changes in penile circumference, often used in sexual arousal research.
Percept (on the Rorschach): The visual response or interpretation of an inkblot on the Rorschach test.
Phallometric Data: Measurements of physiological responses related to sexual arousal.
Plethysmograph: An instrument measuring changes in volume in an organ or part of the body.
Polygraph: A lie detector measuring physiological responses such as heart rate and skin conductivity during questioning.
Press (Murray): Environmental influences and demands that act on an individual in Murray's theory of personality.
Projective Hypothesis: The idea that ambiguous stimuli elicit projections of unconscious thoughts and feelings.
Projective Method: A psychological assessment using unstructured stimuli to reveal unconscious thoughts.
Psychophysiological (Assessment Methods): Techniques measuring physiological responses to psychological stimuli.
Reactivity: Changes in behavior or physiological responses due to being observed or assessed.
Role Play: Acting out situations to assess or train individuals' responses in a specific context.
Rorschach Test: A projective psychological test using inkblots to assess personality and emotional functioning.
Self-Monitoring: The process of observing and recording one's behavior and responses to assess and modify them.
Sentence Completion: A projective technique where individuals complete sentences to reveal underlying thoughts and
emotions.
Sentence Completion Stem: The beginning of a sentence in a sentence completion test that individuals complete.
Sentence Completion Test: A psychological test using incomplete sentences to elicit responses revealing thoughts and feelings.
Situational Performance Measure: An assessment of behavior in specific situations to predict future performance.
TAT (Thematic Apperception Test): A projective test using ambiguous pictures to elicit stories revealing personality.
Testing the Limits (on the Rorschach): A procedure in Rorschach testing involving gradual changes in administration to explore
response variations.
Thema (Murray): In Murray's theory, a recurrent theme or cluster of needs influencing behavior.
Timeline Followback (TLFB) Methodology: A method collecting retrospective data on behavior over a specific time period.
Unobtrusive Measure: An assessment method that does not interfere with the natural context or behavior being observed.
Word Association: A psychological test where individuals respond to a stimulus word with the first word that comes to mind.
Word Association Test: A projective test using word stimuli to elicit associations revealing unconscious thoughts and emotions.