Psyel45a Reviewer


PSYCHOLOGICAL ASSESSMENT

MIDTERM REVIEWER

Topic Outline:
● Pre-Review Lessons

INTRODUCTION
Psychological Testing - process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior
✧ method of evaluating psychological characteristics to generate a behavioral sample
Testing ↝ described the group screening of thousands of military recruits during World War I
Psychological Assessment - gathering and integration of psychology-related data for the purpose of making a psychological evaluation
✧ gathering of psychological data using instruments
Assessment ↝ emerged during World War I; a semantic distinction between testing and a more inclusive term

TESTING | ASSESSMENT
Numerical in nature | Answers referral questions through the use of different tools of evaluation
Individual or by group | Individual
Administrators can be interchangeable without affecting the evaluation | Assessor is the key to the process of selecting tests and/or other tools of evaluation
Requires technician-like skills in terms of administration and scoring | Requires a careful choice of evaluation tools, evaluation skills, and careful organization and integration of data
Yields a test score or series of test scores | Entails logical problem-solving that brings to bear many sources of data assigned to answer the referral question

VARIETIES OF ASSESSMENT
Therapeutic Psychological Assessment ↝ assessment that has a therapeutic component to it
Educational Assessment ↝ use of tests and other tools to evaluate abilities and skills relevant to success/failure in a school context
Retrospective Assessment ↝ use of evaluative tools to draw conclusions about psychological aspects of a person as they existed at some point in time prior to the assessment
Remote Assessment ↝ subject is not in physical proximity to the person conducting the evaluation
Ecological Momentary Assessment ↝ “in the moment” evaluation of specific problems and related cognitive and behavioral variables at the very time and place that they occur
Collaborative Psychological Assessment ↝ assessor and assessee may work as “partners” from initial contact through final feedback
Dynamic Assessment ↝ interactive approach to psychological assessment that usually follows a model of evaluation > intervention > evaluation
PSYCHOLOGICAL ASSESSMENT PROCESS
1. Determining the Referral Question
✦ begins with a referral for assessment from a source
✦ one or more referral questions are put to the assessor about the assessee
2. Acquiring Knowledge relating to the content of the problem
✦ the assessor may meet with the assessee before the formal assessment in order to clarify aspects of the reason for referral
3. Data Collection
✦ the assessor prepares for the assessment by selecting the tools to be used
✦ after the selection of instruments, the formal assessment will begin
4. Data Interpretation
✦ after the assessment, the assessor writes a report of the findings that is designed to answer the referral question
✧Hit Rate - the rate at which a test accurately predicts success or failure
✧Profile - narrative description, graph, table, or other representation of the extent to which a person has demonstrated certain targeted characteristics as a result of the application of tools of assessment
✧Actuarial Assessment - approach to evaluation characterized by the application of empirically demonstrated statistical rules as the determining factor in assessors’ judgment and actions
✧Mechanical Prediction - application of computer algorithms together with statistical rules and probabilities to generate findings and recommendations
✧Extra-Test Behavior - observations made by an examiner regarding what the examinee does and how the examinee reacts during the course of testing that are indirectly related to the test’s specific content but of possible significance to interpretation
TOOLS OF PSYCHOLOGICAL ASSESSMENT
Test ↝ a measuring device or procedure; when the word is prefaced with a modifier (e.g., psychological), it refers to a device or procedure designed to measure a variable related to that modifier

Psychological Test ↝ device/procedure designed to measure variables related to psychology
● Content: subject matter
● Format: form, plan, structure, arrangement, layout
● Item: specific stimulus to which a person responds overtly; this response is scored or evaluated
● Administration Procedures: one-to-one basis or group administration
● Score: code or summary statement, usually but not necessarily numerical in nature, that reflects an evaluation of performance on a test
● Scoring: the process of assigning scores to performances
● Cut-Score: reference point derived by judgment and used to divide a set of data into two or more classifications
● Psychometric Soundness: technical quality
● Psychometrics: science of psychological measurement
● Psychometrist or Psychometrician: refers to a professional who uses, analyzes, and interprets psychological data
ABILITY OR MAXIMAL PERFORMANCE TEST
1. Achievement Test ↝ measurement of previous learning
✧ used to measure general knowledge in a specific period of time
✧ used to assess mastery
✧ rely mostly on content validity
✧ fact-based or conceptual
2. Aptitude Test ↝ refers to the potential for learning or acquiring a specific skill
✧ tends to focus on informal learning
✧ rely mostly on predictive validity
3. Intelligence Test ↝ refers to a person’s general potential to solve problems, adapt to changing environments, think abstractly, and profit from experience
4. Human Ability Test ↝ considerable overlap of achievement, aptitude, and intelligence tests
5. Typical Performance Test ↝ measures usual or habitual thoughts, feelings, and behavior
✧ indicate how test takers think and act on a daily basis
✧ use interval scales
✧ no right and wrong answers
6. Personality Test ↝ measures individual dispositions and preferences
✧ designed to identify characteristics
✧ measured idiographically or nomothetically
● Structured Personality Tests ↝ provide statements, usually self-report, and require the subject to choose between two or more alternative responses
● Projective Personality Tests ↝ unstructured; the stimulus or response is ambiguous
● Attitude Test ↝ elicits personal beliefs and opinions
● Interest Inventories ↝ measure likes and dislikes as well as one’s personality orientation towards the world of work
OTHER TESTS:
1. Speed Tests ↝ the interest is the number of items a test taker can answer correctly in a specific period
2. Power Tests ↝ reflect the level of difficulty of the items the test takers answer correctly
3. Values Inventory ↝ measures the level of importance of different values to one’s career, work, and life and the satisfaction they will receive with success
4. Trade Tests ↝ assess individuals who have acquired knowledge, skills, and competence in a particular occupation but do not possess a formal qualification
5. Neuropsychological Tests ↝ measure how well a person’s brain is working
6. Norm-Referenced Tests ↝ type of standardized test that compares students’ performances to one another
7. Criterion-Referenced Tests ↝ measure a student’s academic performance against some standard/criteria
Interview ↝ method of gathering information through direct communication involving reciprocal exchange
✦Standardized/Structured - questions are prepared
✦Non-standardized/Unstructured - questions are not prepared in advance
✦Semi-standardized/Focused - may probe further on a specific number of questions
✦Non-directive - subject is allowed to express feelings without fear of disapproval
TYPES OF INTERVIEW:
✧ Mental Status Examination ↝ determines the mental status of the patient
✧ Intake Interview ↝ determines why the client came for the assessment; chance to inform the client about policies, fees, and processes involved
✧ Social Case ↝ biographical sketch of the client
✧ Employment Interview ↝ determines whether the candidate is suitable for hiring
✧ Panel Interview ↝ more than one interviewer participates in the assessment
✧ Motivational Interview ↝ used by counselors and clinicians to gather information about some problematic behavior while simultaneously attempting to address it therapeutically
Portfolio ↝ samples of one’s ability and accomplishments
Case History Data ↝ refers to records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data relevant to an assessee
✦Case Study - a report/illustrative account concerning a person/event that was compiled on the basis of case history data
✦Groupthink - result of the varied forces that drive decision-makers to reach a consensus
Behavioral Observation ↝ monitoring of actions of others/oneself by visual/electronic means while recording
quantitative and/or qualitative information regarding those actions
✦Naturalistic Observation - observe humans in their natural setting
✦SORC Model - Stimulus, Organismic Variables, Actual Response, Consequence
Role Play ↝ defined as acting an improvised/partially improvised part in a simulated situation
✦Role Play Test - assessees are directed to act as if they are in a particular situation
COMPUTER AS TOOLS
Local Processing ↝ scoring is done on-site
Central Processing ↝ scoring is conducted at some central location
Teleprocessing ↝ test-related data may be sent to and returned from the central facility by means of phone lines if processing is done at a central location
Simple Scoring Report ↝ an account of a testtaker’s performance that can be as brief as a mere listing of a score or scores
Extended Scoring Report ↝ includes statistical analyses of the testtaker’s performance
Interpretive Report ↝ distinguished by its inclusion of numerical or narrative interpretive statements in the report
Consultative Report ↝ may provide expert opinion concerning analysis of the data; at the high end of interpretive reports
Integrative Report ↝ designed to integrate data from sources other than the test itself into the interpretive report; employs previously collected data in the test report
CAPA ↝ computer-assisted psychological assessment
PARTIES IN PSYCHOLOGICAL ASSESSMENT
Test Developer ↝ creates tests or other methods of assessment
Test Publisher ↝ publishes, markets, and sells tests
Test User ↝ clinicians, school psychologists, and human resources personnel are the ones who will conduct the test
Test Taker ↝ subject of an assessment/observation
Test Sponsors ↝ institutions/government who contract test developers for various testing services
Psychological Autopsy ↝ reconstruction of a deceased individual’s psychological profile
Society at Large ↝ test developers devised new tests for the emerging society
✦ Test Battery - selection of tests and assessment procedures typically composed of tests designed to measure different variables but having a common objective
✦ Protocol - a form/sheet/booklet on which a testtaker’s responses are entered
EARLY ANTECEDENTS
Chinese Civilization
● Testing and testing programs first came into China as early as 2200 BC
● The purpose was to select who, of many applicants, would obtain government jobs
Greek Civilization
● Tests were used to measure intelligence and physical skills
European Universities
● Universities relied on formal exams in conferring degrees and honors
INDIVIDUAL DIFFERENCES
Charles Darwin
● Believed that despite similarities, no two humans are exactly the same
● Some are more adaptive than others and these differences lead to more complex, intelligent organisms over time
Francis Galton
● Established the testing movement
● Pioneered the application of rating-scale and questionnaire methods
● Pioneered the use of statistical concepts central to psychological experimentation and testing
● Coefficient of correlation
EARLY EXPERIMENTAL PSYCHOLOGISTS
Johann Friedrich Herbart
● Father of pedagogy as an academic discipline
● Went against Wilhelm Wundt
Wilhelm Wundt
● Founded the first psychological laboratory at the University of Leipzig in Germany (1879)
● Focused on how people are similar
Edward Titchener
● He succeeded Wundt
● He brought structuralism to America
Louis Leon Thurstone
● He is a large contributor to factor analysis
● Law of comparative judgment
STUDENTS OF WILHELM WUNDT
Victor Henri
● He suggested how mental tests could be used to measure higher mental processes
● He collaborated with Alfred Binet
Emil Kraepelin
● He used word association techniques as a formal test
Lightner Witmer
● He succeeded Cattell as director of the psychology laboratory at the University of Pennsylvania
● Little known as founder of Clinical Psychology
STUDY OF MENTAL DEFICIENCY AND INTELLIGENCE TESTING
James McKeen Cattell
● He coined the term mental test
Jean Esquirol
● He provided the first accurate description of mental retardation as an entity separate from insanity
Alfred Binet
● He is the father of IQ testing
Lewis Terman
● He introduced the concept of IQ as determined by the mental age and chronological age
Edouard Seguin
● He pioneered modern educational methods for teaching people who are mentally retarded/intellectually disabled
Charles Spearman
● He built the mathematical framework for the statistical technique of factor analysis
● He introduced the two-factor theory
Louis Leon Thurstone
● He proposed the primary mental abilities
David Wechsler
● He developed the Wechsler Intelligence Tests (WISC, WAIS)
Raymond Cattell
● He introduced the components of g (fluid and crystallized)
Joy Paul (J.P.) Guilford
● He theorized the many factor analysis theory
Vernon Carroll
● He introduced the hierarchical approach “g”
Robert Sternberg
● He introduced the 3g’s
● Academic, practical, and creative
Howard Gardner
● He conceptualized the multiple intelligence theory
WORLD WAR I
Robert Yerkes
● He pioneered the first group intelligence tests, known as the Army Alpha (literate) and Army Beta (illiterate)
Arthur Otis
● He introduced multiple choice and other “objective” item types of tests
Robert Woodworth
● He devised the Personal Data Sheet (known as the first personality test), which aimed to identify soldiers who are at risk for shell shock
Hermann Rorschach
● He developed the projective test known as the Rorschach Inkblot Test
Henry Murray and Christiana Morgan
● They developed the projective test called Thematic Apperception Test
Raymond Cattell
● He identified the 16 factors of personality (16 Personality Factors)
McCrae and Costa
● They developed the Big 5 Personality Factors
PSYCHOLOGICAL TESTING IN THE PHILIPPINES
Virgilio Enriquez
● Panukat ng Ugali at Pagkatao (PUP)
Aurora R. Palacio
● Panukat ng Katalinuhang Pilipino (PKP)
Anadaisy Carlota
● Panukat ng Pagkataong Pilipino (PPP)
Gregorio E.H. Del Pilar
● Masaklaw na Panukat ng Loob (Mapa ng Loob)
Alfredo Lagmay
● Philippine Thematic Apperception Test (PTAT)
PHILIPPINE PSYCHOLOGY ACT OF 2009
Republic Act No. 10029 ↝ An act to regulate the practice of Psychology, creating for this purpose a professional regulatory board of Psychology, appropriating funds therefor and for other purposes
IMPORTANT TERMS AND DEFINITIONS
Psychology ↝ scientific study of human behavior and mental processes
Practice of Psychology ↝ the delivery of psychological services that involve the application of psychological principles and procedures
Psychological Intervention ↝ psychological interventions that involve the application of psychological principles and methods to improve the psychological functioning of individuals
Psychological Assessment ↝ gathering and integration of psychology-related data for the purpose of:
(a) making a psychological evaluation accomplished through a variety of tools
(b) assessing diverse psychological functions
Psychological Programs ↝ development, planning, implementation, monitoring and evaluation of psychological treatment programs
Psychological Evaluation ↝ includes the making of diagnostic interpretations, reports, and recommendations
(a) as part of a case study
(b) in support of diagnostic screening, placement, management decisions, psychiatric evaluation, legal action, psychological counseling, and psychotherapy/change intervention
Assessing Diverse Psychological Functions ↝ includes the development, standardization, and publication of psychological tests which measure adjustment and psychopathology
Clinical Supervision ↝ direction, guidance, mentoring, and cliniquing of psychology practitioners and interns, RPm, and other trainees for psychology-related work to meet the standards of quality and excellence in professional practice
Psychologist ↝ natural person who is duly registered and holds a valid Certificate of Registration and a valid Professional Identification Card as a Professional Psychologist, issued by the Board and the Commission pursuant to Section 3 (c), Article III of R.A. No. 10029, for the purpose of delivering psychological services defined in this IRR
Psychometrician ↝ natural person who has been registered and issued a valid Certificate of Registration and a valid Professional Identification Card as a psychometrician by the Board and the Commission in accordance with Sec. 3 (d), Article III of R.A. No. 10029, and is authorized to do the following activities:
● Administer and score objective personality tests, structured personality tests, excluding projective tests and other high-level forms of psychological tests
● Interpret results of said tests and prepare a written report on the results
● Conduct preparatory intake interviews of clients for psychological sessions
Provided that these activities shall at all times be conducted under the supervision of a licensed psychologist
CODE OF ETHICS AND PROFESSIONAL STANDARDS FOR PSYCHOLOGY PRACTITIONERS
✦ The Psychological Association of the Philippines (PAP) adopted a Code of Ethics for the Philippines in 2008
✦ The Code presents the principles and standards that shall govern the norms of conduct and ethics of all registered Psychologists and Psychometricians in the Philippines
✦ Psychology practitioners in the Philippines adhere to the following Universal Declaration of Ethical Principles:
● Principle I: Respect for the dignity of persons and people
● Principle II: Competent caring for the well-being of persons and people
● Principle III: Integrity
● Principle IV: Professional and Scientific Responsibilities to society
ETHICAL ISSUES
↝ Misuse of works
↝ Conflicts between ethics and law, regulations, or other governing legal authority
↝ Conflicts between ethics and organizational demands
↝ Action on ethical violations
↝ Cooperating with the ethics committee
↝ Improper complaints
↝ Unfair discrimination against complainants and respondents
COMMON ETHICAL ISSUES IN ASSESSMENT
↝ Confidentiality
↝ Informed consent
↝ Choice and use of Psychological Tests
↝ Release of Test Results
BASES FOR ASSESSMENT
✧ The expert opinions that are provided through recommendations, reports, and diagnostic or evaluative statements are based on substantial information and appropriate assessment techniques
✧ Expert opinions are provided regarding the psychological characteristics of a person only after employing adequate assessment procedures and examination to support the conclusions and recommendations
✧ In instances where opinions are asked about an individual without conducting an examination, on the basis of a review of existing test results and reports, the limitations of the opinions and the basis of the conclusions and recommendations are discussed
INFORMED CONSENT IN ASSESSMENT
Gather informed consent prior to the assessment of the clients except for the following instances:
✦ when it is mandated by the law
✦ when it is implied such as in routine educational, institutional, and organizational activity
✦ when the purpose of the assessment is to determine the individual’s decisional capacity
✦ clients are educated about the nature of the services, financial arrangements, potential risks and limits of confidentiality; in instances where the client is not competent to provide informed consent on assessment, these matters are discussed with immediate family members or legal guardians
✦ if there is a third-party interpreter needed, the confidentiality of test results and the security of the tests must be ensured
ASSESSMENT TOOLS
✧ Select and administer tests that are pertinent to the reasons for the referral and the purpose of the assessment
✧ Methods and procedures that are consistent with current scientific and professional developments
✧ Tests are standardized, valid, reliable, and have normative data
✧ Tools that are appropriate to the language, competence, and other relevant characteristics
CHOOSING THE APPROPRIATE INSTRUMENT
↝ Test worthiness
↝ Reliability
↝ Validity
↝ Normative base
↝ Cross-cultural fairness
↝ Practicality of the tests
TEST ADMINISTRATION
✧ Tests should be administered appropriately as defined by the way they were established and standardized
✧ Alterations should be noted
✧ Interpretations of test data are adjusted if testing conditions were not ideal
OBSOLETE AND OUTDATED TEST RESULTS
✧ Do not base interpretations, conclusions, and recommendations on outdated test results
✧ Do not base interpretations, conclusions, and recommendations on obsolete tests
INTERPRETING ASSESSMENT RESULTS
✧ Under no circumstances should the test results be reported without taking into consideration the validity, reliability, and appropriateness of the test. There should be an indication of reservations regarding the interpretations
✧ Interpret assessment results while considering the purpose of the assessment and other factors such as the client’s test-taking abilities, characteristics, and situational, personal, and cultural differences
RELEASE OF TEST DATA
✧ Ensure that test results and interpretations are not used by persons other than those explicitly agreed upon by the referral sources prior to the assessment procedure
✧ Do not release test data in the forms of raw and scaled scores, client’s responses to test questions/stimuli, and notes regarding the client’s statements and behaviors during the examination
EXPLAINING ASSESSMENT RESULTS
✧ Test results are released only to the sources of referral, and with written permission from the client if it is a self-referral
✧ Explain test results to relatives, parents, or teachers through non-technical language
✧ Explain findings and test results to clients/designated representatives except when the relationship precludes the provision of explanation of results and this is explained in advance to the client
✧ Supervise the release of test results if they need to be shared with schools, social agencies, courts, or industry
TEST SECURITY
✧ The administration and handling of all test materials shall be done only by qualified users or personnel
ASSESSMENT BY UNQUALIFIED PERSONS
✧ Do not promote the use of assessment tools and methods by unqualified persons except for training purposes with adequate supervision
✧ Ensure that test protocols, interpretations, and all other records are kept secured from unqualified persons
TEST USER COMPETENCE
↝ Adequate knowledge about testing
↝ Familiar with the tests
↝ 3-level qualifications
✦ Level A: with aid of the manual and a general orientation
✦ Level B: requires some technical knowledge of test construction and supporting psychological and educational fields
✦ Level C: requires substantial knowledge of testing and supporting psychological fields
TEST CONSTRUCTION
✧ Develop tests and other assessment tools using current scientific findings and knowledge, appropriate psychometric properties, validation, and standardization procedures
CONFIDENTIALITY
↝ Protect the client information
↝ Confidentiality is an ethical guideline and not a legal right
↝ Privileged information is a legal right
↝ Term that ensures the rights of professionals not to reveal information about their clients
WHEN TO REVEAL CONFIDENTIAL INFORMATION
↝ If the client is in danger of harming themself/someone else
↝ If the client is a minor and the law states that the parents have a right to information about their child
↝ If a client asks to break confidentiality (testimony is needed in court)
↝ If it’s bound by the law to break confidentiality
↝ Reveal information to the supervisor in order to benefit the client
↝ Have a written agreement from the client to reveal information to specified sources
STRATEGIES FOR AVOIDING ETHICAL AND LEGAL PERILS
↝ Always obtain a written informed consent
↝ Get consultation
↝ Maintain professional competence
↝ Know the law and ethics code
↝ Avoid or plan for high-risk patients and situations
↝ Keep good written records
↝ Maintain confidentiality
↝ Be extra careful with managed care and insurance companies
↝ Get help when needed
ETHICAL PRINCIPLES AND STANDARDS OF PRACTICE
✦ If mistakes were made, they should do something to correct or minimize the mistakes
✦ If an ethical violation made by another psychologist was witnessed, they should resolve the issue with informal resolution, as long as it does not violate any confidentiality rights that may be involved
✦ If informal resolution is not enough or appropriate, referral to state or national committees on professional ethics, state licensing boards, or the appropriate institutional authorities can be done. Still, confidentiality rights of the professional in question must be kept.
✦ Failure to cooperate in an ethics investigation is itself an ethics violation, unless they request deferment of adjudication of an ethics complaint
✦ Psychologists must file complaints responsibly by checking facts about the allegations
✦ Psychologists DO NOT deny persons employment, advancement, admissions, tenure or promotion based solely upon their having made or their being the subject of an ethics complaint
↝ Being questioned by the ethics committee or being involved in an ongoing ethics investigation is not grounds for discrimination or denial of advancement
↝ Unless the outcome of the proceedings is already being considered
✦ Psychologists should provide their services within the boundaries of their competence, which is based on the amount of training, education, experience, or consultation they had
✦ When they are tasked to provide services to clients who are deprived of mental health services (e.g., communities far from urban cities) but have not yet obtained the needed competence for the job, they could still provide services AS LONG AS they make reasonable effort to obtain the competence required, to ensure that services are not denied to those communities
✦ During emergencies, psychologists provide services to individuals even though they are yet to complete the competency/training needed, to ensure that services are not denied. However, the services are discontinued once the appropriate services are available
✦ Psychologists should discuss the limits of confidentiality and the uses of the information that would be generated from the services with the persons and organizations with whom they establish a scientific or professional relationship
✦ Before recording voices or images, they must first obtain permission from all persons involved or their legal representatives
✦ Only discuss confidential information with persons clearly concerned/involved with the matters
✦ Disclosure is allowed with appropriate consent
✧ Disclosure without consent is not allowed UNLESS mandated by the law
✦ No disclosure of confidential information that could lead to the identification of a client unless they have obtained prior consent or the disclosure cannot be avoided
✧ Only disclose necessary information
✦ Exemptions to disclosure:
↝ If the client is disguised/identity is protected
↝ Has consent
↝ Legally mandated
✦ Psychologists can create public statements as long as they take responsibility for them
✧ They cannot compensate employees of the media in return for publicity in a news item
✧ Paid advertisements must be clearly recognizable
✧ When they are commenting publicly via internet, media, etc., they must ensure that their statements are based on their professional knowledge in accord with appropriate psychological literature and practice, are consistent with ethics, and do not indicate that a professional relationship has been established with the recipient
✦ Must provide accurate information and obtain approval prior to conducting the research
✦ Informed consent is required, which includes:
↝ Purpose of the research Measurement ↝ act of assigning numbers/symbols to
↝ Duration and procedures characteristics of things according to rules
↝ Right to decline and withdraw Scale ↝ set of numbers whose properties model empirical
↝ Consequences of declining or withdrawing properties of the objects to which the numbers are assigned
↝ Potential risks, discomfort, or adverse effects ● Continuous Scale ↝ a scale used to measure a
↝ Benefits continuous variable
↝ Limits of confidentiality ● Discrete Scale ↝ a scale used to measure discrete
↝ Incentives for participation variable
↝ Researcher’s contact information Statistics ↝ refers to the techniques by which
✦ Permission for recording images or vices are needed unless quantitative/qualitative data are collected, presented,
the research consists of solely naturalistic observations in organized, analyzed, and interpreted
public places, or research designed includes deception Population ↝ the complete set of individuals, objects, or
✧ Consent must be obtained during debriefing scores that the investigator is interested in studying
✦ Dispense or Omitting Informed consent only when: Sample ↝ subset of the population
↝ Research would not create distress or harm Variable ↝ any property or characteristic of some event,
✧ Study of normal educational practices conducted in an object, or person that may have different values at different
educational settings times depending on the conditions
✧ Anonymous questionnaires, naturalistic observation, ● Independent Variable (IV) ↝ variable that is
archival research systematically manipulated by the investigator
✧ Confidentiality is protected ● Dependent Variable (DV) ↝ variable that the
↝ Permitted by law investigator measures to determine the effect of the
o Avoid offering excessive incentives for research participation independent variable
that could coerce participation
✦ Do not conduct a study that involves deception unless the use of deceptive techniques in the study has been justified
✧ Must be discussed as early as possible and not during the conclusion of data collection
✦ Must give participants the opportunity to learn about the nature, results, and conclusions of the research and make sure that there are no misconceptions about the research
✦ Must ensure the safety and minimize the discomfort, infection, illness, and pain of animal subjects
✧ If so, procedures must be justified and be as minimal as possible
✧ During termination, they must do it rapidly and minimize the pain
✦ Must not present portions of another’s work or data as their own
✧ Must take responsibility and credit, including authorship credit, only for work they have actually performed or to which they have substantially contributed
✧ Faculty advisors discuss publication credit with students as early as possible
✦ After publishing, they should not withhold data from other competent professionals who intend to reanalyze the data
✧ Shared data must be used only for the declared purpose
✦ RA 9258 – Guidance and Counseling Act of 2004
✦ RA 9262 – Violence Against Women and Children
✦ RA 7610 – Child Abuse
✦ RA 9165 – Comprehensive Dangerous Drugs Act of 2002
✦ RA 11469 – Bayanihan to Heal as One Act
✦ RA 7277 – Magna Carta for Disabled Persons
✦ RA 11210 – Expanded Maternity Leave Law
✦ RA 11650 – Inclusive Education Law
✦ RA 10173 – Data Privacy Act
✦ House Bill 4982 – SOGIE Bill
✦ Art. 12 of Revised Penal Code – Insanity Plea
STATISTICS REFRESHER
● Quasi-Independent Variable ↝ non-manipulated variable used to designate groups
Data ↝ the measurements that are made on the subjects of an experiment
Statistic ↝ number calculated on sample data that quantifies a characteristic of the sample
● Descriptive Statistics ↝ concerned with techniques that are used to describe/characterize the obtained data
● Inferential Statistics ↝ involves techniques that use the obtained sample data to draw inferences about the population; drawing conclusions about the population; testing for significant differences and independence between two or more variables
Parameter ↝ number calculated on population data that quantifies a characteristic of the population
Error ↝ refers to the collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement
Properties of Scale:
✦ Magnitude
↝ property of “moreness”
↝ particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance
✦ Equal Intervals
↝ difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units
✦ Absolute Zero
↝ obtained when nothing of the property being measured exists
FOUR LEVELS OF MEASUREMENT SCALE
Nominal Scales ↝ involve classification or categorization based on one or more distinguishing characteristics
Ordinal Scales ↝ permit classification; rank ordering on some characteristics is also permissible
Interval Scales ↝ contain equal intervals between numbers; each unit on the scale is exactly equal to any other unit on the scale
Ratio Scales ↝ have a true zero point; all mathematical operations can meaningfully be performed because there exist equal intervals between the numbers on the scale as well as a true or absolute zero
COMPARATIVE SCALES OF MEASUREMENT
Paired Comparison
✧ comparative technique in which a respondent is presented with two objects at a time
✧ asked to select one object according to some criterion
✧ data obtained are ordinal in nature
Rank Order
✧ respondents are presented with several items simultaneously and asked to rank them in order of priority
✧ describes the favored and unfavored objects but does not reveal distance between the objects
✧ the data in rank order is ordinal data
Constant Sum
✧ respondents are asked to allocate a constant sum of units such as points among a set of stimulus objects with respect to some criterion
✧ advantage of this technique is saving time; disadvantage is the respondent may allocate more or fewer points than those specified
Q-Sort Technique
✧ uses a rank order procedure to sort objects based on similarity with respect to some criterion
✧ it is more important to make comparisons among different responses than the responses between different respondents
NON-COMPARATIVE SCALES OF MEASUREMENT
Continuous Rating Scales
✧ respondents rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other
Likert Scales
✧ respondents indicate their own attitudes by checking how strongly they agree or disagree with certain statements
✧ respondents generally choose from 5 alternatives
Semantic Differential Scales
✧ a seven-point rating scale with endpoints associated with bipolar labels that have semantic meaning
✧ used to find whether a respondent has a positive or negative attitude towards an object
Stapel Scales
✧ originally developed to measure the direction and intensity of an attitude simultaneously
✧ modern versions place a single adjective as a substitute for the semantic differential when it is difficult to create pairs of bipolar adjectives
FREQUENCY DISTRIBUTION
Distribution ↝ defined as a set of test scores arrayed for recording or study
Frequency Distribution ↝ all scores are listed alongside the number of times each score occurred
Simple Frequency Distribution ↝ indicates that individual scores have been used and the data have not been grouped
Grouped Frequency Distribution ↝ used to summarize the data
Raw scores ↝ straightforward, unmodified accounting of performance that is usually numerical
Class Interval (CI) ↝ grouping or categories defined by lower and upper limits
Class Size (i) ↝ the difference between the upper class boundary and the lower class boundary of a class interval; width of each class interval
Class Boundaries ↝ the numbers used to separate classes without the gaps created by class limits; the number to be added or subtracted is half the difference between the upper limit of one class and the lower limit of the next class
Class Mark (x) ↝ the middle value or midpoint of a class interval; obtained by adding the lower and upper limits and then dividing the sum by 2
Range ↝ difference between the highest score and the lowest score
Relative Frequency ↝ the percentage distribution in every class interval
Cumulative percentage frequency distribution ↝ refers to the number of observations belonging to a class interval or a number of items within
Less than cumulative frequency (<cf) ↝ obtained by adding the frequencies successively from lowest to highest
Greater than cumulative frequency (>cf) ↝ obtained by adding the frequencies from highest class interval to lowest class interval
Graphs ↝ diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data
Histogram ↝ graph with vertical lines drawn at the true limits of each test score, forming a series of contiguous rectangles
Bar Graph ↝ numbers indicative of frequency appear on the Y-axis and reference to some categorization appears on the X-axis
Frequency Polygon ↝ data are expressed by a

continuous line connecting the points where test scores meet frequencies
MEASURES OF CENTRAL TENDENCY
↝ statistic that indicates the average/midmost score between the extreme scores in a distribution
↝ compute the average score for each group and then compare the averages; the measure computed is the measure of central tendency
3 MOST OFTEN USED MEASURES OF CENTRAL TENDENCY:
Arithmetic Mean
✧ statistic at the interval or ratio level of measurement
✧ average
✧ symbol: x̄
✧ formula: equal to the sum of the observations divided by the number of observations:
x̄ = Σx / n
✧ an arithmetic mean can also be computed from a frequency distribution
Median
✧ takes into account the order of scores and is ordinal in nature
✧ middle score in the distribution
✧ determine the median by arranging the scores in ascending or descending order
● Odd Number of scores ↝ the median score is exactly in the middle
● Even Number of scores ↝ the median score will be calculated by obtaining the average of the two middle scores
Mode
✧ nominal in nature
✧ most frequently occurring score in a distribution of scores
✧ mode is found by inspection of the scores
● Unimodal ↝ only one mode
● Bimodal ↝ two modes
MEASURES OF VARIABILITY
↝ statistic that describes the amount of variation in a distribution
Variability ↝ indication of how scores in a distribution are scattered or dispersed
Range ↝ difference between the highest and lowest scores
Interquartile Range ↝ difference between Q3 and Q1
Semi-Interquartile Range ↝ interquartile range divided by 2
Standard Deviation ↝ square root of the averaged squared deviations about the mean; equal to the square root of the variance
Variance ↝ equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
MEASURES OF LOCATION
Percentile ↝ expression of the percentage of people whose score on a test/measure falls below a particular raw score
Quartiles ↝ one of the 3 dividing points between the four quarters of a distribution; typically labeled as Q1, Q2, and Q3
Deciles ↝ dividing points that split a distribution into 10 equal parts
SKEWNESS
↝ nature and extent to which symmetry is absent
Positively Skewed
✧ relatively few scores fall at the high end of the distribution
✧ the exam is difficult
✧ more items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores
Negatively Skewed
✧ when relatively few of the scores fall at the low end of the distribution
✧ the exam is easy
✧ more items of a higher level of difficulty would make it possible to better discriminate between scores at the upper end of the distribution
Symmetrical Distribution
✧ right side of the graph is a mirror image of the left side
✧ has only one mode and it is in the center of the distribution
✧ mean = median = mode
KURTOSIS
↝ steepness of a distribution in its center
↝ measure of the combined weight of a distribution’s tails relative to the rest of the distribution
Mesokurtic
✧ most similar to the standard normal distribution
✧ resembles a bell curve
✧ fatter tails and lower peak
Leptokurtic
✧ displays greater kurtosis than a mesokurtic distribution
✧ extremely thick tails and a very thin and tall peak
Platykurtic
✧ slender tails and a peak that’s smaller than a mesokurtic distribution
✧ short and broad-looking peak
NORMAL CURVE
↝ Also known as the Gaussian Curve
↝ bell-shaped, smooth, mathematically defined curve that is highest at its center
↝ Asymptotic - approaches but never touches the axis
↝ perfectly symmetrical without skewness
↝ mean, median, mode have the same value
↝ approximately 34% of all scores occur between the mean and 1 SD above it, and another 34% between the mean and 1 SD below it
↝ about 68% of all scores occur within ±1 SD of the mean
↝ about 95% of all scores occur within ±2 SD of the mean
↝ about 99.7% of all scores occur within ±3 SD of the mean
STANDARD SCORES
Scale        MEAN   SD
Z-Score      0      1
T-Score      50     10
Stanine      5      2
STEN         5.5    2
IQ           100    15
GRE or SAT   500    100
Linear Relationships ↝ a relationship between two variables that can be most accurately represented by a straight line
Curvilinear ↝ when the scatter plot of the X and Y variables is drawn, a curved line fits the points better than a straight line
Scatter Plot ↝ the relationship between variables can be seen by plotting the paired X and Y values
Positive Relationship ↝ indicates that there is a direct relationship between the variables; as X increases, Y increases
Negative Relationship ↝ indicates that there is an inverse relationship between X and Y; as X increases, Y decreases
Zero Correlation ↝ no relationship exists between two variables
The Degree of Correlation
1.00 ↝ perfect relationship
0.75-0.99 ↝ very strong relationship
0.50-0.74 ↝ strong relationship
0.25-0.49 ↝ weak relationship
0.01-0.24 ↝ very weak relationship
0.00 ↝ no relationship
The Direction of Correlation
-1.00 ↝ perfect negative relationship
-0.70 ↝ strong negative relationship
-0.50 ↝ moderate negative relationship
-0.30 ↝ weak negative relationship
0.00 ↝ no relationship
0.30 ↝ weak positive relationship
0.50 ↝ moderate positive relationship
0.70 ↝ strong positive relationship
1.00 ↝ perfect positive relationship
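To make the standard-score table and the correlation terms above concrete, here is a minimal Python sketch. It assumes the usual linear conversion (new score = new mean + new SD × z) and computes Pearson r as the average product of paired z-scores, using the population SD. The score lists and the function names (`z_scores`, `rescale`, `pearson_r`, `degree_label`) are made up for illustration and are not part of the reviewer; `degree_label` only applies the degree bands listed above.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Population SD: square root of the mean squared deviation."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def z_scores(xs):
    """Convert raw scores to z-scores (mean 0, SD 1)."""
    m, s = mean(xs), sd(xs)
    return [(x - m) / s for x in xs]


def rescale(z, new_mean, new_sd):
    """Linear transformation of a z-score, e.g. T-score = 50 + 10z."""
    return new_mean + new_sd * z


def pearson_r(xs, ys):
    """Pearson r as the average product of paired z-scores."""
    return sum(a * b for a, b in zip(z_scores(xs), z_scores(ys))) / len(xs)


def degree_label(r):
    """Label |r| using the degree bands listed in the reviewer."""
    size = round(abs(r), 2)
    if size >= 1.00:
        return "perfect"
    if size >= 0.75:
        return "very strong"
    if size >= 0.50:
        return "strong"
    if size >= 0.25:
        return "weak"
    if size > 0.00:
        return "very weak"
    return "no relationship"


# Hypothetical raw scores of five test takers on two tests
test_a = [2, 4, 6, 8, 10]
test_b = [1, 3, 2, 5, 4]

z_top = z_scores(test_a)[-1]       # z-score of the highest scorer on test A
t_top = rescale(z_top, 50, 10)     # same standing on the T-score scale
iq_top = rescale(z_top, 100, 15)   # same standing on the IQ scale
r = pearson_r(test_a, test_b)

print(round(z_top, 2), round(t_top, 1), round(iq_top, 1))
print(round(r, 2), degree_label(r))
```

Because every conversion here is linear, the test taker keeps the same relative standing on each scale; only the mean and SD of the scale change.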
Linear Transformation ↝ one that retains a direct numerical relationship to the original raw score
Nonlinear Transformation ↝ required when the data under consideration are not normally distributed
Normalized Standard Score Scale ↝ normalizing the distribution involves stretching the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores
CORRELATION AND INFERENCE
Inferences ↝ how some things are related to other things
Coefficient of Correlation/Correlation Coefficient (r) ↝ number that provides an index of the strength of the relationship between two things
Correlation ↝ expression of the degree and direction of correspondence between two things
● degree - weak to strong
● direction - positive, negative, no correlation
ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING
A1: Psychological Traits and States Exist
✦ Trait – any distinguishable, relatively enduring way in which one individual varies from another
↝ Permit people to predict the present from the past
↝ Characteristic patterns of thinking, feeling, and behaving that generalize across similar situations, differ systematically between individuals, and remain rather stable across time
Psychological Trait ↝ intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, etc.
✦ States – distinguish one person from another but are relatively less enduring
↝ Characteristic pattern of thinking, feeling, and behaving in a concrete situation at a specific moment in time
↝ Identify those behaviors that can be controlled by manipulating the situation
✦ Psychological Traits exist as:

↝ Construct: an informed, scientific concept developed or constructed to explain a behavior, inferred from overt behavior
↝ Overt Behavior: an observable action or the product of an observable action
✦ Trait is not expected to be manifested in behavior 100% of the time
✦ Whether a trait manifests itself in observable behavior, and to what degree it manifests, is presumed to depend not only on the strength of the trait in the individual but also on the nature of the situation (situation-dependent)
✦ Context within which behavior occurs also plays a role in helping us select appropriate trait terms for observed behaviors
✦ Definition of trait and state also refer to a way in which one individual varies from another
✦ Assessors may make comparisons among people who, because of their membership in some group or for any number of other reasons, are decidedly not average
A2: Psychological Traits and States can be Quantified and Measured
✦ Once the trait, state, or other construct to be measured has been defined, a test developer considers the types of item content that would provide insight into it, to gauge the strength of that trait
✦ Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results
✦ Cumulative Scoring ↝ assumption that the more the testtaker responds in a particular direction keyed by the test manual as correct or consistent with a particular trait, the higher that test taker is presumed to be on the targeted ability or trait
A3: Test-Related Behavior Predicts Non-Test-Related Behavior
✦ The tasks in some tests mimic the actual behaviors that the test user is attempting to understand
✦ Such tests only yield a sample of the behavior that can be expected to be emitted under non-test conditions
A4: Tests and Other Measurement Techniques have Strengths and Weaknesses
✦ Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources
A5: Various Sources of Error are Part of the Assessment Process
✦ Error ↝ refers to something that is more than expected; it is a component of the measurement process
↝ Refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
↝ Error Variance ↝ the component of a test score attributable to sources other than the trait or ability measured
✦ Potential Sources of error variance:
1. Assessors
2. Measuring Instruments
3. Random errors such as luck
✦ Classical Test Theory ↝ each test taker has a true score on a test that would be obtained but for the action of measurement error
A6: Testing and Assessment can be Conducted in a Fair and Unbiased Manner
✦ Despite best efforts of many professionals, fairness-related questions and problems do occasionally arise
✦ In all questions about tests with regards to fairness, it is important to keep in mind that tests are tools; they can be used properly or improperly
A7: Testing and Assessment Benefit Society
✦ Considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests
A GOOD TEST
↝ Criteria for a good test would include clear instructions for administration, scoring, and interpretation
↝ Technical criteria that assessment professionals use to evaluate the quality of tests: psychometric soundness, reliability, and validity
↝ A good test is a useful test, one that yields actionable results that will benefit individual test takers
Norm ↝ basic, average
Norms ↝ test performance data of a particular group of test takers that are designed for use as a reference when evaluating and interpreting individual test scores
Normative Sample ↝ group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers
Norming ↝ process of deriving norms
Norm-referenced ↝ method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker’s score and comparing it to scores of a group of test takers
Race Norming ↝ controversial practice of norming on the basis of race or ethnic background
User/Program Norms ↝ consist of descriptive statistics based on a group of test takers in a given period of time rather than norms obtained by formal sampling methods
Standardized ↝ has clearly specified procedures for administration and scoring
Standardization ↝ process of administering a test to a representative sample of test takers for the purpose of establishing norms
Sampling ↝ process of selecting the portion of the universe deemed to be representative of the whole population
Sample ↝ a portion of the universe of people deemed to be representative of the whole population
Population ↝ the universe as a whole
Purposive Sample ↝ arbitrarily selecting some sample because it is believed to be representative of the population
Incidental Sample ↝ one that is convenient or available for use
Criterion ↝ standard on which a judgment or decision may be based
Criterion-referenced ↝ method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard
RELIABILITY
↝ dependability/consistency of the instrument or scores obtained by the same person when re-examined with the same
test on different occasions or with different sets of equivalent items
✦ Test may be reliable in one context, but unreliable in another
✦ Estimate the range of possible random fluctuations that can be expected in an individual’s score
✦ Free from errors
✦ More items = higher reliability
✦ Minimizing error
✦ Using only a representative sample to obtain an observed score
✦ True score cannot be found
✦ Reliability Coefficient: index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
Classical Test Theory (True Score Theory) ↝ score on an ability test is presumed to reflect not only the test taker’s true score on the ability being measured but also error
✦ Error ↝ refers to the component of the observed test score that does not have to do with the test taker’s ability
✦ Errors of measurement are random
X = T + E
X = observed score
T = true score
E = error
✦ When you average all the observed scores obtained over a period of time, then the result would be closest to the true score
✦ The greater the number of items, the higher the reliability
✦ Factors that contribute to consistency: stable attributes
✦ Factors that contribute to inconsistency: characteristics of the individual, test, or situation, which have nothing to do with the attribute being measured, but still affect the scores
GOALS OF RELIABILITY
● Estimate errors
● Devise techniques to improve testing and reduce errors
Variance ↝ useful in describing sources of test score variability
✦ True Variance: variance from true differences
✦ Error Variance: variance from irrelevant random sources
Measurement Error ↝ all of the factors associated with the process of measuring some variable, other than the variable being measured
✦ difference between the observed score and the true score
✦ Positive: can increase one’s score
✦ Negative: decrease one’s score
Random Error ↝ source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)
Systematic Error ↝ source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
● has consistent effect on the true score
● SD does not change, the mean does
✦ Reliability refers to the proportion of total variance attributed to true variance
✦ The greater the proportion of the total variance attributed to true variance, the more reliable the test
✦ Error variance may increase or decrease a test score by varying amounts; consistency of test scores, and thus the reliability, can be affected
Test-Retest Reliability
Error: Time Sampling
✧ time sampling reliability
✧ an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the test
✧ appropriate when evaluating the reliability of a test that purports to measure an enduring and stable attribute such as a personality trait
✧ established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores
✧ the longer the time that passes, the greater the likelihood that the reliability coefficient will be lower
✧ Carryover Effects: happen when the test-retest interval is short, wherein the second test is influenced by the first test because test takers remember or practiced the previous test = inflated correlation/overestimation of reliability
✧ Practice Effect: scores on the second session are higher due to their experience of the first session of testing
✧ test-retest with a longer interval might be affected by other extreme factors, thus resulting in a low correlation
✧ lower correlation = poor reliability
✧ Mortality: problem of absences in the second session (just remove the first-session tests of those who were absent)
✧ Coefficient of Stability
✧ Statistical Tool: Pearson R, Spearman Rho
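The relationship X = T + E and the idea of reliability as the proportion of total variance that is true variance can be illustrated with a small simulation. This is only a sketch under stated assumptions: the number of test takers, the true-score SD of 10, and the error SD of 5 are hypothetical values chosen for the example, errors are drawn as purely random noise, and `pearson_r` is a helper written here, not something from the reviewer. With these assumed SDs the expected reliability is 100 / (100 + 25) = 0.80, and the test-retest correlation should land close to the same value.

```python
import random
from statistics import mean, pvariance


def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)                     # fixed seed so the run is repeatable
N = 5000                           # hypothetical number of test takers
TRUE_SD, ERROR_SD = 10.0, 5.0      # assumed spreads of T and E

true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
# Two administrations: same T, fresh random E each time (X = T + E)
first = [t + random.gauss(0, ERROR_SD) for t in true_scores]
second = [t + random.gauss(0, ERROR_SD) for t in true_scores]

# Reliability as true variance over total (observed) variance
reliability = pvariance(true_scores) / pvariance(first)
# Test-retest estimate: correlate the two administrations
retest_r = pearson_r(first, second)
# Value implied by the assumed SDs
expected = TRUE_SD ** 2 / (TRUE_SD ** 2 + ERROR_SD ** 2)

print(round(reliability, 2), round(retest_r, 2), expected)
```

In real testing the true scores are never observed, which is why the correlation between two administrations is used as the estimate.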
✦ Sources of Error Variance:
a. Item Sampling/Content Sampling: refer to variation among items within a test as well as to variation among items between tests
✦ The extent to which testtaker’s score is affected by the content sampled on a test and by the way the content is sampled is a source of error variance
b. Test Administration: test takers motivation or attention, environment, etc.
c. Test Scoring and Interpretation: may employ objective-type items amenable to computer scoring of well-documented reliability
Parallel Forms/Alternate Forms Reliability
Error: Item Sampling (Immediate), Item Sampling changes over time (delayed)
✧ established when at least two different versions of the test yield almost the same scores
✧ has the most universal applicability
✧ Parallel Forms: for each form of the test, the means and the variances are EQUAL; same items, different positionings/numberings
✧ Alternate Forms: simply different version of a test that has been constructed so as to be parallel
✧ test should contain the same number of items and the items should be expressed in the same form and
should cover the same type of content; range and difficulty must also be equal
✧ if there is a test leakage, use the form that is not mostly administered
✧ Counterbalancing: technique to avoid carryover effects for parallel forms, by using different sequence for groups
✧ can be administered on the same day or different time
✧ most rigorous and burdensome, since test developers create two forms of the test
✧ main problem: difference between the two tests
✧ test scores may be affected by motivation, fatigue, or intervening events
✧ means and the variances of the observed scores must be equal for two forms
✧ Statistical Tool: Pearson R or Spearman Rho
Internal Consistency (Inter-Item Reliability)
Error: Item Sampling Homogeneity
✧ used when tests are administered once
✧ consistency among items within the test
✧ measures the internal consistency of the test which is the degree to which each item measures the same construct
✧ measurement for unstable traits
✧ if all items measure the same construct, then it has a good internal consistency
✧ useful in assessing Homogeneity
✧ Homogeneity: if a test contains items that measure a single trait (unifactorial)
✧ Heterogeneity: degree to which a test measures different factors (more than one factor/trait)
✧ more homogenous = higher inter-item consistency
✧ KR-20: used for inter-item consistency of dichotomous items (intelligence tests, personality tests with yes or no options, multiple choice), unequal variances, dichotomous scored
✧ KR-21: if all the items have the same degree of difficulty (speed tests), equal variances, dichotomous scored
✧ Cronbach’s Coefficient Alpha: used when two halves of the test have unequal variances and on tests containing non-dichotomous items, unequal variances
✧ Average Proportional Distance: measure used to evaluate internal consistency of a test that focuses on the degree of differences that exists between item scores
Split-Half Reliability
Error: Item sample: Nature of Split
✧ Split Half Reliability is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
✧ useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
✧ cannot just divide the items in the middle because it might spuriously raise or lower the reliability coefficient, so just randomly assign items or assign odd-numbered items to one half and even-numbered items to the other half
✧ Spearman-Brown Formula: allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test, if each half had been the length of the whole test and have the equal variances
✧ Spearman-Brown Prophecy Formula: estimates how many more items are needed in order to achieve the target reliability
✧ multiply the estimate to the original number of items
✧ Rulon’s Formula: counterpart of spearman-brown formula, which is the ratio of the variance of difference between the odd and even splits and the variance of the total, combined odd-even, score
✧ if the reliability of the original test is relatively low, then developer could create new items, clarify test instructions, or simplify the scoring rules
✧ equal variances, dichotomous scored
✧ Statistical Tool: Pearson R or Spearman Rho
Inter-Scorer Reliability
Error: Scorer Differences
✧ the degree of agreement or consistency between two or more scorers with regard to a particular measure
✧ used for coding nonverbal behavior
✧ observer differences
✧ Fleiss Kappa: determine the level of agreement between two or more raters when the method of assessment is measured on categorical scale
✧ Cohen’s Kappa: two raters only
✧ Krippendorff’s Alpha: two or more raters, based on observed disagreement corrected for disagreement expected by chance
✦ Tests designed to measure one factor (Homogenous) are expected to have high degree of internal consistency and vice versa
Dynamic ↝ trait, state, or ability presumed to be everchanging as a function of situational and cognitive experience
Static ↝ barely changing or relatively unchanging
Restriction of range or Restriction of variance ↝ if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower
Power Tests ↝ when time limit is long enough to allow test takers to attempt all items
Speed Tests ↝ generally contain items of uniform level of difficulty with time limit
● Reliability should be based on performance from two independent testing periods using test-retest and alternate-forms or split-half reliability
Criterion-Referenced Tests ↝ designed to provide an indication of where a testtaker stands with respect to some variable or criterion
● As individual differences decrease, a traditional measure of reliability would also decrease, regardless of the stability of individual performance
Classical Test Theory ↝ everyone has a “true score” on a test
● True Score: genuinely reflects an individual’s ability level as measured by a particular test
● Random Error
Domain Sampling Theory ↝ estimate the extent to which specific sources of variation under defined conditions are contributing to the test scores

● Considers problem created by using a limited number Confidence Interval ↝ a range of and of test score that is
of items to represent a larger and more complicated likely to contain true score
construct ● Tells us the relative ability of the true score within the
● Test reliability is conceived of as an objective specified range and confidence level
measure of how precisely the test score assesses the ● The larger the range, the higher the confidence
domain from which the test draws a sample ✦ If the reliability is low, you can increase the number of items
● Generalizability Theory: based on the idea that a or use factor analysis and item analysis to increase internal
person’s test scores vary from testing to testing consistency
because of the variables in the testing situations Reliability Estimates ↝ nature of the test will often determine
● Universe: test situation the reliability metric
● Facets: number of items in the test, amount of review, a) Homogenous (unifactor) or heterogeneous (multifactor)
and the purpose of test administration b) Dynamic (unstable) or static (stable)
● According to Generalizability Theory, given the exact c) Range of scores is restricted or not
same conditions of all the facets in the universe, the d) Speed Test or Power Test
exact same test score should be obtained (Universe e) Criterion or non-Criterion
score) Test Sensitivity ↝ detects true positive
● Decision Study: developers examine the usefulness Test Specificity ↝ detects true negative
of test scores in helping the test user make decisions Base Rate ↝ proportion of the population that actually possess
● Systematic Error the characteristic of interest
Item Response Theory ↝ the probability that a person with X Selection Ratio ↝ no. of available positions compared to the
ability will be able to perform at a level of Y in a test no. of applicants
● Focus: item difficulty Four Possible Hit and Miss Outcomes:
● Latent-Trait Theory 1. True Positives (Sensitivity) ↝ predict success that does
● a system of assumption about measurement and the occur
extent to which item measures the trait 2. True Negatives (Specificity) ↝ predict failure that does occur
● The computer is used to focus on the range of item 3. False Positive (Type 1) ↝ success does not occur
difficulty that helps assess an individual’s ability level 4. False Negative (Type 2) ↝ predicted failure but succeed
● If you got several easy items correct, the computer will then move to more difficult items
● Difficulty: attribute of not being easily accomplished, solved, or comprehended
● Discrimination: degree to which an item differentiates among people with higher or lower levels of the trait, ability, etc.
● Dichotomous: can be answered with only one of two alternative responses
● Polytomous: 3 or more alternative responses
Standard Error of Measurement ↝ provides a measure of the precision of an observed test score
● Standard deviation of errors as the basic measure of error
● Index of the amount of inconsistency, or the amount of expected error, in an individual’s score
● Allows the test user to quantify the extent to which a test provides accurate scores
● Provides an estimate of the amount of error inherent in an observed score or measurement
● Higher reliability, lower SEM
● Used to estimate or infer the extent to which an observed score deviates from a true score
● Also known as the Standard Error of a Score
● Confidence Interval: a range or band of test scores that is likely to contain the true score
Standard Error of the Difference ↝ can aid a test user in determining how large a difference should be before it is considered statistically significant
Standard Error of Estimate ↝ refers to the standard error of the difference between the predicted and observed values
VALIDITY
↝ a judgment or estimate of how well a test measures what it is supposed to measure
✦ Evidence about the appropriateness of inferences drawn from test scores
✦ Degree to which the measurement procedure measures the variables it is intended to measure
Inferences – logical result or deduction
✦ May diminish as the culture or times change
✦ Predicts future performance
✦ Measures appropriate domain
✦ Measures appropriate characteristics
Validation ↝ the process of gathering and evaluating evidence about validity
Validation Studies ↝ yield insights regarding a particular population of test takers as compared to the norming sample described in a test manual
Internal Validity ↝ degree of control among variables in the study (increased through random assignment)
External Validity ↝ generalizability of the research results (increased through random selection)
Conceptual Validity
● focuses on individuals with their unique histories and behaviors
● Means of evaluating and integrating test data so that the clinician’s conclusions make accurate statements about the examinee
Face Validity
● relates more to what a test appears to measure to the person being tested than to what the test actually measures
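The Standard Error of Measurement, Confidence Interval, and Standard Error of the Difference described in this part of the reviewer can be illustrated with a short calculation. This is only a sketch: the reviewer does not state the formulas, so the ones used here are the standard textbook forms (SEM = SD × √(1 − r), SED = SD × √(2 − r1 − r2)), and the function names and sample values (SD of 15, reliabilities of .91 and .84) are assumptions made for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r); higher reliability gives a lower SEM."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed, sem, z=1.96):
    """Band of scores likely to contain the true score (z = 1.96 for about 95%)."""
    return (observed - z * sem, observed + z * sem)


def standard_error_of_difference(sd, r1, r2):
    """SED = SD * sqrt(2 - r1 - r2); assumes both scores are on the same scale."""
    return sd * math.sqrt(2.0 - r1 - r2)


# Hypothetical values chosen only for illustration.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(observed=100, sem=sem)
sed = standard_error_of_difference(sd=15, r1=0.91, r2=0.84)

print(f"SEM = {sem:.2f}")
print(f"95% CI = {low:.2f} to {high:.2f}")
print(f"SED = {sed:.2f}")
```

With these assumed values the SEM is about 4.5 points, so an observed score of 100 is best read as a band of roughly 91 to 109 rather than as a single exact point.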
Content Validity
● describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample
● achieved when the proportion of the material covered by the test approximates the proportion of material covered in the course
● Test Blueprint: a plan regarding the types of information to be covered by the items, the no. of items tapping each area of coverage, the organization of the items, and so forth
● more logical than statistical
● concerned with the extent to which the test is representative of a defined body of content consisting of the topics and processes
● a panel of experts can review the test items and rate them in terms of how closely they match the objective or domain specification
● examine if items are essential, useful, and necessary
● Construct Underrepresentation: failure to capture important components of a construct
● Construct-Irrelevant Variance: happens when scores are influenced by factors irrelevant to the construct
● Lawshe: developed the formula for the Content Validity Ratio (CVR)
● Zero CVR: exactly half of the experts rate the item as essential
Criterion Validity
● more statistical than logical
● a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest, the measure of interest being the criterion
● Criterion: standard on which a judgment or decision may be made
● Characteristics: relevant, valid, uncontaminated
● Criterion Contamination: occurs when the criterion measure includes aspects of performance that are not part of the job or when the measure is affected by “construct-irrelevant” (Messick, 1989) factors that are not part of the criterion construct
1. Concurrent Validity: test scores are obtained at about the same time as the criterion measures; economically efficient
2. Predictive Validity: measures of the relationship between test scores and a criterion measure obtained at a future time
3. Incremental Validity: the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use; used to improve the domain
● related to predictive validity
Construct Validity (Umbrella Validity)
● covers all types of validity
● logical and statistical
● judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct
● Construct: an informed, scientific idea developed or hypothesized to describe or explain behavior; unobservable, presupposed traits that may be invoked to describe test behavior or criterion performance
● One way a test developer can improve the homogeneity of a test containing dichotomous items is by eliminating items that do not show significant correlation coefficients with total test scores
● If it is an academic test and high scorers on the entire test for some reason tended to get that particular item wrong while low scorers got it right, then the item is obviously not a good one
● Some constructs lend themselves more readily than others to predictions of change over time
● Method of Contrasted Groups: demonstrates that scores on the test vary in a predictable way as a function of membership in a group
● If a test is a valid measure of a particular construct, then the scores of people who do not have that construct would differ from the scores of those who really possess that construct
● Convergent Evidence: scores on the test undergoing construct validation tend to be highly correlated with another established, validated test that measures the same construct
● Discriminant Evidence: a validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not be correlated
● the test is homogeneous
● test scores increase or decrease as a function of age, passage of time, or experimental manipulation
● pretest-posttest differences
● scores differ between groups
● scores correlate with scores on other tests in accordance with what is predicted
Factor Analysis ↝ designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ
● Developed by Charles Spearman
● Employed as a data reduction method
● Used to study the interrelationships among a set of variables
● Identifies the factor or factors in common between test scores on subscales within a particular test
● Exploratory FA: estimating or extracting factors; deciding how many factors must be retained
● Confirmatory FA: researchers test the degree to which a hypothetical model fits the actual data
● Factor Loading: conveys info about the extent to which the factor determines the test score or scores
● can be used to obtain both convergent and discriminant validity evidence
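Lawshe's Content Validity Ratio, mentioned under Content Validity, can be worked out in a few lines. The sketch below uses the commonly cited form CVR = (n_e − N/2) / (N/2), where n_e is the number of experts rating the item "essential" and N is the total number of experts. The function name and the panel sizes are assumptions for illustration only.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts rating three different items.
cvr_half = content_validity_ratio(5, 10)    # exactly half say "essential"
cvr_all = content_validity_ratio(10, 10)    # everyone says "essential"
cvr_few = content_validity_ratio(2, 10)     # fewer than half say "essential"

print(cvr_half, cvr_all, cvr_few)
```

The first case reproduces the "Zero CVR" note above: when exactly half of the experts rate the item as essential, the ratio is 0. A negative value means fewer than half did.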
Cross-Validation ↝ revalidation of the test against a criterion based on another group different from the original group from which the test was validated
Validity Shrinkage ↝ decrease in validity after cross-validation
Co-Validation ↝ validation of more than one test from the same group
Co-Norming ↝ norming more than one test from the same group
Bias ↝ factor inherent in a test that systematically prevents accurate, impartial measurement
● Prejudice, preferential treatment
● Prevention during test development through a procedure called Estimated True Score Transformation
Rating ↝ numerical or verbal judgment that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a Rating Scale
● Rating Error: intentional or unintentional misuse of the scale
● Leniency Error: rater is lenient in scoring (Generosity Error)
● Severity Error: rater is strict in scoring
● Central Tendency Error: rater’s ratings tend to cluster in the middle of the rating scale
● One way to overcome rating errors is to use rankings
● Halo Effect: tendency to give a high score due to failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior
Fairness ↝ the extent to which a test is used in an impartial, just, and equitable way
✦ Attempting to define the validity of the test will be futile if the test is NOT reliable
UTILITY
↝ usefulness or practical value of testing to improve efficiency
✦ Can tell us something about the practical value of the information derived from scores on the test
✦ Helps us make better decisions
✦ Higher criterion-related validity = higher utility
✦ One of the most basic elements in utility analysis is the financial cost of the selection device
Cost ↝ disadvantages, losses, or expenses in both economic and noneconomic terms
Benefit ↝ profits, gains, or advantages
✦ The cost of test administration can be well worth it if the result is certain noneconomic benefits
Utility Analysis ↝ family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
Expectancy Table ↝ provides an indication of the likelihood that a test taker will score within some interval of scores on a criterion measure (passing, acceptable, failing)
✦ Might indicate future behaviors; if the prediction is successful, the test is working as it should
Taylor-Russell Tables ↝ provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection
Selection Ratio ↝ numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired
Base Rate ↝ percentage of people hired under the existing system for a particular position
✦ One limitation of Taylor-Russell Tables is that the relationship between the predictor (test) and criterion must be linear
Naylor-Shine Tables ↝ entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures
Brogden-Cronbach-Gleser Formula ↝ used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument
Utility Gain ↝ estimate of the benefit of using a particular test
Productivity Gains ↝ an estimated increase in work output
✦ High-performing applicants may have received offers from other companies as well
✦ The more complex the job, the more people differ on how well or poorly they do that job
Cut Score ↝ reference point derived as a result of a judgment and used to divide a set of data into two or more classifications
Relative Cut Score ↝ reference point based on norm-related considerations (norm-referenced); e.g., NMAT
Fixed Cut Scores ↝ set with reference to a judgment concerning the minimum level of proficiency required; e.g., Board Exams
Multiple Cut Scores ↝ refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization
Multiple Hurdle ↝ multi-stage selection process in which a cut score is in place for each predictor
Compensatory Model of Selection ↝ assumption that high scores on one attribute can compensate for lower scores on another
Angoff Method ↝ setting fixed cut scores
✦ low interrater reliability
Known Groups Method ↝ collection of data on the predictor of interest from groups known to possess and not possess a trait of interest
✦ The determination of where to set the cutoff score is inherently affected by the composition of the contrasting groups
IRT-Based Methods ↝ cut scores are typically set based on the testtaker’s performance across all the items on the test
✦ Item-Mapping Method: arrangement of items in a histogram, with each column containing items deemed to be of equivalent value
✦ Bookmark Method: expert places a “bookmark” between the two pages that are deemed to separate test takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not
Method of Predictive Yield ↝ takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
Discriminant Analysis ↝ sheds light on the relationship between identified variables and two naturally occurring groups
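The Selection Ratio, Multiple Hurdle, and Compensatory Model entries above can be contrasted with a small sketch. Everything named here (the two predictors, the cut scores, the equal weights, and the overall cut of 65) is hypothetical. The hurdle check is simplified to "clears every cut score" rather than modelling the stage-by-stage process.

```python
def selection_ratio(n_to_hire, n_available):
    """Number of people to be hired divided by the number available to be hired."""
    return n_to_hire / n_available


def passes_multiple_hurdle(scores, cut_scores):
    """A cut score is in place for each predictor; every one must be cleared."""
    return all(scores[name] >= cut for name, cut in cut_scores.items())


def passes_compensatory(scores, weights, overall_cut):
    """A high score on one predictor can offset a lower score on another."""
    composite = sum(weights[name] * scores[name] for name in weights)
    return composite >= overall_cut


# Hypothetical applicant and decision rules, for illustration only.
applicant = {"aptitude": 82, "interview": 55}
cut_scores = {"aptitude": 60, "interview": 60}
weights = {"aptitude": 0.5, "interview": 0.5}

ratio = selection_ratio(n_to_hire=5, n_available=50)
hurdle_result = passes_multiple_hurdle(applicant, cut_scores)
compensatory_result = passes_compensatory(applicant, weights, overall_cut=65)

print(ratio, hurdle_result, compensatory_result)
```

In this made-up case the applicant fails the hurdle rule because the interview score is below its cut, but passes the compensatory rule because the strong aptitude score raises the weighted composite above the overall cut.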

TEST DEVELOPMENT
↝ an umbrella term for all that goes into the process of creating a test
I. Test Conceptualization ↝ brainstorming of ideas about what kind of test a developer wants to publish
✧ stage wherein the ff. are determined: construct, goal, user, taker, administration, format, response, benefits, costs, interpretation
✧ determines whether the test would be norm-referenced or criterion-referenced
Pilot Work/Pilot Study/Pilot Research ↝ preliminary research surrounding the creation of a prototype of the test
✧ Attempts to determine how best to measure a targeted construct
✧ Entails lit reviews and experimentation, and the creation, revision, and deletion of preliminary items
II. Test Construction ↝ stage in the process that entails writing test items, revisions, formatting, and setting scoring rules
✧ it is not good to create an item that contains numerous ideas
Item Pool ↝ reservoir or well from which the items will or will not be drawn for the final version of the test
Item Banks ↝ relatively large and easily accessible collection of test questions
Computerized Adaptive Testing ↝ refers to an interactive, computer-administered test-taking process wherein items presented to the test taker are based in part on the test taker's performance on previous items
✧ The test administered may be different for each test taker, depending on the test performance on the items presented
✧ Reduces floor and ceiling effects
Floor Effects ↝ occur when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (test takers have low scores)
Ceiling Effects ↝ occur when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (test takers have high scores)
Item Branching ↝ ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
Item Format ↝ form, plan, structure, arrangement, and layout of individual test items
Dichotomous Format ↝ offers two alternatives for each item
Polychotomous Format ↝ each item has more than two alternatives
Category Format ↝ a format where respondents are asked to rate a construct
1. Checklist ↝ subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself
2. Guttman Scale ↝ items are arranged from weaker to stronger expressions of attitude, belief, or feelings
Selected-Response Format ↝ requires test takers to select a response from a set of alternative responses
1. Multiple Choice ↝ Has three elements: stem (question), a correct option, and several incorrect alternatives (distractors or foils). Should have one correct answer and grammatically parallel alternatives of similar length that fit grammatically with the stem; avoid ridiculous distractors, excessively long items, “all of the above”, and “none of the above” (25% chance of a correct guess)
Effective Distractors ↝ a distractor that was chosen equally by both high and low performing groups that enhances the consistency of test results
Ineffective Distractors ↝ may hurt the reliability of the test because they are time consuming to read and can limit the no. of good items
Cute Distractors ↝ less likely to be chosen; may affect the reliability of the test because the test takers may guess from the remaining options
2. Matching Item ↝ Test taker is presented with two columns: Premises and Responses
3. Binary Choice ↝ Usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact (50% chance of a correct guess)
Constructed-Response Format ↝ requires test takers to supply or to create the correct answer, not merely select it
1. Completion Item ↝ Requires the examinee to provide a word or phrase that completes a sentence
2. Short-Answer ↝ Should be written clearly enough that the test taker can respond succinctly, with a short answer
3. Essay ↝ allows creative integration and expression of the material
Scaling ↝ process of setting rules for assigning numbers in measurement
III. Test Tryout ↝ the test should be tried out on people who are similar in critical respects to the people for whom the test was designed
✧ An informal rule of thumb: no fewer than 5 subjects and preferably as many as 10 for each item (the more, the better)
✧ Risk of using few subjects = phantom factors emerge
✧ Should be executed under conditions as identical as possible
✧ A good test item is one that is answered correctly by high scorers as a whole
Empirical Criterion Keying ↝ administering a large pool of test items to a sample of individuals who are known to differ on the construct being measured
Item Analysis ↝ statistical procedure used to analyze and evaluate test items
Discriminability Analysis ↝ employed to examine the correlation between each item and the total score of the test
Item ↝ suggests a sample of behavior of an individual
Table of Specification ↝ a blueprint of the test in terms of number of items per difficulty, topic importance, or taxonomy
✧ Guidelines for Item Writing: define clearly what to measure, generate an item pool, avoid long items, keep the level of reading difficulty appropriate for those who will complete the test, avoid double-barreled items, consider mixing positively and negatively worded items
Double-Barreled Items ↝ items that convey more than one idea at the same time
Item Difficulty ↝ defined by the number of people who get a particular item correct
Item-Difficulty Index ↝ calculated as the proportion of the total number of test takers who answered the item correctly; the larger the index, the easier the item
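The Item-Difficulty Index defined above is a simple proportion. The same response data also yield the Item-Discrimination Index that this reviewer defines under item analysis: the proportion of high scorers answering correctly minus the proportion of low scorers answering correctly. The sketch below is illustrative only. The response data are made up, and the 27% cutoff used to form the extreme groups is a common convention assumed here, not something stated in the reviewer.

```python
def item_difficulty(item_responses):
    """Proportion of test takers who answered the item correctly (1 = correct)."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """Extreme group method: p(correct | high scorers) - p(correct | low scorers)."""
    n = len(total_scores)
    k = max(1, round(n * fraction))           # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group, high_group = order[:k], order[-k:]
    p_low = sum(item_responses[i] for i in low_group) / k
    p_high = sum(item_responses[i] for i in high_group) / k
    return p_high - p_low


# Hypothetical data: 10 test takers, one item, and their total test scores.
item = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
totals = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]

p = item_difficulty(item)
d = item_discrimination(item, totals)

print(f"difficulty p = {p:.2f}")
print(f"discrimination d = {d:.2f}")
```

A larger p means an easier item. A positive d means high scorers got the item right more often than low scorers, while a negative d flags an item that high scorers tended to miss.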
Item-Endorsement Index (for personality testing) ↝ percentage of individuals who endorsed an item in a personality test
✧ The optimal average item difficulty is approx. 50%, with items on the test ranging in difficulty from about 30% to 80%
Omnibus Spiral Format ↝ items in an ability test are arranged in order of increasing difficulty
Item-Reliability Index ↝ provides an indication of the internal consistency of a test
✧ The higher the Item-Reliability Index, the greater the test’s internal consistency
Item-Validity Index ↝ designed to provide an indication of the degree to which a test is measuring what it purports to measure
✧ The higher the Item-Validity Index, the greater the test’s criterion-related validity
Item-Discrimination Index ↝ measure of item discrimination; measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
Extreme Group Method ↝ compares people who have done well with those who have done poorly
Discrimination Index ↝ difference between these proportions
Point-Biserial Method ↝ correlation between a dichotomous variable and a continuous variable
Item-Characteristic Curve ↝ graphic representation of item difficulty and discrimination
Guessing ↝ a problem that has eluded any universally accepted solution
✧ Item analyses taken under speed conditions yield misleading or uninterpretable results
✧ Restrict item analysis on a speed test only to the items completed by the testtaker
✧ Test developer ideally should administer the test to be item-analyzed with generous time limits to complete the test
Scoring Items/Scoring Models
1. Cumulative Model – testtaker obtains a measure of the level of the trait; thus, high scores may suggest a high level of the trait being measured
2. Class Scoring/Category Scoring – test taker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is similar in some way
3. Ipsative Scoring – compares a test taker's score on one scale within a test to another scale within that same test, two unrelated constructs
IV. Test Revision ↝ characterize each item according to its strengths and weaknesses
✧ As revision proceeds, the advantage of writing a large item pool becomes more apparent because some items are removed and must be replaced by items from the item pool
✧ Administer the revised test under standardized conditions to a second appropriate sample of examinees
Anchor Protocol ↝ test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies
Scoring Drift ↝ discrepancy between scoring in an anchor protocol and the scoring of another protocol
Differential Item Functioning ↝ an item functions differently in one group of test takers as compared with another group known to have the same level of the underlying trait
DIF Analysis ↝ test developers scrutinize group-by-group item response curves looking for DIF items
DIF Items ↝ items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership
Routing Test ↝ subtest used to direct or route the testtaker to a suitable level of items
Item-Mapping Method ↝ setting cut scores that entails a histographic representation of items and expert judgments regarding item effectiveness
Basal Level ↝ the level at which the minimum criterion number of correct responses is obtained
Computer Assisted Psychological Assessment ↝ standardized test administration is assured for test takers and variation is kept to a minimum
✧ Test content and length are tailored according to the taker’s ability
INTELLIGENCE TESTS
Stanford-Binet Intelligence Scale 5th Ed. (SB-5) [C]
✦ 2-85 years old
✦ individually administered
✦ norm-referenced
✦ Scales: Verbal, Nonverbal, and Full Scale (FSIQ)
✦ Nonverbal and Verbal Cognitive Factors: Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, Working Memory
✦ age scale and point-scale format
✦ originally created to identify mentally disabled children in Paris
✦ 1908 scale introduced the Age Scale format and Mental Age
✦ 1916 scale significantly applied the IQ concept
✦ Standard Scores: 100 (mean), 15 (SD)
✦ Scaled Scores: 10 (mean), 3 (SD)
✦ co-normed with Bender-Gestalt and Woodcock-Johnson Tests
✦ based on the Cattell-Horn-Carroll Model of General Intellectual Ability
✦ no accommodations for PWDs
✦ 2 routing tests
✦ w/ teaching items, floor level, and ceiling level
✦ provides behavioral observations during administration
Wechsler Intelligence Scales (WAIS-IV, WPPSI-IV, WISC-V) [C]
✦ WAIS (16-90 years old), WPPSI (2-6 years old), WISC (6-16 years old)
✦ individually administered
✦ norm-referenced
✦ Standard Scores: 100 (mean), 15 (SD)
✦ Scaled Scores: 10 (mean), 3 (SD)
✦ addresses the weakness in Stanford-Binet
✦ could also assess functioning in people with brain injury
✦ evaluates patterns of brain dysfunction
✦ yields FSIQ, Index Scores (Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed), and subtest-level scaled scores
Raven’s Progressive Matrices (RPM) [B]
✦ 4-90 years old
✦ nonverbal test
✦ used to measure general intelligence & abstract reasoning
✦ multiple-choice test of abstract reasoning
✦ group test
✦ IRT-Based
Culture Fair Intelligence Test (CFIT) [B]
✦ Nonverbal instrument to measure analytical and reasoning ability in abstract and novel situations
✦ Measures individual intelligence in a manner designed to reduce, as much as possible, the influence of culture
✦ Individual or by group
✦ Aids in the identification of learning problems and helps in making more reliable and informed decisions in relation to the special education needs of children
Purdue Non-Language Test [B]
✦ Designed to measure mental ability; consists entirely of geometric forms
✦ Culture-fair
✦ Self-Administering
Panukat ng Katalinuhang Pilipino
✦ Basis for screening, classifying, and identifying needs that will enhance the learning process
✦ In business, it is utilized as a predictor of occupational achievement by gauging an applicant’s ability and fitness for a particular job
✦ Essential for determining one’s capacity to handle the challenges associated with certain degree programs
✦ Subtests: Vocabulary, Analogy, Numerical Ability, Nonverbal Ability
Wonderlic Personnel Test (WPT)
✦ Assesses cognitive ability and problem-solving aptitude of prospective employees
✦ Multiple choice, answered in 12 minutes
Armed Services Vocational Aptitude Battery
✦ Most widely used aptitude test in the US
✦ Multiple-aptitude battery that measures developed abilities and helps predict future academic and occupational success in the military
Kaufman Assessment Battery for Children-II (KABC-II)
✦ Alan & Nadeen Kaufman
✦ for assessing cognitive development in children
✦ 3 to 18 years old
PERSONALITY TESTS
Minnesota Multiphasic Personality Inventory (MMPI-2) [C]
✦ Multiphasic personality inventory intended for use with both clinical and normal populations to identify sources of maladjustment and personal strengths
✦ Starke Hathaway and J. Charnley McKinley
✦ Helps in diagnosing mental health disorders, distinguishing normal from abnormal
✦ should be administered to someone with no guilt feelings for committing a crime
✦ individual or by groups
✦ Clinical Scales: Hypochondriasis, Depression, Hysteria, Psychopathic Deviate, Masculinity/Femininity, Paranoia, Psychasthenia (Anxiety, Depression, OCD), Schizophrenia, Hypomania, Social Introversion
✦ Lie Scale (L Scale): items that are somewhat negative but apply to most people; assesses the likelihood that the test taker approached the instrument with a defensive mindset
✦ High in L scale = faking good
✦ F Scale: assesses an individual’s tendency to endorse uncommon symptoms or level of distress/dysfunction in certain populations
✦ High in F scale = faking bad, severe distress or psychopathology
✦ Superlative Self-Presentation Scale (S Scale): a measure of defensiveness; checks whether you intentionally distort answers to look better
✦ Correction Scale (K Scale): reflection of the frankness of the test taker's self-report
✦ K Scale = reveals a person’s defensiveness around certain questions and traits; also faking good
✦ K scale is sometimes used to correct scores on five clinical scales. The scores are statistically corrected for an individual’s overwillingness or unwillingness to admit deviance
✦ “Cannot Say” (CNS) Scale: measures how often a person does not answer a test item
✦ High ? Scale: client might have difficulties with reading, psychomotor retardation, or extreme defensiveness
✦ True Response Inconsistency (TRIN): five true, then five false answers
✦ Varied Response Inconsistency (VRIN): random true or false
✦ Infrequency-Psychopathology Scale (Fp): reveals intentional or unintentional over-reporting
✦ FBS Scale: “symptom validity scale” designed to detect intentional over-reporting of symptoms
✦ Back Page Infrequency (Fb): reflects significant change in the testtaker’s approach to the latter part of the test
Myers-Briggs Type Indicator (MBTI)
✦ Katharine Cook Briggs and Isabel Briggs Myers
✦ Self-report inventory designed to identify a person’s personality type, strengths, and preferences
✦ Extraversion-Introversion Scale: where you prefer to focus your attention and energy, the outer world and external events or your inner world of ideas and experiences
✦ Sensing-Intuition Scale: how you take in information, whether you focus on the information you take in or on interpreting and adding meaning to it
✦ Thinking-Feeling Scale: how you make decisions, logically or by following what your heart says
✦ Judging-Perceiving Scale: how you orient to the outer world; your style in dealing with the outer world, whether you get things decided or stay open to new info and options
Edwards Personal Preference Schedule (EPPS) [B]
✦ designed primarily as an instrument for research and counseling purposes to provide quick and convenient measures of a number of relatively normal personality variables
✦ based on Murray’s Need Theory
✦ Objective, forced-choice inventory for assessing the relative importance that an individual places on 15 personality variables
✦ Useful in personal counseling and with non-clinical adults
✦ Individual
Guilford-Zimmerman Temperament Survey (GZTS)
✦ items are stated affirmatively rather than in question form, using the 2nd person pronoun
✦ measures 10 personality traits: General Activity, Restraint, Ascendance, Sociability, Emotional Stability, Objectivity, Friendliness, Thoughtfulness, Personal Relations, Masculinity
NEO Personality Inventory (NEO-PI-R)
✦ Standard questionnaire measure of the Five Factor Model; provides systematic assessment of emotional, interpersonal, experiential, attitudinal, and motivational styles
✦ gold standard for personality assessment
✦ Self-Administered
✦ Neuroticism: identifies individuals who are prone to psychological distress
✦ Extraversion: quantity and intensity of energy directed
✦ Openness to Experience: active seeking and appreciation of experiences for their own sake
✦ Agreeableness: the kind of interactions an individual prefers, from compassion to tough-mindedness
✦ Conscientiousness: degree of organization, persistence, control, and motivation in goal-directed behavior
Panukat ng Ugali at Pagkatao/Panukat ng Pagkataong Pilipino
✦ Indigenous personality test
✦ Taps specific values, traits, and behavioral dimensions related or meaningful to the study of Filipinos
Sixteen Personality Factor Questionnaire
✦ Raymond Cattell
✦ constructed through factor analysis
✦ Evaluates personality on two levels of traits
✦ Primary Scales: Warmth, Reasoning, Emotional Stability, Dominance, Liveliness, Rule Consciousness, Social Boldness, Sensitivity, Vigilance, Abstractedness, Privateness, Apprehension, Openness to Change, Self-Reliance, Perfectionism, Tension
✦ Global Scales: Extraversion, Anxiety, Tough-Mindedness, Independence, Self-Control
Big Five Inventory-II (BFI-2)
✦ Soto & John
✦ Assesses big 5 domains and 15 facets
✦ for commercial purposes to researchers and students

PROJECTIVE TESTS
Rorschach Inkblot Test [C]
✦ Hermann Rorschach
✦ 5 years and older
✦ subjects look at 10 ambiguous inkblot images and describe what they see in each one
✦ once used to diagnose mental illnesses like schizophrenia
✦ Exner System: coding system used in this test
✦ Content: the name or class of objects used in the patient’s responses
Content:
1. Nature
2. Animal Feature
3. Whole Human
4. Human Feature
5. Fictional/Mythical Human Detail
6. Sex
Determinants:
1. Form
2. Movement
3. Color
4. Shading
5. Pairs and Reflections
Location:
1. W – the whole inkblot was used to depict an image
2. D – a commonly described part of the blot was used
3. Dd – an uncommonly described or unusual detail was used
4. S – the white space in the background was used
Thematic Apperception Test [C]
✦ Christiana Morgan and Henry Murray
✦ 5 years and above
✦ 31 picture cards serve as stimuli for stories and descriptions about relationships or social situations
✦ popularly known as the picture interpretation technique because it uses a standard series of provocative yet ambiguous pictures about which the subject is asked to tell a story
✦ also modified for African American test takers
Children’s Apperception Test
✦ Bellak & Bellak
✦ 3-10 years old
✦ based on the idea that animals engaged in various activities were useful in stimulating projective storytelling by children
Hand Test
✦ Edward Wagner
✦ 5 years old and above
✦ used to measure action tendencies, particularly acting out and aggressive behavior, in adults and children
✦ 10 cards (1 blank)
Apperceptive Personality Test (APT)
✦ Holmstrom et al.
✦ attempts to address the criticisms of the TAT
✦ introduced objectivity in the scoring system
✦ 8 cards include males and females of different ages and minority group members
✦ test takers will respond to a series of multiple-choice questions after storytelling
Word Association Test (WAT)
✦ Rapaport et al.
✦ presentation of a list of stimulus words; the assessee responds verbally or in writing with the first thing that comes to mind
Rotter Incomplete Sentences Blank (RISB)
✦ Julian Rotter & Janet Rafferty
✦ Grade 9 to Adulthood
✦ most popular SCT
Sacks Sentence Completion Test (SSCT)
✦ Joseph Sacks and Sidney Levy
✦ 12 years old and older
✦ asks respondents to complete 60 incomplete sentences with the first thing that comes to mind across four areas: Family, Sex, Interpersonal Relationships, and Self-Concept
Bender-Gestalt Visual Motor Test [C]
✦ Lauretta Bender
✦ 4 years and older
✦ consists of a series of durable template cards, each displaying a unique figure; the examinee is asked to draw each figure as he or she observes it
✦ provides interpretative information about an individual’s development and neuropsychological functioning
✦ reveals the maturation level of visuomotor perceptions, which is associated with language ability and various functions of intelligence
House-Tree-Person Test (HTP)
✦ John Buck and Emanuel Hammer
✦ 3 years and up
✦ measures aspects of a person’s personality through interpretation of drawings and responses to questions
✦ can also be used to assess brain damage and general mental functioning
✦ measures the person’s psychological and emotional functioning
✦ The house reflects the person’s experience of their immediate social world
✦ The tree is a more direct expression of the person’s emotional and psychological sense of self
✦ The person is a more direct reflection of the person’s sense of self
Draw-A-Person Test (DAP)
✦ Florence Goodenough
✦ 4 to 10 years old
✦ a projective drawing task that is often utilized in psychological assessments of children
✦ Aspects such as the size of the head, the placement of the arms, and even whether teeth were drawn are thought to reveal a range of personality traits
✦ Helps people who have test-taking anxiety (no strict format)
✦ Can assess people with communication problems
✦ Relatively culture free
✦ Allows for self-administration
Kinetic Family Drawing
✦ Burns & Kaufman
✦ derived from Hulse’s Family Drawing Test (FDT); family members are drawn “doing something”
CLINICAL AND COUNSELING TESTS
Millon Clinical Multiaxial Inventory-IV (MCMI-IV)
✦ Theodore Millon
✦ 18 years old and above
✦ for the diagnosis and treatment of personality disorders
✦ exaggeration of polarities results in maladaptive behavior
✦ Pleasure-Pain: the fundamental evolutionary task
✦ Active-Passive: one adapts to the environment or adapts the environment to one’s self
✦ Self-Others: investing in others versus investing in oneself
Beck Depression Inventory (BDI-II)
✦ Aaron Beck
✦ 13 to 80 years old
✦ 21-item self-report that taps Major Depressive symptoms according to the criteria in the DSM
MacAndrew Alcoholism Scale (MAC & MAC-R)
✦ from the MMPI-2
✦ Personality & attitude variables thought to underlie alcoholism
California Psychological Inventory (CPI-III)
✦ attempts to evaluate personality in normally adjusted individuals
✦ has validity scales that detect faking bad and faking good
✦ covers interpersonal style and orientation, normative orientation and values, cognitive and intellectual function, and role and personal style
✦ has special purpose scales, such as managerial potential, work orientation, creative temperament, leadership potential, amicability, law enforcement orientation, tough-mindedness
Rosenberg Self-Esteem Scale
✦ measures global feelings of self-worth
✦ 10-item, 4-point Likert scale
✦ used with adolescents
Dispositional Resilience Scale (DRS)
✦ measures psychological hardiness, defined as the ability to view stressful situations as meaningful, changeable, and challenging
Ego Resiliency Scale-Revised
✦ measures ego resiliency or emotional intelligence
HOPE Scale
✦ developed by Snyder
✦ Agency: cognitive model with goal-driven energy
✦ Pathway: capacity to construct systems to meet goals
✦ good measure of hope for traumatized people
✦ positively correlated with healthy psychological adjustment, high achievement, good problem-solving skills, and positive health-related outcomes
Satisfaction with Life Scale (SWLS)
✦ overall assessment of life satisfaction as a cognitive judgmental process
Positive and Negative Affect Schedule (PANAS)
✦ measures the level of positive and negative emotions a test taker has during the test administration
CLINICAL AND COUNSELING TESTS
1. Flynn Effect ↝ progressive rise in intelligence score that is expected to occur on a normed intelligence test from the date when the test was first normed
✧ Gradual increase in the general intelligence among
newborns
✧ Frog Pond Effect: theory that individuals evaluate
themselves as worse when in a group of high-performing
individuals
2. Culture Bias of Testing
✧ Culture-Free: attempt to eliminate culture so
nature can be isolated
✧ Impossible to develop because culture's influence is
evident from an individual's birth, and the
interaction between nature and nurture is
cumulative and not relative
✧ Culture Fair: minimize the influence of culture
with regard to various aspects of the evaluation
procedures
✧ Fair to all, fair to some cultures, fair only to one
culture
✧ Culture Loading: the extent to which a test
incorporates the vocabulary, concepts, traditions,
knowledge, etc. associated with a particular culture
ERRORS DUE TO BEHAVIORAL ASSESSMENT
1. Reactivity – when evaluated, the behavior increases
✧ Hawthorne Effect
2. Drift – moving away from what one has learned toward
idiosyncratic definitions of behavior
✧ observers should be retrained from time to time
✧ Contrast Effect: cognitive bias that distorts our perception of
something when we compare it to something else, by
enhancing the differences between them
3. Expectancies – tendency for results to be influenced by
what test administrators expect to find
✧ Rosenthal/Pygmalion Effect: the test administrator’s expected
results influence the result of the test
✧ Golem Effect: negative expectations decrease one’s
performance
4. Rating Errors – intentional or unintentional misuse of the
scale
✧ Leniency Error: rater is lenient in scoring (Generosity Error)
✧ Severity Error: rater is strict in scoring
✧ Central Tendency Error: rater’s rating would tend to cluster
in the middle of the rating scale
✧ Halo Effect: tendency to give a high score due to failure to
discriminate among conceptually distinct and potentially
independent aspects of a ratee’s behavior
✧ snap judgment on the basis of a positive trait
✧ Horn Effect: Opposite of Halo Effect
✧ One way to overcome rating errors is to use rankings
5. Fundamental Attribution Error – tendency to explain
someone’s behavior based on internal factors such as
personality or disposition, and to underestimate the influence
that external factors have on that person’s behavior,
blaming the person rather than the situation
✧ Barnum Effect: people tend to accept vague personality
descriptions as accurate descriptions of themselves (Aunt
Fanny Effect)
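
The rating errors listed under item 4 can be illustrated with a small Python sketch. It is a rough teaching aid only: the function name `flag_rating_errors` and the cutoffs `shift` and `spread` are assumptions made for this example, not established norms or part of any published scoring procedure. It flags a rater whose average sits far above the scale midpoint (leniency), far below it (severity), or whose scores barely vary around the middle (central tendency).

```python
from statistics import mean, pstdev


def flag_rating_errors(ratings, scale_min=1, scale_max=5, shift=0.25, spread=0.15):
    """Flag possible leniency, severity, or central tendency for one rater.

    `shift` and `spread` are hypothetical cutoffs, given as fractions of the
    scale range. They are illustrative values, not validated thresholds.
    """
    scale_range = scale_max - scale_min
    midpoint = (scale_min + scale_max) / 2
    average = mean(ratings)
    variability = pstdev(ratings)  # population standard deviation

    flags = []
    if average - midpoint > shift * scale_range:
        flags.append("leniency")  # scores pile up at the high end
    elif midpoint - average > shift * scale_range:
        flags.append("severity")  # scores pile up at the low end
    elif variability < spread * scale_range:
        flags.append("central tendency")  # scores cluster in the middle
    return flags


print(flag_rating_errors([5, 5, 4, 5, 4]))  # high ratings throughout
print(flag_rating_errors([1, 2, 1, 1, 2]))  # low ratings throughout
print(flag_rating_errors([3, 3, 3, 3, 3]))  # everything at the midpoint
print(flag_rating_errors([1, 5, 2, 4, 3]))  # full use of the scale
```

A flag here only signals a pattern worth checking; a rater may score high simply because the ratees performed well, which is one reason rankings are suggested as a way around rating errors.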
