Unit 2
Test Construction: Item Construction, Item Analysis, develop test administration, norms,
scoring and Interpretation of the tests; Tester’s Bias and Extraneous Factors.
TEST CONSTRUCTION
Attention must be given to the following points while constructing a sound, well-structured
and relevant questionnaire/schedule:
• The researcher must first define the problem that s/he wants to examine, as it will lay the
foundation of the questionnaire. There must be a complete clarity about the various facets of
the research problem that will be encountered as the research progresses.
• The correct formulation of questions is dependent on the kind of information the researcher
seeks, the objective of analysis and the respondents of the schedule/questionnaire. Whether to
use open-ended or close-ended questions should be decided by the researcher. Questions
should be uncomplicated and framed with a well-thought-out tabulation plan in view.
• A researcher must prepare a rough draft of the schedule while giving ample thought to the
sequence in which s/he wants to place the questions. Previous examples of such
questionnaires can also be observed at this stage.
• The researcher should recheck the rough draft and, if required, make changes to improve it.
Technical discrepancies should be examined in detail and corrected.
• There should be a pre-testing done through a pilot study and changes should be made to the
questionnaire if required.
• The questions should be easy to understand, and the directions for filling in the
questionnaire should be clearly stated; this avoids any confusion.
The primary objective of developing a tool is obtaining a set of data that is accurate,
trustworthy and authentic so as to enable the researcher in gauging the current situation
correctly and reaching conclusions that can yield actionable suggestions. However, no tool is
absolutely accurate and valid; it should therefore carry a declaration that clearly states its
reliability and validity.
A researcher evaluating a new mathematics curriculum, for example, might (a) desire a test
that could show changes over time in mathematics skills, (b) assign a score of 1 to each math
item answered correctly, (c) create a table of specifications indicating what kind of skills would
be expected to be acquired, (d) run a study to determine which items were sensitive to
change, and (e) repeat the process with the selected items with a new group of students.
Standardization refers to the consistency of processes and procedures that are used for
conducting and scoring of a test. To compare the scores of different individuals, the testing
conditions should be the same. For a new test, the first and major step in
standardization is formulating the directions. This also includes the type of materials to be
used, verbal instructions, time to be taken, the way to handle questions by test takers and all
other minute details of a testing environment. Establishing the norms is also a key step for
standardization. A norm refers to the average performance. To standardize a test, we administer
it to a large, representative sample of the kind of individuals it was designed for. This
group sets the norms and is called the standardization sample. The norms for
personality tests are set in the same way as those for aptitude tests. For both, the norm
would refer to the performance of average individuals. Standardization is thus very important
for constructing and administering a test. The test is administered to a large number of
people under identical conditions and guidelines, after which the raw scores are converted
using the percentile rank, Z-score, T-score, stanine, etc. The standardization of a test
can be established from these converted scores. Hence, “standardization is a process of
ensuring that a test is standardized” (Osadebe, 2001). There are many advantages when a
test is standardized. A standardized test is usually produced by experts and is superior to a
teacher-made test. It is highly valid, reliable and normed, with derived scores such as the
percentile rank, Z-score and T-score used to produce age, sex, location and school-type
norms. Generally, a standardized test can be used to assess and compare students in the
same norming group. The normal process for standardization includes:
Criterion referenced Testing: It is used for measuring the real knowledge of a certain topic.
For example: Multiple choice questions in a geography quiz.
Steps for Constructing Standardized Tests:
A carefully constructed test where the scoring, administration and interpretation of result
follows a uniform process can be termed as a standardized test. Following are the steps that
can be followed to construct a standardized test:
Steps
1) Plan for the test.
2) Preparation of the test.
3) Trial run of the test.
4) Checking the Reliability and Validity of the test.
5) Prepare the norms for the test.
6) Prepare the manual of the test and reproducing the test.
3) Preliminary Administration – After modifying the items as per the advice of the experts,
the test can be tried out on an experimental basis, which is done to prune out any inadequacy
or weakness of the items. It highlights ambiguous items, irrelevant choices in multiple-choice
questions, and items that are very difficult or very easy to answer. The time duration of the
test and the number of items to be kept in the final test can also be ascertained, and
repetition and vagueness in the instructions can be avoided. This is done in the following
three stages:
c) Final try-out – The test is administered to a large sample in order to estimate its
reliability and validity. It provides an indication of how effective the test will be when
administered to the intended population.
4) Reliability and Validity of the test – When the test is finally composed, it is again
administered to a fresh sample in order to compute the reliability coefficient. This time also
the sample should not be less than 100. Reliability is calculated through the test-retest
method, the split-half method and the equivalent-form method; it shows the consistency of
test scores. Validity refers to what the test measures and how well it measures it. If a test
measures well the trait it intends to measure, it can be said to be valid. Validity is the
correlation of the test with some outside independent criterion.
5) Norms of the final test – Test constructor also prepares norms of the test. Norms are
defined as average performance scores. They are prepared to meaningfully interpret the
scores obtained on the test. The obtained scores by themselves convey no meaning
regarding the ability or trait being measured. But when they are compared with norms, a
meaningful inference can be drawn immediately.
The norms may be age norms, grade norms etc. as discussed earlier. Similar norms cannot be
used for all tests.
6) Preparation of manual and reproduction of the test – The manual is prepared as the last
step; the psychometric properties of the test, its norms and references are reported in it. It provides
in detail the process to administer the test, its duration and scoring technique. It also contains
all instructions for the test.
Item Construction
Test item construction is the process of designing and developing questions or prompts used
in psychological assessments to measure specific psychological constructs such as cognitive
abilities, personality traits, or emotional states. Properly constructed test items are crucial for
ensuring the validity, reliability, and fairness of psychological tests. The quality of these
items directly impacts the accuracy of the test results and the conclusions drawn from them.
8. Ethical Considerations:
o Confidentiality: Ensure that responses are kept confidential and used only for
their intended purpose.
o Informed Consent: Obtain consent from participants, clearly explaining the
purpose of the test and how the data will be used.
Item analysis
Item analysis is a procedure by which we analyse the items to judge their suitability or
unsuitability for inclusion in the test. As we know, the quality
or merit of a test depends upon the individual items which constitute it. So only those items
which suit our purpose are to be retained. Item analysis is an integral part of the reliability
and validity of a test. The worth of an item is judged from three main angles viz.
a) Item difficulty
When an item is too easy, all the students will answer it correctly. If it is too hard, nobody
will answer it. What is the use of having such items in a test? If all the students get equal scores,
the very purpose of the test (i.e. to assess the ability of students) is defeated. So it is clear that
too easy and too difficult items are to be totally discarded. It is desirable that items of a
medium difficulty level must be included in a test. Item difficulty is calculated by different
methods.
Method 1: Item difficulty (I.D.) is calculated by using the formula I.D. = (R / N) × 100,
where R = number of testees answering the item correctly, and N = total number of testees.
If, in a test administered to 50 pupils, an item is passed by (i.e. correctly marked by) 35
students, then I.D. = (35 / 50) × 100 = 70. Here we understand that the item is easy. In
essence, the higher the I.D. value, the easier the item; the lower the I.D. value, the more
difficult the item. Usually items with I.D. values between 16 and 84 (or 15 to 85) are retained.
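The I.D. computation above can be sketched directly:

```python
# Item difficulty index from the formula above: I.D. = (R / N) * 100,
# where R = number of testees answering correctly, N = total testees.
def item_difficulty(correct: int, total: int) -> float:
    return correct * 100 / total

# Worked example from the text: 35 of 50 pupils pass the item.
idx = item_difficulty(35, 50)
print(idx)                # 70.0 -> an easy item
print(16 <= idx <= 84)    # True -> within the usual retention band
```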
(b) Discriminating index: To be considered good, an item must have discriminating power.
For example, if an item is too easy or too difficult to all the testees, it can’t discriminate
between individuals. Logically, it is expected that a majority of students of a better standard
and a few students of lower standard will answer an item correctly. Thus, an item must
discriminate between persons of the high group and the low group. In other words,
WL = number of persons in the lower group (i.e. the bottom 27% of N) who have wrongly
answered the item or omitted it.
WH = number of persons in the higher group who have wrongly answered the item or
omitted it.
It is expected that WL will always be more than WH, i.e. WL − WH will always be positive.
If WH is more than WL, the item is defective or ambiguous and is to be rejected.
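The WL − WH screening described above can be sketched as follows; the 27% group fraction follows the text, and the records are hypothetical.

```python
# Discrimination screening with 27% extreme groups, as described above.
# Each record pairs a test-taker's total test score with 1/0 on the item
# (1 = correct); WL and WH count wrong or omitted answers in the lower
# and upper groups. A positive WL - WH is expected for a sound item.
def discrimination(records, fraction=0.27):
    ordered = sorted(records, key=lambda r: r[0])     # sort by total score
    n = max(1, round(len(ordered) * fraction))        # group size (27% of N)
    wl = sum(1 for _, ok in ordered[:n] if not ok)    # wrong in lower group
    wh = sum(1 for _, ok in ordered[-n:] if not ok)   # wrong in upper group
    return wl - wh

# Hypothetical records: (total score, item answered correctly?)
records = [(10, 0), (12, 0), (15, 1), (20, 1), (25, 1), (30, 1), (35, 1), (40, 1)]
print(discrimination(records))   # 2 -> positive, the item discriminates
```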
Statistical methods are used to determine the internal consistency of items. Biserial
correlation gives the correlation of an item with its sub-test scores and with total test-scores.
This is the process of establishing internal validity. There are also other methods of assessing
internal consistency of items and as they are beyond the scope of our present purpose, we
have not discussed them here.
Item-total correlation is a critical concept in item analysis, particularly in classical test theory.
Definition: The item-total correlation is the correlation between a single test item’s score and
the total score of the test (excluding the item in question). It measures the relationship
between how a person scores on a specific item and their overall performance on the test.
Purpose: This statistic helps determine how well each item on a test contributes to the overall
measurement of the construct. High item-total correlations indicate that the item is a good
measure of the underlying construct being tested, as it is consistent with the total test score.
1. Score Calculation:
- For each test-taker, calculate the total score of the test, excluding the item of interest. This
gives the total score that excludes the specific item.
2. Correlation Computation:
- Compute the Pearson correlation coefficient between the item scores and the total scores
(excluding the item). This can be done using statistical software or formulas.
- High Correlation: A high positive correlation (e.g., above 0.30) suggests that the item is
consistent with the overall test score, meaning it contributes well to measuring the construct
and is aligned with the other items on the test.
- Low or Negative Correlation: A low or negative correlation indicates that the item may not
be measuring the same construct as the other items or may be poorly worded or ambiguous.
This could signal a need for revision or removal of the item.
- Item Selection: Items with high item-total correlations are typically retained in the test as
they contribute effectively to the test's overall reliability.
- Item Revision: Items with low correlations might be revised or removed to improve the
test’s consistency and overall validity.
Considerations:
- Contextual Factors: Ensure that the item-total correlation is interpreted in the context of the
test’s purpose and content. Sometimes, an item with a lower correlation might still be
valuable for assessing certain aspects of the construct.
In summary, item-total correlation is a useful statistic in item analysis for evaluating how
well individual items contribute to the overall test construct. It provides insights into the
effectiveness of each item and helps guide decisions in test development and refinement.
a. Standardization
Administering tests under uniform conditions to ensure fairness and comparability.
- Importance:
- Consistency: Reduces variability in test conditions.
- Reliability: Enhances test consistency by minimizing sources of error.
-Validity: Ensures the test measures what it is intended to measure under consistent
conditions.
- Implementation:
- Follow a standardized administration protocol.
- Train all personnel involved in the administration.
- Document all procedures and conditions.
b. Instructions
Clear, concise, and consistent guidance provided to test-takers.
- Importance:
- Clarity: Ensures test-takers understand what is required.
- Fairness: Provides all test-takers with an equal understanding of test expectations.
- Implementation:
- Use straightforward language.
- Deliver instructions consistently.
- Provide examples if needed.
c. Environment
- Definition: The physical and situational conditions where the test is administered.
- Importance:
- Minimize Distractions: Helps maintain focus and reduces variability.
- Comfort: Supports concentration and reduces test-taker stress.
- Implementation:
- Choose a quiet, controlled location.
- Ensure comfortable seating and functional equipment.
- Maintain appropriate lighting and temperature.
2. Practical Skills
a. Setting Up
Preparing materials and equipment for test administration.
Importance:
- Readiness: Ensures all materials are available and in working order.
- Efficiency: Facilitates smooth administration.
- Implementation:
- Use a checklist to verify all materials.
- Check and prepare any necessary equipment.
- Organize materials systematically.
b. Timing
Managing and adhering to time limits during the test.
- Importance:
- Fairness: Provides equal time for all test-takers.
- Accuracy: Measures performance under timed conditions.
- Implementation:
- Use reliable timing devices.
- Clearly communicate time limits.
- Monitor and manage time during the test.
- Handle any time-related issues consistently.
c. Handling Issues
- Definition: Addressing technical problems or unexpected situations during administration.
- Importance:
- Adaptability: Ensures test continuity despite issues.
- Fairness: Minimizes disruption and potential impact on performance.
- Implementation:
- Develop contingency plans for common problems.
- Document issues and resolutions.
- Provide immediate assistance to test-takers as needed.
3. Ethical Considerations
a. Confidentiality
Protecting test-taker privacy and data.
- Importance:
- Trust: Builds confidence in the testing process.
- Compliance: Meets legal and ethical standards for data protection.
- Implementation:
- Securely store test materials and results.
- Limit access to authorized personnel.
- Follow data protection laws and guidelines.
b. Informed Consent
Providing test-takers with information about the test and obtaining their consent.
- Importance:
- Transparency: Ensures test-takers understand the test and its use.
- Autonomy: Respects the right to make informed decisions.
-Implementation:
- Disclose the purpose, procedures, and potential risks of the test.
- Obtain written or electronic consent.
- Allow test-takers to ask questions and address concerns.
NORMS
Norm refers to the typical performance level for a certain group of individuals. Any
psychological test with just the raw score is meaningless until it is supplemented by
additional data to interpret it further. Therefore, the cumulative total of a psychological test is
generally inferred through referring to the norms that depict the score of the standardized
sample. Norms are factually demonstrated by establishing the performance of individuals
from a specific group in a test. To determine accurately a subject’s (individual’s) position
with respect to the standard sample, the raw score is transformed into a relative measure.
This derived score serves two purposes: 1) It indicates the individual's standing in relation to
the normative sample and helps in evaluating the performance. 2) It provides measures that
can be compared, allowing an individual's performance on various tests to be gauged.
Types of Norms
Fundamentally, norms are expressed in two ways: developmental norms and within-group
norms.
1) Developmental Norms: These depict the normal path of an individual's development. They
can be very useful for description but are not well suited for precise statistical treatment.
Developmental norms can be classified as mental age norms, grade equivalent norms and
ordinal scale norms.
2) Within-Group Norms: This type of norm is used to compare an individual's performance
with that of the most closely related group. They carry a clear and well-defined quantitative
meaning and can be used in most statistical analyses.
a) Percentiles (P(n) and PR): They refer to the percentage of people in the standardization
sample who fall below a certain score, and depict an individual's position with respect to the
sample. Counting begins from the bottom, so the higher the percentile, the better the rank.
For example, if a person scores at the 97th percentile in a competitive exam, 97% of the
participants have scored less than him/her.
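The percentile rank can be sketched as below, counting ties as half (a common convention); the normative sample is hypothetical.

```python
# Percentile rank: the percentage of scores in the normative sample that
# fall below a given raw score, counting ties as half (a common
# convention). The sample scores are hypothetical.
def percentile_rank(score, sample):
    below = sum(1 for s in sample if s < score)
    ties = sum(1 for s in sample if s == score)
    return (below + 0.5 * ties) * 100 / len(sample)

norm_sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 97]
print(percentile_rank(97, norm_sample))   # 95.0 -> above 95% of the sample
```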
b) Standard Score: It expresses the distance between the individual's score and the mean in
units of the standard deviation of the distribution. It can be derived by linear or nonlinear
transformation of the original raw scores. Common standard scores are the Z-score and the
T-score.
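A minimal sketch of the standard-score conversions (Z-score, T-score, stanine), assuming the population standard deviation and a hypothetical raw-score distribution:

```python
import statistics

# Standard-score conversions, assuming the population standard deviation
# (statistics.pstdev); the raw-score distribution is hypothetical.
#   z = (x - mean) / sd,  T = 50 + 10z,  stanine = clamp(round(2z + 5), 1, 9)
def z_score(x, scores):
    return (x - statistics.mean(scores)) / statistics.pstdev(scores)

def t_score(x, scores):
    return 50 + 10 * z_score(x, scores)

def stanine(x, scores):
    return min(9, max(1, round(2 * z_score(x, scores) + 5)))

raw = [40, 45, 50, 55, 60]
print(round(z_score(60, raw), 2))   # 1.41 -> well above the mean of 50
print(round(t_score(60, raw), 1))   # 64.1
print(stanine(60, raw))             # 8
```

The T-score and stanine are just linear rescalings of z that avoid negative values and decimals, which makes them easier to report.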
c) Age Norms: To obtain this, we take the mean raw score gathered from all in the common
age group inside a standardized sample. Hence, the 15 year norm would be represented and
be applicable by the mean raw score of students aged 15 years.
d) Grade Norms: These are calculated by finding the mean raw score earned by students in a
specific grade.
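Both age and grade norms as described above reduce to the mean raw score per group; a sketch with hypothetical data:

```python
from collections import defaultdict
from statistics import mean

# Age or grade norms reduce to the mean raw score per group; each record
# pairs a group label with a raw score (the data below are hypothetical).
def group_norms(records):
    groups = defaultdict(list)
    for group, score in records:
        groups[group].append(score)
    return {g: mean(scores) for g, scores in groups.items()}

scores = [("grade 5", 30), ("grade 5", 34), ("grade 6", 40), ("grade 6", 44)]
print(group_norms(scores))   # {'grade 5': 32, 'grade 6': 42}
```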
Scoring:
Interpretation:
Tester’s Bias
1. Types of Bias:
o Cultural Bias: Test content may favor one cultural group over another. This
can affect the performance of individuals from different backgrounds.
o Confirmation Bias: Scorers might unconsciously look for evidence that
confirms their pre-existing beliefs or expectations about a test-taker.
2. Mitigating Bias:
o Training: Provide thorough training for scorers on rubrics and scoring
procedures.
o Blind Scoring: Ensure that scorers do not know the identity or background of
the test-takers.
o Standardization: Maintain consistent test administration and scoring
practices.
Extraneous Factors
1. Environmental Factors:
o Testing Conditions: Noise, lighting, and seating arrangements can impact
performance. Ensuring a controlled and consistent environment is key.
o Health and Well-being: Test-takers’ physical or emotional state can affect
their performance. Stress, fatigue, or illness can skew results.
2. Test Administration:
o Consistency: Variations in how the test is administered (e.g., instructions
given, time limits) can affect results. Adherence to standardized procedures
helps mitigate this.
o Preparation: Test-taker familiarity with the test format and preparation level
can influence scores. Providing preparatory resources or information can help
level the playing field.
3. Test Design:
o Clarity and Fairness: Tests should be well-designed to measure what they
intend to without ambiguity or unfair advantage. Ensuring that the questions
are clear and relevant to the test objectives is crucial.