
Unit 3

Stages of Test Development

Stages of Test Development

Common Test Techniques

Multiple Choice

Stages of Test Development

1. Stating the problem

It cannot be said too many times that the essential first step in testing is to make oneself perfectly clear about what it is one wants to know and for what purpose. The following questions, the significance of which should be clear from previous chapters, have to be answered:

i. What kind of test is it to be? Achievement (final or progress), proficiency, diagnostic, or placement?
ii. What is its precise purpose?
iii. What abilities are to be tested?
iv. How detailed must the result be?
v. How accurate must the result be?
vi. How important is backwash?
vii. What constraints are set by unavailability of expertise, facilities, time (for construction, administration and scoring)?

Once the problem is clear, steps can be taken to solve it. It is to be hoped that a handbook of the present kind will take readers a long way towards appropriate solutions. In addition, however, efforts should be made to gather information on tests that have been designed for similar situations. If possible, samples of such tests should be obtained. There is nothing dishonourable in doing this; it is what professional testing bodies do when they are planning a test of a kind for which they do not already have first-hand experience. Nor does it contradict the claim made earlier that each testing situation is unique. It is not intended that other tests should simply be copied; rather that their development can serve to suggest possibilities and to help avoid the need to 'reinvent the wheel'.
2. Writing specifications for the test

A set of specifications for the test must be written at the outset.

(i)Content

This refers not to the content of a single, particular version of a test, but to the entire potential content of any number of versions. Samples of this content will appear in individual versions of the test.

The fuller the information on content, the less arbitrary should be the
subsequent decisions as to what to include in the writing of any version of the test.
There is a danger, however, that in the desire to be highly specific, we may go beyond our current understanding of what the components of language ability are and what their relationship is to each other.

The way in which content is described will vary with its nature. The content of a
grammar test, for example, may simply list all the relevant structures. The content of a
test of a language skill, on the other hand, may be specified along a number of dimensions. The following provides a possible framework for doing this. It is not meant
to be prescriptive; readers may wish to describe test content differently. The important
thing is that content should be as fully specified as possible.

Operations

• Scan text to locate specific information
• Guess meaning of unknown words from context

Types of text

• Letters
• Forms
• Academic essays up to three pages in length

Addressees of texts - this refers to the kinds of people that the candidate is expected to be able to write or speak to, or the people for whom reading and listening materials are primarily intended.

• Native speakers of the same status and age
• Native speaker university students

Length of text(s)
• For a reading test, this would be the length of the passages on which items are set.
• For a listening test, it could be the length of the spoken texts.
• For a writing test, it would be the length of the pieces to be written.

Topics - these may be specified quite loosely and selected according to suitability for the candidate and the type of test.

Readability - reading passages may be specified as being within a certain range of readability (one common readability formula is sketched after this list).

Structural Range - this could be:

• A list of structures which may occur in texts
• A list of structures which should be excluded
• A general indication of the range of structures

Vocabulary Range - This may be loosely or closely specified. An example of the latter
is to be found in the handbook of the Cambridge Young Learners tests, where words
are listed.

Dialect, accent, style - This may refer to the dialects and accents that test takers are
meant to understand or those in which they are expected to write or speak. Style may
be formal, informal, conversational etc.

Speed of processing:

• For reading, this may be expressed as the number of words to be read per minute.
• For speaking, it will be the rate of speech, also expressed in words per minute.
• For listening, it will be the speed at which the texts are spoken.
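
The text leaves the choice of readability measure open. As an illustration only, one widely used measure is the Flesch Reading Ease score; the sketch below assumes the word, sentence, and syllable counts for a passage have already been obtained.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text.

    Rough bands: 90-100 very easy, 60-70 plain English, 0-30 very difficult.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Example: a 180-word passage with 12 sentences and 240 syllables.
print(f"{flesch_reading_ease(180, 12, 240):.1f}")  # 78.8, an 'easy' passage

A specification might then state, for instance, that all passages must fall within a chosen score band; the band itself would be a decision for the test developers.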

(ii) Structure, timing, medium/channel and techniques

The following should be specified (a machine-readable sketch of such a specification follows the list):

• Test structure
What sections will the test have and what will be tested in each? (for
example: 3 sections - grammar, careful reading, expeditious reading)
• Number of Items
(in total and in the various sections)
• Number of Passages
(and number of items associated with each)
• Medium/channel
(paper and pencil, tape, computer, face-to-face, telephone, etc.)
• Timing
(for each section and for entire test)
• Techniques
What techniques will be used to measure what skills or subskills?
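
A specification of structure and timing like the one above is essentially structured data, so it can be useful to hold it in machine-readable form. The sketch below encodes a hypothetical specification as Python data; the section names echo the example above, but every field name and value is illustrative rather than part of any official format.

from dataclasses import dataclass, field

@dataclass
class SectionSpec:
    name: str              # e.g. "grammar", "careful reading"
    n_items: int           # number of items in the section
    n_passages: int        # passages the items are set on (0 if none)
    minutes: int           # time allowed for the section
    techniques: list[str] = field(default_factory=list)

test_spec = {
    "medium": "paper and pencil",
    "sections": [
        SectionSpec("grammar", 30, 0, 20, ["multiple choice"]),
        SectionSpec("careful reading", 20, 4, 35, ["short answer"]),
        SectionSpec("expeditious reading", 15, 3, 20, ["short answer"]),
    ],
}

# Totals for the whole test follow directly from the section specifications.
print(sum(s.n_items for s in test_spec["sections"]))   # 65 items
print(sum(s.minutes for s in test_spec["sections"]))   # 75 minutes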

(iii) Criterial levels of performance

The required level(s) of performance for (different levels of) success should be specified. This may involve a simple statement to the effect that, to demonstrate 'mastery', 80 percent of the items must be responded to correctly.

For speaking or writing, however, one can expect a description of the criterial
level to be much more complex. For example, the handbook of the Cambridge
Certificates in Communicative Skills in English (CCSE) specifies the following degree of
skill for the award of the Certificate in Oral Interaction at level 2:

• Accuracy
Pronunciation must be clearly intelligible even if still obviously influenced by L1.
Grammatical/ lexical accuracy is generally high although some errors that do not
destroy communication are acceptable.
• Appropriacy
The use of language must be generally appropriate to function. The overall
intention of the speaker must be generally clear.
• Range
A fair range of language must be available to the candidate. Only in complex
utterances is there a need to search for words.

• Flexibility
There must be some evidence of the ability to initiate and concede a
conversation and to adapt to new topics or changes of direction.
• Size
Must be capable of responding with more than short-form answers where
appropriate. Should be able to expand simple utterances with occasional
prompting from the Interlocutor.
(iv) Scoring procedures

These are always important, but particularly so where scoring will be subjective. The test developers should be clear as to how they will achieve high reliability and validity in scoring. What rating scale will be used? How many people will rate each piece of work? What happens if two or more raters disagree about a piece of work?
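
One way of answering these questions precisely is to encode the scoring rule itself. The procedure below is purely hypothetical, not one prescribed here: two raters' band scores are averaged when they are close, and the script is referred to a third rater when they differ by more than one band.

def combine_ratings(rater_a: int, rater_b: int, max_gap: int = 1) -> float | None:
    """Average two band scores if they agree closely enough.

    Returns the averaged score, or None to signal that a third
    rater should adjudicate (an assumed disagreement rule).
    """
    if abs(rater_a - rater_b) > max_gap:
        return None  # refer the script to a third rater
    return (rater_a + rater_b) / 2

print(combine_ratings(4, 5))  # 4.5
print(combine_ratings(2, 5))  # None: adjudication needed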

3. Writing and Moderating Items

a. Sampling

➢ It is most unlikely that everything found under the heading of 'Content' in the
specifications can be covered by the items in any one version of the test.
➢ Choices have to be made. For content validity and for beneficial backwash, the
important thing is to choose widely from the whole area of content.

b. Writing Items

➢ Items should always be written with the specifications in mind. It is no use writing 'good' items if they are not consistent with the specifications.
➢ As one writes an item, it is essential to try to look at it through the eyes of test takers and imagine how they might misinterpret it.
➢ The writing of successful items is extremely difficult. No one can expect to be able consistently to produce perfect items. Some items will have to be rejected, others reworked.
➢ The best way to identify items that have to be improved or abandoned is through the process of moderation.

c. Moderating Items

➢ Moderation is the scrutiny of proposed items by at least two colleagues, neither of whom is the author of the items being examined.
➢ Their task is to try to find weaknesses in the items and, where possible, remedy them.

4. Informal Trialling of Items on Native Speakers

➢ Items which have been through the process of moderation should be presented in the form of a test to a number of native speakers - twenty or more, if possible.
➢ The native speakers should be similar to the people for whom the test is being developed, in terms of age, education, and general background.
➢ Items that prove difficult for the native speakers almost certainly need revision or replacement.

5. Trialling of the Test on a group of non-native speakers similar to those for whom the test is intended

➢ Those items that have survived moderation and informal trialling on native
speakers should be put together into a test, which is then administered under
test conditions to a group similar to that for which the test is intended.
➢ Problems in administration and scoring are noted.
➢ It has to be accepted that, for a number of reasons, trialling of this kind is often not feasible.
➢ It is often the case, therefore, that faults in a test are discovered only after it has
been administered to the target group.
➢ Unless it is intended that no part of the test should be used again, it is
worthwhile noting problems that become apparent during administration and
scoring, and afterwards carrying out statistical analysis of the kind referred to
below and treated more fully in Appendix 1.

6. Analysis of the result of the trial; making of any necessary changes

There are two kinds of analysis that should be carried out.

➢ The first kind of analysis is statistical (a minimal sketch is given at the end of this section). This will reveal qualities (such as reliability) of the test as a whole and of individual items (for example, how difficult they are and how well they discriminate between stronger and weaker candidates).
➢ The second kind of analysis is qualitative. Responses should be examined in
order to discover misinterpretations, unanticipated but possibly correct
responses, and any other indicators of faulty items.
➢ Items that analysis shows to be faulty should be modified or dropped from the
test.
➢ Assuming that more items have been trialled than are needed for the final test, a
final selection can be made, basing decisions on the results of the analyses.
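
As a minimal sketch of the statistical analysis referred to above (and treated more fully in Appendix 1): an item's facility value is the proportion of candidates who answer it correctly, and a simple discrimination index compares the top- and bottom-scoring candidates. The code below assumes responses are stored as a 0/1 matrix and uses top and bottom thirds; both the data layout and the choice of thirds are illustrative.

def item_analysis(responses: list[list[int]]) -> list[tuple[float, float]]:
    """Facility value and discrimination index for each item.

    responses: 0/1 matrix with one row per candidate, one column per item.
    Discrimination = facility in the top third minus facility in the bottom third.
    """
    n = len(responses)
    third = max(n // 3, 1)
    ranked = sorted(responses, key=sum, reverse=True)  # strongest candidates first
    top, bottom = ranked[:third], ranked[-third:]
    stats = []
    for i in range(len(responses[0])):
        facility = sum(row[i] for row in responses) / n
        discrimination = (sum(row[i] for row in top) - sum(row[i] for row in bottom)) / third
        stats.append((facility, discrimination))
    return stats

# Six candidates, three items: the third item fails to discriminate.
matrix = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 0], [0, 0, 1], [0, 0, 0],
]
for facility, discrimination in item_analysis(matrix):
    print(f"facility={facility:.2f} discrimination={discrimination:.2f}")

Items whose discrimination turns out to be near zero (or negative) are exactly the kind that the analysis would mark for modification or removal.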

7. Calibration of Scales

➢ Where rating scales are going to be used for oral testing or the testing of writing,
these should be calibrated.
➢ Essentially this means collecting samples of performance (for example, pieces of
writing) which cover the full range of the scales. A team of "experts" then looks
at these samples and assigns each one of them to a point on the relevant scale.
➢ The assigned samples provide reference points for all future uses of the scale, as
well as being necessary training materials.

8. Evaluation

• The final version of the test should be validated.
• This is regarded as essential for a high-stakes or published test.
• For low-stakes tests to be used within an institution, this may not be thought necessary, although where the test is likely to be used many times over a period of time, informal, small-scale validation is desirable.
9. Writing handbooks for test takers, test users and staff

Handbooks (each with rather different content, depending on the audience) may be expected to contain the following:

• the rationale for the test;
• an account of how the test was developed and validated;
• a description of the test (which may include a version of the specifications);
• sample items (or a complete sample test);
• advice on preparing for taking the test;
• an explanation of how test scores are to be interpreted;
• training materials (for interviewers, raters, etc.);
• details of test administration.

10. Training Staff

• All staff involved in the test process should be trained. This may include interviewers, raters, scorers, computer operators, invigilators (proctors).

COMMON TEST TECHNIQUES

Multiple Choice

• It is very difficult to write successful items

A further problem with multiple choice is that, even where items are possible,
good ones are extremely difficult to write. Professional test writers reckon to have to
write many more multiple choice items than they actually need for a test, and it is
only after trialling and statistical analysis of performance on the items that they can
recognize the ones that are usable.

Multiple choice tests that are produced for use within institutions are often shot
through with faults.

Common amongst these are:

• more than one correct answer
• no correct answer
• clues in the options as to which is correct
• ineffective distractors

Savings in the time needed for administration and scoring will be outweighed by the time spent on successful test preparation.

It is true that item banks are worthwhile, but great demands are still made on time and expertise.

• Backwash may be harmful

It should hardly be necessary to point out that where a test that is important to
students is multiple choice in nature, there is a danger that practice for the test will
have a harmful effect on learning and teaching.

Practice at multiple choice items (especially when, as can happen, as much attention is paid to improving one's educated guessing as to the content of the items) will not usually be the best way for students to improve their command of language.

• Cheating may be facilitated

The fact that the responses on a multiple choice test (a, b, c, d) are so simple makes them easy to communicate to other candidates non-verbally.

Some defence against this is to have at least two versions of the test, the only
difference between them being the order in which the options are presented.
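
This two-version defence is straightforward to mechanise. The sketch below is illustrative: a seeded random generator reorders each item's options, so both versions contain exactly the same items while the answer key for each version can be rebuilt automatically.

import random

def make_version(items, seed):
    """Build one test version by reordering each item's options.

    items: list of (stem, options, correct_option) tuples.
    Returns the shuffled items and the answer key as letters.
    """
    rng = random.Random(seed)  # seeded, so each version is reproducible
    version, key = [], []
    for stem, options, correct in items:
        shuffled = options[:]
        rng.shuffle(shuffled)
        version.append((stem, shuffled))
        key.append("abcd"[shuffled.index(correct)])
    return version, key

items = [("She ___ to school.", ["go", "goes", "going", "gone"], "goes")]
_, key_a = make_version(items, seed=1)
_, key_b = make_version(items, seed=2)
print(key_a, key_b)  # same item, but the correct letter may differ between versions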

All in all, the multiple choice technique is best suited to relatively infrequent testing
of large numbers of candidates.

Yes/No and True/False items

• Items in which the test taker has merely to choose between Yes and No or between True and False.
• The obvious weakness of such items is that the test taker has a 50% chance of choosing the correct response by chance alone (a worked example follows this list).
• True/False items are sometimes modified by requiring test takers to give a reason for their choice.
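
To make the 50% weakness concrete: on an n-item true/false test, blind guessing yields an expected score of n/2. One classical remedy sometimes applied is the correction-for-guessing formula R - W/(k-1), where R and W are the numbers of right and wrong answers and k is the number of choices; whether to apply it to any given test is a judgment call, so the sketch below is illustrative only.

def expected_guess_score(n_items: int, n_choices: int) -> float:
    """Expected raw score from blind guessing on every item."""
    return n_items / n_choices

def corrected_score(right: int, wrong: int, n_choices: int) -> float:
    """Classical correction for guessing: R - W / (k - 1)."""
    return right - wrong / (n_choices - 1)

print(expected_guess_score(50, 2))  # 25.0 on a 50-item true/false test
print(corrected_score(35, 15, 2))   # 20.0: wrong answers weigh heavily when k = 2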

Short-answer items
• Items in which the test taker has to provide a short answer are common, particularly in listening and reading tests.
• Advantages over multiple choice:
1. guessing will contribute less to test scores;
2. the technique is not restricted by the need for distractors (though there have to be potential alternative responses);
3. cheating is likely to be more difficult;
4. though great care must still be taken, items should be easier to write.
• Disadvantages are:
1. responses may take longer and so reduce the possible number of items;
2. the test taker has to produce language in order to respond;
3. scoring may be invalid or unreliable, if judgment is required;
4. scoring may take longer.
