Test Design I
In this unit we are going to explore the different steps involved in the process of designing a test.
Enjoy the Unit.
Competencies:
By the end of this Unit, students will be able to:
1. Introduction
2. Stages of Test Development
3. Assessment techniques
4. Summary and Conclusions
5. Assessment Plan
6. References
7. Key
Estimated Time
10 hours.
Assessment Plan
Unit 5 Final Task; this task represents 5 points of the overall grade.
You will submit this task through the UAS platform.
In the previous unit we discussed validity and reliability and how they
contribute to a successful exam. We also looked at different approaches to
testing (for instance, direct vs indirect and discrete vs integrative), which
opened our minds to different possibilities and gave us more ideas about the
type and quality of evidence we want to gather about our students'
learning.
In this unit, we are going to analyze each of the steps involved in the
process of test design proposed by Arthur Hughes in his book Testing for
Language Teachers (2003).
To get the most out of this unit, we will take a different approach to its
design: we will work on the END OF UNIT TASK from the beginning of the
unit as we analyze each of the steps of test design.
Before we start with the stages of test development, we want you to rate
your exam from 1 to 10, where 1 is a low-quality exam and 10 is an excellent
exam. Keep this number in mind as we visit the stages. Let's start!
Step 1) Look for an English exam you recently used in your language
classroom. We will analyze the exam as we read the stages of test
development. Make sure you complete all the steps included in this study
material and collect your answers in a separate document as they will be
the foundation of your end of unit report.
Once you have the exam in your hands, read the first stage in test
development.
Stage 1. Stating the problem
Step 2) Using the exam you chose in step 1, answer the following
questions proposed by Arthur Hughes (2003) for the first stage in test
development. Open a Word document and write your answers or copy them
from here into a blank file.
1) What kind of test is it to be? Achievement (final or progress),
proficiency, diagnostic, or placement?
Although we will eventually be able to design all types of tests, the most
common one is the achievement exam.
2) What is its precise purpose? Briefly describe what you expect to get
from the exam in terms of what you want to measure.
3) What abilities are to be tested?
4) How detailed must the results be?
The better the items (exercises) in the exam are, the more detailed, and
more useful, the answers you get from students. If you take a closer look at
the exam in your hands, you can judge the exercises by using your knowledge
of the approaches to testing (direct / indirect, discrete / integrative) to
determine how detailed the answers from the students will be and whether
they are useful for the purpose of the exam.
5) How accurate must the results be?
Now that you have answered the previous questions, we can say that you
have a first general picture of what happened in the first stage in designing
the exam you have in your hands. This might be the first time you have
asked yourself these types of questions before designing a test.
In this second stage, a set of specifications for the test must be written. The
expected information in this second stage is related to: content, test
structure, timing, medium/channel, techniques to be used, criterial levels of
performance, and scoring procedures.
Step 3) As you read each of the categories, complete the following
sub-tasks. Remember, the answers must be collected in a separate document
as they will be part of the end of unit task. Let's read about content.
1) Content.
The way in which the content is described will vary with its nature. For
example, in a grammar test, you may simply list all the relevant structures
covered during a period of time.
Answer the following questions:
As you have probably realized, this first part of stage 2 is really important,
as it determines what we will include in the exam. One key reflection you
can make is whether the answer you gave to the second question in the
first stage still holds in light of the content that should have been
included in the exam.
Test structure. It refers to the number of sections that the test will
have and what is tested in each. For example, 3 sections –
grammar, reading and writing. How many sections are there in the
exam you are analyzing and what are they about?
Number of items. In total and in the various sections. You have to
make decisions on the total number of items that you will include
in the exam. As you may remember from the previous unit, when
we discussed reliability, we saw that this is a key aspect: include
a sufficiently representative sample, but make sure you don't
design an exam that is too long or too short.
Answer: how many items are there in the exam that you are analyzing? Do
you think that this number is the most appropriate? Would you add more or
remove some items? Why?
This part is really important. If you designed the exam, you can give a
second thought to the decisions you made. If you were given the exam you
are analyzing, you can make more informed judgements and determine
whether the number of items is appropriate for the content and for the
purpose of the test.
1) How much time does it take a regular student to answer the exam?
2) How much time do you expect students to spend on each section?
3) Is there enough time to complete the items in the exam?
4) Are sections balanced in terms of the time they demand? Sometimes we
allocate more time to a task that is less complex than another task that
gets less time.
5) Is the time available in a regular class enough for students to finish
the exam?
6) When you used the exam, did you experience any time constraints?
Key questions: do you have criterial levels of performance for the exam you
are analyzing? If you didn't design the exam, can you ask for them? If you
designed the exam, did you include them? Yes? No? Elaborate on your
answer.
4) Scoring procedures
This is the problem of not having scoring procedures and criterial levels of
performance.
We expect that, by this point, you have reflected on the quality of the
exam you have in your hands. Do you remember that at the beginning of
this unit you rated your exam from 1 to 10? Would you still give your exam
the same number? Keep that in mind!
Once you have answered all of the questions in stages 1 and 2, it is time to
start writing the items you want to include in the exam. In this stage we
have 3 sub-stages or steps to consider.
1. Sampling. This refers to the fact that you have to make the right
choices about the types of items to include in the exam. You need to choose
the ones that will provide you with the best evidence of students' learning
and that truly serve the purpose of the exam. Finally, you have to include
items that can help you test all of the content specified in stage 2.
The second element refers to the process of writing the items you chose in
the previous element of this third stage. According to Hughes (2003), items
should always be written with the specifications in mind.
In order to evaluate the effectiveness of the items included in the exam you
have in your hands, let's analyze some advantages and disadvantages of
the most commonly used items.
These two items are probably the most popular testing techniques
found in tests today, largely because they are very easy to mark
and have excellent reliability.
One of the main disadvantages is that you cannot always trust the
results because of the guessing factor, and you cannot assume
that students can produce the language, because you tested them
indirectly.
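To get a feel for how much the guessing factor can inflate results, the short sketch below estimates the score a student could expect from guessing alone. It is only an illustration under assumptions that are not part of this study material: every item offers the same fixed number of answer options (for example, four for multiple choice or two for true/false), and the student guesses blindly on every item. The function name and the numbers are hypothetical.

```python
def expected_chance_score(n_items, n_options):
    """Average number of items answered correctly by blind guessing.

    Assumes each item has `n_options` equally likely options and that
    exactly one of them is correct.
    """
    return n_items / n_options


# A 20-item test with four options per item.
print(expected_chance_score(20, 4))  # 5.0
# The same test with only two options per item (e.g. true/false).
print(expected_chance_score(20, 2))  # 10.0
```

Under these assumptions, the fewer options each item has, the larger the share of the score that guessing alone can explain, which is one reason to interpret results from such items with care.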
Item: Gap-filling
One decision we must make when including gap-filling items
in a test is how much help to provide to students.
Let's analyze the last consideration when designing gap-filling items and
including them in a test.
Just remember that the text should be at the right level. If you
bring a text that is too far above students' level, they will fail to
predict the missing word.
If you want to make the exercise easier, you can use a variation of the
cloze just like the following example:
As you can see, in both items the first sentence is left intact, as it
gives students context on what the text is about. In this
second item, every second word is deleted; however, the first
letter is provided to guide students towards the possible answer to the
gap.
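The rules just described (keep the first sentence, delete every second word, optionally keep the first letter) can be sketched in a few lines of Python. This is a minimal illustration, not a tool from Hughes (2003): the function name `make_cloze`, its parameters, and the sample text are hypothetical, and the sentence splitting is deliberately simple.

```python
import re


def make_cloze(text, every=2, keep_first_letter=True):
    """Return (gapped_text, answers) for a simple cloze-style item.

    The first sentence is left intact to give context. In the rest of
    the text, every `every`-th word is replaced by a gap; if
    `keep_first_letter` is True, the first letter stays as a hint.
    """
    # Split off the first sentence (ends with ., ! or ?).
    parts = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)
    if len(parts) < 2:
        return text.strip(), []  # nothing to gap after the first sentence
    first, rest = parts

    gapped, answers = [], []
    for position, word in enumerate(rest.split(), start=1):
        # Separate the word from any trailing punctuation.
        match = re.match(r"^(\w[\w'-]*)(\W*)$", word)
        if position % every == 0 and match:
            core, tail = match.groups()
            answers.append(core)
            hint = core[0] if keep_first_letter else ""
            gapped.append(f"{hint}____{tail}")
        else:
            gapped.append(word)
    return first + " " + " ".join(gapped), answers


item, key = make_cloze("Tom has a dog. The dog likes long walks.")
print(item)  # Tom has a dog. The d____ likes l____ walks.
print(key)   # ['dog', 'long']
```

Setting `keep_first_letter=False` gives the harder version without hints, and a larger `every` value leaves more context between gaps. Whatever the settings, the generated item still needs to be checked by hand against the test specifications.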
There are more testing techniques that you can include in your exam or
that are probably included in the exam you are analyzing. Some
possibilities are sentence building, transformations and
reformulations, and editing.
Did you include any type of sentence-building items? If so, were
you aware of the elements discussed in this section?
How is this knowledge useful for you as a test designer?
Would you modify any of the items to make them different, more
challenging, more supportive, etc.?
Item: Transformations and reformulations
These two items not only test linguistic knowledge, they also test
a general ability in the target language. The main objective of
these testing items is to see if students can take a
sentence/meaning and express it in a different way. In other
words, can students express the same idea but using different
linguistic items?
These two techniques are more complex, and they demand more
time from the test taker. Don't forget that you need to consider
time as a key factor. Let's see some examples.
As you can see, in this example the item provides students with a
part of the sentence, and it does not need to be the beginning, as in
the second example.
Some structures that you can test with this technique are:
Item: Editing
This type of testing technique is very common today as it is
included in a lot of different proficiency tests. The idea is to find
mistakes (or a lack of mistakes) in a text. For example:
Something very valuable about this type of item is that every mistake
you find in students' work is a mistake they haven't found
themselves. This gives you ideas about what students lack.
Did you include any type of editing items? If so, were you aware of
the elements discussed in this section?
Would you modify any of the items to make it different, more
challenging, provide more support, etc.?
The following chart summarizes the items discussed in this third
stage in the development of a test.
3) Moderating items
By this point, you have made decisions on:
The type of exam.
The purpose of the exam.
The content of the exam.
The type of items to include.
How to design each of the items.
Etc.
In this stage you are expected to ask native speakers, with a very similar
profile to the target participants, to analyze the items included in the exam.
The main reason for doing this is that if native speakers struggle
with the instructions or with answering some items, your students may
struggle too, and some modifications need to be made.
The items that survived the last two stages should now be administered to
non-native speakers with a similar profile to the intended users. The main
reason for this step is not only to see if some of the items need modification
but also to see the grades students get, whether the results are reliable, and
whether those results are similar to the ones we expect from the intended
students.
Once native (stage 4) and non-native (stage 5) participants have taken the
exam, we need to calibrate the scales based on the results, gathering enough
representative samples to create the scale we will use to measure our
students' level. This calibration contributes to the validity and reliability
of the exam.
The final version of the test can be validated. This step is a must for
proficiency tests or published tests; for a “regular school or institution”,
however, it may not be “that” necessary. But if the exam is to be used many
times over a period of time, informal, small-scale validation is desirable, as
it will provide more certainty for students and for the institution.
Has the exam in your hands (with probably small changes) been
used for a period of time?
Have you compared the results from different groups and
generations to validate the results?
The end of unit task from this unit is similar to a light (really light) version of
this, as it prepares you for the final assignment (unit 8) in this course.
Is there a handbook for test takers in your school for the exam you
have in your hands?
Using the handbook developed in the previous stage (9), all the staff involved
should be trained. This may include interviewers, raters, scorers,
invigilators, etc.
Were you trained on how to apply the test you have in your
hands?
Were you trained on how to score the exam?
Were you told the expected answers and results from your students?
Guess what? We are done! Let's go to the summary and conclusions
section for some final recommendations before we read the instructions for
the end of unit task.
This is probably the first time you have stopped to analyze each of the
stages of test development. I'm sure some stages seemed obvious and
others you were not aware existed, but in the end all of them are necessary
for creating a valid and reliable exam.
In this fifth unit, we visited items that are useful for testing the language
systems (grammar, vocabulary, etc.). In the coming unit we will review
different items for testing the skills (reading, listening, speaking and writing).
5. Assessment Plan
Instructions:
Step 1) Look for an English exam you recently used in your language
classroom.
Step 2) to Step 18) Answer all the corresponding questions using the exam
you chose in step 1 as a reference.
Step 19) Collect the answers you wrote in every step (step 2 to step 17) in
the study material. You will identify the answers with this icon
Cover Page.
Introduction (describe what this assignment was about)
Each stage (1 to 10) with the answers you wrote.
Conclusion
Appendix: the sample exam you used to answer all of the questions in the
10 stages.
For example,
Cover Page
Introduction
Stage 1. Stating the problem
1) What kind of test is it to be? Achievement (final or progress),
proficiency, diagnostic, or placement? WRITE THE ANSWER.
2) What is its precise purpose? Briefly describe what you expect
to get from the exam in terms of what you want to measure.
WRITE THE ANSWER.
3) What abilities are to be tested? WRITE THE ANSWER.
4) How detailed must the results be? WRITE THE ANSWER.
5) How accurate must the results be? WRITE THE ANSWER.
Make sure your answers are self-explanatory and that they really answer
the question in the corresponding section. You can write them in paragraph
form (several paragraphs, not one BIG paragraph, please), including the
answers to all of the questions.
Step 21) Write a conclusion (250 words) using the following questions as
the foundation:
1) Do you remember you rated your exam at the beginning of this study
material? Well, after you completed all of the steps and answered all of the
questions, RATE your exam again.
Will you give the same score?
What do you think about the tests you use in your school or the
test you have previously designed? Do they follow a similar
process?
What did you learn in this unit?
Will you make changes in the way you design your tests?
Any additional comment you would like to make.
Step 22) As this is probably one of the largest end of unit tasks you have
done, you will have until Tuesday, May 11th, before 10:00pm to submit it.
Your tutor,
Heidy Paredes
Step 4) Answer the replies you received. Due: Saturday before 10:00pm.