Testing
We as teachers are often faced with the need to design our own tests to measure our
students' level of proficiency; however, we haven't been trained in how to make a good test.
In this unit, you will develop some basic understanding, as well as some basic skills, to
create your own tests. Enjoy the unit.
Competencies:
By the end of this Unit, students will be able to:
Discuss the advantages, disadvantages and nature of direct and indirect assessment;
Show an understanding of the benefits and key design features of discrete and integrative
testing; and
Demonstrate understanding of the relative merits of reliability and validity in the assessment
process.
1. Introduction
2. Types of tests and their purpose and features
i. Diagnostic, achievement, placement, proficiency
3. Features of a well-designed test
i. Validity
1. Content validity
2. Criterion-related validity
3. Face validity
ii. Reliability
1. Making tests reliable
4. Approaches to test design:
i. Direct and indirect testing, and
ii. discrete integrative testing
5. Summary and Conclusions
6. Assessment Plan
7. References
8. Key
Estimated Time
10 hours.
Assessment Plan
In order to successfully complete your work on this Unit you will be asked to do two
assessment tasks:
Unit 4 Final Task; this task represents 5 points of the overall grade. You will
submit this task through the UAS platform.
The main objective of this unit is to help you design better written tests. Even
though we will cover this topic in Units 4, 5, and 6, this unit will help you build a basic
understanding of the process of designing a written test, and it can hopefully become a
guide for designing your own tests.
More often than not, we take on the responsibility of designing our own exams instead of
using exams pre-designed by our institutions, for a variety of reasons. Some of those
reasons could be the following:
A deep mistrust of the design of the exam.
The exam doesn't test what I'm teaching.
The exercises on the exam are not familiar to my students.
There are sections of the exam that are more complex than they should be.
Instructions are not well written.
The exam doesn't provide me with the evidence I need to determine whether a
student passes the course or not.
Etc.
You are probably thinking of more situations that have pushed you to the decision of
designing your own test. If you have designed your own exam, then the following questions
may ring a bell:
Is your exam valid?
Is your exam reliable?
What approaches to testing did you use? And, why?
Is your exam testing what you are trying to test?
2. Types of tests
There is no such thing as a perfect test. A test that proves to be ideal for one situation or
context may be quite useless for another. A testing technique that works great in one situation
can go entirely wrong in a different one.
Therefore, in the process of designing any test we must be aware that there is a basic
distinction among tests according to the purpose they serve. Different purposes require
different kinds of tests.
Task 1) Write different purposes you have in mind when designing an exam:
You probably wrote that you design exams to determine (test) how successful students have
been in achieving the objectives of a course of study. However, there are other purposes for
testing, for instance:
Stop and Think: What is a proficiency test? What is it for? What are some examples of
proficiency tests? Have you taken one?
Proficiency tests are designed to measure and determine people's ability in the target language
(not only English; proficiency tests exist for almost every language in the world). Something
important to observe is that this kind of test is applied to language users who are not necessarily
part of a language course or program. Therefore, it is not based on the specific objectives of a
language course.
Stop and Think: If it is not linked to a particular course, who takes this exam? What do
they take it for?
Proficiency tests are based on a specification of what candidates have to be able to do in the
language in order to be considered proficient. This type of exam measures whether a language
user is able to function successfully in the target language. For example, if a learner can pass the
Cambridge English Preliminary English Test, it shows that s/he has reached CEFR B1 level; if, on
the other hand, s/he can pass the Cambridge English First, it shows that s/he has reached CEFR B2.
Task 2)
Write at least 5 different proficiency tests that you know in the space below:
You probably wrote: TOEFL, IELTS, CAE, FCE, PET, KET, Trinity, TOEIC, etc. These exams are
proficiency tests, and even though they measure different levels of competency, they serve the
same purpose.
There is a second type of exams called: achievement tests.
This type of exam, in contrast to proficiency tests, is closely related to language courses;
its purpose is to establish how successful individual students, groups of students, or the courses
themselves have been in achieving objectives.
Stop and Think: There are two types of achievement tests: final achievement tests and progress
achievement tests. How are they different from each other?
On the one hand, we can simply say that final achievement tests are administered at the end
of a course of study, and they could be based on the syllabus, the book used during the course, or
the objectives of the course. Depending on what the test is based on, it has advantages and
disadvantages, which we will cover later when we talk about validity and reliability.
On the other hand, we can say that progress achievement tests are intended to measure the
progress that students are making over a certain period of time in a given course.
If you are like most teachers, you are likely to be responsible not only for preparing students to
take – and pass – the test, but also for being involved in the process of designing the exam. This is
one of the main reasons for including this unit in this course.
So far, we have discussed proficiency and achievement tests; it is time to explore what a
diagnostic test is. Let's complete a task:
Task 3) Answer the following questions in the space provided:
Diagnostic tests are used to identify learners' strengths and weaknesses; in other words, they
help us ascertain what learning still needs to take place. We might expect proficiency tests to
provide us with this information; however, an exam designed for this very specific purpose and
linked to a particular course will tell us exactly (hopefully) what students already know and don't
know. To do so, we need to create an exam with “enough representative samples”
of the target grammatical points and skills that we would like to measure.
Diagnostic exams are especially useful for individual instruction or self-instruction. This type of
exam provides us with a starting point when we are going to teach a private class to a student.
Placement tests, just like their name indicates, are intended to provide us with key information
that will help us place students at the stage (or in the part) of the teaching program that is most
appropriate to their abilities.
For example, in Centro de Estudio de Idiomas – Culiacan, the English program consists
of ten levels, and you are given the opportunity to take a placement test and advance up
to the sixth level.
This type of exam is typically used to assign students to classes at different levels; however, it
is recommended that every school design its own placement exam in order to get a more
accurate measure of the abilities of the candidates according to the courses offered by the
institution.
Task 4)
Answer the following questions:
Is there a placement test in the school where you work?
Have you ever taken a placement test?
Have you ever been invited to design a placement test?
Probably, due to the nature of the institution where you work, no placement exam is applied to
candidates; however, we encourage you to be aware of the importance of a placement test in
all institutions.
This first section of Unit 4 provided you with a basic understanding of the different types of
exams; however, there are a number of features to consider when designing an exam. Let's
analyze them in the coming section.
As we have discussed in this unit, there will be times when we will have to design our own
tests to measure our students' current level or to gather evidence of their learning.
Sometimes we overlook the features of a well-designed test, as we have “most of the time”
designed our own tests with “some” evidence of success, and it appears to be easy. However,
while writing a quick little test or quiz is easy, it is very difficult to write a good test.
Stop and Think: how much effort and time will you invest in designing an exam if you
know that the result of the exam is fundamental in deciding whether a person graduates from a
Bachelor's program?
I'm sure you agree that you will invest more time and effort in a test depending on how
important the result is to the student. For example, if you want to know whether a student knows
the parts of the house, you will design exercises like the ones below:
Option 1)
Instructions:
Option 2)
1. __________________
2. __________________
3. __________________
4. __________________
However, if you are going to use a test to decide whether someone will repeat a school year or
will be able to go to the university or not, the test will obviously need to be much better.
Task 5)
So, then, what makes a good test? Write your answers in the space below.
A good language test should measure students' performance without setting “traps” for
students. It should provide students with an opportunity to show their abilities in the target
language.
A good test should also let teachers know what students know and don't know about the
language at a certain point, and results should also be used as a way to evaluate the
effectiveness of the instruction, methods, materials, and the course itself.
As this is probably the first time you have encountered these concepts in your teaching career,
let's analyze each of them. Let's start with validity.
3.1 Validity
Validity is the extent to which a test measures what it is supposed to measure. There are
different forms of validity that will help you grasp this definition better.
3.1.1 Content Validity
This first form of evidence relates to the content of the test. According to Hughes, a test is said
to have content validity if its content constitutes a representative sample of the language skills,
structures, etc. with which it is meant to be concerned.
For example, if you are going to design a grammar test, it must be made up of items related to
the knowledge or control of grammar, and it should include the relevant structures. In order to
do so, the test designers need to make a list of all of the structures covered during a period of
time and design enough representative samples of those structures to test students'
knowledge of them.
When designing the exam, you will have to determine how many items to include to test those
topics, so that you gather an accurate measure of students' knowledge and control of those
grammatical points and can decide whether they have successfully mastered the simple present.
As you see in the example above, we didn't include (teach) the question form; therefore, you
CANNOT include it in the exam. If you test students on how to make questions in the simple
present using DO / DOES, you are affecting content validity. Therefore, your exam is not valid. If
you put it in the exam without teaching it, students will say:
Teacher, we don't know how to ask questions.
Teacher, we didn't cover that!
Teacher, I don't know how to answer that; we had never seen it!
Teacher, you said you weren't going to teach that and that it wouldn't be on the exam!
If you have been in a situation like this one, it was due to the lack of content validity in your
exam. Does it sound familiar to you?
I hope it does not.
In School X, the Academic Coordinator designs all the exams in the language school and
uses the syllabus (what teachers are expected to teach – and students to learn) to design
the exam for each of the levels (Basic, Intermediate, and Advanced).
The Academic Coordinator checks the syllabus and designs the exam considering
content validity: including enough representative samples to measure students' abilities
on ALL the topics, and gives teachers the photocopied exams for their groups.
The day of the examination arrives, and in the middle of the exam students start
complaining that the exam includes things they don't know how to answer and things
they were not taught.
Step 2) According to content validity, what do you think went wrong?
What probably went wrong is that the teacher affected content validity by not teaching what
he was supposed to teach (the syllabus), while the Coordinator expected him to have done so.
Teachers affect content validity every time they decide to skip, replace, or reorder what they
were supposed to teach. This is especially true when you are given an exam designed by school
administrators who do not know the decisions you made in the language classroom. School
administrators expect you to have taught everything in the syllabus (the list of things to teach);
they design the exam using that list of topics, and if you did not cover something on it, the result
will be an exam that is not valid in terms of content validity.
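For teachers who keep their syllabus in electronic form, this coverage check can even be automated. The sketch below (the topic and item names are invented for illustration) flags exam items that test untaught topics, which is exactly the content-validity problem described above, as well as taught topics the exam never samples:

```python
# Hypothetical data: topics actually taught, and the topic each exam item tests.
taught_topics = {
    "simple present affirmative",
    "simple present negative",
    "frequency adverbs",
}

exam_items = [
    {"id": 1, "topic": "simple present affirmative"},
    {"id": 2, "topic": "simple present negative"},
    {"id": 3, "topic": "simple present questions"},  # never taught!
]

def content_validity_report(taught, items):
    """Compare exam coverage against the taught syllabus."""
    tested = {item["topic"] for item in items}
    # Items on untaught material break content validity outright.
    untaught_but_tested = sorted(tested - taught)
    # Taught topics with no items weaken the exam's representativeness.
    taught_but_untested = sorted(taught - tested)
    return untaught_but_tested, taught_but_untested

untaught, untested = content_validity_report(taught_topics, exam_items)
print("Tested but never taught:", untaught)
print("Taught but never tested:", untested)
```

Here the report would flag the simple present questions item (tested but never taught) and the frequency adverbs topic (taught but never tested), the two ways an exam can drift away from the syllabus.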
Stop and Think: how many times have you affected content validity?
I think this first form of validity has opened your eyes to things you probably didn't consider
before when designing an exam. There are two other forms of validity: criterion-related and
face validity, and, just like content validity, they are important when designing an exam. Let's
analyze them.
3.1.2 Criterion-related Validity
This is the second form of evidence that a test is valid. This form of validity refers to the degree
to which results on the test agree with those provided by some independent assessment.
Concurrent validity is basically when you compare the results of the test with an
independent criterion. In theory, the students' test scores should tally with that criterion.
Predictive validity is when you use the test to predict future performance. An
example of this would be a placement test: you are testing to see how well a student
will perform in a particular class.
In simple words, if your test “proves” that your student is able to do something at a
certain level of competency, then, if tested by somebody else, we expect your student to get
pretty much the same result.
If you design an exam to measure your student's speaking ability, and your results state that
your student is a B2 (according to the CEFR), we expect your student to get the same result if
he takes a proficiency exam at that same level.
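Criterion-related validity is usually quantified as a correlation between scores on your test and scores on the independent criterion. As a rough illustration (the scores below are invented), a Pearson correlation close to 1 means the two measures agree:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented scores for five students: our classroom test vs. an
# independent external exam (the criterion).
our_test = [55, 62, 70, 81, 90]
external = [50, 60, 72, 78, 92]

print(round(pearson(our_test, external), 2))  # prints 0.99
```

In this made-up case the two rankings agree almost perfectly, which is the kind of evidence concurrent validity looks for; a low correlation would suggest the classroom test is measuring something different from the criterion.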
Stop and Think: imagine that you are asked to design a placement test for the school
where you work, and one hundred students are asked to take that exam in order to be
placed at the right level, and the results indicate the following:
After 3 classes, the teacher from the intermediate level calls you and says the following:
I'm not sure those 30 students should be in this level; they are not able to perform well at
this level. I think they should be in level 2 or 3, not in level 5.
Key questions:
What would you do?
What went wrong?
Is your placement exam well-designed?
If it turns out that the teacher in level 5 is right, and students are not placed at the right level,
then your exam lacks criterion-related validity, but it also lacks content validity, as you didn't
choose enough well-selected representative samples of those levels to place candidates correctly.
If the intermediate students are not well placed, what would you do with the advanced
students? Food for thought!
3.1.3 Face Validity
A test is said to have face validity when it looks like it measures what it is supposed to measure.
For example, if you design an exam to measure students' ability to pronounce some words but
students are not asked to pronounce those words, the exam lacks face validity.
A test with no face validity might not be accepted by candidates, as they will say that the exam
does not give them the opportunity to truly show that they are able to do what they
are being tested on. Face validity also deals with the format: whether the test looks formal to
students or not, as this might affect their “attitude” toward taking the test.
So far, we have discussed the issues related to validity as one of the features of a well-
designed exam, and you have explored different examples of how to make a test valid;
however, besides being valid, an exam needs to be reliable.
3.2 Reliability
Reliability is consistency, dependability, and trust. It could also be understood as
trustworthiness. What does it mean? It means that the results of a reliable test should be
consistent (remain stable); they should not be different when the test is used on different days.
It means that if you apply your test to group A on Monday, we expect to get the same results if
we were to apply the same test to the same group on Wednesday. Despite the fact that
there could be some “special” factors on Wednesday (a rainy day, students took another
test before yours, etc.), we still expect to have pretty much the same results. If you apply the
same exam to group B, we expect the same measurement.
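This consistency can also be estimated numerically from a single administration. One classic technique is split-half reliability: score the odd-numbered and even-numbered items separately, correlate the two half-scores across students, and adjust the result to full test length with the Spearman-Brown formula. A minimal sketch with made-up 0/1 item scores:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one row per student, one 0/1 entry per item."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    # Spearman-Brown correction: estimate reliability at full test length.
    return 2 * r / (1 + r)

# Invented 0/1 scores for six students on an eight-item quiz.
scores = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 2))  # prints 0.91
```

A value near 1 means the two halves rank students the same way, i.e., the test measures consistently; this is also why having more items (the first suggestion below) tends to raise reliability.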
But then, how can we make a test more reliable? Let's see some suggestions you can apply in
your future tests.
Suggestion #1: Take enough samples of behavior.
We cannot expect students to show their best performance in one single chance;
therefore, providing enough opportunities for students to demonstrate their abilities
will give us more reliable results.
Therefore, the more samples (testing items / exercises in the same exam) the better;
however, we must be careful not to make an exam so long that it becomes boring, tiring,
and demotivating for students to complete.
Hughes (1989:22) provides us with an example in his book Testing for Language
Teachers that is especially useful:
Suggestion #3: Write unambiguous items.
The clearer your items, the better. Candidates should not be presented with items
whose meaning is not clear. I remember one teacher asking the following:
Instructions:
To get an extra point, answer the following:
o What's my full name?
o When is my birthday?
o What's the address of the school?
I am not saying these questions are completely ambiguous, but I remember that nobody got
them right, and the teacher got offended, as we all agreed not to accept again an exam with
questions like the ones above.
Suggestion #4: Provide clear and explicit instructions.
Instructions:
Choose True when there is a false statement.
Choose False when the statement is true.
I didn't read the instructions; I just saw T and F and “assumed” that it was a “normal”
true-or-false activity, but the teacher had decided to change the instructions for some
“special” reason, and most of us failed the test because we didn't read the instructions.
Suggestion #5: Ensure that tests are well laid out and perfectly legible.
More often than not, schools are trying to save money on materials, and tests are
commonly printed on recycled paper or at a very small size to save space, or they are
poorly reproduced. Sometimes teachers design handwritten exams that are difficult to
read and sometimes misspelled.
Suggestion #6: Make candidates familiar with format and testing techniques.
Let me say it nicely: your exam is not the perfect place to “introduce” a new exercise.
If students are asked to complete an exercise in the exam that they are not familiar
with, they are likely to perform less well than they would otherwise.
Make sure you include items that students have practiced before during the regular
classes.
Note: you can find more suggestions in the book Testing for Language Teachers on page
58. Feel free to download the book from the platform.
To finish this section on validity and reliability, we encourage you not to forget
that gathering evidence of students' learning gives us the opportunity to provide
students not only with a grade but also with feedback that is useful for them to improve
and grow. Tests are one of the most useful tools we teachers have in our hands, and
designing them correctly will give us more opportunities to improve the process of
teaching and learning English in our language classrooms.
Let's go to the last section of this study unit, in which we will discuss different approaches to
testing; they will provide you with more ideas on how to make your exams better. Feel free to
stop for a minute and have some coffee. After you do so, come back and let's work together
on the last section.
In this last section, we will talk about approaches to test design, as they will help you
improve the quality of your exams, as we stated before. There are different approaches to
testing, but we will visit the following: direct vs. indirect testing, and discrete vs. integrative
testing.
Approaches to testing are different ways to measure what students are learning, and they help
us get more and better evidence of students' learning. Remember: the more evidence we
have, the better judgement we can make.
4.1 Direct and indirect testing
Indirect test items aim at testing the underlying knowledge and capabilities behind someone's
skill. They do not test the skill directly; you do not ask students to write something or to speak.
The TOEFL exam, for example, tests the knowledge and control of grammar and vocabulary
required to be able to read and write, e.g., at the college level, or what is “acceptable in formal
standard English” (Weir: 15).
Assessment tasks where students have to actually perform the skill we want to assess (speaking,
writing, reading, listening) take a direct approach to testing.
As you can see, this item takes a direct approach: the teacher is asking students to actually
WRITE, and the teacher can get a more accurate measure of students' ability to write in the
simple past.
In this second exercise, we assume that if the student can identify the sentences in the simple
past, then he is able to create his own sentences when asked to do so.
DIRECT: Instructions:
Listen to the audio, and as you listen classify the regular verbs in past according to
their ending sound.
INDIRECT: Instructions:
Read and classify the following list of regular verbs in past according to their ending
sound in the corresponding column.
Let´s see some other examples of direct testing:
DIRECT:
Instructions: (listening and speaking)
You will listen to a story about the last trip the Rodriguez family took.
As you listen, number the images as they appear in the story.
After you have organized them, discuss with the person next to you what the best title for
the story could be, choosing from A, B, or C.
As you can see, when you use a direct approach you are actually testing that students are ABLE
to perform the skill and use it to complete a task, and we get a more accurate measure
of their “real” level of proficiency in the target language.
Instructions:
Organize the following sentences:
o Park / go / I / always/ to / friends / with / the
Instructions:
Circle the best word to complete these sentences.
1. Jack (go / goes) to a film club on Wednesdays.
As you can see in these two examples, we ASSUME that if students are able to organize the
sentence (in the first exercise) and to choose the right word (in the second
exercise), then, when ASKED TO PERFORM, they will do it correctly; however, we just don't
know; we simply expect the student to truly be able to do it.
DIRECT:
Instruction: as you listen to the animal / sound put a mark on it.
INDIRECT:
Instruction: match the animal with the correct picture.
Instruction: choose the correct name of the animal (picture of a cow, and you choose: cow /
elephant).
Stop and Think: what approach are you using the most in your tests? Are you more
direct or more indirect? What implications are there based on your choices?
We do not mean to discourage you from using the indirect approach, as it provides
some evidence that we need to gather, especially with beginners; however, we invite you to
integrate the direct approach in order to help your students perform better in the target
language.
Let's now see the second distinction in approaches to testing: discrete and integrative.
4.2 Discrete and integrative
When we ask students to simply change one verb form for another, or to choose one word to
match the meaning given, we are taking a discrete approach to testing. We test language bits,
one item at a time.
A writing assessment task asks students to show that they can write in the language accurately,
using correct grammar and punctuation, and appropriate lexis. It also requires students to use
cohesive devices to connect ideas and to give the text a structure. Because of this, such a task
is said to take an integrative approach to testing (it integrates a number of language points).
As you can see, you are only testing that students are able to use the verbs in the simple past.
Therefore, we say that, as you are testing only one element of the language, you are using a
discrete approach to testing.
In the integrative approach, students need to bring more to complete the exercise: more
knowledge, more vocabulary, more grammatical structures; therefore, the exercise is more
challenging.
INTEGRATIVE:
Instructions: READ the following two paragraphs of the story of XXX and write an
ending for the story.
As you can see, in order to complete this activity the student will need to put in more effort,
use more vocabulary, etc. Clearly, this distinction is not unrelated to the one between indirect
and direct testing. Discrete-point tests will almost always be indirect, while integrative tests
will tend to be direct.
This unit has probably opened a new dimension in your language teaching career as we are
exploring the different elements involved in designing an exam.
In this unit we discussed the different types of exams (diagnostic, proficiency, placement, etc.),
and we talked about the importance of validity and its forms, as well as reliability. We went into
more depth by reading some suggestions on how to make our exams more reliable.
The last section of this unit is probably an opportunity for you to start designing exercises and
activities with a different approach and gathering more accurate evidence on what your
students are able or not to do in the target language.
The coming two units will cover in more depth the process of designing an exam, how to
test different language systems and skills, and a gallery of different testing items we can use.
Welcome to the end-of-unit task. Remember, this task should be uploaded to the UAS platform
before Sunday at 10:00 pm.
Instructions:
Step 1) Choose the last written exam you applied to your students.
Step 2) Identify at least 3 different types of exercises you applied.
Step 3) Analyze each of the items and describe if they are direct or indirect and discrete or
integrative. Make sure you are able to explain and accurately identify the type of approach used in
the item.
Step 4) In a word document include:
A picture of the item.
An analysis of the item (step 3)
Step 5) After each of the items write the answer to the following questions:
Is the approach you used the best / most suitable option?
What if you changed it a little? Would it be better for you and your students?
In terms of validity, is your item valid? You need to use the types of validity to justify your
answer.
In terms of reliability, is it reliable? You need to use the suggestions in order to justify your
answer.
Step 6) Submit your answers to Steps 1 to 4 in a Word document.
Step 7) Upload it before Sunday at 10:00 pm.
Discussion FORUM
Instructions:
Step 1) Before you work on this forum, make sure you have read the different concepts
discussed in the unit (types of tests, validity, reliability, direct vs indirect, discrete vs
integrative, etc.)
Step 3) Answer the questions in paragraph form; you don't need to write both the questions and
the answers. Make sure your ideas are clear and that they reflect your understanding of the
unit. Separate your ideas and write paragraphs. Due date: Tuesday.
Step 4) Read your partners' comments and ask questions about their understanding of the topic.
Due date: Thursday.
Step 6) Read the summary of the forum and react to it. Due date: Monday.