Language Testing
Heaton
Preface
Language testing – is there another way?, Brendan J. Carroll
Criteria for evaluation of tests of English as a Foreign Language, Alan Davies
The great reliability/validity trade-off: problems in assessing the productive skills, Nic Underhill
Examinations – why tolerate their paternalism?, Peter Fabian
The Cambridge Examinations – an exercise in public relations, W. G. Shephard
Proficiency testing for tertiary level study and training in Britain, Ian Seaton
Progress testing: preparation and analysis, C. S. Ward
Tennis plays Nha – or how to humanise tests, John Rogers
An alternative approach to testing grammatical competence, Pauline M. Rea
Dictation as a testing device, S. F. Whitaker
Listening comprehension: teaching through testing techniques, Penny Frantzis
Testing Spoken Language, Keith Morrow
Questioning some assumptions about cloze testing, Robert Keith Johnson
Getting information from advanced reading tests, Paul Nation
Writing in perspective: some comments on the testing and marking of written communication, Brian Heaton
Testing language with students of literature in ESL situations, J. P. Boyle
A place for visuals in language testing, Patricia L. McEldowney
Competence in English used for academic subject examinations, Arthur Godman
Evaluation, A. E. G. Pilliner
Measuring student achievement: some practical considerations, Frank Chaplen
Appendix: Research
A comment on specific variance versus global variance in certain EFL tests, John W. Oller Jr.
PREFACE
The publication of this collection of articles on language testing comes at a very opportune time, as recent developments in communicative language teaching are now resulting in a widespread re-appraisal of language tests and techniques. Not only have the many shortcomings of the structuralist and behaviourist approach to language teaching been strongly attacked, but the whole psychometric basis of language testing has been seriously questioned. Fierce arguments still rage over such established criteria as test validity and reliability. Consequently, it is now necessary to take stock of our present-day tests and examinations in English and to re-examine many of the basic premises so much cherished in the past. However, we should take care not to discard every well-tried and proven method of testing in our search for some magic formula or new technique which will enable us to solve our problems in the assessment of language used as communication. All too often in language testing, as in language teaching, there seems to be a tendency for many to jump on any current bandwagon, accepting half-formed theories and applying them hastily and uncritically. As most articles in this special issue of MET clearly demonstrate, considerable critical judgement is necessary in evaluating all the various types of tests and the principles on which they are based. It is always important to keep the best of established methods while, at the same time, seeking to develop where necessary new and more appropriate techniques to reflect the different emphases now being placed on language learning.

In the first article in the collection, Carroll points out that testing for any programme should be compatible with the ideas behind the teaching method used: hence communicative teaching programmes should be assessed by communicative tests. The main aspects of communicative tests treated by Carroll are: the test-curriculum relationship, a purposive test framework, test content and procedures, levels of performance and methods of analysing test data. Davies pursues this topic in discussing criteria for the evaluation of tests of EFL, and examines three kinds of validity with special reference to six examples of published tests in Britain. The question of validity and reliability is taken up again by Underhill in an article which identifies problems in assessing the productive skills of speaking and writing. Underhill uses the terms direct, semi-direct and indirect to classify tests of speaking and writing ranging from highly realistic tasks to unrealistic tasks. In the next article, Fabian reminds us that studying a language is vastly different from acquiring its communicative facility, and suggests that teachers can exert a beneficial influence on examining bodies by a more critical and creative participation in the design and construction of examinations.

However, the solutions to the many problems facing test constructors and administrators are neither as simple nor as straightforward as many might at first imagine. This is illustrated in an article by Shephard, who traces the development of the Cambridge EFL examinations from 1913, with their emphasis on the formal correctness of language use, literature and translation, up to a functional test currently under consideration, with its one-third oral component. Throughout the article, he shows the importance which the Cambridge Syndicate constantly attaches to public opinion, referring to questionnaires, correspondence, queries, and important public relations aspects of the Syndicate's work. Seaton also refers to the problems encountered in constructing examinations which will be administered on a large scale throughout the world. He describes the various difficulties met and overcome in the specification and design of the battery of tests set up and administered by the British Council and the University of Cambridge Local Examinations Syndicate.

Practical considerations in the use of progress tests by teachers and people directly concerned with the running of particular courses are touched upon by Ward in an article on the preparation and analysis of progress tests, while Rogers approaches this topic in a different way by providing several examples of ways in which test items can be devised so as to provide interest and even amusement on language courses.

It should be emphasised at this stage that a concentration solely on such aspects of communicative competence as authenticity, appropriacy and register is wrong if the testing of the grammatical system of the language is neglected and regarded as subordinate or inferior in any way. Language still consists of grammar, as Rea points out in her article on an alternative approach to the testing of grammatical competence within a model
of language learning. She gives examples of ways in which the selection and production of a language form is determined not only by its grammatical correctness but also by its function within a given communicative area.

While it is always important to experiment with new testing techniques, it is also essential to attempt to develop existing techniques much further. After falling into disrepute in the 1960's, the long-established dictation test has been the subject of considerable research during the past decade. Whitaker examines the flexibility of dictation as a testing device and the various language skills involved, giving numerous practical suggestions for making dictation a relevant and realistic test. In a similar way, Frantzis touches on the skills involved in understanding spoken English, describing in detail the use of a radio news bulletin for improving such skills in a systematic way.

After discussing problems experienced by teachers in constructing tests of spoken language, Morrow gives some suggestions for devising appropriate realistic tasks, recommending the testing of students in groups. He then gives a number of criteria for evaluating spoken language before providing an example of the way in which these [...]

[...] local/linguistic errors, and the implications of such a classification for the marking of compositions. He concludes by questioning the validity of composition tests which concentrate only on first drafts written within severe time constraints. This subsection is then concluded by Boyle with an article discussing various kinds of tests of language for students of literature. Examples of types of items testing reading, writing, listening and speaking are given in an attempt to foster an awareness among teachers of the relevance and educational values of the literature they teach.

Inter-dependence of verbal and non-verbal information both in the classroom and in the world outside suggests that visuals have a valuable role to play in language testing – provided that it can be ensured that candidates respond to language rather than find the meaning in any accompanying visuals. McEldowney shows how visuals compensate for fragmentary verbal comprehension questions on a text by helping to test objectively an awareness of the text content as a whole, without overt clues from the questions and eliminating the need for verbal production on the part of the candidate. McEldowney concludes by showing how visuals can also be used to test [...]
[...] programme, and how a student's final grade can be calculated and decisions taken when several teachers are involved in the process of assessment.

It is only fitting that this collection of articles should end with some brief comments by Oller, whose research has contributed so much to the development of language testing over the past decade. Although maintaining that there appears to be a large general factor of language performance in all the various tests studied, Oller states that there is no basis to conclude that a single test such as a dictation or a cloze test is the best way to measure that general factor. He concludes that a multiplicity of testing methods concentrating on the kinds of language tasks which language users will be expected to perform makes the best language test in any given set of circumstances.
ACKNOWLEDGEMENTS
We wish to express our gratitude to the following for allowing the inclusion of copyright material.
SEAMEO Regional Language Centre for permission to reprint R. Keith Johnson's article Questioning Some Assumptions About Cloze Testing, which originally appeared in Directions in Language Testing, edited by John A. S. Read, Singapore University Press, 1981; TESOL for permission to reprint Dr. John W. Oller's research notes A Comment on Specific Variance Versus Global Variance in Certain EFL Tests, appearing in TESOL Quarterly 14, 1980; the Joint Matriculation Board for permission to
use two essay-type questions taken from their Test in English (Overseas), March 1968 and July 1968.
Language testing – is there another way?
Brendan J. Carroll
THE COMMUNICATIVE APPROACH
Over the last three years, I have been taking part in an evaluation consultancy for a Middle East teaching programme in which new communicative teaching materials are being devised to replace the structural materials in use there for many years. It is interesting to look back on an early report of the evaluation team which commented on the programme as follows:

'The communicative approach stands or falls by the degree of real life, or at least life-like, communication that is achieved in the foreign language in the classroom situation. In our observations of classes in progress, it was not often that pupils communicated naturally and unselfconsciously with one another. Perhaps this is not surprising and should not be expected at such an early stage in the project. However, it is worth stating that pupil-initiated communication should be one of the project's key aims, and the necessary provision should be made in the materials and teaching methodology that it comes about. We feel that there is still a tendency for teachers to talk too much and, correspondingly, for the pupils to talk too little and that a major aim of the teacher training programme should be to correct this imbalance. We noted that the teachers whose classes were visited dominated the flow of communication and allowed too little communication by and between pupils.'

These words are no doubt true and are often repeated by observers of classroom practice in many subjects and in many countries.

A second area of comment by the evaluation team which bears on our problem is the important question of correctness of children's language performance. The team mentions the tension between fluency of language use and accuracy of language usage, recommending a more sensitive and tolerant attitude to student error. Language accuracy and language fluency, it is maintained, should rightly have varying priority at various points in the teaching/learning sequence and the final aim should be that the learners perform both accurately and fluently to certain agreed levels.

In the programme we are discussing, there was an obstacle to achieving these enlightened aims of greater pupil activity and a balanced attitude to correctness in that the tests being used to measure the children's progress tended to be traditional ones such as those focussing on the accurate mastery of lexical and structural items by individual pupils. We thus had the position in which the children were being taught, as far as was possible, by one approach – a 'communicative' one – and being tested in terms of another approach – a 'structural' one. In truth, there were at the time no suitable, properly constructed tests for the programme, and this case is just one example of the dilemma we face at present. To test the accuracy of a learner's knowledge of lexical and grammatical patterns is a very different matter from testing the degree to which he has acquired the language so
that he can use it in the communicative settings he is likely to face. The effectiveness of a person as a communicator will depend on a wide range of language and non-language skills, and an effective test will have to specify and assess them – a much more complex task than the assessment of his mastery of lexical and grammatical items, which can be much more easily pinned down and counted.

It was at one time widely believed that a person's language competence could be adequately tapped by requiring him to respond to a string of separate multiple-choice items, sometimes as many as 200 in one sitting. Later, it was hoped that the assessment needs could be met by presenting testees with the task of filling in randomly-selected spaces in a written or spoken text. We will examine more fully later the value of such techniques for testing language; all we will say now is: would that measuring communicative performance were so easily done! To approach the problem methodically, I would like now to examine major problems raised by the current structural-objective approach to language testing under five headings.

1. The counting of bits.
2. The four-skill model.
3. The place of correctness.
4. The role of purpose.
5. Justification by correlation.

Later on, I will put forward five considerations for broadening the approach to testing.

1. The counting of bits
If language performance is to be described by means of numbers, it would be most helpful if such performance could be broken down into small, discrete parts which could be easily judged as to correctness or incorrectness. Then the bits could be put together in a test to provide a numerical score, such as 35 out of 50. The tasks are unambiguous, the marking introduces no element of capriciousness and a person's final score is clear for all to see. Here is an example of such a test item:

Yesterday I (A. be B. were C. was) very tired.
(Instruction – choose the correct option in the [...]

[...] and objectively marked, possibly by mechanical scanning methods.

So far so good. But the testing problem has not yet been completely solved. For one thing, it is often very difficult to decide if a sentence is correct or incorrect without knowing the context in which it was said. There are, for instance, certain communities in which I be and I were are accepted forms. For another, an utterance can be quite flatly incorrect from any formal point of view, and yet be perfectly intelligible to the ordinary listener. The French speaker who says 'I have been in London since three days' is certainly incorrect in his English usage, but there will be little disagreement among his listeners about the length of time he has been in London. Much will depend on the amount of tolerance we are prepared to extend to such a speaker. Finally, the suitability of any utterance will be closely related to the relationship between the speakers: whether it is very close and informal, or distant and formal, and so on; and what would be quite proper in one circumstance could be quite offensive in another.

Who, then, was our sample sentence spoken by, and who was the listener? Who are the mysterious 'she's', 'he's' and 'John's' of these pseudo-utterances? Where are they? When was 'yesterday'? The plain fact is that, from the point of view of the tester, it just doesn't matter; these are not real utterances at all, but just vehicles for providing lexico-grammatical traps for the unwary.

All this is just to say that an independent, de-contextualised sentence is a very tenuous basis for making accurate judgements about a person's mastery of a language. It may well be that such snippets will allow us to examine certain limited details of language performance, but I believe that any adequate test must consciously encompass in its design wider strategies and purposes of language use. Adding up the separate bits of performance, however many, cannot tell us the whole story.

2. The four-skill model
One readily intelligible model of language description, and one widely used for many years, is that of the four skills of listening, speaking, reading and writing, certainly observable aspects of linguistic performance. It is tempting, therefore, to specify our test content in terms of these skills – [...]

[...] speaking and listening, between reading and writing, and so on. The total communicative
situation cannot be adequately described by these separate categories. Furthermore, any test task must elicit some response. Even an objective listening test requires the testee to record some response, verbal or graphic; a speaking test must be a response to some verbal or written instructions. To resort to 'objective' techniques of ticking alternatives must trivialise the interaction, debasing one side of what is an essentially double-sided [...]

[...] spelt out in terms of pronunciation, spelling, vocabulary and grammar – have also been deficient insofar as they rely on features of formal usage rather than on effective use, and thus can be faulted on the grounds of the criticism we have already made of discrete-item, objective-type tests.

3. The place of correctness
We mentioned at the beginning of the chapter the matter of the correctness of children's language performance. There are few more emotive educational topics. In the one camp are the purists for whom any mistake – in spelling, grammar, pronunciation, punctuation – is regarded as a personal affront. To them the learning process boils down to the rooting out of errors, accompanied by indignant letters to The Times: 'What is our education system (or our country) coming to?' they cry. In the other camp are the permissive ones who have little time for rules, and who see any attempt to insist on their observance to be an assault on the liberty of the individual and his right to free expression. Merely to describe these two extreme positions in this way is to imply that somewhere between them lies a sensible attitude to correctness; the ultimate aim is to produce students who can perform both accurately and fluently to certain agreed levels of performance, and within agreed levels of tolerance. It is one thing to differ about ultimate aims regarding accuracy; it is quite another to differ about emphases given to accuracy and fluency whilst pupils are still struggling towards those ends.

What features other than formal accuracy might one consider in building up assessment criteria? It seems to me that we can work at three levels, [...]

[...] message in given settings, through the strategies used in achieving this level of effect, to the linguistic minutiae of spelling and pronunciation through which the strategies are realised. This three-level hierarchy, arbitrary though it may be, can provide a framework for a comprehensive assessment of performance. Below are listed a number of factors subsumed under each of the three levels.

In one of our recently-framed writing assessment scales, a number of performance levels were spelt out in terms of this hierarchy. The macro-scopic features, referred to as Message, included clarity of presentation, coverage of points, validity of conclusion and the overall 'flow' of the work. The meta-scopic features, referred to as Text, included format and lay-out, coherence of theme, use of cohesive devices, appropriacy of style and neatness of appearance. And the micro-scopic features, referred to as Language, concerned the control of grammar, suitability of vocabulary, accuracy of spelling and intelligibility of handwriting.

Few would deny that such features as those listed are very pertinent to the measuring of language performance. To confine our assessment criteria to formal correctness would subtract greatly from the breadth and depth of our assessment. To
sum up on accuracy, then, we can say that it is an important criterion, but by itself cannot form the [...]

[...] high degree of negative correlation is shown by a coefficient of -1.0, and absence of correlation would be indicated by a coefficient in the region of 0.0. In practice, however, human traits and abilities tend to be related positively, that is, a person good at one task (say, Maths) tends to be good at another (say, Essay Writing), and many correlations fall in the area from +0.4 to +0.9 or [...]

[...] responses of the sample being tested. If there is little variance in one or more traits, then their inter-correlations are most likely to be small. Furthermore, even if a correlation is sizeable, it is extraordinarily difficult to trace the precise reason for its being so. It is often found, for example, that it
is similarities in testing method between the two tests rather than any relationship between the two traits which lie behind the correlation. Even with the use of factor analysis techniques, there are many ways in which a factor pattern can be explained. It is only recently that sophisticated methods of analysing factor patterns have been used in language studies (see Palmer, A. and Bachman, L., 1981). If these techniques are not used, unless every precaution is taken to identify sources of trait and method variance and unless discriminant as well as convergent features are taken into account, it is very easy to leap to unjustified conclusions from our correlationally handled data.

One particularly disconcerting assertion is one which says that such and such a correlation is 'a good one'. But how can we tell what is a good, or high, correlation and what is a poor, or low, one? Is, for example, a correlation of 0.60 high, medium or low? And what about 0.85, or 0.52? There are three basic ways of answering these questions:

One: to ascertain whether the coefficient is statistically significant and unlikely to be due to chance. A coefficient of 0.60 with 100 subjects would be highly significant in that the odds against it having occurred by chance are better than 100 to 1. Clearly, then, there is almost certainly something behind a correlation of this kind.

Two: to calculate the percentage of shared variance between the two variables. This is done by squaring the correlation, so that the 0.60 correlation indicates that 36% of the variance is shared. By this criterion, the correlation is looking decidedly shaky as we have now left unexplained no less than 64% of the variance.

Three: to estimate the forecasting efficiency indicated by the correlation. Using Kelley's coefficient of alienation, we find that a correlation of 0.60 has a forecasting efficiency of only 20%; that is, we are only 20% better off than we would have been if guided by pure chance. (Indeed, even a 0.90 correlation is only 56% better off than pure chance.) Our 0.60 correlation is now looking most decidedly chancy.

All this goes to show that it would be wise to treat apparently 'high' correlations with the greatest caution. A correlation is useful for comparative exploratory studies, but too flimsy a basis for absolute value assessments. And, if the individual correlations themselves are open to question, so much more so are the elaborate factor analyses based on them. This is not to say that producing correlational statistics and analyses is necessarily a fruitless operation; correlational approaches have their contribution to make in exploring the nature of linguistic-communicative transactions. By all means, let us use them whilst recognising their limitations, but let us not give up the more radical probing of the social, psychological and linguistic entities involved in inter-personal communication. There is a place for descriptive and observational studies as well as quantitative, experimental ones.

We have now discussed five problem areas associated with traditional language testing and reach the following conclusions:

(a) What we need to do is to look at the whole field of communication in devising our tests and not restrict our task to the counting up of easily-devised and easily-assessed bits of language performance.
(b) We need to give systematic consideration to purpose and strategy in our task design and not just rest with the simplistic framework of four de-contextualised skills.
(c) Our important assessment feature of accuracy, or 'correctness', must be put in the context of a range of broad and context-specific criteria.
(d) We need a direct study of the dynamics of linguistic-communicative behaviour and should not just rely on the circumstantial and post hoc evidence of correlational statistics.

The question now is – if a new emphasis is needed – what should it be, and will it produce workable tests? It is all very well to list the alleged defects of a current method of setting about things; it is also expected that the critic outline an alternative approach and show that it has the potential of producing better results. Is there another way?

THE COMMUNICATIVE ALTERNATIVE IN TESTING
The main aim of the new approach must be to widen the basis of our tests from a narrow grammatico/statistical focus towards a broader, multi-disciplinary and multi-level approach which can yet maintain essential features of measurement, always remembering that language testing is much too important to be left to grammarians and statisticians! The way we achieve this broadening, or opening up, is outlined below under five headings to help provide answers in the problem areas already discussed:

6. The test-curriculum relationship
7. A purposive test framework
8. The test content and procedures
9. Levels of performance
10. Methods of data analysis
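
The three checks on a correlation coefficient discussed under 'Justification by correlation' (significance, shared variance and forecasting efficiency) are simple arithmetic, and a reader may wish to try them on other values. What follows is only a minimal sketch in Python. The function names are hypothetical, and the significance check uses a normal approximation through Fisher's z-transform, which is an assumption of this sketch rather than a method named in the article.

```python
import math


def shared_variance(r):
    """Proportion of variance common to the two measures: r squared."""
    return r * r


def forecasting_efficiency(r):
    """1 - k, where k = sqrt(1 - r^2) is Kelley's coefficient of alienation."""
    return 1.0 - math.sqrt(1.0 - r * r)


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for r observed on n subjects.

    Assumption of this sketch: Fisher's z-transform with a normal
    approximation, not an exact t-test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for r in (0.52, 0.60, 0.85, 0.90):
        print(f"r={r:.2f}  shared variance={shared_variance(r):.0%}  "
              f"forecasting efficiency={forecasting_efficiency(r):.0%}")
    print(f"approximate p for r=0.60, n=100: {approx_two_sided_p(0.60, 100):.1e}")
```

For r = 0.60 the sketch gives 36% shared variance and 20% forecasting efficiency, and for r = 0.90 a forecasting efficiency of 56%, in line with the figures quoted in the text; the approximate p-value for 100 subjects falls well below 0.01, consistent with odds of better than 100 to 1.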
6. The test-curriculum relationship
The close relationship of tests with other elements of the curriculum has already been touched on. One way of ensuring their relevance to each other is to see that they both come from a common source, such as a specification of linguistic-communicative needs; thus the programmes of the curriculum aim to meet those needs, and the tests aim to indicate how far the needs are being satisfied. In this way, there is no 'general proficiency' test, as this title would imply lack of specification of the language skills actually needed. Nor would there be a separate 'achievement test' element, such as when a test springs exclusively from a syllabus, because the spelt-out needs are there for direct use in both syllabus and test design.

Without some spelling out of language-communication needs, and of related student aspirations, it is all too easy to base the language programmes on existing tests or examinations, which are usually described in the vaguest of terms, so that students spend their time either on past examination papers or on a course book designed to help students to pass the examination with the minimum of effort. A recently published book of tests, for example, contains strings of items (disguised here for security reasons) such as:

(a) Her mother died when she was young, so she was ____ by her aunt.
(4 options presented, the accepted one being 'brought up')
(b) The lion fell into the ____ set for it.
(c) When he ____ the age of 65, he will retire.
(d) I stood in a ____ for half an hour to get my theatre tickets.
(e) Roses are the ____ flowers in our garden.
(f) You could tell from his big feet that he ____ his father. (etc.)

The hapless examinee, had he not been conditioned to such a sequence, might well ask who these mysterious 'he's', 'I's' and 'she's' are, and how he can plausibly be expected to consider in one breath a dead mother, a retiring employee, an unfortunate lion, a slow-moving ticket queue, a rose garden and a boy with big feet. Apart from the inherent absurdity of this juxtaposition of topics, the testee has to struggle through masses of options which are either inappropriate or quite incorrect in the context of the sentence.

Nor is the suspension of disbelief much less in demand with those so-called integrative tests which ask the testee to insert suitable fillers in texts from which words have been eliminated at random. Although much in vogue at present, and justified on correlational grounds, such tests are only a little more credible than the separate-item test described above. Who, for example, would write a book, report or letter omitting every 7th word, expecting the omissions to be remedied by the unfortunate reader?

If we are keen to see that our tests, in their content and tasks, are meaningful in themselves and can be seen to reflect features of the settings in which the testee will operate, we will, I fear, have to look further than these barren, educationally-destructive techniques.

7. A purposive test framework
To meet learners' needs and to ensure a benign relationship between the test and other elements of the curriculum, a justifiable approach is to make a clear and detailed statement of the purposes and settings of language use, and of the skills and functions to be called on, and from this statement to generate the language, content and tasks which a comprehensive test will have to encompass. All these design operations must be carried out without losing sight of the specified purposes for learning and using the language.

The framework for approaching our test design may be adapted from such models as those of the Council of Europe (see Van Ek, 1975) and J. Munby (see Munby, J., 1978), to name but two. One of the problems of these models is that they are extraordinarily detailed and lengthy, and it is difficult to fit even a fraction of the specified elements into a test of anything like an acceptable length. Thus, a decision has to be made at an early stage in test design as to how accurate and how delicate, or detailed, the instrument must be. If we required a quick decision for rating students in broad categories for programme placement purposes, if any misplacements caused by the tests could be remedied easily and if the applicants had a wide range of purposes for learning the language, then a fairly short general test of competence could well fill the bill. If, however, the decisions to be made are very refined ones, if the chances of remedying them are small, and if the applicants' language needs are very job-specific, then a detailed needs specification, careful test development and an allocation of time and resources for test application will be called for.

It will, therefore, be for the organisers of the testing operation to decide what scale of delicacy, what tolerance of mistakes and what resources shall be devoted to that operation. The object is to devise tests which will give the necessary answers more economically.
To assist in making a reasonably exhaustive test needs analysis, we have prepared working papers under the following headings which we illustrate by excerpts from an actual specification for a test in English for Computer Programmers.
of diagrams/tables/graphs/ into speech or
39 Distinguishing the main idea from supporting details by differentiating: the whole from its parts, fact from opinion, and a proposition from its argument, (and so on for this and the other events).
for each medium is specified. In this case, a mini-
ary, or possible, to provide in any one test a complete coverage of all the specified features but, having selected the main events to be replicated in the test, we will be able to include key features implied by these events.
Test items themselves will fall broadly into three main categories:
Open-ended items for which the testee is allowed a fair measure of latitude in carrying out the task, his performance being assessed on a graded scale probably with accompanying actual samples of different levels of performance.
Closed-ended items, where the testee selects from a given set of responses the one he considers the most appropriate, from a 'Yes-No' dichotomy to anything up to five possible options.
to expect when awarding grades. Although basically subjective, this examining procedure is surprisingly reliable and has a good backwash effect on the piano-playing of the pupils. To attempt to improve reliability by breaking up a piece of music into small countable bits would, of course, lead to all sorts of musical absurdity, as the unit of performance is the piano piece, which must be judged as an artistic whole.
In language assessment, probably the most widely-used scale has been that of the American Foreign Service Institute (FSI) oral interview, which has proved to result in reliable, consistent judgements whilst retaining some of the naturalness of oral interaction. In the same tradition, we have devised scales of all sorts of communicative
activity described at several levels according to the main parameters of level, activity and descriptor. The levels range from 1 to 9. The activities are, in effect, important communication focusses as described in our category of event/activity above, and can be based on the 'four skills' categorisation if appropriate. The descriptor consists of a label (e.g. 'non-writer' through to 'expert writer') followed by a brief thumbnail sketch of the critical features of a typical performance at that level, which can be elaborated in terms of up to a dozen performance criteria for detailed design purposes. The scales are usually accompanied by photostats or tape recordings of performances judged to be representative of each level. For objective, scored tests, conversion tables are provided to convert raw scores into levels on the 1 to 9 band system.
Initial figures for reliability and validity for such tests compare very favourably with those for current widely-used language tests and, more important, authenticity and testee motivation are much more in evidence.

10. Methods of data analysis
A perennial problem in measuring human behaviour is how to describe the target behaviour unequivocally, and to assess accurately how far an individual reaches, or falls short of, that level. The demands are for valid, systematic description, for accurate observation of responses, and for reliable measurement and data analysis procedures. In the main, language-communication tests have suffered on all three counts, description having too often been vague or based on limited, linguistic, features of performance, observation methods being such as to destroy the very communication processes under scrutiny, and data handling being focussed on counting the bits with analysis by correlational means. Thus the measuring devices have tended to demean the very processes they were trying to measure causing the results, however precise in appearance, to be of very little significance.
In the absence of direct measurement based on carefully spelt out, relevant criterion behaviour, much reliance has been placed on sampling and probability theory, the scores obtained from selected samples of performers being arranged in order.
Crucial importance is given to measures of central tendency (mean, median, mode) and dispersion (standard deviation, range, quartile and mean deviations). Norms of performance are established on internal criteria, a 'good' score being expressed in terms of standard deviations above the mean, a poor by standard deviations below the mean; or by deciles or centiles. Thus the most accurate information obtained is about where in the population any individual lies: in the top 10%, or at the 50% point, and so on. What are so often lacking in such approaches are detailed descriptions of test content and of the exact nature of a 'good' and a 'bad' performance. Briefly, the basis of performance assessment has been the relative performance of individuals in a sample rather than on links with pre-determined behavioural criteria. We thus have the constant danger of vague and circular reasoning behind our testing.
It would be wrong, however, to decry these 'norm-referenced' techniques because, as methods of exploration, they can give us many initial insights into behavioural problems. For instance, some years ago it began to be suspected that cigarette smoking and lung cancer were related. What doctors wanted to discover was the actual causes behind the observed correlations of cancer and smoking. So there came a time when correlational, circumstantial evidence had to be reinforced by more precise, and direct, techniques for describing, observing and measuring the related phenomena. In short, it is crucial that we move as soon as possible to a more authoritative test basis and do not remain shackled in the permanent relativity of norm-referencing.

CASE STUDIES IN COMMUNICATIVE TESTING
There are now interesting developments in the direction of communicative testing in various parts of the world, and I will mention some which I have been connected with. Others are no doubt being reported on as I write.
The Royal Society of Arts examinations in the communicative use of English. (RSA London, 1980).
The English Language Testing Service set up by the British Council and the University of Cambridge Local Examinations Syndicate, (descriptive Handbook in preparation, 1981).
The 'Crescent' English Course and the Mid-East projects associated with it. (O'Neill, T. and Snow, P., Oxford University Press, 1979 onwards).
British Council Course in Testing, testing for communicative programmes, Course 040 (report in preparation by the British Council, 1981).
The Pergamon English Tests in both general and specific areas of English, (in production by Pergamon Press, Oxford starting in 1981).
It is suggested that the above reports be referred to and, when early versions of the tests are released
from security, the actual tests be studied. It would be a rash person who would claim to have resolved the incompatibilities between the terms 'test' and 'communication', but at least we can claim to have made genuine efforts to produce tests which contain identifiable communicative features.

Pertinent references
Canale, M. and Swain, M. (1980), 'Theoretical bases of communication approaches to second language teaching and testing', Applied Linguistics 1.1.
Carroll, B.J. (1978), An English language testing service: specifications, The British Council, London.
Carroll, B.J. (1980), Testing Communicative Performance, Pergamon Press, Oxford.
Carroll, B.J. (1980), Communicative tests for communicative programmes. Paper given at TESOL Convention, Detroit, 1981.
Fishman, J. and Cooper, R.L. (1978), 'The socio-linguistic foundations of language testing'. In Spolsky, B. (Ed.), Advances in language testing series No. 2, Washington D.C. Center for Applied Linguistics.
Morrow, K.E. (1977), Techniques of evaluation for a notional syllabus, Centre for Applied Language Studies, University of Reading. Study commissioned by the Royal Society of Arts.
Munby, J.L. (1978), Communicative Syllabus Design, Cambridge University Press.
Palmer, A.S. and Bachman, L.F. (1980), Basic concerns in Test validation. Paper for RELC Seminar of April 1980 (to appear in proceedings). More recent developments of this project were reported in TESOL, '81, Detroit.
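The norm-referenced measures described under 'Methods of data analysis' can be illustrated with a minimal Python sketch. The scores and the function names are hypothetical and are not taken from any of the tests mentioned in this article. Both figures place a candidate relative to the other members of the sample only; neither says what the candidate can actually do in the language, which is the limitation objected to above.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Express each raw score in standard deviations above or below the sample mean."""
    centre = mean(scores)
    spread = pstdev(scores)  # standard deviation of this sample (must not be zero)
    return [(s - centre) / spread for s in scores]


def percentile_rank(scores, score):
    """Percentage of the sample scoring below `score`, with tied scores counted as half."""
    below = sum(1 for s in scores if s < score)
    tied = sum(1 for s in scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(scores)


# Hypothetical raw scores for six candidates (illustrative only).
sample = [35, 42, 50, 50, 58, 65]

for raw, z in zip(sample, z_scores(sample)):
    print(f"raw={raw:3d}  z={z:+.2f}  centile={percentile_rank(sample, raw):.0f}")
```

A criterion-referenced report would replace these relative figures with a band level and its descriptor, as in the 1 to 9 scales described earlier.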
Alan Davies
In this paper I want to make some general comments about test choice and test construction, and then to discuss six examples of (British) published tests referring particularly to use in an English as a Foreign Language testing situation.
I plan to begin by making a series of very large but, I hope, gnomic generalisations. Even if nothing else in this paper sticks, I hope that one or two of these statements will.
hold up internally, but, as we shall see, what is
and more optimistically, the test can be used as a check on the guesswork. I do, of course, admit that in the sense in which I have used guesswork, language proficiency tests are also guesswork; but it does make sense, I suggest, for new courses, syllabuses, etc. to be piloted (which they are) and tested (which they are only rarely). Testing and teaching have the same interests if not the same purposes.
4. Criterion and norm referenced tests are the
series of criteria outside the system. Of course a
The first question to ask in language testing is always for what? There are degrees of query, from the most lay (I want to know how much English they know) through the more refined (How good is their spoken English? Have they mastered that communicative language syllabus?) to the precise (Does she/he have adequate English to work as a secretary in a travel agency? What are the main errors that need remedying in this unit of the syllabus?). Note that the precision is more apparent than real, in that the content of a test is necessarily based on the earlier sampling of the syllabus or other universe of discourse. The test specifications need to be as precise as possible in language content validity, and they must attempt some kind of external validation. Thereafter, the question is properly circular, and the test exists as its own criterion, viz. Do they have enough English for this test?
I want to propose that we use three kinds of validity in our evaluation of EFL tests. They are as follows:
Validity 1 is concerned with the internal logic of the system, making use of levels or grades, each one dependent on and interlocking with the others, such that the use of one element implies all the others (as in, for example, a printed circuit). It is no doubt the case that there is more at stake than harmony or use in setting up a system of levels; there is also the political, publicity, financial aspect in the sense of capturing an audience, as in
course of textbooks, or a series of volumes on a
they would be found out. Such institutions have everything in their favour: the reputation, the contents, the money (and staff) for research and development and also the acumen to be at the forefront of new ideas (although on the whole testing is fairly conservative). It is probably the case that test development is much more a matter of administration than of ideas. If you or I have those ideas, we'd be much better off selling them expensively to one of the prestigious institutions I have mentioned so often. This makes it really very difficult for an organisation which is only partly (or locally) prestigious to see its way forward. Without full government authority (as in state education systems, which tend to be very nationalistic) such an attempt has no chance of success unless it can cleverly gain some sort of tie up or sub-contract with one of the major, internationally known prestigious institutions. The West African Examinations Council did this, as have others, with the Cambridge Syndicate; and it is interesting to see a more recent tie up between the Oxford Examinations in EFL (Oxford Delegacy of Local Examinations) and ARELS (of which more later). However good an examination may be produced, it will just not gain equivalences, acceptance, credibility unless it has such backing. There may be one exception to this, and that is by making use of Validity 3, though I am sceptical even here.
Validity 3: the examples of EFL tests I will mention are all examinations rather than tests, and
while these terms tend to be used quite often the one for the other, it is in Validity 3 that they differ. Validity 3 is the appeal to some external criterion, i.e. the establishing of validity statistically through concurrent or predictive validity. I am unaware of this being done, normally, for EFL examinations; it is, however, done as a matter of course for EFL tests. There are various reasons: examinations tend to be less clearly directed at a given population; they change more frequently; they are less objective (in terms of items); and
and college exams, these can all be made to work. Although it is time consuming to do the work, it is essential because, if new tests are needed, then presumably they are doing something better than the old test, and what they are doing better is predicting the criterion that one has selected. The alternative is to fall back on Validity 2, and say I know it's better and I'm an expert. This is certainly respectable, but I want to maintain that it is not enough. It is, perhaps, a little better to provide verbal descriptions of performance at various cut-off scores (e.g. students can ) but once again what we look for is some evidence that justifies the cut off, the test and the levels of which they are part.
So what we look for in our tests of EFL are these three validities. (Incidentally we also look for indications of reliability, and I can say now that no indication is given: the word is not mentioned.)
So back to a 'typical' EFL situation and against such a situation I want to consider five tests of EFL, viz.:
'The tests are operational in nature, i.e. they are intended to measure whether or not the candidates can do certain things in English. The 'things' they are asked to do are specified at each level and represent authentic tasks of the sort which confront language users in real life.'
'Authenticity is thus of major importance. It affects not only the tasks which the candidates are asked to perform ('Is this the sort of task which a real person in real life might want to do?) but also the type of texts which the candidates use in the Reading and Listening tests and produce in the
Writing and Oral tests ('Is this a real English text being used for a relevant purpose?').'
Content Areas What are the general areas in which candidates taking this exam
yet since this is a new examination. However, we are not expected to see the examination as relevant to the typical EFL situation since, at the beginning we are told: 'the language and skills tested are clearly and explicitly those in in-Britain English'. This is fine if you intend to visit Britain in the near future, but I am less sure of using that as the criterion. It is interesting to observe the tendency in recent years to make life in Britain, interaction among native-speakers or at least with them, the focus of attention in language teaching materials, again perhaps one of the unexpected (and not altogether welcome) results of the communicative movement.
the First Certificate in English (FCE) containing Reading Comprehension, Composition, Use of English, Listening Comprehension, and Interview;
the Certificate of Proficiency (CPE) in English
'Although only Proficiency and the higher examination for the Diploma of English Studies have any official recognition or equivalence with the GCE or foreign counterparts, the main demand is for the lower level examination and in its own right rather than as a preparatory stage for Proficiency as it was designed in 1939. The attraction seems to be its simplified yet adult character and its attempt at an international flavour and freedom from over-literary emphasis.'
'Proficiency (is) recognised as a quite demanding test of general literacy and maturity.'
How do the Cambridge examinations measure up to our Validation? It looks, in spite of disclaimers, as though Validity 1 is there: we are told that there is no connection between FCE and CPE and that very few people go in for the Diploma, but the system exists, and although FCE is not officially required for CPE, its parameters will certainly be used as informal measures when assessing a potential CPE entrant. (It is interesting that Cambridge think too many people enter for CPE who shouldn't. Perhaps an intermediate level between FCE and CPE is needed, and a lower Preliminary examination in English is now under discussion.)
What of Validity 2? The PI backing is certainly there. What of content validity? I am less sure here. Little indication is given, and we need to look at past examination papers, or trust the Cambridge judgement. But I am not too happy about being told that this is 'an accurate, valid and comprehensive test' without being told why and how.
Validity 3: lacking again. As I have said before, I think there is less excuse for Cambridge precisely because they are the leaders and the most PI. Perhaps they do validate in this way, and I should be grateful to be pointed to those statistics.

3. Trinity College, London. Grade Examinations in Spoken English as a Foreign or Second Language and Written English
The following quotations have been extracted from the Trinity College information booklet:
'The examinations are recognised by the British Council as comprising a useful series of graded tests in oral communications ability.'
'This syllabus has been compiled to meet the needs of children and adults learning English as a Foreign or Second language. It reflects the theory and spirit of modern foreign language teaching.'
'The principal aim is to find out how well the student understands 'educated' spoken English within the limits of each Grade, and how well he or she can speak it. Importance is attached to the candidate's pronunciation, readiness, fluency, and comprehension, and to the appropriateness and grammatical accuracy of the English used, but above all to the ease with which the candidate can communicate by means of English.'
'Progress is made by small steps from a very elementary level of achievement (Grade 1) to a very advanced one (Grade 12) the grades are best regarded as milestones along the road towards an advanced oral command of the language.'
'Particularly in the lower grades the syllabus is formed chiefly on a sequence of basic English structures and usages. Because they are basic they are important to communication by means of English.'
As we might expect from a College of Music there is very clever use of the internal system of interlocking grades, 12 in all. 'The grades are regarded as milestones along the road towards an advanced oral command of the language'.
So 'yes' to Validity 1, the most valid (in this sense) of all our EFL examinations.
As to Validity 2, Trinity College is less of a PI, but still well-known, and so it scores reasonably there. Does the test have content validity? It claims that it reflects the theory and spirit of modern foreign language teaching, and the syllabus information for each grade is the most detailed of all five examinations. The problem I wonder about is whether the syllabus is really a spoken English syllabus or a written one. 'The principal aim', we are told, 'is to find out how well the student understands educated spoken English within the limits of each Grade, and how well he or she can speak it'. Now there is a dilemma here. For native speakers, there are substantial differences between the spoken and the written language: what the communicative language teaching mission has attempted is to extend this difference to EFL speakers. I am not sure (and I suspect others are not) that this is a proper or feasible aim for TEFL; it may be that we should accept for EFL spoken English a limited goal which is written English spoken aloud. This is what (I think) Trinity College is doing, and I wish it would say so.
Again there is no evidence of Validity 3.

4. Oxford Examinations in EFL (the Oxford Delegacy of Local Examinations) (quotations from Oxford Delegacy information sheet)
This is another new examination.
'The examination is principally concerned with assessing performance in a very practical way by using test items from among the reading and writing tasks candidates might be expected to have to perform whilst in England (sic.) The level of the exam is below that of the Cambridge FCE.'
'There are two papers:
1. practical writing skills
2. practical reading skills
3. ARELS preliminary Oral Examination (optional)
Validity 1 can be claimed only by the attempts to equate the test to other measures, e.g. 'the level
of the examination is below that of the Cambridge First Certificate' and again 'the Delegates' English Committee is happy to recommend (the Preliminary Oral Examination of the ARELS Examinations Trust) as a counterpart to its own written examination.
Otherwise there is no internal system.
Validity 2: the PI backing is there. As for con-
There are three levels, Preliminary, Certificate and Diploma, 'designed specifically as a reliable means of assessing ability in the use and comprehension of Spoken English'.
The ARELS examination has a lot going for it. It has Validity 1 (three interlocking levels), and it has Validity 2: PI backing and content validity, which is certainly approved of by teachers and is not limited to Britain or, as Oxford quaintly put it, England: so it is available for the typical EFL situation. I must enter a caveat here though, which is that the cultural requirements of some parts of the ARELS examination make it quite difficult for someone who is not actually in daily contact with native speakers in their own environment.
Validity 3: if there is none, it is not for want of trying, since I did myself agree to look into the statistical validation of the ARELS examinations and I hope to get round to this soon. In the meantime I offer my apologies and note that the ARELS Examinations Trust are aware (unlike the others we have discussed) of the need for Validity 3.

6. The PALSO Examinations (Panhellenic Association of Foreign Language School Owners)
Last year the PALSO organisation carried out its own trial examinations at three levels, Basic, Standard and Higher, so there was the attempt to achieve Validity 1. The other validities were more difficult: for reasons I discussed earlier, PALSO is not a PI: if it wishes its examination to gain equi-
for life in Britain, for communicative interaction, or is it meant more generally (with the implications of perhaps a Trinity College-like syllabus)? And does it provide the validities I have listed, ways of helping us to evaluate and choose a test for our own use?
I have not mentioned other examinations like the Joint Matriculation Board, the new Associated Examining Board, the Stages of Attainment Scale from the English Language Teaching Development Unit, the English Speaking Board, the Regent's School Test and the new English Language Testing Service of the British Council (interesting, I suspect, because it represents a move away from the previous test to a more examination-like instrument, more comparable therefore to the measures I have spoken of today).
There is a paradox in the what for question. A test must be related to local demands, but it must also have wider currency: the best test manages to serve both ends, the local and the global or international.
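Validity 3, as used in this paper, rests on a statistical comparison between a test and an external criterion. A minimal Python sketch of one such check follows, assuming that the same candidates have sat both a new examination and an established criterion measure. The grades are invented for illustration, and the Pearson coefficient is only one of several statistics that could serve; nothing here is drawn from the examining bodies discussed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical grades for five candidates on a new examination and on an
# external criterion measure taken at about the same time (illustrative only).
new_exam = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]

r = pearson_r(new_exam, criterion)
print(f"concurrent validity coefficient r = {r:.2f}")
```

For predictive validity the same calculation would apply, with the criterion scores collected at a later date, for example end-of-course results.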
Nic Underhill
The two principal criteria for evaluating any kind of test are reliability (whether it gives consistent results) and validity (whether it measures what you think it does).
The main problem with tests of speaking and writing may be stated simply: high reliability and high validity are seemingly incompatible. The situation is complicated by the existence of several different kinds of validity, some theoretical and intuitive and others empirical and quantifiable. As a result, what may be valid for one school of thought may not be for another.
If you are of the 'onward march of science' frame of mind, you may be convinced that it's just a matter of time before the ultimately reliable and valid productive test appears. If you believe that real language use only occurs in creative communication between two or more parties with genuine reasons for communicating, then you may accept that the trade-off between reliability and validity is unavoidable. Testing is an inherently artificial situation; the question is, how artificial can it be, and yet still be considered valid?
This article outlines some of the attempts to resolve the reliability/validity trade-off, and considers the influence of a third criterion, practicality. It examines the chronological development of productive testing and the test types in vogue at each stage. For the sake of brevity, examples of test items are kept to a minimum: all the standard works on testing contain numerous examples (i).
First, in the interests of successful communication, some definitions are in order.

WHAT MAKES A PRODUCTIVE SKILLS TEST DIFFERENT?
Considering for the moment only the oral interview and written composition, the following characteristics distinguish tests of productive skills:
(a) they are integrative tests
(b) they are, on the whole, subjectively scored
(c) there are serious doubts about their reliability
(d) they are, or can be, direct and pretty realistic measures of performance in real-life situations; therefore
(e) they have high face/content validity.
So how do they contrast with other tests? Consider this item from a multiple-choice test:
'The opposite of strong is 1. short 2. poor 3. weak 4. good'
Compared with the characteristics listed above,
(a) this is a discrete-item test: it aims to test
- only one component of language (vocabulary)
- through only one skill (reading)
- and one aspect of that skill (receptive recognition)
An oral interview, on the other hand, requires the testee to listen (receptive) and speak (productive), using many components of language: grammar, vocabulary, pronunciation, stress, all at the level of discourse rather than the single word or sentence.
(b) this item is objectively scored: the teacher/tester does not need to exercise his judgement in marking it; the decision has already been taken for him about the correct answer. In tests of productive skills, by contrast, he may have detailed guidelines to help him assess the testee's performance, but ultimately he must use his judgement to decide on the value of a particular response. (It is worth noting that of the three stages common to language tests, viz.
1. the compilation/construction of the questions
2. the answering of the questions by the testee
3. the marking/scoring of the testee's answer,
only in 3. can the objective/subjective distinction be maintained; all tests are 'subjectively' compiled and 'subjectively' answered) (ii).
(c) although any particular question must be tried out in practice, this type of item is generally considered highly reliable: from one administration to the next, without intervening tuition, the same testee will answer the question in the same way and the marker will mark in the same way.
In an oral interview or written composition, there are three principal sources of unreliability:
1. the testee may produce different answers to the same task from one day to the next: he may be feeling uncommunicative, morose, uncomfortable, deaf, antagonistic to a particular interviewer or composition topic, or just lacking in the confidence necessary to produce connected self-expression.
2. a single marker may score a particular response differently from one day to the next (for similar reasons!) (a problem of intra-marker reliability)
3. two or more markers may give different scores to the same response (a problem of inter-marker reliability).
(d) the multiple-choice item is a thoroughly unrealistic measure of language performance. It does not reflect actual language use: there is no real-life situation in which we go around asking or answering multiple choice questions. Productive skills tests are not necessarily ultra-realistic (few adults have the inclination or the incentive to produce written compositions, and the atmosphere in an oral interview can be notoriously strained) but they should nonetheless be more
means of taped, visual or written cues (e.g. guided or picture composition, gap-filling, re-ordering exercises, etc.); an indirect test requires no writing at all.
(e) because of its lack of realism, the discrete-item, for the majority of students and teachers, has low face validity: it is decontextualised, non-creative and generally doesn't look like a valid test of language ability. While criticisms can certainly be made of specific productive tests in this respect, the option is open to the tester to make the testing situation as realistic as he has time and money to spare. The more effort is expended in this direction, the better the testee will respond by treating it as a valid task.
raised about the reliability of composition as a testing device from the point of view of the testee
the task and the marker. Few would argue that their performance on a creative writing task will vary more from day to day than their performance on a multiple-choice test; or that a particular composition subject will suit some people better than others, irrespective of their writing ability. From the examiner's point of view, a partial solution is to offer a choice of composition topics; but this compounds the problem of the comparability of student responses.
In the 1930s and 40s, when TEFL scarcely existed as an independent profession and took its methodology from first- and foreign-language teaching in schools, examiners of English schoolchildren agonised for many years over how to reduce these sources of unreliability, and their conclusions laid the foundations of the composition marking schemes in use today.

1. Standardisation of marking schemes
Two principal marking schemes were used: the method of general impression and the analytic method. The first is exactly as it sounds: the
listing a number of sub-scales and specifying the weight to be given to each in totting up the overall score. For example, the FSI Oral Interview scales are Accent, Grammar, Vocabulary, Fluency and Comprehension, weighted approximately in the ratio 1:9:6:3:6 respectively.
Within the analytic method, there are again different systems used to assign marks within each category: (i) Impression; (ii) Additive (giving one mark for each of a number of pre-selected features); (iii) Subtractive ('from a sub-total of 10 for mechanics, subtract ½ mark for each error of spelling or punctuation'); and (iv) Marking Protocol, which is a pre-defined scale of levels, e.g. for Vocabulary:
10 points: no errors
8 points: occasional misuse, but expression hardly impaired
6 points: fairly frequent misuse, which may limit full expression
4 points: limited vocab. and frequent errors clearly hinder expression
2 points: vocab. so limited and so frequently misused that reader must often rely on own interpretation
0 points: limitations so extreme as to make comprehension impossible.
Such marking schemes are in wide use today; they are difficult to construct and often hard to interpret, but many assessors feel happier with some sort of written protocol for assessing both oral and written work. In 1949, the battle lines were already drawn up:
'Among teachers of English, a constant battle is waged between supporters of analytic marking, and those who believe wholeheartedly in general impression it should be noted that the analytic schemes were born out of a realisation of the general unreliability of essay marking, and some schemes have gone to extraordinary lengths to achieve 'objectivity' and thus consistency', (viii)
Already an association, if not an equation, was being made between reliability and objectivity. Although neither method was proven in experiments to be clearly superior, the analytic method tended to be more popular, partly because of the appearance of greater objectivity. To counter this, it was recommended that three or four markers,
As a corollary of standardising the marking method, the principle was established that regular meetings should be held to brief assessors on the marking scheme, and by 'test-marking' sample compositions and comparing results, to standardise the way all the markers applied that scheme.

3. Increase the number of questions
(This is a well-established way of improving the reliability of any kind of test, within the limits of fatigue and endurance.) For composition, it was suggested that two or three shorter pieces give a more reliable indication of the testee's performance than one long one.

4. 'Task-realism'
The importance of task-realism was another notion to be discussed long before it became fashionable in the field of TEFL. Hartog said:
'In real life, a person does not just 'write'. He writes for a given audience and with a given object in view, which may be to explain, to persuade, to give an order or indeed fulfil any other purpose or combination of purposes.' (ix)
He carried out an investigation to compare performance on 'Directed' versus 'Undirected' essay
subjects; for example, one of the Directed essay subjects for English
schoolchildren was:

'Describe a school speech day at which you have been present as if you were
writing to a boy or girl who has been prevented by illness from being
present.'

compared with the Undirected essay topic 'A School Speech Day'. (Note that,
for the purposes of this paper, I am not drawing a distinction between 'essay'
and 'composition', as some authors have done.) (x) Hartog concluded:

'The majority of examiners were decidedly of the opinion that a Directed essay
subject yielded an essay of better quality than the corresponding Undirected
essay subject, and that it could be marked with greater confidence.' (xi)

(ii) COMPOSITION AS THE PERPETRATION OF INJUSTICE
With the advent of the audio-lingual methodology in the fifties and sixties,
there were a number of important changes in the teaching and testing of the
productive skills. Principally:

1. Speech was considered primary; the aural/oral skills became the main
objective of language teaching.
2. Language was learnt by habit formation, mainly through repetitive oral
practice. Written work, of a rigorously controlled nature, was permitted only
after the patterns had been properly established. This was because of 3 below.
3. Making mistakes set up wrong patterns and could lead to the formation of
wrong habits. As far as possible, materials were constructed to reduce the
chances of error, by careful sequencing of structures and adequate practice of
each structure before moving on to the next.
4. Not only was language proficiency composed

syllabus, and which is an open invitation to the testee to commit all kinds of
errors. The interview fared slightly better by virtue of being oral; but as
unstructured elicitation procedures, neither was acceptable to
audio-lingualism.

Both could be made more acceptable by asking a pre-determined series of
questions designed to elicit one-sentence answers containing specific
structures or functions. This technique of structured interview and guided
composition is in widespread use today and has the advantage of eliciting
comparable speech samples from each testee while still being productive; but
in the process, of course, the exercise becomes progressively less life-like,
ending up with a string of unconnected, decontextualised stimulus-response
questions.

Another big disadvantage in audio-lingual terms is that the unstructured
interview and unguided composition are not amenable to the objective-test
format, such as multiple choice, where only one correct answer is possible.
This was another incentive to restrict the creativity involved in production
tests; written gap-filling or sentence transformation, and simple oral
question-and-answer could be marked objectively and thus reliably.

There was certainly an awareness of what was being lost in the process:

'... often we have to choose between more apparent validity but less
objectivity and more objectivity but less apparent validity' (xii)

but reliability was, and still is, considered to be logically prior to
validity. Following the assumption of the discrete nature of language skills,
numerous tests were devised to assess particular elements of language via oral
production (pronunciation, grammar, vocabulary, intonation, stress, etc.);
Valette (1977) gives many good examples of this genre. Because these tests
were composed of unrelated questions with a single correct answer, e.g.

'The man who flies an aeroplane is a ______'
test. In order to maintain the same content validity, it had to be shown that
the items tested were in some sense representative of the testee's overall
proficiency; a lot of hot air was generated trying to establish adequate
sampling techniques before it was realised that this task was a linguistic as
well as a statistical impossibility.

Written composition became an outcast – one

'Tests of composition are necessarily unreliable and of doubtful validity.
Since, however, it is important that composition should be taught, and since
if not examined, it may not be taught, it should be included in a language
examination.' (xiii)

Others were more extreme in their rejection of composition:

'... attention should be drawn to the consensus that injustices are
perpetrated every time an essay is set at an examination ... it is widely
recognised by linguists that an essay is not an adequate test of knowledge of
the language. If the student is cunning, he will avoid constructions he is not
sure of and create situations in which he can use his pet phrases.' (xiv)

This, surely, is the essence of communicative ability: to make the best use of
those language elements and structures which one does command and to avoid
those which are likely to be communicatively ineffective. Some people do this
better than others; they are better communicators. See the cunning student
twist and turn to avoid the third conditional!

Inevitably, there was a resurrection of the arguments about the best way to
score such tests; Rivers, along with many others, was in the analytic camp:

'an overall intuitive grade for written composition can be seriously
influenced by neatness and clear writing. The grade should be a composite one
...' (xvi)

'It is impossible to obtain any high degree of reliability by dispensing with
the subjective element and attempting to score on an 'objective' basis.'
(xvii)

During the seventies, the realisation spread that language could not be
divorced from its contexts-of-use for purposes of teaching and testing. New
tests of productive skills were consciously task-oriented, aiming for the best
possible face/content validity; for example, letter writing became a popular
choice for written composition, and situational role-plays for oral
interviews.

Experiments to find ways of improving the reliability of the more direct and
realistic tests came up with the same results as before (increase the number
of markers, hold standardisation sessions, use an analytic scheme) and the
reliability claimed for some productive tests is high by any standards.
However, such tests are expensive and time-consuming to administer and mark.
This criterion of practicality is especially important if you want to
construct a test battery to be given to large numbers of testees all over the
world and then marked as economically as possible. The search was on to find a
valid and reliable method of testing productive skills accurately without the
practical disadvantages of highly realistic tests. Two possible solutions were
offered: semi-
limit, there will only be one correct answer. There is an immediate gain for
reliability; with only one possible answer, all the marker has to do is to
decide whether that answer has in fact been given or not. At the same time,
the testee is deprived of the opportunity to be creative or to display his
communicative proficiency in a realistic manner. However, there are many
real-life situations in which we do hold a natural conversation about a
visual, recorded or written stimulus; and such a conversation in an oral
interview can be a lot more realistic than a so-called direct speaking test in
which the interviewer is controlling and structuring the conversation so as to
elicit particular structures or functions.

Direct and semi-direct tests should be regarded as being on a continuum from
the most realistic to the least; and the position of a particular test on this
scale can only be determined by an intuitive examination of the test itself.
Consider the following description of an oral test:

'The examinee is presented with four pictures differing significantly on one
or two conceptual dimensions. These may represent, for example, a person
performing four different actions, or the four conjunctive possibilities of a
man with or without a hat walking up or down a staircase.
The examinee is instructed to provide a single sentence description to a
visually remote audience of one picture which is randomly selected from the
set.' (xviii)

The audience (i.e. the examiner) then decides which picture he thinks is being
described, and compares this with the instruction given to the examinee. It is
genuinely productive, arguably communicative and highly reliable. How valid
would you consider it?

(b) Indirect tests
An enormous amount of research has taken place, mostly in America, into
constructing and using easily administered but indirect tests. For example, a
lot of work has been done to promote cloze tests, both conventional and in
many variations, as tests of global proficiency, including speaking and
writing skills.

The new Test of English for International Communications (TOEIC) is entirely
receptive; it consists of two hundred multiple-choice items, half reading
comprehension and half listening comprehension. But the Educational Testing
Service felt able, by means of concurrent validity correlations, to interpret
the scores in terms of speaking and writing ability:

'The correlation between the TOEIC listening part score and the direct
Language Proficiency Interview is 0.83. This high degree of correlation would
seem to indicate that the TOEIC part score is a good predictor of the
candidates' abilities to speak English'

and again

'The direct writing measures correlated 0.83 with the TOEIC reading part
score. This high correlation suggests that the TOEIC reading score is a good
indication of the examinee's ability to write in English' (xix)

This use of concurrent validity studies to justify the interpretation of
indirect tests of productive skills has become common. What are the arguments
for and against this procedure?

FOR
(i) All tests are artificial situations. The testee is under a strain to
perform well, and the interviewer/marker is under pressure to make the best
assessment in the short time available. Neither is acting naturally. In these
terms, there is little to choose between direct and indirect tests.
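
One conventional way to read correlations such as the 0.83 figures quoted above
is to square them, which gives the proportion of variance the two sets of
scores have in common. The sketch below is an illustration only. The function
names and the sample scores are hypothetical, and the r-squared reading is a
standard statistical convention, not something the TOEIC report or this paper
proposes. The arithmetic does not settle how such a figure should be
interpreted.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


def shared_variance(r):
    """Proportion of variance two measures have in common (r squared)."""
    return r * r


# Hypothetical scores for five testees (invented data, not from any study).
indirect = [40, 55, 60, 72, 90]  # e.g. a multiple-choice listening score
direct = [2, 3, 3, 4, 5]         # e.g. an interview rating

r = pearson_r(indirect, direct)
print(f"r = {r:.2f}, shared variance = {shared_variance(r):.2f}")

# The figure quoted for TOEIC: 0.83 squared is about 0.69, so roughly 31%
# of the variance in the direct measure is not accounted for.
print(f"{shared_variance(0.83):.2f}")  # prints 0.69
```

On this reading a correlation of 0.83 leaves about a third of the variation in
the direct measure unexplained by the indirect one; whether that is "good
prediction" remains a matter of judgement.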
direct test used to validate the indirect measure is open to question in terms
of either its reliability or validity, then the concurrent validity
correlation is meaningless.

(iii) Although the nature of the correlation coefficient itself forms the
basis of all numerical reliability and validity calculations, the
interpretation of correlations is far more complex (and subjective!) than the
calculations themselves. Especially in the field of language testing, a
person's interpretation of a set of statistics may depend entirely on the
assumptions of his particular theoretical viewpoint; the statistics have no
inherent meaning other than as a purely mathematical relationship between two
sets of numbers.

CONCLUSION
The only possible conclusion to be drawn on the virtues and problems of
different methods of assessing the productive skills is that what kind of test
you use should be determined pragmatically by the purpose for which the test
is to be used, the resources you have available for construction,
administration and marking, and what you intuitively feel will have the
highest face/content validity

for using direct tests of productive skills. Secondly, the question of
availability of resources (time, people and money) may exert a significant
influence on the choice of tests.

The last word has been reserved for a researcher who, apparently under
pressure from all the concurrent validity studies giving the kiss of death to
direct productive tests, felt it necessary to preface her findings on
improving the reliability of oral interviews with the words:

'... it has generally been recognised that the best way to test for oral
proficiency is to have a subject speak.' (xx)

(i) e.g. Lado (1961), Heaton (1975), Valette (1977), Oller (1979).
(ii) see Pilliner in Davies, A. (ed.) Language Testing Symposium, Oxford
University Press.
(iii) Clark, J.L.D. (1979) in Briere, E.J. & Hinofotis, F.B. (1979) Concepts
in Language Testing: Some Recent Studies, TESOL.
(iv) U.S. Foreign Service Institute Oral Interview.
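
Two of the reliability devices described in this article lend themselves to a
small worked example: the weighted analytic composite (the FSI ratio
1:9:6:3:6) and the lengthening of a test or pooling of markers. The sketch
below is an illustration under stated assumptions, not a description of how
the FSI or any examining board actually computes its scores. It assumes every
sub-scale is rated on the same numeric scale, and it uses the Spearman-Brown
formula, a standard psychometric result which the article does not name and
which further assumes that the added pieces or markers are comparable to the
original. The ratings and the starting reliability of 0.6 are invented.

```python
# Weights in the approximate ratio quoted for the FSI Oral Interview scales.
FSI_WEIGHTS = {
    "accent": 1,
    "grammar": 9,
    "vocabulary": 6,
    "fluency": 3,
    "comprehension": 6,
}


def weighted_score(ratings, weights=FSI_WEIGHTS):
    """Weighted mean of sub-scale ratings that all share one numeric scale."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


def spearman_brown(reliability, factor):
    """Predicted reliability when a test is made `factor` times as long
    (or marked by `factor` comparable markers instead of one)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * reliability / (1 + (factor - 1) * reliability)


# Hypothetical ratings on an assumed 0-5 scale for one testee.
ratings = {"accent": 2, "grammar": 4, "vocabulary": 3,
           "fluency": 3, "comprehension": 4}

# Weighted: (2*1 + 4*9 + 3*6 + 3*3 + 4*6) / 25 = 89 / 25
print(f"{weighted_score(ratings):.2f}")            # prints 3.56
# Unweighted mean of the same ratings, for comparison: 16 / 5
print(f"{sum(ratings.values()) / len(ratings):.2f}")  # prints 3.20

# Three short compositions instead of one, each with assumed reliability 0.6.
print(f"{spearman_brown(0.6, 3):.2f}")             # prints 0.82
```

The weak Accent rating barely moves the weighted total because Accent carries
one twenty-fifth of the weight, which is the practical effect of the ratio. The
Spearman-Brown figure shows the kind of gain that "two or three shorter pieces"
or "three or four markers" could be expected to bring under these assumptions.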
Peter Fabian
An examination is a formal and unnatural ritual. In theory, it sets out to
gather information about a person and passes it on to interested parties. It
is to all intents and purposes a dialogue between Supplier and Consumer. The
suppliers are those who, in response to a need, create examination structures
and strategies: the Examining Boards and their fellow-conspirators (i.e.
teachers, schools,

attainment as well as potential in further skill development. They thrive on
faith. So long as the consumers believe that an examination is a mirror of
something, it can withstand the grumbles and bloody-minded 'objections in
principle' that abound these days. But when examinations cease to be a
communal act, in other words when some of those involved opt out, as they will
and do, then examinations begin to deteriorate, to become distorted and
eventually irrelevant except to those who get their living out of them,
whether they be examiners or schools. In our society, obsessed as it is with
qualifications and specialisation, even the worst examinations (and there are
many) manage to survive long after they have ceased to be firmly rooted in the
community's needs. This happens because those who have resigned their active,
creative and critical participation do unfortunately continue to give their
unquestioning confidence to an examination as a qualification; mostly they no
longer know anything about it: how it is run, what it contains, how it is
weighted, validated, what it is really for. With blind but dangerous faith
they surrender their vital role to distant and increasingly isolated
'experts'. Only when they find themselves at the receiving end of inadequate
applicants for a job, do they discover how dubious the syllabus must have been
that led to such a qualification: then they complain that they aren't getting
the people they need. It is more tragic when the foreign employer, say,
discovers no inadequacies because he has to trust the examination implicitly.

A community gets the examinations it deserves. If it fails to take an
interest, to be vigilant about what the examining boards are up to, it will
soon find that someone else has taken over: a vacuum is

unfair competition, the conduct of examinations and their 'remoteness' to real
life. Such are the frustrated grumbles of children and they have about as much
impact. Whether we like it or not, examinations are here to stay. Their
rejection for the time being in revolutionary or reactionary situations
(times of social upheaval) does not alter this a bit. They always come back
again. No one has come up with a really workable alternative way of measuring
attainment on which the selector must depend to get round pegs into round
holes.

Parents, by and large, accept meekly, if not exactly with reverence, the
requirements of GCE. Perhaps they do not realise, or do not want to, how much
their children have to sacrifice to the annual ritual, and what a huge chunk
of a never-to-be-repeated educational experience is affected, even destroyed.
There is a reluctance to cross swords with experts, though it is abundantly
clear that experts all too often suffer from the constrictions of their own
expertise. But they do often have an awesome weapon: professional jargon and
impressive woolliness.

Yet that jargon is often surprisingly imprecise, even in language teaching.
Take the words 'examining' and 'testing' which are so often used as if they
were synonymous and interchangeable. How many people are aware that they are
in fact very different in kind and have different aims?
Tests are almost entirely diagnostic; they belong in the classroom and are the
teacher's most constantly used tool. They help to close gaps, clarify,
prescribe individual treatment, and build confidence. You can act on tests.
They do not shape a syllabus but draw on existing techniques with the
clear-cut purpose of edging and nudging the learning process along some
controlled path. In examinations we have nothing so constructive because they
are final. Based on a controlled system of spot-checking, they are more
superficial and global. Examinations tell us if a student has succeeded rather
than why he has failed. It is often very difficult to get specific diagnostic
information from examinations.

Who Improves What?
As we have learned from bitter experience, teachers, directors of studies and
principals do not as a rule take an active part in engineering the structure
or contents of examinations. They are content to be passive and to follow a
leader. They may moan, of course, about what examinations contain and how they
interfere with what they want to do, but on the whole they accept what is
proffered and are reluctant to interfere. If for some reason, worthy or
otherwise, an examination is considered prestigious, impressively mounted, and
professionally validated, that seems good enough for most of them. The
American TOEFL test is universally admired in the States, not for what it does
to teaching but for its technical and administrative gloss. Cambridge
examinations are generally supported because 'everybody wants them'. Much of
the teaching syllabus takes its lead both from them and from their numerous
textbook satellites because it is convenient, not because the learning
strategies they prescribe are realistic or evenly balanced.

The view that a school is the servant of a string of such examinations is
deplorable. A school simply cannot opt out of its special responsibility for
this link in the Learning-Teaching-Testing-Examining chain. It owes that much
to its students, whose faith lies in the entire teaching-testing system. Nor
can a school ignore the simple fact that examinations influence to an
unacceptable and unreasonable degree what is being taught on its premises and
why.

The result of this easy-going relationship, this friendly conspiracy between
examining boards and schools, is that in the end examining boards are
practically all-powerful. They invade the classroom and saturate the staffroom
without a murmur of protest. And for all the pretence to the contrary (schools
are jealous of their independent role), the indifference of schools, of
consumers generally, has isolated the boards more and more. Since they are not
perpetually exposed to the sort of criticism which they are bound to note or
act on, that isolation extends far beyond educational establishments so that
finally the boards go one way while the rest of us go another. Only when there
is wholesale disenchantment amongst thousands of students does the whole
problem come to light. Many British schoolchildren do not feel motivated to
learn foreign languages because their courses and examinations seem remote
from their real needs. The extremely costly exercise of foreign language
teaching (despite gimmicks like language laboratories and the sparse
contingents of 'assistants') is largely a waste of money; and this is due
mostly to that friendly conspiracy between professionals.

Unworkable Progress
Until 1965, the emphasis in language teaching was on Writing and Reading, on
definitive grammar even within the 'direct method', and on what in the case of
the native tongue might be called 'language study and definition in
retrospect'. It tended to pull in the same direction as the treatment of the
native tongue. The reason why we all put up with this lop-sided prescription
was that it was easier and more economical to handle. If examinations are best
conducted in Writing and Reading, then the preparatory work is bound to take
on the same slant. One supposes that if the driving test could be carried out
by answering questions from an examination paper, driving schools would sell
their cars, sack their instructors and revert to the classroom for driving
instructions.

The pretence that the major part of communication was on paper, as it were,
was bound to lead to the neglect of oral work. Apart from not being required
for examinations, teachers were often not too hot in speaking and
understanding the language they were teaching. The miserable appendages
purporting to examine speaking and understanding in EFL commanded little
respect. They were, and continue to be, tolerated because the school follows
the leader, and the leader found it too complicated and too expensive to make
the urgently needed improvements. The examining boards which enjoy the
greatest respect and prestige today know full well that their oral
'interviews' are negligible in comparison with their written sections. But
instead of improving them, they simply down-grade their status in the
examination. In the act, however, they also down-grade the
associated learning-teaching activities in schools. Cambridge, for example,
still awards only 25% of the total weighting to speaking and understanding;
and that in a situation and at a time when oral skills are most urgently
sought after throughout the world.

Such examples from the past protrude into the present; they illustrate the
power and influence wielded by examining boards. The need, therefore, for
upgrading oral/audial skills to the same level of thorough examining as
written and reading skills must be painfully obvious to anyone. But so long as
examining boards fail to give the lead here, all attempts to infuse systematic
training will remain sporadic, localised and superficial.

There are those today, and I am one of them, who suggest that oral/audial
skills are really the central and determining aspect of language acquisition
and that no one should be put in a position of having to decide whether or not
to 'study' a language until they are orally and audially proficient up to a
point.

Fads and fashions like Programmed Learning or Skinnerian lab drilling have
come and gone. Since 1965, the classroom has accepted speaking as the real
goal: a spontaneous development and for once independent from examinations;
but it has remained unreinforced by the disciplines of a recognisable
end-objective. The effects of the shift have thus been abortive to some extent
because, after all, the pressures and traditions of orthodox examination-based
language teaching proved too great.

The Fourth Skill: Aurality?
In all this hullabaloo, the fourth skill, listening comprehension, was the
Cinderella. The spontaneous Speaking Revolution has itself down-graded it.
Neither 'Threshold' nor 'Waystage' has given it much attention in depth. Few
schools have done much about installing such things as Listening Libraries to
counter-balance the Reading Library. While ad hoc syllabuses in speaking have
been devised, the entire problem of listening comprehension has remained
largely unfertilised, unresearched, and neglected. Such aspects of listening
comprehension practice as audial drilling, ear conditioning and aptitude
testing, exposure to accents and dialects, intuitive comprehension and
guesswork are still in their infancy. People do not talk about them:
linguisticians write each other the odd paper which explains why there is
still no universally accepted word to replace 'aural', which is in effect
indistinguishable phonemically from 'oral' and

'audial' springs from an urgent need and was introduced by the architects of
the Arels Oral examinations. Mentioning this small point may look pedantic but
in fact it is symptomatic of an attitude.

A Lever – Not Just an Exam
Arels Oral examinations, introduced in 1967, are an attempt to rock the
foundations of orthodox academia-bound language learning. As examinations are
powerful, the attempt had to be made through them. But it should not be
thought that they were introduced merely to offer yet another examination. The
role of the Arels Oral is still widely misunderstood. Of course, the field was
wide open for reform; oral testing had not yet progressed far, and any
improvement was a purpose in itself. But it was wrong, and still is wrong, to
say that these are no more than specialised 'extras' which you can decide to
use or ignore. Or that in the context of English in Britain the student can
pick and choose because it really does not matter which examinations he takes;
it has to be bluntly stated that schools claiming to teach English in Britain
should very seriously consider making these examinations their central theme.
Preparation for most other examinations can be had at home; the advantages of
taking them in this country are marginal. But not so with the ARELS
examinations. Practical and sustained exposure to the language, which is then
systematised and monitored in a school, can only be achieved in the country of
the language and in the context of a multi-national student body.

The ACEs are a manifesto, a statement of practical objectives which are today
indispensable. True, the examination makes additional demands on a school. An
oral/audial examination cannot be textbook-bound in the way written
examinations are. It can only give a lead; through its past examination
scripts and tapes it can say: 'Here are samples of all the oral and audial
skills we test, NOT because we find them easy or convenient to test but
because you cannot perform in practice without them. Now you go and elaborate
on these; draw on your own daily experience and observations in how we use the
native tongue; use our past papers as models and to remind you of what must
not be forgotten, and the syllabus will look after itself.' Is that asking too
much? Yet, any school claiming to take advantage of English in Britain which
fails to cover the Arels Oral range may actually be guilty of gently and
innocently misleading its students. And that is putting it
An Oral Century

venient when you can ring your business partner in Sydney. International
conferences are increasingly conducted in English and that means a smooth
command of social English as well; if the asymmetric language system is
adopted in the EEC, the only languages for listening comprehension will
probably be French and English. Negotiations, social contacts are both
drifting fast into English, while the sophisticated hardware technology from
tape recorders to video and internationalised TV is helping to shift the
emphasis further away still from writing and reading. A school which ignores
these trends is in danger of getting out of touch, while its brochure may be
promising by vague implication what it is ill-equipped to do.

It is not a matter of forcing examinations down students' throats: far from
it. The Arels examinations are a mere by-product designed to set off a new and
systematic approach to the two neglected skills, just as the Cambridge
examinations are admirably designed to initiate the 'study' of English, which
points them elegantly in the ESP direction.

No more than a fraction of students coming to Britain have the slightest wish
or desire to 'study' English. Twenty years as Principal have taught me this.
It is a modern tragedy that they are so often encouraged to think that
studying language is synonymous with acquiring communicative skills.

Pendulum of Fashion
Nor is it a question of a pendulum going this way and that. A pendulum only
exists where there is uncertainty of purpose. To suggest as some do these days
that 'we have gone too far in oral directions' is nonsense. In fact, we have
not begun to bring oral and audial work up to the lavish levels

we are all of us capable of practising a dozen languages at least. This seems
to be supported by the millions of bi- and tri-linguists who acquired their
skills by accident or circumstance, not because they were gifted or in love
with language. They speak these languages naturally in their multi-lingual
environment, and the only inhibition that exists comes from nationalistic
pressures. It is likely, therefore, that the vast majority of language
learners are not students so much as people naturally receptive to the right
environmental treatment.

Democracy in Examinations
Arguably the most democratic virtue of an examination concept like the Arels
Oral lies in the fact that it aims truly to reflect current usage: it has to
do this. Written language remains static for several decades. Oral language
changes quickly. English especially adapts constantly to the ever-changing
flow of influence across the English-speaking world: not by decree of ossified
grammarians but by popular inclination. Usage precedes acceptance. The pedant
and the linguistician may deplore this as a concession to fashion or
vulgarity. But, then, pedantry is always a little ridiculous in oral language
which springs from spontaneity and from the heart. Examinations must respond
swiftly to these changes and if they do, then language will never be consigned
to the museum. But they will only adapt if everyone who is involved in the
process of language acquisition, whatever it may be, participates in making
examinations come close and stay close to the reality of the day.
W G Shephard
The Cambridge Examinations – an exercise in public relations
Examinations stultify and constrict, emasculate these appear more humane and practical than
the new and perpetuate the old, and appeal to the august authority is expected to be.
lowest motives on the part of all concerned. Every For Cambridge, the specifically linguistic prob-
teacher is sufficiently convinced of the efficiency lems involved in the absorbing and regurgitating
of his methods, and every student sufficiently full of knowledge came up early, through the expansion
of faith that the standard reached will guarantee of activity in what are now called the ESL areas.
him the ability to function in the job of his choice, The Cambridge School Certificate examination
for examinations not to be necessary. All this is survived in these areas both the UK transition to
believed, and repeatedly asserted. Yet entries for the GCE in 1951 and in many cases the far-reaching
the Cambridge EFL examinations are increasing at political changes, and is still one of the Syndicate's
an average rate of nearly 5% on already large num- major activities. A specifically EFL commitment
bers (80,000 plus annually since 1979) in over began in 1913, with the introduction of the Certif-
sixty countries. Their central position as a target icate of Proficiency in response to the demand for
and a definition of standards can be seen in any a qualification for non-native-speaker teachers of
school or college brochure or publisher's catalogue. English.
The organisers of the examinations, in Cambridge In 1935 this examination was first set in
and at the 400 local centres, are in a unique pos- December as well as in June. In 1937 it was recog-
ition to observe day-to-day and year-to year the nised by the University of Cambridge as an English
interplay of attitudes, assumptions and desires in qualification for matriculation purposes and by
the fields of teaching and testing. Some details of Oxford in the following year. The Lower Certificate
this experience, and particularly of a large-scale was also introduced in response to demand in
consultation with the centres just concluded, may 1939. The more appropriate title of First Certifi-
be of interest, therefore, read in association with cate was adopted in 1975, when both examinations
the discussion elsewhere of basic aims and models appeared in a totally revised form following a long
in testing. period of research and consultation.
The Cambridge examinations have reflected the Long-standing collaboration with the British
fluctuations in views ofwhat a relevant and effect- Council was formalised in 1941 by the establish-
ive test in a foreign language should be since ment of the Joint Committee of the Syndicate and
extremely primitive times. The Local Examinations the British Council. The Diploma of English
Syndicate was one of the first two examination Studies was introduced in 1945 at the request of
boards established in Britain, based autonomously the Council. The Executive Committee for these
on what Victorian social and educational conser- examinations includes representatives of a wide
vation termed 'the universities' and providing range of interested bodies; institutions of further
quality control of the range of schools existing education, ARELS, universities, the British Council
before compulsory education came in. General and, when possible, visiting overseas represent-
education, geared to the needs of the world's most atives.
highly-industrialised state in respect of operatives The early pattern for EFL testing was merely
and was tested in line with contemporary
officials, to carry over the skills thought appropriate for the
notions of content and method, and had its first-language or second-language aspirant towards
Dickensian moments. A properly conducted local 'educated native speaker' status. The expression of
examination, leading to a Cambridge certificate, proper sentiments with formal correctness and the
began with the ceremonial arrival by train in a study of literature were the main features, with
given town of a solemn don in charge of a black translation as the revolutionary 'element of func-
box. A little of this aura clings even to today's vast tional relevance'. The Syndicate is still sometimes
operation, coming out in awestruck responses to involved today in consultation with education
suggestions or instructions, particularly where officials from countries who are applying the same
axioms in their developing local examining systems. Very gradually came the move to the present system of a five-paper series of written and spoken tests, covering productive and receptive skills and as internationalised and functional as it can be made by teacher participation in syllabus design, marking and test setting.
Translation and literature remained integral parts of the syllabus until 1975, with a quasi-compulsory status not much affected by widening the range of alternatives. Translation was heavily favoured [...] produced the first suggestions that a semi-objective, well-designed Use of English type paper should be the core of the Proficiency examination rather than one among a mixture of ESP alternatives. In a striking contrast of tempo highly revealing of the dynamism of the EFL teaching field, a further revision is now planned only five years after the introduction of the current syllabus in 1975. The inspiration for this came from a straightforward administrative need to take general opinion on the feasibility of conducting listening comprehension tests on recorded material. As well as this issue (which has proved inconclusive, in view of the teachers' equal distrust of examiners and machinery), the Syndicate's enquiry invited comments on all aspects of the standard and content of the examination. It is interesting to note some of these in the context of the historical development of the Syndicate's examinations and of discussion elsewhere in this collection of articles.
The teachers, nearly 250 of whom replied from 24 countries, have endorsed the move away from culture and background testing, and shown approval for a very large proportion of the objective and semi-objective elements introduced. The old 'guessing game' criticisms of multiple-choice batteries are now as tiny a minority as the pleas for a return of prescribed reading, with its characteristic teaching pattern, to an integral place in the scheme of the examination. Other minority comments argue equally pressingly backward to dictation, or forward to video. Information retrieval exercises on a variety of visual stimuli are suggested, together with criticisms of every such exercise so far set and pleas to 'keep the examinations Cambridge' and 'avoid railway timetables'. Above all, the clamour for individual paper grades instead of the present aggregate result, formerly widespread, has died down significantly. The replies give a [...] timing and general administration. Recognition of it is a leading element in the detailed consideration the Syndicate now proposes to commission, with a view to a basic re-design of syllabus taking effect possibly in 1984. Oral/aural activities deserving of a place in an extended interview and listening comprehension session will be tried out, and their actual contribution to the candidate profile matched against that of the present range of tests, in order to establish which new, and which old, features most qualify. 1975-1981 experience suggests that, in spite of incidental criticisms, the picture stimulus for conversation achieves its modest aim, that of evoking something more individual and lively than 'How long have you been learning English?' between total strangers. Experience also suggests that 'reading aloud' is not basically an unclean concept, as it does reflect, when suitably non-literary, a realistic skill and does help the tempo of an interview which is, after all, a marking exercise as well as an extended role-play. Current experience has borne out, however, the criticisms of the present three-passage, examiner-read listening comprehension test, and we shall be drastically re-vamping here, greatly assisted, it is hoped, by the ability to use appropriate recorded material: material with more dialogue and general variety of text and delivery, and a
good many steps nearer to the ideal of total eavesdropping realism, probably in colour video, which will be the next demand. Only one centre put in for a 100% oral, in the ARELS manner, and no-one felt a separate examination with no oral component, (in the Oxford manner?) to be viable. In assessing a possible weighting, we were thus somewhere between 0% and 100%, with considerable pressure to go above our present 25%. The current model, now being considered by the centres and as stated already the subject of trial working, has gone for a one-third oral component, not so much for neatness in an examination for which our computer cheerfully allows 39 as a (raw) maximum oral mark along with marks scaled to one decimal place in the case of two written papers, as for a combination which can be meaningfully filled in the right proportions. To increase the openly oral element simply on demand has not seemed, over the years, necessarily the way forward, or the only way to give true and complete credit to oral-based fluency, though the day is clearly coming when Cambridge can increase the oral weighting both because it is good to help to define the teaching syllabus and because it can be done more reliably.
The recent questionnaire was a special operation, emerging as explained from particular needs. The Syndicate's contacts with the EFL teaching public, and the candidates themselves, are however constant through the year, and highly revealing of the variety of aims, emotions, misconceptions, etc. associated with EFL. By means of yearly issues of regulations and documents aimed at candidates, local secretaries, supervisors, oral and written examiners, all revised and resharpened in the light of the previous year's experience, the plain message is conveyed about the timing and cost of entries, conduct of the examinations, and issue of results. A vast amount of correspondence, telephone enquiries, questions at conferences, etc., indicate however, that much of this material is not received, not read, or not understood either through language difficulties or because what is laid down is not what is desired. Late entries, for instance, simply cannot be accepted under a computer-processed control system which apportions question and answer documents and provides statistical data for the monitoring of marking for the entire entry, yet provides also personalised documents, from timetable to final result, for each candidate. At the other end of the process, results once issued, though subject to a check when requested by teachers who feel a grading is significantly out of line, are final and related to a carefully-maintained general standard.
From teachers, the range of queries is also wide. 'What is the pass mark?' is a basic question, but a confused one. It cannot be answered in a standard way, as it is asked from an approach to testing concepts which varies according to country. Questions on the length of essays, and the relative importance of content and language, also need to be answered other than by formula to be helpful, though we have our series of general reports with illustrative extracts from candidates' work to send out. 'How many mistakes are allowed?', another 'backward' question usually related to rigid pass-mark concepts, is heard now and again, and shows little appreciation of the prominence given to communicative competence, which, as was suggested above, is much more basic.
One very productive aspect of public relations with the teachers, in terms both of the general mounting of the operation and because the only effective answers to many queries on marking standards and procedure are gained in this way, is their active involvement in our marking panels. For the present large and increasing entry, a total team of over 1,000 examiners is appointed in the year, and the vast majority of these are teachers with extensive and current specialist experience. The largest proportion of these are the oral examiners at overseas centres, some responsible only for small numbers at isolated centres working with the aid of instructions, sample recordings and occasional feedback, but a much larger number as part of an organised teaching/examining operation on a large scale. A panel of over 250 oral examiners deal with the interestingly polyglot U.K. entry, and many of these double as markers of Composition or Use of English papers. The 'hard core' of this group are the 50 or so who in addition to marking current examinations participate in the setting of future examinations across the whole range of objective and open-ended test types, written and oral/aural. These are our best allies in the public relations field, those who know that perfection in testing is not a matter of realising heart's desire once, but over and over again in a way that will satisfy the demands of security, consistency, discrimination and still not offend against a wide spectrum of methodological or ideological feeling.
In general, the Syndicate's contacts with the EFL teachers, whether as regular collaborators, or occasional enquirers or critics, indicate in a highly interesting way the current degree of acceptance of various concepts in language attainment and the teaching approaches based on them. Over a period of ten to fifteen years we have seen a
swing away from the revealingly formulated cry, heard at an ARELS conference, of 'Too much objectivity!' against multiple-choice testing, and the 'Crossword puzzles!' criticisms of material in structured answer-book format. A question setter, quite eminent in course-book production circles, did once actually submit a crossword puzzle as part of a semi-objective paper, having temporarily taken his eye off mundane considerations of marking techniques and values.
The Syndicate would claim that, by and large, its good relations with the EFL teaching public are based on recognition that its role and procedures have caught the whole testing process, from reliable theory to consistent practice, somewhere near the right point of balance.
Ian Seaton
[...] information for all concerned: students, teachers, the Council itself, sponsoring bodies and British institutions. Clearly, such tests operating all over the world and informing all interested parties have heavy demands placed on them. The tests have to select, diagnose and predict to enable the various people involved to take a variety of important decisions.
Some five years ago it was decided by the Council, together with other interested bodies like the Committee of Vice-Chancellors and Principals, that the tests used hitherto (the Davies English Proficiency Test Battery and the Council Subjective Test) should be changed. The reasons for this decision fall into three groupings. First was the recognition that views on what language is and how it is used had changed in the 1970s. In syllabus design and in the classroom there was more emphasis on presenting language in use and ensuring that the learning was more specific and appropriate. It was felt that the new test system should derive from the same concern, account for the communicative use of language and complement and integrate with the general shift in ELT. Second was the desire to achieve a balanced system whereby a central and controlled test system could provide a reliable and valid measure yet be flexible enough to allow for local (national) differences. The previous tests had been operated on an increasingly local basis, leading to unreliable scores which could not be consistently interpreted by all those involved in the long and often complex chain of getting the student to and from Britain
[...]
system would, it was agreed, organise its shape and content, and present its results in a way easily understood by the non-specialist.
2. Requirements
The committees and working parties that met during this period soon came up with a formidable list of requirements for the new test system. It would have to deliver information on three counts. Had the student reached a minimum level of adequacy which would indicate that his/her planning to come to Britain within a year or so was reasonable? Had the student reached a level where a period of one to six months English language teaching would bring him/her to a fairly adequate level? Had the student reached this fully adequate (target) level already? The test system would therefore have to be criterion referenced or linked, testing those language skills likely to be needed for the specific purpose of studying or training in Britain. It would have to establish the various levels of adequacy in these skills validly and reliably. Its content and item types would have to have at least a beneficial backwash into the classroom and its results accurately predict the part language ability would play in the outcome of that student's actual course.
Then the test system would have to be capable of being marked locally or centrally as certain decisions would have to be made in the student's own country and certain in Britain. It would have to be a comprehensive test, and yet not so complex as to inhibit its uniform administration in
some seventy countries. It would have to be a testing service, flexible and 'on demand', not a
[...]
tests can be reliably scored wherever and whenever the test is given. How could a test system empha-
sub-tests around the single modes of 'reading', 'listening', 'study skills', 'writing' and 'speaking'. This re-organisation recognised that both students and teachers still think in these terms, and that such a test system could not afford to be too innovatory. Again there was a compromise on item-types. The system would need the 'anchor' that discrete-point items provide while testing the more integrated activities specified, and so three of the sub-tests are multiple-choice and the other two task-based. Although the editing produced 'common-core' skills it was decided that, particularly to make the test more acceptable in 'face' terms, there should be six modular tests with 'source booklets' (collections of texts) particular to the five specified subject areas, with an extra module for 'General Academic'.
The solution to the administrative problems set by such a large-scale system was there from the outset in the decision to run the Service jointly. The Council would provide its unique network of ELT qualified staff through its office world-wide, while the Cambridge Syndicate provided its experience and facilities in running examinations and tests in Britain and overseas. The Test Development and Research Unit at Cambridge would provide the analysis and computing services while the Council set up a Liaison Unit to combine these various resources in operating and especially validating the test system.
It was decided to report the results by presenting them as a five-point profile of language ability; the scores on each sub-test are converted to a band, or level of performance, on a scale of one to nine with each band having the appropriate description of what it means in terms of ability. This framework already has interim validity but will be refined through the 1980s to improve both its definitions of target levels and its predictions as to the average time and type of language tuition needed to reach target levels. The Council has its own unit, the English Tuition Coordinating Unit, which is now interpreting these profiles and advising students, sponsors and institutions on placement and/or pre-sessional English language tuition. This means that, although there is only one test system, it can be used to inform on the varying levels demanded on a whole range of courses.
5. Operation
After two years of pretesting, analysis and revision with large samples in Britain and overseas, the Service went into operation in 1980 in forty British Council representations and certain regional offices in Britain. It is now available in all Council offices throughout the world for both sponsored and private students, and parallel versions of four of the sub-tests are being phased in. In countries like the Sudan it is used by ELT Institutes preparing students for study in Britain, while in Australia, the University of Melbourne has contracted to use it for its own pre-sessional and concurrent service English courses. To meet 'local' requirements where some students may already have a good English language background or may be planning only a short period of study not necessarily in Britain, one of five combinations or patterns of sub-tests can be taken, as appropriate.
The Liaison Unit, as well as monitoring the use of the Service worldwide and controlling some of the subjective assessments, plans the development of the tests on the findings of the various validation studies. These studies currently focus on basic reliability features, construct and content validity and most importantly predictive validity. The Unit also publishes occasional information on the Service together with the User Handbook (for prospective candidates) and the Specialist Handbook (for the professional ELT community). As part of the development of the Service, the Unit plans to bring into operation in 1982 a series of modular tests for the increasing number of students coming to Britain for vocational technical training. The first versions of these tests were ready for trialling in the Autumn of 1981.
6. Directions
Although it is a complex and comprehensive test system using a lot of specialist resources, all of those involved in its design, operation and validation recognise that we are at the beginning of the business of describing and measuring the communicative use of language rather than at the end, particularly so in attempting to define the role language plays in academic study, vocational training or any learning process. The English Language Testing Service more than fulfils the standard requirements of a language test, but beyond that should provide in the next ten years a systematic and stable procedure to investigate the use of language in the areas in which it operates. Validation studies such as the one recently set up with the Institute for Applied Language Studies at Edinburgh University, which will run for the next five years, will support such investigations while, of course, contributing to the improvement of the decisions that have to be taken in arranging in-Britain study or training.
Many surveys over the years have pointed out that other factors such as personality, motivation, cultural background, etc. play a crucial role in determining the outcome of study or training in Britain. It is hoped that, when research into these factors has reached a certain stage, it might be possible to extend the framework of profile reporting to build a more whole profile of a student's learning style and language ability. Such a step would fit in well with the whole development of the Service, which has been one of applying current thinking as rigorously as possible to working systems which can then be increasingly validated and improved.
Key references:
The English Language Testing Service: User Handbook, 1981; Specialist Handbook, forthcoming. The British Council.
Graded Objectives in Modern Languages, 1980. Centre for Information on Language Teaching and Research.
Testing Communicative Performance: An Interim Study, 1980, Carroll, B.J., Pergamon.
Communicative Syllabus Design, 1978, Munby, J., Cambridge University Press.
C S Ward
Progress testing: preparation and analysis
One of the main tasks of any educational institution and its teachers is to check on the success of their courses with reference to particular students and the whole group. In the case of the individual student, there is a need to know how well he is keeping up with the programme. Should there be difficulties, he needs to be provided with a [...] case, there will be much more pressure for a common yardstick by which to judge students' progress. However, even in the small school a double check, such as provided by a progress test, will help the teacher to review his assessments and will help to indicate what areas of the course have yet to be mastered. A good progress test can also help the student to understand where his weaknesses lie and build confidence by indicating the areas he has mastered.
Many of the principles for preparing good progress tests are similar to those for preparing achievement and proficiency tests. Instructions to the students should be clear so that they have no difficulty in understanding what they have to do. Trick questions should be avoided. These are more likely to trick the better students than the poorer ones. However, in other ways, the principles will be quite different, reflecting the different purpose the test is to serve. A quotation from J.B. Heaton's Writing English Language Tests (Longman, 1975) will serve to summarise these differences.
'Good performances act as a means of encouraging the student, and although poor performances may act as an incentive to more work, the progress test is chiefly concerned with allowing the student to show what he has mastered. Scores on it should thus be high (provided, of course, that progress has indeed been made). Whereas in standardised achievement and proficiency tests, a wide range of performance should be indicated, the progress test should show a cluster of scores around the
[...]
clear statement of aims and methods, which is also a requisite for successful testing. As the progress test should reflect the syllabus, the statement of aims and methods prepared for the syllabus will largely dictate the form the progress test will take.
Each course syllabus, except those for an absolute beginner in a particular language, will assume an ability in certain areas of the language. On the basis of this previous ability, the course will aim to develop abilities in new areas. In the progress test that follows such a course, the design and content will seek to show that the students have attained those abilities the course sought to develop. The test may include questions that involve students using the ability they were assumed to have at the beginning of the course, but it should not be central to the test. Nor should the test include anything which was not considered a necessary previous ability, and which was not covered in the course. In other words, the content of the test should be such as to result in any student who has successfully mastered the course content getting perfect or near perfect scores.
Once the content of the test is decided, the testing method will have to be decided. The types of possible questions are well described in the standard texts and need not be discussed here. The
choice will again reflect the course as much as possible. We need to use the testing method that will indicate as near as possible that the student has attained the target ability. For example, it would be inappropriate to ask students who have attended a course which emphasised reading business letters to write a business letter as a test of their successful completion of the course. Similarly, it is not advisable to depend on reading comprehension tests when checking the success of a course in writing. This is true of other forms of tests, but it needs to be emphasised even more for progress tests for two reasons. First, students will tend to concentrate on areas of the course in which they know they will be tested. Second, the correlations between different types of tests that are quoted as a basis for accepting standardised tests, the contents of which do not completely sample the language, may be suspect when progress tests are considered. A course which emphasises a particular skill over others will probably cause such correlations to be reduced substantially. However, in large institutions or where time is limited, a machine markable test may well be used as a common yardstick. If combined with, for example, a teacher's assessment (hopefully, a continuous assessment) of areas not covered by the test, and if the students are aware of this, then the disadvantages of using such tests may be avoided while the benefits are retained.
Finally, the questions need to be written and the test booklet designed. These should always be at least double-checked. A second opinion will often see flaws that the first writer cannot see until they are pointed out. The following questions need to be asked:
[...]
Analysis
Once the test has been used, there is unfortunately a tendency for it to be put to one side. An analysis of the test results can be done fairly rapidly and can provide a lot of information about the students and the course. It can also lead to the development of better progress tests for the future. Often the use of a test can show flaws in it which cannot be seen by inspection.
One of the first steps in the analysis of the test would be to seek wherever possible the opinions of the teachers and students who have used it. If the progress test has been a good one, most students will be satisfied that their results reflect their progress. They will have seen the course reflected in the test and will see that any loss of marks was due to their lack of mastery of sections of the course. General discontent with any section of the test will usually indicate that either the instructions were not clear, or the question was obscure, or the students did not feel that particular area was covered in the course. Teachers who have seen the test in operation will also often be able to give helpful advice.
A fairly easy second step is to look at the distribution of marks. As stated in the earlier quotation, the scores on a progress test should be clustered towards the top end of the scale. If they are not, something has gone wrong. There are several possibilities: for example, either the course design needs to be revised or the test needs to be redesigned, or both.
The final step is to investigate each question. It is useful to keep each question on a separate card with a summary showing when it was used and how successful it was. This may take a little time
and the information that can be obtained can be very useful. However, the item analysis as described in most texts was developed for checking items in standardised testing and, if used in the same way for progress tests, will not help to develop the type of test that is needed. Thus, the process will be described in detail and any differences of approach needed will be pointed out.
The first step is to rank all the answer booklets according to the total score obtained by the students. The lower and upper 27% are then set aside for analysis. There should be an equal number of papers in the upper and lower piles. Thus, if there were 100 candidates, there should be 27 papers in each pile. Sometimes there are several papers with the same score at the point where we have to make the cut-off. For example, there might be in this group of 100, 24 students with scores above 88% and 5 students with scores of 88%. We would then choose 3 of those 5 randomly for the upper group. We could then prepare a table for each item as follows:
        A*    B     C     D     NA    TOTAL
U       24    [...]                   27
L       22    3     1     1           27
Key:
A to D: choices in the item.
*: indicates correct answer.
NA: no answer.
U: upper group.
L: lower group.
From these tables we can easily work out two statistics: the difficulty coefficient (= simply the proportion of people who got the item right) and the discrimination coefficient (= a figure which tells us how well the item discriminates between the upper and lower groups). The formulae are:
Difficulty coefficient = $\frac{UR + LR}{2n}$
Discrimination coefficient = $\frac{UR - LR}{n}$
Key:
UR: the number of people in the upper group who got the item right.
LR: the number of people in the lower group who got the item right.
n: the number of people in one of the groups.
(N.B. There is a more complicated and accurate formula for discrimination for those contemplating using a computer.)
For the above example, difficulty is $\frac{24 + 22}{2 \times 27}$ [...] testing, this would generally be regarded as too easy as an attempt is made to spread the scores along the whole scale. Items that all or nearly all candidates get right or wrong do not help to do this, and so items with difficulties over 0.80 and less than 0.20 are rejected from such tests. However, in progress tests we expect the items to represent areas the students have mastered, and thus easy items should be retained. On the other hand, items which the majority of students get wrong are suspect. Any item that has a difficulty of less than 0.50 may have been badly written or may test an area not covered sufficiently in the course. In the first case it should be rejected. In the second, either the course should be revised or the item rejected.
The discrimination coefficient for the above example would be $\frac{24 - 22}{27} = \frac{2}{27} = 0.07$ (to 2 decimal places). Again, in standardised testing, such a low discrimination would be regarded as unsatisfactory. Low discrimination indicates low agreement with the other items in the test. As much agreement as possible is needed between items to help spread the candidates along the whole scale. Thus, generally items with a discrimination of less than 0.20 are rejected for such tests. However, in progress testing this is not such an important consideration. Indeed, it can be proved mathematically that, when using the formulae given here, an item with a difficulty over 0.90 cannot have a discrimination above 0.20, and a question that students all get right can only have a discrimination of 0.00. Furthermore, it is unlikely that an item will have the maximum discrimination theoretically possible. Thus, few items with difficulties over 0.80 will have discriminations over 0.20. In a progress test, as we wish to retain these 'easy' items, we will have to accept lower discrimination coefficients.
However, the discrimination coefficient remains useful. In progress testing, as in standardised testing, we use the total score rather than the score on individual questions, and so it is still important that particular items do not work against the rest of the test. If more of the lower group get the answer right than the upper group, then the discrimination coefficient will be negative and the item will be working against the rest of the test. Such an item should be rejected or revised. A suggested approach is to reject all items with a negative discrimination coefficient, and reject items which have both a difficulty coefficient of less than 0.80 and a discrimination coefficient of less than 0.20. In this way we would build a test that could identify the
~ = 0.85 (to 2 decimal places). In standardised
weaker students while continuing to have a cluster
of scores towards the top of the scale.

Following this, we should check the distractors in the table previously given. The distractors are the incorrect choices (B, C and D in the example given). Distractor B would be acceptable in any test as more of the lower group chose it, and it thus helps to identify the lower group. Distractor D would be rejected or revised as more of the upper group chose it, and it thus confuses the issue. Distractor C is more difficult. It would be rejected in standardised testing as it does not help to discriminate and is thus just a waste of candidates' reading time. However, in progress testing, if it represents a common error which students completing the course have mastered, it is worth keeping. If, on inspection, however, it proves to be a distractor that is so ridiculous that no-one would think of choosing it, it is worth revising.

The final stage is to go through all the reject items and try to establish why they are rejects. Many will simply be badly written, but others will provide keys to where the course is failing to cover certain areas or failing to clear up misunderstandings (or where it is actually creating misunderstandings). Where it is decided that the course is at fault, the course can be revised and the item retained.

The above discussion has been an attempt to suggest an approach to the preparation and analysis of progress tests that emphasises such tests as a tool for the educational administrator or teacher, a tool to encourage students and a tool to help check and revise the course so that the aims of the course will be realistic aims for both students and teachers. Such tests cannot function properly if they are divorced from those who are responsible for the courses. Used wisely, progress tests will lead to better designed courses. Progress tests are an adjunct to courses. They should never supersede them.
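
The item analysis described in this article (forming the upper and lower 27% groups, computing the two coefficients, and applying the suggested rejection rule for progress tests) can be followed step by step in a short sketch. The Python below is illustrative only and is not part of the original article: the function names `split_groups`, `item_stats` and `keep_for_progress_test` are hypothetical, rounding 27% of the candidates to the nearest whole paper is an assumption, and ties at the cut-off are broken at random as the article suggests.

```python
import random
from dataclasses import dataclass


@dataclass
class ItemStats:
    difficulty: float       # (UR + LR) / 2n
    discrimination: float   # (UR - LR) / n


def split_groups(scores, fraction=0.27, seed=0):
    """Return (upper, lower) lists of candidate ids, equal in size.

    `scores` maps a candidate id to a total score. Shuffling before a
    stable sort means candidates tied at the cut-off are picked at random.
    The rounding rule and the fixed seed are assumptions of this sketch.
    """
    n = round(len(scores) * fraction)
    ids = list(scores)
    random.Random(seed).shuffle(ids)
    ranked = sorted(ids, key=lambda c: scores[c], reverse=True)
    return ranked[:n], ranked[len(ranked) - n:]


def item_stats(upper_right, lower_right, group_size):
    """Difficulty and discrimination coefficients as defined in the article."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return ItemStats(
        difficulty=(upper_right + lower_right) / (2 * group_size),
        discrimination=(upper_right - lower_right) / group_size,
    )


def keep_for_progress_test(stats):
    """Suggested rule: reject negative discrimination, and reject items with
    both difficulty below 0.80 and discrimination below 0.20."""
    if stats.discrimination < 0:
        return False
    if stats.difficulty < 0.80 and stats.discrimination < 0.20:
        return False
    return True


# Worked example from the article: UR = 24, LR = 22, n = 27.
example = item_stats(24, 22, 27)
print(round(example.difficulty, 2), round(example.discrimination, 2))
print(keep_for_progress_test(example))
```

For the article's example the sketch gives a difficulty of 0.85 and a discrimination of 0.07, and the item is kept: it is an easy item whose discrimination is low but not negative, which is exactly the case the progress-test rule is meant to tolerate.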
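
The article states without proof that an item with a difficulty over 0.90 cannot have a discrimination above 0.20. A short derivation from the two formulae is sketched here; the symbols p (difficulty coefficient) and d (discrimination coefficient) are introduced only for this sketch and do not appear in the original.

```latex
% Definitions from the article, with 0 <= UR <= n and 0 <= LR <= n
p = \frac{UR + LR}{2n}, \qquad d = \frac{UR - LR}{n}

% Eliminate LR using LR = 2np - UR, then use UR <= n
d = \frac{UR - (2np - UR)}{n} = \frac{2\,UR - 2np}{n}
  \le \frac{2n - 2np}{n} = 2(1 - p)

% Consequences
p > 0.90 \;\Rightarrow\; d < 0.20, \qquad
p = 1 \;\Rightarrow\; UR = LR = n \;\Rightarrow\; d = 0
```

The same bound gives a maximum possible discrimination of 0.40 at a difficulty of 0.80, which is consistent with the remark that few such items will in practice exceed 0.20.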
John Rogers
Nha (/na/) was a Vietnamese member in one of the courses for the English Language Institute's Diploma in the Teaching of English as a Second Language. He spent a lot of his time on the tennis court. (He doesn't mind my saying this.) So when it was time to give a test on a linguistics unit, it seemed the sensible thing to change the wording of Chomsky's 'Golf plays John', and make the question of immediate, personal relevance to Nha and his colleagues. In addition, of course, the course members all smiled (some laughed) when they came to this question. I cannot prove that more people got the question right because of the personal reference, but at least the examinees relaxed a little and the name Nha triggered a memory.

This same technique worked equally well in usually feared Grammar tests. In a so-called Simple Sentence Patterns test, course members were asked to classify sentences into SV/SVC/SVO/SVOO/There VS types. Previous tests included the usual dreary classroom/grammar book sentences. We found that sentences referring to individual course members and to some of the things that they had actually said in conversation or in tutorials caused considerable visual and audible amusement in the examination room. Apparently, these personal references evoked vivid, and therefore easily retrievable, memories of grammatical explanations. This may have been an interesting example of the involvement of both the right and left hemispheres, and their distinctive thinking or heuristic styles, in problem-solving. Here are a few examples. (The [...]

a. [...]
'Lim has tasted the night life of Wellington.'

b. During a very cold spell, there were complaints about the beds in a university hostel.
'The university hostel wouldn't give them any Dutch wives.'

c. One course member had been described as a very romantic person.
'Ibrahim gave her his heart, body and soul.'

[...] course members about what had been said in class and about possible traps in the test itself, which then taught as well as tested. Other items were about the test itself. All of these helped to take some of the tension out of the test and to provide some interest and amusement in the actual test paper. Again, it would be difficult to say whether the results were better or not. Perhaps it was more important that items like the following provided some primary motivation. They might even have stimulated course members to try something similar in their own testing programmes back home.

For the classifying sentences test, we usually had a sentence like this as an example:

S V O
We/like/tests.

In the re-test, the example becomes:

S V O
We/(still) like/tests.

Scattered through the tests were sentences like the following:

'I hope you avoid the traps.'
'There aren't any problem sentences.'
'So far this test doesn't seem very difficult.'
'I think number 15 was a trap.'
'There won't be any more there sentences.'
20. I can't remember the difference between SV and SVC sentences.
21. This is an SVC sentence.
22. This isn't.

The last item in a test: 'I've begun to enjoy classi[...] requirement [...] mistake you make.'

S V C
You/have been/warned! (or SV, if we haven't brainwashed you)

S V O
I/hope/you all pass.
Can you follow directions?

(Speak to no one. Do not look at anyone else's paper. Work very quickly. You have only five minutes.)

1. Read everything before doing anything.
2. Put your name in the upper right-hand corner of this paper.
3. Circle in the word name in sentence two.
4. Draw five small squares in the upper left-hand corner of this paper.
5. Put an 'X' in each square.
6. Sign your name under the title.
7. After the title write, 'Yes, yes, yes'.
8. Put a circle around each word in sentence number seven.
9. Put an 'X' in the lower left-hand corner of this paper.
10. Draw a triangle around the 'X' that you have just written.
11. On the back of this paper multiply 703 by 9805.
12. Draw a rectangle around the word 'paper' in sentence number four.
13. Call out your first name when you get to this point in the test.
14. If you think you have followed directions properly up to this point, call out, 'Yes, I have'.
15. On the back of this paper, add 8950 and 9850.
16. Put a circle around your answer. Put a square around the circle.
17. Count out loud in your normal speaking voice backwards from ten to one.
18. Now that you have finished reading everything carefully, do only sentences one and two.

(Have you been caught out by this test? The editor of this collection of articles was caught out not many years ago!)

At the end of the Study Skills course, we gave a Reading Comprehension Test. The opening instructions contained the following:

'Can you follow directions? Do you remember the test on following directions? Read all the instructions first. This isn't a test of reading speed. But there is a time limit for the whole test.'

Course members smiled, heads nodded and they started work on page 1. It was a long test, intentionally, and when they got to the end, on page 7, they found this final instruction, which, of course, the majority had NOT read before they started on page 1:

'Now that you have read all the instructions and now that you know how to follow directions, do NOT answer questions 1, 5 or 9 on The Stolen Letter (question 1). If you have already circled a, b, c or d for questions 1, 5 and 9, you will lose three marks. Do NOT ask for another question paper. Do NOT rub out any marks you have made.'
Quite a number of course members laughed out loud at this point, shook their heads and sat back resignedly. I think they then learnt how to follow directions. Tests CAN teach. And they CAN be made more human.

Finally, an example of a cloze test that provided a great deal of amusement and entertainment. It proved to be challenging, but the inherent human interest of the material kept motivation high. And there was an instant clamour for the 'official' answers. The follow-up 'correction' lesson was as entertaining as the original test. I offer it to anyone who would like to try it out.

All you need is love and oxygen

It's anyone's guess how many people have joined the Mile High Club by making love in an aeroplane.
'I mean, who knows what ____ do under blankets when the ____ are turned out,' said an ____ Qantas spokesman.
'Love-making in ____ is certainly possible and our ____ and hostesses have caught people ____ the act.
'If the couple ____ quite happy and not disturbing ____ else, we'd probably say a ____ 'Sorry, sir' and leave them ____ it. We've nothing in the
Pauline M Rea
An alternative approach to testing grammatical competence
The main thrust in language education today is on the teaching of language as communication. The terms 'notional', 'functional', and 'communicative' are labels frequently used to describe current approaches to language teaching. In other words, the central concern is with the imparting of language skills which will enable our language learners to engage more efficiently and effectively in natural communicative activities. Whereas at one time the focal point has been on the learning of the grammatical patterns of the target language, it is now claimed that the primary aim of most language programmes is to develop learners' 'communicative competence' in the target repertoire. There is, however, a tendency to interpret a 'communicative' syllabus as one which is organised primarily around a set of notional and functional categories, which subordinates the role of grammar as an organising principle to second place. This sharp swing in the pendulum in the orientation of language teaching programmes will certainly lead to considerable difficulties in the foreseeable future if the imbalance in present trends is not redressed.

There are two inherent dangers with some interpretations of the communicative approach to language teaching that are relevant to the present discussion. The first is associated with the rather unsystematic and unprincipled presentation of the grammatical system of the target language. Indeed Wilkins (1976), as one of the major contributors to discussions on the notional syllabus, stresses the importance of the acquisition of the grammatical system of a given language (p. 66). The second, related problem stems from the emphasis which many communicative courses give to the acquisition of socialising skills during the early stages of the language learning process. The likely outcome from both these factors is the emergence of groups of learners whose language proficiency, at best, demonstrates an adequate degree of fluency at a basic level of communicative interaction, but whose knowledge of the underlying target language system is grossly inadequate. Cummins (1979) has found that basic interpersonal skills may be acquired fairly rapidly whereas literacy-related language proficiency, involving a wider breadth of vocabulary and more sophisticated manipulation of language structures, takes a longer time to develop. One can therefore anticipate difficulties when second language learners try to operate at a higher level of understanding and communication in the language but find they are unable to do so because of an inadequate formal linguistic generating mechanism.

When we say that someone 'knows a language', we mean that this person has acquired certain abilities. These include the ability to produce grammatically acceptable sentences in the target language, together with an ability to use these correct forms appropriately, as the occasion demands. It is essential, therefore, that both these aspects of communicative competence are taught, and that the importance attached to the teaching of the social functions of language should not obscure the crucial role of the grammatical system to the successful communication of ideas and intentions. It follows from this that the shift in emphasis in language teaching programmes has neither eliminated nor even reduced the need for teachers to assess their students' grasp of structural items of the target language. The requirement to assess grammatical competence is as necessary today as it ever was. However, the different views that we now hold of language, in particular the role of grammar in terms of its function within a semantic and pragmatic framework, do have implications for the way in which grammar will be assessed. The rest of this paper will examine the changes in the approach to test syllabus specification, the method, and the format of tests which are a direct result of the current trends in language teaching and learning programmes.

Hitherto, specifications for a grammar test have been in the form of an inventory of different aspects of English grammar. Such a list would include determiners, such as 'some', 'any', 'much', 'little', and verb forms such as 'present', 'imperative', 'modals'. Typically, test items to match these areas would be similar to those shown in Table 1 below.
Table 1

DETERMINERS
We haven't got ____ tomatoes at all.
1. some
2. much
3. a few
4. any

IMPERATIVE
Q. '____ a cigarette?'
A. 'No thanks'
1. Do you have
2. Have
3. You have
4. Have you got

PRESENT
He often ____ in the bath.
1. is singing
2. to sing
3. sings
4. singing

MODALS
Q. 'Must I do it this evening?'
A. 'No, you ____.'
1. mustn't
2. can't
3. needn't
4. won't

Implicit in a specification of this kind is the belief that 'knowing a language' corresponds to accurate manipulation of the grammatical forms of that language, and test items have tended to re-inforce this belief by tapping knowledge of the (formal) rules of the language (usage). Thus, they account for only one of the two abilities involved in communicative competence. Given current concerns to develop communicative language proficiency, any test specification and any test item should also account for the second ability mentioned earlier, namely the appropriacy of language use. In other words, they should not only list and assess the grammatical functions basic to communicative activities but they should also assess the application of the rules of language use. A sample test specification which relates grammatical and communicative categories appears in Table 2 below.

Table 2

Noun Phrase
  Use of determiners: some, any, few, etc.    for QUANTIFICATION
                      a/an                    for INDIVIDUATION
                      zero                    for GENERALISATION
                      possessive/the          for SPECIFICATION
Verb Phrase
  Use of passive                              for PROCESS
  Use of infinitive                           for PURPOSE
  Use of finite forms: imperative             for INSTRUCTION
                       present                for DESCRIPTION

The second inadequacy of existing grammar tests is associated with the method of testing used. The isolated sentences format illustrated in Table 1 is not valid within a communicative framework. It is obvious that a test method which excludes the total context in which the grammatical and lexical system of a language operates cannot claim to be assessing the appropriate use of grammatical structures and forms. There are three related difficulties which make the conventional methods incompatible within the framework of the communicative competence model.
1. Inadequate coverage and imbalanced item distribution
This is the inevitable consequence of testing grammar and vocabulary in discrete sentences. For example, there are some items which are very easy to test, such as adverb-tense collocation (perfect + just/never/ever/ + neg + yet/etc.) and question tags (isn't it/shouldn't we/didn't we/etc). Thus we often find an overabundance of items of this type included in grammar tests. On the other hand, there are aspects which are more difficult to assess and are thus often excluded. These include areas such as the passive verb group, and modals.

2. Limited demands on students from test items themselves
Normally, test items are heavily weighted towards recognition-type tasks and rarely assess control (i.e. production) of appropriate linguistic forms. Students may be very used to manipulating grammatical forms, transforming, for example, the present into the past tense, or direct into indirect speech. There is therefore a strong possibility that students produce correct answers on traditional multiple-choice test formats mechanically, and do not demonstrate an ability to use the appropriate forms as the 'real-life' context requires.

3. Lack of authenticity
The isolated sentences format is inadequate as there are a large number of items which cannot be adequately assessed without reference to the context in which they are normally found. In actual communication, an appropriate linguistic form is selected for its function within a text, which also involves decisions about the overall function of the text itself.

Having outlined above the invalidity of the uncontextualised method of assessing grammatical competence, the final part of this paper will make suggestions for an alternative approach. The test items which are illustrated below are designed as a more valid means of testing grammar. Additionally, these items permit the objective marking of questions, thus satisfying the criterion of practicability which is an overriding consideration when large numbers of students are involved in the testing process. The discussion is restricted to the communicative function area of 'description', and takes as a specific example the case of students whose purpose for learning English is to facilitate further studies through the medium of English. Given this information, we are in a position to determine that these students may be required to use English, for example, to describe and relate historical facts, to explain physical and scientific phenomena, to compare and contrast events and processes, or to report on an experiment. Because a test syllabus is expected to reflect the relationship between linguistic forms and communicative functions, we become aware of the importance of, for example, finite verb forms (present and past) for 'description', of the passive verb group for 'process', the perfect verb group for 'development', use of the modal group for 'ability', 'possibility', and so on. The next step, once the syllabus has been detailed, is to select topics which are relevant to the purposes for which your students are learning the language. These should be sufficiently general and accessible to all students, and varied so as not to favour any particular group. After this comes the selection of a suitable text, followed by the design of the test items.

Example 1 is an illustration of the way in which, within the context of a report on an experiment, the ability to produce grammatically correct and appropriate verb forms can be evaluated. It mainly involves the selection between contrasting verb forms of the past active and the past passive.

Example 1
YOUR REPORT
Water 1) ____ into a displacement vessel until it 2) ____ through the pipe into the measuring jar. The level of the water in the measuring jar 3) ____. Then the solid 4) ____ into the vessel until it 5) ____ by the water
Contextualisation of items in this manner allows the realistic assessment of mastery of individual grammatical elements appropriate to, in this case, the genre of sub-technical report writing. Depending on the level of students, we may require the recognition or the (cued or uncued) production of the required answer. Two further examples assessing grammatical competence at the level of the verb phrase, with a focus on appropriate tense selection and accurate verb formation within the communicative category of description, are given below. In the first example, students are expected to insert the most suitable word for each space in the passage.

Example 2
PORTUGUESE EXPLORATION OF THE COAST OF WEST AFRICA
The progress of Portuguese exploration 6) ____ slow at first. Madeira 7) ____ discovered in 1418, but Portuguese ships 8) ____ not pass Cape Bajador until 1434. The Azores 9) ____ first sighted in 1439 and in 1441 Cape Blanco 10) ____ rounded, and Arguin just to the south of it 11) ____ discovered in 1443 by Dias and Tristao

The next illustrates a cued production task in which the verbs to be used are provided in brackets. Students are instructed to put these verbs into the most suitable form so that they make sense within the passage.

Example 3

The final example is designed to assess word order, especially at the level of the adverb phrase (with adverbs used as modifiers), and noun phrase (adjective placement). Students are instructed that one word has been omitted from each line and that this word is listed on the right of each line. They have to choose the place (a, b, or c) where the word should be written.

Example 4
                                                      OMITTED WORDS
(20) A a/ walk through any b/ of the c/ slums         city
(21) in the a/ world b/ is an c/ experience.          unpleasant
(22) They a/ start just b/ outside the city c/        frequently
(23) limits. a/ As b/ they are c/ peopled by          usually

The final, important stage in the process is the check on the adequacy of our testing procedures. We may do this by asking four main questions:

Our answers to these questions should indicate the extent of the test's potential validity, which operates at three distinct levels:
iii) format: authentic item-types defined by their function within the selected text.

It should be clear from the preceding examples that the selection and production of the required structure and grammatical form is determined not only on grounds of grammatical correctness, but also by its overall function within a given communicative area. These examples have been used to illustrate one alternative way in which linguistic competence may be evaluated within a communicative model of language teaching, learning, and testing. Although emphasis in the test items reflects a concern for appraising a large number of grammatical items, the method and format used is such that individuals are required to integrate their linguistic knowledge in a way similar to the normal use of language for communicative purposes. Users of language have to produce grammatical elements which are determined by their overall function within sequences of linguistic events; this requires an analysis and synthesis of these events rather than what might be simply the result of fortuitous recall of an item, or a mechanical response.

References
Cummins, J., 'Psychological Assessment of Immigrant Children: Logic or Intuition?', in Journal of Multilingual and Multicultural Development, Vol. 1, No. 2, 1980, pp. 97-111.
Halliday, M.A.K., 'Language Structure and Language Function', in New Horizons in Linguistics, Lyons, J. (Ed.), Penguin Books, 1970, pp. 140-65.
Leech, G. & Svartvik, J., A Communicative Grammar of English, Longman, 1975.
Wilkins, D.A., Notional Syllabuses, Oxford University Press, 1976.
S F Whitaker
No doubt there are fashions in testing procedures, as there are in clothing and cars. Dictation, essay, precis, and transformation exercises are 'out', like mini-skirts and spats, but multiple choice and Cloze are 'in'. Yet rational currents are discernible: a search for economy, simplicity, speed on the one hand, and for validity, or something approaching natural language use, on the other. If a case is to be made for the use of dictation for testing purposes (and also for teaching purposes), we must consider in turn (1) the language skills that are involved, and therefore (2) the system of scoring results so as to measure these skills; (3) the sorts of texts which might be chosen, and (4) the practical procedure of dictating: speed of delivery, length of utterance (or frequency of pauses), the length of pauses, the number of repetitions (if any).

[...] written record, we must see that dictation has more relevance to measuring linguistic ability than is generally recognized. Add to this theorizing the remarkably close correlation found in practice between results on tests in dictation and those on the best alternative tests devised, and we cannot ignore its claims. (See the work referred to in Oller, 1979, and in Carroll, B., 1980.) In order to use dictation to advantage, we must decide what it is we want to test, and then try to conduct the task, and score the results, in such a way that we give credit where we consider it is due. This clearly does not mean unthinkingly dictating in such a way, at such a speed, and with so many repetitions, that no student feels stretched or unsure, and then scoring only according to success in spelling, including punctuation, unless we have decided that that is all we want to test.
classics. (Sadly my cross-eyed bear; Send three-and-fourpence; She left him with three hundred children.) We need only look at the efforts of students attempting a suitably difficult dictation to see how fundamentally constructive their processing of aural data is: words that they know, or that seem plausible, will be written, because they have been 'heard', instead of the original. (See examples by H.V. George, in Oller & Richards, 1973; and in Oller, 1979, pp. 276-285.) 'A brought taste in music', instead of 'a broad taste' (phonologically explicable, and 'heard' by an Armenian as well as by a Chinese); 'settlements', for 'sentiments' (U.S. pronunciation); the production of one preposition when another was dictated: of instead of for, for instead of with, of instead of on or at. We remember that these prepositions will be in their weak phonetic forms, and all these errors demonstrate the operation of a 'transitional competence', of a 'grammar of expectancy'. Even the words are not 'given' (as Lado alleged): they have to be inferred or constructed through linguistic ability, from various kinds of evidence arising from aural stimuli.

Further study and analysis of students' performance will demonstrate even more forcefully how the taking of dictation calls on an 'integrative' language competence, which combines elements of phonetic discrimination, availability of vocabulary, understanding of structure at the phrase, clause, sentence levels and above, together with overall textual comprehension. It is true that it involves a large number of different language features: this makes it difficult to offer specific remediation for errors committed, but it is precisely what gives dictations, even more than Cloze tests, their pragmatic validity, as approximating to natural language use.

Recognising the relevance of dictation
A number of practical decisions will have to be taken, once the principle has been agreed. But it is [...] conviction should be shared by the students carrying out the task. They must accept that the task of grasping the full content of (for example) an announcement, a broadcast, while difficult, is a desirable skill, and that it must be practised when it is difficult rather than when it offers no challenge. It is of a nature which must exclude any individual plea for external assistance when it is being trained, and of course when it is being tested. (I.e. Face the challenge with your own resources, instead of asking to have 'the answer' handed to you.) In that respect, it is individualised instruction, focusing on the learner. Without the necessary rigour in its presentation, dictation would be useless, and students' acceptance of this rigour must be obtained, or they may feel frustrated, and fail to cooperate, to their own disadvantage.

Alternative texts
The more plausible and realistic the context for dictation, the easier this acceptance should be. What is plausible and realistic will depend on the learner's situation, his interests, his imagination. (A little make-believe is often appropriate and enjoyed.) Recorded information and announcements offer possible material, as do songs if one is anxious to get the words down. There is a built-in incentive. Ringing travel enquiries might yield something that would be written down like this (if time allowed):

'To get to Birmingham by 23.00, you should take the 19.37 from Grantham, arriving at Peterborough at 20.03, and change at Peterborough, departing at 20.23, and arriving at Birmingham at 22.52.'

This involves 'comprehending': (a) times (given in time-table form, probably), (b) key verbs (get, take, change, depart), and (c) place-names, a special area of lexis, helped by the possession of as much geographical knowledge and previous experience of characteristic English place-names as possible. Of course this, like much information 'dictated' in this way, would normally be taken down in note form. Notes imply intelligibility to the writer, and may be in his own code; but he must be able to reconstruct the message from the notes, and it is not unrealistic to suppose that he might write it so that it would be intelligible to someone else. The minimum written version might then be: To get to Birmingham by 23.00 take the (Note that 'train' is understood from this use of the definite article with a time) 19.37 from Grantham arrive [...] 22.52. In assessing and scoring such a dictation, one might decide that there are 18 essential bits of information to be intelligibly recorded in writing, and these have been underlined. It is not the spelling (e.g. of Peterborough) that is the criterion, but the recognizability of each bit of essential information in the written version. This may sometimes require some discretion (and the spelling of some place-names is unguessable), but the criterion is a relevant one.

Writing down a 'telephone' description of lost property would provide another plausible task, in
which the criterion of success would be the number of specific bits of information.
E.g. 'It is a small, black, plastic bag, with a metal clasp, gold-plated, and my initials, DJB, in small silver letters on one side. There is a narrow strap for holding it, which has become rather worn and frayed in the middle'
Students might enjoy getting a good story written down accurately, so that they could later memorize and re-tell it. Here notes would not suffice: fully articulated sentences would be required. But though spelling is certainly not unimportant, it would not have enough significance, in relation to the task, to justify spelling errors being scored on the same basis as others. (Oller, 1979, p. 281, showed that performance in spelling did not correlate at all with overall proficiency.) A text that is to be dictated, like one used for Cloze procedure, should be worthy of the attention it will receive, either because it fits well enough the lowly purpose or level it serves (at an elementary or intermediate stage) or because it is satisfyingly worded and expresses some memorable thought (at a suitably advanced stage). It should be well written.
Length of sections between pauses
It may be agreed without argument that dictation should be at a speed close to 'normal', with longer pauses for writing, rather than at an unnaturally slow speed, with words separated, and shorter pauses. For a standardized examination, a tape will be provided, or precise instructions given regarding timing and repetition. The teacher working on his own will be guided by experience and reflection. [...] off from verb phrases, head-words from defining relative clauses, and verbs from their complements. Some texts will have to be rejected as unsuitable for intelligible division into suitable short portions for dictation at a given level. Texts will always need to be carefully prepared, and pauses studied and marked, with particular attention to intonation, respecting the natural tunes of speech-that-has-been-exploded, rather than adopting the 'listing' intonation. During practice it may be better to err towards dictating longer bursts, stretching the short term memory span, thus providing enough data for successful processing, rather than uttering short fragments that need to be re-interpreted as the co-text is added. This may well justify the repetition of each burst, but once only, in order not to approach the frustration threshold too closely. Readers may like to experiment with the division and delivery of a short narrative extract, from William Trevor:
'Without in any way sounding boastful, Edwin told her of episodes in his childhood, of risks taken at school. Once he'd dismantled the elderly music master's bed, (rising or falling intonation?) causing it to collapse when the music master later lay down on it. He'd removed the carburettor from some other master's car, he'd stolen an egg-beater from an ironmonger's shop. All of them were dares, and by the end of his schooldays he had acquired the reputation of being fearless: there was nothing, people said, he wouldn't do.'
The dictation of punctuation
There is no clear reason why a text, spoken at something like the normal speed and in something like the normal manner, should solemnly include the announcement of 'comma', 'speech marks', 'full stop'. These are not part of the spoken language, which has another signalling system. If knowledge of certain conventions of the written system is being tested at the same time, that knowledge can be displayed better by the student supplying the punctuation he judges necessary, as in straight composition, rather than by his showing that he recognizes the words 'full stop', and succeeding in making a dot on the line. In this way the learner will be actively producing even more for himself, in the act of taking dictation. Scoring will again require the exercising of a little [...] preparation, execution and scoring of a dictation, but these are not as great as the difficulties, and expense, of compiling really satisfactory multiple-choice tests. Even more important, the validity of dictations, when it has been measured rigorously, appears to be much higher. (Oller, 1979, p. 267.) 'Dictation and closely related procedures probably work well precisely because they are members of the class of language processing tasks that faithfully reflect what people do when they use language for communicative purposes in real life contexts.' It is not suggested that dictation should be adopted as a predetermined package, but approached and developed according to a pragmatic view of language teaching.
Summary of suggestions
i. Reflect upon the validity of dictation for yourself, so that you can use it in a way that exploits its useful features.
ii. Convince students of its value, through the relevance of your procedure and the satisfaction and enlightenment that reasonable success at it can bring.
iii. Select and prepare your text judiciously; present it strictly in line with principles agreed
References
Carroll, Brendan, Testing Communicative Performance, Pergamon, 1980, p. 99.
Oller, J. W., [...] 1979.
Oller, J. W. & Richards, J. C. (Eds.), Focus on the Learner: pragmatic perspectives for the language teacher, Rowley, 1973.
Penny Frantzis
Listening to and understanding spoken English involves the student in a range of skills: for example, the ability to identify individual words from a blur of speech, recognizing the significance of stress, intonation and syntactic patterns, and retaining what is heard long enough for the message to be understood in its entirety. Along with these skills is the ability to anticipate or predict what is likely to be heard in a given situation, using clues drawn from the cultural context in which the speech is heard and from the observation of such features as the speaker's facial expression, speed or loudness of voice.
Comparing examples of students' transcriptions of spoken discourse in the form of a news broadcast from a tape with the original text, one finds some indications of the complexity of the listening task.
In transcribing, any visual clues are absent, the acoustic signal is paramount and the listener's internalized knowledge of syntactic, semantic and phonological rules are what he must refer to in order to decode the message on the tape. Although in the first three fragments above, the students' comprehension is not in question, what is surprising is that the students were aware of the use of articles and familiar with such strings as 'in favour of', and 'as a result of', yet they failed to use this knowledge to modify what they heard, whereas the native speaker would automatically have cued in the missing words. Similarly, in example 5, the listener codes 'rest' as a verb but fails to provide any compatible number marker either on the subject 'studio commentator' or on the verb form itself, such a lapse being at variance with this particular student's usual grammatical performance.
What seems to be restricting the students' accurate decoding of the message is their limited experience of the range of phonological realizations that words or strings of words can possess. Furthermore, this limitation seems to override any semantic or syntactic knowledge the student could usefully bring to bear on the listening comprehension process, and is characterized by the common complaint that the English the overseas student hears in Britain in no way corresponds to the English he has been exposed to in his own country.
The following description is of an attempt to tackle this problem. The primary consideration was to make the student aware that full and explicit articulation of each word is not a feature of normal speech and thence to enable him to identify words from the blur of elided, assimilated, stressed and unstressed vowels and consonants that make up the acoustic signal he receives.
At the same time, however, it was recognized that close attention to spoken discourse at the phoneme level results in the loss of overall comprehension of the text. A combination of exercises, tasks or tests had therefore to be devised to ensure that predictive skills were encouraged, overall comprehension was not lost and that phoneme discrimination was developed; in other words, that the students' awareness of the phonological potentiality of the language was extended.
The material chosen for presentation at a weekly
one-hour session was a two-minute news bulletin from Radio 4, recorded the day before the class to allow time for a transcription and preparation of exercises to be made. The language of a news bulletin is, of course, distinct from the spontaneous spoken discourse of conversation in that the language heard is being read from a prepared text and is thus devoid of such verbal redundancies as hesitation, re-phrasings and false starts. Furthermore, as the maximum amount of news has to be condensed into a given time, the information content of a news bulletin is extremely high, the choice of syntactic structure is correspondingly economic, delivery is rapid and the articulation is not always explicit. To counter objections that this is a very restricted diet of spoken discourse, it should be mentioned that tapes of conversations, interviews and lectures are presented for study in other classes. The rationale behind the choice of a news bulletin, however, was based on several factors.
Firstly, it was felt that its content was of general interest and relevance to the student, informing him of the current issues in Britain and providing him with topics and the necessary structure and vocabulary to initiate or join in English conversation outside the classroom. It appears also that the ability to understand news broadcasts can serve to combat feelings of loneliness; one student remarked that he spent a great deal of time studying in his room, with the radio providing background music, and had very little contact with the 'outside world'. Subsequently having attended the class, he actively listens to the news coverage instead of allowing a stream of sound to wash over him and feels much less isolated from the community in general.
Secondly, given the repetitive nature of news bulletins in Britain, i.e. strikes, earthquakes, hijacks, riots and the Financial Times Share Index, lexical items as well as structural features are frequently recycled in slightly altered contexts, allowing the student the opportunity to consolidate his previous learning. Reinforcement of the material presented is, of course, also available outside the class in the form of newspaper articles, TV and radio news coverage.
Moreover, unlike spontaneous discourse, in a news bulletin there are no changes of register or significant differences in the accents of the news readers and thus a measure of continuity can be guaranteed each week. These constant factors have the advantage of enabling both teacher and student to become more clearly aware of the progress made from week to week, thus quickly generating confidence in listening to radio broadcasts. At the same time, however, two of the problematic features of spontaneous discourse are still very much in evidence in the material chosen: rapid delivery and incomplete articulation.
The presentation of the material falls into three sections. The first section deals with overall comprehension. The two-minute news bulletin is played and the students are instructed to identify the number of news items (usually between six and seven items). After students' answers have been compared and a consensus has been reached, the tape is played again. This involves a writing task: jotting down keywords or notes to identify the gist of each news item. Students then pool their information in pairs or in groups, and are invited to give the main content of each item, thus eventually arriving at a general outline.
An alternative presentation involving reading can be used for testing purposes. Students are given a sheet containing a list of ten possible headlines and asked to pick out those they expect to be included in the news bulletin before it is actually heard. While listening to the tape, they are instructed to number the headlines in the order in which they occur.
During the next re-play, the tape is stopped at intervals to allow more precise questioning and to identify any vocabulary problems. Names of people or places unlikely to be known are written on the blackboard or on a handout, and any cultural information crucial to general understanding is provided either by the teacher or students. Throughout this stage, it has frequently been observed that students in their replies to questions do not use the vocabulary of the actual news bulletin but provide a synonym, the inference being that the word heard (although understood) may not yet be in their active vocabulary: e.g. 'strike' is used instead of 'industrial action' or 'stoppage'; 'wind' instead of 'gales'; 'snowstorms' instead of 'blizzards', etc.
A technique which can be used at the more detailed questioning stage is a series of oral statements constructed so as not to include sections from the text, requiring 'true or false' judgements. Students reply by saying or writing T or F to each statement. Written replies can be checked by means of self-monitoring, pair work or group work.
The amount of material listened to before the tape is stopped and questions are asked can vary in length depending on the density of the information carried. During a whole course on listening comprehension it is noticeable how students' auditory memory develops and retains increasingly more information. Even when the material has not
been totally understood, the student's echoic memory (i.e. of the sound signal) is frequently long enough to enable him to arrive at the sense of the stream of sound, especially if prompted by a searching question. This whole section could be presented in the form of a series of short multiple-choice questions, but this, as I will amplify later, is more suited to a self-study mode in the language laboratory than to a class presentation.
The second stage of the presentation dealing with phonemic discrimination more obviously involves testing techniques. Students are given a sheet containing gap-filling tasks.
Text A
Every fifth word has been omitted in the text below. Listen to the item and fill in the gap.
The Civil Service campaign .... support of wage increase .... brought two of Scotland's .... airports to a complete .... Air traffic controllers at .... and Edinburgh walked out .... after 7 o'clock this .... and they're not expected .... resume work until this ....
Transcription (omitted words in square brackets)
The Civil Service campaign [in] support of wage increase [has] brought two of Scotland's [major] airports to a complete [standstill]. Air traffic controllers at [Prestwick] and Edinburgh walked out [shortly] after 7 o'clock this [morning] and they're not expected [to] resume work until this [afternoon].
Text B
Listen to the news item and fill in the gaps in the transcript below. Each dash represents one letter.
The employers' organisation --- --- says the recession -- ----- deepening but -- --- --------- report -- ---- there are some signs ---- -- -- levelling out. The report predicts - ------- small decline ------ --- next 4 months and cautions ------- undue optimism. The report ---- that encouraging signs ------ --- distract attention ---- --- ---- that manufacturing output is ---- 12% ----- its 1975 level.
Transcription (omitted words in square brackets)
The employers' organisation [the CBI] says the recession [is still] deepening but [in its quarterly] report [it says] there are some signs [that it is] levelling out. The report predicts [a further] small decline [during the] next 4 months and cautions [against] undue optimism. The report [says] that encouraging signs [should not] distract attention [from the fact] that manufacturing output is [over] 12% [below] its 1975 level.
Text C
Words have been missed out from the transcript below. Indicate with a '/' where the words should be and write in the margin the words you hear. The number in brackets indicates how many words are missing in each line.
Many in Lincolnshire and Oxfordshire are without (2)
electricity after the blizzards, and supplies may be (3)
restored until tomorrow. Extra teams have brought (4)
in from as far as Cumbria to help restore power to (1)
three homes in Lincolnshire and people in Oxfordshire. (4)
Transcription (omitted words in square brackets)
Many [people] in Lincolnshire and Oxfordshire are [still] without electricity after the [weekend] blizzards, and [full] supplies may [not] be restored until [late] tomorrow. Extra teams [of engineers] have [been] brought in from as far [afield] as Cumbria to help restore power to three [thousand] homes in Lincolnshire and [about two thousand] people in Oxfordshire.
Text D
Complete the following gaps with numbers after listening to the news item.
All .... were arrested .... years ago and are accused of handing out .... rifles, .... artillery pieces and more than .... rounds of ammunition to their militia.
Transcription
All 4 were arrested 10 years ago and are accused of handing out 74,000 rifles, 300 artillery pieces and more than 10,000,000 rounds of ammunition to their militia.
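The fixed-interval deletion behind Text A, and the one-dash-per-letter gaps of Text B, are mechanical enough to sketch in code. The following is a minimal illustration only: the function name and the treatment of trailing punctuation are assumptions, not the author's procedure, and in Text B the teacher chooses the gapped strings by judgement, so only the dash rendering is imitated here.

```python
def make_cloze(text, n=5, dash_per_letter=False):
    """Delete every n-th word of a transcript.

    Returns (test_sheet, answer_key). Trailing punctuation is kept on
    the sheet so that sentence boundaries stay visible (an assumption).
    """
    sheet, key = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            core = word.rstrip(".,;:!?")
            tail = word[len(core):]
            key.append(core)
            # Text A style: a fixed gap; Text B style: one dash per letter.
            gap = "-" * len(core) if dash_per_letter else "...."
            sheet.append(gap + tail)
        else:
            sheet.append(word)
    return " ".join(sheet), key


sentence = ("Air traffic controllers at Prestwick and Edinburgh walked out "
            "shortly after 7 o'clock this morning")
sheet, key = make_cloze(sentence, n=5)
print(sheet)
print(key)
```

Changing `n` produces a different set of items from the same transcript, which makes it easy to try several deletion rates on one recording before settling on a sheet.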
and alerts the teacher to the problem. An alternative exercise might be such as described by H. Templeton (ELTJ XXXI, 4, 1977) in 'A new technique for measuring listening comprehension': a cloze exercise specially designed for listening, where no reading is involved and where the sound is bleeped out of the text at given intervals.
Text B enables the teacher to concentrate on words or strings of words which in their unstressed or elided forms are likely to cause problems of discrimination for the student. The tape should be [...] prepositions, and weak verb forms, which require acute phonological discrimination and syntactical awareness. By systematically drawing attention to these forms in this way, the teacher can discover the problem areas of the students and can also sensitize the students to these difficulties.
Text D concentrates on the understanding of numbers, a special skill which can be developed separately by a similar gap-filling exercise with recorded time-tables, the Financial Times Index or even football results. Interesting intonation patterns can also be observed in the latter, and students have fun predicting draws, wins or losses depending on the intonation of the broadcaster!
The final section of the presentation involves the student being given a full transcription of the news bulletin. This time the text is followed while listening to the tape. Exercises requiring students to mark in tone unit boundaries, stressed words and stressed syllables, and to check for contrastive stress are of great help in developing listening ability. The significance of the position of the stress in the following sentence, for example, falling on a word usually unstressed needs an explanation:
'It's understood that Mrs. Thatcher WILL now make a statement in the House of Commons about her talks with President Reagan.'
[...] hear the entire news bulletin five or six times, for the most part concentrating on different listening skills each time, the phonological properties of the language having been explored at the same time as the meaning of the text has been unravelled and sound and syntactic patterns have been registered.
This material can, of course, be used on a self-access basis in the language laboratory. The selecting and ordering of a list of headlines, a set of multiple-choice questions and true/false statements would cover the initial comprehension stage. The transcription exercise, a sophisticated form of dictation where the sounds, however, are not distorted by segmenting, is particularly useful for the teacher and the student in diagnosing problem areas and also in making the teacher aware of the complexity of the listening task the student is confronted with. E.g. The phrase 'in and out of' was transcribed as 'in doubt of' by a number of students, so reduced was the 'and' acoustically that a strong conviction about the meaning of the text was required for the student not to believe his own ears!
One final comment about this particular self-access format. I have used it for some time as a listening comprehension progress test administered to a group of students at the beginning of their general language course and at the end of the course. The first test is not administered until the students have become familiar with all the types of test during two or three class sessions, and the final test results correlate well with the teacher's subjective impression of the students' progress and with their performance in general language proficiency tests. Furthermore, the student is immediately made aware of his progress by the degree of ease with which he feels he has understood the news broadcast.
Keith Morrow
There is general agreement among language teachers that testing students' command of the spoken lan-
2 It is very difficult to get the pupils to say anything interesting. I don't mean, of course, that
where pupils are asked to read aloud a passage in the foreign language and then answer a series of questions on it would not meet this criterion of 'reality'. Exactly what sort of task you ask your pupils to perform will depend very largely on the
'Repeat' or 'Please repeat' or 'Could you repeat that please'?
FLUENCY: How do you equate a pupil who says very little, all of which is
Robert K Johnson
Cloze testing is used in this paper to provide a familiar point of focus for questions which are applicable to language testing as a whole; not because cloze tests are poor tests (on the contrary, they have been shown to be more effective than most) nor because they are more vulnerable than other tests to the kinds of questions asked.
'Objectivity' and the Statistical Validity of Cloze Tests
There are a number of myths quite widely held about cloze testing, which have arisen perhaps out of wish-fulfilment rather than the literature itself, and are based on the false premise that cloze is 'an automatically valid procedure which results in universally valid tests of language and reading' (Alderson, 1979: 220).
Those who adhere to this belief in its strongest form stress the 'objectivity' of cloze procedures. At some point, the words 'statistically valid' or worse 'proved statistically' are likely to be used as the ultimate knock-down argument. This position, named by Strauch 'quantificationism', and by Polanyi 'objectivism' (House, 1977: 14/15), results from the desire to find objective procedures which will relieve the investigator of any responsibility for the results. Given the pressure that is placed upon those who set tests and examinations, this is an understandable aim, but the conclusion that cloze procedures fulfil this requirement cannot be sustained (1) because the claims regarding the objectivity and automatic validity of cloze tests are largely false, and (2) because statistics are data and not arguments, and valid conclusions can only be reached by processes of argument. (Statistics provide a basis for argument.) The first point has been demonstrated by research findings and can be amply supported by appeal to common sense; the second is taken up in the later sections of this paper.
Alderson (1979) focuses upon three factors which are critically important for claims that effective cloze tests can be constructed by objective procedures which are independent of the intuitions and judgements of the person constructing the test, and give results which are valid and reliable. The three factors are: (1) choice of text, (2) the scoring procedure, and (3) the deletion rate.
Alderson's findings are that 'individual cloze tests vary greatly as measures of EFL proficiency - changing the deletion frequency of the test produces a different test, which appears to measure different abilities, unpredictably. Similarly, changing the text used, results in a different measure of EFL proficiency (and) changes in scoring procedures also result in different validities of the cloze test, but the best validity correlations are achieved by the semantically acceptable procedure.' (Ibid., p. 225)
Alderson concludes: 'Testers should above all be aware that changing the deletion rate, or the scoring procedure, or using a different text may well result in a radically different test, not giving them the measure that they expect.' (Ibid., p. 226)
Other investigators have of course obtained different results, e.g., Stubbs and Tucker (1974), replicating correlations obtained by Oller and others, showed significant positive correlations between exact and contextually appropriate responses (r = 0.97; p < 0.01). However, there is no necessary conflict between the results obtained by Stubbs and Tucker and Alderson, which suggest the eminently reasonable conclusion that the effect of changes in deletion rate will vary, depending upon the characteristics of text and subject. These and related factors are discussed further below. It may be worthwhile at this point to make two further points about statistical data and the conclusions that can be based upon them. Firstly, a statistically significant difference is not necessarily an important difference and an important difference may not be statistically significant (Carver, 1978); secondly, tests which are statistically identical in relation to a particular population may yield very different outcomes when used with a different population (Farhady, 1979).
The Selection of Cloze Texts
Having dealt with the question of automatic validity, there remains the question of objective procedures. Cloze tests do not satisfy the basic prerequisites for a claim to objectivity, and seem
typical of that class of instrument discussed by House which has 'replicable procedures', but is 'infected by biases and hence qualitatively subjective' (House, 1977: 41).
To quote from Moser and Kalton, bias arises when
(1) the selection is consciously or unconsciously influenced by human choice
(2) the sampling frame which serves as the basis for selection does not cover the population adequately, completely or accurately (Moser and Kalton, 1971)
'To ensure true randomness, the method of selection must be independent of human judgement.' (Ibid., p. 82)
Consider at least some of the major acts of human choice based on judgements which go into the selection of a suitable cloze text for a particular group of language learners:
(a) intellectual content,
(b) cultural content,
(c) linguistic difficulty,
(d) register,
(e) level of formality, and
(f) idiosyncrasies of style, e.g., lists of items and a high proportion of idioms, proper names, and numbers.
Consider too the adequacy of a cloze passage, usually 250-300 words in length, as a sample. Clearly, it does not cover the population (everything ever written in the English language?) adequately, completely or accurately, and any attempt to do so would be ludicrous because of its insensitivity to the requirements of the testers and the purposes and abilities of the learners.
Given that these requirements regarding the selection of a suitable passage are met through processes of rational argument, intuition, and common sense by people with the necessary knowledge and experience to support their judgements, a number of conclusions seem likely to follow, e.g. that the results from a cloze test based on a carefully selected passage would correlate less highly and discriminate less well. It also seems likely that, given a highly selected passage, variation of the deletion rate might not affect the results to a statistically significant degree, while this would be less likely if a passage were selected at random.
I would argue then that the high levels of correlation achieved in a number of studies involving cloze tests may be regarded as resulting from, and providing supporting evidence of, the reliability and validity of the judgement of the person who selected or prepared the texts rather than as evidence bearing upon cloze procedures per se.
It should be clear from the above that the exercise of judgement in the selection of passages necessarily causes cloze tests to be 'infected by bias', and it is eminently sensible and desirable that this should be so. Subjectivity in the form of the legitimate exercise of informed judgement is not and should not be regarded as harmful. To quote House once again:
The evaluator must be seen as caring, as interested, as responsive to the relevant arguments. He must be impartial rather than simply objective. (House, 1977: 46)
Deletion Rate
Most studies which have focused upon deletion rate have shown that there is a high level of correlation in the results achieved if the rate is every fifth word or above, and that going beyond every fourteenth word is uneconomical and unnecessary.
It is not valid to conclude from these studies that it makes no difference whether every fifth, seventh, ninth etc., word is deleted, and evidence exists that deletion rate does make a difference in some cases (see Alderson above). A more valid and obvious conclusion would seem to be that two very similar cloze passages, tested with identical populations, will give very similar results. As a corollary we might note that changing the gapping procedure for a passage may provide two tests which are practically identical, and that this has been shown to be the case in a number of studies, though not in all.
There is nothing magical about a randomizing process for gapping a cloze passage. Given the subjectivity of the judgements that normally contribute to the selection of the passage there can be no subsequent claim to objectivity. Alderson offers a revealing analogy, and draws what seems to me to be the appropriate conclusion:
The (deletion rate) procedure is in fact merely a technique for producing tests, like any other technique, for example the multiple-choice technique, and is not an automatically valid procedure. Each test produced by the technique needs to be validated in its own right and modified accordingly. (Alderson, 1979: 226)
We are all accustomed to the notion of a bad multiple-choice item, yet the notion of a 'bad' item in a cloze test seems alien. It should not be. Each gap is an item and capable of being validated in the same way that other items are validated.
1. Rational justification. It seems to test the kind of thing we think should be tested; which ideally
would require a theoretical framework including a theory of language, of reading (or listening), of the relationship between reading and cloze, and a full statement of the learners' aims. The fact that we do not have such a theoretical framework at present does not mean that we should abandon rational justification, which as has been stated, is the basis for the selection of the passage in the first place. There are theoretical models available and these can and should be used. There are also insights based on experience which can be brought to bear.
Before looking at these factors in some detail, it may be desirable to clarify the situation with regard to claims that exact scoring solves the problem of marker subjectivity. If the examiner is in a situation where the markers cannot be trusted or adequately supervised, exact scoring is a useful expedient, no more and no less. The subjectivity of judgement has simply been transferred from the marker, who is in a position to know what he is doing and why he is doing it, to the author, who, unless the passage was specially prepared, had no idea that his work would be used in this way.
comparison ends. Above all, no judgement can be made or is required as
to whether non-exact replacements are in some sense worse than, or
better than, the original.

Once the focus changes from the passage to the reader, the purpose from
establishing degrees of compatibility of text and reader to determining
absolute levels of attainment by the reader, and conclusions from value
free to value loaded, then the premises underlying the use of cloze
procedures must be re-examined very closely indeed in order to justify
their continued use. In particular the wording of the passage is no
longer unchallengeable. There is no reason for example why a non-exact
replacement should not be 'better' in some sense, e.g., more precise,
more expressive, less ambiguous, etc., than the original. Similarly,
the rationale for random deletion is no longer obvious once the purpose
is the comparison of readers and not of passages. The justification for
cloze procedures which provide a means for comparing passages becomes
irrelevant.

There are a number of problems relating to the shift from readability
to reader ability where the reader is a native speaker of the language,
e.g., variations of dialect, idiolect, control of the 'elaborated'
variety, etc., but these are not considered here, and can be implied
from the arguments regarding the next change in the function of cloze:
from the assessment of native speaker reading ability to the assessment
of second language speaker reading ability.

From Native Speaker to Second Language Speaker Evaluation

The use of procedures with second language speakers which were
considered appropriate for testing the language skills of native
speakers involves a number of assumptions. The first assumption that
has to be challenged is that the constraints under which a second
language speaker operates are the same as those of a first language
speaker. The second assumption is that the second language is acquired
for the same purposes as the first language.

Carroll has stated the position very clearly in laying down principles
for dealing with educational issues in relation to native speakers and
non-native speakers of languages as well as non-standard dialects:

    (1) Language is a complex human phenomenon that takes the same
    general form wherever it is found. It permits the expression of a
    certain very wide range of information, experiences, feelings, and
    thoughts, and it does so in somewhat the same way regardless of the
    particular form of the language or the culture of the user, as long
    as the language is a so-called natural language that is used from
    childhood on as a native language by its users. (Carroll, 1971: 177)

It is not possible therefore to assume that the second language speaker
uses that language in ways which are directly comparable to the first
language speaker. The experience of learning a language primarily in
the classroom (and this discussion relates only to such learners) is
totally different from the experience of learning a language
'naturally' as part of the general process of socialization and
maturation. It is not to be expected that the range of information,
experience, feelings, and thoughts will be as wide, nor will the
language usage be particularly closely attuned to the native speaker
culture. The imposition of a native speaker equivalence as the
performance standard by which the second language speaker should be
measured is therefore impractical. It is also undesirable in that the
aim is set far beyond the rather limited requirements and purposes of
most second language learners and second language teaching programmes.

It cannot be assumed, therefore, that testing procedures which are
valid for native speakers can be applied automatically to second
language speakers. In the case of cloze procedures there are many
reasons for concluding that such an assumption would be false.

Text type is one factor which is revealing. One part of native speaker
competence seems to be the ability to identify different registers and
styles to the extent that text type is not normally a source of
variation in native speaker cloze scores. Research has shown however
that for second language learners, text type is an important source of
variation (Freeland, 1979). Freeland concludes (p. 6) that

    certain assumptions about cloze tests need re-examining ... And the
    practice of expressing learners' scores as a ratio of native
    speakers' is suspect, since the relationship between native and
    non-native scores can fluctuate in unknown ways.

Oller came to a similar conclusion:

    There is little if any reason to assume that conclusions from
    research with native speakers can validly be generalised to the
    case of non-native speakers (Oller, 1973: 107).
On the question of exact versus acceptable scoring, Oller states:
'Clearly, when dealing with non-native speakers there is something
counter-intuitive about requiring the exact word' and data from tests
reported by Oller (1973: 109)

    supported the conclusion that with non-native speakers the method
    of allowing any contextually acceptable response is significantly
    superior to the exact word scoring technique.

Elsewhere, Oller discusses the problem further, noting (1) that exact
word replacement makes tests extremely difficult for L2 speakers, and
(2) that exact word replacement often requires insights which may not
be regarded as language skills (Oller, 1972).

Thus there is evidence of the need for principled judgements to be made
in the development of cloze tests in relation to the selection (or
development) of suitable texts, decisions regarding the elements to be
deleted, and regarding degrees of acceptability under a system of
non-exact scoring.

The Selection of Cloze Passages

Intellectual Bias

The relevance of intellectual knowledge is well illustrated by a short
cloze passage taken from Anderson (1971).

    B. The idea that the (1)____ of a language, unlike (2)____ words,
    are probably infinite (3)____ number, so that they (4)____ be
    listed, is no (5)____ one, how-

Some readers may be 'tuned in' by the word 'infinite' as a cue for
'sentences' as gap-filler for (1). (Though why not 'sounds' or
'phrases', 'messages' or 'meanings'?) The real clue is 'Chomsky', whose
often quoted position on the generative power of language is invariably
expressed in terms of 'sentences', no doubt because transformational
generative grammar was until recently essentially a sentence grammar.
In other words, if the cloze reader does not have an intellectual
grounding in the transformational generative orthodoxy as propounded by
Chomsky and his followers, he will not be able to fill gap (1) and
possibly gap (5). If failure to understand results in carelessness in
the use of clues which are available, the learner may fail to fill gap
(4) satisfactorily. The other gaps will probably be filled
satisfactorily by most native speakers even though they have no
knowledge of linguistics, and little if any understanding of the
passage.

Two points are at issue here: the largely self-evident one that the
intellectual content of a passage will affect the scores, and is a
potential source of bias; and the less obvious point, which for that
reason may be more important, that it is possible to complete items
satisfactorily in the absence of any global understanding of the
meaning of the text.

The test setter therefore has to make decisions regarding the
intellectual neutrality of the passage content in order to avoid
criticisms that the text favours some learners and disadvantages
others. It may be desirable to bias the content of a passage to reflect
the purposes of a particular course (English for nurses; English for
engineers, etc.) but the question still remains whether the content
should be truly neutral, i.e., intellectually familiar to all learners
so that the gap-filling exercise tests primarily language ability, or
whether the content of the passage is regarded as part of the challenge
to the learner in reconstructing a genuine piece of communication. The
latter will bring into play powers of deduction and other analytical
skills as well as memory, which will not be required by the less
intellectually demanding passage.

Cultural Bias

The problems of cultural bias can be illustrated very obviously by the
following:

    C. ____ blind ____. Three ____ mice, ____ how ____ run. ____ how
    ____ run. ____ all [...] three ____ mice.

Those who have had at some time or another an intimate experience of
certain aspects of the child-rearing practices of native speakers of
English will be able to complete this cloze passage without difficulty.
Of those who lack this experience, only those who have studied
eighteenth century British political history are likely to have come
across the passage in its original role as political satire.

It is obvious that such a passage would not be selected because of its
idiosyncratic style and the fact that some learners at least may have
memorized it. Yet the same reasoning applies to expressions such as:

    D. Birds of a ____ flock together.

Such idioms are 'known' in very much the same way as nursery rhymes are
known and for the
same reason. Success in filling the gap is directly related to the
extent to which this particular string of words has been committed to
memory. Anyone who has not committed the string to memory will insert,
for example, 'kind' or 'species'.

The step from so-called common sayings such as the above, to cliches,
which are in fact far more common, is a short one, and objections to
'birds of a feather', etc., apply equally to surviving archaisms such
as
[...]
read or have read a certain genre of historical novel will probably
identify a 'dungeon' as the appropriate place for traitorous villains,
and will know that traitors are not 'put' or 'placed' or 'locked up' in
a dungeon, they are 'cast' into them. On second thoughts they might be
'flung', but I would vote for 'cast'. One more example:

    G. The posse rode unsuspecting into the ambush and was met by a
    ____ of bullets from the outlaw guns.

Those of us who have read our Westerns, not to mention war stories and
detective stories, are sufficiently habituated to the notion of a 'hail
of bullets' not to notice how extraordinary the metaphor, which
presumably preceded the cliche, really is. In this context 'hail' is
almost a collective noun, to be included with a 'pride of lions' and a
'gaggle of geese'. The meaning is linked to the base meaning in the
sense of a large number of small, hard, punishing objects travelling at
speed and in close order; yet 'hail' operates on a more or less
vertical axis, while bullets, generally, do not. You
[...]
almost vertically on the enemy with a sound very much like a hail
storm, no doubt, as a roof of shields was hastily erected as
protection. The weapons may change, but the cliches live on.

There are no easy dividing-lines to be drawn. When does a metaphor
become a cliche? How do we differentiate between a cliche and a string
which has a high level of sequential or collocational probability
(Beattie and Butterworth, 1979: 210)? It is no more possible to have a
piece of discourse
[...]
and technological world is culturally neutral in the sense that there
is nothing specifically British about it, that access to this world is
the main purpose of learning English as a second language and any bias
that results from a choice of passages relating to this world is
therefore fully justified. The argument is sound in international
terms, but in relation to a national education system for example it
may be said to introduce an unacceptable level of discrimination in
favour of the socially and economically advantaged sections of the
community, who have access to that world, and against the socially and
economically disadvantaged, who have spent their lives in villages or
urban slums.

As Oller (1973) has observed, you cannot and should not try to separate
language skill from knowledge of the world in the measurement of
performance, but it is necessary to consider precisely what knowledge
of the world can reasonably be expected of particular learners given
the constraints under which the learning has taken place. If such
factors are not taken into consideration the
Linguistic Bias

In addition to the judgements that have to be made regarding
intellectual and cultural content, there is linguistic content. (It is
not suggested of course that these are discrete categories; they are
merely convenient points of focus.)

R.E. Johnson has noted the high level of correlation between cloze
scores and measures of redundancy (Johnson, 1975: 435) and raises
several points which are relevant to the selection of texts, and other
issues (marking and what it is that cloze tests measure) which are
discussed further at a later stage.

1. Johnson notes that missing words may be inserted without an
   understanding of the passage as a whole, or even, in some instances,
   of the particular linguistic string in which a gap occurs. It may
   therefore be questioned whether the purpose of the cloze test is in
   fact the recovery of the original message. It seems rather to be the
   recovery of the original text.

2. He suggests that passages may prove effective for cloze testing
   because they are highly redundant and that such passages may be
   extremely boring in their unreduced original form. It is arguable
   then at least that texts selected for cloze tests are by normally
   accepted standards poorly written, a strong argument against
   accepting only exact replacements.

On the first point, it is easy to demonstrate that cloze passages may
test something other than the ability of the reader to reconstruct the
original message. The following passage was constructed by Bransford
and McCarrell in such a way as to ensure that the reader finds it
totally incomprehensible regardless of the fact that there are no
linguistic difficulties. The fact that the writer's overall meaning
remains totally obscure does not materially affect the use of this
passage as a cloze test, which gives support to the argument that cloze
tests focus on relatively low order language skills relating to 'core
proficiency' rather than higher order language skills like reading
comprehension (Alderson, 1979: 225).

Clearly, the following passage tests core proficiency in some limited
sense and little else.

    H. If the balloons popped ____ sound wouldn't be ____ to carry
    since everything ____ be too far away ____ the correct floor. A
    ____ window would also prevent ____ sound from carrying, since ____
    buildings tend to be ____ insulated. Since the whole ____ depends
    upon a steady ____ of electricity a break ____ the middle of the
    ____ would also cause problems. ____ course, the fellow could ____,
    but the human voice ____ not loud enough to ____ that far. An
    additional ____ is that a string ____ break on the instrument. ____
    there could be no ____ to the message. It ____ clear that the best
    ____ would involve less distance. ____ there would be fewer ____
    problems. With face to ____ contact, the least number ____ things
    could go wrong.*

    *See Appendix on page 71

In making judgements regarding the linguistic features of a text, then,
it is necessary to decide the extent to which the gaps in a given
passage can be filled regardless of an understanding of the passage as
a whole, and to determine whether or not this is acceptable in view of
the purposes for which the test is being set.

Before advancing any further it is obviously necessary to attempt to
come to terms more closely with what it is that cloze tests actually
measure.

A Transfer Feature Theory of Cloze Items

Reading has been described by Goodman as 'a psycholinguistic guessing
game', a characterization which many people have found intuitively
satisfying. The objective of the game is to achieve understanding, and
it is generally accepted that the meaning of a linguistic string is not
the arithmetical product of the elements in that string, nor is the
meaning of a passage the arithmetical product of the meanings of all
its sentences. On the contrary, the global meanings which result from
the processing of linguistic strings in the short-term memory are
achieved partly in terms of the constraints exercised by the linguistic
items in the string, and partly by means of the non-linguistic
information that the receiver brings to the task of comprehension.
Smith (1971) and others have suggested that what the receiver brings to
the task of decoding is of far greater importance to eventual
comprehension than the linguistic items on the page. This
extra-linguistic data was categorized by Uhlenbeck (1963) as follows:
(1) the situation in which the sentence is spoken, (2) the preceding
sentences, if any, (3) the hearer's knowledge of the speaker and the
topics which might be discussed with him.

These extra-linguistic factors will be referred to here as the
'presuppositional base'. It will be clear
that the failure of the reader to understand the passage quoted earlier
from Bransford and McCarrell, even to its fully restored form, results
from the lack of an adequate presuppositional base.

A recent discussion of cloze procedure by Finn (1977) provides a useful
means for relating the general model of decoding characterized above to
the specific case of completing a gap in a cloze passage. Finn proposes
a 'transfer feature theory' of processing in reading, which combines
Shannon and Weaver's (1949) contribution to communication theory and
Weinreich's (1966) semantic theory.

Finn defines the 'cloze easiness' of a word in
[...]
to be carried by a particular item. (It will be obvious from such
examples as a function word with a low easiness score, that
'information' here is used quite differently from 'meaning' and is in
fact determined without reference to meaning.)

Finn uses the term 'transfer features' to describe those grammatical
and semantic markers and 'distinguishers' which supply redundancy or
which generate expectancy. The term is taken from Weinreich and relates
to the fact that the inclusion of a particular word in a discourse can
dictate some lexical features for other words in the discourse, and
Finn defines 'information in a word' as 'a function of the number of
features not supplied by transfer features in discourse' (Finn, 1977:
520).

In a theory of processing in reading, I find the distinction between
information and meaning inappropriate and unhelpful since the objective
of the processor is to arrive at meaning and not 'information'. In
cloze testing, however, the processor actively seeks transfer features
in order to achieve a high level of expectancy regarding the identity
of a particular linguistic item. In this context, and for my purposes
(unlike the Weinreich model), transfer features are seen as being
derived from the non-linguistic aspects of the presuppositional base as
well as from the text.

Selection of items

It is now possible to return to the consideration of some types of
judgement that might be made in selecting items for gapping.

It has been suggested that one reason for using objective procedures in
gapping a cloze passage is that no adequate theoretical base exists for
making decisions affecting which linguistic items should or should not
be gapped. As the previous section shows, we may not have all the
answers, and may never have them, but we do have a basis for
approaching the task in terms of transfer features, and in particular
in terms of the analysis of the sources of the transfer features, e.g.,
[...]
    tested knowledge derived from sources other than the text, and the
    only grounds for accepting such an item in a cloze test would be
    that the non-textual knowledge forms an integral part of the
    learning programme and/or the inclusion of this presuppositional
    element provides a desirable bias in view of the aims of the test
    (a decision which most public examiners might find it hard to
    defend) or that the information in question is neutral in that it
    is readily and equally accessible to all candidates.
    If the presuppositional requirements for the items cannot be
    justified in these terms, the item should not be used.

 b. If the transfer features are drawn from the text, do they relate to
    meaning or form? Do the transfer features arise out of an
    understanding of the passage as a whole, are they dependent upon
    understanding the immediate context only, or do they depend
    primarily upon collocation or other primarily linguistic features?
    It seems reasonable to suggest that cloze tests can only claim to
    test communicative skills if items which depend on recovery of the
    theme and reasonably precise grasp of the
    meaning of local contexts are emphasised, while those which require
    a purely linguistic response are restricted in number if not
    eliminated. This does not imply an emphasis on content words at the
    expense of function words. It means the selection of items which
    exercise a positive constraint on the meaning of the passage and
    which carry sufficient transfer features from other aspects of the
    text to make that constraint readily apparent to those who have
    achieved the level of language ability which the test is designed
    to evaluate. An example may be helpful at this stage to illustrate
    the notion of constraint exercised by an item on the meaning of a
    passage; a function word is used in the example because the notion
    of constraint on meaning is comparatively obvious in the case of
    content words. Articles have been chosen because automatic deletion
    procedures tend to provide a large number of such items.

    I. Yesterday as I was crossing (1)____ road, dodging cars and
    bicycles to catch (2)____ bus that passes near RELC, (3)____ driver
    swore at me in English, (4)____ first time this has ever happened
    to me in Singapore.

Let us assume that I intend to include at least one item in my cloze
passage which tests control of the distinction between the definite and
indefinite articles, and I have to consider the merits of (1)-(4) above
for this purpose.

(3) 'the' exercises a strong constraint on the meaning since it
    identifies the 'swearer' as the driver of the bus. 'a' also
    exercises a strong constraint in that it eliminates the driver of
    the bus and indicates some other driver.

(4) 'a' is unacceptable, eliminated by transfer features from 'first'
    and exemplifies a collocational pattern which should be familiar,
    e.g., the / first, last, only / time, chance, etc. The meaning
    constraint is extremely localized; however, the item is suitable
    and could be pre-tested.

A subsequent part of the (reconstructed) cloze passage reads as
follows:

    As I got off the bus, I heard the driver call out: 'Mind the
    traffic this time!'

It is now clear from transfer features based on evidence internal to
the text that gap (3) should be filled by 'the'. It is considered to be
an excellent item in that it tests understanding of the text as a whole
and should be included in the pre-testing.

The example given above illustrates another point very clearly. It is
not the case that we do not know what is being tested in a cloze
passage. It seems to me that we can state rather precisely what it is
that each item is testing, and to this extent it could be claimed that
cloze tests are not integrative, but a means of providing discrete
point items which have a somewhat enhanced presuppositional base.
However, the distinction (discrete point/integrative) is not one which
can usefully be applied to the nature of the learner's task (e.g., a
primarily through linguistic transfer features. Rational justification
for the selection of items and a requirement that the pass mark must be
in the 60%-70% range would greatly improve the quality of cloze tests
and cause them to bear more closely upon the aims of the teaching
programme they are used to evaluate as well as promoting more
constructive classroom practices.
[...]
who produced the following in response to the cloze test on Three Blind
Mice that was considered earlier.

    K. Ignoring blind alleys. Three investigated mice, analysed how
    they run, studied how farms run. Three all sought after rich
    farmers. One, who casts off his tails to a deserving
word gap. Provided that the candidate performs those functions which
have been identified as the aim of the particular test, are there
really grounds for rejecting acceptable replacements which consist of
two words or more, rather than one? Is there any reason why the gaps
themselves should consist only of a single word? The gap could consist
of a phrase, a sentence, or indeed of a complete paragraph.

The answers to such questions should be formulated in terms of the
purposes of the tests and the constraints under which the examiners
have to operate. Assertions that such tests would not be
[...]
then does cloze test: receptive ability or productive ability,
understanding or linguistic knowledge?

The rather unsatisfactory answer seems to be almost any combination of
these, but with a premium, at the lower levels of achievement, against
communicative competence (i.e., the ability to gain understanding in
spite of linguistic deficit) and in favour of linguistic competence
(i.e., the ability to identify an acceptable slot-filler in spite of a
communication deficit). A high level of achievement would normally
reflect a balance of communicative and linguistic competencies, but, as
has been noted, accepted achievement levels on cloze tests are
frequently low.

Perhaps the best way in which to approach the question of what it is
that cloze tests test is to look at what the candidate does. It is
sometimes claimed that the gaps in a cloze passage are equivalent to
unknown elements in a text, and that gap-filling samples a natural and
normal reading activity. The claim is false on both grounds. Gaps do
not reflect unknown elements in a text, which are normally low
frequency content words. Gaps in cloze tests are rarely of this type
and often consist of function words. Secondly, whatever the gapping
rate (or average gapping rate if non-random selection is used) the
proportion of unknown elements will be far higher than would be
acceptable for any normal reading purpose.

    Experience has shown that more than twenty-five new words per
    thousand running words usually make a text unduly difficult.
    (Bright and McGregor, 1970: 20)

    Difficult texts can be toiled painfully through but the process is
    dreary, bears no resemblance to reading and is not conducive to the
    establishment of good learning habits. (Ibid., p. 19)

Even a revised claim, that cloze activities correspond to intensive
reading with a difficult text, cannot be sustained; partly because of
the type of item as previously stated, but more importantly perhaps
because what cloze requires is entirely
[...]
    vagueness in guessing the meaning of words must be accepted. The
    teacher should not expect students to come up with exact meanings
    while guessing in this manner. (Kruse, 1979: 209)

This is a very different, and much less precise requirement than that
of filling a cloze gap; it challenges the reader's communicative
competence in the broadest possible terms, while the cloze item
challenges the linguistic competence in very precise terms indeed.

Can Cloze Technique Be Taught?

One aspect of test construction which is rarely considered as seriously
as it should be is the effect that tests have upon the classroom
teaching and learning situation. A great deal has been written and said
in the last twenty years about accountability and the 'contract'
between teacher and learner. It is often assumed that the primary
contract between the English language teacher and learner involves the
acquisition by the learner of a certain competency in the English
language. In these terms, the relevant function of the examiner is to
devise a test which samples the knowledge and skills of the learner in
such a way as to determine whether or not a satisfactory level of
language ability has been achieved. The examiner's task would be much
simpler if this were true, and one major question regarding reliance on
cloze tests could be eliminated. In fact the primary contract
that teachers have with the learners is to get them through their
examinations, which leads to the well-known practice of 'teaching to
the exam'. It should be true that the best preparation for an
examination which tests language ability would be the development of
the learners' language ability, but this is generally not the case,
partly perhaps because examiners are really concerned more with
obtaining a satisfactory distribution of marks than with true
accountability. As a result, teachers have to decide whether to develop
their pupils' language abilities to the greatest possible extent, while
accepting that for most of them this will mean failure in the
examination, or abandon true communicative competence as a goal and aim
instead at the appearance of such competence as demonstrated by the
ability to carry out the tasks required by examiners at a level which
will secure a pass. No doubt, in seeking to acquire these examination
skills, pupils' language abilities improve, but the examination is the
goal and a pass is the motivation.

In these circumstances, it behoves any examiner to bear in mind the
likely classroom repercussions of the selection of a particular testing
technique. The next claim that we might consider regarding cloze is
that the technique for completing such tests cannot be taught and
therefore the only way to prepare pupils is by improving their language
ability. This seems doubtful in view of the differences already
discussed between the activity of completing a cloze test and normal
language activities such as reading, with which cloze bears the closest
superficial correspondence. It seems much more likely that rather
specialised skills can be brought to bear on the task, and that these
skills can be taught.

One way of improving scores on a particular type of test is to give
massive amounts of practice. If the test passages are typically so
difficult that most learners will understand very little of what they
read, then it will be necessary to practise with passages of a similar
level of difficulty. Work with more suitable passages would have little
transfer to the examination task, though it would be more likely to
result in a genuine development of communicative ability.

It has been noted that 'good' readers tend to use the whole text in
determining the appropriate gap-filler, while 'poorer' readers tend to
make mistakes which show that they pay attention only to preceding
text, or to the immediate textual environment (Neville and Pugh, 1976).
One way of improving performance on cloze tests, therefore, though not
necessarily of reading ability, would be to give poorer students
intensive practice in obtaining transfer features from the gapped text
which would assist in determining an appropriate replacement item.

As was noted above, varying proportions of the transfer features on
each item are grammatical. The 'good' second language speaker shares
the native speaker's ability to judge grammaticality to some extent at
least; the poorer reader will be less able to make such judgements and
this will be reflected in the cloze scores. Another approach then would
be to develop an intensive basic programme in grammatical analysis.
This might go some way towards compensating for the deficiency in the
poorer reader, and repay time and effort as regards cloze score
improvement rather better than a programme designed to develop overall
language and reading ability.

Various writers have shown that very high levels of correlation can be
obtained between cloze test results and a range of other tests and
conclude that the cloze test can, under these circumstances, be
substituted for the scores from a whole battery of tests with
consequent saving in time and resources of examiner and learner alike.
Perhaps the most important reason against such a step is the effect
that this would have upon the language teaching programme. Even if it
is true that cloze cannot be taught, teachers and learners will believe
that it can.

Conclusion

The problem in language testing, as in linguistics, is that the chief
result of increasing methodological rigour has been to show how little
we actually understand what we are dealing with. This is true on the
macro level and may always be true since the ultimate questions (what
is language? How is it different from/related to intelligence,
cognition, and the broader issues of communicative competence? etc.)
may prove to be unanswerable. However, it is no solution to hope that
safety from these imponderables lies in the rejection of judgements
based on principled argument; nor is it true that at the micro level
(the level of the construction of tests and test items for particular
groups of learners following specific programmes for specifiable
purposes) we lack resources for making reasonable and reasoned
judgements. By doing so, we can increase both the true validity of our
tests and our understanding of what it is that we are testing, while
minimising the dangers inherent in any situation where an examination
rather than a syllabus or teaching programme may determine what happens
in the classroom.
Robert K Johnson
Appendix

steady flow of electricity a break in the middle of the wire would also cause
problems. Of course, the fellow could shout, but the human voice is not loud
enough to carry that far. An additional problem is that a string could break
on the instrument. Then there could be no accompaniment to the message. It is
clear that the best situation would involve less distance. Then there would
be fewer potential problems. With face-to-face contact, the least number of
things could go wrong.

References

Alderson, J. Charles, 1979, 'The Cloze Procedure and Proficiency in English as a Foreign Language', TESOL Quarterly 13, no. 2, pp. 219-27.
Anderson, J. 1971, 'A Technique for Measuring Reading Comprehension and Readability', English Language Teaching 25, no. 2, pp. 178-82.
Beattie, Geoffrey W., and Butterworth, B.L., 1979, 'Contextual Probability and Word Frequency as Determinants of Pauses and Errors in Spontaneous Speech', Language and Speech 22, pt. 3, pp. 201-11.
Bowen, J.D., 1969, 'A Tentative Measure of the Relative Control of English and Amharic by 11th Grade Ethiopian Students', UCLA Workpapers in Teaching English as a Second Language 2, pp. 69-89.
Bransford, J.D., and McCarrell, N.S., 1974, 'A Sketch of Cognitive Approach to Comprehension'. In Weimer, W.B. and Palermo, D.S. (Eds.), Cognition and the Symbolic Processes, Hillsdale, N.J., Lawrence Erlbaum Assoc.
Bright, J.A. and McGregor, G.P., 1970, Teaching English as a Second Language, London, Longmans.
Cambourne, Brian, 1976, 'Getting to Goodman: An Analysis of the Goodman Model of Reading with Some Suggestions for Evaluation', Reading Research Quarterly 12, no. 4, pp. 605-36.
Carroll, John B., 1971, 'Language and Cognition: Current Perspectives from Linguistics and Psychology'. In Laffey, James F. and Shuy, Roger (Eds.), Language Differences: Do They Interfere?, Newark, Del., International Reading Association.
Carver, Ronald P., 1978, 'The Case Against Statistical Significance Testing', Harvard Educational Review 48, no. 3, pp. 378-99.
Clarke, Mark A. and Burdell, Linda, 1977, 'Shades of Meaning: Syntactic and Semantic Parameters of Cloze Test Responses'. In Douglas Brown, H. et al. (Eds.), On TESOL '77 — Teaching and Learning English as a Second Language: Trends in Research and Practice, Washington, D.C., TESOL.
Finn, Patrick J., 1977, 'Word Frequency, Information Theory and Cloze Performance: A Transfer Feature Theory of Processing in Reading', Reading Research pp. 122-26.
Hildyard, Angela, and Olson, David R., 1978, 'Memory and Inference in the Comprehension of Oral and Written Discourse', Discourse Processes 1, pp. 91-117.
House, E.R., 1977, The Logic of Evaluating Argument, C.S.E. University of California Monograph Series in Education, no. 7.
Jongsma, E.R., 1971, The Cloze Procedure: A Survey of the Research, ERIC ED 058 015, Bloomington, Indiana University.
Johnson, Ronald E., 1975, 'Meaning in Complex Learning', Review of Educational Research 45, no. 3, pp. 425-59.
Kruse, Anna Fisher, 1979, 'Vocabulary in Context', English Language Teaching Journal 33, no. 3, pp. 207-17.
Littlewood, William T., 1979, 'Communicative Performance in Language Developmental Contexts', IRAL, 17, no. 2, pp. 123-38.
Mishler, Elliot G., 1979, 'Meaning in Context: Is There Any Other Kind?', Harvard Educational Review 49, no. 1, pp. 1-19.
Moser, C.A. and Kalton, G., 1971, Survey Methods in Social Investigation, 2nd ed., London, Heinemann.
Neville, Mary H. and Pugh, A.K., 1976, 'Context in Reading and Listening: Variations in Approach to Cloze Tasks', Reading Research Quarterly 12, no. 1, pp. 11-31.
Oakshott-Taylor, John, 1979, 'Cloze Procedure and Foreign Language Listening Skills', IRAL 17, no. 2, pp. 150-58.
Oller, J.W., Jr., 1972, 'Scoring Methods and Difficulty Levels for Cloze Tests of ESL Proficiency', Modern Language Journal 56, no. 3, pp. 151-58.
Oller, J.W., Jr., Bowen, D., Dien, T.T. and Mason, V.W., 1972, 'Cloze Tests in English, Thai, and Vietnamese: Native and Non-Native Performance', Language Learning 22, no. 1, pp. 1-16.
Oller, J.W., Jr., 1973, 'Cloze Tests of Second Language Proficiency and What They Measure', Language Learning 23, pp. 105-18.
Oller, J.W., Jr., and Perkins, Kyle (Eds.), 1978, Language in Education: Testing the Tests, Rowley, Mass., Newbury House.
Riley, Pamela M., 1973, The Cloze Procedure: A Selected Annotated Bibliography, Lae, Papua New Guinea University of Technology.
Rivers, Wilga M., 1968, Teaching Foreign Language Skills, Chicago, University of Chicago Press.
Robinson, Richard David, 1972, An Introduction to the Cloze Procedure: An Annotated Bibliography, Newark, Del., International Reading Association.
Strauch, R.E., 1976, 'A Critical Look at Quantitative Methodology', Policy Analysis 2, no. 1 (Quoted by House, 1977).
Streiff, Virginia, 1978, 'Relationships among Oral and Written Cloze Scores and Achievement Test Scores in a Bilingual Setting'. In Oller and Perkins, 1978.
Stubbs, Joseph Barstow and Tucker, G. Richard, 1974, 'The Cloze Test as a Measure of English Proficiency', Modern Language Journal 58, nos. 5-6, pp. 239-42.
Taylor, Wilson L., 1953, 'Cloze Procedure: A New Tool for Measuring Readability', Journalism Quarterly 30, pp. 415-33.
Uhlenbeck, E.M., 1963, 'An Appraisal of Transformation Theory', Lingua 12, pp. 1-18.
Valette, Rebecca M., 1977, Modern Language Testing, 2nd ed., New York, Harcourt, Brace, Jovanovich.
Wainman, H., 1979, 'Cloze Testing of Second Language Learners', English Language Teaching Journal 33, no. 2, pp. 126-32.
Zurif, Edgar B., and Blumstein, Sheila E., 1978, 'Language and the Brain'. In Halle, Morris, Bresnan, Joan and Miller, George A. (Eds.), Linguistic Theory and Psychological Reality, Cambridge, Mass., MIT Press.
Paul Nation
A major construction problem is the diversion of the river to enable the foundations for the dam to be
excavated and the concrete placed. Since it would be uneconomical to construct diversion works and
cofferdams to divert the full flood discharge of the river, the diversion has been divided into several
distinct operations. The critical period will be during the low-water season, because, as the river falls,
5 the cofferdam on the left bank will be demolished where it crosses the diversion channel, allowing water
to flow through the temporary openings in the dam wall. This having been done, a rockfill cofferdam
will be constructed across the main river channel downstream of the main site. This will cause the water
at the dam site to remain quiescent by preventing any flow in this part of the main river channel and
directing water through the diversion tunnel.
(from Adamson & Lowe, 1971, pp. 106-107)
The diversion has been divided into several distinct operations because
a. the full flood of the river must be diverted.
b. the foundations have to be excavated.
c. a complete diversion is too expensive.
d. it occurs during a critical period.

The rockfill cofferdam
a. will direct the flow through the diversion tunnel.
b. is on the main site.
c. will increase the flow at the dam site.
d. is on the left bank.

it in line 5 refers to ______ in line ______
This in line 6 refers to ______ in line ______
This in line 7 refers to ______ in line ______

The multiple-choice test tells us how much information the reader got from
the passage. We can use the results of this test to classify our learners
into groups (Pass/Fail) or rank them on a scale (A, B, C ...).

The reference word test, on the other hand, can provide us with several kinds
of information. It
whereas This in item 4 refers to a larger unit (the cofferdam on the left
bank will be demolished). The test shows that reference words referring to a
noun group are easier than those referring to a clause.

The reference word test can also give us information about individual
learners. If a learner does not answer item 4 and similar items correctly, we
know that that learner needs help or extra practice with the reference words
this, that, and it where they refer to a clause.

We can contrast the two kinds of tests by thinking of learning as a journey.
The first kind of test, exemplified by the multiple-choice items, tells us
how far the learners have come along the road. The second kind of test,
exemplified by the reference word items, tells us what the road is like, what
difficulties can be found on the journey, and how learners have coped with
these difficulties.

There is another important difference between these two kinds of tests. Good
multiple-choice items, comprehension questions, and true/false statements are
not easy to make. Items like the reference word items however are easy to
make because the items can follow a fixed formula like

This in line 6 refers to ______ in line ______

ures to test to gain information about reading, and we will look at possible
types of items.
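
The point that reference word items follow a fixed formula can be illustrated with a short sketch. It is only an illustration under stated assumptions: the names `ReferenceItem`, `find_reference_words` and `make_item` are hypothetical and not part of the article, the passage is assumed to be supplied as a list of printed lines numbered from 1, and the word search is deliberately naive (it will also pick up uses of that or it that are not reference words, so the teacher still has to choose which candidates to keep).

```python
import re
from dataclasses import dataclass


@dataclass
class ReferenceItem:
    word: str  # the reference word exactly as printed, e.g. "This"
    line: int  # number of the passage line in which it occurs


def find_reference_words(lines, targets=("it", "this", "that")):
    """Collect every candidate reference word with its passage line number."""
    found = []
    for number, text in enumerate(lines, start=1):
        for match in re.finditer(r"[A-Za-z]+", text):
            if match.group().lower() in targets:
                found.append(ReferenceItem(match.group(), number))
    return found


def make_item(item, blank="______"):
    """Fill the fixed formula: '<word> in line <n> refers to ___ in line ___'."""
    return f"{item.word} in line {item.line} refers to {blank} in line {blank}"


if __name__ == "__main__":
    # Two invented lines standing in for a numbered passage.
    passage = [
        "The cofferdam will be demolished where it crosses the channel.",
        "This having been done, a new cofferdam will be built.",
    ]
    for candidate in find_reference_words(passage):
        print(make_item(candidate))
```

The sketch only produces the item stems; deciding which reference words are worth testing, and marking the answers, remains the teacher's judgement.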
Types of test items: language features

Comprehension questions direct attention to the message of a text which is
peculiar to that text. This makes it difficult for the teacher or the
learners to get information about points that need further attention in order
to make it easier to read other texts. The items described in this section,
however, test language features that are important for the understanding of
almost every text. Performance on these items can be used as a basis for
planning further teaching.

1 Noun groups

Much of the complexity at the sentence level is caused by noun groups
containing relative clauses or reduced relative clauses. Here are two
sentences
power depends.) (An industrial demand) has arisen for (entirely new metals
which were
of the noun group in the text.
b. The teacher gives the learners a verb from the text and they copy the
subject and object of the verb from the text. Here are items based on the
examples of noun groups given above.

______ are no longer satisfactory.
______ depends on ______

2 Co-ordination

The following sentence can be divided into three parts.

(1) The main cables of all modern suspension bridges / (2) are fixed to the
tower tops, / and (3) subject the towers to a very heavy vertical load (almost
the learners the verb and they copy the subject and object of the verb from
the text.
tests described in this article to make their own rankings of difficulty.
Examples of reference word items have already been given.
4 Verbs

Verbs typically enter into a relationship with a subject, and an object,
adjunct, or complement. In a sentence like The committee reached a decision,
the subject and object of the verb reach are quite apparent. These
relationships are less apparent in a sentence like After much deliberation by
the committee a decision was finally reached. Sometimes the relationships are
even less apparent when the verb has become a noun. For example, in the
following sentence, some learners will have difficulty in deciding what uses
what and what competes with what.

Sowing the orchard to grass will result in a temporary check to the vigour of
the trees, due principally to the use of the available nitrogen by the grass
and competition by the sward for the moisture.
(NZDA, 1974, p. 50)

We have seen how the What does what? item can be used to test learners'
understanding of noun groups and co-ordination. It is the most efficient way
of testing whether learners see the subject verb (object/complement)
relationship in different parts of a sentence. Here is a sample item.

that a learner cannot cope with these difficulties. If help is to be given,
the teacher must know where the learner is going wrong.

If the home orchard area is small, the problem of the large size to which
most fruit trees grow can be met by planting dwarf or semi-dwarf trees. They
can be grown conveniently along the edge of the vegetable garden without
encroaching on it.
(NZDA, 1974, p. 51)

encroaching
a. What part of speech is it?
b. What does not encroach on what?
c. Which word could you put between the two sentences — but, because or then?
d. What does encroach mean in the text?

If the learners' guess at d is a different part of speech from a then they
need to make a basic change in their strategy. Their guess must be the same
part of speech as the unknown word. b tests whether the learners notice the
immediate grammatical relationships. c tests the learners' appreciation of
the wider context. A breakdown at any one of these points shows where further
practice is needed.

2 Simplifying sentences

The steps in this strategy consist of items mentioned in the section on
language features, namely
a. reference
b. co-ordination, and
c. noun groups.
Here is a sample item.

Simplify the following sentence by replacing all
Here is the answer.

The legs of the Severn towers, on the other hand, have no cells. The visible
outer plating is
This strategy is used whenever the learners meet a sentence that they cannot
understand in spite of knowing the vocabulary (Nation, 1979; Long & Nation,
1980).

Validity

The validity of the test items described in this
article depends on whether the items test real problems in advanced reading
and whether the strategies work.

There is considerable evidence that the items test real problems. Most of the
evidence however comes from studies of children learning their mother tongue.
Researchers have found that children learn items like reference words, or
noun groups, in a particular order. This order seems to reflect the
difficulty of the items. So, children learn relative clauses attached to the
object before relative clauses attached to the subject. In addition,
comprehension tests reveal that children have more difficulty in interpreting
sentences containing a relative clause attached to the subject. The small
amount of experimentation done with learners of English as a foreign language
supports the findings of first language learning research. But there is need
for more research with foreign learners. Teachers can carry out much of this
research in their own classroom by combining the use of the test items
described in this article with translation checks and individual interviews
with learners. Evidence about whether the strategies work can only come from
their use. They have been useful in my teaching but teachers should not
accept them uncritically. The question of validity should be the concern of
all teachers.

Conclusion

The test items described in this article direct attention towards structural
features of a reading text and to analytical strategies. There are several
reasons for this. Firstly, performance on the items provides the teacher and
learners with feedback which can result in appropriate help and practice with
features that occur in almost every reading text. Secondly, the items allow
teachers to investigate learning. That is, teachers can act as experimenters
and develop and validate rankings of learning difficulty of structural
features. Through their testing they can gain new insights into language and
how it is learned. Thirdly, the items direct attention to language as a
system. An important educational goal of language learning is the development
of an interest in language for its own sake. An awareness of the system
behind language is a step towards this goal.

It needs to be stressed that the items described in this article are not
offered as substitutes for comprehension questions. They are useful
additional tools which may lead to a greater understanding of learning and a
corresponding improvement in teaching.

References

Adamson, V. and Lowe, M.J.B., General Engineering Texts: English Studies Series 9, Oxford University Press, London, 1971.
Barnitz, J.G., 'Syntactic effects on the reading comprehension of pronoun-referent structures by children in grades two, four and six', Reading Research Quarterly 15, no. 2, 1980, pp. 268-89.
Long and Nation, Read Thru, Longman, Singapore, 1980.
Nation, I.S.P., 'The curse of the comprehension question: some alternatives', RELC Journal Guidelines 2, 1979, pp. 85-103.
N.Z.D.A. (New Zealand Department of Agriculture), The Home Orchard, Government Printer, Wellington, 1974.
Brian Heaton
The need to improve tests of writing does not seem to have received as much
attention in the past decade as the need to improve techniques for testing
grammar and lexis. However, even during the heyday of the psychometric and
structural school of testing, composition writing, together with the oral
interview, provided an essential
mark is deducted for each mistake made) were used at one time by some
examiners in an overriding desire to achieve reliability, misguided and
inhibiting methods of this nature were on the whole short-lived, and were
soon replaced by analytical and impression methods of scoring. The advantage
of setting two or more short realistic writing tasks in place of one literary
type of essay has also been recognised for many years in several widely
administered examinations.

Nevertheless, in spite of its many merits, free composition is far from being
the only means of testing the writing skills. Neither is it necessarily the
most reliable means. Controlled composition testing, still in its infancy,
offers a reliable way of measuring a limited number of identifiable skills at
a time. Although by no means appropriate for every situation in which writing
is tested, controlled composition may be useful in many progress and
diagnostic tests, simply because it can help teachers to identify, and
concentrate on, specific areas of difficulty. Moreover, the controlled
writing task itself can be made far less time-consuming.

What does controlled composition measure which a test of grammar by
definition does not usually measure? Most tests of grammar are concerned with
the recognition and manipulation of correct forms of language and operate at
the sentence level (though good grammar tests often operate beyond the level
of the sentence). Tests of written composition, on the other hand, should
concentrate primarily on those aspects of grammar and syntax which relate to
meaning in a piece of discourse rather than meaning in isolation: e.g.
reference features, connectives, substitution devices, omission. In addition,
a controlled composition test may seek to identify and measure such judgement
skills as appropriacy of style, register, rele-
are also ways of testing writing beyond the level of the sentence, using much
shorter stretches of language. Samonte and Sharwood-Smith cite the use of a
two-sentence text which measures an ability to form a coherent unit of
language. For example, candidates may be instructed to write a sentence to
precede the statement 'Moreover, it was impossible to open the windows'.
Example responses could be 'It was very hot in the small room', 'There was
only one fan in the room, but it was broken', 'The door slammed behind John,
and he realised he was locked in the room', etc. Though in all cases students
are required to demonstrate an awareness of the communicative nature of
language in general, and cohesive devices in particular, there is a large
degree of subjectivity and freedom of response in this type of item, as can
be seen from the following example stimuli.

(i) There is one here, too.
(ii) To do this, the water must first be boiled.
(iii) These should then be carefully sorted.
(iv) For wild life, however, there are even greater dangers in the pollution
of rivers, lakes and seas.
(v) But there is no reason to be so pessimistic.

The degree of control (and objectivity) in testing composition writing can
vary widely. The following is an example of a closely controlled composition
task which resembles in many ways a test of grammar. However, it should be
remembered that the composition is intended to test solely an ability to use
appropriate connectives and reference devices in connected discourse.

A travel agency in Bangkok is now arranging day tours to foreign countries by
jumbo jet. ______, you can leave Bangkok very early in the morning and have
lunch at the Taj Mahal. ______ you can fly to New Delhi ______ do some
shopping ______ returning to Bangkok at midnight. ______, you may leave
Bangkok much later in the morning ______ spend a day shopping in Singapore,
returning home in time to watch the evening news on television.

Control can be relaxed to varying degrees; at the other extreme, as
illustrated by the following example of written work based on information
given in tabular form, very little control need be exercised.

Propaganda
Means    Result

Regardless of the extent of the degree of control operated, however, it is
important that controlled composition tests should never become unnatural
exercises involving the performance of mental somersaults on the part of the
students. Tasks requiring students to form sentences according to certain
patterns (e.g. 'Combine the following sentences, using which and although.')
often result in all kinds of errors simply by forcing students to guess what
was in the examiner's mind, adopting unfamiliar and alien lines of thought.

All written work, whether free or controlled, should be carried out as far as
possible in a communicative context. When we speak, we are generally aware
that we are addressing another person: hence there is a basic desire —
whether conscious or subconscious — to communicate with that person. Because
the writer often finds himself addressing a general audience which is far
less clearly identifiable than in normal speech situations, such a desire is
not so apparent in writing. Consequently, it is all the more important in
tests of writing to provide both a context and a purpose which the student
can have uppermost in his mind when carrying out the writing task. Indeed, in
everyday situations in real life we rarely write without a particular purpose
in mind, whether it is a letter, an article, a report, or notice.
Consequently, even the two examples of controlled composition previously
given would benefit from such rubrics as:

The following paragraph has been taken from a newspaper report. The writer
hopes to interest his readers by giving surprising information about new
kinds of day tours. Rewrite the paragraphs, inserting a suitable word or
phrase in each blank.

and

Use the following table to provide information about the influence of
propaganda. As your readers are largely ignorant about the effects of
propaganda, try to convince them of its power over people.

Where it is important in the test to concentrate on a particular register,
students can be instructed to write a letter to a friend, an article for a
newspaper, a report drawing certain conclusions, a memorandum advising
someone, etc. The following two topics from the Joint Matriculation Board
Test in English Overseas illustrate how writing tasks can be put into a
communicative context which provides both motivation and guidance for
candidates. It is also worth noting that these items appeared many years ago.

Example (i)

Imagine that a British friend of yours has recently gone to live in your
country. You have arranged for him to stay for a week-end with some relative
of yours. They are eager to welcome your friend, but have never met any young
people from Britain before.

Your friend is very frank, sincere and likeable, but he has many casual ways
that you think might upset your relatives. Your friend is often
untidy and unpunctual, treats older people as equals (in a very friendly way
of course), and likes to argue about subjects such as politics and religion.
Your friend will probably be ignorant of the customs observed in your country
when visiting people.

Write a letter to your friend, who may be male or female, whichever you
prefer. Describe your relatives briefly, and advise your friend how to behave
towards them. Write between one and one-and-a-half pages.

Example (ii)

The following table gives information about the cost of sending 27 tons of
office machinery from London to New York by air and by sea:

Imagine that you work for the company that wishes to sell this machinery to
America. You have been asked to write a short report advising your company
whether to send it by sea or by air. In addition to the above information,
you know that most other British companies that export office machinery send
it by sea, and that the American customers want quick delivery. The delivery
time by sea is six weeks; by air it is two weeks. Furthermore, the total
value of the 27 tons of office machinery is £115,000.

Write your report to your company. It should not be in letter form. Write
between one and one-and-a-half pages.

It is necessary, however, to sound a cautionary note when devising writing
tasks which stress the communicative nature of language. Creating cultural
obstacles should be avoided when devising realistic tasks: for example,
however tempting it may be to instruct candidates to write notes for
imaginary milkmen, such a writing situation would be quite alien to students
in Thailand — or even in Germany.

In many compositions, assessment of writing performance is sometimes based
largely on the number of grammatical errors made. As will be readily
appreciated, the resulting score using this method bears little relation to
the effectiveness of a student's ability to express himself freely in
writing. The scoring of tests of free-writing has, in fact, long been the
subject of considerable research. Briefly, marks may be awarded on what the
testee has written; on what it is thought the testee meant by what he wrote;
on handwriting and general appearance of the composition; or on previous
knowledge of the student. Furthermore, it is possible (and not unusual) for
two markers to differ widely in the spread of the marks they award, their
strictness and their rank ordering of papers. Indeed, whether the analytical
method or the multiple-marking method (often referred to as the impression
method) is used, examiners award marks chiefly on the basis of their
impression of the students' work.

Whatever method is used to assess written work, care should be taken to avoid
an excessive concern with the manipulation of language forms. At the other
extreme, of course, is the attitude typified by the statement 'I know what
the candidate is trying to say' — an attitude which too frequently reflects a
desire to interpret on behalf of the student at all costs and which results
in a neglect of the grammatical and even the communicative aspects of the
written language.

It is precisely in the attitude to grammatical errors where the testing of
the writing skills differs from the teaching of these skills. In the teaching
of writing, attention may be concentrated on the correction of high-frequency
errors or of those errors which are least acceptable to native-speakers.
Alternatively, the teacher may wish to correct only those errors which are
related to the particular language forms currently being taught. In tests of
writing, however, attention ought to be paid primarily to those types of
grammatical errors which impede written communication. As a result of
examining such kinds of errors, Burt and Kiparsky (1972) advocate classifying
errors according to whether they are global or local errors. They define
global errors as those which involve the overall structure of a sentence,
causing the reader to misunderstand a message or even fail to understand it
at all. Misuse of connectives, relatives, pronouns and other reference
devices, wrong sequence of tenses, incorrect word order and inadequate
lexical knowledge as
well as serious mis-spellings and wrong punctuation can usually be classed as
global errors. Local errors, on the other hand, comprise those errors which
cause trouble in a particular constituent or clause in a sentence and which
do not significantly hinder the comprehension of the sentence. They include
misuse of articles and prepositions, lack of agreement between subject and
verb, incorrect affixes, wrong verb forms and the incorrect position of
adverbs. Burt argues that this distinction between global and local errors
provides the most useful criteria for determining the communicative
importance of errors, claiming that the correction of one global error helps
to clarify a message far more than the correction of several local errors.

In a study based on the work of Burt and Kiparsky, Tomiyana examines the
various ways in which grammatical errors can distort written messages. The
results indicate that such local errors as those caused by the omission and
wrong choice of articles are easier to correct and hence less crucial to
successful communication than the
or even failing to understand a message and a local error as a linguistic
error which renders a structure, etc. awkward but which nevertheless does not
give rise to any real difficulty in understanding the intended meaning of a
sentence. The whole area of global/local or communicative/linguistic errors
is a rich one for further research and may well provide a systematic method
for assessing free-written work with deeper insight.

Finally, the whole question of time should be considered when administering
tests of writing. While it may be important to impose time limits in tests of
reading, grammar and lexis, such constraints may well be very harmful in
tests of writing, increasing the sense of artificiality and unreality.
Moreover, the fact that candidates are expected to produce a finished piece
of writing at their very first attempt adds to this sense of unreality. How
often in real-life situations is anyone expected to write something without
having a chance to produce one or more preliminary drafts first? Not only
should students be given sufficient time to produce preliminary drafts of
whatever they write but they should be actively encouraged to do this in any
test of composition. If writing tests are made far more realistic and
relevant to real-life situations, emphasis will automatically be placed on
writing as a communicative activity.

March 1968, July 1968.
Samonte, Aurora L., 'Techniques in Teaching Writing', RELC Journal, Vol. 1, No. 1, 1970.
Sharwood-Smith, Michael, 'Courses in Written English - Some Comments and Words of Caution', ELT Documents, (73/1), 1973.
Sharwood-Smith, Michael, 'New Directions in Teaching Written English', Forum, Vol. XIV, No. 2, April, 1976.
Tomiyana, M., 'Grammatical Errors Communication Breakdown', TESOL Quarterly, Vol. 14, No. 1, March 1980.
J P Boyle
Passing your driving test means being able to drive the car well, not simply knowing the Highway Code and a manual on engine maintenance. Language testing nowadays, like language teaching, also stresses the ability to do something with the language, not merely to know about its formal characteristics: the rules of use are seen to be important as well as the rules of grammar. Alan Davies' Survey Articles in Language Teaching and Linguistics: Abstracts (i) outline well the new awareness of the problem of testing communicative competence as well as formal knowledge. Interestingly, he ends his careful and wide-ranging survey with the question: 'Is communicative testing feasible?'

Whatever our feelings on this, most people with experience in language testing would agree that a good test will contain both questions of a general nature and questions on more specific details, in

'And this function (of literature in the state) has to do with the clarity and vigour of any and every thought and opinion. It has to do with maintaining the very cleanliness of the tools, the health of the very matter of thought itself. When their (writers') work goes rotten — by that I do not mean when they express indecorous thoughts — but when their very medium, the very essence of their work, the application of word to thing, goes rotten, i.e. becomes slushy and inexact, or excessive and bloated, the whole machinery of social and individual thought and order goes to pot. This is a lesson of history, and a lesson not half learned.' (ii)

So the first justification for keeping language teaching — and therefore testing — in touch with literature is that literature, being language at its
English. But with language, as with medicine, the specialist must first be an expert in his general field. And with language that general field is human nature in action — the realm of literature.

In this context, then, let us see what kind of tests would be appropriate to students of literature. Rather than speak in general terms, I will give examples where possible.

1. Reading

a. A short story is read, e.g. Frank O'Connor's My Oedipus Complex. Specific comprehension questions can be asked:
(i) What particular delight did climbing into his mother's bed in the morning give the boy?
(ii) What incident brought the boy and his father

into this house, I'm going out.' Father stopped dead and looked at me over his shoulder. 'What's that you said?' he asked sternly. 'I was only talking to myself,' I replied, trying to conceal my panic. 'It's private.'

b. The literature student must be able to discriminate in reading, between facts that are non-essential and others which are of central symbolic significance. In D.H. Lawrence's The Horse Dealer's Daughter, for example, the relevance of the pool (into which the doctor wades to rescue the attempted suicide) could be questioned. This type of question tests not simply accuracy in remembering all the facts of a story — 'vacuum-cleaner reading', as it has been called — but the power to appreciate the importance of certain facts — like

every nth word, or deleting on a more rational basis; allowing only the exact word or accepting reasonable alternatives. For literature students the cloze is probably more valid, certainly more relevant, if something like the following can be given. It is taken from the text of an interview with Joyce Cary, the novelist, contained in the Paris Review Interviews, Writers at Work.

INTERVIEWERS: Have you read The Bostonians? There was the spellbinder.
CARY: No, I haven't read that.
INTERVIEWERS: The Princess Casamassima?
CARY: I'm afraid I haven't read that either. Cecil is always telling me to read her and I must.

ization, cultivated and sensitive, ______ fearfully exposed to frauds and ______ brutes and grabbers. This was ______ tragic theme. But my world ______ quite different — it is intensely ______ a world in creation. In ______ world, politics is like navigation ______ a sea without charts and ______ men live the lives of pilgrims.

In this example I have deleted mechanically every sixth word. To guess the correct word in some cases may perhaps seem extremely difficult, even for the native speaker, e.g. 'static', which fills the seventh blank, or 'exposed', which fills the fourteenth. However, two things must be remembered: first, that such a short passage is not a real example of a cloze test; secondly, that with a
longer passage, the same mental mechanics go on as in crossword puzzle solving — clue 3 across seems impossibly difficult, until you get helped by discovering the answer to clue 5 down. Similarly with cloze, a word guessed later on in the passage gives a clue to an earlier word, e.g. 'dynamic', which fills the fourth last blank, would not be impossible to get, particularly with the help of 'a world in creation' immediately following it. And once 'dynamic' has been guessed, in the context of the whole passage, the earlier seventh blank might well be correctly filled too, 'static'.

INTERVIEWERS: Have you read The Bostonians?

must have Proust — in his very different world of change. The essential thing about James is that he came into a different, a highly organized, a hieratic society, and for him it was not only a very good and highly civilized society, but static. It was the best the world could do. But it was already subject to corruption. This was the center of James' moral idea — that everything good was, for that reason, specially liable to corruption. Any kind of goodness, integrity of character, exposed that person to ruin. And the whole civilization, because it was a real civilization, cultivated and sensitive, was fearfully exposed to frauds and go-getters, brutes and grabbers. This was his tragic theme. But my world is quite different — it is intensely dynamic, a world in creation. In this world, politics is like navigation in a sea without charts and wise men live the lives of pilgrims.

shades of human emotion. Basic feelings can be taken as the starting point. Then the student must find six words within the range of that broad feeling, and use the six words in a sentence. For example, ANGRY might turn up such words as: raging/cross/peeved/annoyed/furious/fuming. JOY: happy/glad/overjoyed/pleased/cheerful/delighted. FEAR: terrified/afraid/dreading/apprehensive/anxious/nervous. LOVE: adore/like/be fond of/be attached to/be devoted to/be infatuated with.

Another way is to give a picture of a face which expresses a complex of emotions. The Mona Lisa would be a good example. The student has to describe the face. Or a pair of pictures can be used, e.g. a self-portrait of Rembrandt and a self-portrait of Van Gogh. Works of art are usually more useful for this, in that they are more enigmatic, less explicit than most photographs. But good photographs can readily be found too.

b. Students of literature are no different from other ESL students in their tendency to make grammatical errors. I am concentrating here on test-types which seem to be particularly relevant,

down, because of the slip, 'felled' or 'failed'. I am not saying the student should not be 'felled' for such errors in a grammar test. Since simile and metaphor are so important in literature, simple tests can be devised. The student has to complete the sentence imaginatively.
(i) 'His face was wrinkled like ______.'
'Her hair was flowing like ______.'
'The old lady's teeth were black like ______.'
(ii) More difficult would be examples with two blanks:
'His face was ______ like ______.'
'Her hair was ______ like ______.'
'The old lady's teeth were ______ like ______.'
(iii) With less control will come more variety and scope, as well as more difficulty:
'His ______ was ______ like ______.'
'Her ______ was ______ like ______.'
'The old lady's ______ was ______ like ______.'

(ii) 'The butler bowed and (heaved/shoved/passed/donated) over the letter.'

More creatively, the student can be asked to write a short passage which deliberately aims at humour by means of mixing registers. P.G. Wodehouse is the master and model for this.

d. More global tests of writing would include composition, more or less controlled, or a free response to a text, tape or film.
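
To return for a moment to the cloze procedure under Reading: the two decisions mentioned there (deleting mechanically every nth word, and allowing only the exact word or accepting reasonable alternatives) can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The names make_cloze and score_cloze, the blank marker, and the simple whitespace splitting are hypothetical choices made for this sketch, not a procedure taken from the article.

```python
import string

BLANK = "______"
# Punctuation stripped from the ends of a token before it counts as a word.
# The escapes add the long dash and curly quotes to the ASCII set.
PUNCT = string.punctuation + "\u2014\u2018\u2019\u201c\u201d"


def make_cloze(text, n=6, first=6):
    """Blank out every n-th word, starting with word number `first`.

    Free-standing punctuation is not counted as a word, and punctuation
    attached to a deleted word is left in place.
    Returns the cloze passage and the list of deleted words.
    """
    out, answers, count = [], [], 0
    for tok in text.split():
        core = tok.strip(PUNCT)
        if not core:  # a token that is punctuation only
            out.append(tok)
            continue
        count += 1
        if count >= first and (count - first) % n == 0:
            answers.append(core)
            out.append(tok.replace(core, BLANK, 1))
        else:
            out.append(tok)
    return " ".join(out), answers


def score_cloze(responses, answers, acceptable=None):
    """Count correct responses.

    With `acceptable` left empty this is exact-word scoring; otherwise
    `acceptable` maps a blank's index to extra words the marker allows.
    """
    acceptable = acceptable or {}
    score = 0
    for i, (given, exact) in enumerate(zip(responses, answers)):
        allowed = {exact.lower()} | {w.lower() for w in acceptable.get(i, ())}
        if given.strip().lower() in allowed:
            score += 1
    return score


passage = ("fearfully exposed to frauds and go-getters, brutes and grabbers. "
           "This was his tragic theme. But my world is quite different")
cloze, answers = make_cloze(passage)
print(cloze)
print(answers)
```

Applied to this stretch of the Cary passage, the sketch deletes 'go-getters', 'his' and 'is', which agrees with the blanks in the example above. Whether a response such as 'the' for 'his' should score is exactly the marker's decision between exact-word and acceptable-alternative scoring, and it is passed in through the acceptable argument rather than built into the code.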
3. Listening

a. Drama on tape/radio. After a play has been listened to, response of a general nature can be required. What was the play really about? Or more specific answers could be demanded: Match each of the following with one of the characters in the play — wily/forthright/exuberant/discreet/retiring. A problem with this type of question, of course, is that words describing character tend to be subtle in their nuances. However, this brings us back to the type of exercise under Writing, 2a.

b. Story. Again, after hearing a story, a global response can be required: briefly retell the story, making sure you include what seems to you to be the main point. A more specific question would be: Which of the following titles best suits the story you have heard? This would be a multiple-choice question, in effect, with one of the answers more obviously defensible objectively.

c. Poetry. I have said very little on the use of poetry in testing the language ability of literature students. It is of limited value, it seems to me, to use a poem for structural questions, asking the student to put the poem into 'plain English'. This can take the heart out of poetry, though in some cases, with extremely difficult poets (e.g. Hopkins), this exercise will be almost necessary for ESL students. A more relevant type of test question will test the literature student's ability to appreciate the way poetry charges words with different levels of meaning. 'To read poetry adequately a student must not only have a command of lexis in the sense that he knows how to use a number of words; he must also know a number of possible uses for any given word.' A good example would be W.B. Yeats' The Song of the Old Mother.

d. Oller makes much of dictation as a reliable test-type. For straight dictation, probably such occasional essays as J.B. Priestley's Delight, for instance, would be the most suitable.

A more difficult test, of the Dicto-Comp type, would be to read a section from a modern play, preferably a radio-play, and then to ask the student to fill in the missing dialogue, as well as can be remembered. The following brief example may serve as an illustration. This example deletes too much in too short a space, and is therefore more difficult than a real test; it merely gives the idea of the test-type. It comes from a prize-winning radio play by Jennifer Phillips, Daughters of Men. (vi)

KATE: Age. Photos, of course you can have retouched.
ANNE: What?
KATE: Bahama had her photo taken every week.
ANNE: She told you?
KATE: No, Boy did. But he doesn't understand. You can't retouch the image in the mirror.
ANNE: But you have so much else to draw on, Kate inner strengths.
KATE: And shall I be much freer.
ANNE: Oh, yes.
KATE: Without a child.
ANNE: But you won't be without.
KATE: Best to be prepared, isn't it? On the defensive. I always went round before exams at school saying I'd fail, didn't you? Saying I had to fail because I hadn't worked at it.
ANNE: The thought of exams struck me dumb.
KATE: Your driving test! Oh, I'll never forget that. And the lead up to it. And then I practically carried you there.
English Language Tests, (vii) For testing appreciation of the meaning of a text, his recommendation of combining specific features with general fluency seems sound. For students of literature, the dimension of reproducing feeling or emotion in a text should also be tested, with some sort of dramatic reading. This kind of test will, of course, be affected by all sorts of extraneous factors, and these can hardly be overcome.

library of suitable video plays can easily be built

true of language teachers that, by their tests, you shall know them. This article has done no more than suggest a few test types which will show our students that we do respect the relevance of literature and that we do consider educational values in our teaching and our testing.

Plays of 1978, Eyre Methuen.
Patricia L McEldowney
world relationship between the two types of information device must be altered if a valid language test is to be developed. In the examples cited above, the two stand side-by-side to supplement

position in a preposition group, indicates that perianth refers to an object. Then, the linkers this and of associate it with the word flower. Similarly, corolla and calyx are marked as nouns by the and
their relative sentence positions while the language item of indicates that corolla and calyx are parts of the perianth. In addition, the language item inside indicates the relative positions of the calyx and the corolla. In this way, language information signals the type of referent of each content item and also indicates the relationships between them.

We note here that to have known the information about the perianth, calyx and corolla as expressed in the extract would indicate the possession of learning. Not to have known but to have been able to find out in the way described above indicates the possession of a tool for learning. Seen in this way, therefore, the ability to use language to discover content seems to be a very basic comprehension skill worthy of testing.

How might we go about developing such a test? In light of the discussion above this can probably most clearly be illustrated with regard to a text like Globbes in which the content items are unknown to us.

the pals which tote the calyth. The jigs inside this are the tals toting the colnth.

In an attempt to test candidates' skills with regard to using their language knowledge to discover content, we might ask questions like:

Test Type A
(i) What are the four trug jigs of the globbe?
(ii) Where are the soils?
(iii) What totes the colnth?

It can be argued that success in finding the answers to such questions demonstrates some skill in using language knowledge. For instance, in (i), a solution word which provides a noun shows a response of the correct sort to the question word what; the provision of four nouns shows a response to the code item four; the solution colls, soils, pals and tals shows, in addition, an ability to match the general sentence structure of the question and the statement in the text. In (ii), the provision of a preposition group shows a response of the appropriate type to the question word where; the candidate who provides the group around them demonstrates further an awareness of general sentence structure; and the provision of around the colls

shows, in addition, an awareness of the function of them to refer back to colls.

In this way then we may be able to find some evidence of grammatical skill. Can we be sure, however, that the correct response necessarily demonstrates anything more than mechanical manipulation?

We note that it is quite possible that correct responses are triggered by information in the question and a familiarity with a manipulative technique rather than being a demonstration of any real processing of the information in the text. For instance, test questions of the type illustrated in A will be no problem to candidates who have had classroom practice (oral or written) similar to:

Answer the questions.
Example: Is the rope around the parcel?
Yes, it's around the parcel.
Is the table in the corner?
Is the tree in the garden?
Are the two boys in the tree?
The four cats are in the basket.
The three pencils are in the box.

Such practice constantly pairs question and statement forms so that, given one form, the correct response is to produce the other and it is not clear from a comprehension test incorporating the same principle whether a candidate is capable of producing the appropriate answer if he does not have the relevant information supplied in the question. That is, it is not clear to what extent he really 'understands' the text.

In an attempt, therefore, to ensure a demonstration of some processing of the information we might develop:

Test Type B
(i) List the components of the globbe.
(ii) What surrounds the colls?
(iii) Describe the construction of the colnth.

In these questions a rephrasing of the concepts expressed in the text demands a greater 'understanding' from the candidate. So, in (i), for instance, the use of the synonym components for jigs eliminates a direct clue from the question. The
knowledge of the appropriate response to the instruction List together with a response to the noun + s form (components), an awareness that the question is being asked about the globbe and a knowledge that, where globbe occurs before are, the relevant information follows the verb are all factors that will help a successful candidate to produce colls, soils, pals and tals. Thus, without a knowledge of the word components, a candidate who produces the appropriate response is more likely demonstrating a spontaneous use of his language knowledge than was the case in Test Type A.

We note, however, that if a candidate had known the meaning of components and if he had known that it is a synonym for jigs, he would have

for finding content, the more content items a reader or listener has at his command, the more efficient is his comprehension. For instance, if in the first sentence of the Globbes text we know

our comprehension task would have been much easier. That is, in real-life the efficient reader uses a combination of language information and known content to discover unknown content. It would seem, therefore, that a clue in the question of the type illustrated by components in B(i) and surrounds in B(ii) is justifiable in a way that the

There are sometimes two wongs in the polnth. The outside one is the colnth. It is toted by the tals.

This, however, highlights a difficulty for the examiner that is inherent, to a lesser degree, in all of the other items illustrated in A and B. In B(iii), though we might agree on the relevant number of points, different candidates will express them in very different ways and examiners' responses to these will also be very different. This situation is likely to demand subjective judgements from individual examiners as to whether responses are correct or not. Moreover, the questions demand a

verbal response. Many candidates may make grammatical mistakes — a further potentially subjective decision for the examiner. We can ignore such grammatical mistakes in our marking and so go some way towards isolating the comprehension skill. Even if we do this, however, we are not going far enough towards ensuring that poor productive skills do not hinder the demonstration of comprehension. Though certain candidates may 'understand' the text, their productive skills may be too weak to enable them to demonstrate even a small proportion of their understanding.

It can be seen, therefore, that the rephrased question type of B should be made objective. A common solution is that demonstrated below:

(i) The main parts of the globbe are the:
wongs, soils, polnth, jigs [ ]
colls, soils, pals, tals [ ]
(ii) In the globbe the soils surround the:
tals [ ]
colls [ ]
polnth [ ]
calyth [ ]

Though we have, in this way, allowed for objectivity, Test Type C embodies another problem also inherent in A and B. All three types of questioning are fragmentary.

On the whole, we read and listen for two main reasons. At times we wish to follow exactly what is being said. On other occasions we wish to find information that is incidental to the speaker or writer's purpose. In the latter case we might, for instance, skim through an outline of the events leading up to the sinking of Bismarck in World War II merely to find the names of the ships involved, ignoring the sequence of events. In this case the
skill of isolating fragmentary detail seems to be of relevance and it may well be that Test Type C is valid from this point of view.

It seems clear, however, that we also need to test whether candidates can follow a writer or speaker's intent. In this case, we require a demonstration of an awareness of the whole — some demonstration of how individual parts fit together.

In Test Type C items (i) and (iii) emphasise the first concept while (ii) deals with spatial arrangement. The same is true of A and B(i) and (ii). We note in B(iii), however, there is an attempt to broaden the question so that it covers both of the author's purposes. The constraints of objectivity already discussed, however, make it very difficult to construct global questions in the genre illus-

Test Type D
(i) Use the words in the box to complete the diagram.

globbe
Inside    Outside

calyth, colls, colnth, pals, polnth, soils, tals
(ii) b. Find suitable words for the circles labelled A, B, C and D in the diagram. Use words from the box above to complete the following. Write X if there is no suitable word.

Circle A =
Circle B =
Circle C =
Circle D =
Circles A + B =
Circles B + C =
Circles C + D =

We note, at this point, that the display for (i) is more abstract than that for (ii). We could offer a third alternative which would be even less abstract. We could provide a drawing of a flower and ask candidates to label the parts.

In this way, we can test an awareness of the whole as well as that of spatial arrangement at whatever level of abstractness seems appropriate to our particular candidates. Moreover, Test Type D does this while still maintaining the criteria of the elimination of overt clues from the question, of objectivity and of the elimination of verbal production.

We now note a further important advantage of Test Type D. With appropriate language skill it is possible to produce correct verbal responses to questions like those illustrated in C without there being any assurance of a real-world knowledge of the forms being used. For instance, we might read

John drew a rectangle, coloured it green and added a cross flunger the bottom trig corner.

and respond correctly with flunger the bottom trig corner to the question

Where did John draw the cross?

We would, however, be at a loss if asked to draw John's diagram.

The move from verbal to non-verbal information illustrated in Test Type D thus gives the candidate the opportunity to demonstrate his ability to 'visualise' the relevant spatial relationship and so indicate a degree of real-world or content understanding.

Production

If comprehension can be defined as a means of using language and known content to discover new content either for one's own purpose or to mirror the author's purpose, production, in both the spoken and written modes, can be defined as the skill of using language and content information to fulfil a specific purpose. For instance, given a body of information about coracles, the production of instructions for making one, or a description of its appearance, or a classification of various types, or a narration of how one was used on a particular occasion will each require use of a cluster of different language items and a different organisation of the content information (see McEldowney, P., Test in English (Overseas) The position after ten years, Joint Matriculation Board, OP 36, September 1976).

If we wish to assess the tools of such expression, it is important, as indicated above in the discussion about the testing of comprehension, that we isolate the thing we wish to test in such a way that our impression of linguistic performance is not blurred by any extraneous factors.

Let us consider how this might be carried out.

Test Type E
(i) Describe how to make a coracle.
(ii) Describe what a coracle looks like.

Items like this, though directed towards a specific purpose, demand a prior knowledge of coracles. If we intend to test such knowledge, then these items might well be valid — perhaps in a local history or general studies paper. If, however, we intend to test language proficiency, a candidate who has no knowledge of coracles has nothing to write about and so cannot demonstrate his productive skills.

Does this mean our choice of topic is at fault? Can we, rather, choose topics previously prepared or topics known to be within the experience of our candidates? In either case we are asking candidates to depend on memory of content and cannot, in fact, be sure that a lucky 'question-spotter' has not learned an essay or speech off by heart. We have not, in fact, isolated the ability to use production tools from some closely associated assessment of content knowledge.

It would seem, from this point of view, that, as is implicit in the discussion of comprehension testing above, the choice of a topic which is largely unfamiliar to our candidates might well provide us with a better means of isolating the language tools we wish to test. This suggests that our test item needs to provide the basic information to be used in the production task.

We might do this by providing a text and asking questions of the type illustrated above in B(iii), or by asking candidates to write a precis of the text. Such tasks, however, place too great a reliance on comprehension skills and are no more valid, therefore, than Test Type E in isolating production skills. Moreover, they allow for the (verbatim)
copying of stretches of the original, and the organisation of the original more often than not provides the framework of organisation for the candidate to

or (iii) supply a sequence of pictures outlining a story with the rubric:
Tell the story of Old Joe and his coracle
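
The earlier point about Test Type A, that a candidate drilled in pairing question and statement forms can answer without processing any content, can be illustrated with a small Python sketch. The function name mechanical_answer and the short list of prepositions are assumptions made only for this illustration. The rule does nothing more than rearrange words already supplied in the question.

```python
import re

# A deliberately small, assumed list of prepositions for this sketch.
PREPOSITIONS = "around|in|on|under|inside|outside"

QUESTION = re.compile(
    rf"^(Is|Are) the (?P<subject>.+?) (?P<prep>{PREPOSITIONS}) (?P<place>.+)\?$"
)


def mechanical_answer(question):
    """Turn 'Is/Are the X <preposition> the Y?' into a 'Yes, ...' statement.

    The rule never consults the meaning of X or Y: it swaps the subject
    for a pronoun and reuses the rest of the question unchanged.
    Returns None when the question does not fit the drilled pattern.
    """
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    pronoun = "it's" if match.group(1) == "Is" else "they're"
    return f"Yes, {pronoun} {match.group('prep')} {match.group('place')}."


for q in ("Is the rope around the parcel?",
          "Are the soils around the colls?",
          "What surrounds the colls?"):
    print(q, "->", mechanical_answer(q))
```

The rule handles a question about soils and colls as readily as one about a rope and a parcel, although nothing in it knows what either refers to. It returns nothing for the rephrased Type B question, where the wording of the question no longer supplies the answer.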
Arthur Godman
material which is extended to include communication in academic subjects. Final exemplification will be given from science subjects, as, in these subjects, the content of a sentence must conform with known concepts in relation.

1.1 Lexis
An academic subject has a linguistic register expressed in grammar and a restricted vocabulary. A sound knowledge of the lexis is the first requisite of competence in an academic subject. Each academic subject has its own lexis, and the same term can have different denotations in different subjects. It is thus important to ascertain whether a student associates the correct concept with a particular term when he meets it in a specific situation. In para. 5.2, question 1 contains the term 'blood pressure'. The term 'pressure' varies with the subject under discussion. In general speech it describes the concept of a force applied by means of an area, e.g. as in a trouser press. In politics it describes, these days, the lobbying of politicians and acts which endeavour to influence them, but the meaning is diffuse. In science it describes force per unit area, and force per unit area can be expressed in millimetres of mercury. When confronted with question 1, how will the student react to the term 'pressure'? In order to be competent in science, for example, he must know the precise definition.

1.2 Grammar
Competence in language depends on the understanding of the semantic implications of grammar. Firstly, an understanding of the morphological processes words can undergo is necessary. Secondly, the syntax of clause and sentence must be under-

irrespective of the observer, hence the question requires an answer describing an accepted method. Answers 1 b., c., d., e. show that this has been understood.

1.3 Cohesion
Cohesion in a text necessitates several factors, mainly expressed in grammatical functions and by the use of adjuncts. As this paper puts forward a tentative suggestion for the analysis of sentences and not for the analysis of discourse, the subject is not discussed further. The evaluation of competence in the construction of sentences must be determined before discourse can be analysed.

2.1 The question
In most tests a question is presented to a student, who then supplies an answer. The first stage of this process involves the student understanding the question, and this, too, has to be evaluated. The student is required to understand the lexis and the implication of the grammar. These two interact to produce a semantic content. This interaction can be illustrated by question 1 in para. 5.2. The key lexemes are obviously 'measure' 'blood' 'pressure' with 'blood' and 'pressure' interacting. In several Oriental languages, the sentence would be reduced to 'How measure blood pressure?' and the answer reduced to 'Doctor measure blood pressure', indicating specialist knowledge is required and shifting responsibility on to an acceptable observer. In English, the answer would be, baldly stated, 'By a sphygmomanometer' (SPHYGMO — pulse; MANOMETER — a pressure measurer). This, too, really evades the question, but would be an acceptable answer. If, on the other hand, the question was
The student must first marshal his concepts and choose suitable terms for such concepts. He must then select the necessary interrelationships between the terms and choose suitable syntax and morphological structures to express the interrelationship. The reader of the answer, for perfect communication, should have complete congruity of concepts with the writer. The sum total of the answer includes not only the lexis and the individual items of grammar, but also their interaction. The whole of the answer is thus greater than the sum of the parts, and this provides a semantic content which is in addition to lexis and grammar.

3.1 Difficulties encountered in evaluation
Objective-type questions produce an objective score; they test passive as well as active vocabulary, but they do not test the student's ability to express himself. Structured questions can be used to test active vocabulary alone, and also to test sentence structure. Structured questions do involve a degree of subjectivity in scoring. Essay-type questions always test all stages of sentence composition, but are highly subjective for scoring. Cloze tests examine lexis and syntax to some degree, but tend to test passive knowledge rather than active application.

para. 5.2 indicates the degrees of success that can be obtained in eliciting a positive response.

4.1 Evaluation
The concept in an answer must be capable of being marked objectively, and then the interrelationships

and hence different methods of scoring the evaluation could be produced. If competence in language is to be evaluated, then it is first necessary to ascertain the student's knowledge of scientific concepts in the semantic area to be examined. For example, in question 1 of para. 5.2, does the student have any knowledge about blood pressure and its measurement? Such knowledge can be pre-tested by simple recall, using objective-type questions. This reduces the possibility of incorrect lexis through ignorance if the subject is shown to be known to the student.

4.2 Subjective evaluation
Evaluation of language competence needs quantitative measurement. Looking at the range of answers in para. 5.2, a quantitative evaluation would seem extremely difficult. Yet there is a parallel in the evaluation of oral English in which a subjective rating is made of different qualities. The factors to be measured in evaluating competence in written language are (a) lexis, (b) syntax, (c) morphology and (d) semantic content. Factors (b) and (c) comprise grammar, (a) and (d) comprise the content of communication. Competence in content is more important than competence in grammar as the following two sentences illustrate.

(A) the student knows the correct state of affairs in relation to the question. In (B) the student has no idea of the state of affairs in relation to the question.

On the basis of the facts outlined so far, it is suggested that 60% of a score should be given to
content and 40% to grammar. The mark for grammar can be split equally, 20%-20% for each of morphology and syntax. The mark for content has to be split between lexis and semantic competence in indicating interrelationships. Three schemes are suggested for experiment. Lexis is considered for focal lexemes in a sentence. Syntax is considered for logical order, and the use of pre-

Scheme A maximises content, perhaps more suitable for evaluation of an academic subject. Scheme C minimises content, perhaps more suitable for evaluating language competence.

5.1
The questions and answers in para. 5.2 have been selected to show average performance by overseas students. With a mark out of 2, 3 or 4, and eliminating half-marks, a simple subjective scale is formed for each factor. By examining ten sentences, a subjective pattern of a student's performance can be ascertained. By examining the average score for a class, each factor can be evaluated to see which of them is weak or strong in sentence structure. Evaluations of the questions in para. 5.2 are given in para. 6.2.

5.2 Questions and answers in science examinations

Q.1 How is the blood pressure of a man measured?
Answers:
a. The blood pressure of a man measured is cooler than the woman's blood. Because the blood in the man body is very little than the woman's blood.
b. The blood pressure of a man measured is by a special of measurement which is put along muscle of our arm.
c. The blood pressure of a man measured is by using the checking of the diabetes disease instruments.
d. The blood pressure of a man measure by tieing a place of cloth like thing to the person arm and tie plump it where it is related to a thermatore in the other box where it is show.

b. Human urine contain large of salts this make the plant death.
c. When a human being eating the plant (vegetables) with human urine that person will easily be injected by a disease.
d. Because harmful germs are presented in urine, and it has not been properly washed before cooking it may transmit disease.
e. It may be dangerous because if that person may have any infectious disease.
f. Because in human urine it consists of mineral salts.

Q.3 Why does Table 2 (No. female mosquitoes trapped/time of day) give results for female mosquitoes only?
(Note: diseases are spread only by female mosquitoes.)
Answers:
a. Female gives more mosquitoes and more son.
b. Because the mosquito landed on human beings to take their meals.

Q.4 Why does a farmer use a nitrogen fertilizer for the cereal crop under these conditions?
Answers:
a. Because they need fertile soil to growth.
b. Can grow better after long time, the long the time is the more the crop grow.
c. To enable the cereal crop for consumption.
Q.5 Why did the water not rise to the 600 cm³ mark?
(Note: 25 cm³ of air in 100 cm³ of soil added to 500 cm³ water in a measuring cylinder)
Answers:
a. Because the water fill up the soil hole.
b. Because the soil completely press due to the water which are shake.
Q.6 What would be a suitable precaution to be taken by most people to avoid being bitten by this species of mosquito?
(Note: this species bites in the middle of the night.)
Answers:
a. Throw unwanted can which contain water.
b. Most people would use mosquito nets before
Q.7 Explain why the information given by the graph and histogram agrees with a suggestion that female mosquitoes lay their eggs before seeking a blood meal.
Answer:
Because the time lowest number of mosquitoes are trapped has the highest number of eggs laid and verse-visa.
Q.8 What is the effect of a nitrogen fertilizer on the root crop under these conditions?
(Note: progressive use decreased crop yield.)
Answers:
a. The root crop does not take nitrogen fertilizer and after six years it has drops to -10%.
b. The root crops under a nitrogen fertilizer will
Q.10 Why does a female mosquito need a blood meal from a human being or other mammal?
Answer:
A female mosquito rely human being for blood for reproduction.
The candidates answering these questions were overseas students and had been exposed to eleven years' teaching of English language.
6.1
In para. 4.1 it was stated that the students' knowledge of related concepts should be tested by objective-type tests. The following test questions illustrate how this is envisaged.
Q.1 In what units is the blood pressure of a man measured?
a. calories
This objective-type question tests the background knowledge required for question 1 in para. 5.2. The answers from such a question would show that answers 1a. and f. in para. 5.2 could be anticipated, as these students would have chosen answer b. above.
Q.2 What danger is there in eating unwashed vegetables?
a. The disease of scurvy may be spread
b. The high salt content can increase blood pressure
c. Intestinal diseases may be spread
d. The dirt on the vegetables can cause gangrene.
The choice of answer would show whether pupils are aware of the method of transmission of intestinal diseases, knowledge necessary to formulate a
Question 1 Question 2
Lexis 2.0 2.5
Syntactical order 1.0 1.2
Morphology 1.1 1.0
Semantic content 0.4 0.7
Total 4.5 5.4
Maximum 10 10
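The totals in the table are the sums of the four factor averages. The following Python sketch reproduces them and also groups the factors into content (lexis and semantic content) and grammar (syntactical order and morphology). It assumes that the 60%/40% split suggested in para. 4.1 applies to the 10-mark maximum; the function and variable names are illustrative only and are not part of the original scheme.

```python
# Class-average marks per factor, copied from the table above.
AVERAGES = {
    "Question 1": {"lexis": 2.0, "syntactical_order": 1.0,
                   "morphology": 1.1, "semantic_content": 0.4},
    "Question 2": {"lexis": 2.5, "syntactical_order": 1.2,
                   "morphology": 1.0, "semantic_content": 0.7},
}

# Grouping of factors into content and grammar, following para. 4.1.
CONTENT_FACTORS = ("lexis", "semantic_content")
GRAMMAR_FACTORS = ("syntactical_order", "morphology")


def summarise(averages, maximum=10):
    """Return content, grammar and total marks for one question.

    Assumption: 60% of the maximum is available for content and
    40% for grammar, as suggested in para. 4.1.
    """
    content = round(sum(averages[f] for f in CONTENT_FACTORS), 1)
    grammar = round(sum(averages[f] for f in GRAMMAR_FACTORS), 1)
    total = round(content + grammar, 1)
    return {
        "content": content,
        "content_max": 0.6 * maximum,
        "grammar": grammar,
        "grammar_max": 0.4 * maximum,
        "total": total,
        "percent": round(100 * total / maximum, 1),
    }


if __name__ == "__main__":
    for question, averages in AVERAGES.items():
        print(question, summarise(averages))
```

Read this way, the class scored 2.4 of a possible 6 for content and 2.1 of a possible 4 for grammar on Question 1, which matches the reported total of 4.5.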
The small number of items does not permit any useful analysis to be made. With more students, each contributing ten sentences, a qualitative evaluation of basic weaknesses in the four areas under investigation could be made for the entire set of students. This information would be in addition to the evaluation of the efforts of individual students.
6.3
Any academic subject could be used for the purpose of evaluation of language competence. Certain restrictions arise, however, in the evaluation of language for overseas students or non-native speakers of English. The questions must be culture-free, which eliminates English literature, particularly for non-Indo-European students. For native English speakers, a wide range of academic subjects would be most suitable, as this follows the advice of Halliday et al. (i), who stated that the teaching of English should not be limited to an exclusively arts subject.
(i) Halliday, M.A.K., McIntosh, Angus, Strevens, Peter, The Linguistic Sciences and Language Teaching, Longmans, London, 1964.
AEG Pilliner
Evaluation
To evaluate is to make a judgement of the worth or value of something. The dictionary definition is useful in highlighting the subjective nature of the evaluative process. Different evaluators will not necessarily arrive at similar judgements of the same educational programme. One may endorse a mathematical syllabus because it produces high levels of concept mastery. Another may retort that children who have been exposed to it still cannot add, subtract, multiply or divide. A foreign language course may be commended for the command of vocabulary and control of structures it offers to students. It may be open to criticism if it affords them few opportunities to develop communicative skills. Much depends on the values the evaluator brings to bear in arriving at his/her judgements.
We begin by asking three questions. What are we to evaluate? How do we set about it? When do we do it? What we are to evaluate (to judge the value or worth of) is an educational programme or project defined, with Astin and Panos, as 'Any ongoing educational activity which is designed to produce specified changes in the behaviour of the individuals exposed to it'. (i)
Traditionally, educational evaluation has been identified with curriculum evaluation. The definition proposed above is both broader and narrower. Examples of educational programmes are: a single classroom lesson; a visit to a museum or factory; a particular method of instruction; the content of a particular text-book; a remedial programme; the environment in which learning occurs; the study of parental attitudes to the education of their children; the re-organisation of a school system. Clearly, a massive programme such as the last mentioned above will comprise a whole range of smaller and different programmes or sub-programmes, designed to modify people's behaviour in different ways; and since the programmes are different, the methods used to evaluate them will also be different.
This brings us to our second question. How do we evaluate? We start from the premise that evaluation involves the collection of information about the impact of the educational programme. How should we collect this information? What tools are available? Which are appropriate in evaluating which programmes or sub-programmes?
There is no doubt that for many educators and researchers the task of evaluating educational programmes is associated mainly with the construction and administration of achievement tests. It goes without saying that the assessment of student achievement or progress is an essential component of the evaluation process and that well-constructed achievement tests can contribute importantly to such evaluation. In this context, evaluators need to consider which of the two test styles, norm-referenced or criterion-referenced, is the more suitable for their purpose. Unless that purpose is to rank students (which is scarcely an educational objective), the evaluator will normally opt for criterion-referencing procedures.
To repeat, assessment of achievement is an important element in the evaluative process. But it is only one such element. Others may have an equal claim to importance: attitude scales, questionnaires, probes of opinions of students, parents, teachers, communities; and explorations of other non-cognitive aspects. All or some of these several modes of obtaining information may need to be deployed in the global task of evaluation. Otherwise, the danger exists of painting an incomplete or even a distorted picture if the search for information is restricted to those aspects of the educational programme which are more readily measurable at the expense of those less so. In technical terms, validity may be sacrificed to reliability. Stake makes the point cogently: 'It is a great misfortune that the best-trained evaluators have been looking at education with a microscope, rather than with a panoramic viewfinder'. (ii)
The third question was concerned with when evaluation should take place. In an important seminal article Scriven (iii) has distinguished between evaluation occurring during the educational programme (he calls it formative evaluation) and evaluation deferred until its conclusion (summative evaluation). Broadly speaking, the distinction he makes is between 'How are we doing?' and 'How did we do?' More specifically, formative evaluation refers to data emerging on taking stock at some intermediate stage, leading probably to slight modification or possibly even to substantial design of subsequent procedures. Summative evaluation, on the other hand, refers to an evaluation of the effectiveness or success of the pro-
gramme as a whole after it has been completed. Particularly with extensive programmes, both formative and summative styles are essential. It would be unrealistic to suppose that no change need ever be made from initial plans. The programme would be pointless if no-one were concerned to establish its overall and final effectiveness.
The roles of the formative and summative evaluator are in strong contrast. Though both are concerned in making judgements, their standpoints are very different. The essential thing in formative evaluation is close cooperation between evaluator and programme developer, interplay and involvement in smoothing out difficulties as they occur and in maintaining momentum. The essential thing in summative evaluation is total independence on the part of the evaluator, disinterest and uninvolvement and commitment only to dispassionate analysis and reporting.
Let us sum up so far. Evaluation was defined as judging worth. We have discussed what is to be evaluated: an educational programme defined as an on-going activity designed to modify people's [...] (but by no means all) of the tools available for evaluating a programme. We have drawn a distinction between formative and summative evaluation, the first occurring during the operation of the programme, the second at its conclusion.
So much for 'what', 'how' and 'when'. There remains the question 'Why evaluate?'
The fundamental purpose of evaluation is to produce information and use it to make decisions about an educational programme. The operative word is decisions, stressed here in order to bring out the distinction between, on the one hand, educational evaluation used in making decisions which may directly affect the futures of many people, and, on the other hand, educational experimentations aimed at extending the boundaries of knowledge but without special regard to its immediate practical utility.
Evaluation, then, is about decision-making. A decision might be to continue an existing programme, to terminate it, or perhaps to modify it. Or it might be to develop a new programme with a view to possible adoption.
In principle, the process of decision-making should go something like this. First, the programme planners should specify some educational objective or set of objectives and in due course devise and implement some means of accomplishing these objectives. Second, the evaluator should bring to bear whatever tools are deemed appropriate to assess the extent to which these objectives have in fact been accomplished. Third, decision-making should occur either during (formative) or at the conclusion (summative) of the programme.
Let us look a little more closely at what is involved here. In regard to the first point, the planner assumes an implicit causal relationship between the stated objectives and the means proposed to promote them. In regard to the second point, the evaluator assumes that the tools used in assessment are valid indicators of the extent to which this causal relationship exists and the objectives are achieved.
In practice, neither of these assumptions is necessarily valid. On the planning side, the provision of a computer in every classroom will not necessarily lead to an enhanced grasp of mathematical concepts or better mathematics learning in general, though it may do so. The installation of a well-equipped language laboratory may or may not lead to an improvement in students' language performance. Again, it is part of the folk-lore that a more favourable staff-student ratio will improve the quality of school education. It may, or it may [...] as a result of applying achievement tests alone, that education has failed to benefit from the provision of a new area school when there are in fact handsome dividends in the way of improved relationships between community and school staffs which other assessment techniques might have brought to light.
Let us summarise again. An evaluation procedure has three aspects: an educational programme in which there is an assumed causal relationship between the stated objectives and the means proposed to achieve them; an accumulation of relevant information about the extent to which the objectives are achieved by these means; and the use of this information to reach a decision about how best to operate the programme in the future.
An educational programme comprises three components which for evaluation purposes it is useful to keep conceptually distinct. These are: inputs, process and outputs.
Inputs, sometimes called antecedents, include the talents, skills and other potentials for growth and learning that the students bring with them to the educational programme. They also include the characteristics of the students' families and of the culture in which they live. The child who comes from a family or culture which values educational achievement is more likely to benefit from school than one less fortunate in this respect. Regional or cultural differences in input may give rise to
quite different outcomes even with the same programme.
Process, sometimes called operations, includes those characteristics of the educational programme itself which affect, or could affect, the outcome. Process includes curricula, experimental treatments, learning strategies, instructional techniques, teacher styles, educational interventions, environmental experiences; in short, the whole range of environmental variables that characterise the educational programme, the means by which the educational ends are to be achieved.
Outputs are the ends or objectives of the programme, otherwise referred to as criteria, outcomes, goals, achievements or dependent variables. They are sometimes expressed at a high level of abstraction (for example, the development of critical thinking). The trouble with such outcomes, desirable though they are, is the practical difficulty of assessing the extent to which they are achieved. Evaluation is likely to be more efficient if the outcomes are capable of more specific statement: pupil achievement, knowledge, skills, attitudes, aptitude for future learning, inter-personal relationships. Such outcomes are more readily assessed using currently available instruments: achievement tests, attitude scales, questionnaires, interviews and the like.
These are strictly pupil-oriented outcomes, needing little justification. But there are other outcomes best described as intermediate: a reduction in operational cost, recruitment of highly qualified staff. These tend, only too easily, to become regarded as ends in themselves. There are two reasons for this. First, they are more readily specified. Second, their achievement is more easily measured. It is easier to demonstrate a per pupil reduction in expenditure than to monitor the possibly unfavourable consequences for the pupils. Administrators may proudly announce an increase in the proportion of graduate teachers. They have yet to show that pupils' development has improved in consequence.
Also to be taken into account are unintended outcomes or 'side-effects'. For instance, loss of identity with family or community is perhaps too high a price to pay for high academic achievement. Again, class grouping by ability, while enabling the brighter children to achieve their potential, may discourage those in lower groups to the extent that their performance is uncharacteristically poor. On the other hand, mixed ability grouping on egalitarian grounds may hold back the brighter child. The conclusions drawn from an evaluative study may be incomplete or even misleading unless the possibility of such unintended outcomes is taken into account.
To summarise once more: an educational programme has three components. First, an input: a condition existing at the start, the status of the student (his/her aptitude, previous experience, interest, willingness). Second, process: encounters of student with teacher, student with student, student with environment, the succession of engagements which the educational process comprises. Third, outputs: student achievements, attitudes, aspirations, resulting from the educational experience: the consequences of education, immediate and long-range, cognitive and affective, personal and community wide.
Analysing the programme in this way helps evaluators to pay due regard to each of these three components. They have a dual role to play: first, through accumulating information about the programme in all its aspects, they must provide a full description of it; and secondly, on the basis of this description, they must arrive at judgements on the programme in order to reach decisions about it: whether to recommend its continuation, modification or abandonment.
Robert Stake (ii) has proposed a model which brings together all of these aspects of evaluation. The diagram shows a layout of statements and data to be completed by the evaluator.
   Description matrix                      Judgement matrix
Intents   Observations                 Standards   Judgements
   1           4          Inputs           7           10
   2           5          Process          8           11
   3           6          Outputs          9           12
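The numbering in the diagram runs down each column in turn, so the twelve cells can be represented as a small lookup structure. The Python sketch below is only an illustration of that layout: the column names are those the article uses for the two matrices, while the function names are hypothetical and not part of Stake's model.

```python
# Rows and columns of the layout; cells are numbered down each column.
ROWS = ("Inputs", "Process", "Outputs")
COLUMNS = ("Intents", "Observations", "Standards", "Judgements")

# Which matrix each column belongs to.
MATRIX_OF = {
    "Intents": "Description",
    "Observations": "Description",
    "Standards": "Judgement",
    "Judgements": "Judgement",
}


def cell_number(column, row):
    """Return the cell number (1 to 12) for a column and row."""
    return COLUMNS.index(column) * len(ROWS) + ROWS.index(row) + 1


def build_cells():
    """Map each cell number to its (matrix, column, row) position."""
    return {
        cell_number(column, row): (MATRIX_OF[column], column, row)
        for column in COLUMNS
        for row in ROWS
    }


def paired_cells(column_a, column_b):
    """Pair cells row by row for a horizontal comparison of two columns."""
    return [(cell_number(column_a, row), cell_number(column_b, row))
            for row in ROWS]


if __name__ == "__main__":
    cells = build_cells()
    print(cells[8])                                   # position of cell 8
    print(paired_cells("Intents", "Observations"))    # intended vs actual
    print(paired_cells("Observations", "Standards"))  # actual vs acceptable
```

The two calls to `paired_cells` correspond to the two comparisons an evaluator makes: intents against observations, and observations against standards.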
Inputs, Process and Outputs, the components of the programme already discussed, have their place in both matrices, Description and Judgement. The Description matrix is further divided into Intents and Observations; and the Judgement matrix into Standards and Judgements. Each matrix thus contains six cells.
The first column in the Description matrix is a declaration of the educational programmer's intent, a statement of the programme as originally planned so as to achieve the global objectives specified in the box on the left. Cell 1, input, describes the students to be included, their number and distribution, their prior achievements, their backgrounds, their environments and any other information about them he/she considers should be seen as Input. Cell 2, process, indicates the processes he/she intends to operate with these students: the special teaching he/she hopes they will receive, the new equipment, materials and text-books he/she hopes will be available to assist this special teaching: in short, the whole range of processes he/she hopes to engage the students in so as to achieve the output hopefully specified in Cell 3.
The chief concern of the evaluator with the Intents column will be the logical relationships vertically displayed in this column. Are the intended processes specified in Cell 2 logical in the light of the intended input in Cell 1? That is, are the lessons, learning experiences, equipment, etc. specified in Cell 2 appropriate for the students described in Cell 1? Moreover, is it logical to expect the outputs specified in Cell 3 if the operations listed in Cell 2 are conducted with the students described in Cell 1?
Still in the Description matrix we move from the hopeful statements of the Intents column to the harsh realities of the Observations column, which is a statement of what, in the event, actually happened. Horizontal comparison of Cells 4, 5 and 6 with the corresponding Cells 1, 2 and 3 will indicate the extent to which original intentions were or were not achieved in input, process and output. Cells 4 to 6 should indicate not only the extent of these short-falls but also the modifications and adaptations of the original plan these short-falls made necessary.
In making these horizontal comparisons, Cells 1 and 4, 2 and 5, 3 and 6, the evaluator should have in mind these questions. How far adrift is actuality from intention? How different from those originally intended are the inputs, processes and outputs that actually occurred? Has the programme been so materially altered as to be no longer capable of providing answers to the questions originally asked?
In summary, the descriptive aspects of the evaluator's task are: First, to assess the extent to which the educational programme as analysed in the Intents column reflects the basic educational purpose stated in the Objectives box.
Second, to assess the extent to which the three Intents aspects are logically connected (the extent to which the intended programme makes sense).
Third, to describe the extent to which intended inputs, process and outputs correspond to what actually happened.
We now turn to the Judgement matrix.
First, the Standards column. Its purpose is to indicate acceptable levels or standards for inputs, process and outputs. What is acceptable is partly a matter of experience and partly one of judgement. The declaration of intent in Cells 1-3 of the Description matrix is translated, in this Standards column, into a statement of what the evaluator is prepared to accept. However carefully planned the original programme may have been, it is unlikely that all contingencies will have been foreseen and that no problems will be encountered in practice. The Standards column is a statement of the extent to which the evaluator is prepared to settle for less than perfection, always provided that the success of the programme is not materially prejudiced by this degree of tolerance. In the inputs cell (7), a limited departure from the complete randomisation of student input envisaged in the corresponding cell in the Intents column (1) may not be disastrous. In the process cell (8) a slight fall below the specified teacher-student ratio (2) may be tolerated. In the output cell (9) 85 per cent of students achieving mastery in a criterion-referenced achievement test instead of the 90 per cent hopefully specified in the Intents column (3) is not to be despised. In short, the Standards column is a realistic statement from the evaluator of the several criteria by which the educational programme's success or failure is to be judged.
Finally, Judgements in the last column are based on the degree of matching between Observation entries in the Description matrix and Standards entries in the Judgement matrix. The procedure is first, to compare, then, to judge. To what extent does the actual course of events, as recorded in the Observation column cells, measure up to the criteria supplied by the corresponding Standards cells? Against all the odds, maybe, some, though not all, of the parents in a rural community have been persuaded that their daughters would benefit from formal education. A hostel is built to accommodate women teachers. Potential success or cer-
tain failure of this educational programme will depend on whether women teachers can be persuaded to live and work within the community.
The word 'criteria' has just been used in the context of comparisons between Observations and Standards. We are becoming increasingly familiar with the notion of criterion-referenced testing and the underlying concepts. It is suggested that an extension of the notion of criterion-referencing be made to the present wider context. It may be helpful to think of the comparison between corresponding cells in the Observations and Standards columns as criterion-referenced.
It is not to be expected that every evaluation plan will take account of every aspect of the Stake model. The point is that this analysis indicates twelve sub-areas within which and among which evaluation can take place. Emphasis will vary from one educational programme to another. The evaluator must clarify his responsibility by answering questions such as these: is the evaluation to be primarily descriptive, primarily judgemental, or both? Is it to emphasise input conditions, processes or outputs alone, or a combination of all three, and their logical connections? Is it to be concerned with the degree of correspondence between what is intended and what occurs? In seeking answers to questions such as these, the evaluator may hope to keep all his options in mind and to establish priorities among them.
References
(i) Astin, A.W., and Panos, R.J., 'The Evaluation of Educational Programmes'. In Educational Measurement, (Ed. Thorndike, R.L.), American Council on Education, Washington D.C., 1971, pp. 733-751.
(ii) Stake, R.E., 'The Countenance of Educational Evaluation', Teachers College Record, 1967, Vol. 68, pp. 523-40.
(iii) Scriven, M., 'The Methodology of Evaluation'. In Perspectives of Curriculum Evaluation: AERA monograph series on curriculum evaluation, Chicago, Rand McNally, 1967, pp. 39-83.
Frank Chaplen
in Kuwait University. In our teaching situation, the yearly intake of each group of students is divided into 5 classes taught by different teachers, each class following the same weekly teaching/study programme as the others, and each class taking the same tests and examinations as the others. Thus, each semester in the 4-semester premedical and paramedical English programme each premedical student and each paramedical student is assessed on virtually the same scale of achievement as every other premedical and paramedical student. To achieve this requires a rather more complex and systematized approach to evaluation than is necessary in some teaching situations. Nevertheless, our experience should be of interest to any teacher or administrator who is responsible for developing assessment procedures.
2. Setting Target Dates for Tests and Examinations
For several reasons the dates for examinations need to be fixed far in advance. Teachers prefer this, students demand it, and administrators can be extremely unsympathetic if you try to give them only one week's warning of the fact that you need a large examination room with film projection or video facilities from 8.00 to 10.00 a.m. In our experience, a host of problems can be alleviated if a list of target dates and responsibilities, such as that in Table 2.1, is routinely prepared at the beginning of each course. It reduces arguments later if this is prepared during a meeting of all the teachers involved in the course.
3. Weighting the different parts of the assessment component
The term 'assessment component' is intended to encompass all the forms of assessment on the basis of which a student's final grade for a course is decided. In our teaching situation, these include the following:
(b) tests and examinations.
For the first 15-week course in our programme the weighting of the different components is as follows:
Test 1 (after 30 class hours)                10%
Mid-semester Exam (after 70 class hours)     20%
Test 2 (after 110 class hours)               20%
Final Examination (after 150 class hours)    40%
Teacher's assessment                         10%
                                            100%
The difference in the weighting of Test 1 (10%) and the Final Examination (40%) is intended to take account of two facts. First, the students come straight from secondary school, so few of them know what is expected of them at university for at least the first several weeks. Second, the comparatively heavy weighting of the Final enables slow starters to compensate for low achievement earlier in the course.
The teacher's assessment is intended to provide students with some incentive for working continuously both in and out of class throughout the course. It also provides the teacher with an opportunity to evaluate elements of the course which it is difficult to measure in a formal test or exam, e.g. oral communication. In earlier years, we gave a weighting of 20% to the teacher's assessment, but this tended to have an adverse effect: the weaker students copied their homework assignments from those written by the more proficient students (usually in other classes so that a direct check by the teacher was impossible). Since this defeated the purpose of the teacher's assessment element, the weighting was dropped to 10%. This seems sufficient to persuade the weaker students to work reasonably consistently while not being enough to encourage them to go to the trouble of copying the work of more proficient students.
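The weighting for the first course can be illustrated with a short calculation. The Python sketch below uses the five weightings listed above; the maximum raw marks and the sample student's marks are invented purely for illustration, and the function names are hypothetical. Each raw mark is scaled to its weighting by multiplying by the weighting and dividing by the maximum mark for that element.

```python
# Weightings (per cent) for the first 15-week course, as listed above.
WEIGHTS = {
    "test_1": 10,
    "mid_semester": 20,
    "test_2": 20,
    "final": 40,
    "teacher": 10,
}

# Hypothetical maximum raw marks for each element (not from the article).
MAX_MARKS = {
    "test_1": 50,
    "mid_semester": 100,
    "test_2": 80,
    "final": 120,
    "teacher": 10,
}


def weighted_mark(raw, maximum, weight):
    """Scale a raw mark so that full marks are worth `weight` per cent."""
    return raw * weight / maximum


def course_total(raw_marks):
    """Add up a student's weighted marks to give a total out of 100."""
    return sum(
        weighted_mark(raw_marks[name], MAX_MARKS[name], WEIGHTS[name])
        for name in WEIGHTS
    )


if __name__ == "__main__":
    # Invented raw marks for one student.
    student = {"test_1": 40, "mid_semester": 65, "test_2": 60,
               "final": 90, "teacher": 7}
    print(course_total(student))
```

Keeping the weightings in one table makes it easy to check that they add up to 100% and to change a single weighting, such as the teacher's assessment, from one course to the next.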
In the second, third and fourth courses, the teacher's assessment element is increased to 20% because by this time all but a few students recognise that they will make little progress except through their own unaided efforts.
The number of marks for each element in the assessment component will vary, of course. The maximum mark for Test 1 might be 115, that for Test 2, 95, and that for the Final, 135. Therefore, to obtain the desired weighting, the marks obtained by students in each element must be converted.
Let us assume that the assessment component of a course consists of the 3 elements listed in Table 3.1, and that their maximum marks and desired weightings are as entered in columns 2 and 3 (see page 105).
To convert Test 1 marks to the required weighting (20%), multiply each student's mark by 20, then divide the result by 115. To convert Test 2 marks, multiply by 30, then divide by 95. To convert Final Exam marks, multiply by 50, then divide by 135.
The task of calculating these conversions (i) is considerably lightened if a class mark grid like the one in Table 3.2 on page 105 is constructed at the beginning of each course. This grid also simplifies record keeping, and makes it relatively easy for a second person to check each teacher's calculations. A cheap electronic calculator is an essential tool in these operations.
4. Deciding a student's final grade for a course
End-of-course results are rarely expressed as percentages or raw marks because these are not very meaningful. A mark of 80%, for example, may represent an outstanding achievement in one course, but only an average achievement in another. For this reason, course results are commonly reported on a letter scale: a grade of A representing an outstanding achievement, B representing an excellent achievement, etc.
In our teaching situation, we are required to report course grades on the following 10-point letter scale:
A
A- OUTSTANDING
Consequently, when the students' total marks for a course have been calculated (column G in Table 3.2), these have to be converted to letter grades.
When only one teacher is involved in deciding which students should receive A's, which should receive B's, etc., the conversion of marks to letter grades is a relatively painless process. However, when 4 or more teachers are involved, each responsible for 15 or more of the total number of students, decisions are far more difficult to make. Most teachers identify very strongly with their students, and would like to see the majority receive high grades; but this is not always possible, particularly if the students are assigned to classes on the basis of placement test results in order to produce fairly homogeneous groups. The procedure that we have evolved over the past 5 years to decide the students' final grades seems to satisfy both teachers and students.
The first step in this procedure is to prepare a distribution of final marks for each class (columns 1, 2, 3 and 4 in Table 4.1), and for the entire intake (column 5). Column 6 contains the cumulative frequency of final marks, e.g. 43 students scored 73% or above, 56 scored 61% or above. Column 7 contains the cumulative frequency percentage of the final marks, e.g. 66.2% of the 65 students scored 73% or above, 86.2% scored 61% or above. Column 7 provides a convenient check in later years on the comparative standards of successive intakes of students.
Note that columns 1, 2, 3 and 4 contain tallies. These are entered on the distribution sheet in the following manner: one person reads out the final marks from each class mark grid (Table 3.2, column G), a second person makes a tally on the distribution sheet in the appropriate class column as each mark is called out. A decimal of .5 or above is rounded up to the next whole number, e.g. 64.51 becomes 65. A decimal of .49 or below is rounded down, e.g. 71.47 becomes 71.
The first person calls out the mark as it appears on the class mark grid, e.g. 71.49. The second person calls out the rounded-up or rounded-down figure, e.g. 71, before entering the tally on the dis-
Frank Chaplen
or totalling tallies, or calculating the cumulative frequency. This error needs to be rectified.

The distribution of final marks is considered at a meeting of all teachers, and initial decisions taken on where to set the boundaries between the letter grades on the mark scale. The first year that a course is taught, these initial decisions are necessarily somewhat arbitrary. It might be decided, for example, to base the initial tentative distribution of grades approximately on the normal distribution curve, that is, to determine by purely statistical means what proportion of the students should fall between each grade boundary. Diederich (i) suggests that teachers use a modified stanine score scale for this purpose; applying his suggestion to the letter grade scale described above, we get something like this:

% of Students in each Grade:   4%   8%   12%   16%   20%   16%   12%   8%   4%

The second year that the course is taught, the tentative grade boundaries can be based on those finally decided the previous year. On whatever basis the initial setting of the grade boundaries is done, however, the next stage in the grading procedure is the critical one: the initial grades for each individual student are written on the class mark grid lightly in pencil, and each teacher considers each student's tentative grade in the light of his know-

their final examination papers are studied before a final decision is taken. In this case, it is decided to leave the boundary where it is: between 84% and 85%, but in other cases the boundary will be moved.

The most painful decisions concern the placing of the lower boundaries, particularly that between D+ and C. Inevitably there will be one or two students just below this borderline who have made extraordinary efforts during the course, and one or two students who have done very little work. Does one give all 4 a grade of C, and thereby risk convincing the lazy ones that they really do not need to work in the remainder of the English language programme courses? Or does one give all 4 a grade of D+, and risk convincing the serious students that no amount of effort is worthwhile? However painful these decisions may be, one must resist the temptation to 'find an extra mark somewhere' for the serious students; that would be certain to create all manner of problems in the future.

The main point to notice is that final decisions about where to place grade boundaries are based on consensus, and that this consensus arises only after considerable discussion of the individual students concerned. It seems to us that it is only
Course  Assessment and       Date, time and place          Posting of    Preliminary     1st Draft     Final Draft   Completion    Posting of
        person responsible   of assessment                 notice for    drafting                      for typing    of marks      provisional
                                                           students      meeting                                     processing    grades
101     Test One (A.M.)      Mon. Oct. 6, 10.00, Room 101  Wed. Oct. 1   Tues. Sept. 10  Wed. Oct. 1   Sat. Oct. 4   Sat. Oct. 11  Sun. Oct. 12
        Mid-Sem (P.S.)       Wed. Nov. 12, 9.00, Room 312  Wed. Nov. 5   Sun. Nov. 2     Tues. Nov. 2  Sat. Nov. 8   Wed. Nov. 19  Sat. Nov. 22
        Test Two (G.L.)      Tues. Dec. 9, 10.00, Room 102 Tues. Dec. 2  Wed. Nov. 26    Sun. Nov. 30  Wed. Dec. 3   Sun. Dec. 14  Mon. Dec. 15
        Final (S.A. + A.M.)  Sun. Jan. 4, 9.00, Room 213   Sun. Dec. 28  Tues. Dec. 23   Sat. Dec. 27  Mon. Dec. 29  Sat. Jan. 10  Mon. Jan. 12
Table 3.
             A      B       C      D       E      F       G        H
1. Ahmed     85     14.78   81     25.58   101    37.41   77.77%   B
2. Ali       49     8.52    51     16.11   86     31.85   56.48%   D+
3. Fareed    53     9.2     60     18.95   89     32.96   61.11%   C
etc.         etc.   etc.    etc.   etc.    etc.   etc.    etc.     etc.
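
The weighting arithmetic behind the class mark grid can be checked with a short script. The following is only a sketch: the function and variable names are illustrative, and the maximum marks (115, 95, 135) are taken from the divisors quoted in the conversion instructions rather than from Table 3.1 itself.

```python
def weighted_mark(raw, max_mark, weight):
    """Convert a raw mark to its share of the final percentage."""
    return raw * weight / max_mark


# (maximum mark, desired weighting in %) for each element.
# Assumption: maxima are inferred from the divisors quoted in the text.
ELEMENTS = {
    "Test 1": (115, 20),
    "Test 2": (95, 30),
    "Final Exam": (135, 50),
}


def final_mark(raw_marks):
    """Add up the weighted marks of one student (element name -> raw mark)."""
    return sum(
        weighted_mark(raw_marks[name], max_mark, weight)
        for name, (max_mark, weight) in ELEMENTS.items()
    )


# First row of the class mark grid: raw marks 85, 81 and 101.
ahmed = {"Test 1": 85, "Test 2": 81, "Final Exam": 101}
print(round(weighted_mark(85, 115, 20), 2))  # 14.78
print(round(final_mark(ahmed), 2))           # 77.77
```

The printed values agree with the second and seventh entries of the first grid row, which is the kind of independent check by a second person that the grid is meant to make easy.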
105
Table 4.1: Example of a Mark Distribution Sheet
Mark  1-4 (tallies)     5    6    7       Grade
94    //                2    4    6.2     A
93
92    /                 1    5    7.7     A-
91    //                2    7    10.8
90
89    ///               3    10   15.4
88    /                 1    11   16.9    B+
87    /                 1    12   18.5
86    //                2    14   21.5
85    //                2    16   24.6
84    ///               3    19   29.2
83    //                2    21   32.3    B
82    ///               3    24   36.9
81
80    ///               3    27   41.5
79    /                 1    28   43.1    B-
78    //                2    30   46.2
77    ///               3    33   50.8
76    ///               3    36   55.4
75    /                 1    37   56.9    C+
74    ///               3    40   61.5
73    ///               3    43   66.2
72
71    /                 1    44   67.7
70
69    /                 1    45   69.2
68    /                 1    46   70.8
67    /                 1    47   72.3    C
66    /                 1    48   73.8
65    ///               3    51   78.5
64    ///               3    54   83.1
63
62    /                 1    55   84.6
61    /                 1    56   86.2
60
59    ////              4    60   92.3
58                                        D+
57    /                 1    61   93.8
56
55
54                                        D
53
52    /                 1    62   95.4
51
50    /                 1    63   96.9
49                                        F
48    //                2    65   100%
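
The rounding rule and columns 5 to 7 of the distribution sheet can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the function names are invented for illustration, the marks in the example are hypothetical and are not the data behind Table 4.1, and the per-class tally columns are left out.

```python
import math
from collections import Counter


def round_half_up(mark):
    """Round .5 and above up, anything below .5 down."""
    return math.floor(mark + 0.5)


def distribution(final_marks):
    """Rows of (mark, frequency, cumulative frequency, cumulative %),
    listed from the highest mark down, as on the distribution sheet."""
    counts = Counter(round_half_up(m) for m in final_marks)
    total = len(final_marks)
    rows, running = [], 0
    for mark in sorted(counts, reverse=True):
        running += counts[mark]
        rows.append((mark, counts[mark], running, round(100 * running / total, 1)))
    return rows


print(round_half_up(64.51))  # 65
print(round_half_up(71.47))  # 71

# Hypothetical final marks for a very small intake.
marks = [91.2, 88.5, 88.49, 73.0, 64.51, 71.47, 71.49, 60.0]
for row in distribution(marks):
    print(row)
```

Because the cumulative columns are derived from the frequencies, recomputing them in this way gives a quick check on errors in totalling tallies or in the cumulative frequency.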
John W. Oller, Jr.
Appendix: Research
A comment on specific variance versus global
variance in certain EFL tests
Perhaps the basic statistical problem in the determination of what a test measures is the assessment of the sources of the variance across individuals that the test produces. Put in nontechnical terms, the deeper problem is to find out what factors in the behaviour of test-takers result in differences in the performances of various individuals and groups. According to the classical factoring model (cf. Harman 1976: 18-20), the standardized unit variance of any test j can be decomposed (at least theoretically) into three uncorrelated components: 1) variance that is common to other tests, referred to as the communality, which is designated $h_j^2$; 2) variance that is unique to j but nonrandom, known as the specificity, designated $b_j^2$; and 3) variance which is unique to j but random, referred to as error or unreliability, designated $e_j^2$. These three terms must add up to 100% of the total variance in j if the assumptions underlying the classical model are correct. According to that model, in order for a test to achieve a satisfactory level of validity, we should expect its communality with tests aimed at the same construct(s) to be high while its communality with tests aimed at disparate constructs should be low. When we examine tests aimed at distinct constructs, we expect them to have relatively high specificities and low communalities. Always we hope for low unreliabilities.

It is generally conceded (not quite gleefully) that there is no determinate single best solution for any given factoring problem. The variance in any given test may be partitioned in an infinitude of ways, as has been demonstrated in theory, and arguments about the best possible solution are probably misguided (Harman 1976: 27f). However, this is not the same as saying that all possible solutions are equal for all purposes. In some cases it is possible to show that one solution is decidedly better than a number of others. In most cases, the arguments must be thrashed out by appealing to theoretical reasoning that goes beyond statistics per se. Nevertheless, the application of statistical methods seems indispensable.

With the foregoing as background, this note will respond briefly to claims by Abu-Sayf, Herbolich, and Spurling (1979) concerning 'unique nonchance variance' (which, according to the classical factor model, is specificity) in each of four parts of an EFL proficiency exam recently developed by them at Kuwait University. In subscores of 139 adult non-native speakers of English, they claimed to have identified (using a method recommended by Davis 1968, 1972) four specificities: Grammar 22%, Listening Comprehension 41%, Reading Comprehension 32%, and Translation 24%. They further suggested that these findings were in conflict with 'Oller's hypothesis (1973) of there being only one global proficiency [test] such as a cloze or a dictation' (p. 117). [1]

Actually, their paper raises two substantive issues. First there is the question of test specificities in relation to the global factor, and second, there is the question of what is the most plausible tentative conclusion regarding such a global factor. In the first matter we may ask whether the claimed specificities actually exist in the reported magnitudes, and in the second what the implication is for the existence of a large global factor of language proficiency.

In regard to the question of specificities, Oller and Khan (1980) demonstrate that the application of the modified Davis method of obtaining estimates of unique nonchance variance (or specificity), which was applied by Abu-Sayf et al., is flawed in two ways: first, it overestimates specificity by conflating it with error variance, and second, it underestimates communalities by overcorrecting for bias in squared multiple correlations. In fact, squared multiple correlations are already conservative estimates of communalities due to the fact that they are known to constitute the lower bounds of true communalities. Oller and Khan show that the more probable limits of the specificities properly obtainable from the correlations in the Abu-Sayf et al. study are near zero, compared with a large global factor accounting for as much as 95% of the variance in the Grammar Test by one method and never less than 75% of the variance
in any of the tests by any one of three different methods (squared multiple correlations corrected for bias, communalities estimated by principal factoring with iterations, and communalities estimated by Rao's canonical factoring with iterations). Even by the most conservative method of estimating communalities, the respective specificities were Grammar -.01, Listening Comprehension [...] any test by any method amount to as much as half the error variance in the test in question.

Coming now to the second question concerning the global factor of language proficiency, what does the foregoing mean? In practical terms can we conclude that there is only one factor, a general factor? It seems to me that we cannot. The evidence suggests that following the classical factor model there is no reason to believe that any of the four tests produced by Abu-Sayf et al. generates a reliable specific variance. The variance generated by any triplet of the tests pretty much exhausts the variance generated by the remaining single test. That is, most of the reliable variance in each of these four tests is common to the remaining three. But these four tests do not by any means exhaust the universe of possible language tests! Therefore, we cannot on the basis of this study or any previous study conclude that there is only one factor underlying the variance in all language tests. On the other hand, on the basis of many previous studies (see Oller, 1979, Appendix for a brief and already somewhat dated review), we can say that there appears to be a large general factor of language proficiency in nearly all of the tests so far studied (an exception is a narrowly defined spelling score, see Oller, 1979: 281).

There is no basis, in spite of these findings, to conclude that a single test, such as a dictation or a cloze test, is the best way to measure that general factor. In isolated cases of certain sets of tests, results favour the interpretation that one or more of the input tests are better at measuring the general factor than other tests, but the only way we could even theoretically find the single best test would be to obtain all possible data on all possible tests, a clear impossibility. In fact, all of the evidence that I know of points to the conclusion that a multitude of language processing tasks aimed at the kinds of things language users will actually be expected to do with language makes the best language test in any given set of circumstances. If students are expected to learn to use the target [...] makes sense to use a plurality of testing procedures for many reasons. I also still believe the remark that was written in 1977, though not published until 1979, that 'it is probably safe to say that the best pragmatic testing procedures have yet to be invented' (Oller, 1979: 416).

The exigencies of practical life often force us to leap beyond the empirical evidence. Theories in general are not based exclusively on substantiated empirical findings either. Even if it turned out that there were only one general factor of language proficiency, it would still make sense to use a multiplicity of testing methods (as John Carroll, 1980, and others have recently observed). Moreover, in spite of the pervasiveness and general strength of a global factor of language proficiency underlying educational and psychological tests of all sorts, there is recent evidence that suggests a multiplicity of specific factors will yet be found (see Bachman and Palmer, 1980, and Upshur and Homburg, 1980). However, I personally doubt (at this moment) that the general factor can be explained away satisfactorily by even the newer and more powerful methods of confirmatory factor analysis. Nor have I seen any evidence as yet that would refute the claim that Spearman's general factor of intelligence may indeed turn out to be indistinguishable from proficiency in one's primary or strongest language (see Streiff and Oller in press). Still, we are speaking here of hypotheses rather than proven facts, and it is my belief that one should not place too much weight on hypotheses and hunches but carefully regard them as precisely what they are.
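
The classical partition of unit variance invoked in this note (communality, specificity and error summing to the total) can be illustrated numerically. The sketch below is not the procedure of Oller and Khan or of Abu-Sayf et al.; it assumes only the standard result that the squared multiple correlation of test j with the other tests equals one minus the reciprocal of the j-th diagonal element of the inverse correlation matrix. The correlation matrix and the reliability figure are hypothetical, and all function names are invented for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """Lower-bound communality estimate for each test in a correlation matrix."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[j][j] for j in range(len(corr))]


def partition(communality, reliability):
    """Split unit variance into communality, specificity and error.
    Specificity is the reliable variance not shared with the other tests;
    it can come out slightly negative when the estimates are imprecise."""
    return {
        "communality": communality,
        "specificity": reliability - communality,
        "error": 1.0 - reliability,
    }


# Hypothetical correlations among three tests (illustrative values only).
corr = [
    [1.00, 0.80, 0.70],
    [0.80, 1.00, 0.75],
    [0.70, 0.75, 1.00],
]
smc = squared_multiple_correlations(corr)
# Hypothetical reliability of 0.90 for the first test.
parts = partition(smc[0], reliability=0.90)
print(parts)
# Treating everything outside the communality as specificity would also
# count the error variance, which is the conflation criticised above.
print(1.0 - smc[0], parts["specificity"] + parts["error"])
```

The last line shows why an estimate of the form one minus communality overstates specificity: it equals specificity plus error, so it shrinks to the true specificity only for a perfectly reliable test.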
1. Perhaps the cited remark can be reasonably inferred from things I have said or written, but I do not believe that I have ever actually advocated the use of any single test or pair of tests as measures of language proficiency in an all encompassing general sense. While the evidence seems to suggest that dictation and cloze procedure along with a number of other integrative tests, or more specifically pragmatic tests, are useful practical tools for assessing language proficiency (whatever it may turn out to be), I have long tried to stress that the concept of pragmatic testing extends to an implicit infinitude of test procedures (so do the terms cloze and dictation). While some tests appear to be better measures of general language proficiency than others, there is no reason to suppose that the class of best tests has yet been identified. Nevertheless, in the article referred to by Abu-Sayf et al., I indicated my advocacy (at that time) of 'integrative testing' and suggested that 'some of the types of tests that qualify as belonging to the integrative family besides dictation, cloze procedure, composition, and oral interview, are Upshur's test of productive communication (Upshur, 1969), reading aloud, and some multiple choice tests of reading comprehension and other skills' (1973: 11). In the reference to multiple choice tests I intended to include some of those developed in connection with the testing of foreign students at UCLA during my three years there, as well as tests like the Listening Comprehension and Reading Comprehension sections of the TOEFL. Parish's Grammar Test (see Oller and Perkins, 1980, Appendix, item 22) is an integrative or pragmatic test in this sense.

2. By contrast, communality estimates (which may be read as indicants of a global factor in this case and many similar studies) ranged from a low of 60% to a high above 85%. Further, in this particular case we are referring to estimates based on the squares of multiple correlations, a lower bound for the true values.
References
Abu-Sayf, F.K., Herbolich, James B., and Spurling, S., 1979, 'The identification of the major components for testing English as a foreign language', TESOL Quarterly 13, pp. 117-20.
Bachman, Lyle F. and Palmer, Adrian S., 1980, 'The construct validation of oral proficiency tests'. Paper presented at the Fourteenth Annual TESOL Convention, San Francisco, March 1980. Also in TESL Studies 3, pp. 1-20 (University of Illinois at Urbana-Champaign). To appear in Oller (in press).
Carroll, John B., 1980, 'Language testing and psychometric theory'. Closing plenary lecture at the Second International Language Testing Symposium, Darmstadt, West Germany, May 1980. Also presented at the Language Testing Conference, Albuquerque, New Mexico, University of New Mexico, June 1980. To appear in Oller (in press).
Davis, Frederick B., 1968, 'Research in comprehension in reading', Reading Research Quarterly 3, pp. 499-545.
Davis, Frederick B., 1972, 'Psychometric research on comprehension in reading', Reading Research Quarterly 7, pp. 628-78.
Guilford, J.P. and Fruchter, B., 1978, 'Fundamental statistics in psychology and education', 6th edition revised, New York, McGraw Hill.
Harman, Harry H., 1976, Modern factor analysis, 3rd revised edition, Chicago, University of Chicago.
Oller, J.W., Jr., 1973, 'Pragmatic language testing', Language Sciences 28, pp. 7-12.
Oller, J.W., Jr., 1979, Language tests at school: a pragmatic approach, London, Longman.
Oller, J.W., Jr. (Ed.). In press, Issues in Language Testing Research, Rowley, Massachusetts, Newbury House.
Oller, J.W., Jr., and Khan, R., 1980, 'Is there a global factor of language proficiency?' Paper presented by the first author at the 15th Regional Seminar sponsored by the South East Asian Ministers of Education Organization, at the Regional English Language Center, Singapore, April 1980. In the proceedings edited by John Read (to appear).
Oller, J.W., Jr. and Perkins, Kyle, 1980, Research in language testing, Rowley, Massachusetts, Newbury House.
Oller, J.W., Jr. and Streiff, Virginia A. In press, The language factor, more tests of tests, Rowley, Massachusetts, Newbury House.
Upshur, John A., 1969, 'Productive communication testing'. Paper presented at the Second International Congress of Applied Linguistics, Cambridge, England. In G. Perren and J.L.M. Trim (Eds.), Applications of linguistics, Cambridge, England, Cambridge University, 1971, pp. 435-42. Also in Oller, J. and Richards, J. (Eds.), Focus on the learner, Rowley, Massachusetts, Newbury House, 1973, pp. 177-83.
Upshur, John A. and Homburg, Taco J., 1980, 'Some language test relations at successive ability levels'. Paper presented at the Second International Language Testing Symposium, Darmstadt, West Germany, May 1980. To appear in Oller (in press).
Profiles
JOSEPH BOYLE teaches in the English Department of the Chinese University of Hong Kong. He has previously taught in South America, India and the Philippines. He studied English Language and Literature at Oxford and has done the Leeds ESL Postgraduate Diploma Course. He works with Chinese students who have chosen English as their major subject. He also runs extra-mural courses in Business English and Medical English.

BRENDAN CARROLL had extensive ELT experience in Kenya, India and Nigeria before becoming Director of the British Council English Language Teaching Institute in London. He left his last post in the Council as head of their English Language Testing Service Liaison Unit to take charge of Pergamon English Testing, Oxford. He also works as a private consultant and is the author of several

since 1976. Listening comprehension is to him the most fundamental of all four language skills. Language acquisition he regards as an essentially intuitive, emotional and non-cerebral experience. He is suspicious of 'academic postures'. Being himself a linguist, he prefers to call the other kind 'linguisticians'.

PENNY FRANTZIS has taught English in Spain and Saudi Arabia, and directed an English Language Course in Switzerland. For the last eight years she has been lecturing and teaching at the University of Leeds. Her work has involved the preparation of course materials for the academic needs of overseas students in Britain and ESP (English for Specific Purposes) programmes for such diverse groups as Kuwaiti hospital administrators, overseas psychiatrists, engineering students from the Middle East,
Overseas in the Department of Adult and Higher Education at the University. Apart from various short courses, she also runs the In-Service teacher-training course for the Lancashire Education Authority. She is Chief Examiner for the Joint Matriculation Board's Test in English (Overseas), the North West Regional Examinations Board's English as a Second Language and is Moderator for the Yorkshire Regional Examinations Board's English as a Second Language. Her new book English in Context is due for publication early in 1982, published by Thomas Nelson & Sons, Walton on Thames.

KEITH MORROW is an Assistant Director of the Bell Educational Trust, based in Norwich. He was formerly a lecturer at the University of Reading. He is the Chief Examiner for the Royal Society of Arts 'Examinations in the Communicative Use of English as a Foreign Language'.

PAUL NATION is a senior lecturer at Victoria University in Wellington, New Zealand. He has also taught in Indonesia and Thailand. His special interests are in teaching techniques and code-based approaches to language teaching.

JOHN W. OLLER, Jr. received his doctorate in general linguistics from the University of Rochester, in Rochester, New York in 1969. He has served on the faculty at UCLA and the University of New Mexico and has held visiting appointments at Southern Illinois University and Concordia (in Montreal). From 1971 to 1976 he served on the Committee of Examiners for the Test of English as a Foreign Language at ETS. Presently, he is Professor of Linguistics at the University of New Mexico.

ALBERT PILLINER was, until his retiral in 1978, Director of the Godfrey Thomson Unit for Educational Research and Senior Lecturer in the Department of Education, University of Edinburgh. He is especially interested in the testing of English as a foreign or second language. Sponsored by UNESCO and by the British Council, he has taught (and continues to teach) in Europe, West Africa, South America and in the Middle and Far East. He has also directed language testing courses for international groups in UK on behalf of the British Council and, more recently, the University of Edinburgh Institute of Applied Language Studies.

PAULINE M. REA is Senior Lecturer in the Department of Foreign Languages and Linguistics, and Co-ordinator of the Communication Skills Unit at the University of Dar es Salaam. She has EFL experience at secondary level and in teacher training programmes in Africa and Europe. She has worked on the General Medical Council's English language testing project for overseas doctors. Her current research interests are related to language testing, curriculum development, and ESP materials production.

JOHN ROGERS is a Senior Lecturer at the English Language Institute, Victoria University, Wellington, New Zealand, where he has been teaching on Dip. TESL courses for teachers from Southeast Asia, the South Pacific and New Zealand since 1971. He spent two years teaching English to adults and secondary school students in Sweden from 1955 to 1957, and from 1957 to 1961 he helped to train secondary school English teachers at Universitas Airlangga, Indonesia. He worked for the British Council in Nigeria (1961-1963) and in Ethiopia (1963-1969), where he was the co-adaptor of several books. From 1976 to 1978 he was seconded to the SEAMEO Regional Language Centre, Singapore, as Specialist in the Psychology of Second Language Learning and Applied Linguistics. In Singapore he compiled Group Activities for Language Learning (RELC Occasional Papers).

IAN SEATON is Head of the Liaison Unit for the English Language Testing Service in the British Council. He taught ESP programmes for two years at the University of Tripoli, Libya and for two years at the University of Helsinki, Finland before joining the Council in 1976.

BILL SHEPHARD. Academic training consisted of systematic escape from the Cambridge English course via non-compulsory Old English, linguistic gossip (no department at that time), phonetics and dialect research at Leeds. This was followed by EFL teaching and finally administration of the Cambridge EFL examinations. With colleague Harold Otter, he has tried to absorb usefully into the examination structure the successive waves of revolution and counter-revolution in EFL teaching and testing.

NIC UNDERHILL. Educational Co-ordinator, International Language Centres. Taught EFL at various schools in London and Sussex and then worked for ILC for two years at the Kuwait Oil Company Training Centre before returning to England to do an M.A. in Applied Linguistics at the University of Reading.

CHRISTOPHER WARD is head of the Testing Department, International Language Centre (Japan) in Tokyo. After obtaining a Diploma in English as a Second Language at Leeds University, he taught immigrants in Bradford, Yorkshire for two years. Then he went to Japan and taught at ILC for three years before taking up his present post six years ago.

SIDNEY WHITAKER has directed the TESL training course at University College, Bangor, since 1964. He previously taught French at Glasgow University, and English and language-teaching methodology in Vietnam and Venezuela, with shorter assignments in India, Bangladesh, China, Egypt, Jordan, and Yugoslavia. He regularly collaborates with English teachers in Spain as well as with teachers of immigrant pupils in Britain.
Bibliography
Alderson, C. and Hughes, A. (Eds.) (1981) Issues and Language Teaching, Special Issue on Lan-
in Language Testing, ELT Documents 111. guage Testing, No. 4, Hong Kong: Language
London: British Council Centre, University of Hong Kong
Allen, and Davies, A. (Eds.) (1977) 'Testing
J. P. B. Grieve, D.W. (1964) English Language Examining .-
and experimental methods', Edinburgh Course Report of an Inquiry into English Language
in Applied Linguistics, Vol. 4. London: O.U.P. Examining, Lagos African Universities Press
:
Beardsmore, H.B. (1974) 'Testing oral fluency', Harris, D.P. (1969) Testing English as a Second
IRAL, 12, 4, pp. 317-26 Language, New York: McGraw-Hill
Briere, E.J. (1971) 'Are we really measuring pro- Heaton, J.B. (1975) Writing English Language
ficiency with our foreign language tests?', Tests, London: Longman
Foreign Language Annals, 4, May Ibe, M.D. (1975) 'A comparison of cloze and
(1969) 'The main stages in the
Burstall, Clare multiple-choice tests for measuring the English
development of language tests', Stern, H.H. reading comprehension of South-East Asian
(Ed.) Languages and the Young School Child, teachers of English', RELC Journal, 6.2. Singa-
London: O.U.P. pore: SEAMEO Regional Language Centre
Carroll, B.J. (1980) Testing Communicative Jones, R.L. and Spolsky, B. (Eds.) (1975) Testing
Performance Oxford: Pergamon
, Language Proficiency Washington, D.C.:
,
Clark, J.L.D. (1972) Foreign Language Testing: Centre for Applied Linguistics
Theory and Practice, Philadelphia, Pa, Centre Lado, R. (1961) Language Testing: the Construc-
for Curriculum Research tion and Use of Foreign Language Tests,
Crocker, A.C. (1969) Statistics for the Teacher (or London: Longman
How To Put Figures in their Place), Harmonds- Lee, Y.P. and Low, G.D. (1981) 'Classifying tests
worth: Penguin of language use'. Paper presented at 6th AILA
Davies, A. (1968) Language Testing Symposium, World Congress, Lund, Sweden
London: O.U.P. Moller, A. (1975) 'Validity in Proficiency Testing',
Davies, A. (1978) 'Language Testing (Survey ELT Documents, 3, pp. 5-18, London: British
Articles)' Language Teaching and Linguistics-. Council
Abstracts, Cambridge: Cambridge University Morrow, K.E. (1977) Techniques of Evaluation for
Press, Vol. II a Notional Syllabus, Reading: Centre for
Davies, S. and West, R. (1981)The Pitman Guide Applied Language Studies, University of Reading
to English Language Examinations for Overseas (for the Royal Society of Arts)
Candidates, London: Pitman Morrow, K.E. (1979) 'Communicative language
Douglas, D. (1978) 'Gain in reading proficiency in testing: revolution or evolution', C.J. Brumfit
English as a Foreign Language measured by and K.J. Johnson (Eds.) The Communicative
three cloze scoring methods', Journal of Approach to Language Teaching, London.-
Research in Reading, 1, 1, pp. 67-73 O.U.P.
English Speaking Board (1981), Oral Assessments Munby, J.L. (1978) Communicative syllabus
in Spoken English as an Acquired Language, design, Cambridge: Cambridge University Press
Southport Oiler, J.W. (1971) 'Dictation as a device for testing
Fok, A.; Lord, R.; Low, G.;T'sou, B.K.; and foreign-language proficiency', English Language
Lee, Y.P. (1981) Working Papers in Linguistics Teaching, 25, 3, pp. 254-9
112
Bibliography
Oiler, J.W.(1972) 'Cloze tests of second language Read, J. A.S. (Ed.) (1981), Directions in Language
proficiency and what they measure', Language Testing, RELC Anthology Series 9, Singapore:
Learning, 23, 1, pp. 105-18 SEAMEO Regional Language Centre
Oiler, J.W. (1979) Language Tests at School, Schulz, Renate A. (1977) 'Discrete-point versus
London: Longman simulated communication testing in foreign
Oiler, J.W. & Streiff, Virginia (1975) 'Dictation: a languages', Modern Language Journal, 61, 3,
Language Testing, 1967-1974, Washington Language Journal, 58, 5/6, pp. 239-41
D.C.: TESOL Upshur, J. A. & Fata, J. (1968) 'Problems in foreign
Perren, G.E. (1967) 'Testing ability in English as a language testing', Language Learning, Special
second language', English Language Teaching; Issue, No. 3
21, l.pp. 129-36;21,2,pp. 99-106; 21, 2, Upshur, John A. (1971) 'Objective evaluation of
pp. 197-202 oral proficiency in the ESOL classroom',
Perren, G.E. (Ed.) (1977) Foreign Language TESOL Quarterly, 47-60
5, pp.
Testing: Specialised Bibliography, Centre for Valette, R.M. (1977) Modern Language Testing
Information on Language Teaching and Research (2nd ed.), New York: Harcourt Bruce Jovanovich
Rea, Pauline M. (1978) 'Assessing language as Valette, R.M. & Disick, R.S. (1972) Modern Lan-
communication', MALS Journal, New series, guage Performance Objectives and Individual-
No. 3, University of Birmingham, Department ization, New York: Harcourt Bruce Jovanovich
of English
113
r
New JMB Practice Books
Patterns of Fact Text to Note
Practice in reading and writing English for Study skills for advanced learners
academic purposes Alex Adkins and Ian McKean
Judith Kennedy and Susan Hunston * Reading exercises teach students to read for
* Reading and writing tasks are grouped
texts gist and for specific information
* Listening exercises teach students to extract
in functional areas and progress in difficulty
* Grammar is consolidated in cloze exercises salient points from lectures
* Vocabulary expanded in labelling exercises
is
* Note-making from written and spoken texts
* Two JMB practice papers are included is taught and practised
Longman «s
SOURCE BOOKS FOR TEACHERS
Discover English
ROD BOLITHO and BRIAN TOMLINSON
This invaluable book helps to sensitize teachers to the English Language, making them more
acutely aware of the difficulties experienced by foreign learners.
Cambridge UT
Practice Test s for the
Cambridge Examinations in EFL
2 new sets of practice tests containing authentic papers from the June
1979 to June 1981 First Certificate and Proficiency examinations.
* All the papers are reproduced exactly as they originally appeared.
* Teacher's Books provide answer keys, marking schemes and instructions on how
to assess the candidates' performance.
* A new feature of the Teacher's Books is the inclusion of sample essays written by
students in the examination. This is intended to serve as a guide to teachers who
find it difficult to assess students in non-objective Paper 2.
FCE Practice Tests 2: Student's Book £1.75, Teacher's Book £2.25
CPE Practice Tests 2: Student's Book £2.25, Teacher's Book £2.75
Also published:
FCE Practice Tests and CPE Practice Tests (sets of papers from the June 1976 to June 1978
examinations).
Available from all leading ELT booksellers or, in case of difficulty, direct from Cambridge University Press.
M Archer and E Nolan-Woods
Series 1 Students' Book
Series 1 Teacher's Book
Set of 2 Cassettes
Series 2 Students' Book
Series 2 Teacher's Book
Stage 1 Cassette
Stage 2 Students' Book
Stage 2 Teacher's Book
Stage 2 Cassette
Stage 3 Students' Book
Stage 3 Teacher's Book
Stage 3 Cassette
NELSON ENGLISH LANGUAGE TESTS
W S Fowler and Norman Coe
Students' Book 1
Students' Book 2
Students' Book 3
Teacher's Book
PRACTICE TESTS FOR MICHIGAN CERTIFICATE ENGLISH
George P McCallum
Book
Cassette
ENGLISH TESTS FOR DOCTORS
Dick Alderson and Vivienne Ward
Students' Book
Teacher's Book
3 Cassettes
PRACTICE TESTS FOR TOEIC
George W Pifer
Students' Book
3 C45 Cassettes
Order from your bookseller, or from Cashpost Service, Book Centre, Southport, PR9 9YF
quoting the ISBN number 273 01592 3, and making your cheque/p.o. payable to Pitman
Books Ltd. The price is £3.50, post and packing are free.
Cambridge Examinations
The principal world-wide course target and definition of
standard. Taken by over 80,000 candidates yearly in over
60 countries.
Tests
Highly researched, quick and easy to administer,
consistently reliable in its results, this test will
place any number of students in order of rank
from 'false beginners' to post-Proficiency on the
first day of a course or term.
The components of the test are:
* Test pack containing 50 copies of the test;
* Marker's Kit containing plastic masks as an
aid to quick marking, an introduction to the
placement test theory, and an administrative
guide;
* Cassette of the listening and reading test.
For further information please write to English Language Teaching Department, Oxford University Press, Walton
Street, Oxford OX2 6DP
Testing Services
Specialist English language testing in all our centres to meet your specific needs
In Japan — BETA — Businessmen's English Test and Appraisal — has already been
administered to 28,000 businessmen. It is designed to tell clients when their company
staff can operate successfully in English abroad or in an English speaking environment.
If you have any practical testing queries, consult ILC. Whether you want a piece of
advice or a tailor-made testing programme, ILC can help you to develop a better system
of evaluation.
86 Marylebone High Street, London W1M 3DE, Tel: (01) 486 1760/1770
86 Marylebone High Street, London W1M 3DE, Tel: (01) 486 1222
20 Passage Dauphine, Paris 75006, FRANCE
P.O. Box 7647, Feheheel, KUWAIT
Iwanami Jimbo-cho Bldg, 2-1 Kanda Jimbo-cho, Chiyoda-ku
Centre for
Applied Language Studies
University of Reading, England
Consultancy Services
The Centre undertakes consultancy and advisory services in such fields as syllabus and course
design, testing and materials production.
Information
For information on Centre courses and services, write to:
The Administrative Assistant,
Centre for Applied Language Studies, Language Resource Centre,
Language Testing
Courses
• Courses in Language Testing at all levels
Short summer testing courses
Initial testing training courses
Masters degree programmes
PhD programme
Production
• Test writing, validation and analysis service
Consultancy
• Testing Consultancy Service from the country's leading experts in testing
Technical Support
• Full University Computing and Statistical Back-up
LANGUAGE TESTING
A collection of articles written by testing
specialists, teachers and teacher trainers involved
in the testing of English in a number of different
countries.