Language Testing

edited by J. B. (Brian) Heaton

ISBN 906149 29
MODERN ENGLISH PUBLICATIONS LIMITED 1982

All rights reserved. No part of this publication may be reproduced, stored
in a retrieval system, or transmitted in any form or by any means,
electronic, mechanical, photocopying, recording or otherwise, without the
prior permission of the Copyright owner.

First published 1982

Set by Illustration Services Limited

Printed in Great Britain by New Avenue Press, Hayes, Middx.

Cover design by Martin Miller

Table of Contents

PREFACE

Language testing – is there another way?
  Brendan J. Carroll
Criteria for evaluation of tests of English as a Foreign Language
  Alan Davies
The great reliability/validity trade-off: problems in assessing the productive skills
  Nic Underhill
Examinations – why tolerate their paternalism?
  Peter Fabian
The Cambridge Examinations – an exercise in public relations
  W. G. Shephard
Proficiency testing for tertiary level study and training in Britain
  Ian Seaton
Progress testing: preparation and analysis
  C. S. Ward
Tennis plays Nha – or how to humanise tests
  John Rogers
An alternative approach to testing grammatical competence
  Pauline M. Rea
Dictation as a testing device
  S. F. Whitaker
Listening comprehension: teaching through testing techniques
  Penny Frantzis
Testing Spoken Language
  Keith Morrow
Questioning some assumptions about cloze testing
  Robert Keith Johnson
Getting information from advanced reading tests
  Paul Nation
Writing in perspective: some comments on the testing and marking of written communication
  Brian Heaton
Testing language with students of literature in ESL situations
  J. P. Boyle
A place for visuals in language testing
  Patricia L. McEldowney
Competence in English used for academic subject examinations
  Arthur Godman
Evaluation
  A. E. G. Pilliner
Measuring student achievement: some practical considerations
  Frank Chaplen

Appendix: Research
A comment on specific variance versus global variance in certain EFL tests
  John W. Oller Jr.
PREFACE

The publication of this collection of articles on language testing comes at a very opportune time, as recent developments in communicative language teaching are now resulting in a widespread re-appraisal of language tests and techniques. Not only have the many shortcomings of the structuralist and behaviourist approach to language teaching been strongly attacked, but the whole psychometric basis of language testing has been seriously questioned. Fierce arguments still rage over such established criteria as test validity and reliability. Consequently, it is now necessary to take stock of our present-day tests and examinations in English and to re-examine many of the basic premises so much cherished in the past. However, we should take care not to discard every well-tried and proven method of testing in our search for some magic formula or new technique which will enable us to solve our problems in the assessment of language used as communication. All too often in language testing, as in language teaching, there seems to be a tendency for many to jump on any current bandwagon, accepting half-formed theories and applying them hastily and uncritically. As most articles in this special issue of MET clearly demonstrate, considerable critical judgement is necessary in evaluating all the various types of tests and the principles on which they are based. It is always important to keep the best of established methods while, at the same time, seeking to develop where necessary new and more appropriate techniques to reflect the different emphases now being placed on language learning.

In the first article in the collection, Carroll points out that testing for any programme should be compatible with the ideas behind the teaching method used: hence communicative teaching programmes should be assessed by communicative tests. The main aspects of communicative tests treated by Carroll are: the test-curriculum relationship, a purposive test framework, test content and procedures, levels of performance and methods of analysing test data. Davies pursues this topic in discussing criteria for the evaluation of tests of EFL, and examines three kinds of validity with special reference to six examples of published tests in Britain. The question of validity and reliability is taken up again by Underhill in an article which identifies problems in assessing the productive skills of speaking and writing. Underhill uses the terms direct, semi-direct and indirect to classify tests of speaking and writing ranging from highly realistic tasks to unrealistic tasks. In the next article, Fabian reminds us that studying a language is vastly different from acquiring its communicative facility, and suggests that teachers can exert a beneficial influence on examining bodies by a more critical and creative participation in the design and construction of examinations.

However, the solutions to the many problems facing test constructors and administrators are neither as simple nor as straightforward as many might at first imagine. This is illustrated in an article by Shephard, who traces the development of the Cambridge EFL examinations from 1913, with their emphasis on the formal correctness of language use, literature and translation, up to a functional test currently under consideration, with its one-third oral component. Throughout the article, he shows the importance which the Cambridge Syndicate constantly attaches to public opinion, referring to questionnaires, correspondence, queries, and important public relations aspects of the Syndicate's work. Seaton also refers to the problems encountered in constructing examinations which will be administered on a large scale throughout the world. He describes the various difficulties met and overcome in the specification and design of the battery of tests set up and administered by the British Council and the University of Cambridge Local Examinations Syndicate.

Practical considerations in the use of progress tests by teachers and people directly concerned with the running of particular courses are touched upon by Ward in an article on the preparation and analysis of progress tests, while Rogers approaches this topic in a different way by providing several examples of ways in which test items can be devised so as to provide interest and even amusement on language courses.

It should be emphasised at this stage that a concentration solely on such aspects of communicative competence as authenticity, appropriacy and register is wrong if the testing of the grammatical system of the language is neglected and regarded as subordinate or inferior in any way. Language still consists of grammar, as Rea points out in her article on an alternative approach to the testing of grammatical competence within a model of language learning. She gives examples of ways in which the selection and production of a language form is determined not only by its grammatical correctness but also by its function within a given communicative area.

While it is always important to experiment with new testing techniques, it is also essential to attempt to develop existing techniques much further. After falling into disrepute in the 1960s, the long-established dictation test has been the subject of considerable research during the past decade. Whitaker examines the flexibility of dictation as a testing device and the various language skills involved, giving numerous practical suggestions for making dictation a relevant and realistic test. In a similar way, Frantzis touches on the skills involved in understanding spoken English, describing in detail the use of a radio news bulletin for improving such skills in a systematic way.

After discussing problems experienced by teachers in constructing tests of spoken language, Morrow gives some suggestions for devising appropriate realistic tasks, recommending the testing of students in groups. He then gives a number of criteria for evaluating spoken language before providing an example of the way in which these criteria are used in a basic level examination.

Following Morrow's article is a comprehensive account of cloze testing by Johnson, who questions several assumptions commonly held about cloze procedure. Johnson draws our attention to the intellectual, cultural and linguistic biases which may militate against attempts to measure a candidate's understanding of a particular text. He then proceeds to show how it is possible to identify much more precisely what each item in a cloze test is measuring, arguing that a random selection of items is not desirable. He adds that such a selection has in any case been largely responsible for the development of tests which are far too difficult and which do not discriminate sufficiently amongst candidates, especially as far as actual comprehension of the text is concerned. An article by Nation then deals with advanced reading tests and begins by examining multiple-choice items and reference-word items as techniques for assessing performance on advanced reading tests. These and other test items described in the article direct attention towards the structural features of a reading text and to analytical strategies. Heaton draws attention to the more communicative aspects of writing, first examining types of controlled composition before discussing briefly the classification of errors according to global/communicative errors and local/linguistic errors, and the implications of such a classification for the marking of compositions. He concludes by questioning the validity of composition tests which concentrate only on first drafts written within severe time constraints. This subsection is then concluded by Boyle with an article discussing various kinds of tests of language for students of literature. Examples of types of items testing reading, writing, listening and speaking are given in an attempt to foster an awareness among teachers of the relevance and educational values of the literature they teach.

Inter-dependence of verbal and non-verbal information both in the classroom and in the world outside suggests that visuals have a valuable role to play in language testing – provided that it can be ensured that candidates respond to language rather than find the meaning in any accompanying visuals. McEldowney shows how visuals compensate for fragmentary verbal comprehension questions on a text by helping to test objectively an awareness of the text content as a whole, without overt clues from the questions and eliminating the need for verbal production on the part of the candidate. McEldowney concludes by showing how visuals can also be used to test production skills, providing basic information for candidates in a non-verbal form and thereby avoiding situations in which prior knowledge of the subject is tested.

Godman deals with problems of assessing performance on academic subjects examined in the medium of English, providing specific examples of the types of difficulties encountered by examiners assessing science examination scripts written in English. In his article, Godman proposes three schemes for the scoring of such answers on a subjective scale, taking into account lexis, syntax, morphology and semantic content.

Pilliner next examines important aspects of the evaluation of language programmes, reminding us that student achievement is only one element in the process of evaluation. In showing that the fundamental purpose of evaluation is to produce information in order to make decisions about an educational programme, Pilliner cites the work carried out by Robert Stake, describing in detail the way in which his model for evaluation can be applied.

In a concise article with highly practical applications, Chaplen relates his experience of measuring student achievement in ESP programmes at Kuwait University. He shows how it is possible to convert raw scores by giving appropriate weighting to take account of the different components in a programme, and how a student's final grade can be calculated and decisions taken when several teachers are involved in the process of assessment.

It is only fitting that this collection of articles should end with some brief comments by Oller, whose research has contributed so much to the development of language testing over the past decade. Although maintaining that there appears to be a large general factor of language performance in all the various tests studied, Oller states that there is no basis to conclude that a single test such as a dictation or a cloze test is the best way to measure that general factor. He concludes that a multiplicity of testing methods concentrating on the kinds of language tasks which language users will be expected to perform makes the best language test in any given set of circumstances.

ACKNOWLEDGEMENTS
We wish to express our gratitude to the following for allowing the inclusion of copyright material:

SEAMEO Regional Language Centre for permission to reprint R. Keith Johnson's article Questioning Some Assumptions About Cloze Testing, which originally appeared in Directions in Language Testing, edited by John A. S. Read, Singapore University Press 1981; TESOL for permission to reprint Dr. John W. Oller's research notes A Comment on Specific Variance Versus Global Variance in Certain EFL Tests, appearing in TESOL Quarterly 14, 1980; the Joint Matriculation Board for permission to use two essay-type questions taken from their Test in English (Overseas), March 1968 and July 1968.

Brendan J. Carroll

Language Testing – is there another way?
THE COMMUNICATIVE APPROACH
Over the last three years, I have been taking part in an evaluation consultancy for a Middle East teaching programme in which new communicative teaching materials are being devised to replace the structural materials in use there for many years. It is interesting to look back on an early report of the evaluation team which commented on the programme as follows:

'The communicative approach stands or falls by the degree of real life, or at least life-like, communication that is achieved in the foreign language in the classroom situation. In our observations of classes in progress, it was not often that pupils communicated naturally and unselfconsciously with one another. Perhaps this is not surprising and should not be expected at such an early stage in the project. However, it is worth stating that pupil-initiated communication should be one of the project's key aims, and the necessary provision should be made in the materials and teaching methodology that it comes about. We feel that there is still a tendency for teachers to talk too much and, correspondingly, for the pupils to talk too little and that a major aim of the teacher training programme should be to correct this imbalance. We noted that the teachers whose classes were visited dominated the flow of communication and allowed too little communication by and between pupils.'

These words are no doubt true and are often repeated by observers of classroom practice in many subjects and in many countries.

A second area of comment by the evaluation team which bears on our problem is the important question of correctness of children's language performance. The team mentions the tension between fluency of language use and accuracy of language usage, recommending a more sensitive and tolerant attitude to student error. Language accuracy and language fluency, it is maintained, should rightly have varying priority at various points in the teaching/learning sequence and the final aim should be that the learners perform both accurately and fluently to certain agreed levels.

In the programme we are discussing, there was an obstacle to achieving these enlightened aims of greater pupil activity and a balanced attitude to correctness in that the tests being used to measure the children's progress tended to be traditional ones such as those focussing on the accurate mastery of lexical and structural items by individual pupils. We thus had the position in which the children were being taught, as far as was possible, by one approach – a 'communicative' one – and being tested in terms of another approach – a 'structural' one. In truth, there were at the time no suitable, properly constructed tests for the programme, and this case is just one example of the dilemma we face at present. To test the accuracy of a learner's knowledge of lexical and grammatical patterns is a very different matter from testing the degree to which he has acquired the language so
that he can use it in the communicative settings he is likely to face. The effectiveness of a person as a communicator will depend on a wide range of language and non-language skills, and an effective test will have to specify and assess them – a much more complex task than the assessment of his mastery of lexical and grammatical items, which can be much more easily pinned down and counted.

It was at one time widely believed that a person's language competence could be adequately tapped by requiring him to respond to a string of separate multiple-choice items, sometimes as many as 200 in one sitting. Later, it was hoped that the assessment needs could be met by presenting testees with the task of filling in randomly-selected spaces in a written or spoken text. We will examine more fully later the value of such techniques for testing language, all we will say now is: would that measuring communicative performance were so easily done! To approach the problem methodically, I would like now to examine major problems raised by the current structural-objective approach to language testing under five headings.

1. The counting of bits.
2. The four-skill model.
3. The place of correctness.
4. The role of purpose.
5. Justification by correlation.

Later on, I will put forward five considerations for broadening the approach to testing.

1. The counting of bits
If language performance is to be described by means of numbers, it would be most helpful if such performance could be broken down into small, discrete parts which could be easily judged as to correctness or incorrectness. Then the bits could be put together in a test to provide a numerical score, such as 35 out of 50. The tasks are unambiguous, the marking introduces no element of capriciousness and a person's final score is clear for all to see. Here is an example of such a test item:

Yesterday I (A. be B. were C. was) very tired.
(Instruction – choose the correct option in the brackets.)

Clearly, option C is a good candidate for choice according to standard English usage. A test made up of a chain of such items could be accurately and objectively marked, possibly by mechanical scanning methods.

So far so good. But the testing problem has not yet been completely solved. For one thing, it is often very difficult to decide if a sentence is correct or incorrect without knowing the context in which it was said. There are, for instance, certain communities in which 'I be' and 'I were' are accepted forms. For another, an utterance can be quite flatly incorrect from any formal point of view, and yet be perfectly intelligible to the ordinary listener. The French speaker who says 'I have been in London since three days' is certainly incorrect in his English usage, but there will be little disagreement among his listeners about the length of time he has been in London. Much will depend on the amount of tolerance we are prepared to extend to such a speaker. Finally, the suitability of any utterance will be closely related to the relationship between the speakers: whether it is very close and informal, or distant and formal, and so on; and what would be quite proper in one circumstance could be quite offensive in another.

Who, then, was our sample sentence spoken by, and who was the listener? Who are the mysterious 'she's', 'he's' and 'John's' of these pseudo-utterances? Where are they? When was 'yesterday'? The plain fact is that, from the point of view of the tester, it just doesn't matter; these are not real utterances at all, but just vehicles for providing lexico-grammatical traps for the unwary.

All this is just to say that an independent, de-contextualised sentence is a very tenuous basis for making accurate judgements about a person's mastery of a language. It may well be that such snippets will allow us to examine certain limited details of language performance, but I believe that any adequate test must consciously encompass in its design wider strategies and purposes of language use. Adding up the separate bits of performance, however many, cannot tell us the whole story.

2. The four-skill model
One readily intelligible model of language description, and one widely used for many years, is that of the four skills of listening, speaking, reading and writing, certainly observable aspects of linguistic performance. It is tempting, therefore, to specify our test content in terms of these skills – productive/receptive and oral/graphic. We could, then, devise separate tests of listening, speaking, reading and writing, and thus map a person's language competencies. The trouble is that language is interactive, so that there is interplay between speaking and listening, between reading and writing, and so on. The total communicative
situation cannot be adequately described by these separate categories. Furthermore, any test tasks must elicit some responses. Even an objective listening test requires the testee to record some response – verbal or graphic; a speaking test must be a response to some verbal or written instructions. To resort to 'objective' techniques of ticking alternatives must trivialise the interaction, debasing one side of what is an essentially double-sided process.

The bases of four-skill assessment – usually spelt out in terms of pronunciation, spelling, vocabulary and grammar – have also been deficient insofar as they rely on features of formal usage rather than on effective use, and thus can be faulted on the grounds of the criticism we have already made of discrete-item, objective-type tests.

3. The place of correctness
We mentioned at the beginning of the chapter the matter of the correctness of children's language performance. There are few more emotive educational topics. In the one camp are the purists for whom any mistake – in spelling, grammar, pronunciation, punctuation – is regarded as a personal affront. To them the learning process boils down to the rooting out of errors, accompanied by indignant letters to The Times: 'What is our education system (or our country) coming to?' they cry. In the other camp are the permissive ones who have little time for rules, and who see any attempt to insist on their observance to be an assault on the liberty of the individual and his right to free expression. Merely to describe these two extreme positions in this way is to imply that somewhere between them lies a sensible attitude to correctness; the ultimate aim is to produce students who can perform both accurately and fluently to certain agreed levels of performance, and within agreed levels of tolerance. It is one thing to differ about ultimate aims regarding accuracy, it is quite another to differ about emphases given to accuracy and fluency whilst pupils are still struggling towards those ends.

What features other than formal accuracy might one consider in building up assessment criteria? It seems to me that we can work at three levels, macro-scopic, meta-scopic and micro-scopic along a spectrum from the broad effectiveness of the message in given settings, through the strategies used in achieving this level of effect, to the linguistic minutiae of spelling and pronunciation through which the strategies are realised. This three-level hierarchy, arbitrary though it may be, can provide a framework for a comprehensive assessment of performance. Below are listed a number of factors subsumed under each of the three levels.

Macro-scopic             Meta-scopic        Micro-scopic

purpose                  strategies         spelling
message                  interaction        pronunciation
setting                  text               grammar
effectiveness            appropriacy        vocabulary
affect                   style              correctness
                         fluency
extra linguistic         size of text       intra-sentential
performance              range of skills
pragmatics               flexibility
                         cohesion
                         coherence
(Communicative value)    inter-sentential   (Signification)

In one of our recently-framed writing assessment scales, a number of performance levels were spelt out in terms of this hierarchy. The macro-scopic features, referred to as Message, included clarity of presentation, coverage of points, validity of conclusion and the overall 'flow' of the work. The meta-scopic features, referred to as Text, included format and lay-out, coherence of theme, use of cohesive devices, appropriacy of style and neatness of appearance. And the micro-scopic features, referred to as Language, concerned the control of grammar, suitability of vocabulary, accuracy of spelling and intelligibility of handwriting.

Few would deny that such features as those listed are very pertinent to the measuring of language performance. To confine our assessment criteria to formal correctness would subtract greatly from the breadth and depth of our assessment. To
sum up on accuracy, then, we can say that it is an important criterion, but by itself cannot form the basis for an adequate judgement of performance – it is necessary but insufficient as a criterion.

4. The role of purpose
I believe that any measurement of language competence which lacks a detailed and systematic specification of the purposes for which, and the contexts in which, the language is to be used by the person concerned is highly suspect. At the best, ignorance of such purposes and contexts will risk a waste of time and resources; at the worst, it will pervert the very object of our testing. And yet there are those who, believing that 'language is one', think it can be adequately tested by a single set of procedures and with a common content regardless of the specific aspirations of the individual testee; it is immaterial to them whether he requires to use the language to practise Medicine, to teach Architecture, to fly Concorde or to sell Jelly Babies.

But there must be some purpose behind a person's choice of the language to be learnt. If he is learning English, why is he paying his money? Why is he devoting considerable time and energy to the task? Why has he chosen English and not Telegu or Spanish? There must be some reasons for his actions; these reasons must be specifiable in some terms, however broad; and his purpose can thus be taken into account in the test content and procedures. Even if one were to accept the monolithic, unitary theory of language, there would still be massive practical reasons for devising context-sensitive testing, and surely no educationist would hope to earn extra points for deliberately ignoring discoverable factors which must be highly significant for his pupils.

5. Justification by correlation
In this section we will, unfortunately, have to turn to a discussion of statistical issues – not usually a popular topic, but certainly at the centre of any discussion of measurement. Even to those not interested in statistics, the following section could be useful.

The most commonly quoted statistic in discussing the analysis of abilities is the correlation coefficient. A high degree of positive correlation between traits is shown by a coefficient approaching +1.0, a high degree of negative correlation is shown by a coefficient approaching -1.0, and absence of correlation would be indicated by a coefficient in the region of 0.0. In practice, however, human traits and abilities tend to be related positively, that is, a person good at one task (say, Maths) tends to be good at another (say, Essay Writing), and many correlations fall in the area from +0.4 to +0.9 or thereabouts.

Using such correlation coefficients, we may make a statement about language abilities going something like this: 'In our sample of testees, scores on Test A (Writing) correlate highly with scores on Test B (Reading). There is therefore no need to have a special test of Writing because performance on the Reading test will tell us what we want to know about the person's language competence and, moreover, the test of Reading can be marked much more objectively'. This argument is also extended to matrices of correlation coefficients which, when factor analysed, produce factor patterns of the abilities being studied.

To me, this argument is highly suspect, not only because of its practical, educational dangers, but also on theoretical grounds. When I was a teacher, I knew that Sarah Williams, aged 10, was likely to be at the top of the class, or near it, in Arithmetic, Geography, History and English. Had we introduced classes in Industrial Relations or Atomic Physics, I have little doubt that Sarah would have been at or near the top of the class in those subjects as well. By the same token, John Bennett would, in every test (except swimming), have consistently trailed behind the rest of the class. To conclude from such observations that it was thus necessary only to set one or two tests and even to teach only one or two subjects, such as Reading and General Intelligence, in order to foster and measure the children's ability would, I am convinced, be the height of educational naivety. If one were to cut out most courses in Medicine on such 'correlation' grounds, we would finish up with a singularly dangerous generation of doctors.

It may well be that for rough, snap decisions we can reduce our testing to the lowest common denominator of elements but, if we want our tests to make accurate and sensitive decisions about our learners, we can ill afford to rely on this type of correlational thinking. We must decide a priori on educational grounds what our test must contain, and use statistics merely as an ancillary process to check whether the tests are doing what we wished them to do.

It is not always appreciated that correlations depend on co-occurrences of variance in the responses of the sample being tested. If there is little variance in one or more traits, then their inter-correlations are most likely to be small. Furthermore, even if a correlation is sizeable, it is extraordinarily difficult to trace the precise reason for its being so. It is often found, for example, that it
Brendan J Carrol
is similarities in testing method between the two tests rather than any relationship between the two traits which lie behind the correlation. Even with the use of factor analysis techniques, there are many ways in which a factor pattern can be explained. It is only recently that sophisticated methods of analysing factor patterns have been used in language studies (see Palmer, A. and Bachman, L., 1981). If these techniques are not used, unless every precaution is taken to identify sources of trait and method variance and unless discriminant as well as convergent features are taken into account, it is very easy to leap to unjustified conclusions from our correlationally handled data.

One particularly disconcerting assertion is one which says that such and such a correlation is 'a good one'. But how can we tell what is a good, or high, correlation and what is a poor, or low, one? Is, for example, a correlation of 0.60 high, medium or low? And what about 0.85, or 0.52? There are three basic ways of answering these questions:

One: to ascertain whether the coefficient is statistically significant and unlikely to be due to chance. A coefficient of 0.60 with 100 subjects would be highly significant in that the odds against it having occurred by chance are better than 100 to 1. Clearly, then, there is almost certainly something behind a correlation of this kind.

Two: to calculate the percentage of shared variance between the two variables. This is done by squaring the correlation, so that the 0.60 correlation indicates that 36% of the variance is shared. By this criterion, the correlation is looking decidedly shaky as we have now left unexplained no less than 64% of the variance.

Three: to estimate the forecasting efficiency indicated by the correlation. Using Kelley's coefficient of alienation, we find that a correlation of 0.60 has a forecasting efficiency of only 20%; that is, we are only 20% better off than we would have been if guided by pure chance. (Indeed, even a 0.90 correlation is only 56% better off than pure chance.) Our 0.60 correlation is now looking most decidedly chancy.

All this goes to show that it would be wise to treat apparently 'high' correlations with the greatest caution. A correlation is useful for comparative exploratory studies, but too flimsy a basis for absolute value assessments. And, if the individual correlations themselves are open to question, so much more so are the elaborate factor analyses based on them. This is not to say that producing correlational statistics and analyses is necessarily a fruitless operation; correlational approaches have their contribution to make in exploring the nature of linguistic-communicative transactions. By all means, let us use them whilst recognising their limitations, but let us not give up the more radical probing of the social, psychological and linguistic entities involved in inter-personal communication. There is a place for descriptive and observational studies as well as quantitative, experimental ones.

We have now discussed five problem areas associated with traditional language testing and reach the following conclusions:

a What we need to do is to look at the whole field of communication in devising our tests and not restrict our task to the counting up of easily-devised and easily-assessed bits of language performance.
b We need to give systematic consideration to purpose and strategy in our task design and not just rest with the simplistic framework of four de-contextualised skills.
c Our important assessment feature of accuracy, or 'correctness', must be put in the context of a range of broad and context-specific criteria.
d We need a direct study of the dynamics of linguistic-communicative behaviour and should not just rely on the circumstantial and post hoc evidence of correlational statistics.

The question now is: if a new emphasis is needed, what should it be, and will it produce workable tests? It is all very well to list the alleged defects of a current method of setting about things; it is also expected that the critic outline an alternative approach and show that it has the potential of producing better results. Is there another way?

THE COMMUNICATIVE ALTERNATIVE IN TESTING
The main aim of the new approach must be to widen the basis of our tests from a narrow grammatico/statistical focus towards a broader, multi-disciplinary and multi-level approach which can yet maintain essential features of measurement, always remembering that language testing is much too important to be left to grammarians and statisticians! The way we achieve this broadening, or opening up, is outlined below under five headings to help provide answers in the problem areas already discussed:

6. The test-curriculum relationship
7. A purposive test framework
8. The test content and procedures
9. Levels of performance
10. Methods of data analysis
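
The arithmetic behind the three readings of a 0.60 correlation given earlier (significance, shared variance, forecasting efficiency) can be checked with a short sketch. The function names are illustrative only, and the significance check uses the standard t statistic for a correlation coefficient, which is an assumption about how the "better than 100 to 1" figure would be verified rather than something the text specifies.

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance shared by two variables: r squared."""
    return r * r


def coefficient_of_alienation(r: float) -> float:
    """Kelley's coefficient of alienation: sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


def forecasting_efficiency(r: float) -> float:
    """Improvement over pure chance: 1 minus the coefficient of alienation."""
    return 1.0 - coefficient_of_alienation(r)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero with n subjects (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    for r in (0.60, 0.90):
        print(
            f"r = {r:.2f}: shared variance = {shared_variance(r):.0%}, "
            f"forecasting efficiency = {forecasting_efficiency(r):.0%}"
        )
    # With 100 subjects the t value for r = 0.60 is far above the value of
    # roughly 2.6 needed for significance at the 1% level.
    print(f"t for r = 0.60, n = 100: {t_statistic(0.60, 100):.2f}")
```

The same coefficient therefore looks strong on the first criterion and weak on the other two, which is the point of the comparison.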
6. The test-curriculum relationship
The close relationship of tests with other elements of the curriculum has already been touched on. One way of ensuring their relevance to each other is to see that they both come from a common source, such as a specification of linguistic-communicative needs; thus the programmes of the curriculum aim to meet those needs, and the tests aim to indicate how far the needs are being satisfied. In this way, there is no 'general proficiency' test, as this title would imply lack of specification of the language skills actually needed. Nor would there be a separate 'achievement test' element, such as when a test springs exclusively from a syllabus, because the spelt-out needs are there for direct use in both syllabus and test design.

Without some spelling out of language-communication needs, and of related student aspirations, it is all too easy to base the language programmes on existing tests or examinations, which are usually described in the vaguest of terms, so that students spend their time either on past examination papers or on a course book designed to help students to pass the examination with the minimum of effort. A recently published book of tests, for example, contains strings of items (disguised here for security reasons) such as:

(a) Her mother died when she was young, so she was ____ by her aunt.
    (4 options presented, the accepted one being 'brought up')
(b) The lion fell into the ____ set for it.
(c) When he ____ the age of 65, he will retire.
(d) I stood in a ____ for half an hour to get my theatre tickets.
(e) Roses are the ____ flowers in our garden.
(f) You could tell from his big feet that he ____ his father. (etc.)

The hapless examinee, had he not been conditioned to such a sequence, might well ask who these mysterious 'he's', 'I's' and 'she's' are, and how he can plausibly be expected to consider in one breath a dead mother, a retiring employee, an unfortunate lion, a slow-moving ticket queue, a rose garden and a boy with big feet. Apart from the inherent absurdity of this juxtaposition of topics, the testee has to struggle through masses of options which are either inappropriate or quite incorrect in the context of the sentence.

Nor is the suspension of disbelief much less in demand with those so-called integrative tests which ask the testee to insert suitable fillers in texts from which words have been eliminated at random. Although much in vogue at present, and justified on correlational grounds, such tests are only a little more credible than the separate-item test described above. Who, for example, would write a book, report or letter omitting every 7th word, expecting the omissions to be remedied by the unfortunate reader?

If we are keen to see that our tests, in their content and tasks, are meaningful in themselves and can be seen to reflect features of the settings in which the testee will operate, we will, I fear, have to look further than these barren, educationally-destructive techniques.

7. A purposive test framework
To meet learners' needs and to ensure a benign relationship between the test and other elements of the curriculum, a justifiable approach is to make a clear and detailed statement of the purposes and settings of language use, and of the skills and functions to be called on, and from this statement to generate the language, content and tasks which a comprehensive test will have to encompass. All these design operations must be carried out without losing sight of the specified purposes for learning and using the language.

The framework for approaching our test design may be adapted from such models as those of the Council of Europe (see Van Ek, 1975) and J. Munby (see Munby, J., 1978), to name but two. One of the problems of these models is that they are extraordinarily detailed and lengthy, and it is difficult to fit even a fraction of the specified elements into a test of anything like an acceptable length. Thus, a decision has to be made at an early stage in test design as to how accurate and how delicate, or detailed, the instrument must be. If we required a quick decision for rating students in broad categories for programme placement purposes, if any misplacements caused by the tests could be remedied easily and if the applicants had a wide range of purposes for learning the language, then a fairly short general test of competence could well fill the bill. If, however, the decisions to be made are very refined ones, if the chances of remedying them are small, and if the applicants' language needs are very job-specific, then a detailed needs specification, careful test development and an allocation of time and resources for test application will be called for.

It will, therefore, be for the organisers of the testing operation to decide what scale of delicacy, what tolerance of mistakes and what resources shall be devoted to that operation. The object is to devise tests which will give the necessary answers more economically.
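
The two clear-cut cases described in this section can be written down as a small decision rule. This is only a sketch: the function name, the parameter names and the wording of the returned labels are hypothetical, and the text itself leaves every mixed case to the judgement of the organisers.

```python
def recommended_test_scale(
    refined_decisions: bool,
    errors_easily_remedied: bool,
    job_specific_needs: bool,
) -> str:
    """Map the three conditions named in the text onto a scale of test design.

    refined_decisions: True if the decisions to be made are very refined ones,
        False if broad placement categories are enough.
    errors_easily_remedied: True if misplacements caused by the test can be
        put right easily.
    job_specific_needs: True if the applicants' language needs are very
        job-specific, False if their purposes are wide-ranging.
    """
    if not refined_decisions and errors_easily_remedied and not job_specific_needs:
        return "short general test of competence"
    if refined_decisions and not errors_easily_remedied and job_specific_needs:
        return "detailed needs specification and careful test development"
    # The text gives no rule for mixed cases; it assigns them to the organisers.
    return "organisers must weigh delicacy, tolerance of mistakes and resources"


if __name__ == "__main__":
    # Broad placement decision, easily corrected, wide range of purposes.
    print(recommended_test_scale(False, True, False))
    # Refined decision, hard to correct, job-specific needs.
    print(recommended_test_scale(True, False, True))
```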
To assist in making a reasonably exhaustive test needs analysis, we have prepared working papers under the following headings which we illustrate by excerpts from an actual specification for a test in English for Computer Programmers.

JOB: Computer programming (English for occupational purposes)

1. Main Events, with their Activities

I Attending classes in principal subjects
- listening to lectures
- observing demonstrations
- taking notes for further study
- questioning and discussing lectures/demonstrations

II Making field visits to computer centres
- understanding functions of computer centre
- touring various divisions
- observing sample running programmes
- questioning programmers in centre
- reporting conclusions of visits
- discussing results in groups

III Studying at home and in library
- carrying out reading assignment
- doing written exercises
- writing reports and critiques
- preparing for classwork and examinations

IV Carrying out practical computer assignments
- selecting appropriate guidance information
- testing and amending each phase
- general run of programme

Note: the events are the main focusses of work during the course, the activities are component parts of the events. Further specification is done in terms of each event.

2. Language skills (for Event I)

50-1 and 2*. Transcoding information in diagrammatic display involving conversion of diagrams/tables/graphs into speech or writing.
48-1, 2 and 3. Maintaining discourse; how to respond, to continue and adapt as a result of feedback.
40 Extracting salient points to summarise: the whole text, a specific idea or topic, the underlying point of the text.
39 Distinguishing the main idea from supporting details by differentiating: the whole from its parts, fact from opinion, and a proposition from its argument (and so on for this and the other events).

*The skills believed to be critical to each event are specified along the lines above, in this case in terms of the Munby model (as described in Munby, J.L., 1978 and Carroll, B.J., 1980). The numbers are those in the Munby taxonomy of language skills.

3. Language functions
To appreciate the functions: certainty, probability, conjecture, intention, obligation, evaluation; to inform, report, agree, endorse, prove, assume, ratify, conclude, generalise, demonstrate, explain, classify, define, exemplify.
(Taken from the categories of function as described in the two references above.)

4. Topic areas
As in computer programmer training syllabuses:
- special English for computers
- introduction to computer hardware
- advanced Mathematics
- systems analysis
- COBOL
- management
- human relations.
Text books and instruction manuals will contain the details of the above topics and provide appropriate samples of the language likely to be needed. These topic areas may be general for the programme or allocated to particular events if appropriate.

5. Medium
The events are now categorised according to the medium involved, i.e. listening, speaking, reading, writing and mixed media.

6. Target performance level
On a 1-9 Band system, the level of performance for each medium is specified. In this case, a minimum target level of Band 7 in all media was established (i.e. 'Good user' on a continuum from 'Expert user' to 'Non-user'; see below).

7. Channel
Selected from the following channels:
Face-to-face, phone, radio, TV, film, tape, groups, public address, print, telex, print-out.
8. Other parameters
- socio-cultural (roles and relationships):
    learner - instructor
    insider - insider
    native speaker - non-native speaker
    professional - professional
- attitudinal tones:
    assenting - dissenting
    cautious - incautious
    formal - informal
    friendly - unfriendly
    inducive - dissuasive
    respectful - disrespectful
    patient - impatient
    certain - uncertain
- dialect:
    Understand British English dialects including RP or near RP accents.
    Produce standard English dialect of own area with appropriate regional accent (but generally intelligible to colleagues).

By compiling the above compendium of communicative needs, the tester will have gained considerable insights into the demands of the computer programmers' course, its content, the various activities to be undertaken and the language skills and functions which will have to be mastered to the appropriate target level.

8. The test content and procedures
The events and activities already specified will provide the basis for the balance of tasks contained in the actual test, and the topic areas should point to the main semantic fields to be covered. Similarly, the other parameters of the eight-fold specification will give guidance about skills, functions, tones, dialect and so on. Of course, it will not be necessary, or possible, to provide in any one test a complete coverage of all the specified features but, having selected the main events to be replicated in the test, we will be able to include key features implied by these events.

Test items themselves will fall broadly into three main categories:

Open-ended items, for which the testee is allowed a fair measure of latitude in carrying out the task, his performance being assessed on a graded scale, probably with accompanying actual samples of different levels of performance.

Closed-ended items, where the testee selects from a given set of responses the one he considers the most appropriate, from a 'Yes-No' dichotomy to anything up to five possible options.

Restricted-response items, which allow a response to be composed by the testee, but on very restricted grounds. Probably the answer will consist of one or two words or, at the most, of a short sentence.

The most effective test instrument will contain a good balance of the above three types of item, allowing the authenticity of the open-ended tasks to be supported by the objectivity (in scoring) of the closed-ended items, bearing in mind the serious limitations of such 'objective' items discussed in earlier sections.

The computer programming test derived from the above specifications included tests of:

Listening Comprehension, using lecture texts with multiple-choice items.
Group discussion, with individual roles provided by guide cards; performances to be assessed by observers according to a graded scale.
Study skills, in which a range of information was provided both in ordinary English and in computer language (with explanations), using multiple-choice items.
Report Writing, responding to written and oral presentations of information; rated by assessors according to a graded scale, with examples.

We have found that, by these test design and development techniques, we can gain insight into the language needs of specialist groups of testees and give ourselves an opportunity of selecting an authentic basis for devising test tasks and items.

9. Levels of performance
For centuries, examiners have made use of scales, or grades, of performance describing successive levels of behaviour in the particular areas being examined. We have, for example, examinations in Pianoforte which range from Beginners to Advanced Level, specifying as many as 12 levels of piano piece and the kinds of performance an examiner is to expect when awarding grades. Although basically subjective, this examining procedure is surprisingly reliable and has a good backwash effect on the piano-playing of the pupils. To attempt to improve reliability by breaking up a piece of music into small countable bits would, of course, lead to all sorts of musical absurdity, as the unit of performance is the piano piece, which must be judged as an artistic whole.

In language assessment, probably the most widely-used scale has been that of the American Foreign Service Institute (FSI) oral interview, which has proved to result in reliable, consistent judgements whilst retaining some of the naturalness of oral interaction. In the same tradition, we have devised scales of all sorts of communicative
activity described at several levels according to the main parameters of level, activity and descriptor. The levels range from 1 to 9. The activities are, in effect, important communication focusses as described in our category of event/activity above, and can be based on the 'four skills' categorisation if appropriate. The descriptor consists of a label (e.g. 'non-writer' through to 'expert writer') followed by a brief thumbnail sketch of the critical features of a typical performance at that level, which can be elaborated in terms of up to a dozen performance criteria for detailed design purposes. The scales are usually accompanied by photostats or tape recordings of performances judged to be representative of each level. For objective, scored tests, conversion tables are provided to convert raw scores into levels on the 1 to 9 band system.

Initial figures for reliability and validity for such tests compare very favourably with those for current widely-used language tests and, more important, authenticity and testee motivation are much more in evidence.

10. Methods of data analysis
A perennial problem in measuring human behaviour is how to describe the target behaviour unequivocally, and to assess accurately how far an individual reaches, or falls short of, that level. The demands are for valid, systematic description, for accurate observation of responses, and for reliable measurement and data analysis procedures. In the main, language-communication tests have suffered on all three counts, description having too often been vague or based on limited, linguistic, features of performance, observation methods being such as to destroy the very communication processes under scrutiny, and data handling being focussed on counting the bits with analysis by correlational means. Thus the measuring devices have tended to demean the very processes they were trying to measure, causing the results, however precise in appearance, to be of very little significance.

In the absence of direct measurement based on carefully spelt out, relevant criterion behaviour, much reliance has been placed on sampling and probability theory, the scores obtained from selected samples of performers being arranged in order.

Crucial importance is given to measures of central tendency (mean, median, mode) and dispersion (standard deviation, range, quartile and mean deviations). Norms of performance are established on internal criteria, a 'good' score being expressed in terms of standard deviations above the mean, a poor by standard deviations below the mean; or by deciles or centiles. Thus the most accurate information obtained is about where in the population any individual lies: in the top 10%, or at the 50% point, and so on. What are so often lacking in such approaches are detailed descriptions of test content and of the exact nature of a 'good' and a 'bad' performance. Briefly, the basis of performance assessment has been the relative performance of individuals in a sample rather than links with pre-determined behavioural criteria. We thus have the constant danger of vague and circular reasoning behind our testing.

It would be wrong, however, to decry these 'norm-referenced' techniques because, as methods of exploration, they can give us many initial insights into behavioural problems. For instance, some years ago it began to be suspected that cigarette smoking and lung cancer were related. What doctors wanted to discover was the actual causes behind the observed correlations of cancer and smoking. So there came a time when correlational, circumstantial evidence had to be reinforced by more precise, and direct, techniques for describing, observing and measuring the related phenomena. In short, it is crucial that we move as soon as possible to a more authoritative test basis and do not remain shackled in the permanent relativity of norm-referencing.

CASE STUDIES IN COMMUNICATIVE TESTING
There are now interesting developments in the direction of communicative testing in various parts of the world, and I will mention some which I have been connected with. Others are no doubt being reported on as I write.

The Royal Society of Arts examinations in the communicative use of English (RSA London, 1980).
The English Language Testing Service set up by the British Council and the University of Cambridge Local Examinations Syndicate (descriptive Handbook in preparation, 1981).
The 'Crescent' English Course and the Mid-East projects associated with it (O'Neill, T. and Snow, P., Oxford University Press, 1979 onwards).
British Council Course in Testing, testing for communicative programmes, Course 040 (report in preparation by the British Council, 1981).
The Pergamon English Tests in both general and specific areas of English (in production by Pergamon Press, Oxford, starting in 1981).

It is suggested that the above reports be referred to and, when early versions of the tests are released

from security, the actual tests be studied. It would be a rash person who would claim to have resolved the incompatibilities between the terms 'test' and 'communication', but at least we can claim to have made genuine efforts to produce tests which contain identifiable communicative features.

Pertinent references
Canale, M. and Swain, M. (1980), 'Theoretical bases of communicative approaches to second language teaching and testing', Applied Linguistics 1.1.
Carroll, B.J. (1978), An English language testing service: specifications, The British Council, London.
Carroll, B.J. (1980), Testing Communicative Performance, Pergamon Press, Oxford.
Carroll, B.J. (1980), Communicative tests for communicative programmes. Paper given at TESOL Convention, Detroit, 1981.
Fishman, J. and Cooper, R.L. (1978), 'The socio-linguistic foundations of language testing'. In Spolsky, B. (Ed.), Advances in language testing series No. 2, Washington D.C., Center for Applied Linguistics.
Morrow, K.E. (1977), Techniques of evaluation for a notional syllabus, Centre for Applied Language Studies, University of Reading. Study commissioned by the Royal Society of Arts.
Munby, J.L. (1978), Communicative Syllabus Design, Cambridge University Press.
Palmer, A.S. and Bachman, L.F. (1980), Basic concerns in test validation. Paper for RELC Seminar of April 1980 (to appear in proceedings). More recent developments of this project were reported in TESOL '81, Detroit.

Alan Davies

Criteria for evaluation of tests of English as a Foreign Language

In this paper I want to make some general comments about test choice and test construction, and then to discuss six examples of (British) published tests referring particularly to use in an English as a Foreign Language testing situation.

I plan to begin by making a series of very large but, I hope, gnomic generalisations. Even if nothing else in this paper sticks, I hope that one or two of these statements will.

1. Most linguistics is normative.
2. Language teaching samples by guesswork.
3. Language testing levels are intuitive.
4. Criterion and norm referenced tests are the same thing.
5. Language teaching operations need language level systems.
6. Language level systems need external validation.

Let me examine each of these briefly.

1. Most linguistics is normative
The rules of grammar, the observations of language acquisition and discourse analysis, the selections for language teaching and testing, all present language as if it were unvaried and thus they are all a kind of necessary idealisation: they all set a limit on what the language is (and by definition isn't). Typically, all such descriptions, all such norms, hold up internally, but, as we shall see, what is urgently needed is some measure of external validity. Here, paradoxically, it is easier for the more applied parts of linguistics in language teaching and language testing to provide such buttressing for the hypotheses of descriptive linguistics.

2. Language teaching samples by guesswork
This is even more of an exaggeration, but it is likely that, until we know far more about second language acquisition, the construction of a syllabus and a textbook will be more art than science. It is not surprising that so many language teaching materials should resemble one another. What is noteworthy for language testing is that first, the achievement test cannot improve on the syllabus: the test simply reports, it doesn't teach; second, and more optimistically, the test can be used as a check on the guesswork. I do, of course, admit that in the sense in which I have used guesswork, language proficiency tests are also guesswork; but it does make sense, I suggest, for new courses, syllabuses, etc. to be piloted (which they are) and tested (which they are only rarely). Testing and teaching have the same interests if not the same purposes.

3. Language testing levels are intuitive
I am thinking here of the ease with which examiners allot test papers to categories (High, Mid, Low, etc.) and then, commonly, convert into some numerical score. It is part of the professional judgement of the teacher to grade thus. Even when we provide the apparatus of an itemised test which distributes students widely, it is still necessary to decide the meaning of a particular score or band. Sometimes this is done internally (or historically): such and such a score will allow 50, 60, 70, 80% to pass (or fail) and that's what is wanted or how it's always done. Or it may be possible (it is a pity it isn't done more often) to make use of expectancy tables and an external criterion of some kind, and show the effectiveness of a particular cut off; this is using a norm referenced test for criterion referenced purposes.

4. Criterion and norm referenced tests are the same thing
Again an overstatement. But what I mean is what I have just explained. Intuitively examiners do work to a Pass/Fail (etc.) criterion, i.e. we always do work in a criterion referenced way. Imposing a series of equal interval scores (1-10, 1-100, etc.) just extends the curve and distributes types of Fail and Pass. So from this point of view a norm referenced test is a use of a criterion referenced test just as we have noted the reverse. They do the same job, in the one case emphasising the distribution, in the other the cut off.

5. Language teaching operations need language level systems
What criterion referenced test uses may have, but what language level systems lack, is a criterion or

series of criteria outside the system. Of course language teaching operations do need internal logic so that it is possible to relate each level to the ones below and above. The recent new life in foreign language teaching in the UK has come precisely from the inspiration of the Council of Europe unit credit courses leading to the graded levels of achievement schemes (which are rather like music examinations, as we shall see later). Educational systems characteristically operate in terms of educational levels. Language teaching operations need this ladder security just as much.

6. Language level systems need external validation
It is striking that most of the EFL exams I will be discussing set themselves up as systems of language levels, through grades, etc. This is, as I have just suggested, excellent. But what is lacking is the validation I have called for. In spite of the emphasis I have given to guesswork, intuition and so on, I urge that language tests can provide some necessary objective indication of the validity of a system of language levels, and should attempt to do so.

The first question to ask in language testing is always: for what? There are degrees of query, from the most lay (I want to know how much English they know) through the more refined (How good is their spoken English? Have they mastered that communicative language syllabus?) to the precise (Does she/he have adequate English to work as a secretary in a travel agency? What are the main errors that need remedying in this unit of the syllabus?). Note that the precision is more apparent than real, in that the content of a test is necessarily based on the earlier sampling of the syllabus or other universe of discourse. The test specifications need to be as precise as possible in language content validity, and they must attempt some kind of external validation. Thereafter, the question is properly circular, and the test exists as its own criterion, viz. Do they have enough English for this test?

I want to propose that we use three kinds of validity in our evaluation of EFL tests. They are as follows:

Validity 1 is concerned with the internal logic of the system, making use of levels or grades, each one dependent on and interlocking with the others, such that the use of one element implies all the others (as in, for example, a printed circuit). It is no doubt the case that there is more at stake than harmony or use in setting up a system of levels; there is also the political, publicity, financial aspect in the sense of capturing an audience, as in a course of textbooks, or a series of volumes on a similar theme.

Validity 2 is the acceptance of a new test as the equivalent of an existing measure. There are two sides to this. The first is the matter of content validity, i.e. the 'experts' must be satisfied when they look at the test that it consists of the right bits, units, combinations and so on; in other words, that it is a test of what it says it is. The second aspect is the reputation of whose test it is: and as in other things, the more prestigious that institution, the more quickly will the new test gain its equivalences. This must be the case; otherwise you or I would write a new test (series of tests) perfect in terms of content validity, but we don't, because it won't gain acceptance. Similarly, it would be possible for a prestigious institution (like a successful author) to produce any old test and claim it valid. And if the producer is the Cambridge Syndicate or whatever, it would work, but not for long because on the whole we rely with good reason on the integrity of our prestigious institutions (PI), and we recognise, as they do, that they would be found out. Such institutions have everything in their favour: the reputation, the contents, the money (and staff) for research and development and also the acumen to be at the forefront of new ideas (although on the whole testing is fairly conservative). It is probably the case that test development is much more a matter of administration than of ideas. If you or I have those ideas, we'd be much better off selling them expensively to one of the prestigious institutions I have mentioned so often. This makes it really very difficult for an organisation which is only partly (or locally) prestigious to see its way forward. Without full government authority (as in state education systems, which tend to be very nationalistic) such an attempt has no chance of success unless it can cleverly gain some sort of tie up or sub-contract with one of the major, internationally known prestigious institutions. The West African Examinations Council did this, as have others, with the Cambridge Syndicate; and it is interesting to see a more recent tie up between the Oxford Examinations in EFL (Oxford Delegacy of Local Examinations) and ARELS (of which more later). However good an examination may be produced, it will just not gain equivalences, acceptance, credibility unless it has such backing. There may be one exception to this, and that is by making use of Validity 3, though I am sceptical even here.

Validity 3: the examples of EFL tests I will mention are all examinations rather than tests, and
while these terms tend to be used quite often the one for the other, it is in Validity 3 that they differ. Validity 3 is the appeal to some external criterion, i.e. the establishing of validity statistically through concurrent or predictive validity. I am unaware of this being done – normally – for EFL examinations; it is, however, done as a matter of course for EFL tests. There are various reasons – examinations tend to be less clearly directed at a given population; they change more frequently; they are less objective (in terms of items); and they are more public altogether, in the sense that there is less attempt to control their use, and less concern generally with their security. The kinds of criterion that may be appealed to (which would provide exactly the external assessment of the self justifying system I have referred to, and would provide exactly the support needed for promotion of a new test from a non-prestigious institution) are (concurrent) already existing tests or measures. I am sure it would not be too difficult for a PI like the Cambridge Syndicate to provide this kind of evidence, but I am unaware that it has done so. (The annual statistical survey of Cambridge Examinations in English for 1978 certainly does not do anything like this.) Cambridge could use its older formats and other PIs could use Cambridge. Of course, it will be said, why bother to match (or correlate) a new test against an existing one? New tests are new because (e.g. the present Cambridge FCE or the new RSA exams) they are trying to be different. Correlation with existing formats – which have been rejected – is exactly what is not needed. Yes and no; although it will not be very satisfactory, the new test and the old one both test English. But since this is a problem, then the other criterion type, established by (usually though not always) predictive validity, can be employed. Here we require a criterion which is accepted by 'experts' and which is regarded as an operational measure of the required English. Teachers' judgements, observation of language use by judges, performance on various defined scales, even school and college exams, these can all be made to work. Although it is time consuming to do the work, it is essential because, if new tests are needed, then presumably they are doing something better than the old test – and what they are doing better is predicting the criterion that one has selected. The alternative is to fall back on Validity 2, and say I know it's better and I'm an expert. This is certainly respectable, but I want to maintain that it is not enough. It is, perhaps, a little better to provide verbal descriptions of performance at various cut-off scores (e.g. students can ...) but once again what we look for is some evidence that justifies the cut off, the test and the levels of which they are part.

So what we look for in our tests of EFL are these three validities. (Incidentally we also look for indications of reliability, and I can say now that no indication is given – the word is not mentioned.)

So back to a 'typical' EFL situation and against such a situation I want to consider five tests of EFL, viz.:

1. RSA Examinations in the Communicative Use of English as a Foreign Language.
2. Cambridge Examinations in English as a Foreign Language.
3. Trinity College, London: Examinations in Spoken English as a Foreign or Second Language, and Written English (Intermediate).
4. Oxford Examinations in English as a Foreign Language.
5. ARELS Oral Examinations.
6. PALSO Examinations.

1. RSA Examinations in the Communicative Use of English as a Foreign Language

Characteristics of the Examination (quotations from RSA information)

'The examinations are designed for students wishing to operate independently in Britain. They may be living, working, or studying here, or intending to visit the country on a long or short-term basis. The language and skills tested are thus clearly and explicitly those of 'in-Britain English' ... the audience envisaged is adult (16+).'

'The exam is offered at three levels – Basic, Intermediate and Advanced ... it is difficult to match the levels to an external yardstick. Instead we see the levels as being defined internally.'

'... Candidates will be able to choose different combinations of levels and areas to meet their own requirements or abilities (therefore three levels X four skills).'

'The tests are operational in nature, i.e. they are intended to measure whether or not the candidates can do certain things in English. The 'things' they are asked to do are specified at each level and represent authentic tasks of the sort which confront language users in real life.'

'Authenticity is thus of major importance. It affects not only the tasks which the candidates are asked to perform ('Is this the sort of task which a real person in real life might want to do?') but also the type of texts which the candidates use in the Reading and Listening tests and produce in the
Writing and Oral tests ('Is this a real English text being used for a relevant purpose?').'

Content Areas: 'What are the general areas in which candidates taking this exam will want to use English?'

Levels: 'What will candidates want to do in each general area? How well do they need to do these things? What sort of texts will this involve the candidate in handling?'

This is a new examination, the 'result of a conscious effort on the part of the RSA Examinations Board to develop new testing procedures to match recent developments in the communicative teaching of foreign languages'.

In terms of our three validities, it has Validity 1, with its three levels of Basic, Intermediate and Advanced, not defined, as is pointed out, by an 'external yardstick', but 'defined internally': the interlocking becomes clear when we look at the Reading tests at Intermediate and Advanced levels, and of the 30 questions at Intermediate level, 15 (at least) are repeated exactly at the Advanced level.

Validity 2, very much so: it has the backing of its own PI (the RSA) and the rubber stamp of content validity from its own 'experts'. The desire to 'match recent developments in the teaching of foreign languages' seems to me very proper and may be one way of evaluating ideas about communicative language teaching.

But there is – so far – no Validity 3, though we are told the Exam Board intend 'to monitor closely the validity of the tests'. I hope this will come and it is, of course, too early to expect it yet since this is a new examination. However, we are not expected to see the examination as relevant to the typical EFL situation since, at the beginning we are told – 'the language and skills tested are clearly and explicitly those in in-Britain English'. This is fine if you intend to visit Britain in the near future, but I am less sure of using that as the criterion. It is interesting to observe the tendency in recent years to make life in Britain, interaction among native-speakers or at least with them, the focus of attention in language teaching materials, again perhaps one of the unexpected (and not altogether welcome) results of the communicative movement.

2. Cambridge Examinations in English as a Foreign Language

The most PI and still the one best known. The Cambridge examinations are offered at three levels: the First Certificate in English (FCE) containing Reading Comprehension, Composition, Use of English, Listening Comprehension, and Interview; the Certificate of Proficiency (CPE) in English containing the same sections; and the Diploma of English Studies comprising Language and Appreciation, Literature, Life and Institutions. We are told that the CPE alone bears the official approval of the Department of Education and Science and is recognised by the University of Cambridge as part of the examination requirements for matriculation.

Description (quotations from the Cambridge information booklet)

'Provision for the foreign learner began specifically at the teacher rather than the pupil level, with the introduction of the Certificate of Proficiency in 1913. The modest examination then offered ... test in phonetics, translation tests in French and German, and its Literature paper ... (now) a range of examinations providing a highly structured assessment of essential communication skills.'

The new syllabus introduced in 1975 'has been received with general approval ... making the best use of machine scoring and computer facilities and at the same time maintaining those features which have characterised the Cambridge examinations over the years: an accurate, valid and comprehensive test; an examination standard in form, marking and general procedure wherever taken; an administrative procedure which allows for a reasonably short time between entry examination and issue of results, yet gives each candidate individual attention where needed'.

'Although only Proficiency and the higher examination for the Diploma of English Studies have any official recognition or equivalence with the GCE or foreign counterparts, the main demand is for the lower level examination and in its own right rather than as a preparatory stage for Proficiency as it was designed in 1939. The attraction seems to be its simplified yet adult character and its attempt at an international flavour and freedom from over-literary emphasis.'

'... Proficiency (is) recognised as a quite demanding test of general literacy and maturity.'

'... has state recognition as an English qualification for teachers in a number of countries, notably in Southern Europe and South America, and it is in these areas that demand is biggest.'
How do the Cambridge examinations measure up to our Validation? It looks, in spite of disclaimers, as though Validity 1 is there: we are told that there is no connection between FCE and CPE and that very few people go in for the Diploma, but the system exists, and although FCE is not officially required for CPE, its parameters will certainly be used as informal measures when assessing a potential CPE entrant. (It is interesting that Cambridge think too many people enter for CPE who shouldn't. Perhaps an intermediate level between FCE and CPE is needed, and a lower Preliminary examination in English is now under discussion.)

What of Validity 2? The PI backing is certainly there. What of content validity? I am less sure here. Little indication is given, and we need to look at past examination papers, or trust the Cambridge judgement. But I am not too happy about being told that this is 'an accurate, valid and comprehensive test' without being told why and how.

Validity 3 – lacking again. As I have said before, I think there is less excuse for Cambridge precisely because they are the leaders and the most PI. Perhaps they do validate in this way, and I should be grateful to be pointed to those statistics.

3. Trinity College, London. Grade Examinations in Spoken English as a Foreign or Second Language and Written English

The following quotations have been extracted from the Trinity College information booklet:

'The examinations are recognised by the British Council as comprising a useful series of graded tests in oral communications ability.'

'This syllabus has been compiled to meet the needs of children and adults learning English as a Foreign or Second language. It reflects the theory and spirit of modern foreign language teaching.'

'The principal aim is to find out how well the student understands 'educated' spoken English within the limits of each Grade, and how well he or she can speak it. Importance is attached to the candidate's pronunciation, readiness, fluency, and comprehension, and to the appropriateness and grammatical accuracy of the English used, but above all to the ease with which the candidate can communicate by means of English.'

'Progress is made by small steps from a very elementary level of achievement (Grade 1) to a very advanced one (Grade 12) ... the grades are best regarded as milestones along the road towards an advanced oral command of the language.'

'Particularly in the lower grades the syllabus is formed chiefly on a sequence of basic English structures and usages. Because they are basic they are important to communication by means of English.'

As we might expect from a College of Music there is very clever use of the internal system of interlocking grades – 12 in all. 'The grades are regarded as milestones along the road towards an advanced oral command of the language'. So 'yes' to Validity 1, the most valid (in this sense) of all our EFL examinations.

As to Validity 2, Trinity College is less of a PI, but still well-known, and so it scores reasonably there. Does the test have content validity? It claims that it reflects the theory and spirit of modern foreign language teaching, and the syllabus information for each grade is the most detailed of all five examinations. The problem I wonder about is whether the syllabus is really a spoken English syllabus or a written one. 'The principal aim', we are told, 'is to find out how well the student understands educated spoken English within the limits of each Grade, and how well he or she can speak it'. Now there is a dilemma here. For native speakers, there are substantial differences between the spoken and the written language: what the communicative language teaching mission has attempted is to extend this difference to EFL speakers. I am not sure (and I suspect others are not) that this is a proper or feasible aim for TEFL; it may be that we should accept for EFL spoken English a limited goal which is written English spoken aloud. This is what (I think) Trinity College is doing, and I wish it would say so.

Again there is no evidence of Validity 3.

4. Oxford Examinations in EFL (the Oxford Delegacy of Local Examinations) (quotations from Oxford Delegacy information sheet)

This is another new examination.

'The examination is principally concerned with assessing performance in a very practical way by using test items from among the reading and writing tasks candidates might be expected to have to perform whilst in England (sic.) The level of the exam is below that of the Cambridge FCE.'

'There are two papers:
1. practical writing skills
2. practical reading skills
3. ARELS preliminary Oral Examination (optional)'

Validity 1 can be claimed only by the attempts to equate the test to other measures, e.g. 'the level
of the examination is below that of the Cambridge First Certificate' and again 'the Delegates' English Committee is happy to recommend (the Preliminary Oral Examination of the ARELS Examinations Trust) as a counterpart to its own written examination'. Otherwise there is no internal system.

Validity 2: the PI backing is there. As for content validity, again there is (in the specimen items shown) an attempt to capitalise on the communicative movement – 'test items from among the reading and writing tasks candidates might be expected to have to perform whilst in England'. (Notice that, as with the RSA, we are right away from the 'typical' EFL situation.)

Again, Validity 3 is not mentioned.

5. ARELS Oral Examination (ARELS Examination Trust) (the quotations below are from the ARELS information booklet)

There are three levels, Preliminary, Certificate and Diploma, 'designed specifically as a reliable means of assessing ability in the use and comprehension of Spoken English'.

The ARELS examination has a lot going for it. It has Validity 1 (three interlocking levels), and it has Validity 2: PI backing and content validity, which is certainly approved of by teachers and is not limited to Britain or, as Oxford quaintly put it, England: so it is available for the typical EFL situation. I must enter a caveat here though, which is that the cultural requirements of some parts of the ARELS examination make it quite difficult for someone who is not actually in daily contact with native speakers in their own environment.

Validity 3: if there is none, it is not for want of trying, since I did myself agree to look into the statistical validation of the ARELS examinations and I hope to get round to this soon. In the meantime I offer my apologies and note that the ARELS Examinations Trust are aware (unlike the others we have discussed) of the need for Validity 3.

6. The PALSO Examinations (Panhellenic Association of Foreign Language School Owners)

Last year the PALSO organisation carried out its own trial examinations at three levels, Basic, Standard and Higher, so there was the attempt to achieve Validity 1. The other validities were more difficult: for reasons I discussed earlier, PALSO is not a PI – if it wishes its examination to gain equivalence widely; and the content validity of the test needed earlier expert advice – though that is understandable in a trial examination. So far as I know there has been no attempt to establish Validity 3. The PALSO attempt showed what can be done with enthusiasm (not unlike the ARELS enthusiasm inspired by Peter Fabian). But if my argument is accepted, then enthusiasm is not enough.

Conclusion

The question what for remains central. Is the test for life in Britain, for communicative interaction, or is it meant more generally (with the implications of perhaps a Trinity College-like syllabus)? And does it provide the validities I have listed – ways of helping us to evaluate and choose a test for our own use?

I have not mentioned other examinations like the Joint Matriculation Board, the new Associated Examining Board, the Stages of Attainment Scale from the English Language Teaching Development Unit, the English Speaking Board, the Regent's School Test and the new English Language Testing Service of the British Council (interesting, I suspect, because it represents a move away from the previous test to a more examination-like instrument, more comparable therefore to the measures I have spoken of today).

There is a paradox in the what for question. A test must be related to local demands – but it must also have wider currency: the best test manages to serve both ends, the local and the global or international.
Nic Underhill

The great reliability/validity trade-off: problems in assessing the productive skills

The two principal criteria for evaluating any kind of test are reliability (whether it gives consistent results) and validity (whether it measures what you think it does).

The main problem with tests of speaking and writing may be stated simply: high reliability and high validity are seemingly incompatible. The situation is complicated by the existence of several different kinds of validity, some theoretical and intuitive and others empirical and quantifiable. As a result, what may be valid for one school of thought may not be for another.

If you are of the 'onward march of science' frame of mind, you may be convinced that it's just a matter of time before the ultimately reliable and valid productive test appears. If you believe that real language use only occurs in creative communication between two or more parties with genuine reasons for communicating, then you may accept that the trade-off between reliability and validity is unavoidable. Testing is an inherently artificial situation; the question is, how artificial can it be, and yet still be considered valid?

This article outlines some of the attempts to resolve the reliability/validity trade-off, and considers the influence of a third criterion, practicality. It examines the chronological development of productive testing and the test types in vogue at each stage. For the sake of brevity, examples of test items are kept to a minimum – all the standard works on testing contain numerous examples (i).

First, in the interests of successful communication, some definitions are in order.

WHAT MAKES A PRODUCTIVE SKILLS TEST DIFFERENT?

Considering for the moment only the oral interview and written composition, the following characteristics distinguish tests of productive skills:

(a) they are integrative tests
(b) they are, on the whole, subjectively scored
(c) there are serious doubts about their reliability
(d) they are, or can be, direct and pretty realistic measures of performance in real-life situations; therefore
(e) they have high face/content validity.

So how do they contrast with other tests? Consider this item from a multiple-choice test:

'The opposite of strong is 1. short 2. poor 3. weak 4. good'

Compared with the characteristics listed above,

(a) this is a discrete-item test – it aims to test
– only one component of language (vocabulary)
– through only one skill (reading)
– and one aspect of that skill (receptive recognition)
An oral interview, on the other hand, requires the testee to listen (receptive) and speak (productive), using many components of language – grammar, vocabulary, pronunciation, stress, all at the level of discourse rather than the single word or sentence.

(b) this item is objectively scored – the teacher/tester does not need to exercise his judgement in marking it; the decision has already been taken for him about the correct answer. In tests of productive skills, by contrast, he may have detailed guidelines to help him assess the testee's performance, but ultimately he must use his judgement to decide on the value of a particular response. (It is worth noting that of the three stages common to language tests, viz.
1. the compilation/construction of the questions
2. the answering of the questions by the testee
3. the marking/scoring of the testee's answer,
only in 3. can the objective/subjective distinction be maintained; all tests are 'subjectively' compiled and 'subjectively' answered) (ii).

(c) although any particular question must be tried out in practice, this type of item is generally considered highly reliable – from one administration to the next, without intervening tuition, the same testee will answer the question in the same way and the marker will mark in the same way.
In an oral interview or written composition, there are three principal sources of unreliability:
1. the testee may produce different answers to the same task from one day to the next – he may be feeling uncommunicative, morose, uncomfortable, deaf, antagonistic to a particular interviewer or composition topic, or just lacking in the confidence necessary to produce connected self-expression.
2. a single marker may score a particular response differently from one day to the next (for similar reasons!) (a problem of intra-marker reliability)

3. two or more markers may give different scores to the same response (a problem of inter-marker reliability).

(d) the multiple-choice item is a thoroughly unrealistic measure of language performance. It does not reflect actual language use – there is no real-life situation in which we go around asking or answering multiple choice questions. Productive skills tests are not necessarily ultra-realistic – few adults have the inclination or the incentive to produce written compositions, and the atmosphere in an oral interview can be notoriously strained – but they should nonetheless be more realistic than discrete-item tests.

One way of tackling the question of realism is to use the terms direct, semi-direct and indirect. These can be defined for speaking as follows:

'Direct speaking tests include any and all procedures in which the examinee is asked to engage in a face-to-face communicative exchange with one or more human interlocutors ...
Indirect tests are those which do not require any active speech production on the examinee's part ...
Semi-direct tests, although eliciting active speech by the examinee, do so by means of tape recordings, printed test booklets, or other 'non-human' elicitation procedures.' (iii)

While these labels cannot be maintained as rigorously clear-cut categories – and this problem will be discussed below – they form a useful classification for comparing tests.

The straightforward oral interview, such as that used by the US Foreign Service Institute (iv), the BETA Test (v), and many schools as an informal placement procedure, is a direct test; the use of a conventional written cloze test to assess speaking ability would be indirect; and tests such as the Cambridge FCE and Proficiency oral interviews, the Ilyin Oral Interview (vi), Upshur's Oral Communication Test (vii), and many others would be semi-direct, as they elicit speech by means of visual, recorded or printed stimuli. Because the nature of the language is thus controlled, such semi-direct tests are easier to score than direct tests and show higher reliability, and hence their popularity with examinations boards.

In written production terms, one could call direct a composition where only subject, audience and purpose are specified; a semi-direct test would be any task that requires writing, but elicits it by means of taped, visual or written cues (e.g. guided or picture composition, gap-filling, re-ordering exercises, etc.); an indirect test requires no writing at all.

(e) because of its lack of realism, the discrete-item, for the majority of students and teachers, has low face validity – it is decontextualised, non-creative and generally doesn't look like a valid test of language ability. While criticisms can certainly be made of specific productive tests in this respect, the option is open to the tester to make the testing situation as realistic as he has time and money to spare. The more effort is expended in this direction, the better the testee will respond by treating it as a valid task.

The question of content validity is more complex, and depends on the theoretical assumptions of the tester; if you believe that language can reasonably be broken down into discrete components and skills, and that language learning can be tested in the same way, then you may well consider the discrete-item given above to have high content validity.

The important point to make about face/content validity is that it can be determined only by non-quantitative criteria, such as introspection and informed judgement; unlike predictive or concurrent validity, it cannot be assessed in statistical terms. This non-quantifiability should not in any way be allowed to detract from its importance. Of all forms of validity, it most directly answers the central question: does this test item measure what it's supposed to measure?

'AS YE TEACH, SO SHALL YE TEST':
(i) COMPOSITION AS AN ESSENTIAL SKILL

Traditional methods of language teaching ('the grammar translation method') emphasized the importance of written rather than spoken language. Learning a language involved learning a set of grammatical rules which were applied to the analysis of certain texts. These texts, which were usually of a literary nature, were supposed to embody those aspects of 'good' writing which the learner was to imitate, or at least show awareness of, in his own written production. The ability to write English clearly and effectively was seen as an essential skill, and written composition was considered the most valid test of this skill. (No clear distinction was drawn between teaching and informal testing; dictation, translation and composition were simultaneously teaching strategies and testing exercises.)

It was realised that serious doubts could be raised about the reliability of composition as a testing device from the point of view of the testee,
the task and the marker. Few would dispute that their performance on a creative writing task will vary more from day to day than their performance on a multiple-choice test; or that a particular composition subject will suit some people better than others, irrespective of their writing ability. From the examiner's point of view, a partial solution is to offer a choice of composition topics; but this compounds the problem of the comparability of student responses.

In the 1930s and 40s, when TEFL scarcely existed as an independent profession and took its methodology from first- and foreign-language teaching in schools, examiners of English schoolchildren agonised for many years over how to reduce these sources of unreliability, and their conclusions laid the foundations of the composition marking schemes in use today.

1. Standardisation of marking schemes

Two principal marking schemes were used: the method of general impression and the analytic method. The first is exactly as it sounds – the marker reads the composition and awards it a mark on a single scale, without picking out any special features for consideration. The analytic method uses a pre-determined marking scheme listing a number of sub-scales and specifying the weight to be given to each in totting up the overall score. For example, the FSI Oral Interview scales are Accent, Grammar, Vocabulary, Fluency and Comprehension, weighted approximately in the ratio 1:9:6:3:6 respectively.

Within the analytic method, there are again different systems used to assign marks within each category: (i) Impression; (ii) Additive (giving one mark for each of a number of pre-selected features); (iii) Subtractive ('from a sub-total of 10 for mechanics, subtract ½ mark for each error of spelling or punctuation'); and (iv) Marking Protocol, which is a pre-defined scale of levels, e.g. for Vocabulary:

10 points – no errors
8 points – occasional misuse, but expression hardly impaired
6 points – fairly frequent misuse, which may limit full expression
4 points – limited vocab. and frequent errors clearly hinder expression
2 points – vocab. so limited and so frequently misused that reader must often rely on own interpretation
0 points – limitations so extreme as to make comprehension impossible.

Such marking schemes are in wide use today; they are difficult to construct and often hard to interpret, but many assessors feel happier with some sort of written protocol for assessing both oral and written work. In 1949, the battle lines were already drawn up:

'Among teachers of English, a constant battle is waged between supporters of analytic marking, and those who believe wholeheartedly in general impression ... it should be noted that the analytic schemes were born out of a realisation of the general unreliability of essay marking, and some schemes have gone to extraordinary lengths to achieve 'objectivity' and thus consistency.' (viii)

Already an association, if not an equation, was being made between reliability and objectivity. Although neither method was proven in experiments to be clearly superior, the analytic method tended to be more popular, partly because of the appearance of greater objectivity. To counter this, it was recommended that three or four markers, whose scores are averaged out, should be used to improve the reliability of the impression method.

2. Standardisation of markers

As a corollary of standardising the marking method, the principle was established that regular meetings should be held to brief assessors on the marking scheme, and by 'test-marking' sample compositions and comparing results, to standardise the way all the markers applied that scheme.

3. Increase the number of questions

(This is a well-established way of improving the reliability of any kind of test, within the limits of fatigue and endurance.) For composition, it was suggested that two or three shorter pieces give a more reliable indication of the testee's performance than one long one.

4. 'Task-realism'

The importance of task-realism was another notion to be discussed long before it became fashionable in the field of TEFL. Hartog said:

'In real life, a person does not just 'write'. He writes for a given audience and with a given object in view, which may be to explain, to persuade, to give an order or indeed fulfil any other purpose or combination of purposes.' (ix)

He carried out an investigation to compare performance on 'Directed' versus 'Undirected' essay
subjects; for example, one of the Directed essay subjects for English schoolchildren was:

'Describe a school speech day at which you have been present as if you were writing to a boy or girl who has been prevented by illness from being present.'

compared with the Undirected essay topic 'A School Speech Day'. (Note that, for the purposes of this paper, I am not drawing a distinction between 'essay' and 'composition', as some authors have done (x).) Hartog concluded:

'The majority of examiners were decidedly of the opinion that a Directed essay subject yielded an essay of better quality than the corresponding Undirected essay subject, and that it could be marked with greater confidence.' (xi)

(ii) COMPOSITION AS THE PERPETRATION OF INJUSTICE
With the advent of the audio-lingual methodology in the fifties and sixties, there were a number of important changes in the teaching and testing of the productive skills. Principally:

1. Speech was considered primary; the aural/oral skills became the main objective of language teaching.
2. Language was learnt by habit formation, mainly through repetitive oral practice. Written work, of a rigorously controlled nature, was permitted only after the patterns had been properly established. This was because of 3 below.
3. Making mistakes set up wrong patterns and could lead to the formation of wrong habits. As far as possible, materials were constructed to reduce the chances of error, by careful sequencing of structures and adequate practice of each structure before moving on to the next.
4. Not only was language proficiency composed of a number of discrete skills, but tests could and should be constructed to assess these skills and their sub-components separately. (In fact, genuinely discrete-item tests were a rarely achieved ideal.)
5. Little emphasis was given in the classroom to meaning, and hence to the role of genuine communication.

The unguided composition, in these terms, is all wrong; it is a meaningful written task, employing numerous inseparable language elements and skills which cannot be constrained to fit a structural syllabus, and which is an open invitation to the testee to commit all kinds of errors. The interview fared slightly better by virtue of being oral; but as unstructured elicitation procedures, neither was acceptable to audio-lingualism.

Both could be made more acceptable by asking a pre-determined series of questions designed to elicit one-sentence answers containing specific structures or functions. This technique of structured interview and guided composition is in widespread use today and has the advantage of eliciting comparable speech samples from each testee while still being productive; but in the process, of course, the exercise becomes progressively less life-like, ending up with a string of unconnected, decontextualised stimulus-response questions.

Another big disadvantage in audio-lingual terms is that the unstructured interview and unguided composition are not amenable to the objective-test format, such as multiple choice, where only one correct answer is possible. This was another incentive to restrict the creativity involved in production tests; written gap-filling or sentence transformation, and simple oral question-and-answer could be marked objectively and thus reliably.

There was certainly an awareness of what was being lost in the process:

'... often we have to choose between more apparent validity but less objectivity and more objectivity but less apparent validity' (xii)

but reliability was, and still is, considered to be logically prior to validity. Following the assumption of the discrete nature of language skills, numerous tests were devised to assess particular elements of language via oral production — pronunciation, grammar, vocabulary, intonation, stress, etc. — Valette (1977) gives many good examples of this genre. Because these tests were composed of unrelated questions with a single correct answer, e.g.

'The man who flies an aeroplane is a ______'

they fitted the desired objective test format, tested one item at a time, and left little room for serious error.

The solution to the reliability/validity trade-off being offered here was part of the audio-lingual package — if you accepted the discrete-item philosophy on which it was based, then the content validity of such tests could be very high, due to the close correspondence possible between the items on the syllabus and the items in the test.

This argument is fine for a test of achievement of a specific syllabus, but not for a proficiency

test. In order to maintain the same content validity, it had to be shown that the items tested were in some sense representative of the testee's overall proficiency; a lot of hot air was generated trying to establish adequate sampling techniques before it was realised that this task was a linguistic as well as a statistical impossibility.

Written composition became an outcast — one author saw its only value as avoiding an undesirable 'washback' effect:

'Tests of composition are necessarily unreliable and of doubtful validity. Since, however, it is important that composition should be taught, and since if not examined, it may not be taught, it should be included in a language examination.' (xiii)

Others were more extreme in their rejection of composition:

'... attention should be drawn to the consensus that injustices are perpetrated every time an essay is set at an examination ... it is widely recognised by linguists that an essay is not an adequate test of knowledge of the language. If the student is cunning, he will avoid constructions he is not sure of and create situations in which he can use his pet phrases.' (xiv)

This, surely, is the essence of communicative ability: to make the best use of those language elements and structures which one does command and to avoid those which are likely to be communicatively ineffective. Some people do this better than others; they are better communicators. See the cunning student twist and turn to avoid the third conditional!

(iii) RETURN OF THE PRODIGAL INTEGRATIVE TEST
As the communicative emptiness of the audio-lingual approach was recognised — students being able to produce complex structures given the correct stimulus, but unable to transfer this fluency to real communicative situations — integrative productive tests made a cautious return. There was now a place for both objective and subjective tests in a proficiency battery:

'... the more subjective evaluation of the composition will complement the grade for the more circumscribed items, just as the mark for the oral interview was shown to do in the area of oral production.' (xv)

Inevitably, there was a resurrection of the arguments about the best way to score such tests; Rivers, along with many others, was in the analytic camp:

'an overall intuitive grade for written composition can be seriously influenced by neatness and clear writing. The grade should be a composite one' (xvi)

Heaton was in the overall impression camp:

'It is impossible to obtain any high degree of reliability by dispensing with the subjective element and attempting to score on an 'objective' basis.' (xvii)

During the seventies, the realisation spread that language could not be divorced from its contexts-of-use for purposes of teaching and testing. New tests of productive skills were consciously task-oriented, aiming for the best possible face/content validity; for example, letter writing became a popular choice for written composition, and situational role-plays for oral interviews.

Experiments to find ways of improving the reliability of the more direct and realistic tests came up with the same results as before — increase the number of markers, hold standardisation sessions, use an analytic scheme — and the reliability claimed for some productive tests is high by any standards. However, such tests are expensive and time-consuming to administer and mark. This criterion of practicality is especially important if you want to construct a test battery to be given to large numbers of testees all over the world and then marked as economically as possible. The search was on to find a valid and reliable method of testing productive skills accurately without the practical disadvantages of highly realistic tests. Two possible solutions were offered: semi-direct tests, which give much greater control over what the testee says or writes; and indirect tests, where he doesn't actually say or write anything at all.

(a) Semi-direct tests
The distinction between direct and semi-direct tests, as defined above, seems at first glance intuitively reasonable; a face-to-face conversation without printed or recorded stimuli is more natural than one in which one participant asks questions about a picture and the other answers them.

Again, the reliability/validity trade-off can be clearly seen; by asking questions about a picture or other stimulus, the tester is restricting the number of possible correct answers, and taken to its


limit, there will only be one correct answer. There is an immediate gain for reliability; with only one possible answer, all the marker has to do is to decide whether that answer has in fact been given or not. At the same time, the testee is deprived of the opportunity to be creative or to display his communicative proficiency in a realistic manner.

However, there are many real-life situations in which we do hold a natural conversation about a visual, recorded or written stimulus; and such a conversation in an oral interview can be a lot more realistic than a so-called direct speaking test in which the interviewer is controlling and structuring the conversation so as to elicit particular structures or functions.

Direct and semi-direct tests should be regarded as being on a continuum from the most realistic to the least; and the position of a particular test on this scale can only be determined by an intuitive examination of the test itself. Consider the following description of an oral test:

'The examinee is presented with four pictures differing significantly on one or two conceptual dimensions. These may represent, for example, a person performing four different actions, or the four conjunctive possibilities of a man with or without a hat walking up or down a staircase.
The examinee is instructed to provide a single sentence description to a visually remote audience of one picture which is randomly selected from the set.' (xviii)

The audience (i.e. the examiner) then decides which picture he thinks is being described, and compares this with the instruction given to the examinee. It is genuinely productive, arguably communicative and highly reliable. How valid would you consider it?

(b) Indirect tests
The other solution to the trade-off is to stop worrying about face/content validity, take a step backwards from realistic, life-like tests and take refuge in concurrent validity. The argument goes like this: if you can show that there is a high correlation between the same students' scores on a realistic oral interview and on an indirect but more easily administered test, then you can safely say that the second test is measuring the same as the first, and can be used as a test of oral proficiency in its place. The relationship is entirely statistical; there is no place for intuition, except in subsequent attempts to justify the relationship theoretically.

An enormous amount of research has taken place, mostly in America, into constructing and using easily administered but indirect tests. For example, a lot of work has been done to promote cloze tests, both conventional and in many variations, as tests of global proficiency, including speaking and writing skills.

The new Test of English for International Communications (TOEIC) is entirely receptive; it consists of two hundred multiple-choice items, half reading comprehension and half listening comprehension. But the Educational Testing Service felt able, by means of concurrent validity correlations, to interpret the scores in terms of speaking and writing ability:

'The correlation between the TOEIC listening part score and the direct Language Proficiency Interview is 0.83. This high degree of correlation would seem to indicate that the TOEIC part score is a good predictor of the candidates' abilities to speak English'

and again

'The direct writing measures correlated 0.83 with the TOEIC reading part score. This high correlation suggests that the TOEIC reading score is a good indication of the examinee's ability to write in English' (xix)

This use of concurrent validity studies to justify the interpretation of indirect tests of productive skills has become common. What are the arguments for and against this procedure?

FOR
(i) All tests are artificial situations. The testee is under a strain to perform well, and the interviewer/marker is under pressure to make the best assessment in the short time available. Neither is acting naturally. In these terms, there is little to choose between direct and indirect tests.

(ii) Where the aim is to measure achievement rather than proficiency, indirect or semi-direct tests give the tester more control over the language generated and hence permit a more accurate determination of the testee's mastery or otherwise of the specific syllabus contents.

(iii) They are more practical and usually more reliable than direct measures.

AGAINST
(i) Intuitively, direct tests are better.

(ii) The value of the concurrent validity depends entirely on how good the criterion test is — if the

direct test used to validate the indirect measure is open to question in terms of either its reliability or validity, then the concurrent validity correlation is meaningless.

(iii) Although the nature of the correlation coefficient itself forms the basis of all numerical reliability and validity calculations, the interpretation of correlations is far more complex (and subjective!) than the calculations themselves. Especially in the field of language testing, a person's interpretation of a set of statistics may depend entirely on the assumptions of his particular theoretical viewpoint; the statistics have no inherent meaning other than as a purely mathematical relationship between two sets of numbers.

CONCLUSION
The only possible conclusion to be drawn on the virtues and problems of different methods of assessing the productive skills is that what kind of test you use should be determined pragmatically by the purpose for which the test is to be used, the resources you have available for construction, administration and marking, and what you intuitively feel will have the highest face/content validity for testees and testers alike.

Prognostic/Predictive Tests, by definition, are not concerned with present ability but with some future criterion performance, which may itself not be particularly realistic. Therefore, whatever test most successfully predicts that performance is the best test to use — and only experimentation will reveal that.

Achievement tests are easier to set up and control using indirect or semi-direct measures. Since the syllabus itself is an artificial construct, there is no point going to great lengths to reduce the artificiality of tests of attainment of that syllabus.

On the other hand, any experienced interviewer can tell countless tales of trying to elicit a particular structure from a testee who answers the question perfectly well while avoiding that structure completely. Does he get a point or doesn't he?

Proficiency tests are concerned with the testee's ability to carry on sensible and realistic communication, and this is where the argument is strongest for using direct tests of productive skills. Secondly, the question of availability of resources (time, people and money) may exert a significant influence on the choice of tests.

The last word has been reserved for a researcher who, apparently under pressure from all the concurrent validity studies giving the kiss of death to direct productive tests, felt it necessary to preface her findings on improving the reliability of oral interviews with the words:

'... it has generally been recognised that the best way to test for oral proficiency is to have a subject speak.' (xx)

(i) e.g. Lado (1961), Heaton (1975), Valette (1977), Oller (1979).
(ii) see Pilliner in Davies, A. (ed.) Language Testing Symposium, Oxford University Press.
(iii) Clark, J.L.D. (1979) in Briere, E.J. & Hinofotis, F.B. (1979) Concepts in Language Testing: Some Recent Studies, TESOL.
(iv) U.S. Foreign Service Institute Oral Interview.
(v) Businessmens' English Test & Appraisal, International Language Centre, Japan (1977).
(vi) Ilyin, D. (1976) Ilyin Oral Interview, Newbury House.
(vii) Upshur, J.A. (no date) Oral Communication Test, Ann Arbor: University of Michigan.
(viii) Wiseman, S. (1949) The Marking of English Compositions in Grammar School Selection, B.J. Ed. Psych., p. 204.
(ix) Hartog, P. (1941) The Marking of English Essays, MacMillan, p. 1.
(x) Heaton, J.B. (1975) Writing English Language Tests, Longman, p. 127.
(xi) Hartog, op. cit., p. 17.
(xii) Lado (1961), p. 29.
(xiii) Grieve, quoted in Pilliner in Davies, op. cit.
(xiv) Forrest, R. (1968) 'ELT versus the Examiners', ELT, V. 22, No. 2, p. 121.
(xv) Rivers (1968), p. 305.
(xvi) ibid., p. 257.
(xvii) Heaton, op. cit., p. 135.
(xviii) Clark in Briere & Hinofotis, op. cit., p. 36.
(xix) Woodford, P.E. (1980) The Test of English for International Communications, paper delivered at ESU Conference, London, November 1980.
(xx) Mullen in Oller, J.W. & Perkins, K. (1980) Research in Language Testing, Newbury House.
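
As an illustration of the concurrent-validity reasoning discussed in the paper above, the following minimal sketch computes a Pearson correlation coefficient between two sets of scores for the same candidates, for instance a direct oral interview and an indirect test. The function name and the score lists are invented for illustration and are not taken from any of the studies cited; the sketch shows only the arithmetic, not how the resulting figure ought to be interpreted.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the two means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each test.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return covariation / math.sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical scores for the same six candidates on two tests.
    interview = [3, 5, 2, 4, 5, 1]
    indirect = [55, 80, 50, 60, 90, 30]
    print(round(pearson_r(interview, indirect), 2))
```

The coefficient summarises only how closely the two rankings of candidates agree. Whether the criterion test was itself reliable and valid, which is the objection raised under AGAINST (ii), is not something the calculation can show.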

Peter Fabian

Examinations — why tolerate their paternalism?

An examination is a formal and unnatural ritual. In theory, it sets out to gather information about a person and passes it on to interested parties. It is to all intents and purposes a dialogue between Supplier and Consumer. The suppliers are those who, in response to a need, create examination structures and strategies: the Examining Boards and their fellow-conspirators (i.e. teachers, schools, publishers, course planners, educational institutions), and the meek and passive victims of their paternalistic power (i.e. parents and student candidates). The consumers are those who need the information: employers, selectors, Further Educational establishments, governments — and again parents and student candidates, though for different reasons.

Examinations are commonly expected to reflect attainment as well as potential in further skill development. They thrive on faith. So long as the consumers believe that an examination is a mirror of something, it can withstand the grumbles and bloody-minded 'objections in principle' that abound these days. But when examinations cease to be a communal act, in other words when some of those involved opt out, as they will and do, then examinations begin to deteriorate, to become distorted and eventually irrelevant except to those who get their living out of them, whether they be examiners or schools. In our society, obsessed as it is with qualifications and specialisation, even the worst examinations — and there are many — manage to survive long after they have ceased to be firmly rooted in the community's needs. This happens because those who have resigned their active, creative and critical participation do unfortunately continue to give their unquestioning confidence to an examination as a qualification; mostly they no longer know anything about it: how it is run, what it contains, how it is weighted, validated, what it is really for. With blind but dangerous faith they surrender their vital role to distant and increasingly isolated 'experts'. Only when they find themselves at the receiving end of inadequate applicants for a job, do they discover how dubious the syllabus must have been that led to such a qualification: then they complain that they aren't getting the people they need. It is more tragic when the foreign employer, say, discovers no inadequacies because he has to trust the examination implicitly.

A community gets the examinations it deserves. If it fails to take an interest, to be vigilant about what the examining boards are up to, it will soon find that someone else has taken over: a vacuum is invariably filled.

Examinations — like democratic institutions — do not thrive in isolation. When the consumer and the community at large surrender to academic technicians their right and duty to be involved, they also surrender the right to check on the teaching strategies that are the direct result. They cannot then turn round and complain about 'straightjackets', or pontificate about pressures, unfair competition, the conduct of examinations and their 'remoteness' to real life. Such are the frustrated grumbles of children and they have about as much impact. Whether we like it or not, examinations are here to stay. Their rejection for the time being in revolutionary or reactionary situations — times of social upheaval — does not alter this a bit. They always come back again. No one has come up with a really workable alternative way of measuring attainment on which the selector must depend to get round pegs into round holes.

Parents, by and large, accept meekly, if not exactly with reverence, the requirements of GCE. Perhaps they do not realise — or do not want to — how much their children have to sacrifice to the annual ritual, and what a huge chunk of a never-to-be-repeated educational experience is affected, even destroyed. There is a reluctance to cross swords with experts, though it is abundantly clear that experts all too often suffer from the constrictions of their own expertise. But they do often have an awesome weapon: professional jargon and impressive woolliness.

Yet that jargon is often surprisingly imprecise, even in language teaching. Take the words 'examining' and 'testing' which are so often used as if they were synonymous and interchangeable. How many people are aware that they are in fact very different in kind and have different aims?


Tests are almost entirely diagnostic; they belong in the classroom and are the teacher's most constantly used tool. They help to close gaps, clarify, prescribe individual treatment, and build confidence. You can act on tests. They do not shape a syllabus but draw on existing techniques with the clear-cut purpose of edging and nudging the learning process along some controlled path. In examinations we have nothing so constructive because they are final. Based on a controlled system of spot-checking, they are more superficial and global. Examinations tell us if a student has succeeded rather than why he has failed. It is often very difficult to get specific diagnostic information from examinations.

Who Improves What?
As we have learned from bitter experience, teachers, directors of studies and principals do not as a rule take an active part in engineering the structure or contents of examinations. They are content to be passive and to follow a leader. They may moan, of course, about what examinations contain and how they interfere with what they want to do, but on the whole they accept what is proffered and are reluctant to interfere. If for some reason — worthy or otherwise — an examination is considered prestigious, impressively mounted, and professionally validated, that seems good enough for most of them. The American TOEFL test is universally admired in the States, not for what it does to teaching but for its technical and administrative gloss. Cambridge examinations are generally supported because 'everybody wants them'. Much of the teaching syllabus takes its lead both from them and from their numerous textbook satellites because it is convenient, not because the learning strategies they prescribe are realistic or evenly balanced.

The view that a school is the servant of a string of such examinations is deplorable. A school simply cannot opt out of its special responsibility for this link in the Learning-Teaching-Testing-Examining chain. It owes that much to its students, whose faith lies in the entire teaching-testing system. Nor can a school ignore the simple fact that examinations influence to an unacceptable and unreasonable degree what is being taught on its premises and why.

The result of this easy-going relationship, this friendly conspiracy between examining boards and schools, is that in the end examining boards are practically all-powerful. They invade the classroom and saturate the staffroom without a murmur of protest. And for all the pretence to the contrary (schools are jealous of their independent role), the indifference of schools — of consumers generally — has isolated the boards more and more. Since they are not perpetually exposed to the sort of criticism which they are bound to note or act on, that isolation extends far beyond educational establishments so that finally the boards go one way while the rest of us go another. Only when there is wholesale disenchantment amongst thousands of students does the whole problem come to light. Many British schoolchildren do not feel motivated to learn foreign languages because their courses and examinations seem remote from their real needs. The extremely costly exercise of foreign language teaching (despite gimmicks like language laboratories and the sparse contingents of 'assistants') is largely a waste of money; and this is due mostly to that friendly conspiracy between professionals.

Unworkable Progress
Until 1965, the emphasis in language teaching was on Writing and Reading, on definitive grammar even within the 'direct method', and on what in the case of the native tongue might be called 'language study and definition in retrospect'. It tended to pull in the same direction as the treatment of the native tongue. The reason why we all put up with this lop-sided prescription was that it was easier and more economical to handle. If examinations are best conducted in Writing and Reading, then the preparatory work is bound to take on the same slant. One supposes that if the driving test could be carried out by answering questions from an examination paper, driving schools would sell their cars, sack their instructors and revert to the classroom for driving instructions.

The pretence that the major part of communication was on paper, as it were, was bound to lead to the neglect of oral work. Apart from not being required for examinations, teachers were often not too hot in speaking and understanding the language they were teaching. The miserable appendages purporting to examine speaking and understanding in EFL commanded little respect. They were — and continue to be — tolerated because the school follows the leader, and the leader found it too complicated and too expensive to make the urgently needed improvements. The examining boards which enjoy the greatest respect and prestige today know full-well that their oral 'interviews' are negligible in comparison with their written sections. But instead of improving them, they simply down-grade their status in the examination. In the act, however, they also down-grade the

associated learning-teaching activities in schools. Cambridge, for example, still awards only 25% of the total weighting to speaking and understanding; and that in a situation and at a time when oral skills are most urgently sought after throughout the world.

Such examples from the past protrude into the present; they illustrate the power and influence wielded by examining boards. The need, therefore, for upgrading oral/audial skills to the same level of thorough examining as written and reading skills must be painfully obvious to anyone. But so long as examining boards fail to give the lead here, all attempts to infuse systematic training will remain sporadic, localised and superficial.

There are those today — and I am one of them — who suggest that oral/audial skills are really the central and determining aspect of language acquisition and that no one should be put in a position of having to decide whether or not to 'study' a language until they are orally and audially proficient up to a point.

Fads and fashions like Programmed Learning or Skinnerian lab drilling have come and gone. Since 1965, the classroom has accepted speaking as the real goal: a spontaneous development and for once independent from examinations; but it has remained unreinforced by the disciplines of a recognisable end-objective. The effects of the shift have thus been abortive to some extent because, after all, the pressures and traditions of orthodox examination-based language teaching proved too great.

The Fourth Skill: Aurality?
In all this hullabaloo, the fourth skill — listening comprehension — was the Cinderella. The spontaneous Speaking Revolution has itself down-graded it. Neither 'Threshold' nor 'Waystage' has given it much attention in depth. Few schools have done much about installing such things as Listening Libraries to counter-balance the Reading Library. While ad hoc syllabuses in speaking have been devised, the entire problem of listening comprehension has remained largely unfertilised, unresearched, and neglected. Such aspects of listening comprehension practice as audial drilling, ear conditioning and aptitude testing, exposure to accents and dialects, intuitive comprehension and guesswork are still in their infancy. People do not talk about them: linguisticians write each other the odd paper which explains why there is still no universally accepted word to replace 'aural', which is in effect indistinguishable phonemically from 'oral' and therefore has to be mispronounced. The new word 'audial' springs from an urgent need and was introduced by the architects of the Arels Oral examinations. Mentioning this small point may look pedantic but in fact it is symptomatic of an attitude.

A Lever — Not Just an Exam
Arels Oral examinations — introduced in 1967 — are an attempt to rock the foundations of orthodox academia-bound language learning. As examinations are powerful, the attempt had to be made through them. But it should not be thought that they were introduced merely to offer yet another examination. The role of the Arels Oral is still widely misunderstood. Of course, the field was wide open for reform; oral testing had not yet progressed far, and any improvement was a purpose in itself. But it was wrong — and still is wrong — to say that these are no more than specialised 'extras' which you can decide to use or ignore. Or that in the context of English in Britain the student can pick and choose because it really does not matter which examinations he takes; it has to be bluntly stated that schools claiming to teach English in Britain should very seriously consider making these examinations their central theme. Preparation for most other examinations can be had at home; the advantages of taking them in this country are marginal. But not so with the ARELS examinations. Practical and sustained exposure to the language, which is then systematised and monitored in a school, can only be achieved in the country of the language and in the context of a multi-national student body.

The ACEs are a manifesto, a statement of practical objectives which are today indispensable. True, the examination makes additional demands on a school. An oral/audial examination cannot be textbook-bound in the way written examinations are. It can only give a lead; through its past examination scripts and tapes it can say: 'Here are samples of all the oral and audial skills we test, NOT because we find them easy or convenient to test but because you cannot perform in practice without them. Now you go and elaborate on these; draw on your own daily experience and observations in how we use the native tongue; use our past papers as models and to remind you of what must not be forgotten, and the syllabus will look after itself.' Is that asking too much? Yet, any school claiming to take advantage of English in Britain which fails to cover the Arels Oral range may actually be guilty of gently and innocently misleading its students. And that is putting it mildly.

Peter Fabian

An Oral Century
We live in an essentially oral century. The telephone has made the entire world more accessible; correspondence is becoming too expensive and inconvenient when you can ring your business partner in Sydney. International conferences are increasingly conducted in English and that means a smooth command of social English as well; if the asymmetric language system is adopted in the EEC, the only languages for listening comprehension will probably be French and English. Negotiations, social contacts are both drifting fast into English, while the sophisticated hardware technology from tape recorders to video and internationalised TV is helping to shift the emphasis further away still from writing and reading. A school which ignores these trends is in danger of getting out of touch, while its brochure may be promising by vague implication what it is ill-equipped to do.

It is not a matter of forcing examinations down students' throats: far from it. The Arels examinations are a mere by-product designed to set off a new and systematic approach to the two neglected skills, just as the Cambridge examinations are admirably designed to initiate the 'study' of English, which points them elegantly in the ESP direction.

No more than a fraction of students coming to Britain have the slightest wish or desire to 'study' English. Twenty years as Principal have taught me this. It is a modern tragedy that they are so often encouraged to think that studying language is synonymous with acquiring communicative skills.

Pendulum of Fashion
Nor is it a question of a pendulum going this way and that. A pendulum only exists where there is uncertainty of purpose. To suggest as some do these days that 'we have gone too far in oral directions' is nonsense. In fact, we have not begun to bring oral and audial work up to the lavish levels of the other two.

There is a theory that man's ability to practise a multiplicity of language disciplines is infinite; that we are all of us capable of practising a dozen languages at least. This seems to be supported by the millions of bi- and tri-linguists who acquired their skills by accident or circumstance, not because they were gifted or in love with language. They speak these languages naturally in their multilingual environment, and the only inhibition that exists comes from nationalistic pressures. It is likely, therefore, that the vast majority of language learners are not students so much as people naturally receptive to the right environmental treatment.

Democracy in Examinations
Arguably the most democratic virtue of an examination concept like the Arels Oral lies in the fact that it aims truly to reflect current usage: it has to do this. Written language remains static for several decades. Oral language changes quickly. English especially adapts constantly to the ever-changing flow of influence across the English-speaking world: not by decree of ossified grammarians but by popular inclination. Usage precedes acceptance. The pedant and the linguistician may deplore this as a concession to fashion or vulgarity. But, then, pedantry is always a little ridiculous in oral language which springs from spontaneity and from the heart. Examinations must respond swiftly to these changes and if they do, then language will never be consigned to the museum. But they will only adapt if everyone who is involved in the process of language acquisition, whatever it may be, participates in making examinations come close and stay close to the reality of the day.

W G Shephard
The Cambridge Examinations-an exercise
in public relations

Examinations stultify and constrict, emasculate the new and perpetuate the old, and appeal to the lowest motives on the part of all concerned. Every teacher is sufficiently convinced of the efficiency of his methods, and every student sufficiently full of faith that the standard reached will guarantee him the ability to function in the job of his choice, for examinations not to be necessary. All this is believed, and repeatedly asserted. Yet entries for the Cambridge EFL examinations are increasing at an average rate of nearly 5% on already large numbers (80,000 plus annually since 1979) in over sixty countries. Their central position as a target and a definition of standards can be seen in any school or college brochure or publisher's catalogue. The organisers of the examinations, in Cambridge and at the 400 local centres, are in a unique position to observe day-to-day and year-to-year the interplay of attitudes, assumptions and desires in the fields of teaching and testing. Some details of this experience, and particularly of a large-scale consultation with the centres just concluded, may be of interest, therefore, read in association with the discussion elsewhere of basic aims and models in testing.

The Cambridge examinations have reflected the fluctuations in views of what a relevant and effective test in a foreign language should be since extremely primitive times. The Local Examinations Syndicate was one of the first two examination boards established in Britain, based autonomously on what Victorian social and educational conservation termed 'the universities' and providing quality control of the range of schools existing before compulsory education came in. General education, geared to the needs of the world's most highly-industrialised state in respect of operatives and officials, was tested in line with contemporary notions of content and method, and had its Dickensian moments. A properly conducted local examination, leading to a Cambridge certificate, began with the ceremonial arrival by train in a given town of a solemn don in charge of a black box. A little of this aura clings even to today's vast operation, coming out in awestruck responses to suggestions or instructions, particularly where these appear more humane and practical than august authority is expected to be.

For Cambridge, the specifically linguistic problems involved in the absorbing and regurgitating of knowledge came up early, through the expansion of activity in what are now called the ESL areas. The Cambridge School Certificate examination survived in these areas both the UK transition to the GCE in 1951 and in many cases the far-reaching political changes, and is still one of the Syndicate's major activities. A specifically EFL commitment began in 1913, with the introduction of the Certificate of Proficiency in response to the demand for a qualification for non-native-speaker teachers of English.

In 1935 this examination was first set in December as well as in June. In 1937 it was recognised by the University of Cambridge as an English qualification for matriculation purposes and by Oxford in the following year. The Lower Certificate was also introduced in response to demand in 1939. The more appropriate title of First Certificate was adopted in 1975, when both examinations appeared in a totally revised form following a long period of research and consultation.

Long-standing collaboration with the British Council was formalised in 1941 by the establishment of the Joint Committee of the Syndicate and the British Council. The Diploma of English Studies was introduced in 1945 at the request of the Council. The Executive Committee for these examinations includes representatives of a wide range of interested bodies: institutions of further education, ARELS, universities, the British Council and, when possible, visiting overseas representatives.

The early pattern for EFL testing was merely to carry over the skills thought appropriate for the first-language or second-language aspirant towards 'educated native speaker' status. The expression of proper sentiments with formal correctness and the study of literature were the main features, with translation as the revolutionary 'element of functional relevance'. The Syndicate is still sometimes involved today in consultation with education officials from countries who are applying the same
axioms in their developing local examining systems.

Very gradually came the move to the present system of a five-paper series of written and spoken tests, covering productive and receptive skills and as internationalised and functional as it can be made by teacher participation in syllabus design, marking and test setting.

Translation and literature remained integral parts of the syllabus until 1975, with a quasi-compulsory status not much affected by widening the range of alternatives. Translation was heavily favoured overseas because it was a softer option for candidates and teachers than literature, and the reverse applied in the polyglot British centres. Even the lower level examination had its scaled-down literature paper, essentially a composition and factual recall test still remembered favourably for its encouragement of extended reading, and its 'indoctrination value'. When a structure and usage paper, introduced in 1970 along with objective comprehension tests at this level, quickly became the most popular alternative, the way became clear to establishing a standard pattern of tests at each level. It is on record that a consultation in 1961 with a university department of applied linguistics, interested in the riches of Cambridge candidates' answer scripts as a research source, produced the first suggestions that a semi-objective, well-designed Use of English type paper should be the core of the Proficiency examination rather than one among a mixture of ESP alternatives. In a striking contrast of tempo highly revealing of the dynamism of the EFL teaching field, a further revision is now planned only five years after the introduction of the current syllabus in 1975. The inspiration for this came from a straightforward administrative need to take general opinion on the feasibility of conducting listening comprehension tests on recorded material. As well as this issue (which has proved inconclusive, in view of the teachers' equal distrust of examiners and machinery), the Syndicate's enquiry invited comments on all aspects of the standard and content of the examination. It is interesting to note some of these in the context of the historical development of the Syndicate's examinations and of discussion elsewhere in this collection of articles.

The teachers, nearly 250 of whom replied from 24 countries, have endorsed the move away from culture and background testing, and shown approval for a very large proportion of the objective and semi-objective elements introduced. The old 'guessing game' criticisms of multiple-choice batteries are now as tiny a minority as the pleas for a return of prescribed reading, with its characteristic teaching pattern, to an integral place in the scheme of the examination. Other minority comments argue equally pressingly backward to dictation, or forward to video. Information retrieval exercises on a variety of visual stimuli are suggested, together with criticisms of every such exercise so far set and pleas to 'keep the examinations Cambridge' and 'avoid railway timetables'. Above all, the clamour for individual paper grades instead of the present aggregate result, formerly widespread, has died down significantly. The replies give a strong impression of approval of the present syllabus and its aims, though linked with forthright comment on failure in particular question types to realise these aims fully, on the part of a sophisticated profession able to unite well-disseminated linguistic theory with comprehensive experience.

Above all, the body of expert opinion represented in our replies has made out a strong case for moving much further in the direction of functional testing, as a way of aligning the examinations with modern teaching methods and also as a way of re-defining their role internationally. The most strongly-argued method of achieving this is a substantial increase in the oral weighting of the examination. This emerges clearly out of the traditional doubts expressed about examiner reliability, timing and general administration. Recognition of it is a leading element in the detailed consideration the Syndicate now proposes to commission, with a view to a basic re-design of syllabus taking effect possibly in 1984. Oral/aural activities deserving of a place in an extended interview and listening comprehension session will be tried out, and their actual contribution to the candidate profile matched against that of the present range of tests, in order to establish which new, and which old, features most qualify. 1975-1981 experience suggests that, in spite of incidental criticisms, the picture stimulus for conversation achieves its modest aim, that of evoking something more individual and lively than 'How long have you been learning English?' between total strangers. Experience also suggests that 'reading aloud' is not basically an unclean concept, as it does reflect, when suitably non-literary, a realistic skill and does help the tempo of an interview which is, after all, a marking exercise as well as an extended role-play. Current experience has borne out, however, the criticisms of the present three-passage, examiner-read listening comprehension test, and we shall be drastically re-vamping here, greatly assisted, it is hoped, by the ability to use appropriate recorded material: material with more dialogue and general variety of text and delivery, and a

good many steps nearer to the ideal of total eavesdropping realism, probably in colour video, which will be the next demand. Only one centre put in for a 100% oral, in the ARELS manner, and no-one felt a separate examination with no oral component (in the Oxford manner?) to be viable. In assessing a possible weighting, we were thus somewhere between 0% and 100%, with considerable pressure to go above our present 25%. The current model, now being considered by the centres and as stated already the subject of trial working, has gone for a one-third oral component, not so much for neatness in an examination for which our computer cheerfully allows 39 as a (raw) maximum oral mark along with marks scaled to one decimal place in the case of two written papers, as for a combination which can be meaningfully filled in the right proportions. To increase the openly oral element simply on demand has not seemed, over the years, necessarily the way forward, or the only way to give true and complete credit to oral-based fluency, though the day is clearly coming when Cambridge can increase the oral weighting both because it is good to help to define the teaching syllabus and because it can be done more reliably.

The recent questionnaire was a special operation, emerging as explained from particular needs. The Syndicate's contacts with the EFL teaching public, and the candidates themselves, are however constant through the year, and highly revealing of the variety of aims, emotions, misconceptions, etc. associated with EFL. By means of yearly issues of regulations and documents aimed at candidates, local secretaries, supervisors, oral and written examiners, all revised and resharpened in the light of the previous year's experience, the plain message is conveyed about the timing and cost of entries, conduct of the examinations, and issue of results. A vast amount of correspondence, telephone enquiries, questions at conferences, etc., indicate, however, that much of this material is not received, not read, or not understood either through language difficulties or because what is laid down is not what is desired. Late entries, for instance, simply cannot be accepted under a computer-processed control system which apportions question and answer documents and provides statistical data for the monitoring of marking for the entire entry, yet provides also personalised documents, from timetable to final result, for each candidate. At the other end of the process, results once issued, though subject to a check when requested by teachers who feel a grading is significantly out of line, are final and related to a carefully-maintained general standard.

From teachers, the range of queries is also wide. 'What is the pass mark?' is a basic question, but a confused one. It cannot be answered in a standard way, as it is asked from an approach to testing concepts which varies according to country. Questions on the length of essays, and the relative importance of content and language, also need to be answered other than by formula to be helpful, though we have our series of general reports with illustrative extracts from candidates' work to send out. 'How many mistakes are allowed?', another 'backward' question usually related to rigid pass-mark concepts, is heard now and again, and shows little appreciation of the prominence given to communicative competence, which, as was suggested above, is much more basic.

One very productive aspect of public relations with the teachers, in terms both of the general mounting of the operation and because the only effective answers to many queries on marking standards and procedure are gained in this way, is their active involvement in our marking panels. For the present large and increasing entry, a total team of over 1,000 examiners is appointed in the year, and the vast majority of these are teachers with extensive and current specialist experience. The largest proportion of these are the oral examiners at overseas centres, some responsible only for small numbers at isolated centres working with the aid of instructions, sample recordings and occasional feedback, but a much larger number as part of an organised teaching/examining operation on a large scale. A panel of over 250 oral examiners deal with the interestingly polyglot U.K. entry, and many of these double as markers of Composition or Use of English papers. The 'hard core' of this group are the 50 or so who in addition to marking current examinations participate in the setting of future examinations across the whole range of objective and open-ended test types, written and oral/aural. These are our best allies in the public relations field, those who know that perfection in testing is not a matter of realising heart's desire once, but over and over again in a way that will satisfy the demands of security, consistency, discrimination and still not offend against a wide spectrum of methodological or ideological feeling.

In general, the Syndicate's contacts with the EFL teachers, whether as regular collaborators, or occasional enquirers or critics, indicate in a highly interesting way the current degree of acceptance of various concepts in language attainment and the teaching approaches based on them. Over a period of ten to fifteen years we have seen a

swing away from the revealingly formulated cry, heard at an ARELS conference, of 'Too much objectivity!' against multiple-choice testing, and the 'Crossword puzzles!' criticisms of material in structured answer-book format. A question setter, quite eminent in course-book production circles, did once actually submit a crossword puzzle as part of a semi-objective paper, having temporarily taken his eye off mundane considerations of marking techniques and values.

The Syndicate would claim that, by and large, its good relations with the EFL teaching public are based on recognition that its role and procedures have caught the whole testing process, from reliable theory to consistent practice, somewhere near the right point of balance.
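
As a rough illustration of the oral weighting question discussed in this article, the Python sketch below shows how moving from a 25% to a one-third oral share would change an aggregate result. Only the raw oral maximum of 39 and the two weightings are taken from the text; the function name, the use of a single combined written percentage and the sample candidate are assumptions made purely for illustration, and nothing here should be read as the Syndicate's actual marking procedure.

```python
ORAL_RAW_MAX = 39  # raw oral maximum quoted in the article


def aggregate_percent(oral_raw, written_percent, oral_weight):
    """Combine an oral mark and a written result into one aggregate percentage.

    oral_raw        -- oral mark on the raw 0..39 scale
    written_percent -- combined written result, 0..100 (hypothetical scale)
    oral_weight     -- share of the aggregate given to the oral, 0..1
    """
    if not 0 <= oral_raw <= ORAL_RAW_MAX:
        raise ValueError("oral mark outside the raw scale")
    if not 0 <= written_percent <= 100:
        raise ValueError("written result must be a percentage")
    if not 0 <= oral_weight <= 1:
        raise ValueError("oral weight must lie between 0 and 1")

    # Put the oral mark on the same 0..100 scale as the written result.
    oral_percent = 100 * oral_raw / ORAL_RAW_MAX

    # Weighted sum, reported to one decimal place.
    total = oral_weight * oral_percent + (1 - oral_weight) * written_percent
    return round(total, 1)


if __name__ == "__main__":
    # Hypothetical candidate: strong orally (30/39), weaker on paper (60%).
    print(aggregate_percent(30, 60, 0.25))   # present 25% oral weighting
    print(aggregate_percent(30, 60, 1 / 3))  # proposed one-third weighting
```

For a candidate who is stronger in speech than on paper, the larger oral share raises the aggregate; for the opposite profile it lowers it, which is the sense in which the weighting helps to define the teaching syllabus.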

Ian Seaton

Proficiency testing for tertiary level study
and training in Britain

1. Beginnings
The British Council, either directly through its own scholarship schemes, indirectly for the Overseas Development Administration training awards or as service to international bodies such as the United Nations agencies, administers overseas students from their initial selection through their study or training to their return home. A crucial part of this administration is ensuring that the English language ability of these students is such that they can fully profit from their stay in Britain. Over the past twenty years, therefore, the Council has operated language tests to provide such information for all concerned: students, teachers, the Council itself, sponsoring bodies and British institutions. Clearly, such tests operating all over the world and informing all interested parties have heavy demands placed on them. The tests have to select, diagnose and predict to enable the various people involved to take a variety of important decisions.

Some five years ago it was decided by the Council, together with other interested bodies like the Committee of Vice-Chancellors and Principals, that the tests used hitherto (the Davies English Proficiency Test Battery and the Council Subjective Test) should be changed. The reasons for this decision fall into three groupings. First was the recognition that views on what language is and how it is used had changed in the 1970s. In syllabus design and in the classroom there was more emphasis on presenting language in use and ensuring that the learning was more specific and appropriate. It was felt that the new test system should derive from the same concern, account for the communicative use of language and complement and integrate with the general shift in ELT. Second was the desire to achieve a balanced system whereby a central and controlled test system could provide a reliable and valid measure yet be flexible enough to allow for local (national) differences. The previous tests had been operated on an increasingly local basis, leading to unreliable scores which could not be consistently interpreted by all those involved in the long and often complex chain of getting the student to and from Britain successfully. Another reason in this second grouping was the need for a stable and monitored system that could be developed on the basis of evidence, and not anecdote, with the back-up of statistical analysis and provision of parallel and improved versions. Third was the need to make the test system more transparent. Tests of the formal language system often present their information in unitary scores with cut-off levels for standards determined by norm-referenced analysis. As such, both in 'face' terms and the way the scores were arrived at, they meant little to the ELT teacher and less to the student. A transparent test system would, it was agreed, organise its shape and content, and present its results in a way easily understood by the non-specialist.

2. Requirements
The committees and working parties that met during this period soon came up with a formidable list of requirements for the new test system. It would have to deliver information on three counts. Had the student reached a minimum level of adequacy which would indicate that his/her planning to come to Britain within a year or so was reasonable? Had the student reached a level where a period of one to six months' English language teaching would bring him/her to a fairly adequate level? Had the student reached this fully adequate (target) level already? The test system would therefore have to be criterion referenced or linked, testing those language skills likely to be needed for the specific purpose of studying or training in Britain. It would have to establish the various levels of adequacy in these skills validly and reliably. Its content and item types would have to have at least a beneficial backwash into the classroom and its results accurately predict the part language ability would play in the outcome of that student's actual course.

Then the test system would have to be capable of being marked locally or centrally as certain decisions would have to be made in the student's own country and certain in Britain. It would have to be a comprehensive test, and yet not so complex as to inhibit its uniform administration in

some seventy countries. It would have to be a testing service, flexible and 'on demand', not a formal twice-a-year examination. The results would have to be presented in a way that gave a picture of the student's ability so that different levels of performance in different skills could be seen and acted on by all involved in the process.

3. Problems
With such a list of requirements, the working parties that began the work of survey, specification and design in 1976 obviously ran into problems. All involved knew they would have to account for a large number of variables that had never been adequately identified, let alone described, and that any solutions would be partial. The survey had to account for the wide range of teaching and learning practice and levels of tolerance in British institutions, and yet organise and order this variety to control the design of a 'manageable' test system. To specify the content of the test to the degree of delicacy and validity that would control the organisation of text, activity and item types required a theoretical model of language in communicative use. Previous tests inherently relied on the formal system of language (grammar, syntax, lexis, etc.) to act as such a model with more or less agreed descriptive terms to label the components. There was, and is, no such commonly agreed model to describe the communicative use of language; yet without it the specifications could become too loose and unfaithful to the criterion behaviour. A six-month language course can allow the process of teaching and learning to fill in the gaps between syllabus, materials and methods; a two-hour language test claiming to be a valid and reliable measure cannot. The idealisations of theory and design can be constantly adjusted as they are exposed to the reality of practice in a course. However, how can this be done in a test that is so much more constrained and which has to remain stable for a longer period?

The next series of problems came in converting the specifications (assuming that they were agreed, appropriate and detailed) into a language test. The activities specified had no immediate item-type in the test designers' repertoire to carry them. Delicate and detailed specifications of the communicative use of language might have to be constrained into the strait-jackets of multiple-choice items to satisfy requirements of objectivity and reliability. If one ignores the issues of validity and the detrimental effect on language learning of the discrete-point tests of language structures solely with paper and pencil, it remains a fact that such tests can be reliably scored wherever and whenever the test is given. How could a test system emphasising the communicative interaction between text and person or person and person be so 'properly' scored? The item-types would inevitably call for whole, integrative performances which would have to be subjectively assessed. And, finally, even if the transparent reporting of results was laudable, could we as yet be fully explicit about what constituted language performance at this or that level and describe it using commonly agreed terms?

4. Solutions
By 1977/78 solutions to these and other problems were coming from the teams set up by the Council and the Cambridge Syndicate. Some solutions were more 'solid' than others but all had to contribute to getting the test system operational. It was agreed that, given the size and nature of the task, however, it was essential that where a solution was clearly interim, procedures had to be set up to allow its improvement over time and actual operation. The model developed by John Munby and set out in his book Communicative Syllabus Design was used flexibly for both survey and specification by the six teams who investigated the language demands made on students in the academic subject areas of Physical Sciences, Life Sciences, Technology, Social Studies, and Medicine. This use of a common procedure, supplemented by the data collection, observations and intuition of the specialist teams allowed the survey/specifications to be consistently presented. Thus the inevitable variables of such a wide survey could be systematically extrapolated to enable an explicit and justifiable link to be made within and between the separate parameters. For example, the communicative event specified as 'taking part in an academic tutorial' could be broken down into component activities such as 'note-taking' and then further into groups of component language micro-skills such as 'use of cohesion devices in discourse'. At each level of specification the teams could describe significant variables affecting performance and in particular use the dimension and tolerance levels in Munby's parameter six to assign different standards of performance.

The editing team then set about the organising and condensing of these specifications into a core of EAP activities, text types, skills all with assigned levels of language realisation and performance to allow realistically the writing of the tests. Although the specifications organised the language skills in groups of mixed-mode activities, it was decided that the test system should select and organise its

sub-tests around the single modes of 'reading', 'listening', 'study skills', 'writing' and 'speaking'. This re-organisation recognised that both students and teachers still think in these terms, and that such a test system could not afford to be too innovatory. Again there was a compromise on item-types. The system would need the 'anchor' that discrete-point items provide while testing the more integrated activities specified, and so three of the sub-tests are multiple-choice and the other two task-based. Although the editing produced 'common-core' skills, it was decided that, particularly to make the test more acceptable in 'face' terms, there should be six modular tests with 'source booklets' (collections of texts) particular to the five specified subject areas, with an extra module for 'General Academic'.

The solution to the administrative problems set by such a large-scale system was there from the outset in the decision to run the Service jointly. The Council would provide its unique network of ELT qualified staff through its offices world-wide, while the Cambridge Syndicate provided its experience and facilities in running examinations and tests in Britain and overseas. The Test Development and Research Unit at Cambridge would provide the analysis and computing services while the Council set up a Liaison Unit to combine these various resources in operating and especially validating the test system.

It was decided to report the results by presenting them as a five-point profile of language ability; the scores on each sub-test are converted to a band, or level of performance, on a scale of one to nine with each band having the appropriate description of what it means in terms of ability. This framework already has interim validity but will be refined through the 1980s to improve both its definitions of target levels and its predictions as to the average time and type of language tuition needed to reach target levels. The Council has its own unit, the English Tuition Coordinating Unit, which is now interpreting these profiles and advising students, sponsors and institutions on placement and/or pre-sessional English language tuition. This means that, although there is only one test system, it can be used to inform on the varying levels demanded on a whole range of courses.

5. Operation
After two years of pretesting, analysis and revision with large samples in Britain and overseas, the Service went into operation in 1980 in forty British Council representations and certain regional offices in Britain. It is now available in all Council offices throughout the world for both sponsored students and private students, and parallel versions of four of the sub-tests are being phased in. In countries like the Sudan it is used by ELT Institutes preparing students for study in Britain, while in Australia, the University of Melbourne has contracted to use it for its own pre-sessional and concurrent service English courses. To meet 'local' requirements where some students may already have a good English language background or may be planning only a short period of study not necessarily in Britain, one of five combinations or patterns of sub-tests can be taken, as appropriate.

The Liaison Unit, as well as monitoring the use of the Service worldwide and controlling some of the subjective assessments, plans the development of the tests on the findings of the various validation studies. These studies currently focus on basic reliability features, construct and content validity and most importantly predictive validity. The Unit also publishes occasional information on the Service together with the User Handbook (for prospective candidates) and the Specialist Handbook (for the professional ELT community). As part of the development of the Service, the Unit plans to bring into operation in 1982 a series of modular tests for the increasing number of students coming to Britain for vocational technical training. The first versions of these tests were ready for trialling in the Autumn of 1981.

6. Directions
Although it is a complex and comprehensive test system using a lot of specialist resources, all of those involved in its design, operation and validation recognise that we are at the beginning of the business of describing and measuring the communicative use of language rather than at the end, particularly so in attempting to define the role language plays in academic study, vocational training or any learning process. The English Language Testing Service more than fulfils the standard requirements of a language test, but beyond that should provide in the next ten years a systematic and stable procedure to investigate the use of language in the areas in which it operates. Validation studies such as the one recently set up with the Institute for Applied Language Studies at Edinburgh University, which will run for the next five years, will support such investigations while, of course, contributing to the improvement of the decisions that have to be taken in arranging in-Britain study or training.

Many surveys over the years have pointed out that other factors such as personality, motivation, cultural background, etc. play a crucial role in determining the outcome of study or training in Britain. It is hoped that, when research into these factors has reached a certain stage, it might be possible to extend the framework of profile reporting to build a more whole profile of a student's learning style and language ability. Such a step would fit in well with the whole development of the Service, which has been one of applying current thinking as rigorously as possible to working systems which can then be increasingly validated and improved.

Key references:
The English Language Testing Service. User Handbook, 1981; Specialist Handbook, forthcoming. The British Council.
Graded Objectives in Modern Languages, 1980. Centre for Information on Language Teaching and Research.
Testing Communicative Performance, An Interim Study, 1980, Carroll, B.J., Pergamon.
Communicative Syllabus Design, 1978, Munby, J., Cambridge University Press.


C.S. Ward

Progress testing: preparation and analysis

One of the main tasks of any educational institution and its teachers is to check on the success of their courses with reference to particular students and the whole group. In the case of the individual student, there is a need to know how well he is keeping up with the programme. Should there be difficulties, he needs to be provided with a remedial programme or moved to a more appropriate course. Checking on group progress enables a check to be made on the course itself and can provide clues as to how it can be improved.

The importance of the progress test will vary from situation to situation. In the small institution, where there are close contacts between staff and students, the teacher's assessment of students will probably have much greater weight than in the large institution with many classes running at the same time or where circumstances impose a considerable number of teacher changes. In the latter case, there will be much more pressure for a common yardstick by which to judge students' progress. However, even in the small school a double check, such as provided by a progress test, will help the teacher to review his assessments and will help to indicate what areas of the course have yet to be mastered. A good progress test can also help the student to understand where his weaknesses lie and build confidence by indicating the areas he has mastered.

Many of the principles for preparing good progress tests are similar to those for preparing achievement and proficiency tests. Instructions to the students should be clear so that they have no difficulty in understanding what they have to do. Trick questions should be avoided. These are more likely to trick the better students than the poorer ones. However, in other ways, the principles will be quite different, reflecting the different purpose the test is to serve. A quotation from J.B. Heaton's Writing English Language Tests (Longman, 1975) will serve to summarise these differences.

'Good performances act as a means of encouraging the student, and although poor performances may act as an incentive to more work, the progress test is chiefly concerned with allowing the student to show what he has mastered. Scores on it should thus be high (provided, of course, that progress has indeed been made). Whereas in standardised achievement and proficiency tests, a wide range of performance should be indicated, the progress test should show a cluster of scores around the top of the scale.'

This difference has important consequences when preparing and analysing progress tests, and the techniques used for standardised testing are not completely applicable. This has not always been sufficiently emphasised. Some suggestions are now offered which may help produce good progress tests as defined in the above quotation.

Preparation
One of the requisites of a successful syllabus is a clear statement of aims and methods, which is also a requisite for successful testing. As the progress test should reflect the syllabus, the statement of aims and methods prepared for the syllabus will largely dictate the form the progress test will take.

Each course syllabus, except those for an absolute beginner in a particular language, will assume an ability in certain areas of the language. On the basis of this previous ability, the course will aim to develop abilities in new areas. In the progress test that follows such a course, the design and content will seek to show that the students have attained those abilities the course sought to develop. The test may include questions that involve students using the ability they were assumed to have at the beginning of the course, but it should not be central to the test. Nor should the test include anything which was not considered a necessary previous ability, and which was not covered in the course. In other words, the content of the test should be such as to result in any student who has successfully mastered the course content getting perfect or near perfect scores.

Once the content of the test is decided, the testing method will have to be decided. The types of possible questions are well described in the standard texts and need not be discussed here. The

choice will again reflect the course as much as possible. We need to use the testing method that will indicate as near as possible that the student has attained the target ability. For example, it would be inappropriate to ask students who have attended a course which emphasised reading business letters to write a business letter as a test of their successful completion of the course. Similarly, it is not advisable to depend on reading comprehension tests when checking the success of a course in writing. This is true of other forms of tests, but it needs to be emphasised even more for progress tests for two reasons. First, students will tend to concentrate on areas of the course in which they know they will be tested. Second, the correlations between different types of tests that are quoted as a basis for accepting standardised tests, the contents of which do not completely sample the language, may be suspect when progress tests are considered. A course which emphasises a particular skill over others will probably cause such correlations to be reduced substantially.

However, in large institutions or where time is limited, a machine markable test may well be used as a common yardstick. If combined with, for example, a teacher's assessment (hopefully, a continuous assessment) of areas not covered by the test, and if the students are aware of this, then the disadvantages of using such tests may be avoided while the benefits are retained.

Finally, the questions need to be written and the test booklet designed. These should always be at least double-checked. A second opinion will often see flaws that the first writer cannot see until they are pointed out. The following questions need to be asked:

1. Are the instructions clear?
2. Is the content restricted to course content or what might be reasonably thought as a prerequisite to the course?
3. Are the tasks in open-ended questions sufficiently defined? Are they likely to provide the student with an opportunity to show his ability in the relevant skills?
4. In multiple-choice questions, is there clearly one and only one correct answer?
5. Is the marking scheme for the subjectively marked tests clear? Does it emphasise areas stressed in the course or does it emphasise other areas? For example, if the course has stressed communication over accuracy, does the marking reflect this?

Analysis
Once the test has been used, there is unfortunately a tendency for it to be put to one side. An analysis of the test results can be done fairly rapidly and can provide a lot of information about the students and the course. It can also lead to the development of better progress tests for the future. Often the use of a test can show flaws in it which cannot be seen by inspection.

One of the first steps in the analysis of the test would be to seek wherever possible the opinions of the teachers and students who have used it. If the progress test has been a good one, most students will be satisfied that their results reflect their progress. They will have seen the course reflected in the test and will see that any loss of marks was due to their lack of mastery of sections of the course. General discontent with any section of the test will usually indicate that either the instructions were not clear, or the question was obscure, or the students did not feel that particular area was covered in the course. Teachers who have seen the test in operation will also often be able to give helpful advice.

A fairly easy second step is to look at the distribution of marks. As stated in the earlier quotation, the scores on a progress test should be clustered towards the top end of the scale. If they are not, something has gone wrong. There are several possibilities: for example, either the course design needs to be revised or the test needs to be redesigned, or both.

The final step is to investigate each question. It is useful to keep each question on a separate card with a summary showing when it was used and how successful it was. This may take a little time at first, but will save time later when making up new tests. It will also provide any successor to the present administrator with a useful guide when he, in turn, has to prepare progress tests.

In the case of subjectively marked papers, the best check available is to find out how well students did on each question. The scoring scheme will depend a lot on the type of question, and thus hard-and-fast rules are difficult to give. However, generally the question scores should reflect our expectation of the total score, i.e. the students should generally do well. If they have not performed well, a discussion among those involved in teaching the course and preparing the test should help to identify the problem, and either lead to a revision of the question or a revision of the course.

In the case of multiple-choice tests, an item-analysis can be done. This process is clearly described in several texts. It need not take a long time
and the information that can be obtained can be very useful. However, the item analysis as described in most texts was developed for checking items in standardised testing and, if used in the same way for progress tests, will not help to develop the type of test that is needed. Thus, the process will be described in detail and any differences of approach needed will be pointed out.

The first step is to rank all the answer booklets according to the total score obtained by the students. The lower and upper 27% are then set aside for analysis. There should be an equal number of papers in the upper and lower piles. Thus, if there were 100 candidates, there should be 27 papers in each pile. Sometimes there are several papers with the same score at the point where we have to make the cut-off. For example, there might be in this group of 100, 24 students with scores above 88% and 5 students with scores of 88%. We would then choose 3 of those 5 randomly for the upper group. We could then prepare a table for each item as follows:

A* B C D NA TOTAL
U 24 ° 27
L 22 3 1 1 27

Key:
A to D: choices in the item.
*: indicates correct answer.
NA: no answer.
U: upper group.
L: lower group.

From these tables we can easily work out two statistics: the difficulty coefficient (simply the proportion of people who got the item right) and the discrimination coefficient (a figure which tells us how well the item discriminates between the upper and lower groups). The formulae are:

$$\text{Difficulty coefficient} = \frac{UR + LR}{2n}$$

$$\text{Discrimination coefficient} = \frac{UR - LR}{n}$$

Key:
UR: the number of people in the upper group who got the item right
LR: the number of people in the lower group who got the item right
n: the number of people in one of the groups.
(N.B. There is a more complicated and accurate formula for discrimination for those contemplating using a computer.)

For the above example, difficulty is $\frac{24 + 22}{2 \times 27} = \frac{46}{54} = 0.85$ (to 2 decimal places). In standardised testing, this would generally be regarded as too easy, as an attempt is made to spread the scores along the whole scale. Items that all or nearly all candidates get right or wrong do not help to do this, and so items with difficulties over 0.80 or under 0.20 are rejected from such tests. However, in progress tests we expect the items to represent areas the students have mastered, and thus easy items should be retained. On the other hand, items which the majority of students get wrong are suspect. Any item that has a difficulty of less than 0.50 may have been badly written or may test an area not covered sufficiently in the course. In the first case it should be rejected. In the second, either the course should be revised or the item rejected.

The discrimination coefficient for the above example would be $\frac{24 - 22}{27} = \frac{2}{27} = 0.07$ (to 2 decimal places). Again, in standardised testing, such a low discrimination would be regarded as unsatisfactory. Low discrimination indicates low agreement with the other items in the test. As much agreement as possible is needed between items to help spread the candidates along the whole scale. Thus, generally items with a discrimination of less than 0.20 are rejected for such tests. However, in progress testing this is not such an important consideration. Indeed, it can be proved mathematically that, when using the formulae given here, an item with a difficulty over 0.90 cannot have a discrimination above 0.20, and a question that students all get right can only have a discrimination of 0.00. Furthermore, it is unlikely that an item will have the maximum discrimination theoretically possible. Thus, few items with difficulties over 0.80 will have discriminations over 0.20. In a progress test, as we wish to retain these 'easy' items, we will have to accept lower discrimination coefficients.

However, the discrimination coefficient remains useful. In progress testing, as in standardised testing, we use the total score rather than the score on individual questions, and so it is still important that particular items do not work against the rest of the test. If more of the lower group get the answer right than the upper group, then the discrimination coefficient will be negative and the item will be working against the rest of the test. Such an item should be rejected or revised. A suggested approach is to reject all items with a negative discrimination coefficient, and reject items which have both a difficulty coefficient of less than 0.80 and a discrimination coefficient of less than 0.20. In this way we would build a test that could identify the weaker students while continuing to have a cluster

of scores towards the top of the scale.

Following this, we should check the distractors in the table previously given. The distractors are the incorrect choices (B, C and D in the example given). Distractor B would be acceptable in any test as more of the lower group chose it, and it thus helps to identify the lower group. Distractor D would be rejected or revised as more of the upper group chose it, and it thus confuses the issue. Distractor C is more difficult. It would be rejected in standardised testing as it does not help to discriminate and is thus just a waste of candidates' reading time. However, in progress testing, if it represents a common error which students completing the course have mastered, it is worth keeping. If, on inspection, however, it proves to be a distractor that is so ridiculous that no-one would think of choosing it, it is worth revising.

The final stage is to go through all the reject items and try to establish why they are rejects. Many will simply be badly written, but others will provide keys to where the course is failing to cover certain areas or failing to clear up misunderstandings (or where it is actually creating misunderstandings). Where it is decided that the course is at fault, the course can be revised and the item retained.

The above discussion has been an attempt to suggest an approach to the preparation and analysis of progress tests that emphasises such tests as a tool for the educational administrator or teacher, a tool to encourage students and a tool to help check and revise the course so that the aims of the course will be realistic aims for both students and teachers. Such tests cannot function properly if they are divorced from those who are responsible for the courses. Used wisely, progress tests will lead to better designed courses. Progress tests are an adjunct to courses. They should never supersede them.

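The statement in the analysis that an easy item cannot also discriminate strongly follows directly from the two formulae. A short derivation, using the same UR, LR and n, with p and d introduced here only as shorthand for the difficulty and discrimination coefficients:

```latex
\begin{align*}
p &= \frac{UR + LR}{2n}, \qquad d = \frac{UR - LR}{n}, \qquad 0 \le LR,\ UR \le n \\
LR &= 2np - UR \\
d &= \frac{UR - (2np - UR)}{n} = \frac{2\,UR}{n} - 2p \;\le\; 2 - 2p = 2(1 - p)
\end{align*}
```

So a difficulty above 0.90 forces the discrimination below 0.20, a difficulty of 0.80 allows at most 0.40, and an item that everyone gets right (p = 1, so UR = LR = n) has a discrimination of exactly 0.00.
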
John Rogers

Tennis plays Nha - or how to humanise tests

Nha (/na/) was a Vietnamese member in one of the courses for the English Language Institute's Diploma in the Teaching of English as a Second Language. He spent a lot of his time on the tennis court. (He doesn't mind my saying this.) So when it was time to give a test on a linguistics unit, it seemed the sensible thing to change the wording of Chomsky's 'Golf plays John', and make the question of immediate, personal relevance to Nha and his colleagues. In addition, of course, the course members all smiled (some laughed) when they came to this question. I cannot prove that more people got the question right because of the personal reference, but at least the examinees relaxed a little and the name Nha triggered a memory.

This same technique worked equally well in usually feared Grammar tests. In a so-called Simple Sentence Patterns test, course members were asked to classify sentences into SV/SVC/SVO/SVOO/There VS types. Previous tests included the usual dreary classroom/grammar book sentences. We found that sentences referring to individual course members and to some of the things that they had actually said in conversation or in tutorials caused considerable visual and audible amusement in the examination room. Apparently, these personal references evoked vivid, and therefore easily retrievable, memories of grammatical explanations. This may have been an interesting example of the involvement of both the right and left hemispheres, and their distinctive thinking or heuristic styles, in problem-solving. Here are a few examples. (The situation is explained before each test item.)

a. We had some discussion about 'taste' and a class joke developed about a certain course member's exploits.
'Lim has tasted the night life of Wellington.'
b. During a very cold spell, there were complaints about the beds in a university hostel.
'The university hostel wouldn't give them any Dutch wives.'
c. One course member had been described as a very romantic person.
'Ibrahim gave her his heart, body and soul.'

Other test sentences were intended to remind course members about what had been said in class and about possible traps in the test itself, which then taught as well as tested. Other items were about the test itself. All of these helped to take some of the tension out of the test and to provide some interest and amusement in the actual test paper. Again, it would be difficult to say whether the results were better or not. Perhaps it was more important that items like the following provided some primary motivation. They might even have stimulated course members to try something similar in their own testing programmes back home.

For the classifying sentences test, we usually had a sentence like this as an example:
S V O
We/like/tests.
In the re-test, the example becomes:
S V O
We/(still) like/tests.
Scattered through the tests were sentences like the following:
'I hope you avoid the traps.'
'There aren't any problem sentences.'
'So far this test doesn't seem very difficult.'
'I think number 15 was a trap.'
'There won't be any more there sentences.'
20. I can't remember the difference between SV and SVC sentences.
21. This is an SVC sentence.
22. This isn't.
The last item in a test: 'I've begun to enjoy classifying sentences.'
Another final item: 'I don't want to make up another test!'
The conclusion to another test:
'You will lose a mark for every minimum requirement mistake you make.'
S V C
You/have been/warned! (or SV, if we haven't brainwashed you)
S V O
I/hope/you all pass.
S V C
(This/is/) THE END

At the beginning of our Study Skills course we usually administer the following 'Can You Follow Directions?' test. First of all we show the following flashcards:

Can you follow directions?
Are you sure?
Let's find out.
Let's have a test!
You have only 5 minutes!

Then we give the following test:

Can you follow directions?
(Speak to no one. Do not look at anyone else's paper. Work very quickly. You have only five minutes.)
1. Read everything before doing anything.
2. Put your name in the upper right-hand corner of this paper.
3. Circle the word 'name' in sentence two.
4. Draw five small squares in the upper left-hand corner of this paper.
5. Put an 'X' in each square.
6. Sign your name under the title.
7. After the title write, 'Yes, yes, yes'.
8. Put a circle around each word in sentence number seven.
9. Put an 'X' in the lower left-hand corner of this paper.
10. Draw a triangle around the 'X' that you have just written.
11. On the back of this paper multiply 703 by 9805.
12. Draw a rectangle around the word 'paper' in sentence number four.
13. Call out your first name when you get to this point in the test.
14. If you think you have followed directions properly up to this point, call out, 'Yes, I have'.
15. On the back of this paper, add 8950 and 9850.
16. Put a circle around your answer. Put a square around the circle.
17. Count out loud in your normal speaking voice backwards from ten to one.
18. Now that you have finished reading everything carefully, do only sentences one and two.

(Have you been caught out by this test? The editor of this collection of articles was caught out not many years ago!)

At the end of the Study Skills course, we gave a Reading Comprehension Test. The opening instructions contained the following:

'Can you follow directions? Do you remember the test on following directions? Read all the instructions first. This isn't a test of reading speed. But there is a time limit for the whole test.'

Course members smiled, heads nodded and they started work on page 1. It was a long test, intentionally, and when they got to the end, on page 7, they found this final instruction, which, of course, the majority had NOT read before they started on page 1:

'Now that you have read all the instructions and now that you know how to follow directions, do NOT answer questions 1, 5 or 9 on The Stolen Letter (question 1). If you have already circled a, b, c or d for questions 1, 5 and 9, you will lose three marks. Do NOT ask for another question paper. Do NOT rub out any marks you have made.'

Quite a number of course members laughed out loud at this point, shook their heads and sat back resignedly. I think they then learnt how to follow directions. Tests CAN teach. And they CAN be made more human.

Finally, an example of a cloze test that provided a great deal of amusement and entertainment. It proved to be challenging, but the inherent human interest of the material kept motivation high. And there was an instant clamour for the 'official' answers. The follow-up 'correction' lesson was as entertaining as the original test. I offer it to anyone who would like to try it out. Making and taking tests CAN be fun. Enjoy yourself!

All you need is love and oxygen

It's anyone's guess how many people have joined the Mile High Club by making love in an aeroplane.
'I mean, who knows what ____ do under blankets when the ____ are turned out,' said an ____ Qantas spokesman.
'Love-making in ____ is certainly possible and our ____ and hostesses have caught people ____ the act.
'If the couple ____ quite happy and not disturbing ____ else, we'd probably say a ____ 'Sorry, sir' and leave them ____ it. We've nothing in the ____ on it.
'Incidentally,' the spokesman ____, 'it's time the terminology of ____ Mile High Club was updated - planes fly at six miles high.'
One report caused Mr. Norman Tebbitt, British MP and former B.O.A.C. ____, to state that passengers should ____ instructed to 'fasten your chastity ____ before boarding.'
'It is not ____ for people to become bored ____ over-emotional on long flights. After ____, what else is there to ____?'
Airlines offering in-flight movies are ____ help to impatient lovers who ____ keep their hands off each ____. 'You'd be surprised what goes ____ when we darken the cabin,' ____ stewardess said.
'If there's a ____ scene in the movie it ____ starts someone off. People get ____ on long daylight routes.'
As ____ the dangers in six-mile love-making, medically speaking, there ____ none, says Professor John Llewellyn-Jones, professor of obstetrics and gynaecology ____ Sydney University. 'It is O.K. ____ a pressurised aircraft, in the ____ of the deepest mine, or ____ the Moon, with the right ____,' he explained. 'All you ____ need is oxygen.
'The only ____ against it are the reactions ____ the other passengers. I suppose ____ would either object or be ____.'

(Adapted from an article by John Sims, in The Dominion, Wellington, New Zealand, August 21st, 1971)

Pauline M. Rea
An alternative approach to testing
grammatical competence

The main thrust in language education today is on the teaching of language as communication. The terms 'notional', 'functional', and 'communicative' are labels frequently used to describe current approaches to language teaching. In other words, the central concern is with the imparting of language skills which will enable our language learners to engage more efficiently and effectively in natural communicative activities. Whereas at one time the focal point has been on the learning of the grammatical patterns of the target language, it is now claimed that the primary aim of most language programmes is to develop learners' 'communicative competence' in the target repertoire. There is, however, a tendency to interpret a 'communicative' syllabus as one which is organised primarily around a set of notional and functional categories, which subordinates the role of grammar as an organising principle to second place. This sharp swing in the pendulum in the orientation of language teaching programmes will certainly lead to considerable difficulties in the foreseeable future if the imbalance in present trends is not redressed.

There are two inherent dangers with some interpretations of the communicative approach to language teaching that are relevant to the present discussion. The first is associated with the rather unsystematic and unprincipled presentation of the grammatical system of the target language. Indeed Wilkins (1976), as one of the major contributors to discussions on the notional syllabus, stresses the importance of the acquisition of the grammatical system of a given language (p. 66). The second, related problem stems from the emphasis which many communicative courses give to the acquisition of socialising skills during the early stages of the language learning process. The likely outcome from both these factors is the emergence of groups of learners whose language proficiency, at best, demonstrates an adequate degree of fluency at a basic level of communicative interaction, but whose knowledge of the underlying target language system is grossly inadequate. Cummins (1979) has found that basic interpersonal skills may be acquired fairly rapidly whereas literacy-related language proficiency, involving a wider breadth of vocabulary and more sophisticated manipulation of language structures, takes a longer time to develop. One can therefore anticipate difficulties when second language learners try to operate at a higher level of understanding and communication in the language but find they are unable to do so because of an inadequate formal linguistic generating mechanism.

When we say that someone 'knows a language', we mean that this person has acquired certain abilities. These include the ability to produce grammatically acceptable sentences in the target language, together with an ability to use these correct forms appropriately, as the occasion demands. It is essential, therefore, that both these aspects of communicative competence are taught, and that the importance attached to the teaching of the social functions of language should not obscure the crucial role of the grammatical system to the successful communication of ideas and intentions. It follows from this that the shift in emphasis in language teaching programmes has neither eliminated nor even reduced the need for teachers to assess their students' grasp of structural items of the target language. The requirement to assess grammatical competence is as necessary today as it ever was. However, the different views that we now hold of language, in particular the role of grammar in terms of its function within a semantic and pragmatic framework, do have implications for the way in which grammar will be assessed. The rest of this paper will examine the changes in the approach to test syllabus specification, the method, and the format of tests which are a direct result of the current trends in language teaching and learning programmes.

Hitherto, specifications for a grammar test have been in the form of an inventory of different aspects of English grammar. Such a list would include determiners, such as 'some', 'any', 'much', 'little', and verb forms such as 'present', 'imperative', 'modals'. Typically, test items to match these areas would be similar to those shown in Table 1 below.
DETERMINERS
We haven't got ____ tomatoes at all.
1. some
2. much
3. a few
4. any

IMPERATIVE
Q. '____ a cigarette?'
A. 'No thanks'
1. Do you have
2. Have
3. You have
4. Have you got

PRESENT
He often ____ in the bath.
1. is singing
2. to sing
3. sings
4. singing

MODALS
Q. 'Must I do it this evening?'
A. 'No, you ____.'
1. mustn't
2. can't
3. needn't
4. won't

Table 1: Conventional multiple-choice grammar test items

Implicit in a specification of this kind is the belief that 'knowing a language' corresponds to accurate manipulation of the grammatical forms of that language, and test items have tended to reinforce this belief by tapping knowledge of the (formal) rules of the language (usage). Thus, they account for only one of the two abilities involved in communicative competence. Given current concerns to develop communicative language proficiency, any test specification and any test item should also account for the second ability mentioned earlier, namely the appropriacy of language use. In other words, they should not only list and assess the grammatical functions basic to communicative activities but they should also assess the application of the rules of language use. A sample test specification which relates grammatical and communicative categories appears in Table 2 below.

Noun Phrase
Use of determiners: some, any, few, etc.   for QUANTIFICATION
                    a/an                   for INDIVIDUATION
                    zero                   for GENERALISATION
                    possessive/the         for SPECIFICATION

Verb Phrase
Use of passive                             for PROCESS
Use of infinitive                          for PURPOSE
Use of finite forms: imperative            for INSTRUCTION
                     present               for DESCRIPTION

Table 2: Sample communicative test syllabus

The second inadequacy of existing grammar tests is associated with the method of testing used. The isolated sentences format illustrated in Table 1 is not valid within a communicative framework. It is obvious that a test method which excludes the total context in which the grammatical and lexical system of a language operates cannot claim to be assessing the appropriate use of grammatical structures and forms. There are three related difficulties which make the conventional methods incompatible within the framework of the communicative competence model.
1. Inadequate coverage and imbalanced item distribution
This is the inevitable consequence of testing grammar and vocabulary in discrete sentences. For example, there are some items which are very easy to test, such as adverb-tense collocation (perfect + just/never/ever/ + neg + yet/etc.) and question tags (isn't it/shouldn't we/didn't we/etc). Thus we often find an overabundance of items of this type included in grammar tests. On the other hand, there are aspects which are more difficult to assess and are thus often excluded. These include areas such as the passive verb group, and modals.

2. Limited demands on students from test items themselves
Normally, test items are heavily weighted towards recognition-type tasks and rarely assess control (i.e. production) of appropriate linguistic forms. Students may be very used to manipulating grammatical forms, transforming, for example, the present into the past tense, or direct into indirect speech. There is therefore a strong possibility that students produce correct answers on traditional multiple-choice test formats mechanically, and do not demonstrate an ability to use the appropriate forms as the 'real-life' context requires.

3. Lack of authenticity
The isolated sentences format is inadequate as there are a large number of items which cannot be adequately assessed without reference to the context in which they are normally found. In actual communication, an appropriate linguistic form is selected for its function within a text, which also involves decisions about the overall function of the text itself.

Having outlined above the invalidity of the uncontextualised method of assessing grammatical competence, the final part of this paper will make suggestions for an alternative approach. The test items which are illustrated below are designed as a more valid means of testing grammar. Additionally, these items permit the objective marking of questions, thus satisfying the criterion of practicability which is an overriding consideration when large numbers of students are involved in the testing process. The discussion is restricted to the communicative function area of 'description', and takes as a specific example the case of students whose purpose for learning English is to facilitate further studies through the medium of English.

Given this information, we are in a position to determine that these students may be required to use English, for example, to describe and relate historical facts, to explain physical and scientific phenomena, to compare and contrast events and processes, or to report on an experiment. Because a test syllabus is expected to reflect the relationship between linguistic forms and communicative functions, we become aware of the importance of, for example, finite verb forms (present and past) for 'description', of the passive verb group for 'process', the perfect verb group for 'development', use of the modal group for 'ability', 'possibility', and so on. The next step, once the syllabus has been detailed, is to select topics which are relevant to the purposes for which your students are learning the language. These should be sufficiently general and accessible to all students, and varied so as not to favour any particular group. After this comes the selection of a suitable text, followed by the design of the test items.

Example 1 is an illustration of the way in which, within the context of a report on an experiment, the ability to produce grammatically correct and appropriate verb forms can be evaluated. It mainly involves the selection between contrasting verb forms of the past active and the past passive.

Example 1

READ THROUGH the instructions for this experiment.

Pour the water into a displacement vessel until it overflows through the pipe into a measuring jar. Read the level of the water surface in the measuring jar. Then lower the solid into the vessel until it is completely covered by the water

YOUR REPORT
Water 1) ______ into a displacement vessel until it 2) ______ through the pipe into the measuring jar. The level of the water in the measuring jar 3) ______. Then the solid 4) ______ into the vessel until it 5) ______ by the water

Contextualisation of items in this manner allows the realistic assessment of mastery of individual grammatical elements appropriate to, in this case, the genre of sub-technical report writing. Depending on the level of students, we may require the recognition or the (cued or uncued) production of the required answer. Two further examples, which assess grammatical competence at the level of the verb phrase and focus on appropriate tense selection and accurate verb formation within the communicative category of description, are given below. In the first example, students are expected to insert the most suitable word for each space in the passage.

Example 2

PORTUGUESE EXPLORATION OF THE COAST OF WEST AFRICA

The progress of Portuguese exploration 6) ______ slow at first. Madeira 7) ______ discovered in 1418, but Portuguese ships 8) ______ not pass Cape Bajador until 1434. The Azores 9) ______ first sighted in 1439 and in 1441 Cape Blanco 10) ______ rounded, and Arguin just to the south of it 11) ______ discovered in 1443 by Dias and Tristao

The next illustrates a cued production task in which the verbs to be used are provided in brackets. Students are instructed to put these verbs into the most suitable form so that they make sense within the passage.

Example 3

KINGDOMS OF THE SAVANNA

In the savanna region of the Congo there 12) ______ (rise) powerful kingdoms and empires whose beginnings can be 13) ______ (trace) to as far back as the fifteenth century. Long before this time, the area must have 14) ______ (inhabit) by Bantu speaking people who by A.D. 800 15) ______ (live) in organised agricultural communities, and in some places had already 16) ______ (make) long distance trade contacts with the east coast. Little was 17) ______ (know) about the history of these states until Jan Vansina's book, Kingdoms of the Savanna, 18) ______ (appear) in 1966. The account which 19) ______ (follow) 20) ______ (come) mainly from this book.

The final example is designed to assess word order, especially at the level of the adverb phrase (with adverbs used as modifiers), and noun phrase (adjective placement). Students are instructed that one word has been omitted from each line and that this word is listed on the right of each line. They have to choose the place (a, b, or c) where the word should be written.

Example 4

                                                              OMITTED WORDS
(20) A /(a) walk through any /(b) of the /(c) slums           city
(21) in the /(a) world /(b) is an /(c) experience.            unpleasant
(22) They /(a) start just /(b) outside the city /(c)          frequently
(23) limits. /(a) As /(b) they are /(c) peopled by            usually
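
The objective marking that contextualised items of this kind permit can be illustrated with a short sketch. The Python below marks the five gaps of Example 1 against a key of acceptable answers; the key itself, the function names, and the decision to ignore case and spacing are assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical answer key for the five gaps of Example 1.
# The accepted forms are an editorial assumption, not the author's key.
ANSWER_KEY = {
    1: {"was poured"},
    2: {"overflowed"},
    3: {"was read"},
    4: {"was lowered"},
    5: {"was completely covered", "was covered"},
}


def normalise(answer: str) -> str:
    """Lower-case the answer and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", answer.strip().lower())


def mark(responses: dict, key: dict = ANSWER_KEY) -> dict:
    """Return, for every item in the key, whether the response is acceptable.

    An item with no response is marked as wrong.
    """
    return {
        item: normalise(responses.get(item, "")) in accepted
        for item, accepted in key.items()
    }


def score(responses: dict, key: dict = ANSWER_KEY) -> int:
    """Total number of acceptable responses."""
    return sum(mark(responses, key).values())
```

A marker who wishes to accept further variants need only extend the set for that item, so the marking stays objective while the range of acceptable answers remains a matter of judgement.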

The final, important stage in the process is the check on the adequacy of our testing procedures. We may do this by asking four main questions:

1. What is the test measuring?              CONTENT
2. How does it match the test syllabus?     COVERAGE
3. Which approach is used?                  METHOD
4. Which item-types are used?               FORMAT

Our answers to these questions should indicate the extent of the test's potential validity, which operates at three distinct levels:

i) syllabus: content and coverage
ii) method: contextualised language samples from the appropriate field of discourse
iii) format: authentic item-types defined by their function within the selected text.

It should be clear from the preceding examples that the selection and production of the required structure and grammatical form is determined not only on grounds of grammatical correctness, but also by its overall function within a given communicative area. These examples have been used to illustrate one alternative way in which linguistic competence may be evaluated within a communicative model of language teaching, learning, and testing. Although emphasis in the test items reflects a concern for appraising a large number of grammatical items, the method and format used is such that individuals are required to integrate their linguistic knowledge in a way similar to the normal use of language for communicative purposes. Users of language have to produce grammatical elements which are determined by their overall function within sequences of linguistic events; this requires an analysis and synthesis of these events rather than what might be simply the result of fortuitous recall of an item, or a mechanical response.

References
Cummins, J., 'Psychological Assessment of Immigrant Children: Logic or Intuition?', in Journal of Multilingual and Multicultural Development, Vol. 1, No. 2, 1980, pp. 97-111.
Halliday, M.A.K., 'Language Structure and Language Function', in New Horizons in Linguistics, Lyons, J. (Ed.), Penguin Books, 1970, pp. 140-65.
Leech, G. & Svartvik, J., A Communicative Grammar of English, Longman, 1975.
Wilkins, D.A., Notional Syllabuses, Oxford University Press, 1976.

S F Whitaker

Dictation as a testing device

No doubt there are fashions in testing procedures, as there are in clothing and cars. Dictation, essay, precis, and transformation exercises are 'out', like mini-skirts and spats, but multiple choice and Cloze are 'in'. Yet rational currents are discernible: a search for economy, simplicity, speed on the one hand, and for validity, or something approaching natural language use, on the other. If a case is to be made for the use of dictation for testing purposes (and also for teaching purposes), we must consider in turn (1) the language skills that are involved, and therefore (2) the system of scoring results so as to measure these skills; (3) the sorts of texts which might be chosen, and (4) the practical procedure of dictating: speed of delivery, length of utterance (or frequency of pauses), the length of pauses, the number of repetitions (if any).

Reasons for unpopularity of dictation

(i) Dictation does not lend itself to mechanized scoring. A vital feature of it is that it allows scope for the learner's processing of aural input, and this processing yields a remarkable variety of products. Only the use of a standardized typewriter, or of an electronic word-processor, might permit mechanical scoring of the individual student's output, if he is a competent typist!

(ii) Dictation does not have an obvious face-validity, since not many normal language-users spend time writing down what they hear read out to them. It may also be out of fashion if it is regarded as a purely 'passive' skill, in which only the oral source, whether man or machine, is active, and dominant. It has a relish of another age, when classic tasks (like producing a precis, one-third of the original length) were meekly performed by students, and duly inflicted, by those who became teachers, on the next generation.

The flexibility of dictation

But when we consider the various possible treatments of a dictation (from noting a telephone message down correctly, through taking down a letter in shorthand, or writing minutes, to reproducing faithfully each graphic feature of a text which is already perfectly written down), and when we reflect upon the language competence required to process a stream of speech at natural speed, and to transform it into the conventional written record, we must see that dictation has more relevance to measuring linguistic ability than is generally recognized. Add to this theorizing the remarkably close correlation found in practice between results on tests in dictation and those on the best alternative tests devised, and we cannot ignore its claims. (See the work referred to in Oller, 1979, and in Carroll, B., 1980.) In order to use dictation to advantage, we must decide what it is we want to test, and then try to conduct the task, and score the results, in such a way that we give credit where we consider it is due. This clearly does not mean unthinkingly dictating in such a way, at such a speed, and with so many repetitions, that no student feels stretched or unsure, and then scoring only according to success in spelling, including punctuation, unless we have decided that that is all we want to test.

The language skills involved

The major skill, in taking dictation, is aural comprehension (with all that involves). The writing down of the text provides evidence of that comprehension. Filling in charts, tables of figures, completing diagrams, in accordance with dictated instructions, are non-verbal alternatives which may have their value. Since we rightly emphasize realism and naturalism in language learning designed to promote communicative ability, we can select texts, and contexts, in which messages are delivered orally, and typically in monologue (one-way communication, because in a test, at least, a class of students cannot all be given a chance to put individual questions).

Aural comprehension involves a much deeper level of processing sounds than is generally realised by the competent speaker. A parrot, even a cliff-face, may repeat a stream of speech, with some sort of neuro-muscular 'processing' in the case of the bird. But to identify units of meaning, in that stream of sound, calls upon powers of analysis, matching, recognition, and synthesis, of a high order. It is well known how misunderstandings between native speakers can arise through wrongly 'perceiving' one sound, or misinterpreting a homophone, with the re-casting of the whole utterance which that entails. Most examples are significant only in the circumstances in which they occur, and become trivial if quoted, but some have become

classics. (Sadly my cross-eyed bear; Send three-and-fourpence; She left him with three hundred children.) We need only look at the efforts of students attempting a suitably difficult dictation to see how fundamentally constructive their processing of aural data is: words that they know, or that seem plausible, will be written, because they have been 'heard', instead of the original. (See examples by H.V. George, in Oller & Richards, 1973, and in Oller, 1979, pp. 276-285.) 'A brought taste in music', instead of 'a broad taste' (phonologically explicable, and 'heard' by an Armenian as well as by a Chinese); 'settlements', for 'sentiments' (U.S. pronunciation); the production of one preposition when another was dictated: of instead of for, for instead of with, of instead of on or at. We remember that these prepositions will be in their weak phonetic forms, and all these errors demonstrate the operation of a 'transitional competence', of a 'grammar of expectancy'. Even the words are not 'given' (as Lado alleged): they have to be inferred or constructed through linguistic ability, from various kinds of evidence arising from aural stimuli.

Further study and analysis of students' performance will demonstrate even more forcefully how the taking of dictation calls on an 'integrative' language competence, which combines elements of phonetic discrimination, availability of vocabulary, understanding of structure at the phrase, clause, sentence levels and above, together with overall textual comprehension. It is true that it involves a large number of different language features: this makes it difficult to offer specific remediation for errors committed, but it is precisely what gives dictations, even more than Cloze tests, their pragmatic validity, as approximating to natural language use.

Recognising the relevance of dictation

A number of practical decisions will have to be taken, once the principle has been agreed. But it is the conviction of its validity which will govern these decisions. It is also important that this conviction should be shared by the students carrying out the task. They must accept that the task of grasping the full content of (for example) an announcement, a broadcast, while difficult, is a desirable skill, and that it must be practised when it is difficult rather than when it offers no challenge. It is of a nature which must exclude any individual plea for external assistance when it is being trained, and of course when it is being tested. (I.e. Face the challenge with your own resources, instead of asking to have 'the answer' handed to you.) In that respect, it is individualised instruction, focusing on the learner. Without the necessary rigour in its presentation, dictation would be useless, and students' acceptance of this rigour must be obtained, or they may feel frustrated, and fail to cooperate, to their own disadvantage.

Alternative texts

The more plausible and realistic the context for dictation, the easier this acceptance should be. What is plausible and realistic will depend on the learner's situation, his interests, his imagination. (A little make-believe is often appropriate and enjoyed.) Recorded information and announcements offer possible material, as do songs if one is anxious to get the words down. There is a built-in incentive. Ringing travel enquiries might yield something that would be written down like this (if time allowed):

'To get to Birmingham by 23.00, you should take the 19.37 from Grantham, arriving at Peterborough at 20.03, and change at Peterborough, departing at 20.23, and arriving at Birmingham at 22.52.'

This involves 'comprehending': (a) times (given in time-table form, probably), (b) key verbs (get, take, change, depart), and (c) place-names, a special area of lexis, helped by the possession of as much geographical knowledge and previous experience of characteristic English place-names as possible.

Of course this, like much information 'dictated' in this way, would normally be taken down in note form. Notes imply intelligibility to the writer, and may be in his own code; but he must be able to reconstruct the message from the notes, and it is not unrealistic to suppose that he might write it so that it would be intelligible to someone else. The minimum written version might then be: To get to Birmingham by 23.00 take the (Note that 'train' is understood from this use of the definite article with a time) 19.37 from Grantham, arrive (or arriving) Peterborough 20.03, change Peterborough, depart(ing) 20.23, arrive Birmingham 22.52. In assessing and scoring such a dictation, one might decide that there are 18 essential bits of information to be intelligibly recorded in writing, and these have been underlined. It is not the spelling (e.g. of Peterborough) that is the criterion, but the recognizability of each bit of essential information in the written version. This may sometimes require some discretion (and the spelling of some place-names is unguessable), but the criterion is a relevant one.

Writing down a 'telephone' description of lost property would provide another plausible task, in

which the criterion of success would be the number of specific bits of information.

E.g. 'It is a small, black, plastic bag, with a metal clasp, gold-plated, and my initials, DJB, in small silver letters on one side. There is a narrow strap for holding it, which has become rather worn and frayed in the middle'

Students might enjoy getting a good story written down accurately, so that they could later memorize and re-tell it. Here notes would not suffice: fully articulated sentences would be required. But though spelling is certainly not unimportant, it would not have enough significance, in relation to the task, to justify spelling errors being scored on the same basis as others. (Oller, 1979, p. 281, showed that performance in spelling did not correlate at all with overall proficiency.) A text that is to be dictated, like one used for Cloze procedure, should be worthy of the attention it will receive, either because it fits well enough the lowly purpose or level it serves (at an elementary or intermediate stage) or because it is satisfyingly worded and expresses some memorable thought (at a suitably advanced stage). It should be well written.

Length of sections between pauses

It may be agreed without argument that dictation should be at a speed close to 'normal', with longer pauses for writing, rather than at an unnaturally slow speed, with words separated, and shorter pauses. For a standardized examination, a tape will be provided, or precise instructions given regarding timing and repetition. The teacher working on his own will be guided by experience and reflection. While it is obvious that sense-groups must form the basis for deciding where pauses should be made in the dictation, the border-line cannot always be clear-cut. In fact, noun phrases are likely to be cut off from verb phrases, head-words from defining relative clauses, and verbs from their complements. Some texts will have to be rejected as unsuitable for intelligible division into suitable short portions for dictation at a given level. Texts will always need to be carefully prepared, and pauses studied and marked, with particular attention to intonation, respecting the natural tunes of speech-that-has-been-exploded, rather than adopting the 'listing' intonation. During practice it may be better to err towards dictating longer bursts, stretching the short term memory span, thus providing enough data for successful processing, rather than uttering short fragments that need to be re-interpreted as the co-text is added. This may well justify the repetition of each burst, but once only, in order not to approach the frustration threshold too closely. Readers may like to experiment with the division and delivery of a short narrative extract, from William Trevor:

'Without in any way sounding boastful, Edwin told her of episodes in his childhood, of risks taken at school. Once he'd dismantled the elderly music master's bed, (rising or falling intonation?) causing it to collapse when the music master later lay down on it. He'd removed the carburettor from some other master's car, he'd stolen an egg-beater from an ironmonger's shop. All of them were dares, and by the end of his schooldays he had acquired the reputation of being fearless: there was nothing, people said, he wouldn't do.'

The dictation of punctuation

There is no clear reason why a text, spoken at something like the normal speed and in something like the normal manner, should solemnly include the announcement of 'comma', 'speech marks', 'full stop'. These are not part of the spoken language, which has another signalling system. If knowledge of certain conventions of the written system is being tested at the same time, that knowledge can be displayed better by the student supplying the punctuation he judges necessary, as in straight composition, rather than by his showing that he recognizes the words 'full stop', and succeeding in making a dot on the line. In this way the learner will be actively producing even more for himself, in the act of taking dictation. Scoring will again require the exercising of a little discretion.

Conclusion

Inevitably, there are some difficulties in the preparation, execution and scoring of a dictation, but these are not as great as the difficulties, and expense, of compiling really satisfactory multiple-choice tests. Even more important, the validity of dictations, when it has been measured rigorously, appears to be much higher. (Oller, 1979, p. 267.) 'Dictation and closely related procedures probably work well precisely because they are members of the class of language processing tasks that faithfully reflect what people do when they use language for communicative purposes in real life contexts.' It is not suggested that dictation should be adopted as a predetermined package, but approached and developed according to a pragmatic view of language teaching.
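
The scoring by 'essential bits of information' suggested earlier for the travel-enquiry dictation might be sketched as follows. This is only one possible reading of the criterion of recognizability: the list of bits, the similarity threshold, and the function names are assumptions introduced here (the author's own underlined set of 18 is not reproduced), and the spelling comparison relies on Python's standard difflib module.

```python
import re
from difflib import SequenceMatcher

# Illustrative information bits for the travel-enquiry message.
# This list is an editorial assumption, not the author's underlined set of 18.
BITS = ["Birmingham", "23.00", "19.37", "Grantham", "Peterborough", "20.03",
        "change", "Peterborough", "20.23", "Birmingham", "22.52"]

THRESHOLD = 0.75  # assumed cut-off for a "recognisable" spelling


def tokens(script: str) -> list:
    """Split a script into lower-case words and clock times such as 20.03."""
    return re.findall(r"\d{1,2}[.:]\d{2}|[a-z]+", script.lower())


def recognisable(bit: str, token: str) -> bool:
    """Times must match exactly; words need only be close in spelling."""
    if bit[0].isdigit():
        return token.replace(":", ".") == bit
    return SequenceMatcher(None, bit.lower(), token.lower()).ratio() >= THRESHOLD


def score(script: str, bits: list = BITS) -> int:
    """Count the bits recoverable from the script.

    Each written token can earn credit for at most one bit, so a place-name
    that occurs twice in the message must also be written twice.
    """
    remaining = tokens(script)
    credited = 0
    for bit in bits:
        for tok in remaining:
            if recognisable(bit, tok):
                remaining.remove(tok)  # safe: we leave the inner loop at once
                credited += 1
                break
    return credited
```

The threshold is where the marker's discretion enters: lowering it credits wilder spellings of a place-name, while times are credited only when every digit is right.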

Summary of suggestions
i. Reflect upon the validity of dictation for yourself, so that you can use it in a way that exploits its useful features.
ii. Convince students of its value, through the relevance of your procedure and the satisfaction and enlightenment that reasonable success at it can bring.
iii. Select and prepare your text judiciously; present it strictly in line with principles agreed upon. Each occasion is a real linguistic encounter, even though only 'one-way', in the oral-aural mode.
iv. Score the performance in a way that is consonant with your priorities in teaching.

References
Carroll, Brendan, Testing Communicative Performance, Pergamon, 1980, p. 99.
Oller, J. W., Language Tests at School, Longman, 1979.
Oller, J. W. & Richards, J. C. (Eds.), Focus on the Learner: pragmatic perspectives for the language teacher, Rowley, 1973.

Penny Frantzis

Listening Comprehension: some comments on the

testing and marking of written communication

Listening to and understanding spoken English involves the student in a range of skills: for example, the ability to identify individual words from a blur of speech, recognizing the significance of stress, intonation and syntactic patterns, and retaining what is heard long enough for the message to be understood in its entirety. Along with these skills is the ability to anticipate or predict what is likely to be heard in a given situation, using clues drawn from the cultural context in which the speech is heard and from the observation of such features as the speaker's facial expression, speed or loudness of voice.

Comparing examples of students' transcriptions of spoken discourse in the form of a news broadcast from a tape with the original text, one finds some indications of the complexity of the listening task.

   Original text                              Student transcription

1. as a result of                             as result of
2. planning a holiday                         planning holiday
3. in favour of granting                      in favour granting
4. due to land in this country                due to London this country
5. the studio commentator read the rest,      studio come into the reverence giving rise to /
   and giving rise to                         the studio commentator never rest giving rise to
6. to wear their own clothes                  weather and clothes

In transcribing, any visual clues are absent, the acoustic signal is paramount and the listener's internalized knowledge of syntactic, semantic and phonological rules is what he must refer to in order to decode the message on the tape. Although in the first three fragments above, the students' comprehension is not in question, what is surprising is that the students were aware of the use of articles and familiar with such strings as 'in favour of' and 'as a result of', yet they failed to use this knowledge to modify what they heard, whereas the native speaker would automatically have cued in the missing words. Similarly, in example 5, the listener codes 'rest' as a verb but fails to provide any compatible number marker either on the subject 'studio commentator' or on the verb form itself, such a lapse being at variance with this particular student's usual grammatical performance.

What seems to be restricting the students' accurate decoding of the message is their limited experience of the range of phonological realizations that words or strings of words can possess. Furthermore, this limitation seems to override any semantic or syntactic knowledge the student could usefully bring to bear on the listening comprehension process, and is characterized by the common complaint that the English the overseas student hears in Britain in no way corresponds to the English he has been exposed to in his own country.

The following description is of an attempt to tackle this problem. The primary consideration was to make the student aware that full and explicit articulation of each word is not a feature of normal speech and thence to enable him to identify words from the blur of elided, assimilated, stressed and unstressed vowels and consonants that make up the acoustic signal he receives.

At the same time, however, it was recognized that close attention to spoken discourse at the phoneme level results in the loss of overall comprehension of the text. A combination of exercises, tasks or tests had therefore to be devised to ensure that predictive skills were encouraged, overall comprehension was not lost and that phoneme discrimination was developed; in other words, that the students' awareness of the phonological potentiality of the language was extended.

The material chosen for presentation at a weekly

one-hour session was a two-minute news bulletin from Radio 4, recorded
the day before the class to allow time for a transcription and
preparation of exercises to be made. The language of a news bulletin
is, of course, distinct from the spontaneous spoken discourse of
conversation in that the language heard is being read from a prepared
text and is thus devoid of such verbal redundancies as hesitation,
re-phrasings and false starts. Furthermore, as the maximum amount of
news has to be condensed into a given time, the information content of
a news bulletin is extremely high, the choice of syntactic structure is
correspondingly economic, delivery is rapid and the articulation is not
always explicit. To counter objections that this is a very restricted
diet of spoken discourse, it should be mentioned that tapes of
conversations, interviews and lectures are presented for study in other
classes. The rationale behind the choice of a news bulletin, however,
was based on several factors.

Firstly, it was felt that its content was of general interest and
relevance to the student, informing him of the current issues in
Britain and providing him with topics and the necessary structure and
vocabulary to initiate or join in English conversation outside the
classroom. It appears also that the ability to understand news
broadcasts can serve to combat feelings of loneliness; one student
remarked that he spent a great deal of time studying in his room, with
the radio providing background music, and had very little contact with
the 'outside world'. Subsequently, having attended the class, he
actively listens to the news coverage instead of allowing a stream of
sound to wash over him and feels much less isolated from the community
in general.

Secondly, given the repetitive nature of news bulletins in Britain,
i.e. strikes, earthquakes, hijacks, riots and the Financial Times Share
Index, lexical items as well as structural features are frequently
recycled in slightly altered contexts, allowing the student the
opportunity to consolidate his previous learning. Reinforcement of the
material presented is, of course, also available outside the class in
the form of newspaper articles, TV and radio news coverage.

Moreover, unlike spontaneous discourse, in a news bulletin there are no
changes of register or significant differences in the accents of the
news readers and thus a measure of continuity can be guaranteed each
week. These constant factors have the advantage of enabling both
teacher and student to become more clearly aware of the progress made
from week to week, thus quickly generating confidence in listening to
radio broadcasts. At the same time, however, two of the problematic
features of spontaneous discourse are still very much in evidence in
the material chosen: rapid delivery and incomplete articulation.

The presentation of the material falls into three sections. The first
section deals with overall comprehension. The two-minute news bulletin
is played and the students are instructed to identify the number of
news items (usually between six and seven items). After students'
answers have been compared and a consensus has been reached, the tape
is played again. This involves a writing task: jotting down keywords or
notes to identify the gist of each news item. Students then pool their
information in pairs or in groups, and are invited to give the main
content of each item, thus eventually arriving at a general outline.

An alternative presentation involving reading can be used for testing
purposes. Students are given a sheet containing a list of ten possible
headlines and asked to pick out those they expect to be included in the
news bulletin before it is actually heard. While listening to the tape,
they are instructed to number the headlines in the order in which they
occur.

During the next re-play, the tape is stopped at intervals to allow more
precise questioning and to identify any vocabulary problems. Names of
people or places unlikely to be known are written on the blackboard or
on a handout, and any cultural information crucial to general
understanding is provided either by the teacher or students. Throughout
this stage, it has frequently been observed that students in their
replies to questions do not use the vocabulary of the actual news
bulletin but provide a synonym, the inference being that the word heard
(although understood) may not yet be in their active vocabulary: e.g.
'strike' is used instead of 'industrial action' or 'stoppage'; 'wind'
instead of 'gales'; 'snowstorms' instead of 'blizzards', etc.

A technique which can be used at the more detailed questioning stage is
a series of oral statements constructed so as not to include sections
from the text, requiring 'true or false' judgements. Students reply by
saying or writing T or F to each statement. Written replies can be
checked by means of self-monitoring, pair work or group work.

The amount of material listened to before the tape is stopped and
questions are asked can vary in length depending on the density of the
information carried. During a whole course on listening comprehension
it is noticeable how students' auditory memory develops and retains
increasingly more information. Even when the material has not

been totally understood, the student's echoic memory (i.e. of the sound
signal) is frequently long enough to enable him to arrive at the sense
of the stream of sound, especially if prompted by a searching question.
This whole section could be presented in the form of a series of short
multiple-choice questions, but this, as I will amplify later, is more
suited to a self-study mode in the language laboratory than to a class
presentation.

The second stage of the presentation dealing with phonemic
discrimination more obviously involves testing techniques. Students are
given a sheet containing gap-filling tasks.

Text A
Every fifth word has been omitted in the text below. Listen to the
item and fill in the gap.

The Civil Service campaign .... support of wage increase ....
brought two of Scotland's .... airports to a complete ....
Air traffic controllers at .... and Edinburgh walked out ....
after 7 o'clock this .... and they're not expected .... resume
work until this ....

Transcription (omitted words in square brackets)

The Civil Service campaign [in] support of wage increase [has]
brought two of Scotland's [major] airports to a complete [standstill].
Air traffic controllers at [Prestwick] and Edinburgh walked out
[shortly] after 7 o'clock this [morning] and they're not expected [to]
resume work until this [afternoon].

Text B
Listen to the news item and fill in the gaps in the transcript below.
Each dash represents one letter.

The employers' organisation --- --- says the recession -- -----
deepening but -- --- --------- report -- ---- there are some signs
---- -- -- levelling out. The report predicts - ------- small decline
------ --- next 4 months and cautions ------- undue optimism. The
report ---- that encouraging signs ------ --- distract attention
---- --- ---- that manufacturing output is ---- 12% ----- its 1975
level.

Transcription (omitted words in square brackets)

The employers' organisation [the CBI] says the recession [is still]
deepening but [in its quarterly] report [it says] there are some signs
[that it is] levelling out. The report predicts [a further] small
decline [during the] next 4 months and cautions [against] undue
optimism. The report [says] that encouraging signs [should not]
distract attention [from the fact] that manufacturing output is [over]
12% [below] its 1975 level.

Text C
Words have been missed out from the transcript below. Indicate with a
'/' where the words should be and write in the margin the words you
hear. The number in brackets indicates how many words are missing in
each line.

Many in Lincolnshire and Oxfordshire are without (2)
electricity after the blizzards, and supplies may be (3)
restored until tomorrow. Extra teams have brought (4)
in from as far as Cumbria to help restore power to (1)
three homes in Lincolnshire and people in Oxfordshire. (4)

Transcription (omitted words in square brackets)

Many [people] in Lincolnshire and Oxfordshire are [still] without
electricity after the [weekend] blizzards, and [full] supplies may
[not] be restored until [late] tomorrow. Extra teams [of engineers]
have [been] brought in from as far [afield] as Cumbria to help restore
power to three [thousand] homes in Lincolnshire and [about two
thousand] people in Oxfordshire.

Text D
Complete the following gaps with numbers after listening to the news
item.

All .... were arrested .... years ago and are accused of handing out
.... rifles, .... artillery pieces and more than .... rounds of
ammunition to their militia.

Transcription

All 4 were arrested 10 years ago and are accused of handing out 74,000
rifles, 300 artillery pieces and more than 10,000,000 rounds of
ammunition to their militia.

The first exercise (Text A) is a cloze exercise and generally causes
few problems after the first comprehension section has been dealt with.
However, it is a useful indicator of inaccurate comprehension and
alerts the teacher to the problem. An alternative exercise might be
such as described by H. Templeton (ELTJ XXXI, 4, 1977) in 'A new
technique for measuring listening comprehension', a cloze exercise
specially designed for listening, where no reading is involved and
where the sound is bleeped out of the text at given intervals.

Text B enables the teacher to concentrate on words or strings of words
which in their unstressed or elided forms are likely to cause problems
of discrimination for the student. The tape should be played in small
sections and repeated as often as necessary, allowing the student time
to read the text and write in the words.

Similarly, Text C allows the teacher to remove from the text such
grammatical features as articles, prepositions, and weak verb forms,
which require acute phonological discrimination and syntactical
awareness. By systematically drawing attention to these forms in this
way, the teacher can discover the problem areas of the students and can
also sensitize the students to these difficulties.

Text D concentrates on the understanding of numbers, a special skill
which can be developed separately by a similar gap-filling exercise
with recorded time-tables, the Financial Times Index or even football
results. Interesting intonation patterns can also be observed in the
latter, and students have fun predicting draws, wins or losses
depending on the intonation of the broadcaster!

The final section of the presentation involves the student being given
a full transcription of the news bulletin. This time the text is
followed while listening to the tape. Exercises requiring students to
mark in tone unit boundaries, stressed words and stressed syllables,
and to check for contrastive stress are of great help in developing
listening ability. The significance of the position of the stress in
the following sentence, for example, falling on a word usually
unstressed needs an explanation:

    'It's understood that Mrs. Thatcher WILL now make a statement in
    the House of Commons about her talks with President Reagan.'

WILL is clearly being contrasted with WILL NOT, although no explicit
reference is made to a change of mind.

During the course of the lesson, the student will hear the entire news
bulletin five or six times, for the most part concentrating on
different listening skills each time, the phonological properties of
the language having been explored at the same time as the meaning of
the text has been unravelled and sound and syntactic patterns have been
registered.

This material can, of course, be used on a self-access basis in the
language laboratory. The selecting and ordering of a list of headlines,
a set of multiple-choice questions and true/false statements would
cover the initial comprehension stage. Testing and developing phoneme
discrimination would be covered in the second section with gap-filling
exercises as illustrated for some of the news items. These tests could
all be checked by the student himself against a full transcription.

The transcription exercise, a sophisticated form of dictation where the
sounds, however, are not distorted by segmenting, is particularly
useful for the teacher and the student in diagnosing problem areas and
also in making the teacher aware of the complexity of the listening
task the student is confronted with. E.g. the phrase 'in and out of'
was transcribed as 'in doubt of' by a number of students, so reduced
was the 'and' acoustically that a strong conviction about the meaning
of the text was required for the student not to believe his own ears!

One final comment about this particular self-access format. I have used
it for some time as a listening comprehension progress test
administered to a group of students at the beginning of their general
language course and at the end of the course. The first test is not
administered until the students have become familiar with all the types
of test during two or three class sessions, and the final test results
correlate well with the teacher's subjective impression of the
students' progress and with their performance in general language
proficiency tests. Furthermore, the student is immediately made aware
of his progress by the degree of ease with which he feels he has
understood the news broadcast.
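The every-fifth-word deletion used for Text A can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to prepare the class material: the function name `make_cloze`, the gap marker and the handling of punctuation are assumptions, and the passage is the Text A transcript as far as it can be read from the printed page.

```python
import re

# Text A transcript, as read from the printed transcription.
PASSAGE = (
    "The Civil Service campaign in support of wage increase has brought "
    "two of Scotland's major airports to a complete standstill. Air traffic "
    "controllers at Prestwick and Edinburgh walked out shortly after 7 "
    "o'clock this morning and they're not expected to resume work until "
    "this afternoon."
)


def make_cloze(text, n=5, gap="....."):
    """Replace every n-th word of `text` with a gap.

    Returns the gapped text and the answer key (deleted words in order).
    """
    gapped, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            # Separate trailing punctuation so it stays visible in the
            # gapped text and is not part of the expected answer.
            core, trailing = re.match(r"^(.*?)([.,;:!?]*)$", word).groups()
            answers.append(core)
            gapped.append(gap + trailing)
        else:
            gapped.append(word)
    return " ".join(gapped), answers


gapped_text, answer_key = make_cloze(PASSAGE, n=5)
print(gapped_text)
print(answer_key)
```

Changing `n` gives a different deletion rate for the same passage, which makes it easy to prepare parallel versions of an exercise; whether those versions are equally difficult is a separate question that the sketch does not address.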

Keith Morrow

Testing spoken language

There is general agreement among language teachers that testing
students' command of the spoken language is one of the most important
aspects of an overall evaluation of their language performance; at the
same time, it is widely recognised that most public or institutional
examinations, if they have an oral component at all, often attach far
less weight to it than they do to written papers. And, furthermore,
most teachers, if they wish to devise their own informal oral tests,
find the job an extremely daunting one. So we have a situation of
generally recognised need and pitifully inadequate supply. Why should
this be?

The first and most obvious answer is that testing command of the spoken
language in any systematic or realistic way is extremely difficult.
Some of the reasons why it is so difficult are discussed in the next
section, but there is another reason which it is worth mentioning
briefly. In many countries, the educational system as a whole values
the written word above the spoken. The study of literature has
prestige, while the ability to use the language for everyday
communication, especially in speaking, is less acceptable as proof of
academic or intellectual worth. Interestingly enough, many language
teachers now see their job very much in communicative terms, so there
is often a mis-match between the requirements of official examinations
and the sort of language which teachers want to teach. There are many
casualties in this battle, but the most likely sufferer is the spoken
language.

SOME PROBLEMS
But what are the problems which a teacher wishing to devise some
end-of-term or end-of-course oral tests is going to face?

1 Oral testing is very time-consuming. If you have 30 pupils in your
  class and you want to spend even 5 minutes with each one, it will
  take at least 2½ hours, and probably a lot longer. Furthermore, it is
  2½ hours when you must be there at the same time as the pupils,
  whereas a written paper can be marked in your spare time. An
  additional problem is relativities. If you devote only 5 minutes to
  an oral test, and say 1 hour to a written test, it is clear which the
  pupils will perceive as being more important, even though this may be
  the exact opposite of what you intend.

2 It is very difficult to get the pupils to say anything interesting. I
  don't mean, of course, that we should expect pupils to entertain us
  with witty anecdotes or brilliant conversation, but for a pupil's
  spoken language to be 'interesting' for testing purposes, he/she must
  be able to do a number of things.

  a. He must have the chance to show that he can use the language for a
     variety of purposes. These will include both the language of
     'reporting' (e.g. describing, narrating) and the language of
     'doing' (e.g. apologising, warning).

  b. He must have the chance to show that he can take part in a
     spontaneous conversation, responding appropriately to what is said
     to him by another speaker and making relevant contributions.

  c. He must have the chance to show how he can perform linguistically
     in a variety of situations, adopting different roles and talking
     about different topics.

  Needless to say, it is difficult to devise testing procedures that
  meet even one of these criteria, let alone all three.

3 Assessing what the pupils actually say is very difficult. What sort
  of scale can we use? How reliably can we indicate one pupil's
  performance compared to that of another? How reliably can we indicate
  a pupil's performance now compared to the same pupil's performance
  six months ago?

SOME SUGGESTIONS
These are suggestions rather than solutions, because I don't want to
claim that they answer all the problems. But they might encourage
experimentation.

1 Tasks
My plea is that all evaluation of spoken language should be based on a
task or activity which the pupil performs through using the language.
Furthermore, the task should as far as possible be one which the pupil
might recognise as being the sort of thing that real people in the real
world really do. So, for example, the traditional oral test

where pupils are asked to read aloud a passage in the foreign language
and then answer a series of questions on it would not meet this
criterion of 'reality'. Exactly what sort of task you ask your pupils
to perform will depend very largely on the syllabus they have been
following, but if they are familiar with role-play exercises, these are
an obvious possibility. Of course, even role-plays are not 'real', but
the classroom can never re-create all aspects of the world outside. The
question is whether it is worth incorporating as much as we can, and
for me there is no doubt about the answer.

2 Groupwork
Setting tasks to groups of students for evaluation purposes may have at
least two important advantages. Firstly, it may help with the problems
of time outlined above, in that a number of students can be evaluated
simultaneously. Equally importantly in terms of time, it will allow
tasks to be set which will take long enough to complete for the pupil
to feel that what is being done is significant and worthwhile. A second
advantage is that taking part in a group task gives the pupil the
chance to use language in some of the ways described above, i.e.
spontaneous conversation involving a variety of functions.

3 Assessment
The most important pre-requisite here is that the teacher has to have a
clear idea of what is being looked for in a particular test at a
particular time. A number of criteria have been suggested for
evaluating spoken language, but not all of these may be equally
relevant to all pupils at all times, and of course there may be other
criteria which particular situations may demand. As a point of
departure, though, it may be useful to think in these terms:

ACCURACY:     What level of accuracy in grammatical and pronunciation
              terms are you looking for? Is less than total accuracy
              acceptable at a given time?

APPROPRIACY:  Are you looking for a simple or sophisticated degree of
              relationship between the forms of the language used by
              the pupils and that particular message they wish to
              convey? For example, does it matter in a given test if a
              pupil who does not understand what is said by another
              says: 'Repeat' or 'Please repeat' or 'Could you repeat
              that please'?

FLUENCY:      How do you equate a pupil who says very little, all of
              which is totally correct, with one who contributes well
              to the task but whose language may have some mistakes?
              Clearly, there is no point in being totally accurate if
              you never say anything: equally clearly, we do not want
              to encourage the idea that anything goes.

Finding the right balance between these aspects of production is an
extremely difficult and controversial area, and it may well be that
different things should be stressed at different times in the learning
process. However, the main advantage of using labels like these as a
basis of evaluation is that you can describe the type of language
production you are looking for in terms of what the pupil can do. It
might be interesting to look at a concrete example of a specification
in these terms based on the Royal Society of Arts 'Examination in the
Communicative Use of English as a Foreign Language', basic level. (i)

ACCURACY:     Pronunciation may be heavily influenced by native
              language but should be generally intelligible. No
              confusing errors of grammar or vocabulary.

APPROPRIACY:  Use of language broadly appropriate to what the speaker
              means to say, though no subtlety should be expected. The
              intention of the speaker is generally clear.

FLUENCY:      The speaker may often have to search for a way to say
              what he wants to say. Contributions may be limited to one
              or two simple utterances.

Scoring can be on a scale of 1-5. Performance which is of the level
specified scores 3, while 4-5 and 1-2 are awarded for performance above
or below the criteria.

This is a specification designed for a particular examination at a
particular level, and it is clearly not applicable to every classroom.
But the important point to note is that performance which is clearly
not perfect in absolute terms is defined as being acceptable at the
particular time.
Using criteria of this sort, and describing what is expected from
pupils in these terms, provides a framework which can be used by
teachers in many different situations.

(i) Further details of this examination can be obtained from Miss H.
Orchard, RSA, 18 Adam Street, London WC2N 6EZ.
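For a teacher keeping records, the 1-5 scale described above (3 for performance at the specified level, 4-5 above it, 1-2 below it) could be noted per criterion along the following lines. This is a minimal sketch under stated assumptions: the names `CRITERIA`, `SPECIFIED_LEVEL`, `relation_to_specification` and `summarise` are illustrative and not part of the RSA scheme, and the article does not say whether or how the three scores should be combined, so no total is computed.

```python
# The three criteria named in the article.
CRITERIA = ("accuracy", "appropriacy", "fluency")

# Performance at the specified level scores 3 on the 1-5 scale.
SPECIFIED_LEVEL = 3


def relation_to_specification(score):
    """Describe a 1-5 score relative to the specified level."""
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score > SPECIFIED_LEVEL:
        return "above"
    if score < SPECIFIED_LEVEL:
        return "below"
    return "at"


def summarise(scores):
    """Map each criterion's score to its relation to the specification.

    `scores` is a dict such as
    {"accuracy": 3, "appropriacy": 4, "fluency": 2}.
    """
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return {name: relation_to_specification(scores[name]) for name in CRITERIA}


# Hypothetical pupil, for illustration only.
print(summarise({"accuracy": 3, "appropriacy": 4, "fluency": 2}))
```

Keeping the three criteria separate, rather than adding them up, preserves the point made in the article that different aspects may need to be stressed at different times in the learning process.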

Robert K Johnson

Questioning some assumptions about cloze testing

Cloze testing is used in this paper to provide a familiar point of
focus for questions which are applicable to language testing as a
whole; not because cloze tests are poor tests (on the contrary, they
have been shown to be more effective than most) nor because they are
more vulnerable than other tests to the kinds of questions asked.

'Objectivity' and the Statistical Validity of Cloze Tests
There are a number of myths quite widely held about cloze testing,
which have arisen perhaps out of wish-fulfilment rather than the
literature itself, and are based on the false premise that cloze is 'an
automatically valid procedure which results in universally valid tests
of language and reading' (Alderson, 1979: 220).

Those who adhere to this belief in its strongest form stress the
'objectivity' of cloze procedures. At some point, the words
'statistically valid' or worse 'proved statistically' are likely to be
used as the ultimate knock-down argument. This position, named by
Strauch 'quantificationism', and by Polanyi 'objectivism' (House, 1977:
14/15), results from the desire to find objective procedures which will
relieve the investigator of any responsibility for the results. Given
the pressure that is placed upon those who set tests and examinations,
this is an understandable aim, but the conclusion that cloze procedures
fulfil this requirement cannot be sustained (1) because the claims
regarding the objectivity and automatic validity of cloze tests are
largely false, and (2) because statistics are data and not arguments,
and valid conclusions can only be reached by processes of argument.
(Statistics provide a basis for argument.) The first point has been
demonstrated by research findings and can be amply supported by appeal
to common sense; the second is taken up in the later sections of this
paper.

Alderson (1979) focuses upon three factors which are critically
important for claims that effective cloze tests can be constructed by
objective procedures which are independent of the intuitions and
judgements of the person constructing the test, and give results which
are valid and reliable. The three factors are: (1) choice of text, (2)
the scoring procedure, and (3) the deletion rate.

Alderson's findings are that 'individual cloze tests vary greatly as
measures of EFL proficiency: changing the deletion frequency of the
test produces a different test, which appears to measure different
abilities, unpredictably. Similarly, changing the text used, results in
a different measure of EFL proficiency (and) changes in scoring
procedures also result in different validities of the cloze test, but
the best validity correlations are achieved by the semantically
acceptable procedure.' (Ibid., p. 225)

Alderson concludes: 'Testers should above all be aware that changing
the deletion rate, or the scoring procedure, or using a different text
may well result in a radically different test, not giving them the
measure that they expect.' (Ibid., p. 226)

Other investigators have of course obtained different results, e.g.,
Stubbs and Tucker (1974), replicating correlations obtained by Oller
and others, showed significant positive correlations between exact and
contextually appropriate responses (r = 0.97; p < 0.01). However, there
is no necessary conflict between the results obtained by Stubbs and
Tucker and Alderson, which suggest the eminently reasonable conclusion
that the effect of changes in deletion rate will vary, depending upon
the characteristics of text and subject. These and related factors are
discussed further below. It may be worthwhile at this point to make two
further points about statistical data and the conclusions that can be
based upon them. Firstly, a statistically significant difference is not
necessarily an important difference and an important difference may not
be statistically significant (Carver, 1978); secondly, tests which are
statistically identical in relation to a particular population may
yield very different outcomes when used with a different population
(Farhady, 1979).

The Selection of Cloze Texts
Having dealt with the question of automatic validity, there remains the
question of objective procedures. Cloze tests do not satisfy the basic
prerequisites for a claim to objectivity, and seem

typical of that class of instrument discussed by House which has
'replicable procedures', but is 'infected by biases and hence
qualitatively subjective' (House, 1977: 41).

To quote from Moser and Kalton, bias arises when
(1) the selection is consciously or unconsciously influenced by human
    choice
(2) the sampling frame which serves as the basis for selection does not
    cover the population adequately, completely or accurately (Moser
    and Kalton, 1971)
'To ensure true randomness, the method of selection must be independent
of human judgement.' (Ibid., p. 82)

Consider at least some of the major acts of human choice based on
judgements which go into the selection of a suitable cloze text for a
particular group of language learners:
(a) intellectual content,
(b) cultural content,
(c) linguistic difficulty,
(d) register,
(e) level of formality, and
(f) idiosyncrasies of style, e.g., lists of items and a high proportion
    of idioms, proper names, and numbers.

Consider too the adequacy of a cloze passage, usually 250-300 words in
length, as a sample. Clearly, it does not cover the population
(everything ever written in the English language?) adequately,
completely or accurately, and any attempt to do so would be ludicrous
because of its insensitivity to the requirements of the testers and the
purposes and abilities of the learners.

Given that these requirements regarding the selection of a suitable
passage are met through processes of rational argument, intuition, and
common sense by people with the necessary knowledge and experience to
support their judgements, a number of conclusions seem likely to
follow, e.g. that the results from a cloze test based on a carefully
selected passage would correlate less highly and discriminate less
well. It also seems likely that, given a highly selected passage,
variation of the deletion rate might not affect the results to a
statistically significant degree, while this would be less likely if a
passage were selected at random.

I would argue then that the high levels of correlation achieved in a
number of studies involving cloze tests may be regarded as resulting
from, and providing supporting evidence of, the reliability and
validity of the judgement of the person who selected or prepared the
texts rather than as evidence bearing upon cloze procedures per se.

It should be clear from the above that the exercise of judgement in the
selection of passages necessarily causes cloze tests to be 'infected by
bias', and it is eminently sensible and desirable that this should be
so. Subjectivity in the form of the legitimate exercise of informed
judgement is not and should not be regarded as harmful. To quote House
once again:

    The evaluator must be seen as caring, as interested, as responsive
    to the relevant arguments. He must be impartial rather than simply
    objective. (House, 1977: 46)

Deletion Rate
Most studies which have focused upon deletion rate have shown that
there is a high level of correlation in the results achieved if the
rate is every fifth word or above, and that going beyond every
fourteenth word is uneconomical and unnecessary.

It is not valid to conclude from these studies that it makes no
difference whether every fifth, seventh, ninth etc., word is deleted,
and evidence exists that deletion rate does make a difference in some
cases (see Alderson above). A more valid and obvious conclusion would
seem to be that two very similar cloze passages, tested with identical
populations, will give very similar results. As a corollary we might
note that changing the gapping procedure for a passage may provide two
tests which are practically identical, and that this has been shown to
be the case in a number of studies, though not in all.

There is nothing magical about a randomizing process for gapping a
cloze passage. Given the subjectivity of the judgements that normally
contribute to the selection of the passage there can be no subsequent
claim to objectivity. Alderson offers a revealing analogy, and draws
what seems to me to be the appropriate conclusion:

    The (deletion rate) procedure is in fact merely a technique for
    producing tests, like any other technique, for example the
    multiple-choice technique, and is not an automatically valid
    procedure. Each test produced by the technique needs to be
    validated in its own right and modified accordingly. (Alderson,
    1979: 226)

We are all accustomed to the notion of a bad multiple-choice item, yet
the notion of a 'bad' item in a cloze test seems alien. It should not
be. Each gap is an item and capable of being validated in the same way
that other items are validated.

1. Rational justification. It seems to test the kind of thing we think
   should be tested, which ideally

would require a theoretical framework including a theory of language, of reading (or listening), of the relationship between reading and cloze, and a full statement of the learners' aims. The fact that we do not have such a theoretical framework at present does not mean that we should abandon rational justification, which as has been stated, is the basis for the selection of the passage in the first place. There are theoretical models available and these can and should be used. There are also insights based on experience which can be brought to bear.

2. Statistical justification. Does this item adequately discriminate amongst the subjects in relation to their known abilities and/or in relation to their abilities as shown in the overall test results? If not, then the item should be changed.

Let us consider a specific example. The passage selected contains the following sentences:

A. Inspector McTavish mused upon the inadequacy of the clues. On the first day, (John) was interviewed by a (person) introduced to him as (Smith). On the second day (his) body was recovered from the river. Inspector McTavish sighed and turned his mind to more promising areas of investigation. (No further mention is made of 'John' or a 'person' named 'Smith'.)

Anyone trying to fill the gaps indicated would certainly sympathise with Inspector McTavish over the lack of clues. It would hardly be necessary to pre-test in order to be certain that the deleted items would fail to discriminate, even though the passage as a whole may work well. Various satisfactory solutions might be proposed, but retaining such items on the grounds that they were arrived at objectively is not one of them. Alderson suggests that the principle of randomness might be abandoned in favour of the rational selection of deletions, based on a theory of the nature of language and language processing (Alderson, 1979: 226).

Exact and Acceptable Items as Gap-fillers
Selection of the cloze passage and of cloze items has already been identified as an aspect of cloze testing in which judgement should be exercised and decisions defended by reasoned argument. These same arguments apply to methods of scoring, and will be taken further in the next section of the paper, which deals with the influence of historical factors on attitudes to random versus non-random deletion and exact items versus acceptable items as gap-fillers.

Before looking at these factors in some detail, it may be desirable to clarify the situation with regard to claims that exact scoring solves the problem of marker subjectivity. If the examiner is in a situation where the markers cannot be trusted or adequately supervised, exact scoring is a useful expedient, no more and no less. The subjectivity of judgement has simply been transferred from the marker, who is in a position to know what he is doing and why he is doing it, to the author, who, unless the passage was specially prepared, had no idea that his work would be used in this way.

The Historical Provenance of Cloze Tests
From Readability to Reader-Ability
The cloze procedure was developed, as is well known, as a test of the readability of a particular text in relation to a particular population of readers. The assumptions underlying the use of the procedure for this purpose were given clearly by Taylor in 1953, and though the behaviourist model of language on which his approach was based is no longer generally accepted, the basic premise

(that) a cloze score appears to be a measure of the aggregate influences of all factors which interact to affect the degree of correspondence between the language patterns of transmitter and receiver (Taylor, 1953: 432)

has not been challenged and is not challenged here. It is important, however, to emphasize that as a measure of readability, there was and could be no suggestion of a value judgement, only of the degree of match between texts and readers. Random deletion and exact replacement are in this context, to use Taylor's words, 'not only defensible but rationally inescapable'. No judgement was implied regarding the writer or the reader, and any question as to whether a non-exact replacement was acceptable or not must be irrelevant. The point of focus is the text, and acceptance of a non-exact replacement involves accepting a different text.

One of the great advantages of cloze procedure as a measure of readability is that it takes some account of such factors as literary genius and idiolectal eccentricity. To give one example, the passage Taylor used from Finnegan's Wake gained a low readability score. To provide an equally extreme counter example, the writings of a semi-literate speaker of West African pidgin would also receive a low readability score. Both use the English language in highly unpredictable ways. There the

comparison ends. Above all, no judgement can be made or is required as to whether non-exact replacements are in some sense worse than, or better than, the original.

Once the focus changes from the passage to the reader, the purpose from establishing degrees of compatibility of text and reader to determining absolute levels of attainment by the reader, and conclusions from value free to value loaded, then the premises underlying the use of cloze procedures must be re-examined very closely indeed in order to justify their continued use. In particular the wording of the passage is no longer unchallengeable. There is no reason for example why a non-exact replacement should not be 'better' in some sense, e.g., more precise, more expressive, less ambiguous, etc., than the original. Similarly, the rationale for random deletion is no longer obvious once the purpose is the comparison of readers and not of passages. The justification for cloze procedures which provide a means for comparing passages becomes irrelevant.

There are a number of problems relating to the shift from readability to reader ability where the reader is a native speaker of the language, e.g., variations of dialect, idiolect, control of the 'elaborated' variety, etc., but these are not considered here, and can be implied from the arguments regarding the next change in the function of cloze: from the assessment of native speaker reading ability to the assessment of second language speaker reading ability.

From Native Speaker to Second Language Speaker Evaluation
The use of procedures with second language speakers which were considered appropriate for testing the language skills of native speakers involves a number of assumptions. The first assumption that has to be challenged is that the constraints under which a second language speaker operates are the same as those of a first language speaker. The second assumption is that the second language is acquired for the same purposes as the first language.

Carroll has stated the position very clearly in laying down principles for dealing with educational issues in relation to native speakers and non-native speakers of languages as well as non-standard dialects:

(1) Language is a complex human phenomenon that takes the same general form wherever it is found. It permits the expression of a certain very wide range of information, experiences, feelings, and thoughts, and it does so in somewhat the same way regardless of the particular form of the language or the culture of the user, as long as the language is a so-called natural language that is used from childhood on as a native language by its users. (Carroll, 1971: 177)

It is not possible therefore to assume that the second language speaker uses that language in ways which are directly comparable to the first language speaker. The experience of learning a language primarily in the classroom (and this discussion relates only to such learners) is totally different from the experience of learning a language 'naturally' as part of the general process of socialization and maturation. It is not to be expected that the range of information, experience, feelings, and thoughts will be as wide, nor will the language usage be particularly closely attuned to the native speaker culture. The imposition of a native speaker equivalence as the performance standard by which the second language speaker should be measured is therefore impractical. It is also undesirable in that the aim is set far beyond the rather limited requirements and purposes of most second language learners and second language teaching programmes.

It cannot be assumed, therefore, that testing procedures which are valid for native speakers can be applied automatically to second language speakers. In the case of cloze procedures there are many reasons for concluding that such an assumption would be false.

Text type is one factor which is revealing. One part of native speaker competence seems to be the ability to identify different registers and styles to the extent that text type is not normally a source of variation in native speaker cloze scores. Research has shown however that for second language learners, text type is an important source of variation (Freeland, 1979). Freeland concludes (p. 6) that

certain assumptions about cloze tests need re-examining. And the practice of expressing learners' scores as a ratio of native speakers' is suspect, since the relationship between native and non-native scores can fluctuate in unknown ways.

Oller came to a similar conclusion:

There is little if any reason to assume that conclusions from research with native speakers can validly be generalised to the case of non-native speakers (Oller, 1973: 107).
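
The statistical justification set out earlier asks whether a gap discriminates amongst subjects in relation to their abilities as shown in the overall test results. One simple way of checking this, sketched below under stated assumptions, is to compare the proportion of correct answers to a gap among the highest-scoring and the lowest-scoring candidates. The function name, the 27% group size and the sample marks are illustrative assumptions and are not taken from the studies cited.

```python
def discrimination_index(scores, item, fraction=0.27):
    """Upper-lower discrimination index for one gap.

    scores: one list of 0/1 marks per candidate, one mark per gap.
    item: index of the gap to examine.
    fraction: share of candidates in each extreme group (assumed value).
    """
    # Rank candidates by total score, best first.
    ranked = sorted(scores, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    # Near 1: the gap separates strong from weak candidates.
    # Near 0 (or negative): the gap fails to discriminate.
    return p_upper - p_lower


# Hypothetical marks for six candidates on four gaps.
# The last gap behaves like a deleted proper name: nobody recovers it.
marks = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

for gap in range(4):
    print(gap, discrimination_index(marks, gap))
```

With these invented marks the last gap yields an index of zero, which is the statistical counterpart of the judgement that such an item should be changed.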

On the question of exact versus acceptable scoring, Oller states: 'Clearly, when dealing with non-native speakers there is something counter-intuitive about requiring the exact word' and data from tests reported by Oller (1973: 109)

supported the conclusion that with non-native speakers the method of allowing any contextually acceptable response is significantly superior to the exact word scoring technique.

Elsewhere, Oller discusses the problem further, noting (1) that exact word replacement makes tests extremely difficult for L2 speakers, and (2) that exact word replacement often requires insights which may not be regarded as language skills (Oller, 1972).

Thus there is evidence of the need for principled judgements to be made in the development of cloze tests in relation to the selection (or development) of suitable texts, decisions regarding the elements to be deleted, and regarding degrees of acceptability under a system of non-exact scoring.

The Selection of Cloze Passages
Intellectual Bias
The relevance of intellectual knowledge is well illustrated by a short cloze passage taken from Anderson (1971).

B. The idea that the (1) ____ of a language, unlike (2) ____ words, are probably infinite (3) ____ number, so that they (4) ____ be listed, is no (5) ____ one, however familiar it (6) ____ recently have become through (7) ____ writings of Chomsky and (8) ____ followers.

Some readers may be 'tuned in' by the word 'infinite' as a cue for 'sentences' as gap-filler for (1). (Though why not 'sounds' or 'phrases', 'messages' or 'meanings'?) The real clue is 'Chomsky', whose often quoted position on the generative power of language is invariably expressed in terms of 'sentences', no doubt because transformational generative grammar was until recently essentially a sentence grammar. In other words, if the cloze reader does not have an intellectual grounding in the transformational generative orthodoxy as propounded by Chomsky and his followers, he will not be able to fill gap (1) and possibly gap (5). If failure to understand results in carelessness in the use of clues which are available, the learner may fail to fill gap (4) satisfactorily. The other gaps will probably be filled satisfactorily by most native speakers even though they have no knowledge of linguistics, and little if any understanding of the passage.

Two points are at issue here: the largely self-evident one that the intellectual content of a passage will affect the scores, and is a potential source of bias; and the less obvious point, which for that reason may be more important, that it is possible to complete items satisfactorily in the absence of any global understanding of the meaning of the text.

The test setter therefore has to make decisions regarding the intellectual neutrality of the passage content in order to avoid criticisms that the text favours some learners and disadvantages others. It may be desirable to bias the content of a passage to reflect the purposes of a particular course (English for nurses; English for engineers, etc.) but the question still remains whether the content should be truly neutral, i.e., intellectually familiar to all learners so that the gap-filling exercise tests primarily language ability, or whether the content of the passage is regarded as part of the challenge to the learner in reconstructing a genuine piece of communication. The latter will bring into play powers of deduction and other analytical skills as well as memory, which will not be required by the less intellectually demanding passage.

Cultural Bias
The problems of cultural bias can be illustrated very obviously by the following:

C. ____ blind ____. Three ____ mice, ____ how ____ run. ____ how ____ run. ____ all ____ after ____ farmer's ____ who ____ off ____ tails ____ a ____ knife. ____ you ____ see ____ a ____ in ____ life ____ three ____ mice.

Those who have had at some time or another an intimate experience of certain aspects of the child-rearing practices of native speakers of English will be able to complete this cloze passage without difficulty. Of those who lack this experience, only those who have studied eighteenth century British political history are likely to have come across the passage in its original role as political satire.

It is obvious that such a passage would not be selected because of its idiosyncratic style and the fact that some learners at least may have memorized it. Yet the same reasoning applies to expressions such as:

D. Birds of a ____ flock together.

Such idioms are 'known' in very much the same way as nursery rhymes are known and for the
same reason. Success in filling the gap is directly related to the extent to which this particular string of words has been committed to memory. Anyone who has not committed the string to memory will insert, for example, 'kind' or 'species'.

The step from so-called common sayings such as the above, to cliches, which are in fact far more common, is a short one, and objections to 'birds of a feather', etc., apply equally to surviving archaisms such as

E. The situation was ____ with danger

where 'fraught' is the past participle and only surviving form of a now obsolete verb. Those whose reading has included at some time or another a certain type of adventure story will be (all too) familiar with 'fraught'. Others have probably never encountered the word.

The following is an even more extreme example of a particular genre:

F. The traitorous Lord Fred was arraigned at Westminster before the King, condemned, carried thence and ____ into the foulest darkest ____ in the Tower of ____.

Those who know something of English history might identify the Tower of London; those who read or have read a certain genre of historical novel will probably identify a 'dungeon' as the appropriate place for traitorous villains, and will know that traitors are not 'put' or 'placed' or 'locked up' in a dungeon, they are 'cast' into them. On second thoughts they might be 'flung', but I would vote for 'cast'. One more example:

G. The posse rode unsuspecting into the ambush and was met by a ____ of bullets from the outlaw guns.

Those of us who have read our Westerns, not to mention war stories and detective stories, are sufficiently habituated to the notion of a 'hail of bullets' not to notice how extraordinary the metaphor, which presumably preceded the cliche, really is. In this context 'hail' is almost a collective noun, to be included with a 'pride of lions' and a 'gaggle of geese'. The meaning is linked to the base meaning in the sense of a large number of small, hard, punishing objects travelling at speed and in close order; yet 'hail' operates on a more or less vertical axis, while bullets, generally, do not. You can have a 'hail storm' or a 'hail stone', but you cannot have 'a hail' any more than you can have 'a rain' or 'a snow'. Perhaps the cliche is very venerable indeed and goes back to the days when archers deliberately fired high so that arrows fell almost vertically on the enemy with a sound very much like a hail storm, no doubt, as a roof of shields was hastily erected as protection. The weapons may change, but the cliches live on.

There are no easy dividing-lines to be drawn. When does a metaphor become a cliche? How do we differentiate between a cliche and a string which has a high level of sequential or collocational probability (Beattie and Butterworth, 1979: 210)?

It is no more possible to have a piece of discourse which has no cultural content than it is to have one which has no intellectual content, but as in the latter case it is necessary to make judgements about the nature of the cultural content and its acceptability. As in the case of intellectual content, the test may be deliberately biased in a particular direction, e.g., it may be desirable for certain purposes to place a premium upon knowledge of the British cultural heritage and of contemporary life in Britain. However, in most countries where English is taught as a second language, or for international purposes, it would be necessary to select passages in which the cultural content is neutral with regard to those taking the test, and very careful consideration is needed to ensure that this is achieved.

It might be argued that the modern economic and technological world is culturally neutral in the sense that there is nothing specifically British about it, that access to this world is the main purpose of learning English as a second language and any bias that results from a choice of passages relating to this world is therefore fully justified. The argument is sound in international terms, but in relation to a national education system for example it may be said to introduce an unacceptable level of discrimination in favour of the socially and economically advantaged sections of the community, who have access to that world, and against the socially and economically disadvantaged, who have spent their lives in villages or urban slums.

As Oller (1973) has observed, you cannot and should not try to separate language skill from knowledge of the world in the measurement of performance, but it is necessary to consider precisely what knowledge of the world can reasonably be expected of particular learners given the constraints under which the learning has taken place. If such factors are not taken into consideration the cloze test may become an instrument for social and economic discrimination, or might be seen as such by those who consider themselves to have been disadvantaged.
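
The idiom and cliche examples above bear directly on scoring: a candidate who writes 'kind' in (D) has given a sensible answer that exact word scoring rejects. The difference between the two scoring methods can be shown with a minimal sketch. The function name, the answer key drawn from examples (D), (E) and (G), and above all the sets of acceptable alternatives are assumptions made for illustration; deciding what counts as acceptable remains a matter of the marker's judgement.

```python
def score_cloze(responses, key, acceptable=None):
    """Return (exact_score, acceptable_score) for one candidate.

    responses: the candidate's answers, one per gap.
    key: the words deleted from the original text, one per gap.
    acceptable: optional dict mapping a gap index to the set of
        non-exact answers the marker has decided to allow.
    """
    acceptable = acceptable or {}
    exact = 0
    allowed = 0
    for i, (given, original) in enumerate(zip(responses, key)):
        answer = given.strip().lower()
        if answer == original.lower():
            # The exact word counts under both methods.
            exact += 1
            allowed += 1
        elif answer in acceptable.get(i, set()):
            # A non-exact answer counts only under acceptable scoring.
            allowed += 1
    return exact, allowed


# Deleted words from examples (D), (E) and (G).
key = ["feather", "fraught", "hail"]
# Hypothetical marker decisions about acceptable alternatives.
alternatives = {0: {"kind", "species"}, 2: {"volley", "shower"}}

print(score_cloze(["kind", "fraught", "volley"], key, alternatives))  # prints (1, 3)
```

In this invented case the same script earns one mark under exact scoring and three under acceptable scoring, which is the kind of gap between the two methods that the argument is concerned with.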

Linguistic Bias
In addition to the judgements that have to be made regarding intellectual and cultural content, there is linguistic content. (It is not suggested of course that these are discrete categories; they are merely convenient points of focus.)

R.E. Johnson has noted the high level of correlation between cloze scores, and measures of redundancy (Johnson, 1975: 435) and raises several points which are relevant to the selection of texts, and other issues (marking and what it is that cloze tests measure) which are discussed further at a later stage.

1. Johnson notes that missing words may be inserted without an understanding of the passage as a whole, or even, in some instances, of the particular linguistic string in which a gap occurs. It may therefore be questioned whether the purpose of the cloze test is in fact the recovery of the original message. It seems rather to be the recovery of the original text.

2. He suggests that passages may prove effective for cloze testing because they are highly redundant and that such passages may be extremely boring in their unreduced original form. It is arguable then at least that texts selected for cloze tests are by normally accepted standards poorly written, a strong argument against accepting only exact replacements.

On the first point, it is easy to demonstrate that cloze passages may test something other than the ability of the reader to reconstruct the original message. The following passage was constructed by Bransford and McCarrell in such a way as to ensure that the reader finds it totally incomprehensible regardless of the fact that there are no linguistic difficulties. The fact that the writer's overall meaning remains totally obscure does not materially affect the use of this passage as a cloze test, which gives support to the argument that cloze tests focus on relatively low order language skills relating to 'core proficiency' rather than higher order language skills like reading comprehension (Alderson, 1979: 225).

Clearly, the following passage tests core proficiency in some limited sense and little else.

H. If the balloons popped ____ sound wouldn't be ____ to carry since everything ____ be too far away ____ the correct floor. A ____ window would also prevent ____ sound from carrying, since ____ buildings tend to be ____ insulated. Since the whole ____ depends upon a steady ____ of electricity ____ a break ____ the middle of the ____ would also cause problems. ____ course, the fellow could ____, but the human voice ____ not loud enough to ____ that far. An additional ____ is that a string ____ break on the instrument. ____ there could be no ____ to the message. It ____ clear that the best ____ would involve less distance. ____ there would be fewer ____ problems. With face to ____ contact, the least number ____ things could go wrong.*

*See Appendix on page 71

In making judgements regarding the linguistic features of a text, then, it is necessary to decide the extent to which the gaps in a given passage can be filled regardless of an understanding of the passage as a whole, and to determine whether or not this is acceptable in view of the purposes for which the test is being set.

Before advancing any further it is obviously necessary to attempt to come to terms more closely with what it is that cloze tests actually measure.

A Transfer Feature Theory of Cloze Items
Reading has been described by Goodman as 'a psycholinguistic guessing game', a characterization which many people have found intuitively satisfying. The objective of the game is to achieve understanding, and it is generally accepted that the meaning of a linguistic string is not the arithmetical product of the elements in that string, nor is the meaning of a passage the arithmetical product of the meanings of all its sentences. On the contrary, the global meanings which result from the processing of linguistic strings in the short-term memory are achieved partly in terms of the constraints exercised by the linguistic items in the string, and partly by means of the non-linguistic information that the receiver brings to the task of comprehension. Smith (1971) and others have suggested that what the receiver brings to the task of decoding is of far greater importance to eventual comprehension than the linguistic items on the page. This extra-linguistic data was categorized by Uhlenbeck (1963) as follows: (1) the situation in which the sentence is spoken, (2) the preceding sentences, if any, (3) the hearer's knowledge of the speaker and the topics which might be discussed with him.

These extra-linguistic factors will be referred to here as the 'presuppositional base'. It will be clear
that the failure of the reader to understand the passage quoted earlier from Bransford and McCarrell, even in its fully restored form, results from the lack of an adequate presuppositional base.

A recent discussion of cloze procedure by Finn (1977) provides a useful means for relating the general model of decoding characterized above to the specific case of completing a gap in a cloze passage. Finn proposes a 'transfer feature theory' of processing in reading, which combines Shannon and Weaver's (1949) contribution to communication theory and Weinreich's (1966) semantic theory.

Finn defines the 'cloze easiness' of a word in terms of the percentage of subjects supplying the exact word in a cloze task, and argues that cloze easiness is a measure of the information carried by the word in the passage (Finn, 1977: 512). Cloze easiness is affected by word frequency, the difficulty of the passage, and the number of times a word occurs in the passage.

The Shannon and Weaver theory of communication is based upon the relationship between 'information' and doubt, i.e., where the occurrence of a word is totally predictable there is no information. Thus the greater the doubt (the lower the cloze easiness score) the more information is said to be carried by a particular item. (It will be obvious from such examples as a function word with a low easiness score, that 'information' here is used quite differently from 'meaning' and is in fact determined without reference to meaning.)

Finn uses the term 'transfer features' to describe those grammatical and semantic markers and 'distinguishers' which supply redundancy or which generate expectancy. The term is taken from Weinreich and relates to the fact that the inclusion of a particular word in a discourse can dictate some lexical features for other words in the discourse, and Finn defines 'information in a word' as 'a function of the number of features not supplied by transfer features in discourse' (Finn, 1977: 520).

In a theory of processing in reading, I find the distinction between information and meaning inappropriate and unhelpful since the objective of the processor is to arrive at meaning and not 'information'. In cloze testing, however, the processor actively seeks transfer features in order to achieve a high level of expectancy regarding the identity of a particular linguistic item. In this context, and for my purposes (unlike the Weinreich model), transfer features are seen as being derived from the non-linguistic aspects of the presuppositional base as well as from the text.

Selection of items
It is now possible to return to the consideration of some types of judgement that might be made in selecting items for gapping.

It has been suggested that one reason for using objective procedures in gapping a cloze passage is that no adequate theoretical base exists for making decisions affecting which linguistic items should or should not be gapped. As the previous section shows, we may not have all the answers, and may never have them, but we do have a basis for approaching the task in terms of transfer features, and in particular in terms of the analysis of the sources of the transfer features, e.g.,

(1) Are there transfer features which would cue successful completion? If not, as in the case of proper names or numbers in certain contexts, the item should not be used.

(2) If transfer features exist, are they drawn primarily from the text or from presuppositional sources other than the text?

a. If the latter, is it acceptable within the context of this test to use items which are dependent upon intellectual or cultural knowledge? It should be noted that no comprehension question would be acceptable if it tested knowledge derived from sources other than the text, and the only grounds for accepting such an item in a cloze test would be that the non-textual knowledge forms an integral part of the learning programme and/or the inclusion of this presuppositional element provides a desirable bias in view of the aims of the test (a decision which most public examiners might find it hard to defend) or that the information in question is neutral in that it is readily and equally accessible to all candidates. If the presuppositional requirements for the items cannot be justified in these terms, the item should not be used.

b. If the transfer features are drawn from the text, do they relate to meaning or form? Do the transfer features arise out of an understanding of the passage as a whole, are they dependent upon understanding the immediate context only, or do they depend primarily upon collocation or other primarily linguistic features? It seems reasonable to suggest that cloze tests can only claim to test communicative skills if items which depend on recovery of the theme and reasonably precise grasp of the

meaning of local contexts are emphasised, while those which require a purely linguistic response are restricted in number if not eliminated. This does not imply an emphasis on content words at the expense of function words. It means the selection of items which exercise a positive constraint on the meaning of the passage and which carry sufficient transfer features from other aspects of the text to make that constraint readily apparent to those who have achieved the level of language ability which the test is designed to evaluate. An example may be helpful at this stage to illustrate the notion of constraint exercised by an item on the meaning of a passage; a function word is used in the example because the notion of constraint on meaning is comparatively obvious in the case of content words. Articles have been chosen because automatic deletion procedures tend to provide a large number of such items.

I. Yesterday as I was crossing (1) ____ road, dodging cars and bicycles to catch (2) ____ bus that passes near RELC, (3) ____ driver swore at me in English, (4) ____ first time this has ever happened to me in Singapore.

Let us assume that I intend to include at least one item in my cloze passage which tests control of the distinction between the definite and indefinite articles, and I have to consider the merits of (1)-(4) above for this purpose.

(1) Neither 'a' nor 'the' exercises a constraint on the meaning of the passage, 'some' or 'my' would be acceptable alternatives, and I would not really want to reject 'Orchard' or any other road name: (1) requires some grasp of English noun phrase structure, i.e., that a gap-filler is required, but this is obvious anyway, and on the whole (1) is a poor item.

(2) 'a' is possible and exercises some degree of constraint on the meaning, i.e., one of the many buses which travel to RELC. 'the' is also possible and exercises a constraint on the meaning, e.g., that particular category of bus which serves RELC and no other. Again the two are broadly synonymous and interchangeable in this context and the item merely tests an elementary grasp of the noun phrase structure of English.

(3) 'the' exercises a strong constraint on the meaning since it identifies the 'swearer' as the driver of the bus. 'a' also exercises a strong constraint in that it eliminates the driver of the bus and indicates some other driver.

(4) 'a' is unacceptable, eliminated by transfer features from 'first' and exemplifies a collocational pattern which should be familiar, e.g., the / first, last, only / time, chance, etc. The meaning constraint is extremely localized; however, the item is suitable and could be pre-tested.

A subsequent part of the (reconstructed) cloze passage reads as follows:

As I got off the bus, I heard the driver call out: 'Mind the traffic this time!'

It is now clear from transfer features based on evidence internal to the text that gap (3) should be filled by 'the'. It is considered to be an excellent item in that it tests understanding of the text as a whole and should be included in the pre-testing.

The example given above illustrates another point very clearly. It is not the case that we do not know what is being tested in a cloze passage. It seems to me that we can state rather precisely what it is that each item is testing, and to this extent it could be claimed that cloze tests are not integrative, but a means of providing discrete point items which have a somewhat enhanced presuppositional base. However, the distinction (discrete point/integrative) is not one which can usefully be applied to the nature of the learner's task (e.g., a comprehension question: free choice, multiple-choice, or true/false might be set regarding the identity of the 'swearer' in the example just given and would thus be essentially a discrete point question. Another question might address itself to the 'best' summarizing statement. It is possible to imagine a cloze item requiring a similarly global strategy in gathering and evaluating a range of transfer features, though again it should normally be possible to give a reasonably precise account of these features and their origins, and to estimate the potential value of the item accordingly).

One result of randomising procedures in item selection has been, in my experience, the development of tests which tend to be too difficult. Pass marks are set fairly low (30%-40%) and many candidates achieve the pass mark giving little evidence that they have understood more than occasional sentences and phrases, and much evidence that they have not. Their correct answers are achieved

primarily through linguistic transfer features. Rational justification for the selection of items and a requirement that the pass mark must be in the 60%-70% range would greatly improve the quality of cloze tests and cause them to bear more closely upon the aims of the teaching programme they are used to evaluate as well as promoting more constructive classroom practices.

Scoring

Once it is accepted that it is quite possible to have exact word replacement with little or no understanding of the text, that the exact word need not necessarily be the 'best' word in terms of precision, expressiveness, etc., or the only suitable word, and that the requirement of exact word replacement may introduce bias into the test in favour of those whose socio-economic position gives them greater access to western culture in general, or native speakers of English in particular, the arguments for acceptable word scoring become very strong indeed. However, a number of issues then arise, and in particular a number of studies (Bowen, 1969; Oller et al., 1972; Clarke and Burdell, 1977) have shown the difficulty, perhaps the impossibility, of setting up objective criteria for determining degrees of acceptability.

Clarke and Burdell write (to give one example):

    Judgements of synonymity of 'non-nativeness' are almost entirely idiosyncratic and heavily influenced by one's dialect (Clarke and Burdell, 1977: 140).

A judgement amply confirmed by the following data, which is cited as totally acceptable to Clarke and Burdell but would be judged marginally acceptable or unacceptable by most speakers of my dialect.

    J. I just wrote a hotel (letter) asking for a room in August. (Where 'letter' was the original, and 'hotel' the non-exact replacement.)

The first step, it seems to me, might be to abandon the notion of acceptability to a native speaker in favour of acceptability to an (idealized) speaker of international English.

The second would be to attempt to establish principles on which judgements of acceptability could be made in terms of degrees of meaningfulness and deviation from forms which would be acceptable for the purposes of international English.

Whatever guidelines are adopted by markers, they should be sufficiently flexible to ensure a very high mark indeed for an (imaginary) candidate who produced the following in response to the cloze test on Three Blind Mice that was considered earlier.

    K. Ignoring blind alleys. Three investigated mice, analysed how they run, studied how farms run. Three all sought after rich farmers. One, who casts off his tails to a deserving knife grinder you often see, became a millionaire in his life time. Three studied mice.

Note that the deletion of every other word increases the difficulty of the cloze task but also (dangerously) increases the reader's freedom to head off in directions of his own.

(K) is of course an extreme, and highly unrealistic example, yet the point it raises is a valid one. By 'acceptable alternative', do we judge degree of synonymity or degree of functional equivalence to the original text? If acceptability is given this narrow interpretation (as it usually is), the candidate whose virtuoso performance is presented above would receive no marks, and a gross injustice would have been done, particularly when, as was noted above, many candidates achieve pass marks having offered as a reconstruction of the original a text which has no theme, and is in fact gibberish, though admittedly English gibberish. How can we place a premium on the ability of the candidate to make sense out of the cues available to him and construct a message of some kind out of these cues, even if it diverges somewhat from the original message? If we cannot do this, then we must accept that many cloze tests are tests of a very low order of language ability indeed.

One solution, which was suggested above, is to ensure through pre-testing that passages are not so difficult that the pass mark has to be lowered to a point where comprehension is hardly required in order to reach the necessary score. Easier passages admittedly will tend not to discriminate amongst the candidates as effectively as the more difficult passages, even though what is being tested is more rationally defensible, but it should not be impossible to identify texts which, perhaps with suitable modification, can satisfy both requirements.

One further point should be made if acceptable alternatives are permitted. The use of expressions such as 'mutilation of the text' to describe the gapping procedure suggests that, for some people at least, a text has a certain inviolability and exact word replacement is an act of restitution as well as restoration. I find this attitude irrational but understandable. It is much harder to sympathize with a desire to sustain the inviolability of the one

word gap. Provided that the candidate performs those functions which have been identified as the aim of the particular test, are there really grounds for rejecting acceptable replacements which consist of two words or more, rather than one? Is there any reason why the gaps themselves should consist only of a single word? The gap could consist of a phrase, a sentence, or indeed of a complete paragraph.

The answers to such questions should be formulated in terms of the purposes of the tests and the constraints under which the examiners have to operate. Assertions that such tests would not be 'proper' cloze tests reflect the desire discussed earlier to elevate procedures to the level of principles.

What do Cloze Tests Test?

It has been shown above that it is possible to restore considerable parts of a cloze passage without understanding its overall meaning. The reverse is also possible, i.e., that the essential meaning may be understood while the candidate fails to produce an acceptable linguistic item as a slot-filler. What then does cloze test: receptive ability or productive ability, understanding or linguistic knowledge? The rather unsatisfactory answer seems to be almost any combination of these, but with a premium, at the lower levels of achievement, against communicative competence (i.e., the ability to gain understanding in spite of linguistic deficit) and in favour of linguistic competence (i.e., the ability to identify an acceptable slot-filler in spite of a communication deficit). A high level of achievement would normally reflect a balance of communicative and linguistic competencies, but, as has been noted, accepted achievement levels on cloze tests are frequently low.

Perhaps the best way in which to approach the question of what it is that cloze tests test is to look at what the candidate does. It is sometimes claimed that the gaps in a cloze passage are equivalent to unknown elements in a text, and that gap-filling samples a natural and normal reading activity. The claim is false on both grounds. Gaps do not reflect unknown elements in a text, which are normally low frequency content words. Gaps in cloze tests are rarely of this type and often consist of function words. Secondly, whatever the gapping rate (or average gapping rate if non-random selection is used) the proportion of unknown elements will be far higher than would be acceptable for any normal reading purpose.

    Experience has shown that more than twenty-five new words per thousand running words usually make a text unduly difficult. (Bright and McGregor, 1970: 20)

    Difficult texts can be toiled painfully through but the process is dreary, bears no resemblance to reading and is not conducive to the establishment of good learning habits. (Ibid., p. 19)

Even a revised claim, that cloze activities correspond to intensive reading with a difficult text, cannot be sustained; partly because of the type of item as previously stated, but more importantly perhaps because what cloze requires is entirely different from the normal and important language skill of inferring meaning from context.

As Rivers (1968) and others have pointed out, in reading it is common for a person to come across a word which is unknown. This requires the reader to infer a meaning from context, etc. Language teachers are frequently urged to help learners to develop confidence in their ability to do this, e.g.,

    The new vocabulary should not co-occur with difficult structures and a certain amount of vagueness in guessing the meaning of words must be accepted. The teacher should not expect students to come up with exact meanings while guessing in this manner. (Kruse, 1979: 209)

This is a very different, and much less precise requirement than that of filling a cloze gap; it challenges the reader's communicative competence in the broadest possible terms, while the cloze item challenges the linguistic competence in very precise terms indeed.

Can Cloze Technique Be Taught?

One aspect of test construction which is rarely considered as seriously as it should be is the effect that tests have upon the classroom teaching and learning situation. A great deal has been written and said in the last twenty years about accountability and the 'contract' between teacher and learner. It is often assumed that the primary contract between the English language teacher and learner involves the acquisition by the learner of a certain competency in the English language. In these terms, the relevant function of the examiner is to devise a test which samples the knowledge and skills of the learner in such a way as to determine whether or not a satisfactory level of language ability has been achieved. The examiner's task would be much simpler if this were true, and one major question regarding reliance on cloze tests could be eliminated. In fact the primary contract

that teachers have with the learners is to get them through their examinations, which leads to the well-known practice of 'teaching to the exam'. It should be true that the best preparation for an examination which tests language ability would be the development of the learners' language ability, but this is generally not the case, partly perhaps because examiners are really concerned more with obtaining a satisfactory distribution of marks than with true accountability. As a result, teachers have to decide whether to develop their pupils' language abilities to the greatest possible extent, while accepting that for most of them this will mean failure in the examination, or abandon true communicative competence as a goal and aim instead at the appearance of such competence as demonstrated by the ability to carry out the tasks required by examiners at a level which will secure a pass. No doubt, in seeking to acquire these examination skills, pupils' language abilities improve, but the examination is the goal and a pass is the motivation.

In these circumstances, it behoves any examiner to bear in mind the likely classroom repercussions of the selection of a particular testing technique. The next claim that we might consider regarding cloze is that the technique for completing such tests cannot be taught and therefore the only way to prepare pupils is by improving their language ability. This seems doubtful in view of the differences already discussed between the activity of completing a cloze test and normal language activities such as reading, with which cloze bears the closest superficial correspondence. It seems much more likely that rather specialised skills can be brought to bear on the task, and that these skills can be taught.

One way of improving scores on a particular type of test is to give massive amounts of practice. If the test passages are typically so difficult that most learners will understand very little of what they read, then it will be necessary to practise with passages of a similar level of difficulty. Work with more suitable passages would have little transfer to the examination task, though it would be more likely to result in a genuine development of communicative ability.

It has been noted that 'good' readers tend to use the whole text in determining the appropriate gap-filler, while 'poorer' readers tended to make mistakes which showed that they paid attention only to preceding text, or to the immediate textual environment (Neville and Pugh, 1976). One way of improving performance on cloze tests, therefore, though not necessarily of reading ability, would be to give poorer students intensive practice in obtaining transfer features from the gapped text which would assist in determining an appropriate replacement item.

As was noted above, varying proportions of the transfer features on each item are grammatical. The 'good' second language speaker shares the native speaker's ability to judge grammaticality to some extent at least; the poorer reader will be less able to make such judgements and this will be reflected in the cloze scores. Another approach then would be to develop an intensive basic programme in grammatical analysis. This might go some way towards compensating for the deficiency in the poorer reader, and repay time and effort as regards cloze score improvement rather better than a programme designed to develop overall language and reading ability.

Various writers have shown that very high levels of correlation can be obtained between cloze test results and a range of other tests and conclude that the cloze test can, under these circumstances, be substituted for the scores from a whole battery of tests with consequent saving in time and resources of examiner and learner alike. Perhaps the most important reason against such a step is the effect that this would have upon the language teaching programme. Even if it is true that cloze cannot be taught, teachers and learners will believe that it can.

Conclusion

The problem in language testing, as in linguistics, is that the chief result of increasing methodological rigour has been to show how little we actually understand what we are dealing with. This is true on the macro level and may always be true since the ultimate questions (what is language? How is it different from/related to intelligence, cognition, and the broader issues of communicative competence? etc.) may prove to be unanswerable. However, it is no solution to hope that safety from these imponderables lies in the rejection of judgements based on principled argument; nor is it true that at the micro level (the level of the construction of tests and test items for particular groups of learners following specific programmes for specifiable purposes), that we lack resources for making reasonable and reasoned judgements. By doing so, we can increase both the true validity of our tests and our understanding of what it is that we are testing, while minimising the dangers inherent in any situation where an examination rather than a syllabus or teaching programme may determine what happens in the classroom.
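
The contrast drawn in the Scoring section between exact word scoring and acceptable word scoring can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `score_cloze`, the three-gap example and the marker's list of acceptable alternatives are all hypothetical, and the article itself argues that drawing up such a list objectively is difficult, perhaps impossible.

```python
def score_cloze(originals, responses, acceptable=None):
    """Score one candidate's gap-fillers in two ways.

    originals  -- the deleted words, in order
    responses  -- the candidate's fillers, in the same order
    acceptable -- hypothetical marker's list: gap index -> set of
                  alternatives judged acceptable for that gap
    Returns (exact_word_score, acceptable_word_score).
    """
    acceptable = acceptable or {}
    exact = 0
    accepted = 0
    for index, (original, response) in enumerate(zip(originals, responses)):
        answer = response.strip().lower()
        if answer == original.lower():
            # An exact replacement counts under both scoring methods.
            exact += 1
            accepted += 1
        elif answer in acceptable.get(index, set()):
            # A listed alternative counts only under acceptable word scoring.
            accepted += 1
    return exact, accepted


# Invented data loosely modelled on example J; only 'letter'/'hotel'
# comes from the article, the other gaps are assumptions.
originals = ["letter", "room", "in"]
responses = ["hotel", "room", "for"]
marker_list = {2: {"for", "during"}}  # assumed judgement for the third gap

exact_score, acceptable_score = score_cloze(originals, responses, marker_list)
print(exact_score, acceptable_score)
```

Everything contentious is located in `marker_list`: the arithmetic is trivial, and the reliability of the acceptable word score depends entirely on how that list is arrived at.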

Appendix

The reconstructed text of H is as follows:

    If the balloons popped the sound wouldn't be able to carry since everything would be too far away from the correct floor. A closed window would also prevent the sound from carrying, since most buildings tend to be well insulated. Since the whole operation depends upon a steady flow of electricity a break in the middle of the wire would also cause problems. Of course, the fellow could shout, but the human voice is not loud enough to carry that far. An additional problem is that a string could break on the instrument. Then there could be no accompaniment to the message. It is clear that the best situation would involve less distance. Then there would be fewer potential problems. With face-to-face contact, the least number of things could go wrong.

References

Alderson, J. Charles, 1979, 'The Cloze Procedure and Proficiency in English as a Foreign Language', TESOL Quarterly 13, no. 2, pp. 219-27.
Anderson, J., 1971, 'A Technique for Measuring Reading Comprehension and Readability', English Language Teaching 25, no. 2, pp. 178-82.
Beattie, Geoffrey W., and Butterworth, B.L., 1979, 'Contextual Probability and Word Frequency as Determinants of Pauses and Errors in Spontaneous Speech', Language and Speech 22, pt. 3, pp. 201-11.
Bowen, J.D., 1969, 'A Tentative Measure of the Relative Control of English and Amharic by 11th Grade Ethiopian Students', UCLA Workpapers in Teaching English as a Second Language 2, pp. 69-89.
Bransford, J.D., and McCarrell, N.S., 1974, 'A Sketch of a Cognitive Approach to Comprehension'. In Weimer, W.B. and Palermo, D.S. (Eds.), Cognition and the Symbolic Processes, Hillsdale, N.J., Lawrence Erlbaum Assoc.
Bright, J.A. and McGregor, G.P., 1970, Teaching English as a Second Language, London, Longmans.
Cambourne, Brian, 1976, 'Getting to Goodman: An Analysis of the Goodman Model of Reading with Some Suggestions for Evaluation', Reading Research Quarterly 12, no. 4, pp. 605-36.
Carroll, John B., 1971, 'Language and Cognition: Current Perspectives from Linguistics and Psychology'. In Laffey, James F. and Shuy, Roger (Eds.), Language Differences: Do They Interfere?, Newark, Del., International Reading Association.
Carver, Ronald P., 1978, 'The Case Against Statistical Significance Testing', Harvard Educational Review 48, no. 3, pp. 378-99.
Clarke, Mark A. and Burdell, Linda, 1977, 'Shades of Meaning: Syntactic and Semantic Parameters of Cloze Test Responses'. In Douglas Brown, H. et al. (Eds.), On TESOL '77: Teaching and Learning English as a Second Language: Trends in Research and Practice, Washington, D.C., TESOL.
Finn, Patrick J., 1977, 'Word Frequency, Information Theory and Cloze Performance: A Transfer Feature Theory of Processing in Reading', Reading Research Quarterly 13, pt. 4, pp. 508-37.
Farhady, Hossein, 1979, 'The Disjunctive Fallacy between Discrete Point and Integrative Tests', TESOL Quarterly 13, no. 3, pp. 347-57.
Freeland, Jane, 1979, 'Text Type as a Factor in the Cloze Testing of Foreign Languages', BAAL Newsletter, no. 8.
Green, Raphael, 1979, 'An Experiment with Cloze Testing', English Language Teaching Journal 33, no. 2, pp. 122-26.
Hildyard, Angela, and Olson, David R., 1978, 'Memory and Inference in the Comprehension of Oral and Written Discourse', Discourse Processes 1, pp. 91-117.
House, E.R., 1977, The Logic of Evaluating Argument, C.S.E. University of California Monograph Series in Education, no. 7.
Jongsma, E.R., 1971, The Cloze Procedure: A Survey of the Research, ERIC ED 058 015, Bloomington, Indiana University.
Johnson, Ronald E., 1975, 'Meaning in Complex Learning', Review of Educational Research 45, no. 3, pp. 425-59.
Kruse, Anna Fisher, 1979, 'Vocabulary in Context', English Language Teaching Journal 33, no. 3, pp. 207-17.
Littlewood, William T., 1979, 'Communicative Performance in Language Developmental Contexts', IRAL 17, no. 2, pp. 123-38.
Mishler, Elliot G., 1979, 'Meaning in Context: Is There Any Other Kind?', Harvard Educational Review 49, no. 1, pp. 1-19.
Moser, C.A. and Kalton, G., 1971, Survey Methods in Social Investigation, 2nd ed., London, Heinemann.
Neville, Mary H. and Pugh, A.K., 1976, 'Context in Reading and Listening: Variations in Approach to Cloze Tasks', Reading Research Quarterly 12, no. 1, pp. 11-31.
Oakshott-Taylor, John, 1979, 'Cloze Procedure and Foreign Language Listening Skills', IRAL 17, no. 2, pp. 150-58.
Oller, J.W., Jr., 1972, 'Scoring Methods and Difficulty Levels for Cloze Tests of ESL Proficiency', Modern Language Journal 56, no. 3, pp. 151-58.
Oller, J.W., Jr., Bowen, D., Dien, T.T. and Mason, V.W., 1972, 'Cloze Tests in English, Thai, and Vietnamese: Native and Non-Native Performance', Language Learning 22, no. 1, pp. 1-16.
Oller, J.W., Jr., 1973, 'Cloze Tests of Second Language Proficiency and What They Measure', Language Learning 23, pp. 105-18.
Oller, J.W., Jr., and Perkins, Kyle (Eds.), 1978, Language in Education: Testing the Tests, Rowley, Mass., Newbury House.
Riley, Pamela M., 1973, The Cloze Procedure: A Selected Annotated Bibliography, Lae, Papua New Guinea University of Technology.
Rivers, Wilga M., 1968, Teaching Foreign Language Skills, Chicago, University of Chicago Press.
Robinson, Richard David, 1972, An Introduction to the Cloze Procedure: An Annotated Bibliography, Newark, Del., International Reading Association.
Strauch, R.E., 1976, 'A Critical Look at Quantitative Methodology', Policy Analysis 2, no. 1 (Quoted by House, 1977).
Streiff, Virginia, 1978, 'Relationships among Oral and Written Cloze Scores and Achievement Test Scores in a Bilingual Setting'. In Oller and Perkins, 1978.
Stubbs, Joseph Barstow and Tucker, G. Richard, 1974,

'The Cloze Test as a Measure of English Proficiency', Modern Language Journal 58, nos. 5-6, pp. 239-42.
Taylor, Wilson L., 1953, 'Cloze Procedure: A New Tool for Measuring Readability', Journalism Quarterly 30, pp. 415-33.
Uhlenbeck, E.M., 1963, 'An Appraisal of Transformation Theory', Lingua 12, pp. 1-18.
Valette, Rebecca M., 1977, Modern Language Testing, 2nd ed., New York, Harcourt, Brace, Jovanovich.
Wainman, H., 1979, 'Cloze Testing of Second Language Learners', English Language Teaching Journal 33, no. 2, pp. 126-32.
Zurif, Edgar B., and Blumstein, Sheila E., 1978, 'Language and the Brain'. In Halle, Morris, Bresnan, Joan and Miller, George A. (Eds.), Linguistic Theory and Psychological Reality, Cambridge, Mass., MIT Press.

Paul Nation

Getting information from advanced

reading tests

Two kinds of tests

Here is a short reading text followed by two kinds of tests.

A major construction problem is the diversion of the river to enable the foundations for the dam to be
excavated and the concrete placed. Since it would be uneconomical to construct diversion works and
cofferdams to divert the full flood discharge of the river, the diversion has been divided into several
distinct operations. The critical period will be during the low-water season, because, as the river falls,

5 the cofferdam on the left bank will be demolished where it crosses the diversion channel, allowing water
to flow through the temporary openings in the dam wall. This having been done, a rockfill cofferdam
will be constructed across the main river channel downstream of the main site. This will cause the water
at the dam site to remain quiescent by preventing any flow in this part of the main river channel and
directing water through the diversion tunnel.
(from Adamson & Lowe, 1971, pp. 106-107)

1. The diversion has been divided into several distinct operations because
   a. the full flood of the river must be diverted.
   b. the foundations have to be excavated.
   c. a complete diversion is too expensive.
   d. it occurs during a critical period.

2. The rockfill cofferdam
   a. will direct the flow through the diversion tunnel.
   b. is on the main site.
   c. will increase the flow at the dam site.
   d. is on the left bank.

3. it in line 5 refers to ______ in line ___

4. This in line 6 refers to ______ in line ___

5. This in line 7 refers to ______ in line ___

The multiple-choice test tells us how much information the reader got from the passage. We can use the results of this test to classify our learners into groups (Pass/Fail) or rank them on a scale (A, B, C, ...).

The reference word test, on the other hand, can provide us with several kinds of information. It can give us information about language learning in general. For example, in a class of 40 learners, 5 did not answer item 3 correctly. Twenty-two did not answer item 4 correctly. If we compare the two items, we find that it in item 3 refers to a noun group (the cofferdam on the left bank) whereas This in item 4 refers to a larger unit (the cofferdam on the left bank will be demolished). The test shows that reference words referring to a noun group are easier than those referring to a clause.

The reference word test can also give us information about individual learners. If a learner does not answer item 4 and similar items correctly, we know that that learner needs help or extra practice with the reference words this, that, and it where they refer to a clause.

We can contrast the two kinds of tests by thinking of learning as a journey. The first kind of test, exemplified by the multiple-choice items, tells us how far the learners have come along the road. The second kind of test, exemplified by the reference word items, tells us what the road is like, what difficulties can be found on the journey, and how learners have coped with these difficulties.

There is another important difference between these two kinds of tests. Good multiple-choice items, comprehension questions, and true/false statements are not easy to make. Items like the reference word items however are easy to make because the items can follow a fixed formula like

    This in line 6 refers to ______ in line ___

    What does This in line 6 refer to?

The only difficulties involved in making such items are knowing what language features to test and finding examples of these features in the text. In the rest of this article we will look at useful features to test to gain information about reading, and we will look at possible types of items.
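
The comparison between items 3 and 4 is a simple piece of arithmetic, and a teacher keeping class records might lay it out as in the following minimal sketch. The counts (40 learners, 5 and 22 incorrect answers) come from the example above; the function name `error_rate` and the dictionary layout are assumptions made for illustration.

```python
def error_rate(incorrect, class_size):
    """Proportion of the class that did not answer an item correctly."""
    return incorrect / class_size


CLASS_SIZE = 40
# item number -> number of learners who did not answer it correctly
incorrect_counts = {3: 5, 4: 22}

rates = {item: error_rate(count, CLASS_SIZE)
         for item, count in incorrect_counts.items()}

for item, rate in sorted(rates.items()):
    print(f"item {item}: {rate:.1%} of the class answered incorrectly")

# The item with the highest error rate points to the feature needing practice.
hardest_item = max(rates, key=rates.get)
print("most difficult item:", hardest_item)
```

On these figures one learner in eight missed item 3, while more than half the class missed item 4, which is the basis for saying that reference to a clause is harder than reference to a noun group.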

Types of test items: language features

Comprehension questions direct attention to the message of a text which is peculiar to that text. This makes it difficult for the teacher or the learners to get information about points that need further attention in order to make it easier to read other texts. The items described in this section, however, test language features that are important for the understanding of almost every text. Performance on these items can be used as a basis for planning further teaching.

1 Noun groups

Much of the complexity at the sentence level is caused by noun groups containing relative clauses or reduced relative clauses. Here are two sentences from an unsimplified text. The noun groups are in brackets. Notice how the relative clauses and reduced relative clauses complicate what is basically a simple sentence pattern.

    (The advent of jet and rocket propulsion, and of nuclear reactors,) has shown that (the materials which previously served for constructional purposes) are no longer wholly satisfactory for (the manufacture of equipment on which the efficient functioning of these new sources of power depends.) (An industrial demand) has arisen for (entirely new metals which were merely laboratory curiosities a few years ago, or which, in the case of the transuranic elements produced by nuclear fission, never before existed within the history of Man.)
    (Adamson & Lowe, 1971, p. 25)

There are various levels of difficulty of relative clauses, and test items will reveal these. Briefly, the most difficult relative clauses are those that interrupt the normal subject verb (object) pattern. So a relative clause which is attached to the subject of a sentence causes more difficulty for a reader than one attached to the object. In addition, relative clauses with the object of the relative clause replaced by the wh- word are more difficult than those with the subject of the relative clause replaced by the wh- word. That is, The man who I saw .... is more difficult than The man who saw me ....

Understanding of complex noun groups can be tested in the following ways.

a. The teacher gives the learners a noun from the text. The learners copy the whole noun group containing the noun. Alternatively the learners can just copy the first two and last two words of the noun group in the text.
b. The teacher gives the learners a verb from the text and they copy the subject and object of the verb from the text. Here are items based on the examples of noun groups given above.

    ______ are no longer satisfactory.

    ______ depends on ______

2 Co-ordination

The following sentence can be divided into three parts.

    (1) The main cables of all modern suspension bridges / (2) are fixed to the tower tops, / (3) and subject the towers to a very heavy vertical load (almost the whole weight of the bridge).

Parts 2 and 3 are parallel to each other and they are both connected to part 1. A problem in reading such a sentence is seeing the relationship between part 1 and part 3. This can be tested in the following ways.

a. The teacher instructs the learners to remove and, and rewrite the sentence as two sentences.
b. If part 3 begins with a verb, the teacher gives the learners the verb and they copy the subject and object of the verb from the text.

    ______ subject ______

3 Reference words

This, that and it are the most difficult reference words because they can refer to items other than nouns or noun groups. There are however other factors which affect the difficulty of reference words. These include whether the reference word follows or precedes the item referred to, whether the item referred to is in a different sentence from the reference word or not, and whether there are other grammatically (although not semantically) possible items between the reference word and the item referred to. Barnitz (1980) found that the most difficult reference items were those that referred to a noun in the same sentence and followed the item referred to. Barnitz tried to make a ranking of the difficulty of different types of reference items. When such a ranking has been made, teachers can use tests of reference items to see how far a particular learner has gone on the way to mastering the items. Then suitable help and practice can be provided. Teachers can also use the tests described in this article to make their own rankings of difficulty. Examples of reference word items have already been given.
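
Because reference word items follow a fixed formula, producing them is mechanical once the teacher has chosen which reference words to test and noted their line numbers. The sketch below illustrates this for the dam passage used earlier. The function name `reference_items`, its `style` parameter and the blank widths are assumptions for illustration; choosing the targets remains the teacher's job and is not automated here.

```python
def reference_items(targets, style="blank"):
    """Build reference word items from (word, line_number) pairs.

    style="blank"    -> 'This in line 6 refers to ______ in line ___'
    style="question" -> 'What does This in line 6 refer to?'
    """
    items = []
    for word, line in targets:
        if style == "blank":
            items.append(f"{word} in line {line} refers to ______ in line ___")
        elif style == "question":
            items.append(f"What does {word} in line {line} refer to?")
        else:
            raise ValueError(f"unknown style: {style!r}")
    return items


# Reference words picked out by the teacher from the dam passage.
targets = [("it", 5), ("This", 6), ("This", 7)]

for item in reference_items(targets):
    print(item)
for item in reference_items(targets, style="question"):
    print(item)
```

The two styles correspond to the two formula variants quoted in the article; nothing in the code judges whether a given reference word is worth testing.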

4 Verbs

Verbs typically enter into a relationship with a subject, and an object, adjunct, or complement. In a sentence like The committee reached a decision, the subject and object of the verb reach are quite apparent. These relationships are less apparent in a sentence like After much deliberation by the committee a decision was finally reached. Sometimes the relationships are even less apparent when the verb has become a noun. For example, in the following sentence, some learners will have difficulty in deciding what uses what and what competes with what.

    Sowing the orchard to grass will result in a temporary check to the vigour of the trees, due principally to the use of the available nitrogen by the grass and competition by the sward for the moisture.
    (NZDA, 1974, p. 50)

We have seen how the What does what? item can be used to test learners' understanding of noun groups and co-ordination. It is the most efficient way of testing whether learners see the subject verb (object/complement) relationship in different parts of a sentence. Here is a sample item.

    ______ compete with ______ (line 3)

The rules to follow when answering a What does what? item are:

(i) Always make the verb active, not passive.
(ii) Copy only the headwords of the subject and object noun groups. That is, give short answers.
(iii) Do not use reference words when answering but answer using the item referred to.

Types of test items: problem solving strategies

In order to succeed in independent reading, learners need to be able to cope with unknown words and complicated sentences. It is not enough to know that a learner cannot cope with these difficulties. If help is to be given, the teacher must know where the learner is going wrong.

1 Words in context

Here is an item to test how well learners have mastered the strategy of guessing unknown words from context (Long & Nation, 1980).

    If the home orchard area is small, the problem of the large size to which most fruit trees grow can be met by planting dwarf or semi-dwarf trees. They can be grown conveniently along the edge of the vegetable garden without encroaching on it.
    (NZDA, 1974, p. 51)

    encroaching
    a. What part of speech is it?
    b. What does not encroach on what?
    c. Which word could you put between the two sentences: but, because or then?
    d. What does encroach mean in the text?

If the learners' guess at d is a different part of speech from a, then they need to make a basic change in their strategy. Their guess must be the same part of speech as the unknown word. b tests whether the learners notice the immediate grammatical relationships. c tests the learners' appreciation of the wider context. A breakdown at any one of these points shows where further practice is needed.

2 Simplifying sentences

The steps in this strategy consist of items mentioned in the section on language features, namely
a. reference
b. co-ordination, and
c. noun groups.

Here is a sample item.

    Simplify the following sentence by replacing all reference words by the items referred to, by removing and, but, or or and rewriting as different sentences, and by removing the parts of the noun groups that follow the main noun in the group (relative and reduced relative clauses).

    The legs of the Severn towers, on the other hand, have no cells, and the visible outer plating, reinforced by internal longitudinal stiffeners, is carrying the whole of the tower load, and surrounds an otherwise open space, save for diaphragms, ladders and lifts.
    (Adamson & Lowe, 1971, p. 97)

Here is the answer.

    The legs of the Severn towers, on the other hand, have no cells. The visible outer plating is carrying the whole of the tower load. The visible outer plating surrounds an otherwise open space, save for diaphragms, ladders and lifts.

This strategy is used whenever the learners meet a sentence that they cannot understand in spite of knowing the vocabulary (Nation, 1979; Long & Nation, 1980).

Validity

The validity of the test items described in this
article depends on whether the items test real problems in advanced reading and whether the strategies work.

There is considerable evidence that the items test real problems. Most of the evidence however comes from studies of children learning their mother tongue. Researchers have found that children learn items like reference words, or noun groups, in a particular order. This order seems to reflect the difficulty of the items. So, children learn relative clauses attached to the object before relative clauses attached to the subject. In addition, comprehension tests reveal that children have more difficulty in interpreting sentences containing a relative clause attached to the subject. The small amount of experimentation done with learners of English as a foreign language supports the findings of first language learning research. But there is need for more research with foreign learners. Teachers can carry out much of this research in their own classroom by combining the use of the test items described in this article with translation checks and individual interviews with learners. Evidence about whether the strategies work can only come from their use. They have been useful in my teaching but teachers should not accept them uncritically. The question of validity should be the concern of all teachers.

Conclusion
The test items described in this article direct attention towards structural features of a reading text and to analytical strategies. There are several reasons for this. Firstly, performance on the items provides the teacher and learners with feedback which can result in appropriate help and practice with features that occur in almost every reading text. Secondly, the items allow teachers to investigate learning. That is, teachers can act as experimenters and develop and validate rankings of learning difficulty of structural features. Through their testing they can gain new insights into language and how it is learned. Thirdly, the items direct attention to language as a system. An important educational goal of language learning is the development of an interest in language for its own sake. An awareness of the system behind language is a step towards this goal.

It needs to be stressed that the items described in this article are not offered as substitutes for comprehension questions. They are useful additional tools which may lead to a greater understanding of learning and a corresponding improvement in teaching.

References
Adamson, V. and Lowe, M.J.B., General Engineering Texts: English Studies Series 9, Oxford University Press, London, 1971.
Barnitz, J.G., 'Syntactic effects on the reading comprehension of pronoun-referent structures by children in grades two, four and six', Reading Research Quarterly 15, no. 2, 1980, pp. 268-89.
Long and Nation, Read Thru, Longman, Singapore, 1980.
Nation, I.S.P., 'The curse of the comprehension question: some alternatives', RELC Journal Guidelines 2, 1979, pp. 85-103.
N.Z.D.A. (New Zealand Department of Agriculture), The Home Orchard, Government Printer, Wellington, 1974.

Brian Heaton

Writing in perspective: some comments on the

testing and marking of written communication

The need to improve tests of writing does not seem to have received as much attention in the past decade as the need to improve techniques for testing grammar and lexis. However, even during the heyday of the psychometric and structural school of testing, composition writing, together with the oral interview, provided an essential balance to discrete point tests by laying stress on the more communicative tests of language use. For many years, the composition paper has been ostensibly concerned with the total effectiveness of the written message rather than with the correctness of the language forms comprising its various parts. Although such mechanical methods of scoring as the error-count method (in which one mark is deducted for each mistake made) were used at one time by some examiners in an overriding desire to achieve reliability, misguided and inhibiting methods of this nature were on the whole short-lived, and were soon replaced by analytical and impression methods of scoring. The advantage of setting two or more short realistic writing tasks in place of one literary type of essay has also been recognised for many years in several widely administered examinations.

Nevertheless, in spite of its many merits, free composition is far from being the only means of testing the writing skills. Neither is it necessarily the most reliable means. Controlled composition testing, still in its infancy, offers a reliable way of measuring a limited number of identifiable skills at a time. Although by no means appropriate for every situation in which writing is tested, controlled composition may be useful in many progress and diagnostic tests, simply because it can help teachers to identify, and concentrate on, specific areas of difficulty. Moreover, the controlled writing task itself can be made far less time-consuming.

What does controlled composition measure which a test of grammar by definition does not usually measure? Most tests of grammar are concerned with the recognition and manipulation of correct forms of language and operate at the sentence level (though good grammar tests often operate beyond the level of the sentence). Tests of written composition, on the other hand, should concentrate primarily on those aspects of grammar and syntax which relate to meaning in a piece of discourse rather than meaning in isolation: e.g. reference features, connectives, substitution devices, omission. In addition, a controlled composition test may seek to identify and measure such judgement skills as appropriacy of style, register, relevance and ordering.

Paragraphs for completion, stretches of language in which sentences are put in their correct order, pieces of continuous writing designed to test an awareness of style and register as well as relevance and appropriacy are all useful techniques for testing writing. However, not all tests of writing need consist of long pieces of connected discourse: there are also ways of testing writing beyond the level of the sentence, using much shorter stretches of language. Samonte and Sharwood-Smith cite the use of a two-sentence text which measures an ability to form a coherent unit of language. For example, candidates may be instructed to write a sentence to precede the statement 'Moreover, it was impossible to open the windows'. Example responses could be 'It was very hot in the small room', 'There was only one fan in the room, but it was broken', 'The door slammed behind John, and he realised he was locked in the room', etc. Though in all cases students are required to demonstrate an awareness of the communicative nature of language in general, and cohesive devices in particular, there is a large degree of subjectivity and freedom of response in this type of item, as can be seen from the following example stimuli.

(i) There is one here, too.
(ii) To do this, the water must first be boiled.
(iii) These should then be carefully sorted.
(iv) For wild life, however, there are even greater dangers in the pollution of rivers, lakes and seas.
(v) But there is no reason to be so pessimistic.

The degree of control (and objectivity) in testing composition writing can vary widely. The following is an example of a closely controlled composition task which resembles in many ways a test of grammar. However, it should be remembered that the composition is intended to test solely an ability to use appropriate connectives and reference devices in connected discourse.

A travel agency in Bangkok is now arranging day tours to foreign countries by jumbo jet. ______, you can leave Bangkok very early in the morning and have lunch at the Taj Mahal. ______ you can fly to New Delhi ______ do some shopping returning to Bangkok at midnight. ______, you may leave Bangkok much later in the morning ______ spend a day shopping in Singapore, returning home in time to watch the evening news on television.

Control can be relaxed to varying degrees; at the other extreme, as illustrated by the following example of written work based on information given in tabular form, very little control need be exercised.

Propaganda

    Means                                 Result
1   Loaded words                          Influence people with no strong views on subject
2   Both sides of argument                Convert people originally opposed to idea
3   Repetition                            Make people remember
4   Quotations from respected sources     Strengthen arguments

Regardless of the degree of control exercised, however, it is important that controlled composition tests should never become unnatural exercises involving the performance of mental somersaults on the part of the students. Tasks requiring students to form sentences according to certain patterns (e.g. 'Combine the following sentences, using which and although.') often result in all kinds of errors simply by forcing students to guess what was in the examiner's mind, adopting unfamiliar and alien lines of thought.

All written work, whether free or controlled, should be carried out as far as possible in a communicative context. When we speak, we are generally aware that we are addressing another person: hence there is a basic desire – whether conscious or subconscious – to communicate with that person. Because the writer often finds himself addressing a general audience which is far less clearly identifiable than in normal speech situations, such a desire is not so apparent in writing. Consequently, it is all the more important in tests of writing to provide both a context and a purpose which the student can have uppermost in his mind when carrying out the writing task. Indeed, in everyday situations in real life we rarely write without a particular purpose in mind, whether it is a letter, an article, a report, or a notice. Consequently, even the two examples of controlled composition previously given would benefit from such rubrics as:

The following paragraph has been taken from a newspaper report. The writer hopes to interest his readers by giving surprising information about new kinds of day tours. Rewrite the paragraph, inserting a suitable word or phrase in each blank.

and

Use the following table to provide information about the influence of propaganda. As your readers are largely ignorant about the effects of propaganda, try to convince them of its power over people.

Where it is important in the test to concentrate on a particular register, students can be instructed to write a letter to a friend, an article for a newspaper, a report drawing certain conclusions, a memorandum advising someone, etc. The following two topics from the Joint Matriculation Board Test in English (Overseas) illustrate how writing tasks can be put into a communicative context which provides both motivation and guidance for candidates. It is also worth noting that these items appeared many years ago.

Example (i)
Imagine that a British friend of yours has recently gone to live in your country. You have arranged for him to stay for a week-end with some relatives of yours. They are eager to welcome your friend, but have never met any young people from Britain before.
Your friend is very frank, sincere and likeable, but he has many casual ways that you think might upset your relatives. Your friend is often
untidy and unpunctual, treats older people as equals (in a very friendly way of course), and likes to argue about subjects such as politics and religion. Your friend will probably be ignorant of the customs observed in your country when visiting people.
Write a letter to your friend, who may be male or female, whichever you prefer. Describe your relatives briefly, and advise your friend how to behave towards them. Write between one and one-and-a-half pages.

Example (ii)
The following table gives information about the cost of sending 27 tons of office machinery from London to New York by air and by sea:

       TRANSPORT    DELIVERY TO THE       INSURANCE    TOTAL
                    NEW YORK CUSTOMER
Sea    £5,610       £810                  £1,660       £8,080
Air    £6,500       £840                  £1,120       £8,460

Imagine that you work for the company that wishes to sell this machinery to America. You have been asked to write a short report advising your company whether to send it by sea or by air. In addition to the above information, you know that most other British companies that export office machinery send it by sea, and that the American customers want quick delivery. The delivery time by sea is six weeks; by air it is two weeks. Furthermore, the total value of the 27 tons of office machinery is £115,000.
Write your report to your company. It should not be in letter form. Write between one and one-and-a-half pages.

It is necessary, however, to sound a cautionary note when devising writing tasks which stress the communicative nature of language. Creating cultural obstacles should be avoided when devising realistic tasks: for example, however tempting it may be to instruct candidates to write notes for imaginary milkmen, such a writing situation would be quite alien to students in Thailand – or even in Germany.

In many compositions, assessment of writing performance is sometimes based largely on the number of grammatical errors made. As will be readily appreciated, the resulting score using this method bears little relation to the effectiveness of a student's ability to express himself freely in writing. The scoring of tests of free-writing has, in fact, long been the subject of considerable research. Briefly, marks may be awarded on what the testee has written; on what it is thought the testee meant by what he wrote; on handwriting and general appearance of the composition; or on previous knowledge of the student. Furthermore, it is possible (and not unusual) for two markers to differ widely in the spread of the marks they award, their strictness and their rank ordering of papers. Indeed, whether the analytical method or the multiple-marking method (often referred to as the impression method) is used, examiners award marks chiefly on the basis of their impression of the students' work.

Whatever method is used to assess written work, care should be taken to avoid an excessive concern with the manipulation of language forms. At the other extreme, of course, is the attitude typified by the statement 'I know what the candidate is trying to say' – an attitude which too frequently reflects a desire to interpret on behalf of the student at all costs and which results in a neglect of the grammatical and even the communicative aspects of the written language.

It is precisely in the attitude to grammatical errors that the testing of the writing skills differs from the teaching of these skills. In the teaching of writing, attention may be concentrated on the correction of high-frequency errors or of those errors which are least acceptable to native-speakers. Alternatively, the teacher may wish to correct only those errors which are related to the particular language forms currently being taught. In tests of writing, however, attention ought to be paid primarily to those types of grammatical errors which impede written communication. As a result of examining such kinds of errors, Burt and Kiparsky (1972) advocate classifying errors according to whether they are global or local errors. They define global errors as those which involve the overall structure of a sentence, causing the reader to misunderstand a message or even fail to understand it at all. Misuse of connectives, relatives, pronouns and other reference devices, wrong sequence of tenses, incorrect word order and inadequate lexical knowledge as
well as serious mis-spellings and wrong punctuation can usually be classed as global errors. Local errors, on the other hand, comprise those errors which cause trouble in a particular constituent or clause in a sentence and which do not significantly hinder the comprehension of the sentence. They include misuse of articles and prepositions, lack of agreement between subject and verb, incorrect affixes, wrong verb forms and the incorrect position of adverbs. Burt argues that this distinction between global and local errors provides the most useful criteria for determining the communicative importance of errors, claiming that the correction of one global error helps to clarify a message far more than the correction of several local errors.

In a study based on the work of Burt and Kiparsky, Tomiyama examines the various ways in which grammatical errors can distort written messages. The results indicate that such local errors as those caused by the omission and wrong choice of articles are easier to correct and hence less crucial to successful communication than the omission and wrong choice of connectives. However, the incorrect insertion of both articles and connectives does not cause any serious breakdown in communication. Hendrickson has modified Burt and Kiparsky's global/local error distinction, defining a global error as a communicative error which results in a proficient speaker misinterpreting or even failing to understand a message and a local error as a linguistic error which renders a structure, etc. awkward but which nevertheless does not give rise to any real difficulty in understanding the intended meaning of a sentence. The whole area of global/local or communicative/linguistic errors is a rich one for further research and may well provide a systematic method for assessing free-written work with deeper insight.

Finally, the whole question of time should be considered when administering tests of writing. While it may be important to impose time limits in tests of reading, grammar and lexis, such constraints may well be very harmful in tests of writing, increasing the sense of artificiality and unreality. Moreover, the fact that candidates are expected to produce a finished piece of writing at their very first attempt adds to this sense of unreality. How often in real-life situations is anyone expected to write something without having a chance to produce one or more preliminary drafts first? Not only should students be given sufficient time to produce preliminary drafts of whatever they write but they should be actively encouraged to do this in any test of composition. If writing tests are made far more realistic and relevant to real-life situations, emphasis will automatically be placed on writing as a communicative activity.

References
Burt, M.K. and Kiparsky, C., The Gooficon: a repair manual for English, Rowley, Mass., Newbury House, 1972.
Hendrickson, J. (Ed.), Error Analysis and Error Correction in Language Teaching, RELC Occasional Papers, No. 10, 1979.
Joint Matriculation Board, Test in English (Overseas), March 1968, July 1968.
Samonte, Aurora L., 'Techniques in Teaching Writing', RELC Journal, Vol. 1, No. 1, 1970.
Sharwood-Smith, Michael, 'Courses in Written English - Some Comments and Words of Caution', ELT Documents, (73/1), 1973.
Sharwood-Smith, Michael, 'New Directions in Teaching Written English', Forum, Vol. XIV, No. 2, April, 1976.
Tomiyama, M., 'Grammatical Errors and Communication Breakdown', TESOL Quarterly, Vol. 14, No. 1, March 1980.
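
As a rough illustration of the contrast drawn in this article between the error-count method and marking that gives more weight to global than to local errors, the following Python sketch scores two hypothetical scripts both ways. The maximum mark, the penalty weights and the sample errors are all assumptions made for the example; none of them comes from Burt and Kiparsky or from any examining board.

```python
from dataclasses import dataclass


@dataclass
class MarkedError:
    """One error noted by a marker (hypothetical record format)."""
    description: str
    is_global: bool  # True if the error obscures the message of the whole sentence


def error_count_score(max_marks: int, errors: list[MarkedError]) -> int:
    """Error-count method: deduct one mark per mistake, never going below zero."""
    return max(0, max_marks - len(errors))


def weighted_score(max_marks: int, errors: list[MarkedError],
                   global_penalty: int = 3, local_penalty: int = 1) -> int:
    """Deduct more for global errors than for local ones (weights are assumed)."""
    deduction = sum(global_penalty if e.is_global else local_penalty for e in errors)
    return max(0, max_marks - deduction)


# Two invented scripts with the same number of errors but different kinds.
script_a = [
    MarkedError("article omitted", is_global=False),
    MarkedError("wrong preposition", is_global=False),
    MarkedError("subject-verb agreement", is_global=False),
]
script_b = [
    MarkedError("wrong connective", is_global=True),
    MarkedError("incorrect word order", is_global=True),
    MarkedError("article omitted", is_global=False),
]

for name, errors in (("A", script_a), ("B", script_b)):
    print(name, error_count_score(20, errors), weighted_score(20, errors))
```

Under these assumed weights the two scripts receive the same mark by error count, while the weighted scheme places script B lower because two of its errors are global. A marker would still have to judge which errors are global, so a scheme of this kind can only support, not replace, the examiner's impression of the script.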

J. P. Boyle

Testing language with students of

literature in ESL situations

Passing your driving test means being able to drive the car well, not simply knowing the Highway Code and a manual on engine maintenance. Language testing nowadays, like language teaching, also stresses the ability to do something with the language, not merely to know about its formal characteristics: the rules of use are seen to be important as well as the rules of grammar. Alan Davies' Survey Articles in Language Teaching and Linguistics: Abstracts (i) outline well the new awareness of the problem of testing communicative competence as well as formal knowledge. Interestingly, he ends his careful and wide-ranging survey with the question: 'Is communicative testing feasible?'

Whatever our feelings on this, most people with experience in language testing would agree that a good test will contain both questions of a general nature and questions on more specific details, in other words integrative as well as discrete point. If we accept this general position and turn to the business of testing students of literature in ESL situations, some interesting test types, both of a general and specific nature, can be suggested.

But first, some preliminary remarks on the validity of considering students of literature as a special group. I am thinking of the type of situation which is not uncommon, especially in tertiary education in Commonwealth or former Commonwealth countries, where the students who choose to do a degree in English must study a great deal of literature. However, their language ability is often not too good, particularly in countries where English is being spoken less and less and the mother tongue is taking over. There is a tendency in such circumstances to play down the relevance of literature in language teaching. This tendency is reinforced when ESP becomes fashionable and the generalities of literature are considered less relevant to the students' needs than the more purpose-specific language of other disciplines – science, medicine, engineering.

More than half a century ago Ezra Pound refuted such thinking:

'And this function (of literature in the state) has to do with the clarity and vigour of any and every thought and opinion. It has to do with maintaining the very cleanliness of the tools, the health of the very matter of thought itself. When their (writers') work goes rotten – by that I do not mean when they express indecorous thoughts – but when their very medium, the very essence of their work, the application of word to thing, goes rotten, i.e. becomes slushy and inexact, or excessive and bloated, the whole machinery of social and individual thought and order goes to pot. This is a lesson of history, and a lesson not half learned.' (ii)

So the first justification for keeping language teaching – and therefore testing – in touch with literature is that literature, being language at its most vigorous and clearest, keeps language 'clean and healthy'.

A recent issue of Forum shows a renewed interest in returning to literature as a source of texts for enjoyable and stimulating language teaching. Admitting to a slight guilt-feeling about using literature, one writer says:

'Most of us teach literature in language class for exactly the same reason we are ashamed that we teach literature: stories and poetry are interesting. We enjoy them. The students enjoy them. Our attention is engaged, as it is rarely engaged by word-lists and exercises, for literature touches our common humanity.' (iii)

And in the conclusion to the first article of the same issue of Forum, Albert H. Marckwardt, the author of The Place of Literature in the Teaching of English as a Second or Foreign Language (1978), says:

'In our wholly justifiable concern with the language per se and with taking every possible advantage of the systematic study of language to facilitate the learning process, there is a danger of overlooking or undervaluing some of the uses to which language may be put, among them its function as a literary medium.' (iv)
Teaching language to students of literature is in a sense ESP, with the paradoxical twist that the speciality of literature is the consideration of human nature at its broadest, its most general. In the medical field the specialist – the surgeon, the psychiatrist, the paediatrician – is considered the high-flyer; the poor old General Practitioner definitely a cut below. In language teaching ESP is felt by some to be more high-powered than general English. But with language, as with medicine, the specialist must first be an expert in his general field. And with language that general field is human nature in action – the realm of literature.

In this context, then, let us see what kind of tests would be appropriate to students of literature. Rather than speak in general terms, I will give examples where possible.

1. Reading
a. A short story is read, e.g. Frank O'Connor's My Oedipus Complex. Specific comprehension questions can be asked:
(i) What particular delight did climbing into his mother's bed in the morning give the boy?
(ii) What incident brought the boy and his father together?
Short passages can be quoted and their significance in the story as a whole questioned:
(i) 'You must be quiet while Daddy is reading, Larry,' Mother said impatiently. It was clear that she either genuinely liked talking to Father better than talking to me, or else that he had some terrible hold on her which made her afraid to admit the truth. 'Mummy,' I said that night when she was tucking me up, 'do you think if I prayed hard God would send Daddy back to the war?'
(ii) I pretended to be talking to myself, and said in a loud voice: 'If another bloody baby comes into this house, I'm going out.' Father stopped dead and looked at me over his shoulder. 'What's that you said?' he asked sternly. 'I was only talking to myself,' I replied, trying to conceal my panic. 'It's private.'

b. The literature student must be able to discriminate, in reading, between facts that are non-essential and others which are of central symbolic significance. In D.H. Lawrence's The Horse Dealer's Daughter, for example, the relevance of the pool (into which the doctor wades to rescue the attempted suicide) could be questioned. This type of question tests not simply accuracy in remembering all the facts of a story – 'vacuum-cleaner reading', as it has been called – but the power to appreciate the importance of certain facts – like the painting of the fence in Tom Sawyer.

c. Cloze/Modified cloze. The cloze test, together with dictation, has risen in the popularity polls. The Hong Kong Examining Board is happy with initial results from modified cloze (where the blank has three or four answers and the correct one has to be chosen – a multiple-choice question). All types of cloze test seem to be reliable: deleting every nth word, or deleting on a more rational basis; allowing only the exact word or accepting reasonable alternatives. For literature students the cloze is probably more valid, certainly more relevant, if something like the following can be given. It is taken from the text of an interview with Joyce Cary, the novelist, contained in the Paris Review Interviews, Writers at Work.

INTERVIEWERS: Have you read The Bostonians? There was the spellbinder.
CARY: No, I haven't read that.
INTERVIEWERS: The Princess Casamassima?
CARY: I'm afraid I haven't read that either. Cecil is always telling me to read her and I must. But I read James a good deal. There are times you need James, just as there are times when you must have Proust – in his very different ______ of change. The essential thing ______ James is that he came ______ a different, a highly organized, ______ hieratic society, and for him ______ was not only a very ______ and highly civilized society, but ______. It was the best the ______ could do. But it was ______ subject to corruption. This was ______ center of James' moral idea – ______ everything good was, for that ______, specially liable to corruption. Any ______ of goodness, integrity of character, ______ that person to ruin. And ______ whole civilization, because it was ______ real civilization, cultivated and sensitive, ______ fearfully exposed to frauds and ______, brutes and grabbers. This was ______ tragic theme. But my world ______ quite different – it is intensely ______, a world in creation. In ______ world, politics is like navigation ______ a sea without charts and ______ men live the lives of pilgrims.

In this example I have deleted mechanically every sixth word. To guess the correct word in some cases may perhaps seem extremely difficult, even for the native speaker, e.g. 'static', which fills the seventh blank, or 'exposed', which fills the fourteenth. However, two things must be remembered: first, that such a short passage is not a real example of a cloze test; secondly, that with a

longer passage, the same mental mechanics go on as in crossword puzzle solving – clue 3 across seems impossibly difficult, until you get helped by discovering the answer to clue 5 down. Similarly with cloze, a word guessed later on in the passage gives a clue to an earlier word, e.g. 'dynamic', which fills the fourth last blank, would not be impossible to get, particularly with the help of 'a world in creation' immediately following it. And once 'dynamic' has been guessed, in the context of the whole passage, the earlier seventh blank might well be correctly filled too, 'static'.

INTERVIEWERS: Have you read The Bostonians?
CARY: No, I haven't read that.
INTERVIEWERS: The Princess Casamassima?
CARY: I'm afraid I haven't read that either. Cecil is always telling me to read her and I must. But I read James a good deal. There are times you need James, just as there are times when you must have Proust – in his very different world of change. The essential thing about James is that he came into a different, a highly organized, a hieratic society, and for him it was not only a very good and highly civilized society, but static. It was the best the world could do. But it was already subject to corruption. This was the center of James' moral idea – that everything good was, for that reason, specially liable to corruption. Any kind of goodness, integrity of character, exposed that person to ruin. And the whole civilization, because it was a real civilization, cultivated and sensitive, was fearfully exposed to frauds and go-getters, brutes and grabbers. This was his tragic theme. But my world is quite different – it is intensely dynamic, a world in creation. In this world, politics is like navigation in a sea without charts and wise men live the lives of pilgrims.

2. Writing
a. Vocabulary. The literature student has to be more at home in words which describe the finer shades of human emotion. Basic feelings can be taken as the starting point. Then the student must find six words within the range of that broad feeling, and use the six words in a sentence. For example, ANGRY might turn up such words as: raging/cross/peeved/annoyed/furious/fuming. JOY: happy/glad/overjoyed/pleased/cheerful/delighted. FEAR: terrified/afraid/dreading/apprehensive/anxious/nervous. LOVE: adore/like/be fond of/be

Another way is to give a picture of a face which expresses a complex of emotions. The Mona Lisa would be a good example. The student has to describe the face. Or a pair of pictures can be used, e.g. a self-portrait of Rembrandt and a self-portrait of Van Gogh. Works of art are usually more useful for this, in that they are more enigmatic, less explicit than most photographs. But good photographs can readily be found too.

b. Students of literature are no different from other ESL students in their tendency to make grammatical errors. I am concentrating here on test-types which seem to be particularly relevant, but by no means saying these are the only ones. In one type of test the grammatical accuracy may be the testing-point. In another, the student's power of imagination may be the important thing. 'The snow felled on the ground like flying-saucers landing' or 'The tree failed over like an oil-rig in the North Sea capsizing' should not be marked down, because of the slip, 'felled' or 'failed'. I am not saying the student should not be 'felled' for such errors in a grammar test. Since simile and metaphor are so important in literature, simple tests can be devised. The student has to complete the sentence imaginatively.
(i) 'His face was wrinkled like ______'
'Her hair was flowing like ______'
'The old lady's teeth were black like ______'
(ii) More difficult would be examples with two blanks:
'His face was ______ like ______'
'Her hair was ______ like ______'
'The old lady's teeth were ______ like ______'
(iii) With less control will come more variety and scope, as well as more difficulty:
'His ______ was ______ like ______'
'Her ______ was ______ like ______'
'The old lady's ______ was ______ like ______'

c. To test appreciation of register, discrete-point examples like the following are useful:
(i) 'This old (person/chap/individual/gentleman) comes up to me and he says ______'
(ii) 'The butler bowed and (heaved/shoved/passed/donated) over the letter.'
More creatively, the student can be asked to write a short passage which deliberately aims at humour by means of mixing registers. P.G. Wodehouse is the master and model for this.

d. More global tests of writing would include composition, more or less controlled, or a free response
attached to/be devoted to/be infatuated with. to a text, tape or film.
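The graded simile frames under 2b, with one, two and then three blanks, can be produced mechanically from a single full sentence. What follows is only a minimal sketch in Python, not part of the test design described here: the function names are hypothetical, and the model completion 'a walnut' is an invented example used solely to show how the level of control is reduced step by step.

```python
import re

BLANK = "______"

def make_item(sentence, gaps):
    """Return the sentence with each listed word or phrase replaced by a blank."""
    out = sentence
    for target in gaps:
        # Blank only the first whole-word occurrence of each target.
        out = re.sub(rf"\b{re.escape(target)}\b", BLANK, out, count=1)
    return out

def graded_items(subject, noun, adjective, vehicle):
    """Build three frames of decreasing control from one model sentence."""
    full = f"{subject} {noun} was {adjective} like {vehicle}."
    return [
        make_item(full, [vehicle]),                   # one blank
        make_item(full, [adjective, vehicle]),        # two blanks
        make_item(full, [noun, adjective, vehicle]),  # three blanks
    ]

# Hypothetical model sentence: "His face was wrinkled like a walnut."
items = graded_items("His", "face", "wrinkled", "a walnut")
for item in items:
    print(item)
```

The tester still has to judge the imaginative quality of what the student writes in the blanks; the sketch only prepares the frames.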

J P Boyle
3. Listening
a. Drama on tape/radio. After a play has been listened to, response of a general nature can be required. What was the play really about? Or more specific answers could be demanded: Match each of the following with one of the characters in the play: wily/forthright/exuberant/discreet/retiring. A problem with this type of question, of course, is that words describing character tend to be subtle in their nuances. However, this brings us back to the type of exercise under Writing, 2a.

b. Story. Again, after hearing a story, a global response can be required: briefly retell the story, making sure you include what seems to you to be the main point. A more specific question would be: Which of the following titles best suits the story you have heard? This would be a multiple-choice question, in effect, with one of the answers more obviously defensible objectively.

c. Poetry. I have said very little on the use of poetry in testing the language ability of literature students. It is of limited value, it seems to me, to use a poem for structural questions, asking the student to put the poem into 'plain English'. This can take the heart out of poetry, though in some cases, with extremely difficult poets (e.g. Hopkins), this exercise will be almost necessary for ESL students. A more relevant type of test question will test the literature student's ability to appreciate the way poetry charges words with different levels of meaning. 'To read poetry adequately a student must not only have a command of lexis in the sense that he knows how to use a number of words; he must also know a number of possible uses for any given word.' A good example would be W.B. Yeats' The Song of the Old Mother.

I rise in the dawn, and I kneel and blow
Till the seed of the fire flicker and glow:
And then I must scrub and bake and sweep
Till stars are beginning to blink and peep;
And the young lie long and dream in their bed
Of the matching of ribbons for bosom and head,
And their day goes over in idleness,
And they sigh if the wind but lift a tress:
While I must work because I am old,
And the seed of the fire gets feeble and cold.

The student can be asked what 'the seed of the fire' means in the poem. Many examples of this kind can easily be found. I have included this exercise under listening, because poetry is essentially an oral exercise, but clearly it could be a test-type for reading comprehension too.

d. Oller makes much of dictation as a reliable test-type. For straight dictation, probably such occasional essays as J.B. Priestley's Delight, for instance, would be the most suitable.
A more difficult test, of the Dicto-Comp type, would be to read a section from a modern play, preferably a radio-play, and then to ask the student to fill in the missing dialogue, as well as can be remembered. The following brief example may serve as an illustration. This example deletes too much in too short a space, and is therefore more difficult than a real test; it merely gives the idea of the test-type. It comes from a prize-winning radio play by Jennifer Phillips, Daughters of Men. (vi)

KATE: Age. Photos, of course, you can have retouched.
ANNE: What?
KATE: Bahama had her photo taken every week.
ANNE: She told you?
KATE: No, Boy did. But he doesn't understand. You can't retouch the image in the mirror.
ANNE: But you have so much else to draw on, Kate, inner strengths.
KATE: And I shall be much freer.
ANNE: Oh, yes.
KATE: Without a child.
ANNE: But you won't be without.
KATE: Best to be prepared, isn't it? On the defensive. I always went round before exams at school saying I'd fail, didn't you? Saying I had to fail because I hadn't worked at it.
ANNE: The thought of exams struck me dumb.
KATE: Your driving test! Oh, I'll never forget that. And the lead up to it. And then I practically carried you there.

KATE: Age. Photos, of course, you can have retouched.
ANNE: What?
KATE: Bahama had her photo taken every week.
ANNE: ______
KATE: No, Boy did. But he doesn't understand. You can't retouch the image in the mirror.
ANNE: But you have so much else to draw on, Kate, inner strengths.
KATE: ______
ANNE: Oh, yes.
KATE: Without a child.
ANNE: But you won't be without.
KATE: ______ On the defensive. I always went round before exams
at school saying I'd fail, didn't you? Saying I had to fail because I hadn't worked at it.
ANNE: ______
KATE: Your driving test! Oh, I'll never forget that. And the lead up to it. And then I practically carried you there.

4. Speaking
a. Reading aloud. The difficulties of oral production tests are well outlined in J.B. Heaton's Writing English Language Tests. (vii) For testing appreciation of the meaning of a text, his recommendation of combining specific features with general fluency seems sound. For students of literature, the dimension of reproducing feeling or emotion in a text should also be tested, with some sort of dramatic reading. This kind of test will, of course, be affected by all sorts of extraneous factors, and these can hardly be overcome.

b. Conversation/Discussion. In ideal test conditions, with small numbers involved, the tester should be able, not so much to conduct a conversation with the student (this examiner/examinee situation can render real conversation pretty well impossible), but to observe two students conversing. With the importance which conversational dialogue has in both the novel and in drama, it is an area which should have a place in testing, but the problems of assessment are obvious. Probably as good a general test as any for literature students is to let them view a short play on video, after which the examiner asks them, pair by pair, to discuss the play while he sits in on the discussion. A library of suitable video plays can easily be built up.

c. Other suggestions could be short talks on the work/life and times of major literary figures; and here pictures or slides, useful in all sorts of ways in testing spoken English, could be assembled by the student, and the project assessed as a whole.

Conclusion
David Daiches claims that in literature students can achieve 'the fullest possible awareness of human relevance.' (viii) And an African writer to ELT Journal says: 'We have become so convinced that learning ESL means acquiring skills that the teaching of literature seems to have become of much less significance. ESL teachers do not seem to give much importance to the educational values of the literature they teach.' (ix) In a sense it is true of language teachers that, by their tests, you shall know them. This article has done no more than suggest a few test types which will show our students that we do respect the relevance of literature and that we do consider educational values in our teaching and our testing.

References
(i) Davies, A., Survey Articles on Language Testing, Language Teaching and Linguistics: Abstracts, Vol. 11, (1978).
(ii) Pound, Ezra, 'How to Read' (1928), Literary Essays, London, Faber & Faber, 1954, p. 21.
(iii) Power, H.W., 'Literature for Language Students: the Question of Value and Valuable Questions', Forum, XIX, 1, 1981.
(iv) Marckwardt, A.H., 'What Literature to Teach: Principles of Selection and Class Treatment', Forum, XIX, 1, 1981.
(v) Haynes, J., 'Polysemy and Association in Poetry', ELT Journal, XXX, 1, 1976.
(vi) Phillips, Jennifer, 'Daughters of Men', Best Radio Plays of 1978, Eyre Methuen.
(vii) Heaton, J.B., Writing English Language Tests, London, Longman, 1975, pp. 83-84.
(viii) Daiches, David, 'The Place of English in the Sussex Scheme', The Idea of a New University, Daiches, David (Ed.), MIT Press, 1970, p. 79.
(ix) Adeyanju, T.K., 'Teaching Literature and Human Values in ESL: Objectives and Selection', ELT Journal, XXXII, 2, 1978.

Patricia L McEldowney

A place for visuals in language testing

Introduction

If we consider the type of English that is currently used as a medium of education here and overseas and also as the medium of the day-to-day conduct of an English-speaking society, we find that, in both its spoken and written forms, verbal communication of information is commonly associated with the use of non-verbal communication devices. For instance, either in a text book or at a lecture, a geographical description of volcanos may be accompanied by photographs of typical examples, or by diagrams showing various types of core formation; an historical account of the Battle of Waterloo may be accompanied by a map summarising the progress of events; in biology, a tabulation summarising the characteristics of living organisms may be used to introduce a detailed discussion of each; in chemistry, a description of how to prepare and collect electrolytic gas may be accompanied by a diagram to show how to set up the relevant apparatus; or, in physics, a graph may be used to show how the volume of a fixed mass of water changes with temperature. Similarly, in the world outside the educational institution, a person asked to give a stranger directions to the Post Office may illustrate what he is saying with a rough sketch map; a set of instructions for operating a vacuum cleaner is usually accompanied by a set of diagrams to aid communication; a newspaper report of unrest in some part of the world may be illustrated by a map to pin-point the area and photographs of events; weather reports in newspapers or on television are commonly accompanied by weather maps; and so on.

This interdependence of verbal and non-verbal information devices would suggest that a test of the type of language involved might well not be complete without the inclusion of an element which will enable the candidate to demonstrate his familiarity with the typically associated range of non-verbal displays.

We note at this point, however, that the real-world relationship between the two types of information device must be altered if a valid language test is to be developed. In the examples cited above, the two stand side-by-side to supplement each other, the one rendering the other to some extent redundant. If our primary aim is to test linguistic behaviour, we must ensure that in a test of reading or listening comprehension, for instance, our candidates are responding to language rather than finding the meaning in accompanying visuals. This can be achieved by the separation of the two types of information device. So, for instance, candidates may read a text on the classification of different types of volcanic core and then be asked to label a set of blank diagrams as one type or another. Successful labelling would indicate understanding of the language of the text as well as demonstrating a familiarity with the type of diagram used.

It is the contention of this paper that visuals used in the way just described have a valuable role to play in the development of valid language tests, a role which extends far beyond a demonstration of familiarity with non-verbal information devices. Let us now examine this role in more detail.

Comprehension

It is obvious that to produce a valid test we must be able to identify what it is exactly that we are testing. Towards this end, we note that in any piece of English there are two types of information and that a consideration of the relationship between the two highlights some central aspects of the comprehension skill. In the following extract, for instance, items communicating content or real-world knowledge are italicised:

Of the perianth, the corolla is inside the calyx. This section of the flower ...

Now, though he may not 'know' the words perianth, corolla and calyx, the skilled reader can work out from the extract that they are parts of a flower. His main tool for doing this is the second type of information in the extract: the language information.

First of all, the marker 'the', together with its position in a preposition group, indicates that perianth refers to an object. Then, the linkers 'this' and 'of' associate it with the word flower. Similarly, corolla and calyx are marked as nouns by 'the' and

their relative sentence positions while the language item 'of' indicates that corolla and calyx are parts of the perianth. In addition, the language item 'inside' indicates the relative positions of the calyx and the corolla. In this way, language information signals the type of referent of each content item and also indicates the relationships between them.

We note here that to have known the information about the perianth, calyx and corolla as expressed in the extract would indicate the possession of learning. Not to have known but to have been able to find out in the way described above indicates the possession of a tool for learning. Seen in this way, therefore, the ability to use language to discover content seems to be a very basic comprehension skill worthy of testing.

How might we go about developing such a test? In light of the discussion above this can probably most clearly be illustrated with regard to a text like Globbes in which the content items are unknown to us.

Globbes
The four trug jigs of the globbe are the colls, the soils, the pals and the tals. They are in wongs, one inside the other. First, there are the colls in the centre with the soils around them. Outside the soils is the polnth. Where the polnth has two wongs, the jigs of the outer wong are the pals which tote the calyth. The jigs inside this are the tals toting the colnth.

In an attempt to test candidates' skills with regard to using their language knowledge to discover content, we might ask questions like:

Test Type A
(i) What are the four trug jigs of the globbe?
(ii) Where are the soils?
(iii) What totes the colnth?

It can be argued that success in finding the answers to such questions demonstrates some skill in using language knowledge. For instance, in (i), a solution which provides a noun shows a response of the correct sort to the question word 'what'; the provision of four nouns shows a response to the code item 'four'; the solution 'colls, soils, pals and tals' shows, in addition, an ability to match the general sentence structure of the question and the statement in the text. In (ii), the provision of a preposition group shows a response of the appropriate type to the question word 'where'; the candidate who provides the group 'around them' demonstrates further an awareness of general sentence structure; and the provision of 'around the colls' shows, in addition, an awareness of the function of 'them' to refer back to colls.

In this way then we may be able to find some evidence of grammatical skill. Can we be sure, however, that the correct response necessarily demonstrates anything more than mechanical manipulation?

We note that it is quite possible that correct responses are triggered by information in the question and a familiarity with a manipulative technique rather than being a demonstration of any real processing of the information in the text. For instance, test questions of the type illustrated in A will be no problem to candidates who have had classroom practice (oral or written) similar to:

Answer the questions.
Example: Is the rope around the parcel?
Yes, it's around the parcel.
Is the table in the corner?
Is the tree in the garden?

or Look at the picture and answer the questions.
Example: Where is the pond?
The pond is in the park.
Where is the bird?
Where is the tree?

or Make some questions.
Example: The two boys are in the tree.
Are the two boys in the tree?
The four cats are in the basket.
The three pencils are in the box.

Such practice constantly pairs question and statement forms so that, given one form, the correct response is to produce the other, and it is not clear from a comprehension test incorporating the same principle whether a candidate is capable of producing the appropriate answer if he does not have the relevant information supplied in the question. That is, it is not clear to what extent he really 'understands' the text.

In an attempt, therefore, to ensure a demonstration of some processing of the information we might develop:

Test Type B
(i) List the components of the globbe.
(ii) What surrounds the colls?
(iii) Describe the construction of the colnth.

In these questions a rephrasing of the concepts expressed in the text demands a greater 'understanding' from the candidate. So, in (i), for instance, the use of the synonym 'components' for 'jigs' eliminates a direct clue from the question. The

knowledge of the appropriate response to the instruction 'List' together with a response to the noun + s form ('components'), an awareness that the question is being asked about the globbe and a knowledge that, where 'globbe' occurs before 'are', the relevant information follows the verb are all factors that will help a successful candidate to produce 'colls, soils, pals and tals'. Thus, without a knowledge of the word 'components', a candidate who produces the appropriate response is more likely demonstrating a spontaneous use of his language knowledge than was the case in Test Type A.

We note, however, that if a candidate had known the meaning of 'components' and if he had known that it is a synonym for 'jigs', he would have had a content clue in the question to direct him to the relevant sentence in the text. Now, it is clear that though language information is the basic tool for finding content, the more content items a reader or listener has at his command, the more efficient is his comprehension. For instance, if in the first sentence of the Globbes text we had known two more of the content items:

The four trug parts of the flower are the colls ...

our comprehension task would have been much easier. That is, in real life the efficient reader uses a combination of language information and known content to discover unknown content. It would seem, therefore, that a clue in the question of the type illustrated by 'components' in B(i) and 'surrounds' in B(ii) is justifiable in a way that the one-to-one relationship illustrated in Test Type A is not.

We note now that B(iii) (Describe the colnth) goes even further than B(i) and B(ii) in eliminating clues from the question. The candidate must here, in response to the instruction 'Describe ...', gather several pieces of relevant information and put them together in his own form:

There are sometimes two wongs in the polnth. The outside one is the colnth. It is toted by the tals.

This, however, highlights a difficulty for the examiner that is inherent, to a lesser degree, in all of the other items illustrated in A and B. In B(iii), though we might agree on the relevant number of points, different candidates will express them in very different ways and examiners' responses to these will also be very different. This situation is likely to demand subjective judgements from individual examiners as to whether responses are correct or not. Moreover, the questions demand a verbal response. Many candidates may make grammatical mistakes, a further potentially subjective decision for the examiner. We can ignore such grammatical mistakes in our marking and so go some way towards isolating the comprehension skill. Even if we do this, however, we are not going far enough towards ensuring that poor productive skills do not hinder the demonstration of comprehension. Though certain candidates may 'understand' the text, their productive skills may be too weak to enable them to demonstrate even a small proportion of their understanding.

It can be seen, therefore, that the rephrased question type of B should be made objective. A common solution is that demonstrated below:

Test Type C
Tick the appropriate box.

(i) The main parts of the globbe are the:
[ ] wongs, soils, polnth, jigs
[ ] colls, soils, pals, tals
[ ] calyth, jigs, tals, colnth
[ ] pals, tals, calyth, colnth

(ii) In the globbe the soils surround the:
[ ] tals
[ ] colls
[ ] polnth
[ ] calyth

(iii) The colnth is formed by a circle of:
[ ] tals
[ ] pals
[ ] polnths
[ ] calyths

Though we have, in this way, allowed for objectivity, Test Type C embodies another problem also inherent in A and B. All three types of questioning are fragmentary.

On the whole, we read and listen for two main reasons. At times we wish to follow exactly what is being said. On other occasions we wish to find information that is incidental to the speaker or writer's purpose. In the latter case we might, for instance, skim through an outline of the events leading up to the sinking of Bismarck in World War II merely to find the names of the ships involved, ignoring the sequence of events. In this case the

skill of isolating fragmentary detail seems to be of relevance and it may well be that Test Type C is valid from this point of view.

It seems clear, however, that we also need to test whether candidates can follow a writer or speaker's intent. In this case, we require a demonstration of an awareness of the whole, some demonstration of how individual parts fit together. Let us now consider the passage Globbes from this point of view.

It seems that, in this description of the jigs of the globbe, the writer is concerned to show both a set of relationships of parts to the whole and a spatial arrangement.

In Test Type C items (i) and (iii) emphasise the first concept while (ii) deals with spatial arrangement. The same is true of A and B(i) and (ii). We note in B(iii), however, there is an attempt to broaden the question so that it covers both of the author's purposes. The constraints of objectivity already discussed, however, make it very difficult to construct global questions in the genre illustrated by C.

It is, however, possible to get closer to an awareness of the whole:

Test Type D
(i) Use the words in the box to complete the diagram.

calyth, colls, colnth, pals, polnth, soils, tals

[Diagram headed 'globbe', with the labels 'Inside' and 'Outside', to be completed by the candidate]

(ii) a. Complete the key. To do this use words from the box below.

calyth, colls, colnth, pals, polnth, soils, tals

[Diagram: circles labelled A, B, C and D, with a key to be completed]

(ii) b. Find suitable words for the circles labelled A, B, C and D in the diagram. Use words from the box above to complete the following. Write X if there is no suitable word.

Circle A =
Circle B =
Circle C =
Circle D =
Circles A + B =
Circles B + C =
Circles C + D =

We note, at this point, that the display for (i) is more abstract than that for (ii). We could offer a third alternative which would be even less abstract. We could provide a drawing of a flower and ask candidates to label the parts.

In this way, we can test an awareness of the whole as well as that of spatial arrangement at whatever level of abstractness seems appropriate to our particular candidates. Moreover, Test Type D does this while still maintaining the criteria of the elimination of overt clues from the question, of objectivity and of the elimination of verbal production.

We now note a further important advantage of Test Type D. With appropriate language skill it is possible to produce correct verbal responses to questions like those illustrated in C without there being any assurance of a real-world knowledge of the forms being used. For instance, we might read

John drew a rectangle, coloured it green and added a cross flunger the bottom trig corner.

and respond correctly with 'flunger the bottom trig corner' to the question

Where did John draw the cross?

We would, however, be at a loss if asked to draw John's diagram.

The move from verbal to non-verbal information illustrated in Test Type D thus gives the candidate the opportunity to demonstrate his ability to 'visualise' the relevant spatial relationship and so indicate a degree of real-world or content understanding.

Production

If comprehension can be defined as a means of using language and known content to discover new content either for one's own purpose or to mirror the author's purpose, production, in both the spoken and written modes, can be defined as the skill of using language and content information to fulfil a specific purpose. For instance, given a body of information about coracles, the production of instructions for making one, or a description of its appearance, or a classification of various types, or a narration of how one was used on a particular occasion will each require use of a cluster of different language items and a different organisation of the content information (see McEldowney, P., Test in English (Overseas): The position after ten years, Joint Matriculation Board, OP 36, September 1976).

If we wish to assess the tools of such expression, it is important, as indicated above in the discussion about the testing of comprehension, that we isolate the thing we wish to test in such a way that our impression of linguistic performance is not blurred by any extraneous factors.

Let us consider how this might be carried out.

Test Type E
(i) Describe how to make a coracle.
(ii) Describe what a coracle looks like.

Items like this, though directed towards a specific purpose, demand a prior knowledge of coracles. If we intend to test such knowledge, then these items might well be valid, perhaps in a local history or general studies paper. If, however, we intend to test language proficiency, a candidate who has no knowledge of coracles has nothing to write about and so cannot demonstrate his productive skills.

Does this mean our choice of topic is at fault? Can we, rather, choose topics previously prepared or topics known to be within the experience of our candidates? In either case we are asking candidates to depend on memory of content and cannot, in fact, be sure that a lucky 'question-spotter' has not learned an essay or speech off by heart. We have not, in fact, isolated the ability to use production tools from some closely associated assessment of content knowledge.

It would seem, from this point of view, that, as is implicit in the discussion of comprehension testing above, the choice of a topic which is largely unfamiliar to our candidates might well provide us with a better means of isolating the language tools we wish to test. This suggests that our test item needs to provide the basic information to be used in the production task.

We might do this by providing a text and asking questions of the type illustrated above in B(iii), or by asking candidates to write a precis of the text. Such tasks, however, place too great a reliance on comprehension skills and are no more valid, therefore, than Test Type E in isolating production skills. Moreover, they allow for the (verbatim)
copying of stretches of the original, and the organisation of the original more often than not provides the framework of organisation for the candidate to follow. This is not likely to allow the candidate to demonstrate spontaneously language and organisation skills appropriate to a specific task.

Valuable alternatives seem to be
(i) supply a set of construction diagrams together with the rubric:
Say how to make a coracle
or (ii) supply a picture of a coracle with the rubric:
Describe what a coracle looks like
or (iii) supply a sequence of pictures outlining a story with the rubric:
Tell the story of Old Joe and his coracle

It is the contention of this paper that, when the basic information is provided for candidates in a non-verbal form, they are more able to demonstrate their spontaneous use of language forms and their ability to organise information in a manner appropriate to the task indicated by the rubric and that they are able to do this with minimum reliance on verbal comprehension in such a way that they demonstrate their familiarity with non-verbal information devices.
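The objectivity claimed above for Test Types C and D can be illustrated with a small marking sketch in Python. It is offered only as an illustration under stated assumptions, not as the marking scheme of any actual examination. The key for C is read off the Globbes text. The key for the circles in D assumes that circle A is the innermost ring, which cannot be confirmed because the diagram itself is not reproduced here. All names in the code are hypothetical.

```python
def normalise(answer):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(answer.lower().split())

def mark(responses, key):
    """Award one mark for each item whose response matches the key."""
    return sum(
        1
        for item, expected in key.items()
        if normalise(responses.get(item, "")) == normalise(expected)
    )

# Key for Test Type C, as read from the Globbes text.
KEY_C = {
    "i": "colls, soils, pals, tals",
    "ii": "colls",
    "iii": "tals",
}

# Key for the circles in Test Type D (ii) b.
# ASSUMPTION: circle A is the innermost ring and D the outermost.
KEY_D = {"A": "colls", "B": "soils", "C": "tals", "D": "pals"}

# A hypothetical candidate: item (iii) and the two outer circles are wrong.
ticks = {"i": "colls, soils, pals, tals", "ii": "Colls", "iii": "pals"}
labels = {"A": "colls", "B": "soils", "C": "pals", "D": "tals"}

score_c = mark(ticks, KEY_C)
score_d = mark(labels, KEY_D)
print(score_c, score_d)
```

Because each response is compared with a fixed key, any two examiners applying the sketch reach the same mark, and the candidate's grammar and productive skill play no part in the result.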

Arthur Godman

Competence in English used for academic

subject examinations

Introduction son's mind. The structure of a sentence can thus

Nobody can write in a language without a content. The evaluation of written sentences in English depends on the reader assessing both the language and the content of a sentence. Unless the written material is subject specific, the content may be vague, or the concepts diffuse. The aim of this paper is to discuss competence in English using material which is extended to include communication in academic subjects. Final exemplification will be given from science subjects, as, in these subjects, the content of a sentence must conform with known concepts in relation.

1.1 Lexis
An academic subject has a linguistic register expressed in grammar and a restricted vocabulary. A sound knowledge of the lexis is the first requisite of competence in an academic subject. Each academic subject has its own lexis, and the same term can have different denotations in different subjects. It is thus important to ascertain whether a student associates the correct concept with a particular term when he meets it in a specific situation. In para. 5.2, question 1 contains the term 'blood pressure'. The term 'pressure' varies with the subject under discussion. In general speech it describes the concept of a force applied by means of an area, e.g. as in a trouser press. In politics it describes, these days, the lobbying of politicians and acts which endeavour to influence them, but the meaning is diffuse. In science it describes force per unit area, and force per unit area can be expressed in millimetres of mercury. When confronted with question 1, how will the student react to the term 'pressure'? In order to be competent in science, for example, he must know the precise definition.

1.2 Grammar
Competence in language depends on the understanding of the semantic implications of grammar. Firstly, an understanding of the morphological processes words can undergo is necessary. Secondly, the syntax of clause and sentence must be understood. A sentence is a statement corresponding to a state of affairs. The state of affairs consists of a set of concepts in relation to each other in a per-
be incorrect because the concepts are inadequately known, or the relation is inadequately formulated. In general speech, it is often difficult to distinguish between these two sources of error. Question 1 in para. 5.2 affords an example of the semantic implication of grammar in a science register. The passive voice is used to indicate measurement irrespective of the observer, hence the question requires an answer describing an accepted method. Answers 1b, c, d and e show that this has been understood.

1.3 Cohesion
Cohesion in a text necessitates several factors, mainly expressed in grammatical functions and by the use of adjuncts. As this paper puts forward a tentative suggestion for the analysis of sentences and not for the analysis of discourse, the subject is not discussed further. The evaluation of competence in the construction of sentences must be determined before discourse can be analysed.

2.1 The question
In most tests a question is presented to a student, who then supplies an answer. The first stage of this process involves the student understanding the question, and this, too, has to be evaluated. The student is required to understand the lexis and the implication of the grammar. These two interact to produce a semantic content. This interaction can be illustrated by question 1 in para. 5.2. The key lexemes are obviously 'measure', 'blood', 'pressure', with 'blood' and 'pressure' interacting. In several Oriental languages, the sentence would be reduced to 'How measure blood pressure?' and the answer reduced to 'Doctor measure blood pressure', indicating specialist knowledge is required and shifting responsibility on to an acceptable observer. In English, the answer would be, baldly stated, 'By a sphygmomanometer' (SPHYGMO: pulse; MANOMETER: a pressure measurer). This, too, really evades the question, but would be an acceptable answer. If, on the other hand, the question was 'Describe the measurement of blood pressure', then a fuller answer would be obligatory. The framing of a question, therefore, depends on the
Arthur Godman
answer which is required. Sufficient length is required in the answer to ensure that it is possible to detect the student's understanding of the question.

2.2 The answer
Having understood the question, does the student know the answer? Response can be at three levels: recall, application of knowledge, or solution of a problem. The answer thus depends on the student's conceptual knowledge and his ability to reason, with these two factors forming the limits of a spectrum of application.

2.3 Composing the written answer
The student must first marshal his concepts and choose suitable terms for such concepts. He must then select the necessary interrelationships between the terms and choose suitable syntax and morphological structures to express the interrelationship. The reader of the answer, for perfect communication, should have complete congruity of concepts with the writer. The sum total of the answer includes not only the lexis and the individual items of grammar, but also their interaction. The whole of the answer is thus greater than the sum of the parts, and this provides a semantic content which is in addition to lexis and grammar.

3.1 Difficulties encountered in evaluation
Objective-type questions produce an objective score; they test passive as well as active vocabulary, but they do not test the student's ability to express himself. Structured questions can be used to test active vocabulary alone, and also to test sentence structure. Structured questions do involve a degree of subjectivity in scoring. Essay-type questions always test all stages of sentence composition, but are highly subjective for scoring. Cloze tests examine lexis and syntax to some degree, but tend to test passive knowledge rather than active application.

3.2
The brief summary of basic testing procedures outlined in para. 3.1 points to structured questions as the best compromise in testing the basic elements of language, with essay-type questions as the sole means of testing connected and logical discourse. The structured questions must be of a type to produce a positive response. The set of questions in para. 5.2 indicates the degrees of success that can be obtained in eliciting a positive response.

4.1 Evaluation
The concept in an answer must be capable of being marked objectively, and then the interrelationships of the concepts, as expressed in language, can be evaluated with some degree of success. Questions should preferably be selected from academic subjects, particularly where the lexis is defined accurately. The lexis in a sentence in an answer can then be examined to see whether it is correct, bears some relation to the question, or is completely irrelevant. For example, in answer 1a of para. 5.2, apart from the repetition of the question, the lexis is mainly irrelevant. An answer can be evaluated using different criteria. Is the scientific accuracy of an answer (in para. 5.2) being evaluated or is it the student's competence in language? Both evaluations will contain an element of the other, and hence different methods of scoring the evaluation could be produced. If competence in language is to be evaluated, then it is first necessary to ascertain the student's knowledge of scientific concepts in the semantic area to be examined. For example, in question 1 of para. 5.2, does the student have any knowledge about blood pressure and its measurement? Such knowledge can be pretested by simple recall, using objective-type questions. This reduces the possibility of incorrect lexis through ignorance if the subject is shown to be known to the student.

4.2 Subjective evaluation
Evaluation of language competence needs quantitative measurement. Looking at the range of answers in para. 5.2, a quantitative evaluation would seem extremely difficult. Yet there is a parallel in the evaluation of oral English in which a subjective rating is made of different qualities. The factors to be measured in evaluating competence in written language are (a) lexis, (b) syntax, (c) morphology and (d) semantic content. Factors (b) and (c) comprise grammar, (a) and (d) comprise the content of communication. Competence in content is more important than competence in grammar as the following two sentences illustrate.

(A) Hookworm penetrate the not wearing shoes feet.
(B) He is a male with normal vision because his mother has a sex-linked characteristic.

In (A) communication is excellent but the grammar is bad. In (B) the grammar is good, and even the lexis is correct, but the communication is bad. In (A) the student knows the correct state of affairs in relation to the question. In (B) the student has no idea of the state of affairs in relation to the question.
On the basis of the facts outlined so far, it is suggested that 60% of a score should be given to
content and 40% to grammar. The mark for grammar can be split equally, 20% each for morphology and syntax. The mark for content has to be split between lexis and semantic competence in indicating interrelationships. Three schemes are suggested for experiment. Lexis is considered for focal lexemes in a sentence. Syntax is considered for logical order, and the use of prepositions. Morphology is considered for tense, aspect, voice, and agreement of verbs, together with correct paradigms for other terms. Semantic content is considered from the point of view of whether the sentence is correct in its interrelationships and whether communication is adequate. These factors can be measured only on a subjective scale, and it is suggested that a total score of 10 be allotted to a sentence. The distribution of scores for the factors in the three schemes is:

                   Scheme A   Scheme B   Scheme C
Lexis                  2          3          4
Syntax                 2          2          2
Morphology             2          2          2
Semantic content       4          3          2

Scheme A maximises semantic content, perhaps more suitable for evaluation of an academic subject. Scheme C minimises semantic content, perhaps more suitable for evaluating language competence.

5.1
The questions and answers in para. 5.2 have been selected to show average performance by overseas students. With a mark out of 2, 3 or 4, and eliminating half-marks, a simple subjective scale is formed for each factor. By examining ten sentences, a subjective pattern of a student's performance can be ascertained. By examining the average score for a class, each factor can be evaluated to see which of them is weak or strong in sentence structure. Evaluations of the questions in para. 5.2 are given in para. 6.2.

5.2 Questions and answers in science examinations

Q.1 How is the blood pressure of a man measured?
Answers:
a. The blood pressure of a man measured is cooler than the woman's blood. Because the blood in the man body is very little than the woman's blood.
b. The blood pressure of a man measured is by a special of measurement which is put along muscle of our arm.
c. The blood pressure of a man measured is by using the checking of the diabetes disease instruments.
d. The blood pressure of a man measure by tieing a place of cloth like thing to the person arm and plump it where it is tie related to a thermatore in the other box where it is show.
e. It is measured by a pumping artificial organ controlled by a device. Usually it is green in colour.
f. A man measured of the blood pressure is 37°C.

Q.2 Why may it be dangerous to use human urine as a fertilizer?
Answers:
a. Because it contain salts.
b. Human urine contain large of salts this make the plant death.
c. When a human being eating the plant (vegetables) with human urine that person will easily be injected by a disease.
d. Because harmful germs are presented in urine, and it has not been properly washed before cooking it may transmit disease.
e. It may be dangerous because if that person may have any infectious disease.
f. Because in human urine it consists of mineral salts.

Q.3 Why does Table 2 (No. female mosquitoes trapped/time of day) give results for female mosquitoes only?
(Note: diseases are spread only by female mosquitoes.)
Answers:
a. Female gives more mosquitoes and more son.
b. Because the mosquito landed on human beings to take their meals.

Q.4 Why does a farmer use a nitrogen fertilizer for the cereal crop under these conditions?
Answers:
a. Because they need fertile soil to growth.
b. Can grow better after long time, the long the time is the more the crop grow.
c. To enable the cereal crop for consumption.

Q.5 Why did the water not rise to the 600 cm³ mark?
(Note: 25 cm³ of air in 100 cm³ of soil added to 500 cm³ water in a measuring cylinder.)
Answers:
a. Because the water fill up the soil hole.
b. Because the soil completely press due to the water which are shake.

Q.6 What would be a suitable precaution to be taken by most people to avoid being bitten by this species of mosquito?
(Note: this species bites in the middle of the night.)
Answers:
a. Throw unwanted can which contain water.
b. Most people would use mosquito nets before they sleep.
c. By used a blanket to cover before went to bed.

Q.7 Explain why the information given by the graph and histogram agrees with a suggestion that female mosquitoes lay their eggs before seeking a blood meal.
Answer:
Because the time lowest number of mosquitoes are trapped has the highest number of eggs laid and verse-visa.

Q.8 What is the effect of a nitrogen fertilizer on the root crop under these conditions?
(Note: progressive use decreased crop yield.)
Answers:
a. The root crop does not take nitrogen fertilizer and after six years it has drops to -10%.
b. The root crops under a nitrogen fertilizer will grow fewer than the phosphorus fertilizer.
c. It yield is very low and worst some times.

Q.9 What is the function of a mesentery?
Answer:
It flows the blood to the arms.

Q.10 Why does a female mosquito need a blood meal from a human being or other mammal?
Answer:
A female mosquito rely human being for blood for reproduction.

The candidates answering these questions were overseas students and had been exposed to eleven years' teaching of English language.

6.1
In para. 4.1 it was stated that the students' knowledge of related concepts should be tested by objective-type tests. The following test questions illustrate how this is envisaged.

Q.1 In what units is the blood pressure of a man measured?
a. calories
b. degrees Celsius
c. millimetres of mercury
d. newtons per square metre

This objective-type question tests the background knowledge required for question 1 in para. 5.2. The answers from such a question would show that answers 1a and 1f in para. 5.2 could be anticipated, as these students would have chosen answer b. above.

Q.2 What danger is there in eating unwashed vegetables?
a. The disease of scurvy may be spread.
b. The high salt content can increase blood pressure.
c. Intestinal diseases may be spread.
d. The dirt on the vegetables can cause gangrene.

The choice of answer would show whether pupils are aware of the method of transmission of intestinal diseases, knowledge necessary to formulate a correct response to question 2.

6.2
Using Scheme C, the answers to questions 1 and 2 were evaluated, and the average score found. Results were:

                    Question 1   Question 2
Lexis                  2.0          2.5
Syntactical order      1.0          1.2
Morphology             1.1          1.0
Semantic content       0.4          0.7
Total                  4.5          5.4

Maximum                10           10
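The scoring arrangement of paras. 4.2 and 5.1 is simple enough to set out as a short calculation. The following is only a sketch in Python: the factor names, the function names and the example marks are assumptions introduced for illustration and do not come from the paper. It holds the maximum marks of Schemes A, B and C, checks that a marker has given whole-number marks within those maxima, totals a sentence out of 10, and confirms that the averages reported above add up to the stated totals.

```python
# Maximum marks per factor under the three experimental schemes of para. 4.2.
SCHEMES = {
    "A": {"lexis": 2, "syntax": 2, "morphology": 2, "semantic_content": 4},
    "B": {"lexis": 3, "syntax": 2, "morphology": 2, "semantic_content": 3},
    "C": {"lexis": 4, "syntax": 2, "morphology": 2, "semantic_content": 2},
}


def score_sentence(marks, scheme):
    """Validate whole-number marks against a scheme and return the total out of 10."""
    maxima = SCHEMES[scheme]
    for factor, maximum in maxima.items():
        mark = marks[factor]
        # Para. 5.1 eliminates half-marks, so only whole numbers are accepted.
        if not isinstance(mark, int) or not 0 <= mark <= maximum:
            raise ValueError(f"{factor}: {mark!r} is not a whole mark in 0..{maximum}")
    return sum(marks[factor] for factor in maxima)


def factor_averages(marked_sentences):
    """Average each factor over a list of marked sentences (one student or a class)."""
    count = len(marked_sentences)
    factors = marked_sentences[0].keys()
    return {f: sum(m[f] for m in marked_sentences) / count for f in factors}


# Invented marks for illustration only; they are not taken from the paper.
example = {"lexis": 3, "syntax": 1, "morphology": 1, "semantic_content": 2}
print(score_sentence(example, "C"))  # total out of 10 under Scheme C

# The averages reported in para. 6.2 should add up to the stated totals.
reported = {
    "Question 1": {"lexis": 2.0, "syntax": 1.0, "morphology": 1.1, "semantic_content": 0.4},
    "Question 2": {"lexis": 2.5, "syntax": 1.2, "morphology": 1.0, "semantic_content": 0.7},
}
for question, averages in reported.items():
    print(question, round(sum(averages.values()), 1))
```

Keeping the four factors separate, rather than recording only the total, is what allows the per-factor averages for a student or a class to be compared, as para. 5.1 proposes.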

The small number of items does not permit any useful analysis to be made. With more students, each contributing ten sentences, a qualitative evaluation of basic weaknesses in the four areas under investigation could be made for the entire set of students. This information would be in addition to the evaluation of the efforts of individual students.

6.3
Any academic subject could be used for the purpose of evaluation of language competence. Certain restrictions arise, however, in the evaluation of language for overseas students or non-native speakers of English. The questions must be culture-free, which eliminates English literature, particularly for non-Indo-European students. For native English speakers, a wide range of academic subjects would be most suitable, as this follows the advice of Halliday et al. (i), in which they stated that the teaching of English should not be limited to an exclusively arts subject.

(i) Halliday, M.A.K., McIntosh, Angus, Strevens, Peter, The Linguistic Sciences and Language Teaching, Longmans, London, 1964.

AEG Pilliner

Evaluation

To evaluate is to make a judgement of the worth or value of something. The dictionary definition is useful in highlighting the subjective nature of the evaluative process. Different evaluators will not necessarily arrive at similar judgements of the same educational programme. One may endorse a mathematical syllabus because it produces high levels of concept mastery. Another may retort that children who have been exposed to it still cannot add, subtract, multiply or divide. A foreign language course may be commended for the command of vocabulary and control of structures it offers to students. It may be open to criticism if it affords them few opportunities to develop communicative skills. Much depends on the values the evaluator brings to bear in arriving at his/her judgements.
We begin by asking three questions. What are we to evaluate? How do we set about it? When do we do it? What we are to evaluate (to judge the value or worth of) is an educational programme or project defined, with Astin and Panos, as 'Any ongoing educational activity which is designed to produce specified changes in the behaviour of the individuals exposed to it'. (i)
Traditionally, educational evaluation has been identified with curriculum evaluation. The definition proposed above is both broader and narrower. Examples of educational programmes are: a single classroom lesson; a visit to a museum or factory; a particular method of instruction; the content of a particular text-book; a remedial programme; the environment in which learning occurs; the study of parental attitudes to the education of their children; the re-organisation of a school system.
Clearly, a massive programme such as the last mentioned above will comprise a whole range of smaller and different programmes or sub-programmes, designed to modify people's behaviour in different ways; and since the programmes are different, the methods used to evaluate them will also be different.
This brings us to our second question. How do we evaluate? We start from the premise that evaluation involves the collection of information about the impact of the educational programme. How should we collect this information? What tools are available? Which are appropriate in evaluating which programmes or sub-programmes?
There is no doubt that for many educators and researchers the task of evaluating educational programmes is associated mainly with the construction and administration of achievement tests. It goes without saying that the assessment of student achievement or progress is an essential component of the evaluation process and that well-constructed achievement tests can contribute importantly to such evaluation. In this context, evaluators need to consider which of the two test styles, norm-referenced or criterion-referenced, is the more suitable for their purpose. Unless that purpose is to rank students (which is scarcely an educational objective), the evaluator will normally opt for criterion-referencing procedures.
To repeat, assessment of achievement is an important element in the evaluative process. But it is only one such element. Others may have an equal claim to importance: attitude scales, questionnaires, probes of opinions of students, parents, teachers, communities; and explorations of other non-cognitive aspects. All or some of these several modes of obtaining information may need to be deployed in the global task of evaluation. Otherwise, the danger exists of painting an incomplete or even a distorted picture if the search for information is restricted to those aspects of the educational programme which are more readily measurable at the expense of those less so. In technical terms, validity may be sacrificed to reliability. Stake makes the point cogently: 'It is a great misfortune that the best-trained evaluators have been looking at education with a microscope, rather than with a panoramic viewfinder'. (ii)
The third question was concerned with when evaluation should take place. In an important seminal article Scriven (iii) has distinguished between evaluation occurring during the educational programme (he calls it formative evaluation) and evaluation deferred until its conclusion (summative evaluation). Broadly speaking, the distinction he makes is between 'How are we doing?' and 'How did we do?' More specifically, formative evaluation refers to data emerging on taking stock at some intermediate stage, leading probably to slight modification or possibly even to substantial design of subsequent procedures. Summative evaluation, on the other hand, refers to an evaluation of the effectiveness or success of the programme as a whole after it has been completed. Particularly with extensive programmes, both formative and summative styles are essential. It would be unrealistic to suppose that no change need ever be made from initial plans. The programme would be pointless if no-one were concerned to establish its overall and final effectiveness.
The roles of the formative and summative evaluator are in strong contrast. Though both are concerned in making judgements, their standpoints are very different. The essential thing in formative evaluation is close cooperation between evaluator and programme developer, interplay and involvement in smoothing out difficulties as they occur and in maintaining momentum. The essential thing in summative evaluation is total independence on the part of the evaluator, disinterest and uninvolvement and commitment only to dispassionate analysis and reporting.
Let us sum up so far. Evaluation was defined as judging worth. We have discussed what is to be evaluated: an educational programme defined as an on-going activity designed to modify people's behaviour in desirable ways. We have noted some (but by no means all) of the tools available for evaluating a programme. We have drawn a distinction between formative and summative evaluation, the first occurring during the operation of the programme, the second at its conclusion.
So much for 'what', 'how' and 'when'. There remains the question 'Why evaluate?'
The fundamental purpose of evaluation is to produce information and use it to make decisions about an educational programme. The operative word is decisions, stressed here in order to bring out the distinction between, on the one hand, educational evaluation used in making decisions which may directly affect the futures of many people, and, on the other hand, educational experimentations aimed at extending the boundaries of knowledge but without special regard to its immediate practical utility.
Evaluation, then, is about decision-making. A decision might be to continue an existing programme, to terminate it, or perhaps to modify it. Or it might be to develop a new programme with a view to possible adoption.
In principle, the process of decision-making should go something like this. First, the programme planners should specify some educational objective or set of objectives and in due course devise and implement some means of accomplishing these objectives. Second, the evaluator should bring to bear whatever tools are deemed appropriate to assess the extent to which these objectives have in fact been accomplished. Third, decision-making should occur either during (formative) or at the conclusion (summative) of the programme.
Let us look a little more closely at what is involved here. In regard to the first point, the planner assumes an implicit causal relationship between the stated objectives and the means proposed to promote them. In regard to the second point, the evaluator assumes that the tools used in assessment are valid indicators of the extent to which this causal relationship exists and the objectives are achieved.
In practice, neither of these assumptions is necessarily valid. On the planning side, the provision of a computer in every classroom will not necessarily lead to an enhanced grasp of mathematical concepts or better mathematics learning in general, though it may do so. The installation of a well-equipped language laboratory may or may not lead to an improvement in students' language performance. Again, it is part of the folk-lore that a more favourable staff-student ratio will improve the quality of school education. It may, or it may not. On the evaluation side, it might be concluded, as a result of applying achievement tests alone, that education has failed to benefit from the provision of a new area school when there are in fact handsome dividends in the way of improved relationships between community and school staffs which other assessment techniques might have brought to light.
Let us summarise again. An evaluation procedure has three aspects: an educational programme in which there is an assumed causal relationship between the stated objectives and the means proposed to achieve them; an accumulation of relevant information about the extent to which the objectives are achieved by these means; and the use of this information to reach a decision about how best to operate the programme in the future.
An educational programme comprises three components which for evaluation purposes it is useful to keep conceptually distinct. These are: inputs, process and outputs.
Inputs, sometimes called antecedents, include the talents, skills and other potentials for growth and learning that the students bring with them to the educational programme. They also include the characteristics of the students' families and of the culture in which they live. The child who comes from a family or culture which values educational achievement is more likely to benefit from school than one less fortunate in this respect. Regional or cultural differences in input may give rise to


quite different outcomes even with the same programme.
Process, sometimes called operations, includes those characteristics of the educational programme itself which affect, or could affect, the outcome. Process includes curricula, experimental treatments, learning strategies, instructional techniques, teacher styles, educational interventions, environmental experiences: in short, the whole range of environmental variables that characterise the educational programme, the means by which the educational ends are to be achieved.
Outputs are the ends or objectives of the programme, otherwise referred to as criteria, outcomes, goals, achievements or dependent variables. They are sometimes expressed at a high level of abstraction (for example, the development of critical thinking). The trouble with such outcomes, desirable though they are, is the practical difficulty of assessing the extent to which they are achieved. Evaluation is likely to be more efficient if the outcomes are capable of more specific statement: pupil achievement, knowledge, skills, attitudes, aptitude for future learning, inter-personal relationships. Such outcomes are more readily assessed using currently available instruments: achievement tests, attitude scales, questionnaires, interviews and the like.
These are strictly pupil-oriented outcomes, needing little justification. But there are other outcomes best described as intermediate: a reduction in operational cost, recruitment of highly qualified staff. These tend, only too easily, to become regarded as ends in themselves. There are two reasons for this. First, they are more readily specified. Second, their achievement is more easily measured. It is easier to demonstrate a per pupil reduction in expenditure than to monitor the possibly unfavourable consequences for the pupils. Administrators may proudly announce an increase in the proportion of graduate teachers. They have yet to show that pupils' development has improved in consequence.
Also to be taken into account are unintended outcomes or 'side-effects'. For instance, loss of identity with family or community is perhaps too high a price to pay for high academic achievement. Again, class grouping by ability, while enabling the brighter children to achieve their potential, may discourage those in lower groups to the extent that their performance is uncharacteristically poor. On the other hand, mixed ability grouping on egalitarian grounds may hold back the brighter child. The conclusions drawn from an evaluative study may be incomplete or even misleading unless the possibility of such unintended outcomes is taken into account.
To summarise once more: an educational programme has three components. First, an input: a condition existing at the start, the status of the student (his/her aptitude, previous experience, interest, willingness). Second, process: encounters of student with teacher, student with student, student with environment, the succession of engagements which the educational process comprises. Third, outputs: student achievements, attitudes, aspirations, resulting from the educational experience: the consequences of education, immediate and long-range, cognitive and affective, personal and community wide.
Analysing the programme in this way helps evaluators to pay due regard to each of these three components. They have a dual role to play: first, through accumulating information about the programme in all its aspects, they must provide a full description of it; and secondly, on the basis of this description, they must arrive at judgements on the programme in order to reach decisions about it: whether to recommend its continuation, modification or abandonment.
Robert Stake (ii) has proposed a model which brings together all of these aspects of evaluation. The diagram shows a layout of statements and data to be completed by the evaluator.

                 Description Matrix                    Judgement Matrix
                 Intents   Observations                Standards   Judgements
Objectives          1           4         Inputs           7           10
                    2           5         Process          8           11
                    3           6         Outputs          9           12
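The cell numbering in this layout follows a regular pattern, which can be written down as a small lookup. What follows is a minimal sketch in Python, not part of Stake's model: the function names and the dictionary are assumptions made for illustration. Cells are numbered down each column in turn, so a row and a column together fix the cell number, and a cell number fixes its row and column.

```python
# Rows and columns of the layout, in the order shown in the diagram.
ROWS = ("Inputs", "Process", "Outputs")
COLUMNS = ("Intents", "Observations", "Standards", "Judgements")

# Which of the two matrices each column belongs to.
MATRIX_OF = {
    "Intents": "Description",
    "Observations": "Description",
    "Standards": "Judgement",
    "Judgements": "Judgement",
}


def cell_number(row, column):
    """Cells are numbered down each column: Intents 1-3, Observations 4-6, and so on."""
    return COLUMNS.index(column) * len(ROWS) + ROWS.index(row) + 1


def cell_position(number):
    """Return the (row, column) pair for a cell number between 1 and 12."""
    if not 1 <= number <= len(ROWS) * len(COLUMNS):
        raise ValueError(f"no such cell: {number}")
    column_index, row_index = divmod(number - 1, len(ROWS))
    return ROWS[row_index], COLUMNS[column_index]


# Print the grid row by row, reproducing the numbering of the diagram.
for row in ROWS:
    print(row, [cell_number(row, column) for column in COLUMNS])
```

Reading across one row of this grid gives the four statements an evaluator has to supply for a single component of the programme.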


Inputs, Process and Outputs, the components of the programme already discussed, have their place in both matrices, Description and Judgement. The Description matrix is further divided into Intents and Observations; and the Judgement matrix into Standards and Judgements. Each matrix thus contains six cells.
The first column in the Description matrix is a declaration of the educational programmer's intent, a statement of the programme as originally planned so as to achieve the global objectives specified in the box on the left. Cell 1, input, describes the students to be included, their number and distribution, their prior achievements, their backgrounds, their environments and any other information about them he/she considers should be seen as Input. Cell 2, process, indicates the processes he/she intends to operate with these students: the special teaching he/she hopes they will receive, the new equipment, materials and text-books he/she hopes will be available to assist this special teaching: in short, the whole range of processes he/she hopes to engage the students in so as to achieve the output hopefully specified in Cell 3.
The chief concern of the evaluator with the Intents column will be the logical relationships vertically displayed in this column. Are the intended processes specified in Cell 2 logical in the light of the intended input in Cell 1? That is, are the lessons, learning experiences, equipment, etc. specified in Cell 2 appropriate for the students described in Cell 1? Moreover, is it logical to expect the outputs specified in Cell 3 if the operations listed in Cell 2 are conducted with the students described in Cell 1?
Still in the Description matrix we move from the hopeful statements of the Intents column to the harsh realities of the Observations column, which is a statement of what, in the event, actually happened. Horizontal comparison of Cells 4, 5 and 6 with the corresponding Cells 1, 2 and 3 will indicate the extent to which original intentions were or were not achieved in input, process and output. Cells 4 to 6 should indicate not only the extent of these short-falls but also the modifications and adaptations of the original plan these short-falls made necessary.
In making these horizontal comparisons, Cells 1 and 4, 2 and 5, 3 and 6, the evaluator should have in mind these questions. How far adrift is actuality from intention? How different from those originally intended are the inputs, processes and outputs that actually occurred? Has the programme been so materially altered as to be no longer capable of providing answers to the questions originally asked?
In summary, the descriptive aspects of the evaluator's task are:
First, to assess the extent to which the educational programme as analysed in the Intents column reflects the basic educational purpose stated in the Objectives box.
Second, to assess the extent to which the three Intents aspects are logically connected, that is, the extent to which the intended programme makes sense.
Third, to describe the extent to which intended inputs, process and outputs correspond to what actually happened.
We now turn to the Judgement matrix.
First, the Standards column. Its purpose is to indicate acceptable levels or standards for inputs, process and outputs. What is acceptable is partly a matter of experience and partly one of judgement. The declaration of intent in Cells 1-3 of the Description matrix is translated, in this Standards column, into a statement of what the evaluator is prepared to accept. However carefully planned the original programme may have been, it is unlikely that all contingencies will have been foreseen and that no problems will be encountered in practice. The Standards column is a statement of the extent to which the evaluator is prepared to settle for less than perfection, always provided that the success of the programme is not materially prejudiced by this degree of tolerance. In the inputs cell (7), a limited departure from the complete randomisation of student input envisaged in the corresponding cell in the intents column (1) may not be disastrous. In the process cell (8) a slight fall below the specified teacher-student ratio (2) may be tolerated. In the output cell (9) 85 per cent of students achieving mastery in a criterion-referenced achievement test instead of the 90 per cent hopefully specified in the intents column (3) is not to be despised. In short, the standards column is a realistic statement from the evaluator of the several criteria by which the educational programme's success or failure is to be judged.
Finally, Judgements in the last column are based on the degree of matching between Observation entries in the Description matrix and Standards entries in the Judgement matrix. The procedure is first, to compare, then, to judge. To what extent does the actual course of events, as recorded in the Observation column cells, measure up to the criteria supplied by the corresponding Standards cells? Against all the odds, maybe, some, though not all, of the parents in a rural community have been persuaded that their daughters would benefit from formal education. A hostel is built to accommodate women teachers. Potential success or cer-

100

am —^t^—^^^mm~m
AEG Pil liner

tain failure of this educational programme will marily descriptive, primarily judgemental, or both?
depend on whether women teachers can be per- Is it to emphasise input conditions, processes or

suaded to live and work within the community. outputs alone, or a combination of all three, and
The word 'criteria' has just been used in the their logical connections? Is it to be concerned
context of comparisons between Observations and with the degree of correspondence between what
Standards. We are becoming increasingly familiar is intended and what occurs? In seeking answers to

with the notion of criterion-referenced testing and questions such as these, the evaluator may hope to
the underlying concepts. It is suggested that an keep all his options in mind and to establish prior-
extension of the notion of criterion-referencing be ities among them.
made to the present wider context. It may be help-
ful to think of the comparison between corres-
ponding cells in the Observations and Standards References
columns as criterion-referenced. (0 Astin, A.W., and Panos, R.J., 'The Evaluation of
Educational Programmes'. In Educational Measure-
It is not to be expected that every evaluation
ment, (Ed. Thorndyke, R.L.), American Council of
plan will take account of every aspect of the Stake Education, Washington D.C., 1971, pp. 733-751.
model. The point is that this analysis indicates («) Stake, R.E., The Countenance of Educational Evalu-
twelve sub-areas within which and among which ation, Teachers College Record, 1967, Vol. 68,
pp. 523-40.
evaluation can take place. Emphasis will vary from
(Hi) Scriven, M., 'The Methodology of Evaluation'. In
one educational programme to another. The evalu- Perspectives of Curriculum Evaluation: AERA
ator must clarify his responsibility by answering monograph series on curriculum evaluation,
questions such as these: is the evaluation to be pri- Chicago, Rang-McNally, 1967, pp. 39-83.

Frank Chaplen

Measuring student achievement: some practical considerations

1. Introduction

This article summarises some of our experience gathered over the past 5 years with the ESP programmes for premedical and paramedical students in Kuwait University. In our teaching situation, the yearly intake of each group of students is divided into 5 classes taught by different teachers, each class following the same weekly teaching/study programme as the others, and each class taking the same tests and examinations as the others. Thus, each semester in the 4-semester premedical and paramedical English programme each premedical student and each paramedical student is assessed on virtually the same scale of achievement as every other premedical and paramedical student. To achieve this requires a rather more complex and systematized approach to evaluation than is necessary in some teaching situations. Nevertheless, our experience should be of interest to any teacher or administrator who is responsible for developing assessment procedures.

2. Setting Target Dates for Tests and Examinations

For several reasons the dates for examinations need to be fixed far in advance. Teachers prefer this, students demand it, and administrators can be extremely unsympathetic if you try to give them only one week's warning of the fact that you need a large examination room with film projection or video facilities from 8.00 to 10.00 a.m. In our experience, a host of problems can be alleviated if a list of target dates and responsibilities, such as that in Table 2.1, is routinely prepared at the beginning of each course. It reduces arguments later if this is prepared during a meeting of all the teachers involved in the course.

3. Weighting the different parts of the assessment component

The term 'assessment component' is intended to encompass all the forms of assessment on the basis of which a student's final grade for a course is decided. In our teaching situation, these include the following:

(a) teacher's assessment based on a student's achievement in class activities (oral and written), in delivering a prepared talk, in homework, etc.
(b) tests and examinations.

For the first 15-week course in our programme the weighting of the different components is as follows:

Test 1 (after 30 class hours)               10%
Mid-semester Exam (after 70 class hours)    20%
Test 2 (after 110 class hours)              20%
Final Examination (after 150 class hours)   40%
Teacher's assessment                        10%
Total                                      100%

The difference in the weighting of Test 1 (10%) and the Final Examination (40%) is intended to take account of two facts. First, the students come straight from secondary school, so few of them know what is expected of them at university for at least the first several weeks. Second, the comparatively heavy weighting of the Final enables slow starters to compensate for low achievement earlier in the course.

The teacher's assessment is intended to provide students with some incentive for working continuously both in and out of class throughout the course. It also provides the teacher with an opportunity to evaluate elements of the course which it is difficult to measure in a formal test or exam, e.g. oral communication. In earlier years, we gave a weighting of 20% to the teacher's assessment, but this tended to have an adverse effect: the weaker students copied their homework assignments from those written by the more proficient students (usually in other classes so that a direct check by the teacher was impossible). Since this defeated the purpose of the teacher's assessment element, the weighting was dropped to 10%. This seems sufficient to persuade the weaker students to work reasonably consistently while not being enough to encourage them to go to the trouble of copying the work of more proficient students.
In the second, third and fourth courses, the teacher's assessment element is increased to 20% because by this time all but a few students recognise that they will make little progress except through their own unaided efforts.

The number of marks for each element in the assessment component will vary, of course. The maximum mark for Test 1 might be 115, that for Test 2, 95, and that for the Final, 135. Therefore, to obtain the desired weighting, the marks obtained by students in each element must be converted.

Let us assume that the assessment component of a course consists of the 3 elements listed in Table 3.1, and that their maximum marks and desired weightings are as entered in columns 2 and 3 (see the tables at the end of this article).

To convert Test 1 marks to the required weighting (20%), multiply each student's mark by 20, then divide the result by 115. To convert Test 2 marks, multiply by 30, then divide by 95. To convert Final Exam marks, multiply by 50, then divide by 135.

The task of calculating these conversions (i) is considerably lightened if a class mark grid like the one in Table 3.2 is constructed at the beginning of each course. This grid also simplifies record keeping, and makes it relatively easy for a second person to check each teacher's calculations. A cheap electronic calculator is an essential tool in these operations.

4. Deciding a student's final grade for a course

End-of-course results are rarely expressed as percentages or raw marks because these are not very meaningful. A mark of 80%, for example, may represent an outstanding achievement in one course, but only an average achievement in another. For this reason, course results are commonly reported on a letter scale: a grade of A representing an outstanding achievement, B representing an excellent achievement, etc.

In our teaching situation, we are required to report course grades on the following 10-point letter scale:

A, A-    OUTSTANDING
B+       EXCELLENT
B        GOOD
B-       QUITE GOOD
C+, C    CLEAR PASS
D+       BORDERLINE PASS
D        BORDERLINE FAIL
F        FAIL

Consequently, when the students' total marks for a course have been calculated (column G in Table 3.2), these have to be converted to letter grades.

When only one teacher is involved in deciding which students should receive A's, which should receive B's, etc., the conversion of marks to letter grades is a relatively painless process. However, when 4 or more teachers are involved, each responsible for 15 or more of the total number of students, decisions are far more difficult to make. Most teachers identify very strongly with their students, and would like to see the majority receive high grades; but this is not always possible, particularly if the students are assigned to classes on the basis of placement test results in order to produce fairly homogeneous groups. The procedure that we have evolved over the past 5 years to decide the students' final grades seems to satisfy both teachers and students.

The first step in this procedure is to prepare a distribution of final marks for each class (columns 1, 2, 3 and 4 in Table 4.1), and for the entire intake (column 5). Column 6 contains the cumulative frequency of final marks, e.g. 43 students scored 73% or above, 56 scored 61% or above. Column 7 contains the cumulative frequency percentage of the final marks, e.g. 66.2% of the 65 students scored 73% or above, 86.2% scored 61% or above. Column 7 provides a convenient check in later years on the comparative standards of successive intakes of students.

Note that columns 1, 2, 3 and 4 contain tallies. These are entered on the distribution sheet in the following manner: one person reads out the final marks from each class mark grid (Table 3.2, column G), a second person makes a tally on the distribution sheet in the appropriate class column as each mark is called out. A decimal of .5 or above is rounded up to the next whole number, e.g. 64.51 becomes 65. A decimal of .49 or below is rounded down, e.g. 71.47 becomes 71.

The first person calls out the mark as it appears on the class mark grid, e.g. 71.49. The second person calls out the rounded-up or rounded-down figure, e.g. 71, before entering the tally on the distribution sheet. The first person listens to provide a check on the calculation.

Column 6 on the distribution sheet, the cumulative frequency column, provides a check that all the final marks have been tallied, then added correctly horizontally. For example, if the cumulative frequency total is 64, and if there are 65 students, it is clear that an error has been made either in calling out final marks, or entering tallies, or totalling tallies, or calculating the cumulative frequency. This error needs to be rectified.

The distribution of final marks is considered at a meeting of all teachers, and initial decisions taken on where to set the boundaries between the letter grades on the mark scale. The first year that a course is taught, these initial decisions are necessarily somewhat arbitrary. It might be decided, for example, to base the initial tentative distribution of grades approximately on the normal distribution curve, that is, to determine by purely statistical means what proportion of the students should fall between each grade boundary. Diederich (ii) suggests that teachers use a modified stanine score scale for this purpose; applying his suggestion to the letter grade scale described above, we get something like this:

Letter Grade                 A     A-    B+    B     B-    C+    C     D+    D and F
% of Students in each Grade  4%    8%    12%   16%   20%   16%   12%   8%    4%
Cumulative %                 4%    12%   24%   40%   60%   76%   88%   96%   100%

The second year that the course is taught, the tentative grade boundaries can be based on those finally decided the previous year. On whatever basis the initial setting of the grade boundaries is done, however, the next stage in the grading procedure is the critical one: the initial grades for each individual student are written on the class mark grid lightly in pencil, and each teacher considers each student's tentative grade in the light of his knowledge of that student's work. Now comes the time-consuming discussion of borderline students that usually leads to the shifting of the tentative grade boundaries up or down 1, 2 or even 3 percentage points.

Look at the distribution of tallies in Table 4.1 on either side of the boundary between the grades B+ and B. The teacher of class II may feel that his student who gained 84% deserves a B+, but a student in class III and one in class IV have also gained 84%; therefore the teachers of those classes are necessarily involved in this decision about where the B+/B boundary shall be placed. They feel that their two students merit B's, not B+'s. The teacher of class II admits that because he has only one other student with a grade higher than B, this may be affecting his judgement. Nevertheless, all the marks that the 5 students on 84% and 85% gained in all tests are examined and discussed, and their final examination papers are studied before a final decision is taken. In this case, it is decided to leave the boundary where it is: between 84% and 85%, but in other cases the boundary will be moved.

The most painful decisions concern the placing of the lower boundaries, particularly that between D+ and C. Inevitably there will be one or two students just below this borderline who have made extraordinary efforts during the course, and one or two students who have done very little work. Does one give all 4 a grade of C, and thereby risk convincing the lazy ones that they really do not need to work in the remainder of the English language programme courses? Or does one give all 4 a grade of D+, and risk convincing the serious students that no amount of effort is worthwhile? However painful these decisions may be, one must resist the temptation to 'find an extra mark somewhere' for the serious students; that would be certain to create all manner of problems in the future.

The main point to notice is that final decisions about where to place grade boundaries are based on consensus, and that this consensus arises only after considerable discussion of the individual students concerned. It seems to us that it is only in this way that one can come as close as possible to measuring each student's achievement with the same yardstick, no matter who his teacher is, while at the same time retaining an intense concern for the individual student. It is perhaps an indication of our students' recognition of the objectivity yet fairness of the grading system employed that during the past five years we have not had one serious complaint about the grades that we award.

(i) The conversions demonstrated here take no account of the standard deviation of the scores for each element. Therefore, only an approximate weighting is obtained. But it is certainly a considerable improvement over simply adding the raw scores for each evaluation element to obtain a final mark for the course.

(ii) Paul B. Diederich, Short-cut Statistics for Teacher-Made Tests, Evaluation Advisory Series No. 5, Educational Testing Service, 1960, p. 37.
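The purely statistical starting point described in Section 4 can be sketched in a few lines of Python. This is only an illustration of how tentative grades might be read off a set of target percentages; the function names, the "D/F" label for the combined lowest band, and the rule that a mark takes the first grade whose cumulative target covers the percentage of students at or above that mark are all assumptions made for the example, not part of the procedure reported above. The output is no more than the pencilled-in first draft that the teachers' meeting then revises.

```python
# Target share of students per grade, as in the modified stanine table above.
# "D/F" is a hypothetical label for the combined "D and F" band.
TARGET_SHARES = [("A", 4), ("A-", 8), ("B+", 12), ("B", 16), ("B-", 20),
                 ("C+", 16), ("C", 12), ("D+", 8), ("D/F", 4)]


def cumulative_targets(shares):
    """Turn per-grade percentages into cumulative percentages."""
    running = 0
    targets = []
    for grade, percent in shares:
        running += percent
        targets.append((grade, running))
    return targets


def tentative_grades(marks):
    """Map each distinct mark to a tentative grade.

    Assumption: a mark receives the first grade whose cumulative target
    is at least the percentage of students scoring that mark or above,
    so students with equal marks always receive the same grade.
    """
    targets = cumulative_targets(TARGET_SHARES)
    n = len(marks)
    grades = {}
    for mark in set(marks):
        at_or_above = sum(1 for m in marks if m >= mark)
        cf_percent = 100 * at_or_above / n
        grades[mark] = next(g for g, limit in targets if cf_percent <= limit)
    return grades


if __name__ == "__main__":
    marks = list(range(1, 101))  # 100 hypothetical students with distinct marks
    draft = tentative_grades(marks)
    print(draft[100], draft[96], draft[1])
```

Because tied marks are graded together, the actual share of students in each band will usually differ a little from the targets, which is one more reason to treat the result as tentative.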


Table 2.1: Example of a Target Dates Information Sheet

Assessment and Count-Down Dates, Semester 1, 1980-1981

Course  Assessment and        Date, time and place           Posting of     Preliminary     1st Draft      Final Draft    Completion     Posting of
        person responsible    of assessment                  notice for     drafting                       for typing     of marks       provisional
                                                             students       meeting                                       processing     grades

101     Test One (A.M.)       Mon. Oct. 6, 10.00, Room 101   Wed. Oct. 1    Tues. Sept. 10  Wed. Oct. 1    Sat. Oct. 4    Sat. Oct. 11   Sun. Oct. 12
        Mid-Sem (P.S.)        Wed. Nov. 12, 9.00, Room 312   Wed. Nov. 5    Sun. Nov. 2     Tues. Nov. 2   Sat. Nov. 8    Wed. Nov. 19   Sat. Nov. 22
        Test Two (G.L.)       Tues. Dec. 9, 10.00, Room 102  Tues. Dec. 2   Wed. Nov. 26    Sun. Nov. 30   Wed. Dec. 3    Sun. Dec. 14   Mon. Dec. 15
        Final (S.A. + A.M.)   Sun. Jan. 4, 9.00, Room 213    Sun. Dec. 28   Tues. Dec. 23   Sat. Dec. 27   Mon. Dec. 29   Sat. Jan. 10   Mon. Jan. 12

Table 3.1

              maximum marks    desired weightings
Test 1        115              20%
Test 2        95               30%
Final Exam    135              50%

Table 3.2: Example of a Class Mark Grid

Course 101    Semester 1, 1980-81    Class IV    Teacher: J. Doe

                TEST 1 (Max. = 115)    TEST 2 (Max. = 95)    FINAL EXAM (Max. = 135)   TOTAL       FINAL GRADE
STUDENT         A      B = A x 20/115  C      D = C x 30/95  E      F = E x 50/135     G = B+D+F   H
1. Ahmed        85     14.78           81     25.58          101    37.41              77.77%      B
2. Ali          49     8.52            51     16.11          86     31.85              56.48%      D+
3. Fareed       53     9.2             60     18.95          89     32.96              61.11%      C
etc.            etc.   etc.            etc.   etc.           etc.   etc.               etc.        etc.
15. Mohammed    87     15.13           83     26.21          114    42.22              83.56%      B+
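For readers who keep the class mark grid on a computer rather than with a pocket calculator, the conversions of Table 3.2 might be sketched as follows. The maxima and weightings are those of Table 3.1; the dictionary layout and the function names are hypothetical choices made for this example. Each converted mark is rounded to two decimal places before adding, on the assumption that this is how the figures in columns B, D and F were obtained.

```python
# Maximum mark and desired weighting (%) for each element, from Table 3.1.
ELEMENTS = {
    "Test 1": (115, 20),
    "Test 2": (95, 30),
    "Final Exam": (135, 50),
}


def weighted_mark(raw, maximum, weight):
    """Convert a raw mark to its weighted value: raw x weight / maximum."""
    return raw * weight / maximum


def total_mark(raw_marks):
    """Add the converted marks (columns B, D and F) to give column G."""
    converted = [
        round(weighted_mark(raw_marks[name], maximum, weight), 2)
        for name, (maximum, weight) in ELEMENTS.items()
    ]
    return round(sum(converted), 2)


if __name__ == "__main__":
    ahmed = {"Test 1": 85, "Test 2": 81, "Final Exam": 101}
    mohammed = {"Test 1": 87, "Test 2": 83, "Final Exam": 114}
    print(total_mark(ahmed), total_mark(mohammed))
```

As footnote (i) of the article points out, a conversion of this kind ignores the spread of scores in each element, so the weighting achieved is only approximate.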

Table 4.1: Example of a Mark Distribution Sheet

English Language Division    Course 101    Semester 1, 1981-82

Final Mark Distribution and Grade Boundaries

(Columns 1-4, the tally marks for Classes I-IV, are not reproduced here.)

MARK    TOTAL I-IV (5)    c.f. (6)    c.f.% (7)    GRADE BOUNDS
95      2                 2           3.1          A
94      2                 4           6.2          A
93      -                                          A-
92      1                 5           7.7          A-
91      2                 7           10.8         A-
90      -                                          B+
89      3                 10          15.4         B+
88      1                 11          16.9         B+
87      1                 12          18.5         B+
86      2                 14          21.5         B+
85      2                 16          24.6         B+
84      3                 19          29.2         B
83      2                 21          32.3         B
82      3                 24          36.9         B
81      -                                          B-
80      3                 27          41.5         B-
79      1                 28          43.1         B-
78      2                 30          46.2         B-
77      3                 33          50.8         C+
76      3                 36          55.4         C+
75      1                 37          56.9         C+
74      3                 40          61.5         C+
73      3                 43          66.2         C+
72      -                                          C
71      1                 44          67.7         C
70      -                                          C
69      1                 45          69.2         C
68      1                 46          70.8         C
67      1                 47          72.3         C
66      1                 48          73.8         C
65      3                 51          78.5         C
64      3                 54          83.1         C
63      -                                          C
62      1                 55          84.6         C
61      1                 56          86.2         C
60      -                                          D+
59      4                 60          92.3         D+
58      -                                          D+
57      1                 61          93.8         D+
56      -                                          D
55      -                                          D
54      -                                          D
53      -                                          D
52      1                 62          95.4         D
51      -                                          F
50      1                 63          96.9         F
49      -                                          F
48      2                 65          100          F
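The rounding rule and the cumulative frequency check used in preparing a sheet like Table 4.1 could be sketched in Python as below. The totals are those of column 5 of the table; the function names are hypothetical. Note that Python's built-in round() rounds exact halves to the nearest even number, so the sketch uses math.floor(mark + 0.5) to follow the rule that a decimal of .5 or above is rounded up.

```python
import math


def round_half_up(mark):
    """Round a final mark to a whole number, with .5 and above going up."""
    return math.floor(mark + 0.5)


def cumulative_rows(totals, n_students):
    """Build (mark, total, c.f., c.f.%) rows from the highest mark down.

    Raises ValueError if the final cumulative frequency does not equal
    the number of students, which signals a tallying or adding error.
    """
    rows = []
    running = 0
    for mark in sorted(totals, reverse=True):
        running += totals[mark]
        rows.append((mark, totals[mark], running,
                     round(100 * running / n_students, 1)))
    if running != n_students:
        raise ValueError(f"c.f. total is {running}, expected {n_students}")
    return rows


# Column 5 of Table 4.1: number of students at each rounded mark.
TABLE_4_1_TOTALS = {
    95: 2, 94: 2, 92: 1, 91: 2, 89: 3, 88: 1, 87: 1, 86: 2, 85: 2,
    84: 3, 83: 2, 82: 3, 80: 3, 79: 1, 78: 2, 77: 3, 76: 3, 75: 1,
    74: 3, 73: 3, 71: 1, 69: 1, 68: 1, 67: 1, 66: 1, 65: 3, 64: 3,
    62: 1, 61: 1, 59: 4, 57: 1, 52: 1, 50: 1, 48: 2,
}

if __name__ == "__main__":
    for row in cumulative_rows(TABLE_4_1_TOTALS, 65):
        print(row)
```

If one tally were missed, the call would raise an error instead of returning rows, which mirrors the manual check described in Section 4.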

John W. Oller, Jr.

Appendix: Research

A comment on specific variance versus global variance in certain EFL tests

Perhaps the basic statistical problem in the determination of what a test measures is the assessment of the sources of the variance across individuals that the test produces. Put in nontechnical terms, the deeper problem is to find out what factors in the behaviour of test-takers result in differences in the performances of various individuals and groups. According to the classical factoring model (cf. Harman 1976: 18-20), the standardized unit variance of any test $j$ can be decomposed (at least theoretically) into three uncorrelated components: 1) variance that is common to other tests, referred to as the communality, which is designated $h_j^2$; 2) variance that is unique to $j$ but nonrandom, known as the specificity, designated $b_j^2$; and 3) variance which is unique to $j$ but random, referred to as error or unreliability, designated $e_j^2$. These three terms must add up to 100% of the total variance in $j$ if the assumptions underlying the classical model are correct. According to that model, in order for a test to achieve a satisfactory level of validity, we should expect its communality with tests aimed at the same construct(s) to be high while its communality with tests aimed at disparate constructs should be low. When we examine tests aimed at distinct constructs, we expect them to have relatively high specificities and low communalities. Always we hope for low unreliabilities.

It is generally conceded (not quite gleefully) that there is no determinate single best solution for any given factoring problem. The variance in any given test may be partitioned in an infinitude of ways, as has been demonstrated in theory, and arguments about the best possible solution are probably misguided (Harman 1976: 27f). However, this is not the same as saying that all possible solutions are equal for all purposes. In some cases it is possible to show that one solution is decidedly better than a number of others. In most cases, the arguments must be thrashed out by appealing to theoretical reasoning that goes beyond statistics per se. Nevertheless, the application of statistical methods seems indispensable.

With the foregoing as background, this note will respond briefly to claims by Abu-Sayf, Herbolich, and Spurling (1979) concerning 'unique nonchance variance' (which, according to the classical factor model, is specificity) in each of four parts of an EFL proficiency exam recently developed by them at Kuwait University. In subscores of 139 adult non-native speakers of English, they claimed to have identified (using a method recommended by Davis 1968, 1972) four specificities: Grammar 22%, Listening Comprehension 41%, Reading Comprehension 32%, and Translation 24%. They further suggested that these findings were in conflict with 'Oller's hypothesis (1973) of there being only one global proficiency [test] such as a cloze or a dictation' (p. 117). (1)

Actually, their paper raises two substantive issues. First there is the question of test specificities in relation to the global factor, and second, there is the question of what is the most plausible tentative conclusion regarding such a global factor. In the first matter we may ask whether the claimed specificities actually exist in the reported magnitudes, and in the second what the implication is for the existence of a large global factor of language proficiency.

In regard to the question of specificities, Oller and Khan (1980) demonstrate that the application of the modified Davis method of obtaining estimates of unique nonchance variance (or specificity) which was applied by Abu-Sayf et al., is flawed in two ways: first it overestimates specificity by conflating it with error variance, and second, it underestimates communalities by overcorrecting for bias in squared multiple correlations. In fact, squared multiple correlations are already conservative estimates of communalities due to the fact that they are known to constitute the lower bounds of true communalities. Oller and Khan show that the more probable limits of the specificities properly obtainable from the correlations in the Abu-Sayf et al. study are near zero; this compared with a large global factor accounting for as much as 95% of the variance in the Grammar Test by one method and never less than 75% of the variance in any of the tests by any one of three different methods (squared multiple correlations corrected for bias, communalities estimated by principal factoring with iterations, and communalities estimated by Rao's canonical factoring with iterations).

Even by the most conservative method of estimating communalities, the respective specificities were Grammar -.01, Listening Comprehension .04, Reading Comprehension .00, and Translation -.11. In no case did the specificity estimated for any test by any method amount to as much as half the error variance in the test in question.

Coming now to the second question concerning the global factor of language proficiency, what does the foregoing mean? In practical terms can we conclude that there is only one factor, a general factor? It seems to me that we cannot. The evidence suggests that following the classical factor model there is no reason to believe that any of the four tests produced by Abu-Sayf et al. generates a reliable specific variance. The variance generated by any triplet of the tests pretty much exhausts the variance generated by the remaining single test. That is, most of the reliable variance in each of these four tests is common to the remaining three.

But these four tests do not by any means exhaust the universe of possible language tests! Therefore, we cannot on the basis of this study or any previous study conclude that there is only one factor underlying the variance in all language tests. On the other hand, on the basis of many previous studies (see Oller, 1979, Appendix for a brief and already somewhat dated review), we can say that there appears to be a large general factor of language proficiency in nearly all of the tests so far studied (an exception is a narrowly defined spelling score, see Oller, 1979: 281).

There is no basis, in spite of these findings, to conclude that a single test such as a dictation or a cloze test, is the best way to measure that general factor. In isolated cases of certain sets of tests, results favour the interpretation that one or more of the input tests are better at measuring the general factor than other tests, but the only way we could even theoretically find the single best test would be to obtain all possible data on all possible tests, a clear impossibility. In fact, all of the evidence that I know of points to the conclusion that a multitude of language processing tasks aimed at the kinds of things language users will actually be expected to do with language makes the best language test in any given set of circumstances. If students are expected to learn to use the target language in contexts where they will be required to listen, speak, read, and write the language, it makes sense to use a plurality of testing procedures for many reasons. I also still believe the remark that was written in 1977 though not published until 1979, that 'it is probably safe to say that the best pragmatic testing procedures have yet to be invented' (Oller, 1979: 416).

The exigencies of practical life often force us to leap beyond the empirical evidence. Theories in general are not based exclusively on substantiated empirical findings either. Even if it turned out that there were only one general factor of language proficiency, it would still make sense to use a multiplicity of testing methods (as John Carroll, 1980, and others have recently observed). Moreover, in spite of the pervasiveness and general strength of a global factor of language proficiency underlying educational and psychological tests of all sorts, there is recent evidence that suggests a multiplicity of specific factors will yet be found (see Bachman and Palmer, 1980, and Upshur and Homburg, 1980). However, I personally doubt (at this moment) that the general factor can be explained away satisfactorily by even the newer and more powerful methods of confirmatory factor analysis. Nor have I seen any evidence as yet that would refute the claim that Spearman's general factor of intelligence may indeed turn out to be indistinguishable from proficiency in one's primary or strongest language (see Streiff and Oller in press). Still, we are speaking here of hypotheses rather than proven facts, and it is my belief that one should not place too much weight on hypotheses and hunches but carefully regard them as precisely what they are.
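The three-way partition of unit variance in the classical factor model, $h_j^2 + b_j^2 + e_j^2 = 1$, can be illustrated with a small Python sketch. It assumes the standard relation that a test's reliability equals its communality plus its specificity, so that error variance is the remainder. The function name and the numbers are hypothetical and are not taken from the Abu-Sayf et al. data, and the sketch does not attempt to reproduce the Davis method itself.

```python
def decompose(communality, reliability):
    """Split standardized unit variance into (h2, b2, e2).

    Assumes the classical factor model: reliability = h2 + b2,
    and the three components sum to 1.
    """
    specificity = reliability - communality
    error = 1.0 - reliability
    return communality, specificity, error


if __name__ == "__main__":
    # Hypothetical test: communality .80, reliability .84.
    h2, b2, e2 = decompose(0.80, 0.84)
    uniqueness = 1.0 - h2  # everything not shared with the other tests
    print(f"specificity {b2:.2f}, error {e2:.2f}, uniqueness {uniqueness:.2f}")
```

In this made-up case the whole of the unique variance (.20) is five times the specificity proper (.04). Reading the former as the latter is the kind of conflation of specificity with error variance that the note above attributes to the estimates it criticises.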

108
JohnWOIIerJ,

Perhaps the cited remark can be reasonably inferred composition, and oral interview, are Upshur's
from things I have said or written, but I do not believe test of productive communication (Upshur, 1969),
that I have ever actually advocated the use of any single reading aloud, and some multiple choice tests of
test or pair of tests as measures of language proficiency reading comprehension and other skills' (1973: 11). In
in an all encompassing general sense. While the evidence the reference to multiple choice tests I intended to
seems to suggest that dictation and cloze procedure include some of those developed in connection with
along with a number of other integrative tests, or more the testing of foreign students at UCLA during my
specifically pragmatic tests, are useful practical tools three years there, as well as tests like the Listening
for assessing language proficiency (whatever it may Comprehension and Reading Comprehension sections
turn out to be), I have long tried to stress that the con- of the TOEFL. Parish's Grammar Test (see Oiler and
cept of pragmatic testing extends to an implicit infini- Perkins, 1980, Appendix, item 22) is an integrative or
tude of test procedures (so do the terms cloze and dic- pragmatic test in this sense.
tation). While some tests appear to be better measures
of general language proficiency than others, there is no
reason to suppose that the class of best tests has yet By contrast, communality estimates (which may be
been identified. Nevertheless, in the article referred to read as indicants of a global factor in this case and
by Abu-Sayf et al., I indicated my advocacy (at that many similar studies) ranged from a low of 60% to a
time) of 'integrative testing' and suggested that 'some high above 85%. Further, in this particular case we are
of the types of tests that qualify as belonging to the referring to estimates based on the squares of multiple
integrative family besides dictation, cloze procedure, correlations, a lower bound for the true values.

References
Abu-Sayf, F.K., Herbolich, James B., and Spurling, S., 1979, Language tests at school: a pragmatic
Oiler, J.W., Jr.,
1979, 'The identification of the major components for approach, London, Longman.
testing English as a foreign language', TESOL Quarterly 13, pp. 117-20.
Bachman, Lyle F. and Palmer, Adrian S., 1980, 'The construct validation of oral proficiency tests'. Paper presented at the Fourteenth Annual TESOL Convention, San Francisco, March 1980. Also in TESL Studies 3, pp. 1-20 (University of Illinois at Urbana-Champaign). To appear in Oller (in press).
Carroll, John B., 1980, 'Language testing and psychometric theory'. Closing plenary lecture at the Second International Language Testing Symposium, Darmstadt, West Germany, May 1980. Also presented at the Language Testing Conference, Albuquerque, New Mexico, University of New Mexico, June 1980. To appear in Oller (in press).
Davis, Frederick B., 1968, 'Research in comprehension in reading', Reading Research Quarterly 3, pp. 499-545.
Davis, Frederick B., 1972, 'Psychometric research on comprehension in reading', Reading Research Quarterly 7, pp. 628-78.
Guilford, J.P. and Fruchter, B., 1978, 'Fundamental statistics in psychology and education', 6th edition revised, New York, McGraw Hill.
Harman, Harry H., 1976, Modern factor analysis, 3rd revised edition, Chicago, University of Chicago.
Oller, J.W., Jr., 1973, 'Pragmatic language testing', Language Sciences 28, pp. 7-12.
Oller, J.W., Jr. (Ed.). In press, Issues in Language Testing Research, Rowley, Massachusetts, Newbury House.
Oller, J.W., Jr., and Khan, R., 1980, 'Is there a global factor of language proficiency?' Paper presented by the first author at the 15th Regional Seminar sponsored by the South East Asian Ministers of Education Organization, at the Regional English Language Center, Singapore, April 1980. In the proceedings edited by John Read (to appear).
Oller, J.W., Jr. and Perkins, Kyle, 1980, Research in language testing, Rowley, Massachusetts, Newbury House.
Oller, J.W., Jr. and Streiff, Virginia A. In press, The language factor, more tests of tests, Rowley, Massachusetts, Newbury House.
Upshur, John A., 1969, 'Productive communication testing'. Paper presented at the Second International Congress of Applied Linguistics, Cambridge, England. In G. Perren and J.L.M. Trim (Eds.), Applications of linguistics, Cambridge, England, Cambridge University, 1971, pp. 435-42. Also in Oller, J. and Richards, J. (Eds.), Focus on the learner, Rowley, Massachusetts, Newbury House, 1973, pp. 177-83.
Upshur, John A. and Homburg, Taco J., 1980, 'Some language test relations at successive ability levels'. Paper presented at the Second International Language Testing Symposium, Darmstadt, West Germany, May 1980. To appear in Oller (in press).

Profiles

JOSEPH BOYLE teaches in the English Department of the Chinese University of Hong Kong. He has previously taught in South America, India and the Philippines. He studied English Language and Literature at Oxford and has done the Leeds ESL Postgraduate Diploma Course. He works with Chinese students who have chosen English as their major subject. He also runs extra-mural courses in Business English and Medical English.

BRENDAN CARROLL had extensive ELT experience in Kenya, India and Nigeria before becoming Director of the British Council English Language Teaching Institute in London. He left his last post in the Council as head of their English Language Testing Service Liaison Unit to take charge of Pergamon English Testing, Oxford. He also works as a private consultant and is the author of several books, his most recent being Testing Communicative Performance (Pergamon Press, 1980).

FRANK CHAPLEN has been Director of the English Language Division, Faculty of Medicine, Kuwait University since 1975. Previously, whilst Research Officer to the University of Cambridge Local Examinations Syndicate, he was responsible for developing the new form of the Proficiency and First Certificate examinations. He has also been engaged as a tests and examinations consultant by the UN, the FAO, UNESCO, and the Council of Europe. Dr. Chaplen is the author of several EFL textbooks, his latest being a course in scientific English.

ALAN DAVIES teaches Applied Linguistics in the Department of Linguistics, University of Edinburgh, where he is a senior lecturer. He is the author of the English Proficiency Test Battery (used for some years by the British Council) and has edited 'Language Testing Symposium' (1968) and the 'Testing and Experimental Studies' volume of the Edinburgh Course in Applied Linguistics (1977) as well as numerous articles on language testing. He is currently carrying out a validation study of the new British Council-Cambridge ELTS (English Language Testing Service).

PETER FABIAN. Co-founder of Arels and its chairman, 1962-64, he has been in EFL for 42 years, interrupted by 10 years in industry. Soon after becoming owner-principal of the London School of English (1960), he led a team of enthusiasts to develop the Arels Oral Examinations; he has been Chairman of the Arels Examination Trust since 1976. Listening comprehension is to him the most fundamental of all four language skills. Language acquisition he regards as an essentially intuitive, emotional and non-cerebral experience. He is suspicious of 'academic postures'. Being himself a linguist, he prefers to call the other kind 'linguisticians'.

PENNY FRANTZIS has taught English in Spain and Saudi Arabia, and directed an English Language Course in Switzerland. For the last eight years she has been lecturing and teaching at the University of Leeds. Her work has involved the preparation of course materials for the academic needs of overseas students in Britain and ESP (English for Specific Purposes) programmes for such diverse groups as Kuwaiti hospital administrators, overseas psychiatrists, engineering students from the Middle East, etc.

ARTHUR GODMAN is in the Department of South-East Asian Studies, University of Kent. He has been a consultant for the British Council on EST, and has written on that subject for the Regional Language Centre, Singapore. He has examined science subjects for overseas examinations in both English and Malay, and was science consultant for the Nucleus series (Longman). Current publications include the Longman Dictionary of Scientific Usage and the Longman Illustrated Science Dictionary.

BRIAN HEATON is Lecturer in English for Overseas Students at the University of Leeds and is the author of several books concerned with English as a second and foreign language. His experience includes teaching and lecturing throughout Europe and Asia. He taught and lectured in Hong Kong for twelve years, becoming Senior Inspector for English, and was Visiting Professor in Education in Singapore from 1976 to 1979.

ROBERT KEITH JOHNSON is Senior Lecturer in the School of Education, University of Hong Kong. He taught English in Zambia before completing an M.A. in Applied Linguistics at Essex in 1970. Since then, he has been involved in ESL teacher-training and Applied Linguistics Programmes, first with the Faculty of Education, University of Papua New Guinea, for nine years, and since 1979 in Hong Kong.

PAT McELDOWNEY taught in New Zealand and Libya before joining the University of Manchester in 1971. She is now a Lecturer in Teaching English Overseas in the Department of Adult and Higher Education at the University. Apart from various short courses, she also runs the In-Service teacher-training course for the Lancashire Education Authority. She is Chief Examiner for the Joint Matriculation Board's Test in English (Overseas), the North West Regional Examinations Board's English as a Second Language and is Moderator for the Yorkshire Regional Examinations Board's English as a Second Language. Her new book English in Context is due for publication early in 1982, published by Thomas Nelson & Sons, Walton on Thames.

KEITH MORROW is an Assistant Director of the Bell Educational Trust, based in Norwich. He was formerly a lecturer at the University of Reading. He is the Chief Examiner for the Royal Society of Arts 'Examinations in the Communicative Use of English as a Foreign Language'.

PAUL NATION is a senior lecturer at Victoria University in Wellington, New Zealand. He has also taught in Indonesia and Thailand. His special interests are in teaching techniques and code-based approaches to language teaching.

JOHN W. OLLER, Jr. received his doctorate in general linguistics from the University of Rochester, in Rochester, New York in 1969. He has served on the faculty at UCLA and the University of New Mexico and has held visiting appointments at Southern Illinois University and Concordia (in Montreal). From 1971 to 1976 he served on the Committee of Examiners for the Test of English as a Foreign Language at ETS. Presently, he is Professor of Linguistics at the University of New Mexico.

ALBERT PILLINER was, until his retiral in 1978, Director of the Godfrey Thomson Unit for Educational Research and Senior Lecturer in the Department of Education, University of Edinburgh. He is especially interested in the testing of English as a foreign or second language. Sponsored by UNESCO and by the British Council, he has taught (and continues to teach) in Europe, West Africa, South America and in the Middle and Far East. He has also directed language testing courses for international groups in UK on behalf of the British Council and, more recently, the University of Edinburgh Institute of Applied Language Studies.

PAULINE M. REA is Senior Lecturer in the Department of Foreign Languages and Linguistics, and Co-ordinator of the Communication Skills Unit at the University of Dar es Salaam. She has EFL experience at secondary level and in teacher training programmes in Africa and Europe. She has worked on the General Medical Council's English language testing project for overseas doctors. Her current research interests are related to language testing, curriculum development, and ESP materials production.

JOHN ROGERS is a Senior Lecturer at the English Language Institute, Victoria University, Wellington, New Zealand, where he has been teaching on Dip. TESL courses for teachers from Southeast Asia, the South Pacific and New Zealand since 1971. He spent two years teaching English to adults and secondary school students in Sweden from 1955 to 1957, and from 1957 to 1961 he helped to train secondary school English teachers at Universitas Airlangga, Indonesia. He worked for the British Council in Nigeria (1961-1963) and in Ethiopia (1963-1969), where he was the co-adaptor of several books. From 1976 to 1978 he was seconded to the SEAMEO Regional Language Centre, Singapore, as Specialist in the Psychology of Second Language Learning and Applied Linguistics. In Singapore he compiled Group Activities for Language Learning (RELC Occasional Papers).

IAN SEATON is Head of the Liaison Unit for the English Language Testing Service in the British Council. He taught ESP programmes for two years at the University of Tripoli, Libya and for two years at the University of Helsinki, Finland before joining the Council in 1976.

BILL SHEPHARD. Academic training consisted of systematic escape from the Cambridge English course via non-compulsory Old English, linguistic gossip (no department at that time), phonetics and dialect research at Leeds. This was followed by EFL teaching and finally administration of the Cambridge EFL examinations. With colleague Harold Otter, he has tried to absorb usefully into the examination structure the successive waves of revolution and counter-revolution in EFL teaching and testing.

NIC UNDERHILL. Educational Co-ordinator, International Language Centres. Taught EFL at various schools in London and Sussex and then worked for ILC for two years at the Kuwait Oil Company Training Centre before returning to England to do an M.A. in Applied Linguistics at the University of Reading.

CHRISTOPHER WARD is head of the Testing Department, International Language Centre (Japan) in Tokyo. After obtaining a Diploma in English as a Second Language at Leeds University, he taught immigrants in Bradford, Yorkshire for two years. Then he went to Japan and taught at ILC for three years before taking up his present post six years ago.

SIDNEY WHITAKER has directed the TESL training course at University College, Bangor, since 1964. He previously taught French at Glasgow University, and English and language-teaching methodology in Vietnam and Venezuela, with shorter assignments in India, Bangladesh, China, Egypt, Jordan, and Yugoslavia. He regularly collaborates with English teachers in Spain as well as with teachers of immigrant pupils in Britain.

Bibliography

Alderson, C. and Hughes, A. (Eds.) (1981) Issues in Language Testing, ELT Documents 111. London: British Council
Allen, J.P.B. and Davies, A. (Eds.) (1977) 'Testing and experimental methods', Edinburgh Course in Applied Linguistics, Vol. 4. London: O.U.P.
Beardsmore, H.B. (1974) 'Testing oral fluency', IRAL, 12, 4, pp. 317-26
Briere, E.J. (1971) 'Are we really measuring proficiency with our foreign language tests?', Foreign Language Annals, 4, May
Burstall, Clare (1969) 'The main stages in the development of language tests', Stern, H.H. (Ed.) Languages and the Young School Child, London: O.U.P.
Carroll, B.J. (1980) Testing Communicative Performance, Oxford: Pergamon
Clark, J.L.D. (1972) Foreign Language Testing: Theory and Practice, Philadelphia, Pa, Centre for Curriculum Research
Crocker, A.C. (1969) Statistics for the Teacher (or How To Put Figures in their Place), Harmondsworth: Penguin
Davies, A. (1968) Language Testing Symposium, London: O.U.P.
Davies, A. (1978) 'Language Testing (Survey Articles)' Language Teaching and Linguistics: Abstracts, Cambridge: Cambridge University Press, Vol. II
Davies, S. and West, R. (1981) The Pitman Guide to English Language Examinations for Overseas Candidates, London: Pitman
Douglas, D. (1978) 'Gain in reading proficiency in English as a Foreign Language measured by three cloze scoring methods', Journal of Research in Reading, 1, 1, pp. 67-73
English Speaking Board (1981), Oral Assessments in Spoken English as an Acquired Language, Southport
Fok, A.; Lord, R.; Low, G.; T'sou, B.K.; and Lee, Y.P. (1981) Working Papers in Linguistics and Language Teaching, Special Issue on Language Testing, No. 4, Hong Kong: Language Centre, University of Hong Kong
Grieve, D.W. (1964) English Language Examining: Report of an Inquiry into English Language Examining, Lagos: African Universities Press
Harris, D.P. (1969) Testing English as a Second Language, New York: McGraw-Hill
Heaton, J.B. (1975) Writing English Language Tests, London: Longman
Ibe, M.D. (1975) 'A comparison of cloze and multiple-choice tests for measuring the English reading comprehension of South-East Asian teachers of English', RELC Journal, 6.2. Singapore: SEAMEO Regional Language Centre
Jones, R.L. and Spolsky, B. (Eds.) (1975) Testing Language Proficiency, Washington, D.C.: Centre for Applied Linguistics
Lado, R. (1961) Language Testing: the Construction and Use of Foreign Language Tests, London: Longman
Lee, Y.P. and Low, G.D. (1981) 'Classifying tests of language use'. Paper presented at 6th AILA World Congress, Lund, Sweden
Moller, A. (1975) 'Validity in Proficiency Testing', ELT Documents, 3, pp. 5-18, London: British Council
Morrow, K.E. (1977) Techniques of Evaluation for a Notional Syllabus, Reading: Centre for Applied Language Studies, University of Reading (for the Royal Society of Arts)
Morrow, K.E. (1979) 'Communicative language testing: revolution or evolution', C.J. Brumfit and K.J. Johnson (Eds.) The Communicative Approach to Language Teaching, London: O.U.P.
Munby, J.L. (1978) Communicative syllabus design, Cambridge: Cambridge University Press
Oller, J.W. (1971) 'Dictation as a device for testing foreign-language proficiency', English Language Teaching, 25, 3, pp. 254-9
Oller, J.W. (1972) 'Cloze tests of second language proficiency and what they measure', Language Learning, 23, 1, pp. 105-18
Oller, J.W. (1979) Language Tests at School, London: Longman
Oller, J.W. & Streiff, Virginia (1975) 'Dictation: a test of grammar-based expectancies', English Language Teaching Journal, 30, 1, pp. 25-36
Palmer, A.S. (1972) 'Testing communication', IRAL, 10, pp. 35-45
Palmer, A.S. (1981) 'Measures of achievement, communication, incorporation, and integration for two classes of formal EFL learners', RELC Journal, 12, 1, pp. 37-61
Palmer, L. & Spolsky, B. (Eds.) (1975) Papers on Language Testing, 1967-1974, Washington D.C.: TESOL
Perren, G.E. (1967) 'Testing ability in English as a second language', English Language Teaching, 21, 1, pp. 129-36; 21, 2, pp. 99-106; 21, 2, pp. 197-202
Perren, G.E. (Ed.) (1977) Foreign Language Testing: Specialised Bibliography, Centre for Information on Language Teaching and Research
Rea, Pauline M. (1978) 'Assessing language as communication', MALS Journal, New series, No. 3, University of Birmingham, Department of English
Read, J.A.S. (Ed.) (1981), Directions in Language Testing, RELC Anthology Series 9, Singapore: SEAMEO Regional Language Centre
Schulz, Renate A. (1977) 'Discrete-point versus simulated communication testing in foreign languages', Modern Language Journal, 61, 3, pp. 91-101
Spolsky, B., Murphy, P., Holm, W. and Ferrel, A. (1972) 'Functional tests of oral fluency', TESOL Quarterly, September
Stein, Oswald (1972) 'Was Prufen Wir Eigentlich?', Verlag Lambert Lensing GmbH, Dortmund, pp. 357-65
Stubbs, J.B. & Tucker, G.R. (1974) 'The cloze test as a measure of English proficiency', Modern Language Journal, 58, 5/6, pp. 239-41
Upshur, J.A. & Fata, J. (1968) 'Problems in foreign language testing', Language Learning, Special Issue, No. 3
Upshur, John A. (1971) 'Objective evaluation of oral proficiency in the ESOL classroom', TESOL Quarterly, 5, pp. 47-60
Valette, R.M. (1977) Modern Language Testing (2nd ed.), New York: Harcourt Brace Jovanovich
Valette, R.M. & Disick, R.S. (1972) Modern Language Performance Objectives and Individualization, New York: Harcourt Brace Jovanovich






MODERN ENGLISH
PUBLICATIONS LTD

LANGUAGE TESTING
A collection of articles written by testing specialists, teachers and teacher trainers involved in the testing of English in a number of different countries.

The articles range from those concerned with language testing in general, including recent developments in communicative testing, to more specific ones on the testing of listening, speaking, reading and writing. Integrative tests such as dictation and cloze procedure are discussed, as is the testing of grammar, the place of visuals in language tests and the assessment of performance in academic subjects examined in English.

Evaluation of language programmes and the measurement of student achievement on ESP courses, as well as research notes and a select bibliography, complete the collection.

Various theories of testing are discussed, and the emphasis is on those practical aspects which are likely to be of most use to the practising teacher.

This collection, edited by JB Heaton, is the fifth in the series of Special Issues of Modern English Teacher magazine. Other titles in the series are English for Specific Purposes, Visual Aids for Classroom Interaction, Teacher Training, Teaching Children (edited by Susan Holden) and Individualisation (edited by Marion Geddes and Gill Sturtridge).

906149 29