Language Testing
A test is a sample of an individual's behaviour/performance on the basis of which inferences are made about the more general underlying competence of that individual. A language test is any kind of measurement or examination technique that aims to describe the test taker's foreign language proficiency, e.g. an oral interview, a listening comprehension task, or free composition writing.
Diagnostic tests help identify learners' strengths and weaknesses in L2. Their main aim is to help teachers decide what needs to be taught to students.
Placement tests
With the help of placement tests, students can be placed in the learning group that is appropriate for their level of competence.
Direct versus indirect testing
Direct tests: candidates are required to perform the skill the test intends to measure.
Indirect tests: these aim to measure the skills that underlie performance in a particular task.
[Figure: frequency chart; vertical axis: Frequency, horizontal axis: TOTAL]
Independent User
B2 Can understand the main ideas of complex text on both concrete and abstract topics, including technical discussions in his/her field of specialisation. Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without strain for either party. Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.
B1 Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likely to arise whilst travelling in an area where the language is spoken. Can produce simple connected text on topics which are familiar or of personal interest. Can describe experiences and events, dreams, hopes and ambitions and briefly give reasons and explanations for opinions and plans.
Basic User
A2 Can understand sentences and frequently used expressions related to areas of most immediate relevance (e.g. very basic personal and family information, shopping, local geography, employment). Can communicate in simple and routine tasks requiring a simple and direct exchange of information on familiar and routine matters. Can describe in simple terms aspects of his/her background, immediate environment and matters in areas of immediate need.
A1 Can understand and use familiar everyday expressions and very basic phrases aimed at the satisfaction of needs of a concrete type. Can introduce him/herself and others and can ask and answer questions about personal details such as where he/she lives, people he/she knows and things he/she has. Can interact in a simple way provided the other person talks slowly and clearly and is prepared to help.
Consider the tasks in the 199 exam:
1. C-test
2. Gap-fill task
3. Summary writing
Decide whether these tasks are direct or indirect, subjective or objective, integrative or discrete-point tasks.
Reliability
Reliability is the extent to which a test is free of random measurement error and produces consistent results when administered under similar conditions. This means that a reliable test is not affected by circumstances outside the test (e.g. the people who administer and mark the test, the time and place of the test).
Types of reliability:
internal consistency: whether the test items are related to each other and measure the same ability
parallel or alternate form reliability: how well parallel or alternate forms of the same test measure the same ability
test-retest reliability: whether test-takers perform similarly each time they complete the test
intra-rater reliability: whether the same rater assesses the test-takers' performance in the same way each time he/she evaluates the test
inter-rater reliability: whether two raters assess the test-takers' performance in the same way
Validity
Validity is the extent to which a test measures what it is supposed to measure and nothing else.
Types of validity:
content validity: whether the test measures the ability it intends to measure
concurrent validity: whether the test takers' performance in a test correlates with their results in a different type of test
predictive validity: whether the test results accurately predict future performance
construct validity: whether the test appropriately represents the theory of language competence it is based on
face validity: whether the test looks as if it measures what it is supposed to measure
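Several of the reliability and validity types listed above are commonly quantified as a correlation between two sets of scores (for example two raters, two sittings, or two different tests), and internal consistency is often estimated with Cronbach's alpha. The source names neither statistic, so the following is only a minimal sketch under those assumptions; the function names and the sample scores are hypothetical.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per test taker, one column per item."""
    k = len(item_scores[0])                                    # number of items
    item_vars = [pvariance(col) for col in zip(*item_scores)]  # variance of each item
    total_var = pvariance([sum(row) for row in item_scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: two raters marking the same five compositions.
rater_a = [12, 15, 9, 18, 14]
rater_b = [11, 16, 10, 17, 13]
print(round(pearson(rater_a, rater_b), 2))

# Hypothetical data: four test takers, three items scored 1 (right) or 0 (wrong).
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(answers), 2))  # 0.75
```

The same pearson function could be applied to test-retest scores, to parallel forms, or to results on two different tests when looking at concurrent validity; values closer to 1 indicate closer agreement.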
C-test
In the C-test the second half of every second word is left out. C-tests can provide a rough measure of learners' global level of proficiency.
Dictation
The basis of the procedure is that each individual dictated chunk is long enough (10-25 words) to exceed the learners' short-term memory, and so the forgotten items have to be filled in from the context and the learners' knowledge of the language.
Editing
The editing test is the reverse of the cloze test: extra words are inserted into a text, and testees are required to cross these out. For example:
extra words extra are inserted put placed gone into to a text, and testees are is required to crossing cross these out.
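The C-test deletion rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name make_c_test is hypothetical, every word counts towards "every second word", one-letter words are left intact, and for words with an odd number of letters the larger half is removed. Published C-tests may apply different conventions.

```python
import re


def make_c_test(text, blank="_"):
    """Blank out the second half of every second word in text."""
    words_seen = 0

    def mutilate(match):
        nonlocal words_seen
        word = match.group(0)
        words_seen += 1
        # Leave the 1st, 3rd, 5th ... word intact; one-letter words cannot be halved.
        if words_seen % 2 == 1 or len(word) < 2:
            return word
        keep = len(word) // 2  # assumption: for odd lengths the larger half is removed
        return word[:keep] + blank * (len(word) - keep)

    # Only runs of letters are treated as words; punctuation and spaces are untouched.
    return re.sub(r"[A-Za-z]+", mutilate, text)


print(make_c_test("Language tests measure foreign language proficiency."))
# Language te___ measure for____ language profi______.
```

Using one blank character per deleted letter shows the test taker how many letters are missing; whether to reveal this is a design choice for the test writer.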
Matching
Candidates are given a list of possible answers which they have to match with another list of words. For example:
Match the words on the left with those on the right to make other English words.
1 head        A partner
2 room        B wife
3 business    C master
4 house       D mate
Ordering
In ordering tasks, candidates have to put a group of words, sentences or paragraphs in order. For example:
Put the following words in order to complete the sentence: went yesterday I cinema friend to with.
[Figure: distribution of scores; horizontal axis: Score, from 0.0 to 14.0. Std. Dev = 3.25, Mean = 8.3, N = 61]
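The distribution curve should be bell-shaped. Facility values (the proportion of test takers who answer an item correctly) should be between 0.3 and 0.7 (or, in more lenient approaches to test design, between 0.2 and 0.8). Discrimination indices (how well an item separates stronger from weaker test takers) should be above 0.4 (or, in more lenient approaches to test design, above 0.3).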
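The two item statistics mentioned above can be computed as follows. This is a minimal sketch, not a prescribed procedure: the function names and sample data are hypothetical, items are assumed to be scored 1 (correct) or 0 (incorrect), and the discrimination index is assumed to be the upper-group minus lower-group method using the top and bottom thirds of test takers, which is one common variant among several.

```python
def facility_value(item_responses):
    """Proportion of test takers who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, fraction=1 / 3):
    """Upper-group minus lower-group proportion correct (assumed method)."""
    n = max(1, round(len(total_scores) * fraction))  # size of each comparison group
    # Indices of test takers ranked from highest to lowest total score.
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    correct_top = sum(item_responses[i] for i in top)
    correct_bottom = sum(item_responses[i] for i in bottom)
    return (correct_top - correct_bottom) / n


def item_is_acceptable(fv, di, lenient=False):
    """Apply the thresholds quoted in the text (strict or lenient)."""
    low, high, min_di = (0.2, 0.8, 0.3) if lenient else (0.3, 0.7, 0.4)
    return low <= fv <= high and di > min_di


# Hypothetical data: nine test takers, one item, and their total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [14, 13, 12, 10, 9, 8, 6, 5, 3]

fv = facility_value(item)
di = discrimination_index(item, totals)
print(round(fv, 2), round(di, 2), item_is_acceptable(fv, di))
```

With this sample data five of nine test takers answer correctly, and the top third outperform the bottom third on the item, so the item passes the stricter thresholds.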
Washback
Washback is the effect tests have on teaching and learning. Beneficial washback occurs when, for example, a previously neglected skill (e.g. listening) becomes a focus of teaching because a newly introduced test gives scores in this skill an important role in determining the candidates' grades. Negative washback occurs when, for example, most of the lesson time in secondary schools is spent on practising multiple-choice tests. Tests affect those who take them, the teachers who prepare students for them, the teaching materials (e.g. course-books), society and the educational system.
1. Explain the difference between
a) proficiency and achievement tests;
b) diagnostic and placement tests;
c) direct and indirect tests;
d) subjective and objective tests;
e) norm-referenced and criterion-referenced tests;
f) integrative and discrete-point tests.
2. What is reliability? List the various types of reliability.
3. What is validity? List the various types of validity.
4. What are the most frequently used objective test tasks?
5. What are the most frequent statistical measures of test performance?
6. What effects can tests have on teaching and learning?