Versant™ English Test
Test Description and Validation Summary
Table of Contents
1. Introduction
2. Test Description
   2.1 Test Design
   2.2 Test Administration
       2.2.1 Mobile App Administration
       2.2.2 Computer Administration
   2.3 Test Format
       Speech Sample
       Part A: Reading
       Part B: Repeat
       Part C: Short Answer Questions
       Part D: Sentence Builds
       Part E: Story Retelling
       Part F: Open Questions
   2.4 Number of Items
   2.5 Test Construct
3. Content Design and Development
   3.1 Vocabulary Selection
   3.2 Item Development
   3.3 Item Prompt Recording
       3.3.1 Voice Distribution
       3.3.2 Recording Review
4. Score Reporting
   4.1 Scores and Weights
   4.2 Score Use
   4.3 Score Interpretation
5. Validation
   5.1 Validity Study Design
       5.1.1 Validation Sample
   5.2 Internal Validity
       5.2.1 Standard Error of Measurement
       5.2.2 Reliability
       5.2.3 Dimensionality: Correlation between Subscores
       5.2.4 Correlations between the Versant English Test and Human Scores
   5.3 Relationship to Known Populations: Native and Non-native Group Performance
   5.4 Relationship to Scores of Tests with Related Constructs
6. Conclusion
7. About the Company
1. Introduction
The Versant™ English Test, powered by Versant technology, is an assessment instrument designed to
measure how well a person understands and speaks English. The Versant English Test is intended for
adults and students over the age of 15 and takes approximately 15 minutes to complete. Because the
Versant English Test is delivered automatically by the Versant testing system, the test can be taken at
any time, from any location using the phone app or a computer. A human examiner is not required. The
computerized scoring allows for immediate, objective, and reliable results that correspond well with
traditional measures of spoken English performance.
The Versant English Test measures facility with spoken English, which is a key element in English oral
proficiency. Facility in spoken English is how well the person can understand spoken English on everyday
topics and respond appropriately at a native-like conversational pace in English. Academic institutions,
corporations, and government agencies throughout the world use the Versant English Test to evaluate
the ability of students, staff, and officers to understand spoken English and to express themselves clearly
and appropriately in English. Scores from the Versant English Test provide reliable information that can
be applied to placement, qualification and certification decisions, as well as monitor progress and
measure instructional outcomes.
2. Test Description
2.1 Test Design
The Versant English Test may be taken at any time from any location using a mobile phone app or
computer. During test administration, the Versant testing system presents a series of recorded spoken
prompts in English at a conversational pace and elicits oral responses in English. The voices of the item
prompts are from native speakers of English from several different regions in the US, providing a range
of speaking styles.
The Versant English Test has six item types: Reading, Repeats, Short Answer Questions, Sentence Builds,
Story Retelling, and Open Questions. All item types except for Open Questions elicit responses that can
be analyzed automatically. These item types provide multiple, fully independent measures that underlie
facility with spoken English, including phonological fluency, sentence construction and comprehension,
passive and active vocabulary use, listening skill, and pronunciation of rhythmic and segmental units.
Because more than one item type contributes to each subscore, the use of multiple item types
strengthens score reliability.
The Versant testing system analyzes the candidate's responses and posts scores to a secure website
usually within minutes of the completed test. Test administrators and score users can view and print out
test results from a password-protected website.
The Versant English Test provides numeric scores and performance levels that describe the candidate's
facility in spoken English - that is, the ability to understand spoken English on everyday topics and to
respond appropriately at a native-like conversational pace in intelligible English. The Versant English Test
score report is comprised of an Overall score and four diagnostic subscores: Sentence Mastery,
Vocabulary, Fluency, and Pronunciation. Together, these scores describe the candidate's facility in
spoken English.
2.2 Test Administration
Administration of the Versant English Test generally takes 15 to 17 minutes via a mobile app or computer.
The delivery of the recorded test questions is interactive - the system detects when the candidate has
finished responding to one item and then presents the next item.
2.2.1 Mobile App Administration
Test administration on a mobile phone requires Pearson’s Versant Test app and an Internet connection.
The testing app can be downloaded at no cost from the iOS App Store or Google Play store. The candidate
can use a headset, earbuds with a microphone, or the speakerphone. The testing app prompts the candidate
to enter the Test Identification Number they have received from their test administrator. This
identification number is unique for each candidate and keeps the candidate's information secure.
A single examiner voice presents all the spoken instructions for the test. The spoken instructions for
each section are also displayed verbatim on the device screen to help ensure that candidates understand
the directions. Candidates interact with the test system in English, going through all six parts of the test
until they complete the test and close the testing app.
2.2.2 Computer Administration
For computer administration, the candidate may take the test either via the online testing site or
computer software. The computer used must have an Internet connection and, if the software option is
used, Pearson's Computer Delivered Test (CDT) software installed. The candidate is fitted with a
microphone headset. The system allows the candidate to adjust the volume and calibrate the
microphone before the test begins.
The instructions for each section are spoken by an examiner voice and are also displayed on the
computer screen. Candidates interact with the test system in English, speaking their responses into the
microphone. When a test is finished, the candidate clicks a button labeled "End Test".
2.3 Test Format
The following subsections provide brief descriptions of the item types and the abilities required to
respond to the items in each of the six parts of the Versant English Test.
Speech Sample
In this task, candidates listen to a spoken question that asks them to describe something or give their
opinion on a topic. Candidates have up to 30 seconds to respond to the question.
Examples:
Do you prefer speaking with someone by a voice call or a video call? Explain why.
Do you think it's important to learn English? Why or why not?
This task is used to collect a longer spontaneous speech sample. Candidates’ responses to items in this
section are not scored but are available for review by authorized listeners. These questions are not
considered test items.
Part A: Reading
In this task, the candidate reads numbered sentences, one at a time, as prompted. The sentences are
displayed on the mobile phone or computer screen. Reading items are grouped into sets of four
sequentially coherent sentences, as in the examples below.
Examples:
1. Lanty's next door neighbors are awful.
2. They play loud music all night when he's trying to sleep.
3. If he tells them to stop, they just turn it up louder.
4. He wants to move out of that neighborhood.
Presenting the sentences as part of a group helps the candidate disambiguate words in context and
helps suggest how each individual sentence should be read aloud. The device screen contains two
groups of four sentences (i.e., 8 items). Candidates are prompted to read the eight sentences one at a
time, starting with number 1 and ending with number 8. The system tells the candidate which of the
numbered sentences to read aloud (e.g., "Now, please read sentence 7."). After the candidate has read
the sentence (or has remained silent for a period of time), the system prompts him or her to read the
next sentence from the list.
The sentences are relatively simple in structure and vocabulary, so they can be read easily and in a fluent
manner by literate speakers of English. For candidates with little facility in spoken English but with some
reading skills, this task provides samples of their pronunciation and reading fluency. The readings
appear first in the test because, for many candidates, reading aloud presents a familiar task and is a
comfortable introduction to the interactive mode of the test as a whole.
Part B: Repeat
In this task, candidates are asked to repeat sentences that they hear verbatim. The sentences are
presented to the candidate in approximate order of increasing difficulty. Sentences range in length from
three words to 15 words. The audio item prompts are spoken in a conversational manner.
Examples:
Get some water.
Let's meet again in two weeks.
Come to my office after class if you need help.
To repeat a sentence longer than about seven syllables, a person must recognize the words as spoken
in a continuous stream of speech (Miller & Isard, 1963). Highly proficient speakers of English can
generally repeat sentences that contain many more than seven syllables because these speakers are
very familiar with English words, phrase structures, and other common syntactic forms. If a person
habitually processes five-word phrases as a unit (e.g., "the really big apple tree"), then that person can
usually repeat utterances of 15 or 20 words in length. Generally, the ability to repeat material is
constrained by the size of the linguistic unit that a person can process in an automatic or nearly
automatic fashion. As the sentences increase in length and complexity, the task becomes increasingly
difficult for speakers who are not familiar with English sentence structure.
Because the Repeat items require candidates to organize speech into linguistic units, Repeat items
assess the candidate's mastery of phrase and sentence structure. Given that the task requires the
candidate to repeat full sentences (as opposed to just words and phrases), it also offers a sample of the
candidate's fluency and pronunciation in continuous spoken English.
Part C: Short Answer Questions
In this task, candidates listen to spoken questions and answer each question with a single word or short
phrase. The questions generally present at least three or four lexical items spoken in a continuous
phonological form and framed in English sentence structure. Each question asks for basic information
or requires simple inferences based on time, sequence, number, lexical content, or logic. The questions
do not presume any knowledge of specific facts of culture, geography, history, or other subject matter;
they are intended to be within the realm of familiarity of both a typical 12-year-old native speaker of
English and an adult who has never lived in an English-speaking country.
Examples:
What is frozen water called?
How many months are in a year and a half?
Does a tree usually have more trunks or branches?
To correctly respond to the questions, a candidate must identify the words in phonological and syntactic
context, and then infer the demand proposition. Short Answer Questions measure receptive and
productive vocabulary within the context of spoken questions presented in a conversational style.
Part D: Sentence Builds
For the Sentence Builds task, candidates hear three short phrases and are asked to rearrange them to
make a sentence. The phrases are presented in a random order (excluding the original word order), and
the candidate says a reasonable and grammatical sentence that comprises exactly the three given
phrases.
Examples:
in / bed / stay
she didn't notice / the book / who took
we wondered / would fit in here / whether the new piano
To correctly complete this task, a candidate must understand the possible meanings of the phrases and
know how they might combine with other phrasal material, both with regard to syntax and pragmatics.
The length and complexity of the sentence that can be built is constrained by the size of the linguistic
unit (e.g., one word versus a three-word phrase) that a person can hold in verbal working memory. This
is important to measure because it reflects the candidate's ability to access and retrieve lexical items and
to build phrases and clause structures automatically. The more automatic these processes are, the
greater the candidate's facility in spoken English. This skill is demonstrably distinct from memory span (see
Section 2.5, Test Construct, below).
The Sentence Builds task involves constructing and articulating entire sentences. As such, it is a measure
of candidates’ mastery of sentences in addition to their pronunciation and fluency.
Part E: Story Retelling
In this task, candidates listen to a brief story and are then asked to describe what happened in their own
words. Candidates have thirty seconds to respond to each story. Candidates are encouraged to tell as
much of the story as they can, including the situation, characters, actions and ending. The stories consist
of three to six sentences and contain from 30 to 90 words. The situation involves a character (or
characters), setting, and goal. The body of the story describes an action by the agent of the story followed
by a possible reaction or implicit sequence of events. The ending typically introduces a new situation,
actor, patient, thought, or emotion.
Example:
Three girls were walking along the edge of a stream when they saw a small bird with its feet
buried in the mud. One of the girls approached it, but the small bird flew away. The girl
ended up with her own feet covered with mud.
The Story Retelling items assess a candidate's ability to listen and understand a passage, reformulate the
passage using his or her own vocabulary and grammar, and then retell it in detail. This section elicits
longer, more open-ended speech samples than earlier sections in the test and allows for the assessment
of a wider range of spoken abilities. Performance on Story Retelling provides a measure of fluency,
pronunciation, vocabulary, and sentence mastery.
Part F: Open Questions
In this task, candidates listen to spoken questions that elicit an opinion, and are asked to provide an
answer with an explanation. Candidates have 40 seconds to respond to each question. The questions
relate to day-to-day issues or the candidate's preferences and choices.
Examples:
Do you think television has had a positive or negative effect on family life? Please explain.
Do you prefer playing individual sports or team sports? Please explain.
This task is used to collect longer spontaneous speech samples.
2.4 Number of Items
In the administration of the Versant English Test, the testing system presents approximately 63 items in
six separate sections to each candidate. The items are drawn at random from a large item pool. This
means that most or all items are different from one test administration to the next. Proprietary
algorithms are used by the testing system to select from the item pool - the algorithms take into
consideration, among other things, an item's difficulty level and similarity to other presented items.
Table 1 shows the approximate number of items presented in each section. The exact number of items
in each test may change from time to time as new, unscored items are added to and removed from the
test. The responses to the unscored items do not impact the candidates’ scores nor do they impact the
test experience. The responses are used to build scoring models for new items, which allows Pearson to
add new content to the test in order to keep the item bank secure and up-to-date.
Table 1. Approximate Number of Items Presented per Section
Section
A. Reading
B. Repeat
C. Short Answer Questions
D. Sentence Builds
E. Story Retelling
F. Open Questions
Total
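The selection algorithms themselves are proprietary and are not specified further in this report. Purely for illustration, the Python sketch below shows one way an item pool could be sampled under difficulty and similarity constraints; the Item structure, the lexical-overlap check, and all names are hypothetical and do not represent Pearson's implementation.

    import random
    from dataclasses import dataclass

    @dataclass
    class Item:
        item_id: str
        difficulty: float  # calibrated difficulty estimate (assumed field)
        text: str

    def too_similar(a: Item, b: Item) -> bool:
        # Crude lexical-overlap check standing in for the real similarity measure.
        wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1) > 0.5

    def select_items(pool: list[Item], n: int, band=(-1.0, 1.0)) -> list[Item]:
        """Draw n items at random from a target difficulty band, skipping
        items that overlap too heavily with items already chosen."""
        candidates = [i for i in pool if band[0] <= i.difficulty <= band[1]]
        chosen: list[Item] = []
        for item in random.sample(candidates, len(candidates)):
            if any(too_similar(item, c) for c in chosen):
                continue
            chosen.append(item)
            if len(chosen) == n:
                break
        return chosen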
2.5 Test Construct
For any language test, it is essential to define the test construct as explicitly as possible (Bachman, 1990;
Bachman & Palmer, 1996). The Versant English Test is designed to measure a candidate's facility in
spoken English - that is, the ability to understand spoken English on everyday topics and to respond
appropriately at a native-like conversational pace in intelligible English. Another way to describe the
construct facility in spoken English is “the ease and immediacy in understanding and producing
appropriate conversational English” (Levelt, 1989). This definition relates to what occurs during the
course of a spoken conversation. While keeping up with the conversational pace, a person has to track
what is being said, extract meaning as speech continues, and then formulate and produce a relevant
and intelligible response. These component processes of listening and speaking are schematized in
Figure 1.
[Figure 1 diagram: a listening chain (listen; decode propositions; infer demand, if any; decide on response) feeding a speaking chain (build clause structure; construct phrases). Adapted from Levelt, 1989.]
Figure 1. Conversational processing components in listening and speaking.
During a test, the testing system presents a series of discrete prompts to the candidate at a
conversational pace as recorded by several different native speakers who represent a range of native
accents and speaking styles. These integrated “listen-then-speak” items require real-time receptive and
productive processing of spoken language forms. The items are designed to be relatively independent
of social nuance and higher cognitive functions. The same facility in spoken English that enables a person
to participate in everyday native-paced English conversation also enables that person to satisfactorily
understand and respond to the listening/speaking tasks in the Versant English Test.
The Versant English Test measures the candidate's control of core language processing components,
such as lexical access and syntactic encoding. For example, in normal everyday conversation, native
speakers go from building a clause structure to phonetic encoding (the last two stages in the right-hand
column of Figure 1) in about 40 milliseconds (Van Turennout, Hagoort, & Brown, 1998). Similarly, the
other stages shown in Figure 1 must be performed within the short period of time available to a speaker
during a conversational turn in everyday communication. The typical time window in turn taking is about
500-1000 milliseconds (Bull & Aylett, 1998). If language users involved in communication cannot
successfully perform the complete series of mental activities presented in Figure 1 in real-time, both as
listeners and as speakers, they will not be able to participate actively in conversations and other types
of communication.
Automaticity in language processing is required in order for the speaker/listener to be able to pay
attention to what needs to be said/understood rather than to how the encoded message is to be
structured/analyzed. Automaticity in language processing is the ability to access and retrieve lexical
items, to build phrases and clause structures, and to articulate responses without conscious attention
to the linguistic code (Cutler, 2003; Jescheniak, Hahne, & Schriefers, 2003; Levelt, 2001). Some measures
of automaticity in the Versant English Test may be misconstrued as memory tests. Because some tasks
involve repeating long sentences or holding phrases in memory in order to piece them together into
reasonable sentences, it may seem that these tasks are measuring memory capacity rather than
language ability. However, psycholinguistic research has shown that verbal working memory for such
things as remembering a string of digits is distinct from the cognitive resources used to process and
comprehend sentences (Caplan & Waters, 1999).
The fact that syntactic processing resources are generally separate from short-term memory stores is
also evident in the empirical results of the Versant English Test validation experiments (see Section 5:
Validation). Virtually all native English speakers achieve high scores on the Versant English Test, whereas
non-native speakers obtain scores distributed across the scale. If memory, as such, were being measured
as an important component of performance on the Versant English Test, then native speakers would
show greater variation in scores as a function of their range of memory capacities, and the Versant English
Test would not correlate as highly as it does with other accepted measures of oral proficiency, since it
would be measuring something other than language ability.
The Versant English Test probes the psycholinguistic elements of spoken language performance rather
than the social, rhetorical, and cognitive elements of communication. The reason for this focus is to
ensure that test performance relates most closely to the candidate's facility with the language itself and
is not confounded with other factors. The goal is to separate familiarity with spoken language from other
types of knowledge including cultural familiarity, understanding of social relations and behavior, and the
candidate's own cognitive style. Also, by focusing on context-independent material, less time is spent
developing a background cognitive schema for the tasks, and more time is spent collecting data for
language assessment (Downey et al., 2008).
The Versant English Test measures the real-time encoding and decoding of spoken English. Performance
on Versant English Test items predicts a more general spoken language facility, which is essential in
successful oral communication. The reason for the predictive relation between spoken language facility
and oral communication skills is schematized in Figure 2. This figure puts Figure 1 into a larger context,
as one might find in a socially situated dialog.
Figure 2. Message decoding and message encoding as a real-time chain-process in oral interaction.
The language structures that are largely shared among the members of a speech community are used
to encode and decode various threads of meaning that are communicated in spoken turns. These
threads of meaning that are encoded and decoded include declarative information, as well as social
information and discourse markers. World knowledge and knowledge of social relations and behavior
are also used in understanding and in formulating the content of the spoken turns. However, these
social-cognitive elements of communication are not represented in this model and are not directly
measured in the Versant English Test.
3. Content Design and Development
The Versant English Test measures both listening and speaking skills, emphasizing the candidate's facility
(ease, fluency, immediacy) in responding aloud to common, everyday spoken English. All Versant English
Test items are designed to be region neutral. The content specification also requires that both native
speakers and proficient non-native speakers find the items very easy to understand and to respond to
appropriately. For English learners, the items cover a broad range of skill levels and skill profiles.
Except for the Reading items, each Versant English Test item is independent of the other items and
presents unpredictable spoken material in English. The test is designed to use context-independent
material for three reasons. First, context-independent items exercise and measure the most basic
meanings of words, phrases, and clauses on which context-dependent meanings are based (Perry, 2001).
Second, when language usage is relatively context-independent, task performance depends less on
factors such as world knowledge and cognitive style and more on the candidate's facility with the
language itself. Thus, performance on the Versant English Test relates most closely to language
abilities and is not confounded with other candidate characteristics. Third, context-independent tasks
maximize response density; that is, within the time allotted, the candidate has more time to demonstrate
performance in speaking the language. Less time is spent developing a background cognitive schema
needed for successful task performance. Item types maximize reliability by providing multiple, fully
independent measures. They elicit responses that can be analyzed automatically to produce measures
that underlie facility with spoken English, including phonological fluency, sentence comprehension,
vocabulary, and pronunciation of lexical and phrasal units.
3.1 Vocabulary Selection
The vocabulary used in all test items and responses is restricted to forms of the 8,000 most frequently
used words in the Switchboard Corpus (Godfrey & Holliman, 1997), a corpus of three million words
spoken in spontaneous telephone conversations by over 500 speakers of both sexes from every major
dialect of American English. In general, the language structures used in the test reflect those that are
common in everyday English. This includes extensive use of pronominal expressions such as "she" or
"their friend" and contracted forms such as "won't" and "I'm."
3.2 Item Development
Versant English Test items were drafted by native English-speaking item developers from different
regions in the U.S. In general, the language structures used in the test reflect those that are common in
everyday conversational English. The items were designed to be independent of social nuance and
complex cognitive functions. Lexical and stylistic patterns found in the Switchboard Corpus guided item
development.
Draft items were then reviewed internally by a team of test developers, all with advanced degrees in
language-related fields, to ensure that they conformed to item specifications and English usage in
different English-speaking regions and contained appropriate content. Then, draft items were sent to
external linguists for expert review to ensure 1) compliance with the vocabulary specification, and 2)
conformity with current colloquial English usage in different countries. Reviewers checked that items
would be appropriate for candidates trained to standards other than American English.
All items, including anticipated responses for short-answer questions, were checked for compliance with
the vocabulary specification. Most vocabulary items that were not present in the lexicon were changed
to other lexical stems that were in the consolidated word list. Some off-list words were kept and added
to a supplementary vocabulary list, as deemed necessary and appropriate. Changes proposed by the
different reviewers were then reconciled and the original items were edited accordingly.
For an item to be retained in the test, it had to be understood and responded to appropriately by at least
90% of a reference sample of educated native speakers of English.
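Such compliance checks are straightforward to automate. The sketch below is ours (the function and variable names are illustrative, and the actual word lists are not published); it flags item words that appear in neither the core 8,000-word lexicon nor the supplementary list:

    def off_list_words(item_text: str, lexicon: set[str], supplement: set[str]) -> set[str]:
        """Return words in an item that are in neither the core frequency
        lexicon nor the approved supplementary vocabulary list."""
        words = {w.strip(".,?!;:\"'").lower() for w in item_text.split()}
        return {w for w in words if w and w not in lexicon | supplement}

    # Illustrative usage with a tiny stand-in lexicon:
    lexicon = {"what", "is", "frozen", "water", "called"}
    print(off_list_words("What is frozen water called?", lexicon, set()))  # -> set()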
3.3 Item Prompt Recording
3.3.1 Voice Distribution
Twenty-six native speakers (13 men and 13 women) representing various speaking styles and regions
were selected for recording the spoken prompt materials. The 26 speakers recorded items across
different tasks fairly evenly.
Recordings were made in a professional recording studio in Menlo Park, California. In addition to the
item prompt recordings, all the test instructions were recorded by a professional voice talent whose
voice is distinct from the item voices.
3.3.2 Recording Review
Multiple independent reviews were performed on all the recordings for quality, clarity, and conformity
to natural conversational styles. Any recording in which reviewers noted some type of error was either
re-recorded or excluded from insertion in the operational test.
4. Score Reporting
4.1 Scores and Weights
The Versant English Test score report is comprised of an Overall score and four diagnostic subscores
(Sentence Mastery, Vocabulary, Fluency*, and Pronunciation). Scores are reported in the range from 10
to 90 on Pearson’s Global Scale of English (GSE). The corresponding Common European Framework of
Reference for Languages (CEFR) level is also displayed.
Overall: The Overall score of the test represents the ability to understand spoken English and
speak it intelligibly at a native-like conversational pace on everyday topics. Scores are based on
a weighted combination of the four diagnostic subscores.
Sentence Mastery: Sentence Mastery reflects the ability to understand, recall, and produce
English phrases and clauses in complete sentences. Performance depends on accurate syntactic
processing and appropriate usage of words, phrases, and clauses in meaningful sentence
structures.
Vocabulary: Vocabulary reflects the ability to understand common everyday words spoken in
sentence context and to produce such words as needed. Performance depends on familiarity
with the form and meaning of everyday words and their use in connected speech.
Fluency: Fluency is measured from the rhythm, phrasing and timing evident in constructing,
reading and repeating sentences.
Pronunciation: Pronunciation reflects the ability to produce consonants, vowels, and stress in a
native-like manner in sentence context. Performance depends on knowledge of the phonological
structure of everyday words as they occur in phrasal context.
* Within the context of language acquisition, the term "fluency" is sometimes used in the broader sense of general language mastery. In the
narrower sense used in the Versant English Test score reporting, "fluency" is taken as a component of oral proficiency that describes certain
characteristics of the observable performance. Following this usage, Lennon (1990) identified fluency as "an impression on the listener's part that
the psycholinguistic processes of speech planning and speech production are functioning easily and efficiently" (p. 391). In Lennon's view, surface
fluency is an indication of a fluent process of encoding. The Versant English Test fluency subscore is based on measurements of surface features
such as response latency, speaking rate, and continuity in speech flow, but as a constituent of the Overall score it is also an indication of the
ease of the underlying encoding process.
Among the four subscores, two basic types of scores are distinguished: scores relating to the content of
what a candidate says (Sentence Mastery and Vocabulary) and scores relating to the manner (quality) of
the response production (Fluency and Pronunciation). This distinction corresponds roughly to Carroll's
(1961) distinction between a knowledge aspect and a control aspect of language performance. In later
publications, Carroll (1986) identified the control aspect as automatization, which suggests that people
speaking fluently without realizing they are using their knowledge about a language have attained the
level of automatic processing as described by Schneider & Shiffrin (1977).
In all but the Open Questions section of the Versant English Test, each incoming response is recognized
automatically by a speech recognizer that has been optimized for non-native speech. The words, pauses,
syllables, phones, and even some subphonemic events are located in the recorded signal. The content
of the responses to Reading, Repeats, SAQs, and Sentence Builds is scored according to the presence or
absence of expected correct words in correct sequences. The content of responses to Story Retelling
items is scored for vocabulary by scaling the weighted sum of the occurrence of a large set of expected
words and word sequences that are recognized in the spoken response. Weights are assigned to the
expected words and word sequences according to their semantic relation to the story prompt using a
variation of latent semantic analysis (Landauer et al., 1998). Across all the items, content accuracy counts
for 50% of the Overall score, and reflects whether or not the candidate understood the prompts and
responded with appropriate content.
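The recognizer and scoring models themselves are proprietary. As a schematic illustration only, the weighted expected-word idea for Story Retelling might be sketched as follows (the weights and names are made up and are not the actual LSA-derived values):

    def retelling_content_score(recognized: str, expected: dict[str, float]) -> float:
        """Scale the weighted sum of expected words and word sequences found
        in the recognized response; weights stand in for the semantic weights
        described above."""
        raw = sum(w for unit, w in expected.items() if unit in recognized)
        max_raw = sum(expected.values())
        return raw / max_raw if max_raw else 0.0  # scaled to [0, 1]

    expected = {"girls": 1.0, "stream": 1.5, "small bird": 2.0, "mud": 1.5, "flew away": 2.0}
    print(retelling_content_score("the girls saw a small bird stuck in the mud", expected))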
The manner-of-speaking scores (Fluency and Pronunciation, or the control dimension) are calculated by
measuring the latency of the response, the rate of speaking, the position and length of pauses, the stress
and segmental forms of the words, and the pronunciation of the segments in the words within their
lexical and phrasal context. These measures are scaled according to the native and non-native
distributions and then re-scaled and combined so that they optimally predict human judgments on
manner-of-speaking. The manner-of-speaking scores count for the remaining 50% of the Overall score
and reflect whether or not the candidate speaks in a native-like manner.
In the Versant English Test scoring logic, content and manner (i.e., accuracy and control) are weighted
equally because successful communication depends on both. Producing accurate lexical and structural
content is important, but excessive attention to accuracy can lead to disfluent speech production and
can also hinder oral communication; on the other hand, inappropriate word usage and misunderstood
syntactic structures can also hinder communication.
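In outline, then, the Overall score is an equally weighted combination of a content half (Sentence Mastery, Vocabulary) and a manner half (Fluency, Pronunciation). The minimal sketch below reflects only the 50/50 content/manner split stated above; the equal weighting of the two subscores within each half is our assumption, as the exact subscore weights are not published in this report.

    def overall_score(sentence_mastery: float, vocabulary: float,
                      fluency: float, pronunciation: float) -> float:
        """Content (Sentence Mastery + Vocabulary) counts for 50% of the
        Overall score and manner (Fluency + Pronunciation) for the other 50%.
        The equal split within each pair is an assumption."""
        content = (sentence_mastery + vocabulary) / 2
        manner = (fluency + pronunciation) / 2
        return 0.5 * content + 0.5 * manner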
4.2 Score Use
Once a candidate has completed a test, the Versant testing system analyzes the spoken performances
and posts the scores to the password-protected test administration platform, ScoreKeeper. Test
administrators can choose to make scores available to test takers. If this option is selected, test takers
may be able to see them on ScoreKeeper or via the score look-up function on the Pearson website.
Scores from the Versant English Test have been used by educational and government institutions as well
as commercial and business organizations. Pearson endorses the use of Versant English Test scores for
making valid decisions about oral English interaction skills of individuals, provided score users have
reliable evidence confirming the identity of the individuals at the time of test administration. Score users
may obtain such evidence either by administering the Versant English Test themselves or by having
trusted third parties administer the test. In several countries, education and commercial institutions
provide such services.
Versant English Test scores can be used to evaluate the level of spoken English skills of individuals
entering into, progressing through, and exiting English language courses. Scores may also be used
effectively in evaluating whether an individual's level of spoken English is sufficient to perform certain
tasks or functions requiring mastery of spoken English.
The Global Scale of English covers a wide range of abilities in spoken English communication. In most
cases, score users must decide what Versant English Test score is considered a minimum requirement
in their context (i.e., a cut score). Score users may wish to base their selection of an appropriate cut score
on their own localized research. Pearson can provide a Benchmarking Kit and further assistance in
establishing cut scores.
4.3 Score Interpretation
Two summary tables offer a quick reference for interpreting Versant English Test scores in terms of the
Common European Framework of Reference descriptors. Table 1 in the Appendix presents an overview
relating the Common European Framework global scale (Council of Europe, 2001:24) to Versant English
Test Overall scores as reported on the Global Scale of English. Table 2 in the Appendix provides the more
specific scale of Oral Interaction Descriptors used in the studies designed to align Overall scores to the
CEFR levels. The method used to create the reference tables is described in the Can-Do Guide. Please
contact Pearson for this report.
5. Validation
The scoring models used in the first version of the Versant English Test were validated in a series of
studies involving over 4,000 native and non-native English speakers. In the initial validation study, the native
group comprised 376 literate adults, geographically representative of the U.S. population aged 18 to 50.
It had a female:male ratio of 60:40 and was 18% African-American. The non-native group was a stratified
random sample of 514 candidates sampled from a larger group of more than 3,500 non-native
candidates. Stratification was aimed at obtaining an even representation for gender and for native
language. Over 40 different languages were represented in the non-native norming group, including
Arabic, Chinese, Spanish, Japanese, French, Korean, Italian, and Thai. Ages ranged from 17 to 79 and the
female:male ratio was 50:50. More information about these initial validation studies can be found in
Validation Summary for PhonePass SET-10. Please contact Pearson for this report.
The Versant English Test has undergone several modifications. The test has been previously known as
PhonePass, SET-10, and Versant for English. Because of the introduction of several modifications, a
number of additional validation studies have been performed. With each modification, the accuracy of
the test has improved but the scores are still correlated highly with previous versions. The additional
validation studies used a native norming group of 775 native speakers of English from the U.S. and the
U.K. and a non-native norming group of 603 speakers from a number of countries in Asia, Europe and
South America. The native norming group consisted of approximately 33% of speakers from the U.K. and
66% from the U.S. and had a female:male ratio of 55:45. Ages ranged from 18 to 75. The non-native
norming group had a female:male ratio of 62:38. Ages ranged from 12 to 56.
In a recent version of the Versant English Test, Story Retelling items were introduced. Scores on Story
Retelling items contribute to all four subscores. A correlation of 0.99 (n=149) was found between the
current version of the Versant English Test and the version on which previous validation studies were
conducted. The high correlation suggests that many of the inferences from validation studies conducted
with the previous releases also apply to the new version. Some of the data presented in this section were
collected in validation studies for the previous versions and are assumed to generalize to the most recent
version of the test. All scores, statistics, and results in the validation studies below (§5.1-5.4) use the
original Versant scale of 20 to 80 rather than the GSE of 10 to 90.
5.1 Validity Study Design
Validity analyses examined three aspects of the Versant English Test scores:
1. Internal quality (reliability and accuracy): whether or not the Versant English Test a) provides
consistent scores that accurately reflect the scores that human listeners and raters would
assign and b) provides distinct subscores that measure different aspects of the test construct.
2. Relation to known populations: whether or not the Versant English Test scores reflect
expected differences and similarities among known populations (e.g., native speakers vs. English
learners).
3. Relation to scores of tests with related constructs: how closely Versant English Test scores
predict the reliable information in scores of well-established speaking tests.
5.1.1 Validation Sample
From the large body of spoken performance data collected from native and non-native speakers of
English, a total of 149 subjects were set aside for a series of validation analyses. Over 20 different
languages were represented in the validation sample. Ages ranged from 20 to 55 and the female:male
ratio was 42:58. Care was taken to ensure that the training dataset and validation dataset did not overlap.
That is, the spoken performance samples provided by the validation candidates were excluded from the
datasets used for training the automatic speech processing models or for training any of the scoring
models. A total of seven native speakers were included in the validation dataset but have been excluded
from the validity analyses so as not to inflate the correlations.
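The key property of this design is that the training and validation data are disjoint at the speaker level, not merely at the response level. A hypothetical sketch of such a split (the record layout and all names are ours):

    import random

    def speaker_disjoint_split(responses: list[dict], holdout_fraction: float = 0.1,
                               seed: int = 0) -> tuple[list[dict], list[dict]]:
        """Split response records so that no speaker appears in both the
        training set and the validation set; each record is assumed to
        carry a 'speaker_id' field."""
        speakers = sorted({r["speaker_id"] for r in responses})
        random.Random(seed).shuffle(speakers)
        holdout = set(speakers[:max(1, int(len(speakers) * holdout_fraction))])
        train = [r for r in responses if r["speaker_id"] not in holdout]
        valid = [r for r in responses if r["speaker_id"] in holdout]
        return train, valid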
5.2 Internal Validity
To understand the consistency and accuracy of the Versant English Overall scores and the distinctness
of the subscores, the following indicators were examined: the standard error of measurement of the
Versant English Overall score; the reliability of the Versant English Test (split-half and test-retest); the
correlations between the Versant English Overall scores and subscores, and between pairs of subscores;
comparison of machine-generated Versant English scores with listener-judged scores of the same
Versant English tests. These qualities of consistency and accuracy of the test scores are the foundation
of any valid test (Bachman & Palmer, 1996).
5.2.1 Standard Error of Measurement
The Standard Error of Measurement (SEM) provides an estimate of the amount of error in an individual's
observed test scores and “shows how far it is worth taking the reported score at face value” (Luoma,
2004: 183). The SEM of the Versant English Overall score (on the Versant scale) is 2.8.
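Under classical test theory, the SEM follows from the score standard deviation and the reliability coefficient. As a back-of-envelope check (our calculation, not part of the report), the figures quoted elsewhere in this section give a value close to the reported 2.8:

    import math

    def sem(sd: float, reliability: float) -> float:
        """Classical test theory: SEM = sd * sqrt(1 - reliability)."""
        return sd * math.sqrt(1.0 - reliability)

    # sd ~ 15.3 (Table 4) and split-half reliability 0.97 (Table 2):
    print(round(sem(15.3, 0.97), 2))  # ~2.65, in line with the reported SEM of 2.8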
5.2.2 Reliability
Split-half Reliability
Score reliabilities were estimated by the split-half method (n=143). Split-half reliability was calculated for
the Overall score and all subscores. The split-half reliabilities use the Spearman-Brown Prophecy
Formula to correct for underestimation and are similar to the reliabilities calculated for the uncorrected
equivalent form dataset. The human scores were calculated from human transcriptions (for the
Sentence Mastery and Vocabulary subscores) and human judgments (for the Pronunciation and Fluency
subscores). Table 2 presents split-half reliabilities based on the same individual performances scored by
careful human rating in one case and by independent automatic machine scoring in the other case. The
values in Table 2 suggest that there is sufficient information in a Versant English Test item response set
to extract reliable information, and that the effect on reliability of using the Versant speech recognition
technology, as opposed to careful human rating, is quite small across all score categories. The high
reliability score is a good indication that the computerized assessment will be consistent for the same
candidate assuming no changes in the candidate's language proficiency level.
Table 2. Split-Half Reliabilities of Versant English Test Machine Scoring versus Human Scoring
Score (n = 143)      Machine Scores   Human Scores
Overall                   0.97             0.99
Sentence Mastery          0.92             0.95
Vocabulary                0.92             0.93
Fluency                   0.97             0.99
Pronunciation             0.97             0.99
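For reference, the split-half procedure with the Spearman-Brown correction can be expressed in a few lines. This is a generic textbook sketch, not Pearson's scoring code:

    import math
    import statistics

    def pearson_r(xs: list[float], ys: list[float]) -> float:
        """Pearson correlation between two paired score lists."""
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    def split_half_reliability(half_a: list[float], half_b: list[float]) -> float:
        """Correlate scores from the two test halves, then apply the
        Spearman-Brown prophecy formula to estimate the reliability of
        the full-length test."""
        r_half = pearson_r(half_a, half_b)
        return 2 * r_half / (1 + r_half)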
Test-Retest Reliability
Score reliabilities were also estimated by the test-retest method (n=140). Three randomly generated test
forms were administered in a single session to 140 participants. Tests were administered via telephone
and computer. The participants were adult learners of English studying at a community college or
university and came from a wide range of native language backgrounds. The mean age was 32 years (sd
.75). Test administrations are referred to as Test 1, Test 2, and Test 3. Comparisons between Test 1
and Test 2 represent test-retest reliability in the absence of a practice test, while comparisons between
Test 2 and Test 3 represent test-retest reliability in the presence of a practice test (i.e., Test 1).
Comparisons between Test 1 and Test 3 represent "repetition effects" or "practice effects," which is the
possibility that test scores naturally improve with increased experience with the task. Test-retest
reliability was estimated using Pearson’s correlation coefficient applied to overall Versant English Test
scores at the three different administrations. Results of the correlation analyses are summarized in Table
3 below.
Table 3. Correlations between Versant English Test Overall Scores (n=140)
Comparison                                     Correlation
Without a practice test (Test 1 vs. Test 2)       0.97
With a practice test (Test 2 vs. Test 3)          0.97
Repetition effects (Test 1 vs. Test 3)            0.97
These data show that test-retest reliability is high with or without a practice test. They also suggest that
increasing familiarity with the tasks does not result in any consistent change in Versant English Overall
scores.
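The size of these correlations is consistent with the SEM reported in Section 5.2.1. A small simulation (ours, using placeholder values drawn from the tables in this section) shows how a latent ability with sd of about 15 plus measurement noise equal to the SEM of 2.8 yields test-retest correlations near 0.97:

    import numpy as np

    rng = np.random.default_rng(0)
    ability = rng.normal(45, 15, 140)  # latent candidate ability (placeholder)
    test1, test2, test3 = (ability + rng.normal(0, 2.8, 140) for _ in range(3))

    def retest_r(a: np.ndarray, b: np.ndarray) -> float:
        """Test-retest reliability as the Pearson correlation between
        two administrations."""
        return float(np.corrcoef(a, b)[0, 1])

    print(round(retest_r(test1, test2), 2))  # ~0.97, in line with Table 3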
To determine whether there were any statistically significant differences between scores on any of the
three administrations, a separate single-factor Analysis of Variance (ANOVA) was performed with
Administration Order (Test 1, 2, or 3) as a factor. Descriptive results of the scores are summarized in
Table 4.
Table 4. Mean Overall Versant English Test Scores and Standard Deviations across Administration
Order (n=140)

            Test 1          Test 2          Test 3
Mean (sd)   44.46 (15.30)   44.99 (14.25)   44.72 (15.17)
There were no statistically significant differences across administration order. Mean score differences are <1
point between each administration of the test, which is well within the standard error of measurement
(2.8 points).
The above data were also used to examine the possible grading differences between two different
Versant administration modalities: computer-delivered ("CDT") and telephone, a previously available test
modality. The order of presentation of the CDT versus phone modality of the test was randomized and
counterbalanced across participants. Test 1 was treated as a practice test; Tests 2 and 3 were the CDT
and telephone versions of the test.
The difference in overall scores was analyzed using a paired, two-tailed t-test (α = .05). No significant
difference was found between the overall scores of the CDT version (m = 52.3, sd = 13.9) and the
telephone-delivered version (m = 52.7, sd = 14.5) (t = -0.66, n.s.). These results strongly suggest that
there is no systematic difference between Versant English Test scores from the same candidate when
the test is taken via CDT or by telephone.
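A comparison of this kind is easy to reproduce on any paired score data. The sketch below uses SciPy with synthetic placeholder scores, not the study data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_score = rng.normal(52.5, 14.0, 140)      # placeholder latent scores
    cdt = true_score + rng.normal(0, 2.8, 140)    # computer-delivered scores
    phone = true_score + rng.normal(0, 2.8, 140)  # telephone scores

    t_stat, p_value = stats.ttest_rel(cdt, phone)  # paired, two-tailed by default
    print(round(t_stat, 2), round(p_value, 3))     # expect no significant difference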
5.2.3 Dimensionality: Correlation between Subscores
Ideally, each subscore on a test provides unique information about a specific dimension of the
candidate's ability. For spoken language tests, the expectation is that there will be a certain level of
covariance between subscores given the nature of language learning. When language learning takes
place, the candidate's skills tend to improve across multiple dimensions. However, if all the subscores
were to correlate perfectly with one another, then the subscores might not be measuring different
aspects of facility with the spoken language.
Table 5 presents the correlations among the Versant English Test subscores and the Overall scores for a
semi-randomly selected non-native sample.
Table 5. Correlations among Versant English Test Subscores for a Semi-randomly Selected Non-Native
Sample (n=1152)
                   Vocabulary   Pronunciation   Fluency   Overall
Sentence Mastery      0.72          0.55          0.56      0.85
Vocabulary             -            0.51          0.53      0.78
Pronunciation                        -            0.80      0.86
Fluency                                             -       0.88
As expected, test subscores correlate with each other to some extent by virtue of presumed general
covariance within the candidate population between different component elements of spoken language
skills. The correlations between the subscores are, however, significantly below unity, which indicates
that the different scores measure different aspects of the test construct, using different measurement
methods, and different sets of responses. This data set (n=1152) was semi-randomly selected from tests
delivered over a six month period. A broad range of native languages is represented. A different pattern
may be found when different native languages are sampled.
Figure 3 illustrates the relationship between two relatively independent machine scores (Sentence