Assignment: Language Assessment, Meeting 13

Summary

ASSESSING LISTENING

In earlier chapters, a number of foundational principles of language assessment were introduced. Concepts like practicality, reliability, validity, authenticity, washback, direct and indirect testing, and formative and summative assessment are by now part of your vocabulary. Now our focus will shift away from the standardized testing juggernaut to the level at which you will usually work: the day-to-day classroom assessment of listening, speaking, reading, and writing. This is the level at which you will most frequently have the opportunity to apply principles of assessment.

A. OBSERVING THE PERFORMANCE OF THE FOUR SKILLS

Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. When you propose to assess someone's ability in one or a combination of the four skills, you assess that person's competence, but you observe the person's performance. So, one important principle for assessing a learner's competence is to consider the fallibility of the results of a single performance, such as that produced in a test. A sounder assessment could take the form of one or more of the following designs:

Several tests that are combined to form an assessment
A single test with multiple test tasks to account for learning styles and performance variables
In-class and extra-class graded work
Alternative forms of assessment (e.g., journal, portfolio, conference, observation, self-assessment, peer-assessment).

A second principle is one that we teachers often forget. We must rely as much as possible on observable performance in our assessments of students. The process of the listening performance itself is the invisible, inaudible process of internalizing meaning from the auditory signals being transmitted to the ear and brain. The productive skills of speaking and writing allow us to hear and see the process as it is performed. Writing gives a permanent product in the form of a written piece. But unless you have recorded speech, there is no permanent observable product for speaking performance, because all those words you just heard have vanished from your perception and (you hope) have been transformed into meaningful intake somewhere in your brain.

With the receptive skills, you cannot observe the actual act of listening or reading, nor can you see or hear an actual product; you can observe learners only while they are listening or reading. The upshot is that all assessment of listening and reading must be made on the basis of observing the test-taker's speaking or writing (or nonverbal response), not on the listening or reading itself. All assessment of receptive performance, then, must be made by inference.

B. THE IMPORTANCE OF LISTENING

Listening has often played second fiddle to its counterpart, speaking, in the standardized testing industry. One reason for this emphasis is that listening is often implied as a component of speaking. Every language teacher knows that one's oral production ability (other than monologues, speeches, reading aloud, and the like) is only as good as one's listening comprehension ability. Of even greater impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. We therefore need to pay close attention to listening as a mode of performance for assessment in the classroom.

C. BASIC TYPES OF LISTENING

As with all effective tests, designing appropriate assessment tasks in listening begins with the specification of objectives, or criteria. Those objectives may be classified in terms of several types of listening performance. Each of these stages represents a potential assessment objective:
Comprehending surface-structure elements such as phonemes, words, intonation, or a grammatical category
Understanding of pragmatic context
Determining meaning of auditory input
Developing the gist, a global or comprehensive understanding

From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.
1. Intensive. Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.
2. Responsive. Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.
3. Selective. Processing stretches of discourse such as short monologues for several minutes in order to "scan" for certain information. The purpose of such performance is not necessarily to look for global or general meanings, but to be able to comprehend designated information in a context of longer stretches of spoken language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.
4. Extensive. Listening to develop a top-down, global understanding of spoken language. Extensive performance ranges from listening to lengthy lectures to listening to a conversation and deriving a comprehensive message or purpose. Listening for the gist, for the main idea, and making inferences are all part of extensive listening.

D. MICRO AND MACROSKILLS OF LISTENING

A useful way of synthesizing the above two lists is to consider a finite number of micro- and macroskills implied in the performance of listening comprehension. Richards' (1983) list of microskills has proven useful in the domain of specifying objectives for learning and may be even more useful in forcing test makers to carefully identify specific assessment objectives.

E. DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING

Once you have determined objectives, your next step is to design the tasks, including making decisions about how you will elicit performance and how you will expect the test-taker to respond.

1. Recognizing Phonological and Morphological Elements
A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language.
2. Para-phrase Recognition
The next step up on the scale of listening comprehension microskills is the recognition of words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.

F. DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING

The objective of this item is recognition of the wh-question how much and its appropriate response. Distractors are chosen to represent common learner errors: (a) responding to how much vs. how much longer; (c) confusing how much in reference to time vs. the more frequent reference to money; (d) confusing a wh-question with a yes/no question. If open-ended response formats gain a small amount of authenticity and creativity, they of course suffer some in practicality, as teachers must then read students' responses and judge their appropriateness, which takes time.

G. DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING

A third type of listening performance is selective listening, in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.

1. Listening Cloze
Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story, monologue, or conversation and simultaneously read the written text in which selected words or phrases have been deleted. In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply an appropriate word. In a listening cloze task, test-takers see a transcript of the passage that they are listening to and fill in the blanks with the words or phrases that they hear.

Other listening cloze tasks may focus on a grammatical category such as verb tenses, articles, two-word verbs, prepositions, or transition words/phrases. Listening cloze tasks should normally use an exact word method of scoring, in which you accept as a correct response only the actual word or phrase that was spoken and consider other appropriate words as incorrect.
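The two procedures just described, deleting every nth word and scoring by the exact-word method, can be sketched in a few lines of Python. The sample passage, the deletion interval, and the function names below are illustrative assumptions, not taken from the text.

```python
def make_cloze(text, n=7):
    """Delete every nth word, returning the gapped text and the answer key."""
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words, start=1):
        if i % n == 0:
            answers.append(word)
            gapped.append("____")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

def score_exact(answers, responses):
    """Exact-word scoring: only the word actually spoken counts as correct."""
    return sum(a.lower() == r.strip().lower() for a, r in zip(answers, responses))

passage = ("The waiter brought the menu and we ordered "
           "two cups of coffee before the train arrived")
gapped, key = make_cloze(passage, n=7)
print(gapped)
# "a" would be an appropriate word for the second blank, but it is not
# the exact word spoken, so it scores as incorrect.
print(score_exact(key, ["we", "a"]))  # → 1
```

Switching `score_exact` for a more lenient "appropriate word" check is a one-line change, which is part of why the scoring method should be specified before administration.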

2. Information Transfer
Selective listening can also be assessed through an information transfer technique in which aurally processed information must be transferred to a visual representation, such as labeling a diagram, identifying an element in a picture, completing a form, or showing routes on a map.

3. Sentence Repetition
The task of simply repeating a sentence or a partial sentence, or sentence repetition, is also used as an assessment of listening comprehension. Incorrect listening comprehension, whether at the phonemic or discourse level, may be manifested in errors in the repetition; a miscue in repetition is scored as a miscue in listening. In the case of somewhat longer sentences, one could argue that the ability to recognize and retain chunks of language, as well as threads of meaning, might be assessed through repetition. Sentence repetition is far from a flawless listening assessment task. Buck (2001, p. 79) noted that such tasks "are not just tests of listening, but tests of general oral skills."

H. DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING
Drawing a clear distinction between any two of the categories of listening referred to here is problematic, but perhaps the fuzziest division is between selective and extensive listening.

1. Dictation
Dictation is a widely researched genre of assessing listening comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread. Dictations have been used as assessment tools for decades. The difficulty of a dictation task can be easily manipulated by the length of the word groups (or bursts, as they are technically called), the length of the pauses, the speed at which the text is read, and the complexity of the discourse, grammar, and vocabulary used in the passage.
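The difficulty variables named above, especially the length of the bursts, can be made concrete with a small sketch that segments a passage into word groups for the paused second reading. The burst length and sample passage are illustrative assumptions.

```python
def make_bursts(text, burst_len=4):
    """Split a dictation passage into word groups ("bursts") for the paused reading."""
    words = text.split()
    return [" ".join(words[i:i + burst_len])
            for i in range(0, len(words), burst_len)]

passage = "Dictations have been used as assessment tools for decades"
for burst in make_bursts(passage, burst_len=3):
    # Read each burst aloud, then pause while test-takers write it down.
    print(burst)
```

Raising `burst_len` forces test-takers to hold longer chunks in memory before writing, which is one direct way to increase the difficulty of the dictation.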

2. Communicative Stimulus Response Tasks
Another, and more authentic, example of extensive listening is found in a popular genre of assessment task in which the test-taker is presented with a stimulus monologue or conversation and then asked to respond to a set of comprehension questions.

3. Authentic Listening Tasks
Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all the real-world contexts of listening performance. "There is no such thing as a communicative test," stated Buck (2001, p. 29). "Every test requires some components of communicative language ability, and no test covers them all. Similarly, with the notion of authenticity, every task shares some characteristics with target-language tasks, and no test is completely authentic." Here are some possibilities:

1. Note-taking. In the academic world, classroom lectures by professors are common features of a non-native English-user's experience.
2. Editing. Another authentic task provides both a written and a spoken stimulus, and requires the test-taker to listen for discrepancies.
3. Interpretive tasks. One of the intensive listening tasks described above was paraphrasing a story or conversation. An interpretive task extends the stimulus material to a longer stretch of discourse and forces the test-taker to infer a response.
4. Retelling. In a related task, test-takers listen to a story or news event and simply retell it, or summarize it, either orally (on an audiotape) or in writing.

ASSESSING SPEAKING

From a pragmatic view of language performance, listening and speaking are almost always closely interrelated. While speaking is a productive skill that can be directly and empirically observed, those observations are invariably colored by the accuracy and effectiveness of a test-taker’s listening skill, which necessarily compromises the reliability and validity of an oral production test. Another challenge is the design of elicitation techniques. Because most speaking is the product of creative construction of linguistic strings, the speaker makes choices of lexicon, structure, and discourse. As tasks become more and more open ended, the freedom of choice given to test-takers creates a challenge in scoring procedures. In receptive performance, the elicitation stimulus can be structured to anticipate predetermined responses and only those responses.

A. BASIC TYPES OF SPEAKING

a. Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance. We are interested only in what is traditionally labeled "pronunciation"; no inferences are made about the test-taker's ability to understand or convey meaning or to participate in an interactive conversation. The only role of listening here is in the short-term storage of a prompt, just long enough to allow the speaker to retain the short stretch of language that must be imitated.

b. Intensive. A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements: intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best. Examples of intensive assessment tasks include directed response tasks; reading aloud; sentence and dialogue completion; limited picture-cued tasks, including simple sequences; and translation up to the simple sentence level.

c. Responsive. Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity), with perhaps only one or two follow-up questions or retorts.

d. Interactive. The difference between responsive and interactive speaking is in the length and complexity of the interaction, which sometimes includes multiple exchanges and/or multiple participants. Interaction can take two forms: transactional language, which has the purpose of exchanging specific information, and interpersonal exchanges, which have the purpose of maintaining social relationships.

e. Extensive (monologue). Extensive oral production tasks include speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether. Language style is frequently more deliberative (planning is involved) and formal for extensive tasks, but we cannot rule out certain informal monologues, such as casually delivered speech (for example, my vacation in the mountains, a recipe for outstanding pasta primavera, or recounting the plot of a novel or movie).

B. MICRO AND MACROSKILLS OF SPEAKING

A list of listening micro- and macroskills enumerated the various components of listening that make up criteria for assessment. A similar list of speaking skills can be drawn up for the same purpose: to serve as a taxonomy of skills from which you will select one or several that will become the objective(s) of an assessment task. The microskills refer to producing the smaller chunks of language, such as phonemes, morphemes, words, collocations, and phrasal units. The macroskills imply the speaker's focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options.

As you consider designing tasks for assessing spoken language, these skills can act as a checklist of objectives. While the macroskills have the appearance of being more complex than the microskills, both contain ingredients of difficulty, depending on the stage and context of the test-taker. Below is a consideration of the most common techniques, with brief allusions to related tasks. Consider three important issues as you set out to design tasks:
1. No speaking task is capable of isolating the single skill of oral production. Concurrent involvement of the additional performance of aural comprehension, and possibly reading, is usually necessary.
2. Eliciting the specific criterion you have designated for a task can be tricky because beyond the word level, spoken language offers a number of productive options to test-takers. Make sure your elicitation prompt achieves its aims as closely as possible.
3. Because of the above two characteristics of oral production assessment, it is important to carefully specify scoring procedures for a response so that ultimately you achieve as high a reliability index as possible.
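One common classroom-level way to quantify the reliability goal in point 3 is inter-rater agreement: two raters score the same performances, and we measure how often their band scores coincide exactly or within one band. The rating data below are invented for illustration; the function is a minimal sketch, not a standard library routine.

```python
def agreement(rater_a, rater_b, tolerance=0):
    """Proportion of performances on which two raters agree within `tolerance` bands."""
    hits = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return hits / len(rater_a)

a = [4, 3, 5, 2, 4, 3]  # rater A's band scores on six performances (invented)
b = [4, 2, 5, 2, 3, 3]  # rater B's band scores on the same performances (invented)
print(agreement(a, b))               # exact agreement → 4/6
print(agreement(a, b, tolerance=1))  # adjacent (within one band) agreement → 1.0
```

Low exact agreement with high adjacent agreement usually signals that the band descriptors, not the raters, need sharpening.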

C. DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING

An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question (to test for intonation production). A variation on such a task prompts test-takers with a brief written stimulus, which they are to read aloud.

D. DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING

At the intensive level, test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Many tasks are "cued" tasks in that they lead the test-taker into a narrow band of possibilities. Parts C and D of the PhonePass test fulfill the criteria of intensive tasks, as they elicit certain expected forms of language.

Antonyms like high and low or happy and sad are prompted so that the automated scoring mechanism anticipates only one word. The either/or task of Part D fulfills the same criterion. Intensive tasks may also be described as limited response tasks (Madsen, 1983), mechanical tasks (Underhill, 1987), or what classroom pedagogy would label controlled responses.
1. Directed Response Tasks
In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative but they do require minimal processing of meaning in order to produce the correct grammatical output.

2. Read Aloud Tasks
Intensive reading-aloud tasks include reading beyond the sentence level up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker's output; the scoring is relatively easy because all of the test-taker's oral production is controlled. While reading aloud offers certain practical advantages (predictable output, practicality, reliability in scoring), there are several drawbacks to using this technique for assessing oral production.

3. Sentence/Dialogue Completion Tasks and Oral Questionnaires
Another technique for targeting intensive aspects of language requires test-takers to read a dialogue in which one speaker's lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. Then, as the tape, teacher, or test administrator produces one part orally, the test-taker responds. An advantage of this technique lies in its moderate control of the output of the test-taker. While individual variations in responses are accepted, the technique taps into a learner's ability to discern expectancies in a conversation and to produce sociolinguistically correct language. One disadvantage of this technique is its reliance on literacy and an ability to transfer easily from written to spoken English. Underhill (1987) describes yet another technique that is useful for controlling the test-taker's output: form-filling, or what I might rename "oral questionnaire." Here the test-taker sees a questionnaire that asks for certain categories of information (personal data, academic information, job experience, etc.) and supplies the information orally.

4. Picture Cued Tasks
One of the more popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and "busy"; or composed of a series that tells a story or incident. Opinions about paintings, persuasive monologues, and directions on a map create a more complicated problem for scoring.

More demand is placed on the test administrator to make calculated judgments, in which case a modified form of a scale such as the one suggested for evaluating interviews (below) could be used: grammar, vocabulary, comprehension, fluency, pronunciation, task (accomplishing the objective of the elicited task).
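A modified scale like the one just described can be operationalized as a weighted rubric that combines band scores into one overall result. The six categories come from the text; the weights and the sample band scores below are illustrative assumptions an individual teacher would set for a particular task.

```python
# Illustrative weights (they sum to 1.0); adjust to the objectives of the task.
WEIGHTS = {
    "grammar": 0.20, "vocabulary": 0.20, "comprehension": 0.15,
    "fluency": 0.15, "pronunciation": 0.15, "task": 0.15,
}

def weighted_score(bands):
    """Combine 1-5 band scores per category into a single weighted score."""
    return sum(WEIGHTS[category] * band for category, band in bands.items())

bands = {"grammar": 4, "vocabulary": 3, "comprehension": 5,
         "fluency": 4, "pronunciation": 3, "task": 5}
print(round(weighted_score(bands), 2))  # → 3.95
```

Publishing the weights alongside the band descriptors also improves washback, since test-takers can see which aspects of their performance count most.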

5. Translation (of Limited Stretches of Discourse)
Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that in countries where English is not the native or prevailing language, translation is a meaningful communicative device in contexts where the English user is called on to be an interpreter. Also, translation is a well-proven communication strategy for learners of a second language. Under certain constraints, then, it is not far-fetched to suggest translation as a device to check oral production.

E. DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING

Assessment of responsive tasks involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker and from interactive tasks by the somewhat limited length of utterances.

1. Question and Answer
Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response.

2. Giving Instructions and Directions
We are all called on in our daily routines to read instructions on how to operate an appliance, how to put a bookshelf together, or how to create a delicious clam chowder. Somewhat less frequent is the mandate to provide such instructions orally, but this speech act is still relatively common. Using such a stimulus in an assessment context provides an opportunity for the test-taker to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors.

3. Para-phrasing
Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase. The advantages of such tasks are that they elicit short stretches of output and perhaps tap into test-takers' ability to practice the conversational art of conciseness by reducing the output/input ratio.

F. DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING

The final two categories of oral production assessment (interactive and extensive speaking) include tasks that involve relatively long stretches of interactive discourse (interviews, role plays, discussions, games) and tasks of equally long duration but that involve less interaction (speeches, telling longer stories, and extended explanations and translations). The obvious difference between the two sets of tasks is the degree of interaction with an interlocutor. Also, interactive tasks are what some would describe as interpersonal, while the final category includes more transactional speech events.

1. Interview
When "oral production assessment" is mentioned, the first thing that comes to mind is an oral interview: a test administrator and a test-taker sit down in a direct face-to-face exchange and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension. Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages: warm-up, level check, probe, and wind-down.

2. Role Play
Role playing is a popular pedagogical activity in communicative language-teaching classes. Within constraints set forth by the guidelines, it frees students to be somewhat creative in their linguistic output. In some versions, role play allows some rehearsal time so that students can map out what they are going to say. And it has the effect of lowering anxieties as students can, even for a few moments, take on the persona of someone other than themselves. As an assessment device, role play opens some windows of opportunity for test-takers to use discourse that might otherwise be difficult to elicit.

3. Discussions and Conversations
As formal assessment devices, discussions and conversations with and among students are difficult to specify and even more difficult to score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as
Topic nomination, maintenance, and termination;
Attention getting, interrupting, floor holding, control;
Clarifying, questioning, paraphrasing;
Comprehension signals (nodding, "uh-huh," "hmm," etc.);
Negotiating meaning;
Intonation patterns for pragmatic effect;
Kinesics, eye contact, proxemics, body language; and
Politeness, formality, and other sociolinguistic factors.

4. Games
Clearly, such tasks have wandered away from the traditional notion of an oral production test and may even be well beyond assessment. As assessments, the key is to specify a set of criteria and a reasonably practical and reliable scoring method.

G. DESIGNING ASSESSMENTS: EXTENSIVE SPEAKING

Extensive speaking tasks involve complex, relatively lengthy stretches of discourse. They are frequently variations on monologues, usually with minimal verbal interaction.

1. Oral Presentations
In the academic and professional arenas, it would not be uncommon to be called on to present a report, a paper, a marketing plan, a sales idea, a design of a new product, or a method. A summary of oral assessment techniques would therefore be incomplete without some consideration of extensive speaking tasks. Once again the rules for effective assessment must be invoked: (a) specify the criterion, (b) set appropriate tasks, (c) elicit optimal output, and (d) establish practical, reliable scoring procedures.

2. Picture Cued Story Telling
One of the most common techniques for eliciting oral production is through visual stimuli: pictures, photographs, diagrams, and charts. We have already looked at this elicitation device for intensive tasks, but at this level we consider a picture or a series of pictures as a stimulus for a longer story or description.

3. Retelling a Story, News Event
In this type of task, test-takers hear or read a story or news event that they are asked to retell. This differs from the paraphrasing task discussed above (pages 161-162) in that it is a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequences and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should of course meet the intended criteria.

4. Translation (of Extended Prose)
Translation of words, phrases, or short sentences was mentioned under the category of intensive speaking. Here, longer texts are presented for the test-taker to read in the native language and then translate into English. Those texts could come in many forms: dialogue, directions for assembly of a product, a synopsis of a story, play, or movie, directions on how to find something on a map, and other genres.

References:
Brown, H. Douglas. (2004). Language Assessment: Principles and Classroom Practices. White Plains, NY: Longman.

