Assignment Language Assessment, Session 15

Summary

ASSESSING VOCABULARY

Chapter 1: The place of vocabulary in language assessment

At first glance, it may seem that assessing the vocabulary knowledge of second language learners is both necessary and reasonably straightforward. It is necessary in the sense that words are the basic building blocks of language, the units of meaning from which larger structures such as sentences, paragraphs and whole texts are formed. Traditional, discrete vocabulary tests have, however, been criticized for assessing words in isolation from communicative language use. The widespread acceptance of the validity of these criticisms has led to the adoption, particularly in the major English-speaking countries, of the communicative approach to language testing. Today’s language proficiency tests do not set out to determine whether learners know the meaning of magazine, put on or approximate, or whether they can distinguish ship from sheep. Instead, the tests are based on tasks simulating communicative activities that the learners are likely to engage in outside the classroom.

Following Bachman’s (1990) earlier work, the authors see the purpose of language testing as being to allow us to make inferences about learners’ language ability, which consists of two components: language knowledge and strategic competence. That is to say, learners not only need to know a great deal about the vocabulary, grammar, sound system and spelling of the target language, but also need to be able to draw on that knowledge effectively for communicative purposes under normal time constraints.

Chapter 2: The Nature of Vocabulary

This chapter takes up the question of what we mean by vocabulary. We tend to think of it as consisting of individual words, as in the headwords of a dictionary; however, even the definition of a ‘word’ is by no means straightforward. It is also necessary to consider lexical units that are larger than single words, such as compound nouns, phrasal verbs, idioms and fixed expressions of various kinds. For assessment purposes, vocabulary is not just a set of linguistic units but also an attribute of individual language learners, in the form of vocabulary knowledge and the ability to access that knowledge for communicative purposes.

At the simplest level, vocabulary consists of words, but even the concept of a word is challenging to define and classify. For a number of assessment purposes, it is important to clarify what is meant by a word if correct conclusions are to be drawn from test results. As for the construct, Chapelle’s work points the way toward a definition of vocabulary ability that covers a wider range of assessment purposes and is at the same time consistent with Bachman and Palmer’s general construct of language ability. Whereas a construct of vocabulary knowledge may be satisfactory as the basis for the design of discrete, selective and context-independent tests, Chapelle’s definition provides a better theoretical foundation for a construct that can incorporate embedded, comprehensive and context-dependent vocabulary measures as well.

Chapter 3: Research on Vocabulary Acquisition and Use

This chapter reviews the main lines of enquiry by researchers into second language vocabulary acquisition. Apart from the extensive work on methods of conscious vocabulary learning, researchers are investigating how acquisition of word knowledge occurs in a more incidental fashion through reading and listening activities. Other areas of interest are the ability of learners to guess the meaning of unknown words which they encounter in their reading, and the strategies they use to overcome gaps in their vocabulary knowledge when engaged in speaking and writing tasks.

Language acquisition research, in both L1 and L2, makes use of vocabulary assessment to explore how language skill develops; in turn, this research informs our testing constructs. The ensuing review of research on vocabulary acquisition studies is concise and well presented. Of particular interest is Read's discussion of 'incidental vocabulary learning' and its relevance to the level of knowing a word that vocabulary tests tap. Read also notes that much of the research on vocabulary has been related to reading, leaving a gap in our knowledge of spoken language vocabulary.

Chapter 4: Research on Vocabulary Assessment

This chapter considers research in language testing that has either involved the investigation of vocabulary tests or has a bearing on vocabulary assessment. One issue in this area is whether the notion of a 'pure' vocabulary test is at all tenable. I trace the move away from discrete-point vocabulary tests and look in some detail at the extent to which the cloze procedure and its variants can be regarded as measures of vocabulary. Much recent work on vocabulary testing has focused on estimating how many words learners know (their vocabulary size). A complementary perspective is provided by other studies that seek to assess the quality (or 'depth') of their vocabulary knowledge. Here, the previous threads are knitted together with various types of vocabulary testing (e.g., vocabulary size, quality of vocabulary knowledge, cloze testing).
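To make the cloze procedure concrete, here is a minimal Python sketch, assumed for illustration rather than taken from Read's book, which produces a fixed-ratio cloze passage by blanking every nth word and keeping an answer key:

    import re

    def make_cloze(text: str, n: int = 7, start: int = 2):
        """Blank out every nth word of a passage, returning the cloze
        text and the answer key (the deleted words, in order)."""
        words = text.split()
        answers = []
        for i in range(start, len(words), n):
            # Keep trailing punctuation attached to the blank.
            match = re.match(r"^([\w'-]+)(\W*)$", words[i])
            if match is None:
                continue
            answers.append(match.group(1))
            words[i] = "_____" + match.group(2)
        return " ".join(words), answers

    passage = ("Words are the basic building blocks of language, the units "
               "of meaning from which larger structures are formed.")
    cloze_text, answer_key = make_cloze(passage)
    print(cloze_text)
    print(answer_key)

Rational (every nth word) deletion is only one variant; selective deletion of content words, also discussed by Read, would replace the fixed stride with a chosen list of target words.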

Chapter 5: Vocabulary Tests: Four Case Studies

This chapter presents case studies of four vocabulary tests:
Nation's Vocabulary Levels Test;
Meara and Jones's Eurocentres Vocabulary Size Test (a sketch of its guessing-adjusted scoring appears after this section);
Paribakht and Wesche's Vocabulary Knowledge Scale; and
the vocabulary items in the Test of English as a Foreign Language (TOEFL).
In addition to being influential instruments in their own right, these tests exemplify several of the main currents in vocabulary testing discussed in the previous chapter. Practical issues in the design of vocabulary tests are then taken up in Chapter 6.
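The Eurocentres test is a yes/no checklist in which invented non-words are included to detect guessing. One standard way of adjusting raw checklist scores is the correction-for-guessing calculation sketched below in Python; the formula and the band size are illustrative assumptions, not necessarily the exact scoring model used by Meara and Jones:

    def adjusted_vocab_size(hits, real_words, false_alarms, nonwords,
                            band_size=10000):
        """Correct a yes/no vocabulary checklist score for guessing.
        hits: real words the learner claimed to know;
        false_alarms: non-words the learner also claimed to know;
        band_size: words in the sampled frequency band (an assumed figure).
        """
        h = hits / real_words        # hit rate on real words
        f = false_alarms / nonwords  # false-alarm rate on non-words
        if f >= 1.0:
            return 0.0
        p_known = max(0.0, (h - f) / (1.0 - f))  # correction for guessing
        return p_known * band_size

    # A learner says 'yes' to 45 of 60 real words and to 3 of 20 non-words.
    print(round(adjusted_vocab_size(45, 60, 3, 20)))  # about 7059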

Chapter 6: The Design of Discrete Vocabulary Tests

The chapter includes discussion of two specific examples of test design from my own experience. One looks at some typical items for classroom progress tests, and the other is an account of my efforts to develop a workable test to measure depth of vocabulary knowledge. The reader might assume, given Read's framework, that discrete-item testing would receive a negative review in this book, but that is not the case. Read argues for matching the test to the purpose for which it is used; in assessing the progress of vocabulary learning in a classroom situation, for example, a discrete test may be quite appropriate. Read lists the advantages of discrete vocabulary testing and gives practical examples of the difficulties involved with various test designs.

As noted previously, Read argues in Chapter 6 that the contrast between receptive and productive vocabulary may be misleading. Instead, Read suggests two dimensions underlying this contrast: recognition-recall and comprehension-use. Recognition is where the test-taker's understanding of the meaning of a presented word is assessed, whereas recall refers to the ability to retrieve a word from memory, having encountered it before (such as in an experiment). Comprehension, of course, is the understanding of meanings encountered when listening or reading; use refers to the vocabulary that actually appears in speech or writing. Thus, recognition and comprehension are different aspects, or levels if you will, of testing receptive vocabulary, and recall and use are aspects of productive vocabulary. For the language teacher, Chapter 6 is perhaps the most practical part of the book, for it is this type of testing that will most likely be used in classroom situations.

Chapter 7: Comprehensive Measures of Vocabulary

The largest section of the chapter covers procedures that have been applied to the assessment of learners' writing. These include 'objective' counts of the relative proportions of different types of word in a composition, as well as 'subjective' rating scales. I also consider the application of comprehensive measures, such as readability formulas, to the analysis of input material for tests involving reading and listening tasks. This chapter also introduces the assessment of speech, noting the studies available in this area. Also included is a rather general discussion of readability and of calculating lexical density.
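Lexical density, one of the 'objective' measures mentioned above, is commonly computed as the proportion of content (lexical) words among all running words. The minimal Python sketch below assumes a small, hypothetical function-word list for illustration:

    # Lexical density = content (lexical) words / total running words.
    # The function-word list is a small illustrative sample, not an
    # exhaustive inventory.
    FUNCTION_WORDS = {
        "the", "a", "an", "and", "or", "but", "of", "in", "on", "at",
        "to", "for", "with", "is", "are", "was", "were", "be", "been",
        "it", "this", "that", "they", "we", "you", "i", "he", "she",
    }

    def lexical_density(text):
        words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
        words = [w for w in words if w]
        if not words:
            return 0.0
        content = [w for w in words if w not in FUNCTION_WORDS]
        return len(content) / len(words)

    sample = "The learners read the passage and guessed the meaning of unknown words."
    print(round(lexical_density(sample), 2))  # 0.58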

Chapter 8: Further Developments in Vocabulary Assessment

This chapter includes discussion of ways in which computer-based corpus research can contribute to the development of vocabulary measures. A second major theme is the need to broaden our view of the nature of vocabulary. More consideration should be given to the role of multi-word lexical items in language use. Another priority is to gain a better understanding of the vocabulary of speech, as distinct from written language. There should also be more focus on the social dimension of vocabulary use.
Read underlines throughout the book that much of the work on vocabulary has come from studies of reading, with little work on spoken vocabulary. There is a very real need for more work on spoken vocabulary and how to assess it. Read also notes a need to assess longer lexical items, rather than retaining the more traditional focus on single words. He also sees great promise in the increasing use of computers in second language testing. Another need is for a current frequency list of word use, one which would also take into account current knowledge of specialized vocabularies and multi-word items.
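A frequency list of the kind Read calls for is typically derived from a corpus by counting word forms. The following minimal Python sketch is an assumed illustration; a production list would also lemmatize and track specialized and multi-word items:

    import re
    from collections import Counter

    def frequency_list(corpus, top=10):
        """Count word forms in a corpus and return the most frequent.
        This sketch counts raw word forms only."""
        tokens = re.findall(r"[a-z']+", corpus.lower())
        return Counter(tokens).most_common(top)

    text = ("Words are the basic building blocks of language, "
            "the units of meaning from which larger texts are formed.")
    for word, count in frequency_list(text, top=5):
        print(word, count)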

References:
Read, John. 2002. Assessing Vocabulary. Cambridge: Cambridge University Press.



ASSESSING GRAMMAR

1. Differing notions of ‘grammar’ for assessment

The study of grammar has had a long and important role in the history of second language and foreign language teaching. Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century.

a. Grammar and linguistics
Such linguistic grammars are typically derived from data taken from native speakers and are constructed to describe well-formed utterances within an individual theoretical framework. These grammars strive for internal consistency and are mainly accessible to those who have been trained in that particular paradigm. Since the 1950s, many such linguistic theories – too numerous to list here – have been proposed to explain language phenomena. Many of these theories have helped shape how L2 educators currently define grammar in educational contexts.
b. Form-based perspectives of language
Several syntactocentric, or form-based, theories of language have provided grammatical insights to L2 teachers. One of the oldest theories to describe the structure of language is traditional grammar. Originally based on the study of Latin and Greek, traditional grammar drew on data from literary texts to provide rich and lengthy descriptions of linguistic form. Unlike some other syntactocentric theories, traditional grammar also revealed the linguistic meanings of these forms and provided information on their usage in a sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of prescriptive rules along with the exceptions.
c. Form and use-based perspectives of language
The three theories of linguistic analysis described thus far have provided insights to L2 educators on several grammatical forms. These insights provide information to explain what structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like I like to read and I like reading connote different meanings, we might turn to those theories that study grammatical form and use interfaces.
d. Communication-based perspectives of language
Other theories have provided grammatical insights from a communication based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within natural-occurring corpora. Rather, a communication-based perspective views grammar as a set of linguistic norms, preferences and expectations that an individual invokes to convey a host of pragmatic meanings that are appropriate, acceptable and natural depending on the situation. The assumption here is that linguistic form has no absolute, fixed meaning in language use (as seen in sentences 1.5 and 1.7 above), but is mutable and open to interpretation by those who use it in a given circumstance. Grammar in this context is often coterminous with language itself, and stands not only for form, but also for meaningfulness and pragmatic appropriacy.  In this respect, we will examine how grammatical ability has been conceptualized in L2 grammar teaching and learning, and how L2 grammar teaching and learning are intrinsically linked to assessment.

2. Research on L2 grammar teaching, learning and assessment

As we saw in the last chapter, second and foreign language educators have looked to different schools of linguistics for insights about language. This has considerably broadened our notion of grammar and has led to a deeper understanding of the role that grammar plays in conveying meaning in communication. However, although linguistic analysis can tell us what the language system is and how it works, it still cannot tell us how second or foreign languages are best learned or what teaching practices most effectively promote L2 learning.

In this chapter, I have demonstrated how the teaching, learning and assessment of L2 grammatical ability are intrinsically related. Language educators depend on linguists for information on the nature of language, so that teaching, learning and assessment can reflect current notions of language. Language educators also depend on experience, other language teachers and SLA researchers for insights on teaching and learning, so that the processes underlying instruction and acquisition can be obtained and so that information on how learning can be maximized can be generated. Finally, both language educators and SLA researchers depend on language testers for expertise in the design and development of assessments so that samples of learner performance can be consistently elicited, and so that the information observed from assessments can be used to make claims about what a learner does or does not know. In the next two chapters I will discuss how grammar has been defined in models of language proficiency and will argue for a coherent model of grammatical ability – one that could be used for test development and test validation purposes.

3. The role of grammar in models of communicative language ability

Implicit in this discussion was the notion that knowledge of the L2 grammatical system can be demonstrated by a learner on some outcome measure, whatever form that might take, and that teaching can potentially influence the results obtained on this measure. From the results of these assessments, we can then make inferences about the students’ grammatical ability, which would subsequently provide an empirical basis for decision-making. For example, language teachers use test results to make decisions about student placement in a language program or about the degree to which their students have mastered the material in a course, and SLA researchers use test results to make decisions about whether young learners acquire grammatical features better than older learners.

a. The role of grammar in models of communicative competence
Every language educator who has ever attempted to measure a student’s communicative language ability has wondered: ‘What exactly does a student need to “know” in terms of grammar to be able to use it well enough for some real-world purpose?’ In other words, they have been faced with the challenge of defining grammar for communicative purposes. To complicate matters further, linguistic notions of grammar have changed over time, as we have seen, and this has significantly increased the number of components that could be called ‘grammar’. In short, definitions of grammar and grammatical knowledge have changed over time and across context.

Given the central role that construct definition plays in test development and validation, my intention in this chapter has been to discuss the ‘what’ of grammar assessment. I have examined how grammar has been depicted in models of communicative language ability over the years, and have argued that for assessment purposes grammar should be clearly differentiated from pragmatics. Grammar should also be defined to include a form and a meaning component on both the sentence and discourse levels. I have also argued that meaning can be characterized as literal and intended. Also, the pragmatic dimension of language constitutes an extrapolation of both the literal meaning and the speaker’s intended meaning, drawing on contextual information beyond what is expressed in grammatical forms. I have argued that pragmatic meanings may be simultaneously superimposed upon grammatical forms and their meanings (e.g., as in a joke).

 In short, grammar should not be viewed solely in terms of linguistic form, but should also include the role that literal and intended meaning plays in providing resources for all types of communication. Although forms and meanings are highly related, it is important for testers to make distinctions among these components, when possible, so that assessments can be used to provide more precise information to users of test results. In the next chapter, I will use this model of grammar as a basis for defining second or foreign language grammatical ability for assessment.

4. Towards a definition of grammatical ability

Given the central role that construct definition plays in test development and validation, my intention in this chapter has been to discuss the ‘what’ of grammatical knowledge invoked by grammar assessment. After describing grammatical constructs and defining key terms in this book, I have proposed a theoretical model of grammatical ability that relates grammatical knowledge to pragmatic knowledge and that specifies grammatical form and meaning on the sentence and discourse levels. I have provided operational descriptions of each part of the model along with examples that differentiate knowledge of grammatical form and meaning from knowledge of pragmatic meaning. This model aims to provide a broad theoretical basis for the definition of grammatical knowledge in creating and interpreting tests of grammatical ability in a variety of language-use settings. In the next chapter, I will discuss how this model can be used to design tasks that measure one or more components of grammatical ability.

5. Designing test tasks to measure L2 grammatical ability

Given the central role of task in the development of grammar tests, this chapter has addressed the notion of task and task specification in the test development process. I discussed how task was originally conceptualized as a holistic method of eliciting performance and argued that the notion of task as a monolithic entity falls short of providing an adequate framework from which to specify tasks for the measurement of grammatical ability. I also argued that given the diversity of tasks that could emerge from real-life and instructional domains, a broad conceptualization of task is needed in grammatical assessment – one that could accommodate selected-response, limited-production and extended-production tasks.

For assessment, the process of operationalizing test constructs and the specification of test tasks are extremely important. They provide a means of controlling what is being measured, what evidence needs to be observed to support the measurement claims, what specific features can be manipulated to elicit the evidence of performance, and finally how the performance should be scored. This process is equally important for language teachers, materials writers and SLA researchers since any variation in the individual task characteristics can potentially influence what is practiced in classrooms or elicited on language tests. In this chapter, I argued that in developing grammar tasks, we needed to strive to control, or at least understand, the effects of these tasks in light of the inferences we make about examinees’ grammatical ability.

 Finally, I described Bachman and Palmer’s (1996) framework for characterizing test tasks and showed how it could be used to characterize SL grammar tasks. This framework allows us to examine tasks that are currently in use, and more interestingly, it allows us to show how variations in task characteristics can be used to create new task types that might better serve our educational needs and goals. In the next chapter, I will discuss the process of constructing a grammar test consisting of several tasks.

6. Developing tests to measure L2 grammatical ability

What makes a grammar test ‘useful’?

Score-based inferences from grammar tests can be used to make a variety of decisions. For example, classroom teachers use these scores as a basis for making inferences about learning or achievement. These inferences can then serve to provide feedback for learning and instruction, assign grades, promote students to the next level, or even award a certificate. They can also be used to help teachers or administrators make decisions about instruction or the curriculum.

The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts. It also provides teachers with information on how they might modify future instruction or fine-tune the curriculum. For example, feedback on an essay telling a student to review the passive voice would be formative in nature. Summative evaluation provides test stakeholders with an overall assessment of test-taker performance related to grammatical ability, typically at the end of a program of instruction. This is usually presented as a profile of one or more scores or as a single grade.

Score-based inferences from grammar tests can also be used to make, or contribute to, decisions about program placement. This information provides a basis for deciding how students might be placed into a level of a language program that best matches their knowledge base, or it might determine whether or not a student is eligible to be exempted from further L2 study. Finally, inferences about grammatical ability can make or contribute to other high-stakes decisions about an individual’s readiness for learning or promotion, their admission to a program of study, or their selection for a job.

Given the goals and uses of tests in general, and grammar tests in particular, it is fitting to ask how we might actually know if a test is, indeed, able to elicit scorable behaviors from which to make trustworthy and meaningful inferences about an individual’s ability. In other words, how do we know if a grammar test is ‘good’ or ‘useful’ for our particular context?

Many language testers (e.g., Harris, 1969; Lado, 1961) have addressed this question over the years. Most recently, Bachman and Palmer (1996) have proposed a framework of test usefulness by which all tests and test tasks can be judged, and which can inform test design, development and analysis. They consider a test ‘useful’ for any particular testing situation to the extent that it possesses a balance of the following six complementary qualities: reliability, construct validity, authenticity, interactiveness, impact and practicality. They further maintain that for a test to be ‘useful’, it needs to be developed with a specific purpose in mind, for a specific audience, and with reference to a specific target language use (TLU) domain.

Overview of grammar-test construction

Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development.

Stage 1: Design

The design stage of test development involves accumulating information and making initial decisions about the entire test process. In tests involving one class, this may be a relatively informal process; however, in tests involving wider audiences, such as a joint final exam or a placement test, the decisions about test development must be discussed and negotiated with several stakeholders. The outcome of the design stage is a design statement. According to Bachman and Palmer (1996, p. 88), this document should contain the following components (a brief data-structure sketch follows the list):
1. a description of the purpose(s) of the test,
2. a description of the TLU domains and task types,
3. a description of the test-takers,
4. a definition of the construct(s) to be measured,
5. a plan for evaluating test usefulness, and
6. a plan for dealing with resources.
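For developers who keep their specifications in machine-readable form, the design statement can be represented as a simple record. The Python dataclass below is an assumed illustration of one way to organize the six components; it is not a format prescribed by Bachman and Palmer:

    from dataclasses import dataclass

    @dataclass
    class DesignStatement:
        """One possible record of Bachman and Palmer's six
        design-stage components (illustrative only)."""
        purposes: list[str]
        tlu_domains_and_task_types: list[str]
        test_taker_description: str
        constructs: list[str]
        usefulness_evaluation_plan: str
        resource_plan: str

    example_design = DesignStatement(
        purposes=["place incoming students into program levels"],
        tlu_domains_and_task_types=["theme-based instructional tasks"],
        test_taker_description="adult ESL learners entering the program",
        constructs=["grammatical form and meaning at sentence level"],
        usefulness_evaluation_plan="pilot items; check reliability and validity",
        resource_plan="two raters; one administration window",
    )
    print(example_design.purposes[0])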
Stage 2: Operationalization

The operationalization stage of grammar-test development describes how an entire test involving several grammar tasks is assembled, and how the individual tasks are specified, written and scored. Key steps include the following (a brief scoring sketch follows the list):
1. Specifying the scoring method
2. Scoring selected-response tasks
3. Scoring extended-production tasks
4. Using scoring rubrics
5. Grading
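To make these steps concrete, the sketch below illustrates the two most common scoring methods in Python: dichotomous (right/wrong) scoring for selected-response items, and rubric-band scoring for an extended-production task. The items, key and band descriptors are invented for illustration, not drawn from Purpura:

    def score_selected_response(responses, key):
        """Dichotomous scoring: one point per item that matches the key."""
        return sum(1 for item, answer in responses.items()
                   if key.get(item) == answer)

    # Illustrative 0-3 rubric bands for grammatical control in an essay.
    RUBRIC = {
        0: "no control of target forms",
        1: "emerging control; frequent errors impede meaning",
        2: "good control; occasional errors do not impede meaning",
        3: "full control of form and meaning",
    }

    def score_extended_production(band):
        """Rubric scoring: a rater assigns a band; return score and descriptor."""
        if band not in RUBRIC:
            raise ValueError("band must be 0-3")
        return band, RUBRIC[band]

    answers = {"q1": "b", "q2": "a", "q3": "c"}
    key = {"q1": "b", "q2": "c", "q3": "c"}
    print(score_selected_response(answers, key))  # 2 of 3 items correct
    print(score_extended_production(2))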

Stage 3: Test administration and analysis

The final stage in the process of developing grammar tests involves the administration of the test to individual students or small groups, and then to a large group of examinees on a trial basis.

7. Illustrative tests of grammatical ability

The First Certificate in English Language Test (FCE)

Given the assessment purposes and the intended uses of the FCE, the FCE grammar assessments privilege construct validity, authenticity, interactiveness and impact. This is done by the way the construct of grammatical ability is defined. This is also done by the ways in which these abilities are tapped into, and the ways in which the task characteristics are likely to engage the examinee in using grammatical knowledge and other components of language ability in processing input to formulate responses. Finally, this is done by the way in which Cambridge ESOL has promoted public understanding of the FCE, its purpose and procedures, and has made available certain kinds of information on the test. These qualities may, however, have been stressed at the expense of reliability.

The Comprehensive English Language Test (CELT)

In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of nonnative speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness. To maximize consistency of measurement, the authors used only selected-response task types throughout the test, allowing for minimal fluctuations in the scores due to characteristics of the test method. This allowed them to adopt ‘easy-to-administer’ and ‘easy-to-score’ procedures for maximum practicality and reliability. Reliability was also enhanced by pre-testing items with the goal of improving their psychometric characteristics.

Reliability might have been emphasized at the expense of other important test qualities, such as construct validity, authenticity, interactiveness and impact. For example, construct validity was severely compromised by the mismatch among the purpose of the test, the way the construct was defined and the types of tasks used to operationalize the constructs. In short, scores from discrete-point grammar tasks were used to make inferences about speaking ability rather than make interpretations about the test-takers’ explicit grammatical knowledge.

Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.

In all fairness, the CELT was a product of its time, when emphasis was on discrete-point testing and reliability, and when language testers were not yet discussing qualities of test usefulness in terms of authenticity, interactiveness and impact.

The Community English Program (CEP) Placement Test

Given the purposes and the intended uses of the CEP Placement Test, the grammar section privileges authenticity, construct validity, reliability and practicality. Similar to the tasks used in instruction, the theme-based test tasks all support the same overarching theme presented from different perspectives. The construct of grammatical knowledge is then defined in terms of the grammar used to express the theme. Given the multiple-choice format and the piloting of items, reliability is an important concern. Finally, the multiple-choice format is used instead of a limited-production format to maximize practicality. This compromise certainly comes at the expense of construct validity and authenticity (of task).

Nonetheless, grammatical ability is also measured in the writing and speaking parts of the CEP Placement Test. These sections privilege construct validity, reliability, authenticity and interactiveness. In these tasks, students are asked to use grammatical resources to write about and discuss the theme they have been learning about during the test. In both the writing and speaking sections, grammatical ability is a separately scored part of the scoring rubric, and definitions of grammatical knowledge are derived from theory and from an examination of benchmark samples. Reliability is addressed by scoring all writing and speaking performance samples ‘blind’ by two raters. In terms of authenticity and interactiveness, these test sections seek to establish a strong correspondence between the test tasks and the type of tasks encountered in theme-based language instruction – that is, examinees listen to texts in which the theme is presented, they learn new grammar and use it to express ideas related to the theme, and they then read, write and speak about the theme. The writing and speaking sections require examinees to engage both language and topical knowledge to complete the tasks. In both cases, grammatical control and topical control are scored separately. Finally, while these test sections prioritize construct validity, reliability, authenticity and interactiveness, this is certainly at the expense of practicality and impact.

8. Learning-oriented assessments of grammatical ability

What is learning-oriented assessment of grammar?

Alternative assessment emphasizes an alternative to, and a rejection of, selected-response, timed and one-shot approaches to assessment, whether they occur in large-scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992). Alternative assessments are scored by humans, not machines.

Similar to alternative assessment, authentic assessment stresses measurement practices which engage students’ knowledge and skills in ways similar to those one can observe while performing some real-life or ‘authentic’ task (O’Malley and Valdez-Pierce, 1996). It also encourages tasks that require students to perform some complex, extended-production activity, and emphasizes the need for assessment to be strictly aligned with classroom goals, curricula and instruction. Self-assessment is considered a key component of this approach.

Performance assessment refers to the evaluation of outcomes relevant to a domain of interest (e.g., grammatical ability), which are derived from the observation of students performing complex tasks that invoke real-world applications (Norris et al., 1998). As with most performance data, assessments are scored by human judges (Stiggins, 1987; Herman et al., 1992; Brown, 1998) according to a scoring rubric that describes what test-takers need to do in order to demonstrate knowledge or ability at a given performance level. Bachman (2002) characterized language performance assessment as typically: (1) involving more complex constructs than those measured in selected-response tasks; (2) utilizing more complex and authentic tasks; and (3) fostering greater interactions between the characteristics of the test-takers and the characteristics of the assessment tasks than in other types of assessments. Performance assessment encourages self-assessment by making explicit the performance criteria in a scoring rubric. In this way, students can then use the criteria to evaluate their performance and contribute proactively to their own learning.

9. Challenges and new directions in assessing grammatical ability

Challenge 1: Defining grammatical ability

One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strong syntactocentric view of language rooted largely in linguistic structuralism. Moreover, models of language ability, such as those proposed by Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment concentrated on measuring language elements – defined in terms of morphosyntactic forms on the sentence level – while performing language skills. Grammatical knowledge was determined solely in terms of linguistic accuracy. This approach to testing led to examinations such as the CELT (Harris and Palmer, 1970a) and the English Proficiency Test battery (Davies, 1964).

Challenge 2: Scoring grammatical ability

A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures. For example, the use of dichotomous scoring, even with certain selected-response items, might need to give way to partial-credit scoring, since some wrong answers may reflect partial development either in form or in meaning. As a result, language educators might need to adapt their scoring procedures to reflect the two dimensions of grammatical knowledge. This might, in turn, require the use of measurement models that can accommodate both dichotomous and partial-credit data in calculating and analyzing test scores. Then, in scoring extended-production tasks for both form and meaning, descriptors on scoring rubrics might need to be adapted to reflect graded performance in the two dimensions of grammatical knowledge more clearly. It should also be noted that more complex scoring procedures will affect the resources it takes to mark responses or to program machine-scoring devices. They will also require a closer examination (and, hopefully, ongoing research) of how a wrong answer may be a reflection of interlanguage development. However, successfully meeting these challenges could provide a more valid assessment of test-takers’ underlying grammatical ability.
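As an assumed illustration of the dichotomous-versus-partial-credit distinction, the Python sketch below awards separate credit for the form and meaning dimensions of a single gap-fill response; the item and keys are invented for illustration:

    def partial_credit(response, form_key, meaning_key):
        """Score one gap-fill response on two dimensions:
        +1 for accurate form, +1 for appropriate meaning.
        Dichotomous scoring would collapse this to all-or-nothing."""
        score = 0
        if response in form_key:      # grammatically accurate form
            score += 1
        if response in meaning_key:   # conveys the intended meaning
            score += 1
        return score

    # Item: "Yesterday she ___ to school." (target: past tense of 'go')
    form_key = {"went"}
    meaning_key = {"went", "goed"}  # 'goed' shows the meaning is grasped
                                    # even though the form is interlanguage
    for answer in ("went", "goed", "goes"):
        print(answer, partial_credit(answer, form_key, meaning_key))
    # went 2, goed 1, goes 0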

Challenge 3: Assessing meanings

The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language. Therefore, while the grammatical resources used to communicate these meanings precisely are important, the notion of meaning conveyance in the communicative curriculum is critical. In order to test something as intangible as meaning in second or foreign language use, then, we need to define what it is we are testing.

Challenge 4: Reconsidering grammar-test tasks

The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching. Discrete-point testing methods may have even led some teachers to have reservations about testing grammar or to have uncertainties about how to test it communicatively.

Challenge 5: Assessing the development of grammatical ability

The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind. This notion stems from the work of several SLA researchers (e.g., Clahsen, 1985; Pienemann and Johnston, 1987; Ellis, 2001b) who maintain that the principal finding from years of SLA research is that structures appear to be acquired in a fixed order and a fixed developmental sequence. Furthermore, instruction on forms in non-contiguous stages appears to be ineffective. As a result, the acquisitional development of learners, they argue, should be a major consideration in L2 grammar testing.

References:
Purpura, James E. 2004. Assessing Grammar. Cambridge: Cambridge University Press.

