Islamic Azad University, Khoy, Iran
The term alternative assessment, and the particular testing practices associated with it, have recently come into vogue in language testing. The movement aims to establish qualitative, more democratic, and task-based methods of evaluating a learner's language proficiency (Brown and Hudson 1998; Aschbacher 1991; Herman, Aschbacher, and Winters 1992; Huerta-Macías 1995). It contrasts with traditional methods of testing in that it involves learners in the evaluation process, tends to locate evaluation in a real-life context and, as a result of these two features, is longitudinal. Thus, the insights emanating from these methods, besides informing decisions about the future of learners, serve additional instructional purposes. As McNamara (2000) points out:
“This approach stresses the need for assessment to be integrated with the goals of the curriculum and to have a constructive relationship with teaching and learning”.
The procedures used within this paradigm include checklists, journals, logs, videotapes and audiotapes, teacher observations, portfolios, conferences, diaries, self-assessments, and peer assessments (Brown and Hudson 1998). These procedures have been variously called alternative or performance assessment, as opposed to traditional assessment techniques such as multiple choice, cloze tests, and dictation.
While the new movement promises more humanistic and rewarding methods of testing and thus has much to offer, most teachers are not quite familiar with the concepts and practices of the emerging paradigm. A good starting point for interested teachers is a basic question that may well have occupied their minds: how do these so-called alternative methods relate to the traditional methods normally used within classrooms? Or, to put the question another way, how can we place both traditional and alternative assessment methods in perspective to get a panoramic view of both in the pieced-together jigsaw of language testing? To this purpose, it seems necessary to draw on the concepts of testing, measurement and evaluation.
Evaluation, Measurement and Testing
Bachman (1990), quoting Weiss (1972), defines evaluation as “the systematic gathering of information for the purpose of making decisions”. Lynch (2001) adds that this decision or judgment is to be about individuals. In this conceptualization, both authors agree that evaluation is the superordinate term in relation to both measurement and testing. Assessment is sometimes used interchangeably with evaluation. The systematic information can take many forms, but these forms are either quantitative or qualitative. This is what distinguishes measures from qualitative descriptions.
Measurement is thus concerned with quantification. Language proficiency, like many other constructs and characteristics of persons in the social sciences, needs to be quantified before any judgments can be made about it. This process of quantifying is called operationalization in research, by which we mean assigning numbers to a construct according to observable operations and explicit procedures or rules (Bachman 1990; Ary et al. 1996).
The third component in this model is testing, which consists of the use of actual tests to elicit the desired behavior. Carroll (1968) defines a test as:
“A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual”.
Bachman (1990) observes that a test is one type of measurement instrument and thus necessarily quantifies characteristics of individuals according to explicit procedures. He then concludes that there are types of measurement other than tests, the difference being that a test is designed to obtain a specific sample of behavior.
For the purpose of schematic representation, the three concepts of evaluation, measurement and testing have traditionally been depicted as three concentric circles of varying sizes. This is the convention Lynch (2001) follows in depicting the relationship among these concepts.
Figure 1- Assessment, measurement and testing, adapted from Lynch (2001)
The purpose of this representation is to show the relationship between superordinate and subordinate concepts and the area of overlap between them. Thus, evaluation includes measurement when decisions are made on the basis of information from quantitative methods. And measurement includes testing when decision-making is done through the use of “a specific sample of behavior” (Bachman 1990). However, decision-making is by no means restricted to quantitative methods, as the area not covered by the measurement circle shows. Nor are tests the only means of measuring individuals’ characteristics; there are types of measurement other than tests, for example, measuring an individual’s language proficiency by living with him for a long time.
Bachman (1990) has represented the relationship in a somewhat different way. The goal has been to extend the model to include not only language testing but also language teaching, language learning and language research domains. Figure 2 depicts this extended view of the relationship among evaluation, measurement and testing. The areas numbered from 1 to 5 show the various forms of this relationship.
Figure 2- Assessment, measurement and testing, adapted from Bachman (1990)
Area 1- Evaluation not involving either tests or measures; for example, the use of qualitative descriptions of student performance for diagnosing learning problems.
Area 2- A non-test measure for evaluation; for example, teacher ranking used for assigning grades.
Area 3- A test used for purposes of evaluation; for example, the use of an achievement test to determine student progress.
Area 4- Non-evaluative use of tests and measures for research purposes; for example, the use of a proficiency test as a criterion in second language acquisition research.
Area 5- A non-test measure used non-evaluatively; for example, assigning code numbers to subjects in second language research according to native language.
After reviewing the conceptualizations and schematic representations proposed by Bachman (1990) and Lynch (2001), an attempt will be made to locate alternative assessment methods more clearly in relation to traditional testing methods, in order to help language teachers make intelligent and insightful choices in assessing their students. Some points about the adapted model are worth noting. First, unlike Bachman’s model, it does not deal with language research purposes, because language teachers’ immediate needs do not concern the use of tests or assessment procedures for research; rather, teachers need to broaden their assessment choices to arrive at sounder judgments about their students. Secondly, all assessment procedures, whether traditional or alternative, serve the function of decision-making and are all subsumed under the term evaluation. Thus, it is better to treat them as alternatives in assessment (Brown and Hudson 1998) – available choices for the language teacher – rather than labeling some of them as normal and others as eccentric. Such a distinction makes the new developments seem inaccessible only because they are said to be so; hence our use of descriptive terms instead of labels which evoke vague feelings. We should also note that all alternatives in assessment have to meet their respective requirements for reliability and validity if they are to support sound judgments (Lynch 2001).
Figure 3- Alternatives in Assessment; decision-making in educational settings
As Figure 3 shows, tests constitute only a small set of options, among a wide range of others, for a language teacher making decisions about students. The judgment emanating from a test is not necessarily more valid or reliable than one deriving from qualitative procedures, since both must meet reliability and validity criteria to count as informed decisions. The area circumscribed within quantitative decision-making is relatively small, representing a specific choice made by the teacher at a particular time in the course, while the vast area outside it, covering all non-measurement qualitative assessment procedures, represents their wider range and more general nature. This means that qualitative approaches, which result in descriptions of individuals rather than numbers, can go hand in hand with the teaching and learning experiences of the class and can reveal subtler shades of students’ proficiency. This in turn can lead to more illuminating insights about future progress and the attainment of goals. The options discussed above are not, however, a matter of either-or (traditional vs. alternative assessment); rather, the language teacher is free to choose the alternative (among alternatives in assessment) which best suits the particular moment, class, and students.
References
Ary, D., Jacobs, L. C. and Razavieh, A. (1996). Introduction to Research in Education. New York: Harcourt Brace College Publishers.
Aschbacher, P. A. (1991). Performance assessment: State activity, interest, and concerns. Applied Measurement in Education, 4(4), 275-288.
Bachman, L.F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Bachman, L.F. and Palmer, A.S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Brown, J.D. and Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly 32, 653–75.
Carroll, J. B. (1968). ‘The psychology of language testing’ in Davies, A. (ed.) Language Testing Symposium. A Psycholinguistic Perspective. London: Oxford University Press. 46-69.
Herman, J.L., Aschbacher, P.R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
Huerta-Macias, A. (1995). Alternative assessment: Responses to commonly asked questions. TESOL Journal 5, 8–11.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing, 18(4), 351–372.
McNamara, T. (2000). Language Testing. Oxford: Oxford University Press.
McNamara, T. (2001). Rethinking alternative assessment. Language Testing, 18(4), 329–332.
Shohamy, E. (2001). Democratic assessment as an alternative. Language Testing, 18(4), 373–391.
Weiss, C. H. (1972). Evaluation Research: Methods for Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice-Hall.
Ali Bastanfar is a lecturer in TEFL at Islamic Azad University-Khoy Branch, Khoy, Iran. He is also doing his Ph.D. in TEFL. He has more than ten years' experience of teaching English to foreign learners at various levels. His teaching experience with TEFL students at university includes a wide array of courses for pre-service teachers. He has presented at several conferences, including the MICELT conference in Malaysia (2006), the 4th Asia TEFL conference in Japan (2006), and the 5th and 6th Asia TEFL conferences in Malaysia (2007) and Indonesia (2008). He has also published articles in various journals. His major research interests are materials development, reading, testing, and learning strategies. firstname.lastname@example.org
© Ali Bastanfar 2009. All rights reserved.