Student Assessment and Testing
In the current debate about nationwide educational restructuring, perhaps no issue is more central to the concerns of equity than that of student assessment. We have a long history of using questionably relevant tests to sort children for differential educational opportunities. Awareness of how standardized testing shapes curriculum and teaching highlights the link between assessment and educational quality. Yet there is no consensus about how educational reform is to be achieved or what the role of student assessment should be. Politically powerful advocates of "outcome-based" education argue that high standards and a national system of testing will accomplish needed educational improvement. This view is reflected in the National Council on Education Standards and Testing's proposed national system of examinations in five core subjects (English, math, science, history and geography), to be administered in grades 4, 8, and 12 and used to determine high school graduation, college admission and job placement (National Coalition of Advocates for Students [NCAS] 1993). However, advocates of equity in educational excellence (NCAS 1993; Tate 1993) insist that the role of student assessment can be a constructive one only if it is defined within the context of an education restructuring process that includes standards for equity in educational resources and processes that determine students' "real life" opportunities to learn.
We believe that neither excellence nor equity in education can be achieved as long as student assessment instruments, policies and practices limit opportunities to learn and narrow or dilute curricula and instruction. Both excellence and equity goals can, on the other hand, be served by assessments that help teachers to identify students' strengths as well as their needs and to determine the most appropriate and effective means of helping them to learn and grow.
- Standardized Testing and At-Risk Students
- Alternatives
- Testing and Systemic Reform
- Criteria for Assessment Recommendations
- Conclusion
Standardized Testing and At-Risk Students
Standardized tests have a disproportionate impact on students, teachers and curriculum in schools that serve low-income and minority students (Mitchell 1992; Tate 1993). Some widely found effects that are of particular consequence for equity in education are reviewed briefly below.
Testing and Ability Grouping
Both tracking and homogeneous "ability grouping" decisions, especially common in urban schools, are made primarily on the basis of standardized test results. Homogeneous grouping has often defeated school desegregation efforts by substituting within-school segregation of minority groups and is, in addition, itself an unsound pedagogical practice. Even within the same classroom, "high" ability students are taught and expected to learn different content than are "low" ability or "low interest" students (Brown 1993). Tracking and ability grouping are widespread and continue in spite of mounting evidence that is exposing "as fraudulent (or, at least, myopic) the claim that tracking is an appropriate response to differences in children's capacities and motivation" (Wheelock 1992). Even if standardized, norm-referenced tests measured ability validly for all student groups (a claim that is widely contested), their use in sorting students for different educational opportunities is condemned even by the College Board in unequivocal terms:
A substantial share of U.S. schools engage in ability grouping or tracking of students beginning at the elementary and middle grade levels according to presumed ability levels. As a number of studies have shown, tracking almost always means that those pupils who need the most support to raise their performance levels get the least, while those who need it the least have it showered on them. The consequence is a two-tiered system of education characterized by the following conditions.
- Poor and minority students underrepresented in college preparatory classes such as algebra and geometry and overrepresented in dead-end classes such as consumer math and general math;
- Guidance counselors who automatically presume that poor and minority students have neither the capability nor the inclination to attend college, and who therefore fail to provide adequate information to those students about college prerequisites and financial aid options;
- Teachers who fail to provide the necessary encouragement and enrichment to minority and poor students because their expectations of those students' success are low. (Educational Testing Service 1991)
Testing and Retention
Despite its known ineffectiveness, retention in grade is a common administrative response to students' failure to demonstrate mastery of a year's curriculum. Students rarely improve their achievement on the second round, except when they receive special instruction that does not merely repeat the same curriculum. Ascher (1990) writes: Since minority students are more likely than whites to test at the lower end of achievement test scores (as well as to be seen as more troublesome by teachers), they have retention rates three to four times higher than those of their white peers. Among blacks, males are particularly at risk for retention. Reporting on a data analysis performed by Cincinnati Public Schools, Ascher notes that students retained once had a 40 to 50 percent chance of becoming dropouts, those retained twice had a 60 to 70 percent chance, and those retained three times almost never graduated (Ascher 1990).
Testing and Curriculum
The pressure on school administrators, teachers and students to improve average school scores on norm-referenced, short-answer multiple-choice tests has created a widespread tendency to ignore higher order skills (since the tests elicit facts) and to put classroom emphasis on preparing students to take tests, especially at the elementary level, and more especially in low-income schools where drill has always been a more prevalent form of instruction than investigation has been. The pressures of standardized testing on curriculum have decreased instruction in science, writing, problem solving and analytical reasoning; they are felt from kindergarten, where the pressure is to teach quantifiable math and reading skills and to prepare children for an educational career of "bubble test" taking, to high school, where minimum competencies for graduation may also mark the upper limits of instruction. Sixty percent of early childhood educators recently surveyed reported that the pressure of year-end standardized tests caused them to teach in ways that were harmful to their children (Ascher 1990).
Arizona's recent experience in attempting to use testing to reinforce high-standards curricula dramatically highlights the inadequacy of test-driven teaching. Arizona's researchers created a matrix charting the items tested by the Iowa Test of Basic Skills (ITBS) and the TAP against the state's curricular frameworks, then charted the curricular framework items covered by the ITBS and TAP tests. While the curricular framework covered 100 percent of the ITBS and TAP items, only 26 to 30 percent of the curricular framework was assessed by the ITBS and the TAP. Using those standardized tests, Arizona could learn nothing about its students' mastery of roughly 70 percent of their required school work (Mitchell 1992).
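The two-way comparison described above can be illustrated with a minimal sketch. The skill names, test items and the function name `alignment` are hypothetical and chosen only to reproduce the shape of the result; they are not Arizona's actual matrix or data.

```python
from typing import Dict, Set


def alignment(test_items: Dict[str, Set[str]], framework: Set[str]) -> Dict[str, float]:
    """Compare a test with a curricular framework in both directions.

    test_items maps each test item ID to the framework skills it measures.
    (Hypothetical structure, assumed for illustration only.)
    """
    # Framework skills touched by at least one test item.
    assessed = set().union(*test_items.values()) & framework
    # Test items that measure at least one framework skill.
    items_in_framework = [i for i, skills in test_items.items() if skills & framework]
    return {
        "test_items_covered_by_framework": len(items_in_framework) / len(test_items),
        "framework_assessed_by_test": len(assessed) / len(framework),
    }


# Hypothetical data: a framework of 10 skills and a 4-item test.
framework = {f"skill_{n}" for n in range(1, 11)}
test_items = {
    "item_a": {"skill_1"},
    "item_b": {"skill_1", "skill_2"},
    "item_c": {"skill_3"},
    "item_d": {"skill_2"},
}

result = alignment(test_items, framework)
print(result)
```

In this invented example every test item falls within the framework, yet the test touches only 3 of the 10 framework skills. The asymmetry is the point: a test can be fully "aligned" with a curriculum while leaving most of that curriculum unmeasured.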
William Tate further suggests that low student assessments may say as much about curricula as they do about students, citing research that reveals that while African American children as a group consistently are outperformed by white children on national assessments of mathematics achievement, they are also less likely to take college preparatory mathematics courses than their white counterparts.
This relationship between exposure to higher-level courses and mathematics achievement should not be shocking. In fact, one of the most powerful predictors of mathematics achievement is course taking.... For example, the National Assessment of Educational Progress reveals the substantial increase in mathematical performance that is associated with students completing higher-level mathematics courses. (Tate 1993)
Testing and College
Standardized tests play an important role in determining whether or not students completing their secondary education will have an opportunity to attend college, what colleges they will attend, and the nature and extent of financial support they will receive (American Association of Collegiate Registrars and Admissions Officers 1986). Cultural and gender bias in college admissions examinations stacks the deck in favor of white, middle-class males (Crouse and Trusheim 1988). This continues in spite of the fact that the most widely used college admissions tests are, themselves, poor predictors of students' success in college (Allina 1987; Clark and Grandy 1984). Phyllis Rosser (1992), in collaboration with the National Center for Fair and Open Testing (FairTest), reports on the results of bias in college admissions testing:
The test publishers claim that their exams predict students' future academic performance. Yet, while females consistently earn higher grades in both high school and college, they receive lower grades on all these exams. Reliance on such biased exams markedly diminishes chances for women to:
- obtain millions of dollars in college tuition aid awarded by the National Merit Scholarship Corporation, and over 150 private companies, government agencies and foundations;
- gain admission to over 1,500 colleges and universities; and
- enter many special education programs reserved for "gifted and talented" high school students.
All these factors can contribute to a real dollar loss for women in later life as they get less prestigious jobs, earn less money, and have fewer leadership opportunities. Members of minority groups and those from economically disadvantaged backgrounds are further penalized by the gender, race/ethnic and class biases of these exams. (Emphases added)
Given the obstacles that unfair testing, placement and assessment raise for so many in elementary and secondary schools, it seems particularly unfair that if they overcome the obstacles and graduate from high school they will then face a selection process that denies them equal access to higher education and its lifetime social, cultural, and economic benefits.
Alternatives
New work in cognition makes clear that both teaching and testing could be structured to better prepare students for the complex thinking required by life. Since current political trends make it unlikely that the power of testing will decline in our society, or that testing will cease to drive instruction, it is especially important to reformulate assessments so that they can help alter schooling in ways that will effectively and appropriately educate individual students to meet their personal needs as well as those of society. (Ascher 1990)
A number of assessment approaches are currently being discussed and implemented as alternatives to the standardized, short-answer multiple-choice tests with which we are all so familiar. Whether referred to as "performance assessment," "situational testing," "authentic assessment," or "assessment in context," they identify a range of strategies that promotes instruction geared to complex thinking and problem solving. They provide both teachers and students with maximum feedback to demonstrate not only what they have learned about, but, more importantly, what they have learned to do.
The important distinction is between "assessment" and "test." A test is a single-occasion, unidimensional, timed exercise, usually in multiple-choice or short-answer form. Assessment is an activity that can take many forms, can extend over time, and aims to capture the quality of a student's work or of an educational program.... (it is) a collection of ways to provide accurate information about what students know and are able to do or about the quality of educational programs. The collective assessments reflect the complexity of what is to be learned and do not distort its nature in the information-gathering process. (Mitchell 1992)
The principal forms of "authentic" or "performance" testing are portfolios, open-ended questions, observations and exhibitions. Portfolios, now used from kindergarten through graduate school, are the best known (Mitchell 1992). They are collections of work actually done by the student, selected to demonstrate progress toward a stated aim. Their use in English/language arts, creative writing and mathematics programs is widespread, and, in several states, portfolio assessments are being developed in science programs.
The new assessments call for tasks that differ dramatically from those usually employed in multiple-choice examinations, especially those that are norm-referenced. In standardized testing, an ill-structured problem is considered unfair. For "performance" assessment, however, ill-structured problems are intentionally devised, whether as open-ended questions, as situations to be observed, or as problems for which a resolution or understanding is to be exhibited. This enables each student to demonstrate mastery in his or her own way, a mastery that is considered more meaningful beyond the instructional setting, since most of the important problems that one faces in life are ill-structured (Ascher 1990). Major differences between norm-referenced, multiple-choice tests and performance-based assessments involve the extent to which performance-based assessments encourage students to:
- Construct their responses rather than select a right answer;
- Solve a problem or work on a task using primary or authentic materials rather than prompts or passages taken out of context or devised specifically for the assessment;
- Apply basic and more complex skills in unison rather than in isolation, and pursue multiple approaches and solutions to a problem or task. (Simmons and Resnick 1993)
Problems with these approaches to assessment include:
- Difficulty in scoring: Both Mitchell (1992) and Ascher (1990) report from research and personal experience the difficulty in developing reliable quantitative measures for writing assignments and the need for training if examiners are to score portfolios with a high degree of agreement. Nevertheless, Mitchell and Ascher find that alternative assessments yield better information about student progress.
- Cost: Ascher (1990) argues that while such assessments are more expensive per pupil, testing need not be done as often as is done currently. Testing for accountability, in fact, can be done by sampling student populations, which would keep mandated testing costs within tolerable bounds. Mitchell (1992) argues that reducing the amount and frequency of testing will free time for instruction, and that properly designed assessments are, themselves, instructional tools; both considerations shift part of assessment costs into the "instructional cost" side of the ledger.
- Fairness: The National Coalition of Advocates for Students' concern about the historic use of testing to discriminate against children of the poor and of minorities is reflected in their caution against relying on any test/assessment in the absence of equitable resource and process restructuring:
Nor are we captivated by claims made for a largely unproven set of "authentic" or "performance-based" tests. As the National Council's (The National Council on Education Standards and Testing, chaired by Colorado's Governor Roy Romer) own panel concluded, we lack evidence that these experimental tests can be widely deployed at a reasonable cost or that they will be fairer than traditional tests for at-risk students, especially when high stakes are attached to them. (NCAS 1993)
Testing and Systemic Reform
Those who support national content standards and performance assessment as necessary foundations for school reform hold that systemic change cannot be accomplished without first defining what we want to achieve (specific content or subject standards) and having in hand accurate performance-based assessments that will measure the extent to which the content/performance standards have been met. Creating universal standards implies the belief that all children can reach them. Such standards, therefore, would by themselves undermine the tracked programs that hold poor and minority students to lower standards. Authentic, performance-based assessment would accomplish curriculum and assessment alignment and would do away with multiple-choice testing that fractures knowledge and leaves students to deal with the bits and pieces outside of context. Multiple-choice tests would no longer drive curriculum and instruction. Students could be taught complex, high-order skills in real learning contexts, and testing would allow them to perform tasks that mirror real-life performance in authentic settings.
Simmons and Resnick (1993) point out that the examination component of performance standards will be useless without teachers, content specialists and other educators who have a firm understanding of how to construct and apply the examination system to improve curriculum and instruction and, most importantly, student performance. Today, there is a severe shortage of educators with this needed expertise. Therefore, in addition to building testing and assessment hardware, we must also create a professional development system to transform the way that educators view teaching, learning and assessment.
Equity advocates insist on the unfairness of assessing/testing students to a common standard while exposing them to different learning experiences. William Tate links curricular inadequacy and curricular reform to the realities of funding. Noting the new vision of mathematics education called for by the National Council of Teachers of Mathematics, Tate (1993) writes:
This vision will require urban schools to reallocate current funding sources and/or seek additional funding to incorporate a new assessment policy; to improve teachers' mathematics qualifications; possibly to decrease class sizes; to update instructional materials (such as textbooks, science laboratories, and computer capabilities); and to enhance the quality of many other resource inputs. Each of these inputs will require a funding source. This implies that preparing students for a new policy (i.e., national assessment) has important connections to issues of fiscal equity for urban schools.
Fiscal equity for urban schools is one of the United States' most critical dilemmas.... The additional resources required by a policy such as the national mathematics assessment will increase the burden on the already fiscally stressed systems of urban education. Thus, mathematics assessment, local property assessment (i.e., property taxes), and state funding become linked in a struggle to achieve social and educational equity.
We see, therefore, curriculum-based performance assessment as an element of systemic change, but it is only one element. Other questions must be addressed simultaneously if content and performance standards are to improve education for all students. Other critical questions include:
- Will the curriculum that is being assessed be high quality, multicultural and interdisciplinary?
- Since higher standards and authentic assessments will change both what is taught and how it is taught, how will teachers be taught the new contents?
- Are the funding and mechanisms for teacher training available and in place?
Content standards and performance assessment will prove irrelevant to improved education for an unacceptably large percentage of today's students if:
- Students do not have access to quality programs because of inequitable school funding or because their schools continue current tracking and ability grouping practices;
- Students enter school unprepared because of poverty or deprivation, health or nutritional deficits, or unstable and violent home or community backgrounds.
Changing the way we assess or test students will only get us what we already have unless we first change the opportunities that we provide poor and minority students to learn. Currently, those students are rarely provided real opportunities to meet the standards that already exist, let alone new, high standards. If reform stops at setting content and performance standards, the same children who have been left out of the reforms of the past will be left out of today's. The National Coalition of Advocates for Students (1993) outlines some real consequences of national outcomes standards unaccompanied by equitable restructuring of our education system:
- Low-income and minority students will face proposed examinations with no proof that their teachers are qualified to teach them the skills they will need;
- All of our children will be required to be "Number One" in science, including those who attend low-income schools that have no science labs;
- Our children will be required to outperform German children who have universal access to early childhood education and health services that massive numbers of our low-income children do without;
- Our children will be held hostage to a single "world class" standard without regard for the reality that they attend schools characterized by "savage inequalities" of resources and environment, and that sort them by group identity for exposure to radically different curricular content, teaching methods and expectations, counseling practices and personal treatment.
At a minimum, students must be taught a curriculum that will prepare them for high-standards assessments. Their teachers must have the expertise needed to teach the curriculum, and there must be an equitable distribution of the resources students and teachers each need to succeed. In our tracked schooling programs, which begin with elementary reading, children in poor and minority communities are held to lower standards than the rest of the population (Simmons and Resnick 1993). New standards without concern for equity will simply perpetuate old results.
Criteria for Assessment Recommendations
We recommend that any national, state or local student assessment standard or system meet the following Criteria for Evaluation of Student Assessment Systems, which has been endorsed by more than 100 national civil rights, education and advocacy organizations. Criteria... was created by FairTest, which, with the Council for Basic Education, co-chairs the National Forum on Assessment.
- Educational standards specifying what students should know and be able to do should be clearly defined before assessment procedures and exercises are developed. For assessment information to be valid and useful, assessment must be based on a consensus definition of what students are expected to learn, and the expected level of performance, at various developmental stages. Such standards, which might also be called intellectual competencies, are not discrete pieces of information or isolated skills, but important abilities, such as the ability to solve various kinds of problems or to apply knowledge appropriately.
The standards should be determined through open discussion among subject matter experts, educators, parents, policymakers, and others, including those concerned with the relationship between school learning and life outside school. Without a consensus on standards, there is little likelihood of valid assessment.
- The primary purpose of the assessment systems should be to assist both educators and policymakers to improve education and advance student learning. Students, educators, parents, policymakers and others have different needs for assessments and different uses for assessment information. For example, teachers, students and their parents want information on individual achievements, while policymakers and the public want information for accountability purposes. In all cases, the system should be designed to provide not just numbers or ratings, but useful information on the particular abilities students have or have not developed.
All purposes and uses of assessment should be beneficial to students. For example, the results should be used to overcome systemic inequalities. If assessments cannot be shown to be beneficial, they should not be used at all.
- Assessment standards, tasks, procedures, and uses should be fair to all students. Because individual assessment results often affect students' present situation and future opportunities, the assessment system, the standards on which it is based, and all its parts must treat students equally. Assessment tasks and procedures must be sensitive to cultural, racial, class and gender differences, and to disabilities, and must be valid for and not penalize any groups. To ensure fairness, students should have multiple opportunities to meet standards and should be able to meet them in different ways. No student's fate should depend upon a single test score.
Assessment information should also be used fairly. It should be accompanied by information about access to curriculum and about opportunities to meet the standards. Students should not be held responsible for inequities in the system.
- The assessment exercises or tasks should be valid and appropriate representations of the standards students are expected to achieve. A sound assessment system provides information about a full range of knowledge and abilities considered valuable and important for students to learn, and therefore requires a variety of assessment methods. Multiple-choice tests, the type of assessment most commonly used at present, are inadequate to measure many of the most important educational outcomes, and do not allow for diversity in learning styles or cultural differences. More appropriate tools include portfolios, open-ended questions, extended reading and writing experiences which include rough drafts and revisions, individual and group projects, and exhibitions.
- Assessment results should be reported in the context of other relevant information. Information about student performance should be one part of a system of multiple indicators of the quality of education. Multiple indicators permit educators and policymakers to examine the relationship among context factors (such as type of community, socioeconomic status of students, and school climate), resources (such as expenditures per student, plant, staffing, and money for materials and equipment), programs and processes (such as curriculum, instructional methods, class size, and grouping), and outcomes (such as student performance, dropout rates, employment, and further education). Statements about educational quality should not be made without reference to this information.
- Teachers should be involved in designing and using the assessment system. For an assessment system to help improve learning outcomes, teachers must fully understand its purposes and procedures and must be committed to, and use, the standards on which it is based. Therefore teachers should participate in the design, administration, scoring and use of assessment tasks and exercises.
- Assessment procedures and results should be understandable. Assessment information should be in a form that is useful to those who need it: students, teachers, parents, legislators, employers, postsecondary institutions, and the general public. At present, test results are often reported in technical terms that are confusing and misleading, such as grade-level equivalents, stanines, and percentiles. Instead, they should be reported in terms of educational standards.
- The assessment system should be subject to continuous review and improvement. Large-scale, complex systems are rarely perfect, and even well-designed systems must be modified to adapt to changing conditions. Plans for the assessment system should provide for a continuing review process in which all concerned participate.
Conclusion
The nation's history of using tests to sort children for differential educational opportunities is a long one. It is time for schools, local education agencies, and state and federal governments to ensure that no system of testing or student assessment be used except in the context of educational approaches that are based on standards for equity in educational resources and processes. Biased assessment instruments, policies and practices must not be allowed to limit opportunities to learn and narrow or dilute curricula and instruction. Unless preceded by an equitable restructuring of educational resources and processes, testing to meet National Student Outcomes Standards will leave students vulnerable to the discriminatory educational practices that deny 40 percent of students a meaningful opportunity to learn.
More than 100 national civil rights, education and advocacy organizations have endorsed the Criteria for Evaluation of Student Assessment Systems presented above. By adopting these criteria as their basis for student assessment standards, states could ensure that student assessments create tools for, rather than barriers to, educational opportunity for all students.