State Standardized Testing: What Is It and Why All the Controversy?
Every spring students in public schools all around the country take some sort of standardized test, usually starting in the third grade. The test serves two main purposes: (1) to indicate whether a student has the skills and abilities to meet the established state learning standards, and (2) to determine whether schools and school districts are meeting state and national requirements for student achievement. So what’s controversial about that?
The ramifications of standardized testing results ramped up in 2002 when the Elementary and Secondary Education Act (a.k.a. No Child Left Behind) became law. Though many standardized tests, such as the California Achievement Test and the Comprehensive Test of Basic Skills, have been around for decades, the stakes immediately became higher; the amount of federal funding schools received was now tied to how well students performed on these tests. Many teachers, feeling the pressure for their students to achieve high scores, began teaching to the test. The time it takes to prepare for and administer these standardized tests not only narrows the scope of classroom instruction but also upends routines and schedules and stresses out students and staff. There is so much hoopla around testing, including practice testing to predict results on the standardized test, that some parents are refusing to allow their children to participate at all.
Regardless of how people feel about it, standardized testing is unlikely to disappear. Many parents—and even educators—make judgments about these tests and the results without sufficient knowledge about what standardized testing is, what the results indicate about student achievement, and the care that goes into developing test items. Understanding more about assessment in general (standardized testing is a type of assessment) could help ease undue stress for parents and students and also help you make informed decisions about whether to have your child participate.
Types of Assessment
Assessments are learning barometers—indicating how well students understand lessons and concepts. Teachers use two types of assessments to evaluate student learning: formative and summative.
Formative assessments are used to make instructional decisions. They can be as informal as asking students to give a thumbs-up or a thumbs-down about whether they understand a concept. Formative assessments can also be more formal, like a written or oral quiz. Teachers give formative assessments all the time to gather information about how students are receiving lessons. These assessments help teachers plan the next steps for instruction, for example, whether to move on after a lesson is complete, provide remediation, or offer enrichment opportunities.
Summative assessments are usually given at the end of a unit of study or grading period. They are intended to assess, in general, how much a student has learned in a given period of time. Midterms and finals are examples of summative tests. When summative assessments are combined with formative data, teachers, students, and parents can tell how a student is doing. However, achievement tests—predominantly a form of summative assessment that evaluates what a student knows and is able to do—are the primary basis on which student learning is judged, and these tests are given once a year to all students.
What’s in a Quality Achievement Test Item?
Summative, standards-based tests like the widely used Smarter Balanced assessments (measuring Common Core State Standards in mathematics and English language arts) are not easy to develop. There are three main interconnected considerations when writing a test question. The first is the content of the item. What standard is the item trying to measure? Perhaps the standard is a student’s ability to identify a fraction in its lowest terms.
The next consideration is the context of the item. How is the problem presented? For example, does the item list groups of fractions in a multiple-choice format, or does the problem require reading a paragraph that contains several blanks where the lowest-terms fractions need to be placed? Both require the student to identify lowest-terms fractions, yet one demands stronger reading skills than the other.
The third consideration is the test item’s level of cognitive demand, based on Bloom’s taxonomy. For example, the verb identify requires only that a student remember (the first level of cognition) what a lowest-terms fraction is. Identifying a lowest-terms fraction does not require the same thinking skills as reading a word problem and converting several fractions into their lowest terms.
The content, context, and cognitive demand of each test item need to align to give a valid measure of a learning standard. How does this happen? There are several steps involved in creating a test item, and different committees will often work on the various steps. Some of the major steps are as follows:
Specifications for content, context, and cognitive demand are developed according to the learning standards.
One committee writes the test questions.
A second committee reviews the questions for bias and fairness.
A third group reviews the content.
Test items are piloted (given a test run) with a group of students and then scored. Changes to the test items may be made depending on the results.
Test items are piloted again, followed by more rounds of review.
Test Scores: Part of the Picture
The items themselves are not the problem with assessment, as long as they go through this rigorous process and qualified professionals facilitate test development and ensure validity. Many people, including educators, are often surprised to learn how much time (many months) and effort go into creating each test question. Better understanding the test development process, and remembering that results are not the only measure of a student’s knowledge and abilities, can spare much of the energy wasted on complaining about testing.
Test scores provide schools and districts with valuable data about changes in academic performance from year to year as well as strengths and potential gaps in achievement. These scores can also indicate whether students have developed mastery of a certain set of concepts and skills. However, the results of the annual spring standardized test are also often misunderstood and can lead to overreactions.
For example, far too much significance is given to subscores. In mathematics, there is an overall score and several subscores for particular categories like probability and statistics, geometry, or number sense. More than once over my teaching career, I heard administrators cite a low subscore as a reason to put more energy into a particular area of mathematics. For example, if they saw that only 42 percent of fifth-grade students met standard in the geometry section, suddenly everyone had to focus more instructional time on geometry, which meant some other topic area was eliminated. Administrators often did not take into consideration that only four questions made up that subscore. Such a paltry number of questions doesn’t provide enough data upon which to base huge instructional decisions. While a low subscore does warrant further examination, it alone should not be the reason for making instructional changes.
It is important to remember that a test score is just one part of the picture of a student’s abilities. Portfolios of work over several weeks or a few months plus grades on projects combined with test scores provide a more complete overview of a student’s knowledge and skills. The best any test can offer is a snapshot of what students can do at a particular time. The test does not give information about why students achieved a certain score; thus, conclusions based only on those test scores can be misguided.
What Do the Results Really Mean?
After a student takes the spring standardized test, parents receive a document that summarizes the results. The way these test results are reported is informative but limited and sometimes confusing, with little context provided to explain what the scores mean. Talk with your child’s teachers about the results and ask whether there is other evidence of your child meeting a particular standard. For example, if your child is approaching standard in measurement and data in mathematics, look at work already completed to see whether it matches that level of achievement. Maybe there were only four problems addressing that standard on the test, not nearly enough information to make a judgment on a year’s worth of learning. Talking with your child and their teachers about specifics can be very helpful.
Educational success has come to mean graphs and charts showing high test results. But those charts won’t show how much a child has grown to love reading. They won’t show how a child has stumbled upon their calling to be an artist or a scientist. They won’t show how a child has learned to successfully manage conflict with peers. There’s so much learning that happens for students that is not assessed by the all-important spring test.
The misunderstanding of assessment data in a society that relies on quarterly profit margins to determine success puts an inordinate amount of importance on annual tests. But it’s wrong to treat children as if they are factory parts and to look only at short-term measures to judge teachers and schools. Children are unique individuals who grow at various rates. Many of us remember a time when we finally blossomed. The progression is not a straight line. How we as parents, educators, and community members react to student testing has a huge influence on students’ attitudes about learning. Do we want to create a punitive, rigid, test-obsessed atmosphere that stunts children’s innate desire to learn? Or do we want to foster a supportive, positive environment where test results are taken for exactly what they are—a data point among millions of others—on a long journey through education filled with milestones, setbacks, and progress? It’s up to us.