Constructing Balanced Classroom Tests

This file should be read in conjunction with the discussion of alternative assessment. Pencil-and-paper classroom tests should be only one component of assessment strategies.

The principle of balance

Any classroom test is effectively a random sample of what students know and don't know, can and cannot do. Its reliability and validity are limited because it only gives information about what students did under the test conditions at the time the test was taken; it may not generalize to what those students might do under other conditions or at another time. It also tells us about only a very small subset of all the things students are expected to learn from the curriculum. We need to make sure that this subset is properly representative of the goals of the curriculum and is not a biased sample.

Many classroom tests are biased, or unbalanced, in one or more of the following ways:

I recommend that classroom tests aim to be balanced in each of these ways:

Teacher judgment is required to decide what proportions of these various possibilities provide the most reasonably balanced test for a particular curriculum and population of students.

Some general guidelines:

Types of reasoning: keep pure memory or recall items to no more than 50% of the test (by time, by number of items, and/or by number of points), and to less than 50% for more able groups of students.

Difficulty: for average classes, about 15-20% easier, more basic questions that you believe all students should be able to answer (you estimate that at least 60% of the class will get these correct); about 10-15% difficult questions to challenge the most able students (you estimate that only about 10-15% of the class will get these correct); the rest of average difficulty (you estimate that 30-60% of the class will get these correct).

Item formats: no more than 50% multiple-choice items; at least one visual-graphical item per test, more if possible; at least two items in which students must write a response of a full sentence or more and/or show their work; the rest as short-answer, problem-solving, or other types.
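For teachers who keep their test plans in a spreadsheet or script, the guidelines above can be turned into a rough automated check. The following is only a sketch under stated assumptions, not a required procedure: the Item fields, the category labels, and the check_balance function are hypothetical names chosen for illustration; proportions are measured by points (the guidelines equally allow time or number of items); and the "about 15-20%" and "about 10-15%" ranges are treated as strict limits, which is stricter than the guidelines intend. Teacher judgment still decides what counts as balanced for a particular class.

```python
from dataclasses import dataclass


@dataclass
class Item:
    points: int
    reasoning: str   # assumed labels: "recall" or "other"
    difficulty: str  # assumed labels: "easy", "average", "hard"
    fmt: str         # assumed labels: "multiple_choice", "visual", "written", "short_answer"


def share(items, predicate):
    """Fraction of total points carried by items matching the predicate."""
    total = sum(item.points for item in items)
    if total == 0:
        raise ValueError("the test has no points")
    return sum(item.points for item in items if predicate(item)) / total


def check_balance(items):
    """Return a list of warnings; an empty list means no guideline was violated."""
    warnings = []
    if share(items, lambda i: i.reasoning == "recall") > 0.50:
        warnings.append("recall items exceed 50% of points")
    if not 0.15 <= share(items, lambda i: i.difficulty == "easy") <= 0.20:
        warnings.append("easy items fall outside 15-20% of points")
    if not 0.10 <= share(items, lambda i: i.difficulty == "hard") <= 0.15:
        warnings.append("hard items fall outside 10-15% of points")
    if share(items, lambda i: i.fmt == "multiple_choice") > 0.50:
        warnings.append("multiple-choice items exceed 50% of points")
    if sum(1 for i in items if i.fmt == "visual") < 1:
        warnings.append("no visual-graphical item")
    if sum(1 for i in items if i.fmt == "written") < 2:
        warnings.append("fewer than two written-response items")
    return warnings


# A 40-point test plan that follows the guidelines.
balanced = (
    [Item(1, "recall", "easy", "multiple_choice")] * 7
    + [Item(1, "recall", "average", "multiple_choice")] * 10
    + [Item(4, "other", "average", "visual")]
    + [Item(4, "other", "average", "written")] * 2
    + [Item(2, "other", "average", "short_answer")] * 3
    + [Item(5, "other", "hard", "written")]
)

# A test made up entirely of average-difficulty recall multiple-choice items.
unbalanced = [Item(1, "recall", "average", "multiple_choice")] * 10

print(check_balance(balanced))
for warning in check_balance(unbalanced):
    print(warning)
```

The first plan produces no warnings; the second triggers every check. In practice the thresholds would be loosened or adjusted to suit the curriculum and the students.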

Secrets of writing effective multiple-choice items

TERMS: the prompt is what precedes the answers or response choices; incorrect choices are called distractors.

DO NOT ASSUME THAT BECAUSE AN ITEM APPEARS IN A STANDARDIZED TEST IT IS EFFECTIVE FOR USE ON A CLASSROOM TEST!

Standardized tests contain many more items and are based on statistical criteria of effectiveness that do not apply to classroom tests. They also contain many items which are deliberately chosen to give the test desirable statistical properties (e.g. to raise or lower the mean, variance, degree of inter-item correlation, etc.) but which make the items unreliable for classroom use.

Remember the GOLDEN RULE of classroom test item construction:

There should be no reason for a student to get an item incorrect EXCEPT not understanding the curriculum content that the item is designed to test.

NEVER include 'trick' items, except on practice tests designed to get students used to seeing such items on standardized tests. Items with multiple negatives, and items whose responses depend on combining other responses, tend to confuse students unnecessarily. So do items with complex wording or unfamiliar, nontechnical vocabulary. If you want to build students' reading skills, do so gradually, and provide separate quizzes where students can practice responding to more complexly worded items. If included in regular classroom tests, such items reduce the effectiveness of your testing of curriculum content.