Testing, testing

April 1, 2005
Uldarico P. Datiles

Uldarico P. Datiles, vice president, instructional design and materials development for Education and Training Systems International, is a past recipient of the Fulbright-Hays grant, the author of numerous articles in professional journals in the United States and abroad, and a consultant to the World Health Organization. For over 30 years he has taught, trained personnel, and planned and developed courses of study, instructional materials and technological programs for higher education and the pharmaceutical industry.

Frank B. Penta

Frank B. Penta is president of Chapel Hill, NC-based Education and Training Systems International, a developer of sales training instructional materials, and executive director of the Health Sciences Consortium, a medical publishing cooperative of 1,300 schools in 40 countries. He has written numerous articles for leading health sciences publications and has lectured and conducted seminars and faculty development workshops throughout the world as a consultant with the World Health Organization.

Pharmaceutical Representative

How to develop a valid assessment.

Assessment is essential to training. The more accurately you assess your trainees, the more effectively you can guide their learning. Regardless of instructional setting, assessment involves the systematic gathering of information (using a test, for example) used in making decisions about:

* How much and how well the trainees have learned.

* How effective the trainers have been in facilitating the training.

* How useful the training materials have been.

Trainers generally give two kinds of assessments: cognitive assessments and performance (or skill) assessments (such as role-plays). Cognitive assessment (or the assessment of trainees' thinking and reasoning abilities) is commonly accomplished by using objective-type tests, such as multiple-choice tests. In an objective-type test, the rules for scoring are so specific that different scorers who follow these rules will arrive at the same score. These tests can be designed to assess different levels of learning outcomes, including knowledge, understanding and application outcomes:

* Knowledge refers to the ability to recall previously learned content, such as facts, information or events.

* Understanding (or comprehension) refers to the ability to explain concepts, rules, theories or processes.

* Application refers to the ability to use learned materials in answering unfamiliar questions or in solving a new problem.

Characteristics of useful tests

A test designed for cognitive assessment must be reliable, valid and usable in order to yield results that are useful to trainers in making decisions.

Reliability. Test reliability refers to the consistency of test results (for example, how often do the results replicate themselves with repeated testing?). Reliability is a prerequisite for test validity. There are three methods for determining reliability:

* The "test-retest" method compares results of the same test over multiple administrations.

* The "split-half" method randomly divides the test items equally (for example, scoring the odd-numbered items and the even-numbered items separately) and compares the results of the two halves.

* The "parallel-form" method correlates the results obtained from one test with those obtained from an alternative or equivalent test.

The aim of these methods is to determine the correlation of results between two equivalent sets of test items. The higher the correlation, the more reliable the test is. Although many factors affect test reliability, test length appears to be the most critical. Longer tests tend to be more reliable.
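As a rough illustration of the split-half idea, the sketch below totals each trainee's score on the odd-numbered and even-numbered items and correlates the two sets of totals. The function names and the small score matrix are hypothetical, and the Pearson coefficient is an assumption; the article does not say which correlation statistic to use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def split_half_correlation(item_scores):
    """item_scores holds one row per trainee: 1 = correct, 0 = incorrect."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson(odd_totals, even_totals)


# Hypothetical results for five trainees on a six-item test.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r = split_half_correlation(scores)
```

A value of r close to 1 suggests the two halves rank trainees consistently. With a test this short the estimate is unstable, which is in line with the point that longer tests tend to be more reliable.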

Validity. Test validity refers to the degree to which the test measures what it is intended to measure. Tests must be valid in order to gather accurate information, and decisions based on test results are valid only to the extent that the test is valid.

Usability. Test usability refers to a number of practical considerations. Trainers can easily determine the usability of a test by asking three simple questions:

* Is the test easy to administer?

* Can the test be readily scored?

* Can the test results be easily analyzed and interpreted?

If the trainer fails to answer any of these questions in the affirmative, the assessment may be compromised. Without these practical considerations, even a completely reliable and highly valid test will have little value in training.

Ensuring validity

Some practical pointers will help trainers ensure that their tests are valid. These include using a test blueprint (or a table of specifications), test-item construction guidelines, clear and concise directions, and an appropriate test-item grouping strategy.

Use a test blueprint. For a test to be effective, it must have the proper focus, include a representative sampling of the training content and be congruent with the learning outcomes. The use of a test blueprint, such as the one shown in figure 1, can help you plan a valid test.

A test blueprint shows the relative emphasis given to certain content areas and gives a sense of the overall coverage of the course content in the test. In the example blueprint, the rows indicate the areas of training content being tested; the columns indicate the levels of learning outcomes being measured. The number in each cell represents the number of test items devoted to that content area and to the specific learning outcome level. Thus, for instance, there are 13 knowledge-based items devoted to anatomy/physiology in this 100-item test. The total number of items devoted to anatomy/physiology is 20 (or 20% of the test). In this example, content areas are given equal emphasis on the test (20% for each area), but levels of learning outcomes have different degrees of emphasis (50% knowledge, 30% understanding and 20% application).
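A blueprint of this kind can be kept as a small table and checked arithmetically. The sketch below is only an illustration. The 13 knowledge items, the anatomy/physiology total of 20, the 100-item length and the 20% and 50/30/20 splits come from the description above. Every other cell value and the placeholder area names are hypothetical, chosen only so the totals work out, and are not taken from figure 1.

```python
LEVELS = ("knowledge", "understanding", "application")

# Items per content area, in the order given by LEVELS.
# Only the 13 knowledge items and the total of 20 for anatomy/physiology
# come from the article; all other numbers and names are hypothetical.
blueprint = {
    "anatomy/physiology": (13, 4, 3),
    "content area B": (10, 6, 4),
    "content area C": (9, 7, 4),
    "content area D": (9, 7, 4),
    "content area E": (9, 6, 5),
}


def summarize(table):
    """Return total items plus percentage emphasis by area and by level."""
    total = sum(sum(cells) for cells in table.values())
    area_share = {
        area: 100 * sum(cells) / total for area, cells in table.items()
    }
    level_share = {
        level: 100 * sum(cells[i] for cells in table.values()) / total
        for i, level in enumerate(LEVELS)
    }
    return total, area_share, level_share


total, area_share, level_share = summarize(blueprint)
```

Comparing area_share and level_share with the intended emphasis before writing any items is a quick way to confirm that the planned test samples the content as intended.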

Use test-item construction guidelines. As you begin to fill out your blueprint, construct test items using established guidelines. Numerous test item types can be used to measure the attainment of learning outcomes, including true-false, matching, labeling, classifying, multiple choice, completion (short response) and essay. Essay tests and short-response tests are called "supply-type," because the test taker must supply the answer; true-false, multiple choice and matching items, in contrast, are "selection-type" items, since the test taker can select the correct response from a list of options. In general, the multiple-choice type is the most commonly used because it is relatively objective, decreases guessing and is appropriate for measuring a wide variety of behaviors, from simple to complex.

A multiple-choice test item consists of a stem (often in the form of a question) and three or four alternatives (options or choices); one of these is the correct response (or answer), and the others are distractors. As their name suggests, the function of these latter choices is to distract test takers who are uncertain of the answer, so the distractors need to be plausible options.

A good multiple-choice item should measure trainees' knowledge of the subject without being readily answerable on some other basis. That is why it is so important to have plausible and well-crafted distractors. By plausible, we mean that both the correct and incorrect responses appear equally attractive to uninformed trainees. If such trainees can easily eliminate incorrect responses, the distractors are not functioning as they should. Some factors that can make distractors implausible include: unfamiliar words or concepts unrelated to the lesson, synonymous distractors in the same item, distractors not compatible with the stem, and overlapping distractors. The purpose of a test item is to assess trainees' mastery of learning outcomes, rather than their common sense or reading abilities.

The following are some helpful guidelines for enhancing the validity of multiple-choice items.

* Match each item with an important learning outcome.

* State a clearly defined problem or proposition in the stem.

* Use simple, unambiguous language in the stem and options.

* If wordings are repeated in the options, place them in the stem.

* Make sure that the intended response is correct or the best answer.

* Use proper and consistent grammar.

* Avoid verbal clues that might give away the answer.

* Provide plausible and attractive distractors.

* Vary the length of correct responses.

* Do not overuse "all or nothing" options. Experienced test takers can easily figure out that "all of the above" is often the correct response, and "none of the above" is rarely the correct answer.

* Vary the position of the correct response in a random fashion. Some test takers will look for a recognizable pattern of answers (for example: A, A, A, B, B, B; or A, B, C, D, A, B, C, D). Alphabetizing the alternatives according to their initial letters is a good way to avoid recognizable patterns of answers.

* Make items independent of other items on the same test -- make sure none of the test items provides clues for answering other items in the same test.

* Use a clear and efficient format.

* Avoid the use of determiners such as never, always, none or all. Because there is usually an exception to any rule, the use of determiners in either the stem or options of the item makes the item confusing and, hence, invalid.

* Highlight or emphasize negative items. Test takers will be looking for the "right" response, and emphasizing negative words such as not or except will avoid confusion.

* Use negatively stated items sparingly.
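The guideline above on varying the position of the correct response can be applied mechanically when items are stored electronically. The sketch below orders the alternatives alphabetically, as suggested, or shuffles them at random, and records which letter ends up as the answer. The function name, its parameters and the sample item are hypothetical.

```python
import random


def arrange_options(correct, distractors, alphabetize=True, seed=None):
    """Return (labelled options, answer letter) for one multiple-choice item."""
    options = [correct] + list(distractors)
    if alphabetize:
        # Order by the text of each alternative, ignoring letter case.
        options.sort(key=str.lower)
    else:
        # Random order; pass a seed to reproduce the same arrangement.
        random.Random(seed).shuffle(options)
    letters = "ABCDE"[: len(options)]
    answer = letters[options.index(correct)]
    return list(zip(letters, options)), answer


# Hypothetical item: which outcome level involves solving a new problem?
labelled, answer = arrange_options(
    correct="Application",
    distractors=["Knowledge", "Understanding"],
)
```

Because the alphabetical position depends only on the wording of the alternatives, the answer letter is not chosen by the item writer, which helps keep habitual patterns out of the key.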

Use clear and concise directions. When making a test, pay attention to test directions. Test directions should be clear and unambiguous. It is essential that the trainees know exactly what is expected of them. Your test directions should be explicit, so that the test takers will know precisely what they need to do to score well on the test.

Assemble test items appropriately. The most common way to assemble the items into the test is to group them by item type; for example, keep the true-false items together, separating them from the multiple-choice, short answer or essay items. Check your answer key to make sure the answers are not grouped in a recognizable pattern. Sometimes it is advisable to arrange test items from easy to hard, thus allowing trainees to build confidence (and reduce test anxiety) early in the exam.
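Checking an answer key for the two kinds of pattern mentioned earlier (runs such as A, A, A, B, B, B and cycles such as A, B, C, D, A, B, C, D) can be done with a short script. This is a minimal sketch with hypothetical function names. What counts as too long a run, and the maximum cycle length checked, are assumptions left to the trainer's judgment.

```python
def longest_run(key):
    """Length of the longest streak of identical consecutive answers."""
    best = run = 0
    previous = None
    for answer in key:
        run = run + 1 if answer == previous else 1
        best = max(best, run)
        previous = answer
    return best


def repeating_period(key, max_period=4):
    """Return the cycle length if the whole key repeats, else None."""
    for period in range(2, max_period + 1):
        repeats = len(key) >= 2 * period and all(
            key[i] == key[i - period] for i in range(period, len(key))
        )
        if repeats:
            return period
    return None


run_length = longest_run("AAABBB")            # streaks of three
cycle_length = repeating_period("ABCDABCD")   # the key repeats every four items
```

A key that shows long runs or a clear cycle can be revised by reordering the alternatives within the affected items before the test is administered.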

A good test can measure the trainees' knowledge, understanding and application outcomes. For a test to be effective, it must be reliable, valid and usable. Practical considerations -- such as using a test blueprint, constructing effective test items, giving clear directions and assembling the test appropriately -- can ensure that your assessment provides information useful for making decisions in the future.
