On Tests and Measurements

On Tests and Assessments
Marty Ian Gideon Flores, B. Sc. Psych
Department of Psychology, Urdaneta City University
Department of Education, Pangasinan II, Special Education Division
June 26, 2014



At the end of the lecture, the participants can:

Differentiate measurement from evaluation, and testing from assessment;

Compare and contrast formative tests and summative tests;

Appreciate the science behind test construction; and

Apply some of the skills discussed in constructing their own sound and fair tests.

Learning Goals

Definition of terms

Formative vs. summative tests

Test Construction 101

Some types of alternative assessments

Final thoughts

What lies ahead?

Basics of Assessments

Standardized Test: Tests given, usually nationwide, under uniform conditions and scored according to uniform procedures.

Classroom Assessments: Assessments selected or created by teachers; they can take many different forms, such as unit tests, essays, portfolios, projects, performances, and oral presentations.

Basics of Assessments

Measurement: An evaluation expressed in quantitative (numerical) terms.

Assessment: Procedures used to obtain information about student performance.

Formative Assessment: Ungraded testing used before or during instruction to aid in planning and diagnosis.

Pretest: A formative test for assessing students’ knowledge, readiness, and abilities.

Summative Assessment: Testing that follows instruction and assesses achievement.

Functions of Assessments

1. The identification by teachers & learners of learning goals, intentions or outcomes and criteria for achieving these.

2. Rich conversations between teachers & students that continually build and go deeper.

3. The provision of effective, timely feedback to enable students to advance their learning.

4. The active involvement of students in their own learning.

5. Teachers responding to identified learning needs and strengths by modifying their teaching approach(es).

Black & Wiliam, 1998

Key Elements of Formative Assessment

Summative Assessment (assessment of learning): Generally taken by students at the end of a unit or semester to demonstrate the "sum" of what they have or have not learned.

Summative assessment methods are the most traditional way of evaluating student work.

"Good summative assessments--tests and other graded evaluations--must be demonstrably reliable, valid, and free of bias" (Angelo and Cross, 1993).

Formative

‘… often means no more than that the assessment is carried out frequently and is planned at the same time as teaching.’ (Black and Wiliam, 1999)

‘… provides feedback which leads to students recognizing the (learning) gap and closing it … it is forward looking …’ (Harlen, 1998)

‘ … includes both feedback and self-monitoring.’ (Sadler, 1989)

‘… is used essentially to feed back into the teaching and learning process.’ (Tunstall and Gipps, 1996)

Summative

‘… assessment (that) has increasingly been used to sum up learning …’ (Black and Wiliam, 1999)

‘… looks at past achievements … adds procedures or tests to existing work ... involves only marking and feedback grades to student … is separated from teaching … is carried out at intervals when achievement has to be summarized and reported.’ (Harlen, 1998)

If we think of our children as plants …

Summative assessment of the plants is the process of simply measuring them. It might be interesting to compare and analyze measurements but, in themselves, these do not affect the growth of the plants.

Formative assessment, on the other hand, is the equivalent of feeding and watering the plants appropriate to their needs - directly affecting their growth.

The Garden Analogy

Factors Inhibiting Assessment

A tendency for teachers to assess the quantity and presentation of work rather than the quality of learning.

Greater attention given to marking and grading, much of which tends to lower students’ self-esteem, rather than to providing advice for improvement.

A strong emphasis on comparing students with each other, which demoralizes the less successful learners.

Self-evaluation

Where would you place your assessment practice on each of the following continua? The main focus is on:

Quantity of work/Presentation <-----> Quality of learning

Marking/Grading <-----> Advice for improvement

Comparing students <-----> Identifying individual progress

Implications for classroom practice

Share learning goals with students. Involve students in self-assessment. Provide feedback that helps students recognize their next steps and how to take them.

Be confident that every student can improve.

Table 1. Using Tests to Make Instructional Decisions

Decision category: What to teach in the first place?
Typical assessment strategy: Preassessment before instruction
Decision options: Whether to provide instruction for specific objectives

Decision category: How long to keep teaching toward a particular instructional objective?
Typical assessment strategy: En route assessments of students’ progress
Decision options: Whether to continue or cease instruction for an objective, either for an individual or for the whole class

Decision category: How effective was an instructional sequence?
Typical assessment strategy: Comparing students’ posttest to pretest performances
Decision options: Whether to retain, discard, or modify a given instructional sequence the next time it is used
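The third decision in Table 1, comparing posttest with pretest performance, comes down to simple arithmetic. The Python sketch below is only an illustration: the student names and scores are invented, and a raw posttest-minus-pretest gain is just one of several ways a teacher might summarize growth.

```python
from statistics import mean

def gain_scores(pretest, posttest):
    """Return posttest minus pretest for every student who took both tests."""
    return {name: posttest[name] - pretest[name]
            for name in pretest if name in posttest}

# Hypothetical scores on a 20-item unit test (invented for illustration).
pretest = {"Ana": 8, "Ben": 12, "Carla": 15}
posttest = {"Ana": 16, "Ben": 14, "Carla": 15}

gains = gain_scores(pretest, posttest)
class_mean_gain = round(mean(gains.values()), 2)
# Students whose scores did not rise may need the objective taught again.
no_gain = [name for name, gain in gains.items() if gain <= 0]

print(gains)            # {'Ana': 8, 'Ben': 2, 'Carla': 0}
print(class_mean_gain)  # 3.33
print(no_gain)          # ['Carla']
```

A low class mean gain would point toward modifying the instructional sequence, while a list of individual non-gainers points toward reteaching for those students only.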

“Any teacher who uses tests dominantly to determine whether students get high or low grades should receive a solid F in classroom assessment. Tests and all assessments should be used to help teachers make better instructional decisions.” (Popham, 2008)

Always remember:

STOP & THINK

Think back to your most recent test. What was the format? Did you feel that the test results were an accurate reflection of your knowledge or skills? Have you ever had to design a test? What makes a good, fair test?

Classroom Assessment: Testing

A learning target consists of what students should know, be able to do, and feel.

Creating Clear, Appropriate Learning Targets

Selected-response items have an objective format in which responses can be scored on quick inspection. A scoring key of correct responses is created and can be used by an examiner or a computer.

Examples: True/False, Multiple Choice, and Matching items

Traditional Tests: Selected-Response Items

A true/false item asks a student to mark whether a statement is true or false. For example:
◦ Aspects of experience can sculpt the features of brain structure. True / False

A common drawback is that teachers sometimes take statements directly from a text or modify them only slightly. Avoid this practice because it encourages rote memorization with little understanding of the material.

On True/False Items

The item is useful for outcomes where there are only two possible alternatives (e.g., fact or opinion, valid or invalid).

Less demand is placed on reading ability than in multiple-choice items.

A relatively large number of items can be answered in a typical testing period.

Scoring is easy, objective, and reliable.

On True/False Items: Strengths

It is difficult to write items at a higher level of knowledge and thinking that are free from ambiguity.

When a student correctly indicates that a statement is false, that response provides no evidence that the student knows what is correct.

No diagnostic information is provided by the incorrect answers.

Scores are more influenced by guessing than are scores on any other item type.

On True/False Items: Weaknesses
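The claim that true/false scores are the most influenced by guessing can be checked with basic probability. The Python sketch below assumes a student guesses blindly on every item; the 20-item test length and the 12-item passing mark are hypothetical. It compares the expected chance score, and the chance of "passing" by guessing alone, for true/false items and four-option multiple-choice items.

```python
from math import comb

def expected_chance_score(n_items, n_options):
    """Average number correct from blind guessing: each item is right with probability 1/n_options."""
    return n_items / n_options

def prob_at_least(n_items, n_options, cutoff):
    """Binomial probability that blind guessing yields at least `cutoff` correct answers."""
    p = 1 / n_options
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(cutoff, n_items + 1))

# Hypothetical 20-item test with a passing mark of 12 correct (60%).
N_ITEMS, CUTOFF = 20, 12
for label, options in [("true/false", 2), ("4-option multiple choice", 4)]:
    print(f"{label}: expected {expected_chance_score(N_ITEMS, options)} correct, "
          f"P(pass by guessing) = {prob_at_least(N_ITEMS, options, CUTOFF):.4f}")
```

Under these assumptions, blind guessing "passes" the true/false version about one time in four, but passes the multiple-choice version less than one time in a thousand.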

Use only one key idea in each statement. Two or more ideas tend to confuse students.
◦ Example:

Good: The hypothalamus is responsible for our basic survival motives.

Poor: The hypothalamus, a forebrain structure near the base of the brain, is responsible for our basic survival motives.

Keep the statement short, and use simple vocabulary and sentence structure.

On True/False Items: Writing

Use words precisely so that the statement can clearly be judged as true or false. Avoid words such as possible and might, and vague terms such as seldom, frequently, and often.
◦ Example:

Good: Thirty-eight people witnessed the Genovese murder.

Poor: Many people witnessed the Genovese murder.

Use negatives sparingly and avoid double negatives.
◦ Example:

Good: In the presence of high heat, oxygen bonds readily with hydrogen.

Poor: In the presence of high heat, oxygen is not unlikely to bond with hydrogen.

On True/False Items: Writing

Avoid extraneous clues to the answer. Avoid absolute words such as always, never, all, none, and only, because they often signal “FALSE” as an answer. On the other hand, qualifiers such as usually, might, and sometimes tend to signal “TRUE.”
◦ Example:

Good: Corazon Aquino made important contributions to Philippine democracy.

Poor: Corazon Aquino never made important contributions to Philippine democracy.

On True/False Items: Writing

Why Use Multiple-Choice Items?

Most widely used item type

Measure simple learning outcomes

Measure complex learning outcomes (knowledge, understanding, and application)

Flexible, high-quality items adaptable to most subject-matter content

Used extensively in achievement testing

Characteristics of Multiple-Choice Items

Consists of a problem (stem) and a list of suggested solutions (alternatives, choices, or options)

Answers other than the correct answer are called distracters (decoys or foils)

Items can be stated in two ways:
1) Direct questions
   a) easier to write
   b) more natural for younger students
   c) present a clearly formatted problem
2) Incomplete sentences
   a) more concise
   b) present a well-defined problem if phrased well

Correct Answer Type and Best Answer Type

The correct answer type has only one possible correct answer (recall of factual information).

The best answer type measures learning outcomes that require the understanding, application, or interpretation of factual information (it measures more complex learning and is more difficult).

When dealing with the best answer variety, make sure your best answers are those that are agreed on by experts. This will allow you to defend your answers as the best possible choice.

USES OF MULTIPLE-CHOICE ITEMS

Measuring Knowledge Outcomes
1) Knowledge of Terminology
2) Knowledge of Specific Facts
3) Knowledge of Principles
4) Knowledge of Methods and Procedures

Measuring Outcomes at the Understanding and Application Levels
1) Ability to Identify Application of Facts and Principles
2) Ability to Interpret Cause-and-Effect Relationships
3) Ability to Justify Methods and Procedures

Advantages and Limitations of Multiple-Choice Items

Advantages

Measure achievement and complex learning outcomes. The structure of the alternatives eliminates vagueness and ambiguity.

Knowledge of the content area is measured without concern for spelling errors.

Multiple-choice items require students to choose the correct or best answer, while true-false items allow students to get credit merely for knowing that a statement is incorrect.

Multiple-choice items have greater reliability than true-false items.

Multiple-choice items measure a single idea, while matching exercises require a series of related ideas.

Multiple-choice items are usually free of response sets.

Incorrect answers in multiple-choice items usually allow diagnosis of errors and misunderstandings that need correction.

Disadvantages

Limited to outcomes at the verbal level.

Because students select rather than produce the answer, multiple-choice items do not measure problem-solving skills in math and science or the ability to organize and present ideas.

It is difficult to find a sufficient number of reasonable alternatives or distracters (especially at the primary level)

Write the stem as a question. Give three or four possible alternatives from which to choose.

State items and options positively when possible. Note: Elementary students especially find negatives confusing. If you use the word not in the stem, italicize or underline it.
◦ Example:
◦ Which of the following cities is NOT in Metro Manila?

A. Caloocan B. Valenzuela C. Parañaque D. Antipolo

On Multiple Choice: Writing

Include as much of the item as possible in the stem, making the stem relatively long and the alternatives relatively short.
◦ Example:

Which Philippine hero organized the La Liga Filipina?

A. M. H. del Pilar B. G. L. Jaena C. J. P. Rizal D. A. Bonifacio


On Multiple Choice: Writing

Alternatives should grammatically match the stem so that no answers are grammatically wrong.◦ Example:

Good: Orville and Wilbur Wright became famous because of which type of transportation?
A. airplane B. automobile C. boat D. train

Poor: Orville and Wilbur Wright became famous because of an
A. airplane B. automobile C. boat D. train
(Only “airplane” fits the article “an,” so the stem gives the answer away.)

Write items that have a clearly defensible correct or best option. Note: Unless you give alternative directions, students will assume that there is only one correct or best answer to an item.

On Multiple Choice: Writing

Vary the placement of the correct option. Students who are unsure of an answer tend to select the middle options and avoid the extreme ones.

Beware of cues in the length of the options. Note: Correct answers tend to be longer than incorrect ones because of the need to include the specifications and qualifications that make them true. Therefore, lengthen the distractors to approximately the same length as the correct answer.

On Multiple Choice: Writing
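One way to act on the advice to vary the placement of correct options is to plan the answer key before writing the option lists. The Python sketch below is a hypothetical helper, not a standard tool: it uses each answer position (A to D) about equally often, in a random order, and then inserts a correct answer at its assigned slot.

```python
import random
from collections import Counter

LETTERS = "ABCD"

def balanced_key(n_items, n_options=4, seed=None):
    """Spread the correct-answer positions evenly over the options, then shuffle their order."""
    rng = random.Random(seed)
    positions = [i % n_options for i in range(n_items)]
    rng.shuffle(positions)
    return positions

def place_correct(correct, distracters, position):
    """Return the option list with the correct answer inserted at `position` (0 = A)."""
    options = list(distracters)
    options.insert(position, correct)
    return options

key = balanced_key(20, seed=7)
print(Counter(LETTERS[p] for p in key))  # each letter appears 5 times

# The Wright brothers item from this section, with its correct answer placed in slot C.
item = place_correct("airplane", ["automobile", "boat", "train"], 2)
print(item)  # ['automobile', 'boat', 'airplane', 'train']
```

Balancing the key keeps a student from profiting by always choosing the middle options, and a fixed `seed` lets the teacher regenerate the same key later.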

Don’t expect your students to make narrow distinctions among answer choices.
◦ Example:

Good: The freezing point of water is
a. 25°F b. 32°F c. 39°F d. 46°F

Poor: The freezing point of water is
a. 30°F b. 31°F c. 32°F d. 43°F

Do not overuse “None of the above” and “All of the above.” Also avoid variations such as “A and B” or “C and D but not A.”

Don’t use the exact wording of a textbook when writing a question.

On Multiple Choice: Writing

Essay items allow students more freedom of response than other formats but require more writing.

Especially good in assessing students’:
◦ understanding of material,
◦ higher-level thinking skills,
◦ ability to organize information, and
◦ writing skills.

On Essays

Suggestions for writing good essay items:

◦ Specify limitations. Inform students about:
   the length of the desired answer, and
   the weight that will be given to each item.

◦ Structure and clarify the task. Example: Who was Emilio Aguinaldo? This can be answered in only six words: First president of the Philippine Republic.

Improved: Ask yourself what more you want the student to tell. More-structured essay items require more thinking on the part of the student: Describe two major accomplishments of Emilio Aguinaldo’s political life. What was important about each accomplishment?

On Essays

On Essays

Ask questions in a direct way. Don't get too tricky.

On Essays: Strengths

The highest levels of learning outcomes (analysis, synthesis, evaluation) can be measured.

The integration and application of ideas can be emphasized.

Preparation time is usually less than for selection-type formats.

On Essays: Limitations

Achievement may not be adequately sampled because of the time needed to answer each question.

It can be difficult to relate essay responses to intended learning outcomes because students are free to select, organize, and express ideas.

Scores may be raised by writing skill and bluffing, and lowered by poor handwriting, misspelling, and grammatical errors.

Scoring is time-consuming, subjective, and possibly unreliable.

On Essays: Scoring

Outline a plan for what constitutes a good or acceptable answer before administering or scoring students’ responses.

Analytic Scoring: scoring various criteria separately, then adding up the points to produce an overall score. Because this can be time-consuming, avoid having more than three or four criteria.

Holistic Scoring: making an overall judgment about the student’s answer and giving it a single number or letter. Make the judgment based on overall impression of the essay.

Devise a method where you can score the essays without knowing which student wrote them.

On Essays: Scoring

Evaluate all answers to the same question together.

Decide on a policy for handling irrelevant or incorrect responses.

If possible, reread papers before handing them back to students.

Write comments on the paper.
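Two of the scoring suggestions above, scoring without knowing who wrote each essay and evaluating all answers to the same question together, come down to simple bookkeeping. The Python sketch below is hypothetical: the paper-code format, the data layout, and the function names are assumptions made for illustration. It replaces names with random codes and regroups the answers question by question.

```python
import random

def blind_codes(students, seed=None):
    """Give each student a random paper code so the scorer does not see names."""
    rng = random.Random(seed)
    codes = [f"P{i:02d}" for i in range(1, len(students) + 1)]
    rng.shuffle(codes)
    return dict(zip(students, codes))

def responses_by_question(papers, codes):
    """Regroup {student: {question: answer}} into {question: [(code, answer), ...]}."""
    grouped = {}
    for student, answers in papers.items():
        for question, answer in answers.items():
            grouped.setdefault(question, []).append((codes[student], answer))
    for responses in grouped.values():
        responses.sort()  # order by paper code, which is unrelated to student names
    return grouped

# Hypothetical class: three students, two essay questions (answer texts are placeholders).
papers = {
    "Ana": {"Q1": "essay text 1", "Q2": "essay text 2"},
    "Ben": {"Q1": "essay text 3", "Q2": "essay text 4"},
    "Carla": {"Q1": "essay text 5", "Q2": "essay text 6"},
}
codes = blind_codes(list(papers), seed=42)
names = {code: student for student, code in codes.items()}  # kept apart until scoring is done

for question, responses in responses_by_question(papers, codes).items():
    for code, answer in responses:
        print(question, code, answer)  # the scorer sees only codes, one question at a time
```

The `names` lookup is consulted only after every answer has been scored, when the marks are returned to students.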

Short answer
Advantages: Can test many facts in a short time. Fairly easy to score. Excellent format for math. Tests recall.
Disadvantages: Difficult to measure complex learning. Often ambiguous.

Essay
Advantages: Can test complex learning. Can assess thinking process and creativity.
Disadvantages: Difficult to score objectively. Uses a great deal of testing time. Subjective.

True/false
Advantages: Tests the most facts in the shortest time. Easy to score. Tests recognition. Objective.
Disadvantages: Difficult to measure complex learning. Difficult to write reliable items. Subject to guessing.

Matching
Advantages: Excellent for testing associations and recognition of facts. Although terse, can test complex learning (especially concepts). Objective.
Disadvantages: Difficult to write effective items. Subject to process of elimination.

Multiple choice
Advantages: Can assess learning at all levels of complexity. Can be highly reliable and objective. Tests a fairly large knowledge base in a short time. Easy to score.
Disadvantages: Difficult to write. Somewhat subject to guessing.

Table 3. Advantages and Disadvantages of Different Kinds of Test Items

Nontraditional Assessments

Alternatives to traditional testing have emerged to address what are seen as its limits, including its emphasis on recall of facts rather than thinking and problem solving. Alternative approaches include authentic assessment, student exhibitions, and student portfolios.

Alternatives to Traditional Assessments

Authentic Assessment: Assessment procedures that test skills and abilities as they would be applied in real-life situations.

Performance Assessment: Any form of assessment that requires students to carry out an activity or produce a product in order to demonstrate learning.

Portfolio: A collection of the student’s work in an area, showing growth, self-reflection, and achievement.

Exhibition: A performance test or demonstration of learning that is public and usually takes an extended time to prepare.

Alternatives to Traditional Assessments

Evaluating Portfolios and Performances

Checklists

Rating Scales

Scoring Rubrics: Rules used to determine the quality of a student’s performance.

Example of rubric scoring (for each criterion, quality levels are listed from highest to lowest)

Criterion: Purpose
◦ The report explains the key purposes of the invention and also points out less obvious ones.
◦ The report explains all the key purposes of the invention.
◦ The report explains some of the purposes of the invention but misses key purposes.
◦ The report does not refer to the purposes of the invention.

Criterion: Features
◦ The report details both key and hidden features of the invention and explains how they serve several purposes.
◦ The report details the key features of the invention and explains the purposes they serve.
◦ The report neglects some of the features of the invention or the purposes they serve.
◦ The report does not detail the features of the invention or the purposes they serve.

Criterion: Connections
◦ The report makes appropriate connections between the purposes and features of the invention and many different kinds of phenomena.
◦ The report makes appropriate connections between the purposes and features of the invention and one or two phenomena.
◦ The report makes unclear or inappropriate connections between the invention and other phenomena.
◦ The report makes no connections between the invention and other things.

Include a scale of possible points to be assigned in scoring work.

Provide descriptors for each performance criterion to increase reliability and avoid biased scoring.

Decide whether the rubric will be generic, genre-specific, or task-specific.
◦ Generic: broad performance
◦ Genre-specific: a more specific type of performance
◦ Task-specific: unique to a single task

Decide whether the rubric should be longitudinal, that is, whether it will assess progress over time toward mastery of educational objectives.

Notes in prepping a rubric

Informal Assessments

Ungraded (formative) assessments that gather information from multiple sources to help teachers make decisions.

Journals

Involving students in Assessments

Assessment methods: selected response, essay, performance assessment, and personal communication

Knowledge mastery
◦ Selected response: Multiple-choice, true/false, matching, and fill-in items can sample mastery of elements of knowledge.
◦ Essay: Essay exercises can tap understanding of relationships among elements of knowledge.
◦ Performance assessment: Not a good choice for this target; the three other options are preferred.
◦ Personal communication: Can ask questions, evaluate answers, and infer mastery, but it is a time-consuming option.

Reasoning proficiency
◦ Selected response: Can assess understanding of basic patterns of reasoning.
◦ Essay: Written descriptions of complex problem solutions can provide a window into reasoning proficiency.
◦ Performance assessment: Can watch students solve some problems and infer reasoning proficiency.
◦ Personal communication: Can ask students to “think aloud” or ask follow-up questions to probe reasoning.

Skills
◦ Selected response: Can assess mastery of the prerequisites of skillful performance, but cannot tap the skill itself.

Can assess mastery of knowledge prerequisite to the ability to create quality products, but cannot assess the quality of the products themselves.

A strong match can assess: (a) proficiency in carrying out steps in product development and (b) attributes of the product itself.

Can probe procedural knowledge and knowledge of attributes of quality products.

Table 4. Aligning Different Assessment Tools with Their Targets

Tests are not created equal. Creation of good tests is not a matter of chance!

Assessment should not rely on tests alone. Tests are only one means of better understanding our students cognitively, behaviorally, and affectively.

Final thoughts

Cohen & Swerdlik (2008). Psychological Testing and Assessment: An Introduction to Tests and Measurement (7th Ed.).

Kaplan & Saccuzzo (2011). Psychological Testing.

Kellough & Roberts (1994). A Resource Guide for Elementary School Teaching: Planning for Competence (3rd Ed.).

Santrock (2008). Educational Psychology (2nd Ed.).

Woolfolk (2010). Educational Psychology (11th Ed.).

Recommended readings (references)

Thank You!
