On Tests and Measurements
On Tests and Assessments
Marty Ian Gideon Flores, B. Sc. Psych
Department of Psychology, Urdaneta City University
Department of Education, Pangasinan II, Special Education Division
June 26, 2014
At the end of the lecture, the participants can:
Differentiate measurement from evaluation, and testing from assessment;
Compare and contrast formative tests and summative tests;
Appreciate the science behind test construction; and
Apply some of the skills discussed in constructing their own sound and fair tests.
Learning Goals
Definition of terms
Formative vs. summative tests
Test Construction 101
Some types of alternative assessments
Final thoughts
What lies ahead?
Basics of Assessments
Standardized Test: Tests given, usually nationwide, under uniform conditions and scored according to uniform procedures.
Classroom Assessments: Classroom assessments are selected and created by teachers and can take many different forms: unit tests, essays, portfolios, projects, performances, oral presentations, etc.
Basics of Assessments
Measurement: An evaluation expressed in quantitative (numerical) terms.
Assessment: Procedures used to obtain information about student performance.
Formative Assessment: Ungraded testing used before or during instruction to aid in planning and diagnosis.
Pretest: A formative test for assessing students’ knowledge, readiness, and abilities.
Summative Assessment: Testing that follows instruction and assesses achievement.
Functions of Assessments
1. The identification by teachers & learners of learning goals, intentions or outcomes and criteria for achieving these.
2. Rich conversations between teachers & students that continually build and go deeper.
3. The provision of effective, timely feedback to enable students to advance their learning.
4. The active involvement of students in their own learning.
5. Teachers responding to identified learning needs and strengths by modifying their teaching approach(es).
Black & Wiliam, 1998
Key Elements of Formative Assessment
Summative Assessment (assessment of learning): Generally taken by students at the end of a unit or semester to demonstrate the "sum" of what they have or have not learned.
Summative assessment methods are the most traditional way of evaluating student work.
"Good summative assessments--tests and other graded evaluations--must be demonstrably reliable, valid, and free of bias" (Angelo and Cross, 1993).
Formative: ‘… often means no more than that the assessment is carried out frequently and is planned at the same time as teaching.’ (Black and Wiliam, 1999)
‘… provides feedback which leads to students recognizing the (learning) gap and closing it … it is forward looking …’ (Harlen, 1998)
‘ … includes both feedback and self-monitoring.’ (Sadler, 1989)
‘… is used essentially to feed back into the teaching and learning process.’ (Tunstall and Gipps, 1996)
Summative: ‘… assessment (that) has increasingly been used to sum up learning …’ (Black and Wiliam, 1999)
‘… looks at past achievements … adds procedures or tests to existing work ... involves only marking and feedback grades to student … is separated from teaching … is carried out at intervals when achievement has to be summarized and reported.’ (Harlen, 1998)
If we think of our children as plants …
Summative assessment of the plants is the process of simply measuring them. It might be interesting to compare and analyze measurements but, in themselves, these do not affect the growth of the plants.
Formative assessment, on the other hand, is the equivalent of feeding and watering the plants according to their needs, which directly affects their growth.
The Garden Analogy
Factors Inhibiting Assessment
A tendency for teachers to assess the quantity and presentation of work rather than the quality of learning.
Greater attention given to marking and grading, much of which tends to lower students’ self-esteem, rather than to providing advice for improvement.
A strong emphasis on comparing students with each other, which demoralizes the less successful learners.
Self-evaluation: Where would you place your assessment practice on the following continuum?
The main focus is on:
Quantity of work/presentation ↔ Quality of learning
Marking/grading ↔ Advice for improvement
Comparing students ↔ Identifying individual progress
Implications for classroom practice
Share learning goals with students. Involve students in self-assessment. Provide feedback that helps students recognize their next steps and how to take them.
Be confident that every student can improve.
Table 1. Using Tests to Make Instructional Decisions
Decision Category | Typical Assessment Strategy | Decision Options
What to teach in the first place? | Preassessment before instruction | Whether to provide instruction for specific objectives?
How long to keep teaching toward a particular instructional objective? | En route assessments of students’ progress | Whether to continue or cease instruction for an objective, either for an individual or for the whole class?
How effective was an instructional sequence? | Comparing students’ posttest to pretest performances | Whether to retain, discard, or modify a given instructional sequence the next time it is used?
“Any teacher who uses tests dominantly to determine whether students get high or low grades should receive a solid F in classroom assessment. Tests and all assessments should be used to help teachers make better instructional decisions.” (Popham, 2008)
Always remember:
STOP & THINK
Think back to your most recent test. What was the format? Did you feel that the test results were an accurate reflection of your knowledge or skills? Have you ever had to design a test? What makes a good, fair test?
Classroom Assessment: Testing
A learning target consists of what students should know, be able to do, and feel.
Creating Clear, Appropriate Learning Targets
Test items with an objective format in which responses can be scored on quick inspection. A scoring guide for correct responses is created and can be used by an examiner or computer.
Examples: True/False, Multiple Choice, and Matching items
Traditional Tests: Selected-Response Items
A true/false item asks a student to mark whether a statement is true or false, for example:
◦ Aspects of experience can sculpt the features of brain structure. (True / False)
A common drawback is that teachers sometimes take statements directly from a text or modify them only slightly. Avoid this practice because it tends to encourage rote memorization with little understanding of the material.
On True/False Items
The item is useful for outcomes where there are only two possible alternatives (e.g., fact or opinion, valid or invalid).
Less demand is placed on reading ability than in multiple-choice items.
A relatively large number of items can be answered in a typical testing period.
Scoring is easy, objective, and reliable.
On True/False Items: Strengths
It is difficult to write items at a higher level of knowledge and thinking that are free from ambiguity.
When a student correctly indicates that a statement is false, that response provides no evidence that the student knows what is correct.
No diagnostic information is provided by the incorrect answers.
Scores are more influenced by guessing than with any other item type.
On True/False Items: Weaknesses
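A short worked calculation makes the guessing weakness concrete. It assumes, for illustration only, a student who guesses blindly on every item, so each answer is correct with probability 1/k, where k is the number of options (k = 2 for true/false, k = 4 for a four-option multiple-choice item), n is the number of items, and X is the number of items answered correctly. These symbols are introduced here and are not part of the lecture.

$$\mathrm{E}[X] = \frac{n}{k}, \qquad P(X \ge 7 \mid n = 10) = \sum_{j=7}^{10} \binom{10}{j} \left(\frac{1}{k}\right)^{j} \left(1 - \frac{1}{k}\right)^{10-j}$$

$$k = 2:\; \mathrm{E}[X] = 5,\; P(X \ge 7) = \frac{176}{1024} \approx 0.172 \qquad k = 4:\; \mathrm{E}[X] = 2.5,\; P(X \ge 7) = \frac{3676}{1048576} \approx 0.0035$$

Under these assumptions, a pure guesser has roughly a 17% chance of scoring 70% or more on a 10-item true/false quiz, but well under 1% on a 10-item four-option multiple-choice quiz.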
Use only one key idea in each statement. Two or more ideas tend to confuse students.
◦ Example:
Good: The hypothalamus is responsible for our basic survival motives.
Poor: The hypothalamus, a forebrain structure near the base of the brain, is responsible for our basic survival motives.
Keep the statement short, and use simple vocabulary and sentence structure.
On True/False Items: Writing
Use words precisely so that the statement can clearly be judged as true or false. Avoid words such as possible and might, and vague terms such as seldom, frequently, and often.
◦ Example:
Good: There were 38 people who witnessed the Genovese murder.
Poor: Many people witnessed the Genovese murder.
Use negatives sparingly and avoid double negatives.
◦ Example:
Good: In the presence of high heat, oxygen bonds readily with hydrogen.
Poor: In the presence of high heat, oxygen is not unlikely to bond with hydrogen.
On True/False Items: Writing
Avoid extraneous clues to the answer. Avoid absolute words such as always, never, all, none, and only, because they often signal “FALSE” as an answer. On the other hand, statements with qualifiers such as usually, might, and sometimes tend to be true.
◦ Example:
Good: Corazon Aquino made important contributions to Philippine democracy.
Poor: Corazon Aquino never made important contributions to Philippine democracy.
On True/False Items: Writing
Why Use Multiple-Choice Items?
Most widely used item type
Measure simple learning outcomes
Measure complex learning outcomes (knowledge, understanding, and application)
Flexible; high-quality items are adaptable to most subject-matter content
Used extensively in achievement testing
Characteristics of Multiple-Choice Items
Each item consists of a problem (the stem) and a list of suggested solutions (alternatives, choices, or options).
Answers other than the correct answer are called distracters (decoys or foils).
Items can be stated in two ways:
1) Direct questions: a) easier to write; b) more natural for younger students; c) present a clearly formatted problem
2) Incomplete sentences: a) more concise; b) present a well-defined problem if phrased well
Correct-Answer Type and Best-Answer Type
The correct-answer type has only one possible correct answer (recall of factual information).
The best-answer type measures learning outcomes that require the understanding, application, or interpretation of factual information (it measures more complex learning and is more difficult).
When dealing with the best answer variety, make sure your best answers are those that are agreed on by experts. This will allow you to defend your answers as the best possible choice.
Uses of Multiple-Choice Items
Measuring Knowledge Outcomes
1) Knowledge of Terminology
2) Knowledge of Specific Facts
3) Knowledge of Principles
4) Knowledge of Methods and Procedures
Measuring Outcomes at the Understanding and Application Levels
1) Ability to Identify Application of Facts and Principles
2) Ability to Interpret Cause-and-Effect Relationships
3) Ability to Justify Methods and Procedures
Advantages and Limitations of Multiple-Choice Items
Advantages
Measures achievement and complex learning outcomes.
The structure of the alternatives eliminates vagueness and ambiguity.
Knowledge of the content area is measured without concern for spelling errors.
Multiple-choice items require students to choose the correct or best answer, whereas true-false items let students earn credit merely for knowing that a statement is not correct.
Multiple-choice items have greater reliability than true-false items.
Multiple-choice items measure a single idea, while matching exercises require a series of related ideas.
Multiple-choice items are usually free of response sets.
Incorrect answers in multiple-choice items usually allow diagnosis of errors and misunderstandings that need correction.
Disadvantages
Limited to outcomes at the verbal level.
Requires selection of the correct answer and therefore does not measure problem-solving skills in math and science or the ability to organize and present ideas.
It is difficult to find a sufficient number of reasonable alternatives or distracters (especially at the primary level).
Write the stem as a question. Give three or four possible alternatives from which to choose.
State items and options positively when possible. Note: Elementary students especially find negatives confusing. If you use the word not in the stem, italicize or underline it.
◦ Example: Which of the following cities is not in Metro Manila?
A. Caloocan B. Valenzuela C. Parañaque D. Antipolo
On Multiple Choice: Writing
Include as much of the item as possible in the stem, making the stem relatively long and the alternatives relatively short.
◦ Example: Which Philippine hero organized La Liga Filipina?
A. M. H. del Pilar
B. G. L. Jaena
C. J. P. Rizal
D. A. Bonifacio
On Multiple Choice: Writing
Alternatives should grammatically match the stem so that no answers are grammatically wrong.◦ Example:
Good: Orville and Wilbur Wright became famous because of which type of transportation?
A. airplane B. automobile C. boat D. train
Poor: Orville and Wilbur Wright became famous because of an
A. airplane B. automobile C. boat D. train
(The article “an” fits only “airplane” and “automobile,” so the stem itself cues the answer.)
Write items that have a clearly defensible correct or best option. Note: Unless you give alternative directions, students will assume that there is only one correct or best answer to an item.
On Multiple Choice: Writing
Vary the placement of the correct option. Students who are unsure of an answer tend to select the middle options and avoid the extreme options.
Beware of cues in the length of the options. Note: Correct answers tend to be longer than incorrect ones because of the need to include specifications and qualifications that make it true. Therefore, lengthen the distractors to approximately the same length as the correct answer.
On Multiple Choice: Writing
Don’t expect your students to make narrow distinctions among answer choices.
◦ Example:
Good: The freezing point of water is a. 25°F b. 32°F c. 39°F d. 46°F
Poor: The freezing point of water is a. 30°F b. 31°F c. 32°F d. 43°F
Do not overuse None of the above and All of the above. Also avoid using variations of “A and B” or “C and D but not A.”
Don’t use the exact wording in a textbook when writing a question.
On Multiple Choice: Writing
Essay items allow students more freedom of response to questions but require more writing than other formats.
Especially good in assessing students’:
◦ understanding of material,
◦ higher-level thinking skills,
◦ ability to organize information, and
◦ writing skills.
On Essays
Suggestions for writing good essay items:
◦ Specify limitations. Inform students about the length of the desired answer and the weight that will be given to each item.
◦ Structure and clarify the task. Example: Who is Emilio Aguinaldo? This can be answered in only six words: First president of the Philippine Republic.
Improved: Ask yourself what more you want the student to tell you. A more structured essay item requires more thinking on the part of the student: Describe two major accomplishments of Emilio Aguinaldo’s political life. What was important about each accomplishment?
On Essays
On Essays: Strengths
The highest levels of learning outcomes (analysis, synthesis, evaluation) can be measured.
The integration and application of ideas can be emphasized.
Preparation time is usually less than for selection-type formats.
On Essays: Limitations
Achievement may not be adequately sampled because of the time needed to answer each question.
It can be difficult to relate essay responses to intended learning outcomes because students are free to select, organize, and express ideas.
Scores can be raised by writing skill and bluffing, and lowered by poor handwriting, misspelling, and grammatical errors.
Scoring is time-consuming, subjective, and possibly unreliable.
On Essays: Scoring
Outline a plan for what constitutes a good or acceptable answer before administering or scoring students’ responses.
Analytic Scoring: scoring various criteria separately, then adding up the points to produce an overall score. It can be time-consuming, so avoid having more than three or four criteria.
Holistic Scoring: making an overall judgment about the student’s answer and giving it a single number or letter. Make the judgment based on overall impression of the essay.
Devise a method where you can score the essays without knowing which student wrote them.
On Essays: Scoring
Evaluate all answers to the same question together.
Decide on a policy for handling irrelevant or incorrect responses.
If possible, reread papers before handing them back to students.
Write comments on the paper.
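The analytic scoring approach described above (rate each criterion separately, then add the points) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three criteria are borrowed from the sample rubric in this lecture, the 1 to 4 point scale is assumed, and the name `analytic_score` is hypothetical.

```python
# Hypothetical sketch of analytic scoring: each criterion is rated
# separately, then the ratings are added to give one overall score.
CRITERIA = ("Purpose", "Features", "Connections")  # assumed criteria
MAX_POINTS = 4  # assumed scale: 1 (lowest) to 4 (highest) per criterion


def analytic_score(ratings: dict) -> int:
    """Sum per-criterion ratings, rejecting missing or out-of-range ones."""
    total = 0
    for criterion in CRITERIA:
        if criterion not in ratings:
            raise KeyError(f"no rating given for {criterion}")
        points = ratings[criterion]
        if not 1 <= points <= MAX_POINTS:
            raise ValueError(f"{criterion}: {points} is outside 1-{MAX_POINTS}")
        total += points
    return total


if __name__ == "__main__":
    essay = {"Purpose": 4, "Features": 3, "Connections": 2}
    # prints: 9 out of 12
    print(analytic_score(essay), "out of", MAX_POINTS * len(CRITERIA))
```

Keeping the criteria list short mirrors the advice above to use no more than three or four criteria. Holistic scoring, by contrast, would record a single overall number without the per-criterion breakdown.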
Table 3. Advantages and Disadvantages of Different Kinds of Test Items
Type | Advantages | Disadvantages
Short Answer | Can test many facts in a short time. Fairly easy to score. Excellent format for math. Tests recall. | Difficult to measure complex learning. Often ambiguous.
Essay | Can test complex learning. Can assess thinking process and creativity. | Difficult to score objectively. Uses a great deal of testing time. Subjective.
True/False | Tests the most facts in the shortest time. Easy to score. Tests recognition. Objective. | Difficult to measure complex learning. Difficult to write reliable items. Subject to guessing.
Matching | Excellent for testing associations and recognition of facts. Although terse, can test complex learning (especially concepts). Objective. | Difficult to write effective items. Subject to process of elimination.
Multiple Choice | Can assess learning at all levels of complexity. Can be highly reliable and objective. Tests a fairly large knowledge base in a short time. Easy to score. | Difficult to write. Somewhat subject to guessing.
Nontraditional Assessments
Alternatives to traditional testing have emerged that address what are seen as its limits, including that it emphasizes recall of facts instead of thinking and problem solving. Alternative approaches include authentic assessment, student exhibitions, and student portfolios.
Alternatives to Traditional Assessments
Authentic Assessment: Assessment procedures that test skills and abilities as they would be applied in real-life situations.
Performance Assessment: Any form of assessment that requires students to carry out an activity or produce a product in order to demonstrate learning.
Portfolio: A collection of the student’s work in an area, showing growth, self-reflection, and achievement.
Exhibition: A performance test or demonstration of learning that is public and usually takes an extended time to prepare.
Alternatives to Traditional Assessments
Evaluating Portfolios and Performances
Checklists
Rating Scales
Scoring Rubrics: Rules that are used to determine the quality of a student’s performance.
Example of a scoring rubric (for each criterion, quality levels run from highest to lowest)
Purpose
◦ The report explains the key purposes of the invention and also points out less obvious ones.
◦ The report explains all the key purposes of the invention.
◦ The report explains some of the purposes of the invention but misses key purposes.
◦ The report does not refer to the purposes of the invention.
Features
◦ The report details both key and hidden features of the invention and explains how they serve several purposes.
◦ The report details the key features of the invention and explains the purposes they serve.
◦ The report neglects some of the features of the invention or the purposes they serve.
◦ The report does not detail the features of the invention or the purposes they serve.
Connections
◦ The report makes appropriate connections between the purposes and features of the invention and many different kinds of phenomena.
◦ The report makes appropriate connections between the purposes and features of the invention and one or two phenomena.
◦ The report makes unclear or inappropriate connections between the invention and other phenomena.
◦ The report makes no connections between the invention and other things.
Include a scale of possible points to be assigned in scoring work.
Provide descriptors for each performance criterion to increase reliability and avoid biased scoring.
Decide whether the rubric will be generic, genre-specific, or task-specific.
◦ Generic: broad performance
◦ Genre-specific: a more specific type of performance
◦ Task-specific: unique to a single task
Decide whether the rubric should be longitudinal, that is, whether it should assess progress over time toward mastery of educational objectives.
Notes in prepping a rubric
Informal Assessments
Ungraded (formative) assessments that gather information from multiple sources to help teachers make decisions.
Journals
Involving students in Assessments
Table 4. Aligning Different Assessment Tools with Their Targets
Target to Be Assessed | Selected Response | Essay | Performance Assessment | Personal Communication
Knowledge Mastery | Multiple choice, true/false, matching, and fill-in can sample mastery of elements of knowledge. | Essay exercises can tap understanding of relationships among elements of knowledge. | Not a good choice for this target; the three other options are preferred. | Can ask questions, evaluate answers, and infer mastery, but a time-consuming option.
Reasoning Proficiency | Can assess understanding of basic patterns of reasoning. | Written descriptions of complex problem solutions can provide a window into reasoning proficiency. | Can watch students solve some problems and infer about reasoning proficiency. | Can ask students to “think aloud” or ask follow-up questions to probe reasoning.
Skills | Can assess mastery of the prerequisites of skillful performance, but cannot tap the skill itself. | Can assess mastery of knowledge prerequisite to the ability to create quality products, but cannot assess the quality of the products themselves. | A strong match; can assess (a) proficiency in carrying out steps in product development and (b) attributes of the product itself. | Can probe procedural knowledge and knowledge of attributes of quality products.
Tests are not created equal. Creation of good tests is not a matter of chance!
Assessment should not rely on tests alone. Tests are only one of the means to better understand our students cognitively, behaviorally, and affectively.
Final thoughts
Cohen & Swerdlik (2008). Psychological Testing and Assessment: An Introduction to Tests and Measurement (7th Ed.).
Kaplan & Saccuzzo (2011). Psychological Testing.
Kellough & Roberts (1994). A Resource Guide for Elementary School Teaching: Planning for Competence (3rd Ed.).
Santrock (2008). Educational Psychology (2nd Ed.).
Woolfolk (2010). Educational Psychology (11th Ed.).
Recommended readings (references)