04 - Paper and Pencil, Performance, Oral, Essay and Objective Tests

HANDOUTS
Maseno University
PGDE PROGRAMMES
EPY513: Educational Test & Measurements
Topic 4: TEST CONSTRUCTION: TYPES OF TEST AND MEASUREMENT SCALES
Types of Test
Paper-and-Pencil, Performance, Oral, Essay and Objective Tests

PAPER-AND-PENCIL TESTS: In this category we include objective and essay type tests. We call this
category paper-and-pencil tests because the examinee records his/her responses to a question on
paper, thus leaving behind a permanent record of the test proceedings. The scripts can then be taken
away and marked at the examiner's convenience.
Typically they are used to assess students' cognitive ability, but they have also been used to measure
affective and practical skills. For example, the KCSE Science practicals require a student to carry out
certain tasks or procedures and to record his/her observations. The student's written record is
collected at the end of the examination to be marked.
Advantages
1. They are economical in terms of time and money.
2. A large number of students can be tested at the same time.
3. All the students answer the same question paper.
4. Students can be tested under uniform conditions: the examination time can be strictly
controlled, and so can access to reference material (usually none is allowed).
5. Usually the test consists of a series of questions that may cover the whole syllabus.
6. They provide an opportunity to obtain detailed feedback for both students and teachers.
Disadvantages
1. They attach undue importance to a very small sample of student behaviour.
2. Students' results may be unduly influenced by extraneous factors.
3. They may have adverse side-effects on the instructional programme.
Items on paper-and-pencil tests can be classified into two categories: i) supply and ii) selection. The
supply items include short-answer and essay items, and the selection items include objective items
such as true-false, multiple-choice and matching items. Many teachers believe that the ability to choose an
answer is different from, and less significant than, the ability to produce an answer.
Research, however, indicates that both these abilities are highly correlated. In other words, the item
form does not determine the complexity or the superiority of cognitive behaviour tested. Neither is it
correct to conclude that luck plays a larger role in one type, and is totally absent in the other. If care is
exercised, both item types can be used to measure the same kind, and level of ability.
Diagram 4.1
MOST COMMONLY USED PAPER-AND-PENCIL TEST ITEM FORMS
UNSTRUCTURED (SUBJECTIVE)
    Thesis, dissertation
    Term paper, project, field study report
    Essay
    Short-answer questions
    Completion items (fill-in-the-blanks)
    Objective items:
        True-false items
        Multiple-choice items (correct-answer and best-answer varieties)
        Matching items
STRUCTURED (OBJECTIVE)
Diagram 4.1 places the various item forms on a continuum. One end of this continuum represents a
completely unstructured test setting, where the student, when writing a thesis or a project, is
completely free to select a research problem, investigate it, and write up her/his findings. At the
other end of the continuum, where the test setting is completely structured, the student has no
freedom even to formulate an answer: she/he may only select the correct answer from the options
provided.
Between these two extremes lie essay, short-answer and completion items. As one moves
away from the completely unstructured test setting towards the structured end, the examiner
increasingly curtails the examinee's freedom. With an essay item, the examiner instructs the examinee
to answer a given problem. This still allows some room for examinee interpretation.
With short-answer questions, the examiner begins to control the interpretation and the length of the
answer. With completion items, the examinee's freedom is reduced to the provision
of a simple phrase or a single word.
However, for all these items (thesis, project, essay, short-answer and completion items) there is no
single correct answer. When designing a marking scheme for these items, the examiner has to build in
some degree of flexibility, which results in subjectivity in marking. With the structured items, where
the examinee chooses a correct answer, there is only one correct answer, hence the marking is
very objective. Subjectivity in unstructured items is related to the award of partial credit to an
imperfect or incomplete answer. The examiner subjectively decides which elements to consider in
judging the degree of perfection.
An examiner should try to make her measurements as objective as possible. A measurement is
considered objective to the extent that marking is independent of the subjective bias of a marker.
Generally speaking, selection items tend to be more objective, and hence more reliable, than supply
items, but the characteristics of a particular assessment method are not entirely static or fixed.
Almost any method is capable of improvement in numerous ways. For example, an essay item can be
constructed to reduce possible interpretations, and care can be exercised in the process of marking
to make it more objective.
The supply items are usually easier to construct than selection items. The selection items require
special training and more time to construct. However, these items are easy to score and yield more
reliable results. Where a very large number of students is to be tested, the use of objective items is
recommended. The teacher, keeping in mind what needs to be tested and how many students need to
be tested, should choose the type of item that is most useful.
PERFORMANCE TESTS
Performance tests have been used for aptitude testing and achievement testing, and teachers often use
their informal observations of pupil behaviour in class as a basis for pupil assessment. Performance tests are
concerned with assessing pupils' ability to use various skills and procedures in various academic
courses. For example, Geography courses are concerned with map reading and Science courses are
concerned with laboratory skills. Similarly, courses like fine art, music, home economics and physical
education attach great importance to pupils' ability to perform various skills. On the whole,
performance tests act as an important adjunct to the paper-and-pencil measures of cognitive skills.
Typically, a performance test is an individualized test in which a student is given a task to perform.
While the student is carrying out the task, she is observed by one or more judges. This kind of testing
gives access to the process or procedure by which the outcome or product is prepared, and is used
when evaluation of the process is essential for purposes of assessment. For example, in Home
Economics, when testing a student's ability to prepare a meal, the teacher needs to judge whether or
not the student used a safe, hygienic and correct method. In other courses, we are not so concerned
about the actual process used because different procedures would result in a similar product.
For example, in Fine Art it really does not matter what process the student uses; we are only
interested in the final product, the painting. Therefore, we will not need a judge to evaluate the
student's work during the examination.
Advantages of a Performance Test
A performance test:
1. provides an opportunity to test in a realistic setting;
2. allows the examiner to observe and check individual performance;
3. assesses the pupil's proficiency in performing an activity;
4. provides an opportunity to test the student's ability to bring together a number of different skills in
a way which is difficult for a written test;
5. provides an opportunity to observe and test attitudes and responsiveness to the total situation.
Disadvantages
A performance test:
1. may impose some pressure on laboratory and workshop equipment, since all students use
the same tools at the same time;
2. where sufficient equipment is not available, may result in unequal (non-standardized)
examination conditions, thus making comparison of results invalid;
3. can test only a small portion of the total syllabus;
4. where a marker is required to observe the actual process or working, requires the
services of one marker for every 6-10 students, thus increasing the cost of the exam;
5. has limited feasibility for large groups;
6. lacks objectivity in marking and suffers from intrusion of irrelevant factors.
Performance exams are usually marked using an analytical marking scheme, although a checklist or
rating scale may be used instead. The objectivity of marks can be increased by using more than one
examiner, but this requires good coordination between examiners.
ORAL TESTS
With the advent of paper-and-pencil exams, oral tests are used less frequently. Not so long ago,
they were the only means of assessing students' cognitive abilities. These days, oral tests are used to
assess a student's communication ability in English or a foreign language, or to test a visually
handicapped individual. Another use of the oral test is in the evaluation of a graduate
student's defence of her/his thesis.
Typically, an oral test requires a one-to-one situation: one examinee and one examiner (though there
can be more than one examiner). Depending on the purpose of the exam, it tends to be short and
rather loosely structured. This allows one to assess the strengths and weaknesses of individuals. In
other words, it allows the examiners to tailor the examination to fit a particular candidate.
Oral tests are usually marked by means of a rating scale.
Advantages
An oral test:
1. provides direct contact with the candidate;
2. provides an opportunity to assess the strong and weak areas of each candidate;
3. gives the examiner an opportunity to clarify the question when the candidate has not
grasped the meaning;
4. provides an opportunity to question the candidate about how she arrived at the answer;
5. allows more than one examiner to examine the candidate simultaneously.
Disadvantages
An oral test:
1. lacks standardisation, hence its results cannot be compared across candidates;
2. lacks objectivity and reproducibility of results;
3. tends to give very subjective results;
4. lacks a precise definition of the criteria for award of a satisfactory grade;
5. consequently depends heavily on the experience of examiners and their ability to retain in their
minds an accurate impression of the standard required;
6. requires a large number of examiners, since one examiner cannot normally examine more than
15-20 students in a day.
To ensure consistency in marking, the proceedings can be taped. However, this would not only increase the
expense of the examination, but would also almost double the number of hours needed to test: the first time
round to quiz the candidate and the second time round to listen to the tapes.
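The staffing demand implied by the figure of 15-20 candidates per examiner per day can be estimated with simple arithmetic. The sketch below is an illustration only: the function name and the cohort of 300 candidates are assumptions made for the example, and the only figures taken from the handout are the 15 and 20 candidates per day.

```python
import math

def examiner_days(candidates, per_examiner_per_day):
    """Examiner-days needed if one examiner sees a fixed number of candidates a day."""
    if candidates < 0 or per_examiner_per_day <= 0:
        raise ValueError("candidates must be >= 0 and the daily load must be > 0")
    # Round up: a partly filled day still occupies an examiner.
    return math.ceil(candidates / per_examiner_per_day)

# Hypothetical cohort of 300 candidates (not a figure from the handout).
best_case = examiner_days(300, 20)   # examiner sees 20 candidates a day
worst_case = examiner_days(300, 15)  # examiner sees 15 candidates a day
print(best_case, worst_case)
```

Under these assumptions the cohort needs between 15 and 20 examiner-days, before any time spent listening to tapes is counted.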
SECTION TWO
Which Assessment Method?
Among the many teaching methods available, a teacher chooses the one that will be most effective
and efficient in attaining the objectives of the course. Some of the objectives can be attained in a
large-group situation, while others call for small-group or even individualised teaching. Teaching in a
large-group situation is most economical in terms of teaching time, and individualised instruction is by
far the most expensive. In other words, teachers must decide what is the most effective
method of teaching after a careful analysis of what needs to be taught.
The same kind of thoughtful analysis is necessary for planning an effective evaluation programme.
The evaluation of attainment of some objectives may require the teacher to work with and observe the
student, for example while she is learning to swim or dive from the top board. Evaluation of some other
aspects of student progress may require the use of a personal interview. In terms of teacher time, both these
methods are expensive. There are, however, many objectives within the cognitive domain that can be
tested very effectively and very inexpensively by the use of group paper-and-pencil tests.
The first step in deciding what method to choose is a thorough understanding of what needs to be
evaluated and for what purpose. After the teacher has defined her instructional objectives, and has
determined what kinds of decisions will require evaluative information, she is in a position to select the
most appropriate method(s) of evaluation. In this process of selection the teacher needs to be guided
by certain criteria that should help her weigh the relative qualities of various measurement methods.
These criteria will be discussed in relatively non-technical terms in this section. This discussion should
enable a teacher to decide what is the best method for assessing pupil achievement. Some of these
criteria will be presented in greater depth, with a more technical discussion, later in the text.
Validity
This refers to the extent to which the results of a measurement method provide the desired information.
Achievement tests are mainly designed to generate information that allows a teacher to
determine the degree to which pupils have achieved the specific instructional objectives. Therefore,
the important question to ask when choosing a method is whether or not it will provide evidence of the
extent to which pupils can exhibit the specific behaviour identified in the objectives. If a given method
cannot provide this essential information, then it is of little value.
Validity, then, is the first consideration in the selection of an appropriate procedure. Whether the specific
objective deals with the cognitive, affective or psychomotor domain, the method selected should help
manifest the relevant behaviour at the appropriate level of complexity. The pupils must be given an
opportunity to display the ability or skill identified by the specific objective, so that the teacher can assess
the extent to which it has been mastered.
No method is valid in and by itself. A method may be valid in one situation and for one purpose and
quite invalid in another situation or for another purpose. It is quite likely that, even for one particular
situation, one assessment method will not fulfil all the criteria of a good or desirable assessment. In
such a situation it may be necessary to use more than one method for a course, particularly if
different abilities are to be assessed. For example, the KCPE English paper is divided into two sections:
an essay section to assess the student's ability to write a composition, and an objective item section to
test the student's ability to comprehend and to demonstrate mastery of the rules of grammar. Another important
point to remember is that the validity of the test will also depend on how a procedure is used and how
the test results are interpreted. For example, if an incompetent item writer opts for multiple-choice
items and constructs items that end up testing knowledge of isolated facts (a test of low validity), then
the fault does not lie with the method, but with the incompetent application of the method.
Ideally, teachers should employ the most direct and valid procedure for assessing a particular
behaviour. However, to increase validity the teacher may often use a second method to
supplement the evaluation. For example, a teacher may use objective items to test students' knowledge of
the topics in a given course. This method would allow the teacher to prepare a test that surveys
the subject content adequately, but it would not provide the students with an opportunity to show their
ability to synthesize or to explain their answers. So the teacher may decide to use an essay question
to supplement his evaluation.
ESSAY QUESTIONS
INTRODUCTION
Essay tests can provide good measures of a person's understanding of any designated area of
knowledge. An essay test should be used when:
1. The group to be tested is small and the test will not be reused.
2. The instructor wishes to encourage to the fullest the development of student skill in written
expression.
3. The instructor is more interested in exploring student attitudes than in measuring
achievement.
4. The instructor is more confident of his or her proficiency as a critical reader than as an
imaginative writer of good objective items.
5. Time available for test preparation is shorter than time available for test grading.
OBJECTIVES
At the end of this lecture you should be able to:
1. prepare for writing an essay question;
2. write objectives on which to base essay questions;
3. classify essay questions;
4. set a framework in which to write clear essay questions;
5. identify factors to use in evaluating essay questions;
6. discuss three reasons for using more restricted-response questions than extended-response
questions in essay tests;
7. discuss two methods used in grading essay questions;
8. discuss six factors to consider while scoring essay questions.
Classification of Essay Questions
Essay questions are subdivided into two major types, extended response and restricted response, depending on
the amount of latitude or freedom given to the student to organize his ideas and write his answer.
In the extended-response type of essay question, no bounds are placed on the student as to the
point(s) he will discuss and the type of organization he will use. The extended-response type of essay
question permits the student to demonstrate his ability to call upon factual knowledge, evaluate his
factual knowledge, organize his ideas, and present his ideas in a logical, coherent fashion. It is at the
levels of synthesis and evaluation of writing skills (style, quality) that the extended-response essay
question makes the greatest contribution.
In the restricted-response essay question, the student is more limited in the form and scope of his
answer because he is told specifically the context that the answer is to take. The student is 'aimed at'
the desired response. The restricted-response type of essay is of greater value in measuring learning
outcomes at the comprehension, application and analysis levels, and its use is best reserved for these
purposes.
Construction of Essay Questions
1. Preparation
Before writing an essay question, one should give adequate time and thought to the preparation of the
question. The following questions should be answered adequately:
(a) Is it measuring the intended objectives?
(b) Is the wording simple and clear to the students?
(c) Is it reasonable and can it be answered by the students?
2. Objectives
The questions should be written so that each question will elicit the type of behaviour a teacher wants
to measure. However, care should be taken so that questions asked do not just require students to
demonstrate ability to recall essential knowledge. Such questions will simply call for reproduction of
material presented in the textbook or in class lectures. The questions should be based on novel
situations or problems, not on the same ones used for instructional purposes.
3. Framework
An essay question should establish a framework within which the student operates. Absence of a
framework makes it more difficult for the teacher to grade the response reliably, since he may get a
variety of answers to the same question, depending on how the students interpret it. A framework to
guide the students may be established by:
(a) delimiting the area covered by the question,
(b) using words that themselves give direction (descriptive words), and
(c) giving specific directions to the student on the desired response.
Let us discuss these three points.
In delimiting the area covered by the question, the teacher should make sure the student knows exactly what
he is to do. Lack of specificity in the question often gives rise to problems with the validity of the test.
Descriptive words like "define", "outline", "select", "illustrate", "classify" and "summarize" are
reasonably clear in their meaning. These words make the
framework in which the question is set very clear. However, a word like "discuss" can be ambiguous. If
the word "discuss" is used, there should be specific instructions as to what points should be discussed.
Finally, a student should be given specific directions concerning the desired response. This is because
the purpose of the test is to assess the student's knowledge; one is not measuring
general intelligence. The teacher should write the question so that the student's task is defined as
completely and specifically as possible. The specific factors the teacher wishes the student to consider
and discuss in the answer should be explicitly stated. Of course, the student should be given as much
latitude as possible to demonstrate his synthesis and evaluative skills, but the question should be
phrased carefully and give enough direction that the students fully understand what they are
expected to do. If the task is not clearly evident in the
question itself, then evaluation may be unreliable.
4. Factors in Evaluation
The test constructor should decide in advance the factors to be considered in evaluating essay responses.
The "ground rules" of the test, especially the weighting of the questions and sub-parts of the questions, as
well as information on the general criteria to be used in grading the test responses, should be made
known to the student beforehand, with sufficient time for him to organize and plan his study habits
more effectively. However, with the exception of a composition test, a student should not be marked
down for misspelled words, faulty grammar or poor handwriting. This does not mean that the
teacher should not note and correct these errors.
5. Optional Questions
Optional questions may improve the examinee's chances of receiving a high grade, but they do not
improve the accuracy with which the examinee's competence is assessed. If an essay test has
optional questions, the students are in effect taking somewhat different tests, and their scores lose strict
comparability. However, while not recommended for classroom testing, optional questions may still be
necessary in our national examinations because the examinees are heterogeneous in terms of
experience, facilities and opportunities.
6. More Restricted-response Questions
It is better for a teacher to use a relatively large number of questions requiring short answers
(restricted-response type) rather than just a few questions involving long answers. Use of more
restricted-response essay questions:
(a) provides for broader sampling of content, thereby reducing the error associated with
limited sampling;
(b) tends to discourage bias on the part of teachers who grade for quantity rather than quality;
(c) makes it easier for the teacher to read the answer more rapidly and more reliably because
he has a mental set of what he should be looking for.
7. Length and Complexity
The length of responses and the complexity of the questions and answers should be adapted to the
level of maturity of the students. For example, the depth and breadth of discussion anticipated for
Form One and Form Four students should be markedly different, since the Form
One students may not be able to conceptualize, organize and express their thoughts as effectively as
the Form Four students.
8. Novel Situations
Essay questions should use novel situations. To answer a question discussed in the textbook or in class
requires little more than a good memory. But if the teacher's objective is to test for application of
knowledge, then novel situations provide better material for testing. The teacher may need to cast the
questions in novel situations in order to measure higher levels of learning objectives.
9. Scoring Key
The teacher should prepare a scoring key or the ideal answer in advance. Scoring can be made more
reliable and more objective if the teacher writes out the model answer in advance. This should be done
before the teacher starts grading the papers. It is better for the teacher to write the model (ideal)
answer while writing the question, because it may bring to light ambiguity, unrealistic expectations
or other deficiencies in the question.
Guidelines for Grading Essay Tests
The best basis for objective evaluation of an essay-test answer is the examiner's version of an ideal
answer to the question. The ideal answer can then be used as a criterion for estimating the quality of
the examinees' responses. There are basically two approaches for estimating the quality of
examinees' responses on the basis of the examiner's answer: the Holistic Approach and the Analytic
Approach.
In the Holistic Approach, scores are obtained by sorting the answers into piles of higher, middle and
lower ability. In this approach, a small number of letter marks are often used to express various levels
of achievement. In many cases, a five-letter A-B-C-D-E system is used. A truly outstanding achievement
is rewarded with a mark of A. A mark of B indicates above-average achievement; C is the average mark;
D indicates below-average achievement; and E is used to report failure.
A popular term associated with the above five-letter grading system is "grading on the curve". The curve
refers to the curve of the normal distribution. One method of grading on the curve is to determine
from the ideal normal curve what proportion of the marks should fall at each of the five levels and to follow
these proportions as closely as possible in assigning marks. For example, the best 7 percent might get
A's, the next 23 percent B's, and so on. Grading on the curve is justified for large classes, but if the
class is small it may not be appropriate.
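The procedure of assigning letters by rank can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name and the sample class are hypothetical, and only the 7 percent (A) and 23 percent (B) shares come from the text above. The shares for C, D and E (40, 23 and 7 percent) are an assumption made by symmetry so that the five levels add up to 100 percent.

```python
# Percent of the class at each level, best first. Only the 7 (A) and 23 (B)
# figures come from the handout; C, D and E are assumed by symmetry.
CURVE = [("A", 7), ("B", 23), ("C", 40), ("D", 23), ("E", 7)]

def grade_on_curve(scores, curve=CURVE):
    """Map each student to a letter according to rank in the class."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0
    for letter, percent in curve:
        cumulative += percent
        end = round(cumulative * n / 100)  # rank cut-off for this letter
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    return grades

# Hypothetical class of 100 students with distinct scores 0..99.
scores = {f"student_{i}": i for i in range(100)}
grades = grade_on_curve(scores)
print(grades["student_99"], grades["student_50"], grades["student_0"])  # A C E
```

Two limitations of this sketch are worth noting. Students with tied scores can fall on either side of a cut-off, so a teacher would need a rule for ties. With a very small class the rounded cut-offs can leave a letter with no students at all, which is one way of seeing why grading on the curve suits large classes better.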
One word of caution in using letter grades is that sometimes students and parents do not
understand them. It is therefore important for teachers to explain what the marks they use mean.
For example, instead of just writing a grade B, adding a short explanatory phrase, such as "this is above
average", would make the grade more interpretable.
What about the Analytic Approach?
In this approach, the ideal answer is broken down into specific points or essential elements. The
student's score is then based on the number of essential elements (points) contained in his answer.
In this approach, the scores are often reported as percentages. A student who has learned all that anyone
could learn, i.e. has all the essential points in his response, is given a mark of 100 percent. A student
who does not have any of the essential elements in his answer gets a score of zero. However, as in the
case of letter grades, students and parents may not be able to interpret the percentages. Therefore a
teacher should write a short explanatory phrase to accompany them. A 70 percent score
standing in isolation may have no interpretable meaning unless its meaning is explicitly stated.
Both the Holistic and Analytic approaches can be used by a teacher, but the Holistic approach tends to be
more subjective. The Analytic approach yields more reliable scores because the identified essential
points provide objective evaluation criteria.
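As a small worked illustration of analytic scoring, the sketch below reports the share of essential elements found in a response as a percentage. The function name and the element labels are hypothetical, and every element is given equal weight, which is an assumption: a real marking scheme may weight some points more heavily than others.

```python
def analytic_score(elements_found, ideal_elements):
    """Percent of the ideal answer's essential elements present in a response."""
    ideal = set(ideal_elements)
    if not ideal:
        raise ValueError("the ideal answer must list at least one essential element")
    # Credit only elements that appear in the ideal answer; extras earn nothing.
    credited = ideal & set(elements_found)
    return 100 * len(credited) / len(ideal)

# Hypothetical ideal answer with four essential elements.
ideal = ["definition", "example", "advantage", "limitation"]
response = ["definition", "example", "limitation", "irrelevant point"]
print(analytic_score(response, ideal))  # 75.0
```

Here the response contains three of the four essential elements, so it earns 75 percent; a response with none of them would earn zero.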
Some Scoring Advice
In order to make the scores more reliable a class teacher should:
(a) Check the responses against the ideal answer.
(b) Be consistent in the grading. The first papers should not influence the grading of subsequent
papers. This is a serious problem when Holistic scoring is used.
(c) Randomly shuffle the papers before grading.
(d) Grade only one question at a time for all students.
(e) Grade the responses anonymously.
(f) Judge the mechanics of expression separately. Mechanics of expression include things like
legibility, spelling, punctuation, grammar, etc. The proportion of the total score assigned to
these factors should be spelled out in the grading criteria and the students should be
informed. The penalty imposed on the basis of mechanics of expression should be small, not
more than 5 percent of the total score. If the penalty is big, this lowers the reliability of the scores.
(g) Try to score all responses to a particular question without interruption.
(h) If possible, have two independent readings of the answers and use the average as the final
score. A double reading by two independent readers will make the scores more reliable.
When this is not possible, the same teacher can grade the paper twice, but there should be an
interval of a few days between the two readings. The average score from the two readings
should then be used.
(i) Provide comments on the students' papers while grading them, and correct the errors made
by students. This helps the students learn more.
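The arithmetic behind the double reading and the limit on the mechanics penalty can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the sample marks are hypothetical, and applying the penalty after averaging the two readings is one possible reading of the advice, not a rule given in the handout.

```python
MAX_MECHANICS_SHARE = 0.05  # penalty capped at 5 percent of the total score

def final_score(first_reading, second_reading, total_marks, mechanics_penalty=0.0):
    """Average two independent readings, then subtract a capped mechanics penalty."""
    average = (first_reading + second_reading) / 2
    cap = MAX_MECHANICS_SHARE * total_marks
    # Keep the penalty between zero and the cap.
    penalty = min(max(mechanics_penalty, 0.0), cap)
    return max(average - penalty, 0.0)

# Hypothetical essay marked out of 20: readings of 14 and 16, and a marker
# who wanted to deduct 3 marks for spelling and handwriting.
print(final_score(14, 16, 20, mechanics_penalty=3))  # 14.0
```

In this example the average of the two readings is 15, and the requested deduction of 3 marks is reduced to 1 mark, which is 5 percent of 20.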
SUMMARY
1. Essay questions can be subdivided into two major types, i.e. extended response and restricted response.
2. In the extended-response type of question, no bounds are placed on the students as to the
points to discuss and the kind of organization to use. In the restricted-response type, the
student is more limited in the form and scope of the answer to be supplied because the
context that the answer should take is specified.
3. In writing any essay question, one needs to give adequate time and thought to the preparation
of the question.
4. The questions should be written so that each question will elicit the type of behaviour a teacher
wants to measure. To avoid the dominance of questions requiring recall, a number of
questions should be based on novel situations.
5. Each question should establish a framework within which the students operate. This is best
done by delimiting the area covered by the question, using self-descriptive words and giving
specific directions to the students on the desired response.
6. Factors to consider in evaluating responses should be made explicit. The weighting of the
questions and sub-parts of the questions, as well as information on the general criteria to be
used in grading the test responses, should be communicated to the students.
7. Optional questions are not recommended for classroom testing, but these may be necessary in
our national examinations because the examinees are more heterogeneous in terms of
experience, facilities and opportunities.
8. More restricted-response questions facilitate broader sampling of the content.
9. The length of required responses and the complexity of questions should be adapted to the
level of maturity of the students.
10. It is better for the teacher to write model answers during the writing of questions because this
may bring to light any ambiguity, unrealistic expectations or other deficiencies in the questions.