Educational Measurement & Evaluation
Prepared by:
Muhammad Naseer Khan
Subject Specialist English
E&SE Dept. Govt of AJ&K
WhatsApp: +923229310761
Part I
1. Item analysis focuses on finding out:
A. Facility index
B. Discrimination Power
C. Effectiveness of distractors
D. All of the above
Answer is: D
Item analysis is the act of analyzing student responses to individual exam
questions with the intention of evaluating exam quality. It is an important
tool to uphold test effectiveness and fairness. Item analysis is likely
something educators do both consciously and unconsciously on a regular
basis.
The facility index of a test item is the percentage of a group of testees
that chooses the correct response. What is it in a particular item which
determines whether it will be easy or difficult?
Discriminatory power measures the degree to which a test score varies
with the level of the measured trait, and thus reflects the effectiveness of
a test in detecting differences between participants on the respective
traits, or in discriminating between high achievers and low achievers.
An item distractor, also known as a foil or a trap, is an incorrect option for
a selected-response item on an assessment.
Distractor effectiveness: when distractors are obviously incorrect rather
than plausibly disguised, they become ineffective at assessing student
knowledge. An effective distractor attracts more test takers with a lower
overall score than test takers with a higher overall score.
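As a rough sketch of the indices above, the snippet below computes a facility index and tallies how often each option was chosen; the item key, option labels, and responses are invented for illustration.

```python
# Hypothetical data: 10 testees answering one multiple-choice item keyed "C".

def facility_index(responses, key):
    """Percentage of the group choosing the correct response."""
    return 100.0 * sum(1 for r in responses if r == key) / len(responses)

def distractor_counts(responses, options):
    """How often each option (key and distractors) was chosen."""
    return {opt: responses.count(opt) for opt in options}

responses = ["A", "C", "C", "B", "C", "D", "C", "C", "A", "C"]

print(facility_index(responses, "C"))         # 60.0 -> inside the 30-70 % band
print(distractor_counts(responses, "ABCD"))   # an option chosen by nobody is a weak distractor
```

An option that attracts no testees at all contributes nothing to the item and is a candidate for revision.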
2. Facility index of an item determines?
A. Ease or difficulty
B. Discrimination power
C. Objectivity
D. Reliability
Answer is = A
Answer is = B
4. A test item is acceptable when its facility index/difficulty level ranges
from?
A. 30-70 %
B. 70 %
C. 30%
D. None
Answer is =A
5. A test item is very easy when the value of its facility index/difficulty
level is higher than?
A. 30-70 %
B. 70 %
C. 30%
D. None
Answer is =B
6. A test item is very difficult when the value of its facility index/difficulty
level is less than?
A. 30-70 %
B. 70 %
C. 30%
D. None
Answer is =C
Answer is = A
Understanding Item discrimination
Item discrimination is the difference between the percentage correct for
these two groups. The maximum item discrimination difference
is 100 percent. This would occur if all those in the upper group answered
correctly and all those in the lower group answered incorrectly.
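The definition above can be sketched directly: subtract the lower group's proportion correct from the upper group's. The group sizes and counts below are made up.

```python
# A minimal sketch of the item discrimination index (upper-minus-lower method).

def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = p_upper - p_lower; ranges from -1 to +1."""
    return upper_correct / upper_n - lower_correct / lower_n

# All of the upper group right and all of the lower group wrong gives the
# maximum discrimination of 1 (a 100 % difference).
print(discrimination_index(20, 20, 0, 20))             # 1.0
# 15/20 upper vs 9/20 lower gives D = 0.30, the usual acceptance threshold.
print(round(discrimination_index(15, 20, 9, 20), 2))   # 0.3
```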
8. Test item discriminates 100% when its value for discrimination is?
A. 0.30 – 1
B. 1
C. 0.30
D. None
Answer is = B
9. A test item cannot discriminate between low achievers and high
achievers when its discrimination value is lower than?
A. 0.30 – 1
B. 1
C. 0.30
D. None
Answer is = C
10. The quality of test that measures “what it claims to measure” is?
A. Validity
B. Differentiability
C. Objectivity
D. Reliability
Answer is = A
Objectivity:
Objectivity is an important characteristic of a good test. It affects both
validity and reliability of test scores. Objectivity of a measuring
instrument means the degree to which different persons scoring the
answer script arrive at the same result. C.V. Good (1973) defines
objectivity in testing as “the extent to which the instrument is free from
personal error (personal bias), that is, subjectivity on the part of the
scorer”.
Reliability:
The dictionary meaning of reliability is consistency, dependability or trustworthiness.
So in measurement reliability is the consistency with which a test yields
the same result in measuring whatever it does measure. A test score is
called reliable when we have reason to believe that the score is stable
and trustworthy.
a. practice effects.
b. alternate forms.
c. order effects.
d. parallel forms.
Answer is: A
There are four main types of reliability. Each can be estimated by
comparing different sets of results produced by the same method.
Reliability
Test-retest reliability
Test-retest reliability measures the consistency of results when you
repeat the same test on the same sample at a different point in time. You
use it when you are measuring something that you expect to stay
constant in your sample.
A test of colour blindness for trainee pilot applicants should have high
test-retest reliability, because colour blindness is a trait that does not
change over time.
Interrater reliability
Interrater reliability (also called interobserver reliability) measures the
degree of agreement between different people observing or assessing
the same thing. You use it when data is collected by researchers assigning
ratings, scores or categories to one or more variables.
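A small sketch of inter-rater agreement for two raters scoring the same set of items: raw percent agreement alongside Cohen's kappa, which discounts the agreement expected by chance. The ratings are invented.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in c1) / n ** 2   # agreement expected by chance
    return (po - pe) / (1 - pe)

rater1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

print(round(percent_agreement(rater1, rater2), 2))   # 0.83
print(round(cohens_kappa(rater1, rater2), 2))        # 0.67
```

Kappa is lower than raw agreement because some matching ratings would occur even if both raters guessed.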
Parallel forms reliability
Parallel forms reliability measures the correlation between two
equivalent versions of a test. You use it when you have two different
assessment tools or sets of questions designed to measure the same
thing.
Internal consistency
Internal consistency assesses the correlation between multiple items in a
test that are intended to measure the same construct.
Split-half reliability: You randomly split a set of measures into two sets.
After testing the entire set on the respondents, you calculate the
correlation between the two sets of responses.
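The split-half procedure just described can be sketched as: split the items into odd and even halves, correlate the half scores (Pearson r), then apply the Spearman-Brown correction to estimate full-length reliability. The 0/1 item scores below are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one row of 0/1 item scores per examinee."""
    odd = [sum(row[0::2]) for row in item_scores]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)                 # Spearman-Brown correction

scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 2))      # 0.94
```

The Spearman-Brown step is needed because the raw correlation reflects the reliability of only half a test, and shorter tests are less reliable.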
12. When a test developer gives the same test to the same group of test
takers on two different occasions, he/she can measure
a. internal consistency.
b. test-retest reliability.
c. split-half reliability.
d. validity.
Answer is: B
13. As a rule, adding more questions to a test that measures the same
trait or attribute _________ the test’s reliability.
a. can decrease
b. can increase
c. does not affect
d. lowers
Answer is: B
14. The greatest danger when using alternate/parallel forms is that the:
Answer is: C
Equivalent (parallel) forms Two or more forms of a test covering the
same content whose item difficulty levels are similar.
15. What is the amount of consistency among scorers’ judgements
called?
A. Internal reliability.
B. Interrater reliability.
C. Test-retest reliability.
D. Intrascorer reliability.
Answer is: B
Intra-rater reliability is the degree of agreement among repeated
administrations of a diagnostic test performed by a single rater.
16. Which of the following is NOT one of the four types of reliability?
A. Inter-rater
B. Test-retest
C. Predictive
D. Parallel forms
Answer is: C
17. In a study of children’s social behavior, what type of reliability would be
considered important?
A. Inter-rater
B. Test-retest
C. Predictive
D. Parallel forms
Answer is: A
18. Measurement reliability refers to the:
A. consistency of the scores.
B. dependency of the scores.
C. comprehensiveness of the scores.
D. accuracy of the scores.
Answer is: A
19. If a measure is consistent over multiple occasions, it has:
A. Inter-rater reliability
B. Test-retest reliability
C. Predictive validity
D. Parallel forms reliability
Answer is: B
20. The validity of a measure refers to the:
A. consistency of the measurement.
B. accuracy with which it measures the construct.
C. particular type of construct specification.
D. comprehensiveness with which it measures the construct.
Answer is: B
The four types of validity
Validity tells you how accurately a method measures something. If a
method measures what it claims to measure, and the results closely
correspond to real-world values, then it can be considered valid. There
are four main types of validity:
Construct validity: Does the test measure the concept that it’s
intended to measure?
Content validity: Is the test fully representative of what it aims to
measure?
Face validity: Does the content of the test appear to be suitable to
its aims?
Criterion validity: Do the results correspond to a different test of the
same thing?
Answer is: A
External validity is the validity of applying the conclusions of a scientific
study outside the context of that study. In other words, it is the extent to
which the results of a study can be generalized to and across other
situations, people, stimuli, and times.
Internal validity is the extent to which a piece of evidence supports a
claim about cause and effect, within the context of a particular study. It is
one of the most important properties of scientific studies, and is an
important concept in reasoning about evidence more generally.
21. Alternate-form reliability is also known as:
A. Split-half reliability
B. Test-retest reliability
C. Parallel forms reliability
D. Convergent reliability
Answer is: C
Answer is = B
23. If the scoring of the test is not affected by any factor, the quality of
the test is called?
A. Validity
B. Differentiability
C. Objectivity
D. Reliability
Answer is = C
24. The quality of a test to give the same scores when administered on
different occasions is?
A. Validity
B. Differentiability
C. Objectivity
D. Reliability
Answer is = D
25. If the sample of questions in the test is sufficiently large, the quality
of the test is?
A. Adequacy
B. Differentiability
C. Objectivity
D. Reliability
Answer is = A
Part II
Answer is = A
2. The split-half method is used as a test of
A. Stability
B. Internal reliability
C. Inter-observer consistency
D. External validity
Answer: B
Answer is: A
Answer is: B
Answer is: A
Answer is = B
Test:
A test or quiz is used to examine someone's knowledge of something to
determine what he or she knows or has learned. Testing measures the
level of skill or knowledge that has been reached.
Measurement:
Educational measurement is the science and practice of obtaining
information about characteristics of students, such as their knowledge,
skills, abilities, and interests. Measurement is the process of assigning
numbers to events based on an established set of rules.
1. Diagnostic Testing
This testing is used to “diagnose” what a student knows and does not
know. Diagnostic testing typically happens at the start of a new phase of
education, like when students will start learning a new unit. The test covers
topics students will be taught in the upcoming lessons.
Teachers use diagnostic testing information to guide what and how they
teach. For example, they will plan to spend more time on the skills that
students struggled with most on the diagnostic test. If students did
particularly well on a given section, on the other hand, they may cover that
content more quickly in class. Students are not expected to have mastered
all the information in a diagnostic test.
Diagnostic testing can be a helpful tool for parents. The feedback children
receive on these tests lets parents know what kind of content will be
covered in class and helps them anticipate which skills or areas their
children may have trouble with.
2. Formative Assessment
3. Summative Assessment
Summative assessment is aimed at assessing the extent to which the
most important outcomes at the end of the instruction have been
reached. But it measures more: the effectiveness of learning, reactions
to the instruction, and the benefits on a long-term basis. The long-term
benefits can be determined by following students who attend your
course, or test. You are able to see whether and how they use the
learned knowledge, skills and attitudes.
4. Benchmark Testing
This testing is used to check whether students have mastered a unit of
content. Benchmark testing is given during or after classroom work on a
section of material, and covers either part or all of the content that has
been taught up to that time. The assessments are designed to let teachers
know whether students have understood the material that’s been covered.
6. Norm-referenced assessment
This compares a student’s performance against an average norm. This
could be the average national norm for the subject History, for example.
Another example is when the teacher compares the average grade of his
or her students against the average grade of the entire school.
7. Ipsative assessment
8. Self Assessment
Self-assessment is defined as 'the involvement of learners in making
judgements about their achievements and the outcomes of
their learning' and is a valuable approach to supporting student learning.
9. Confirmative assessment
When your instruction has been implemented in your classroom, it’s still
necessary to take assessment. Your goal with confirmative assessments is
to find out if the instruction is still a success after a year, for example, and
if the way you're teaching is still on point. You could say that a
confirmative assessment is an extensive form of a summative
assessment.
Exams
Portfolios
Final projects
Standardized tests
Summative assessments
Norm-referenced assessments
Criterion-referenced assessments
11. Assessment for learning
Answer is = C
Answer is = B
Answer is = B
Answer is = A
Answer is = B
Answer is = A
Answer is = A
Answer is = D
Answer is = C
Answer is = D
Answer is = A
Answer is = D
Answer is = B
Answer is = C
21. Objective type questions have an advantage over essay type questions
because they:
A. Are easy to prepare
B. Are easy to solve
C. Are easy to mark
D. None
Answer is = C
22. In multiple choice items the stem of the items should be?
A. Large
B. Small
C. Meaningful
D. None
Answer is = C
Answer is = A
An anecdotal record is a detailed descriptive narrative recorded after a
specific behavior or interaction occurs. Anecdotal records inform
teachers as they plan learning experiences, provide information to
families, and give insights into identifying possible developmental delays.
Answer is = B
1. The median of 7, 6, 4, 8, 2, 5, 11 is
A. 6
B. 12
C. 11
D. 4
Answer A
Median is a statistical measure that determines the middle value of a
dataset listed in ascending order (i.e., from smallest to largest value).
How to Find the Median?
The median can be easily found. In some cases, it does not require any
calculations at all. The general steps of finding the median include:
1. Arrange the data in ascending order (from the lowest to the largest
value).
2. Determine whether there is an even or an odd number of values in
the dataset.
3. If the dataset contains an odd number of values, the median is the
central value that splits the dataset into halves.
4. If the dataset contains an even number of values, find the two
central values that split the dataset into halves, then calculate the
mean of those two values. That mean is the median of the dataset.
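The steps above can be sketched in code; the first call uses the dataset from question 1.

```python
def median(values):
    """Middle value of a dataset, after sorting in ascending order."""
    s = sorted(values)                    # step 1: arrange in ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                        # odd count: one central value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2      # even count: mean of the two central values

print(median([7, 6, 4, 8, 2, 5, 11]))    # 6
print(median([7, 6, 4, 8, 2, 5]))        # 5.5
```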
Answer is: C
Answer is: D
4. The average of all observations in a set of data is known as
A. median
B. range
C. mean
D. mode
Answer is: C
Answer is = A
Answer is = D
Answer is = D
Answer is = D
Answer is = A
Answer is = D
Answer is = A
Answer is = A
Kuder-Richardson Formula 20, or KR-20, is a measure of reliability for a test
with binary variables (i.e. answers that are right or wrong). Reliability
refers to how consistent the results from the test are.
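A minimal KR-20 sketch on invented 0/1 scores: KR-20 = k/(k-1) x (1 - sum(p*q)/variance), where p is each item's proportion correct, q = 1 - p, and the variance is the population variance of total scores.

```python
def kr20(item_scores):
    """KR-20 reliability for right/wrong (0/1) item data."""
    k = len(item_scores[0])                             # number of items
    n = len(item_scores)                                # number of examinees
    totals = [sum(row) for row in item_scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n      # proportion correct on item j
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / var_t)

scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))   # 0.8
```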
Answer is = A
A table of specification is a chart that gives a graphic representation of
the content of a course or curriculum and its educational
outcomes/objectives, mapped to the levels of Bloom's taxonomy with
their weightage. Methods of instruction and the assessment plan are
added keeping in mind the content, learning outcomes, weightage, and
time spent on instruction.
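As an illustration of how such a chart might be built, the sketch below crosses hypothetical content-area weightages (time spent) with Bloom-level weightages to allocate 40 items; all names and numbers are invented.

```python
# Hypothetical weightages; each group of weights sums to 1.0.
content = {"Measurement": 0.30, "Reliability": 0.40, "Validity": 0.30}
bloom = {"Knowledge": 0.50, "Comprehension": 0.30, "Application": 0.20}
total_items = 40

# Each cell of the table gets its proportional share of the total item count.
table = {
    area: {level: round(total_items * wa * wb) for level, wb in bloom.items()}
    for area, wa in content.items()
}

for area, row in table.items():
    print(area, row)
```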
14. A “table of specification” helps in?
A. Test development
B. Test Construction
C. Test Administration
D. Test Scoring
Answer is = A
Answer is = D
Answer is = D
Answer is = B
18. The item in the column for which a match is sought is?
A. Premise
B. Response
C. Distractor
D. None
Answer is = A
Answer is = B
Answer is = C
Answer is = A
22. The incorrect options in M.C.Q are?
A. Answer
B. Premise
C. Response
D. Distractor
Answer is = D
23. The type of essay item in which contents are limited is?
A. Restricted Response Questions
B. Extended Response Questions
C. Matching items
D. M.C.Q items
Answer is = A
Answer is = B
Answer is = A
26. Which one is not a type of test by purpose?
A. Standardized Test
B. Essay Type Test
C. Criterion Referenced Test
D. Norm referenced test
Answer is = B
Answer is = C
Answer is =A
Answer is: A
30. What is an interview called when there is more than one interviewee?
A. Group interview
B. Panel interview
C. Structural interview
D. Focused interview
Answer is: A
31. The planned interview is:
A. Group interview
B. Panel interview
C. Structural or structured interview
D. Focused interview
Answer is: C
Answer is: D
33. Which type of test tends to have lower reliability?
A. True false
B. Completion
C. Matching
D. Essay
Answer is: D
34. Most of the tests used in our schools are:
A. Intelligence tests
B. Achievement tests
C. Aptitude tests
D. Personality tests
Answer is: B
Answer is: D
Part IV
Answer is: A
Answer is: B
Answer is: A
5. The most comprehensive term used in the process of educational
testing is called:
A. Test
B. Interview
C. Measurement
D. Evaluation
Answer is: D
6. Process of quantifying given traits, achievement or performance of
someone is called:
A. Test
B. Measurement
C. Assessment
D. Evaluation
Answer is: B
7. A collection of procedures used to collect information about students’
learning progress is called:
A. Measurement
B. Assessment
C. Evaluation
D. All of the above
Answer is: B
8. The process of collection, synthesis, and interpretation of information
to aid the teacher in decision making is called:
A. Test
B. Measurement
C. Assessment
D. Evaluation
Answer is: D
9. Test items in which examinees are required to select one out of two
options in response to a statement are called:
A. Multiple choices
B. Matching items
C. Alternate response items
D. Restricted response items
Answer is: C
ALTERNATIVE-RESPONSE TEST (true/false test) - Consists of declarative
statements that the student is asked to mark true or false, right or wrong,
correct or incorrect, yes or no, fact or opinion, agree or disagree or the
like.
10. A brief written response is required in:
A. Short answer type items
B. Restricted response items
C. Extended response items
D. Completion type items
Answer is: A
11. Topics of limited scope are assessed by:
A. Short answer type items
B. Restricted response items
C. Extended response items
D. Completion type items
Answer is: B
12. Tests developed by a team of experts are termed as:
A. Teacher made tests
B. Standardized tests
C. Board tests
D. Published tests
Answer is: B
13. A standardized achievement test has definite unique features,
including:
A. A fixed set of items
B. Specific directions for administration and scoring the test
C. Answer keys
D. All of the above
Answer is: D
14. High technical quality is assured in:
A. Teacher made tests
B. Standardized tests
C. Achievement tests
D. Published tests
Answer is: B
15. The test designed to measure the number of items an individual can
attempt correctly in a given time is referred to as which type of test?
A. Power
B. Supply
C. Achievement
D. Speed
Answer is: D
16. An aptitude test measures:
A. Overall mental ability
B. Attained ability
C. Present attainment
D. Potential Ability
Answer is: D
Potential Ability: the maximum level to which a person's current ability
can ever rise, and therefore how good they can possibly become.
17. Quality testing in education is only possible by using:
A. Achievement test
B. Intelligence test
C. Aptitude test
D. Standardized achievement test
Answer is: D
18. A test designed to know the students’ position in a group is called:
A. Criterion referenced
B. Norm referenced
C. Achievement
D. Aptitude
Answer is: B
19. The score of a student in a paper is:
A. Test
B. Measurement
C. Evaluation
D. All
Answer is: B
20. A test answers the question:
A. How much
B. How many
C. How well
D. All of the above
Answer is: C
21. Measurement answers the question:
A. How much
B. How many
C. How well
D. All of the above
Answer is: A
22. Which of the following is not a formal assessment?
A. Assignment
B. Paper
C. Quiz
D. Discussion
Answer is: D
23. Which of the following is not an informal assessment?
A. Assignment
B. Observation
C. Discussion
D. All of the above
Answer is: A
24. Prerequisite skills needed by students to succeed in a unit or course
are evaluated by:
A. Placement assessment
B. Formative assessment
C. Diagnostic evaluation
D. Summative evaluation
Answer is: C
25. Grades in assessment are:
A. Provide data for parents on their children’s progress
B. Certify promotional status and graduation
C. Serve as an incentive to do school lesson
D. All of these
Answer is: D
26. Your principal has asked that you create a chart showing that all of
the quiz items are related to the Sunshine State Standards. What type of
validity is he investigating?
A. Content
B. Construct
C. Criterion
D. All of the above
Answer is: A
Construct validity is "the degree to which a test measures what it claims,
or purports, to be measuring."
Content validity assesses whether a test is representative of all aspects of
the construct. To produce valid results, the content of a test, survey or
measurement method must cover all relevant parts of the subject it aims
to measure.
Criterion validity is an estimate of the extent to which a measure agrees
with a gold standard (i.e., an external criterion of the phenomenon being
measured). The major problem in criterion validity testing, for
questionnaire-based measures, is the general lack of gold standards.
27. A teacher created two forms of the final exam so that students sitting
next to each other could not look at their neighbor's test. What sort of
reliability evidence might she gather to make sure they are equal
assessments?
A. Alternate form reliability
B. Internal consistency
C. Interrater reliability
D. Test re test reliability
Answer is: A
28. Ms. Smith asked Mr. Jones to review the questions on her social
studies quiz. What type of measure was she worried about?
A. Alternate form reliability
B. Internal consistency
C. Construct Reliability
D. Content validity
Answer is: D
29. What is the primary purpose of assessments?
A. Inform parents of student progress
B. Provide feedback to help students succeed
C. Allow schools to compare progress
D. Enable teachers to test strategies
Answer is: B
30. Are all assessments tests?
A. Yes
B. No
Answer is: B
31. Which type of assessment has students conducting research and
experiments in the field?
A. Authentic assessment
B. Summative assessment
C. Formative assessment
D. Performance Assessment
Answer is: A
Authentic assessment is the idea of using creative learning experiences to
test students' skills and knowledge in realistic situations. Authentic
assessment measures students' success in a way that's relevant to the
skills required of them once they've finished your course or degree
program.
32. Which assessment requires students to demonstrate their knowledge
through performing specific tasks?
A. Authentic assessment
B. Summative assessment
C. Formative assessment
D. Performance Assessment
Answer is: D
33. Which assessment is it when you apply what you have just learned?
A. Authentic assessment
B. Summative assessment
C. Formative assessment
D. Performance Assessment
Answer is: A
34. A process to identify students’ learning styles and learning difficulties
in order to enhance their learning is called
A. Assessment
B. Evaluation
C. Measurement
D. Test
Answer is: A
35. Use of unfamiliar vocabulary or sophisticated terms is a good tip for
making a good test question.
A. True
B. False
C. Can’t say
D. None of the above
Answer is: B
36. Predict how well a student is likely to do in a certain school subject
A. Diagnostic tests
B. Prognostic tests
C. Norm referenced tests
D. Criterion referenced tests
Answer is: B
Part V
Answer is: D
Verbal reasoning is the ability to understand and logically work through
concepts and problems expressed in words. Verbal reasoning tests tell
employers how well a candidate can extract and work with meaning,
information and implications from text.
2. The tests which use pictures or symbols are termed as:
A. Performance tests
B. Ability tests
C. Non verbal tests
D. Verbal tests
Answer is: C
Non-verbal tests are also called diagrammatic or abstract reasoning tests.
Non-verbal reasoning involves the ability to understand and analyze visual
information and to solve problems using visual reasoning.
Which of the following are projective techniques or personality
assessment techniques?
A. Indirect open ended questioning
B. Semi structured interview
C. Both of the above
D. None of the above
Answer is: C
Which of the following are the tools of psychological assessment?
A. Portfolio
B. Case history
C. Behavioral observations
D. All of the above
Answer is: D
Overt and Covert behaviour tests are included in the category of:
A. Aptitude tests
B. Achievement tests
C. School Tests
D. Psychological tests
Answer is: D
Overt and Covert Behaviour:
Psychologists often classify behaviors into two categories: overt and
covert. Overt behaviors are those which are directly observable, such as
talking, running, scratching or blinking. Covert behaviors are those which
go on inside the skin. They include such private events as thinking and
imagining.
Following is the type of a Personality Test:
A. Structured
B. Projective
C. Measured
D. Both A and B
Answer is: D
Answer is: A
Answer is: D
The degree to which test items correlate with each other is called:
A. Parallel or alternate form reliability
B. Inter-scorer consistency
C. Split Half method
D. Inter-item Consistency
Answer is: D
Answer is: A
Answer is: A
The greater the number of reliable test items, the higher the reliability
will be:
A. True
B. False
Answer is: A
Answer is: D
Items for which equally able persons from different cultural groups have
different probabilities of success is called:
A. Item difficulty
B. Item differentiability
C. Item validity
D. Item Bias
Answer is: D
Binet Scales and Wechsler Scales are the type of test tools for the
measurement of:
A. Intelligence
B. Performance
C. Skills
D. Knowledge
Answer is: A
The Stanford–Binet Intelligence Scale is now in its fifth edition (SB5) and
was released in 2003. It is a cognitive ability and intelligence test that is
used to diagnose developmental or intellectual deficiencies in young
children. The test originated in France and was later revised in the United
States.
The Binet Scales
Alfred Binet was directed by the French government to develop a test for
identifying schoolchildren with intellectual disabilities who needed special
instruction; the resulting 1905 scale is considered the first intelligence test.
Increase
Decrease
Both A&B
None of these
Answer is: A
Answer is: A
A. Mean
B. Mode
C. Range
D. Quartiles
Answer is: C
A. Test developing
B. Test administration
C. Test scoring
D. Test reporting
Answer is: A
Answer is: D
Answer is: D
A good distractor is one which:
Answer is: B
Answer is: D
A. Learning
B. Effort
C. Achievement
D. Knowledge
Answer is: C
A. Test item
B. Scores
C. Interpretation
D. Performance
Answer is: B
Answer is: C
In selected response items, students choose a response provided by the
teacher or test developer, rather than construct one in their own words
or by their own actions. Selected response items do not require that
students recall information, but only that they recognize the
correct answer.
Answer is: A
Answer is: C
Answer is: B
Answer is: C
A good item must have ___________ distractors.
A. High appealing
B. Low appealing
C. No appealing
D. All of the above
Answer is: A
Which of the following is the quality of a good stem in an item?