Part 3 of a 5 Part Series:
A Practitioner's Guide to Conducting Classroom Observations: What the Research Tells Us About Choosing and Using Observational Systems to Assess and Improve Teacher Effectiveness
Megan W. Stuhlman, Bridget K. Hamre, Jason T. Downer, & Robert C. Pianta, University of Virginia
This work was supported by a grant from the WT Grant Foundation.
Choosing the Right Observational Tool: Factors to Consider
There are multiple published and unpublished classroom observation systems available for use, and deciding among them
is the first step in putting an observational system to work in
your organization. The primary advantage of using an existing observation tool is that it saves a great deal of time and
resources that would need to be put into developing an instrument with even minimal levels of reliability and validity for
predicting outcomes of interest.
When reviewing such tools, the following questions can be
used to guide the decision-making processes regarding which
observation system is best suited to the needs of a particular
organization.
skills that teachers bring into the classroom setting, and feedback and support around these behaviors are much more likely
to resonate with teachers and to function as useful levers for
helping them change their practice. It is advantageous for observational tools to provide information on their test-retest
reliability, that is, the extent to which ratings on the tool are consistent across different periods of time (within a day, across days,
across weeks, etc.).
A notable exception to the criterion of stability over time
as a marker of reliability arises when teachers are engaged in
professional development activities or are otherwise making
intentional efforts to shift their practice. In these cases, as
well as in cases where an organization's curriculum is changing
or new program-wide goals are being implemented, a lack of
stability in observations of teacher behaviors may well represent true change in core characteristics and not just random
(undesired) fluctuation over time. It would then be
desirable to collect data on the extent of change and the specific
areas where change is observed.
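The distinction above, between consistency of ratings over time and genuine change in practice, can be illustrated with a small calculation. The following is a minimal sketch, not a procedure taken from any particular observation system: it assumes each teacher was rated on one dimension at two time points, and the scores, the variable names, and the helper functions `pearson_r` and `mean_change` are all hypothetical. A high correlation suggests teachers keep their relative standing, while the mean change shows whether scores shifted overall.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 ratings")
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    cov = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    var_first = sum((a - mean_first) ** 2 for a in first)
    var_second = sum((b - mean_second) ** 2 for b in second)
    if var_first == 0 or var_second == 0:
        raise ValueError("ratings must vary at both time points")
    return cov / sqrt(var_first * var_second)


def mean_change(first, second):
    """Average difference (second minus first) across teachers."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(b - a for a, b in zip(first, second)) / len(first)


# Hypothetical ratings for five teachers on one dimension (illustrative only).
fall_scores = [3, 5, 4, 6, 2]
spring_scores = [4, 5, 5, 7, 3]

stability = pearson_r(fall_scores, spring_scores)  # consistency of rank order
shift = mean_change(fall_scores, spring_scores)    # overall movement in scores

print(f"test-retest correlation: {stability:.2f}")
print(f"mean change: {shift:+.2f}")
```

In this made-up data the rank order of teachers is largely preserved while most scores rise, which is the pattern one might expect when professional development is producing real change rather than random fluctuation.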
With regard to stability across observers, results of
observations are useful at scale only if training protocols and
scoring directions are clear and extensive
enough to produce an acceptable level of agreement across
observers. If there is very low agreement between two or
more observers' ratings of the same observation period, it is
impossible to know the degree to which the ratings represent the teacher's behavior
rather than the observers' subjective interpretations of that
behavior or their personal preferences.
Conversely, if two independent observers can consistently
assign the same ratings to the same patterns of observed
behaviors, this suggests that the ratings represent
attributes of the teacher as defined by the scoring system, as
opposed to attributes of the observer. Therefore, users may
wish to select systems for which there is documented consensus among trained raters on whether, or to what extent,
teachers are engaging in the behaviors under consideration.
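Agreement between observers, as discussed above, can be quantified in several ways. The sketch below is one illustration under stated assumptions: two observers rated the same eight observation periods on a numeric scale, and all scores and function names (`agreement_rate`, `cohens_kappa`) are hypothetical. It reports exact agreement, agreement within one scale point, and Cohen's kappa, a common statistic that corrects exact agreement for chance. Which index and which threshold count as "acceptable" depends on the instrument; the text does not prescribe one.

```python
from collections import Counter


def agreement_rate(rater_a, rater_b, tolerance=0):
    """Share of observations where the two ratings differ by at most `tolerance`."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    hits = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return hits / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: exact agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = agreement_rate(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score by accident.
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single score")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same eight observation periods (illustrative only).
observer_1 = [4, 5, 3, 6, 4, 2, 5, 4]
observer_2 = [4, 5, 4, 6, 3, 2, 5, 4]

exact = agreement_rate(observer_1, observer_2)
within_one = agreement_rate(observer_1, observer_2, tolerance=1)
kappa = cohens_kappa(observer_1, observer_2)

print(f"exact agreement: {exact:.2f}")
print(f"agreement within one point: {within_one:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

Reporting more than one index is a deliberate choice here: exact agreement can look low on a long scale even when observers differ by only a point, while within-one agreement can look high even when observers rarely match exactly.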
validated for your purposes, but this is truly essential for making
observational methodology a useful part of teacher evaluation
and support programs. If the teacher behaviors that are
evaluated in an observation are known to be linked with desired
student outcomes, teachers will be more willing to reflect on
these behaviors and buy in to observation-based feedback;
teacher educators and school personnel can feel confident
establishing observation-based standards and mechanisms
for meeting those standards; and educational systems, teachers,
and students will all benefit.
CASE STUDY # 1:
Choosing an Observation Tool for a Specific Curriculum
The Fairmont school district is considering mandating the
use of a new mathematics curriculum in all of its schools.
A small number of teachers who are pilot testing the new
curriculum have been trained on this approach to teaching mathematics and have been provided with all needed
materials. The district now wants to compare the extent
to which teachers using this curriculum incorporate
high-quality strategies for teaching mathematics with the extent to which teachers in a control group
of schools do so, in order to help decide whether this
curriculum may be a good choice for district-wide use.
This school district may wish to use an observation protocol focused on research-based definitions and descriptions
of high-quality mathematics instruction or to supplement a
more generalized observational protocol with a content-specific protocol for mathematics instruction.
CASE STUDY # 2:
Choosing a Generalized Observational Tool
The Lakeview school district wishes to conduct an observational assessment of all teachers in order to gain a
better understanding of system-wide areas of strength and
challenge so that they can plan for in-service programming
and create individualized professional development plans
for teachers. Observers will conduct multiple observations
per day, so these observations will occur at different times
of day and during different activities for different teachers.
CASE STUDY # 3:
Choosing an Observational Tool for Merit Pay and Tenure
Franklin County school district wants to outline a structure
for merit pay and tenure decisions that includes quality of
observed teaching behaviors as one of its components.
Therefore, the county decides to select an assessment
instrument that has shown a relationship to student outcomes at different levels of quality; in other words, one
with research support demonstrating that incremental
gains in the quality of the measured teaching practices result in incremental gains in student performance.
They then stipulate two options for sufficient practice in this
component: 1) teachers demonstrate high-quality teaching
practices in initial and follow-up assessments, or 2) teachers
demonstrate improvement over time in quality of teaching
practices/positive response to professional development
support as indicated by increasing scores over time.
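The two options stipulated above amount to a simple decision rule, which can be sketched as follows. This is an illustration only: the function name `meets_practice_component`, the `high_quality_cutoff` parameter, and the example scores are hypothetical, and the case study does not say what score counts as high quality or exactly how "increasing scores over time" is judged. Here improvement is assumed to mean that each assessment scores higher than the one before it.

```python
def meets_practice_component(scores, high_quality_cutoff):
    """Return True if a teacher satisfies either option from the case study.

    scores: observation scores in chronological order (initial, then follow-ups).
    high_quality_cutoff: hypothetical minimum score treated as high quality.
    """
    if len(scores) < 2:
        raise ValueError("need an initial and at least one follow-up assessment")
    # Option 1: high-quality practice in the initial and all follow-up assessments.
    consistently_high = all(score >= high_quality_cutoff for score in scores)
    # Option 2 (assumed reading): every assessment improves on the previous one.
    improving = all(later > earlier for earlier, later in zip(scores, scores[1:]))
    return consistently_high or improving


# Hypothetical teachers rated on a 7-point scale with an assumed cutoff of 5.
print(meets_practice_component([6, 6], high_quality_cutoff=5))     # option 1
print(meets_practice_component([3, 4, 5], high_quality_cutoff=5))  # option 2
print(meets_practice_component([4, 3], high_quality_cutoff=5))     # neither
```

A district adopting such a rule would need to decide how to treat small dips or plateaus between assessments, since a strict "always increasing" reading may penalize ordinary measurement noise.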
Is the instrument standardized in terms of administration procedures?
Does it offer clear directions for conducting observations and assigning scores?
Once you have clarified your purpose and goals in conducting
classroom observations, it is important to select an observation system that provides clear instructions for use, both in
terms of how to set up and conduct observations and how to
assign scores. This is an essential component of a useful observation system: without standardized directions to follow, different people are likely to use different methods, which severely
limits the potential for agreement between observers when
making ratings, and thus hampers system-wide applicability.
There are three main components of standardization that users may consider evaluating in an observation instrument:
1. training protocol;
2. observation protocol;
3. scoring directions.
Training Protocol. With regard to the training protocol, are there specific directions for learning to use the instrument? Is there a comprehensive training manual or
user's guide? Are there videos or transcripts with gold-standard scores available that allow for scoring practice? Are
there other procedures in place that allow for reliability
checks, such as having all or a portion of observers rate the
same classroom (live, via video, or via transcript) to ensure
CASE STUDY # 4:
Importance of Observational Protocols
A teacher preparation program is looking for a way to assess students' performance at the beginning and end of
their student teaching work, during which time they are also
taking a course on effective teaching practice. They find
Observational Protocol A, which has six clearly defined,
theoretically based, 10-point scales that observers use to
rate teacher practice. Several members of the faculty read
the definitions of the six scales and agree that the teaching
behaviors the scales assess are aligned with the course
objectives, as well as the broader goals of the program, and
therefore would be good targets for assessment. However,
the system does not include training or observational protocols or explicit directions for scoring. As a consequence,
it is used quite differently by two faculty members.
When Professor Jones makes observations, he arranges the observation time in advance with the teachers.
He arrives at the appointed time, but does not begin the
observation until he can tell that the teacher is ready to
begin the lesson. He ends the observation as the teacher
ends the lesson. He takes detailed notes about the teacher's practice along the six dimensions. When scoring, he
reasons that if he sees teachers engaging in the behaviors
under consideration several times, they should get full
The University of Virginia Center for Advanced Study of Teaching and Learning (CASTL) focuses on the quality of teaching and students' learning. CASTL's aim is to improve
educational outcomes through the empirical study of teaching, teacher quality, and classroom experience from preschool through high school, with particular emphasis on
the challenges posed by poverty, social or cultural isolation, or lack of community resources.