Ms 96 03
EVALUATING
FAMILY PLANNING
PROGRAMS
WITH
ADAPTATIONS FOR
REPRODUCTIVE
HEALTH
Jane T. Bertrand
Robert J. Magnani
Naomi Rutenberg
The EVALUATION Project
September 1996
LIST OF ACRONYMS
TABLE OF CONTENTS
Acknowledgments
List of Acronyms
Chapter
I Overview of Evaluation
Why Evaluate
Objectives of the Manual
Intended Users of the Manual
Family Planning or Reproductive Health
Scope and Focus of Evaluations
Why Another Evaluation Manual
Organization of this Manual
References
Appendix
Regression Formats for Cross–sectional and Panel Multilevel Models
Chapter I
Overview of Evaluation
■ Why Evaluate
■ Objectives of this Manual
■ Intended Users of the Manual
■ Family Planning or Reproductive Health
■ Scope and Focus of Evaluations
■ Why Another Evaluation Manual
■ Organization of this Manual
In short, this manual draws on the extensive evaluation experience from the field of family planning, much of which is applicable to the broader range of reproductive health interventions. It is intended as a companion volume to two other publications of The EVALUATION Project, the Handbook of Indicators for Family Planning Program Evaluation (Bertrand et al., 1994) and Indicators for Reproductive Health Program Evaluation (Bertrand and Tsui, 1995). These two documents provide a menu of indicators for evaluating interventions in the areas of family planning, safe pregnancy, breastfeeding, STD/HIV prevention, women’s nutrition, and adolescent reproductive health services. By contrast, the current manual provides guidelines for designing an evaluation plan (that will incorporate those indicators) for monitoring and evaluating interventions.
SCOPE AND FOCUS OF EVALUATIONS
Evaluations vary greatly in scope and focus. For example, the target area may be defined as:
■ the entire country;
■ an entire region or state; or
■ a specific city or location.
Evaluation can focus on different program components:
■ inputs,
■ processes,
■ outputs, and
■ outcomes.
Measurements can be taken at:
■ the population level (e.g., among a random sample of the general population), or
■ the program level (e.g., among clients or participants in a given program).
Different techniques are used to collect and analyze the data:
■ quantitative, or
■ qualitative.
The specific target population will vary in different settings and for different types of interventions:
■ all women of reproductive age (e.g., family planning);
■ all sexually active adults (e.g., integrated family planning – STD/HIV prevention); or
■ youth aged 10–19 (e.g., adolescent programs).
This manual focuses on the evaluation of programs that are national in scope, although many of the techniques can also be used to evaluate smaller–scale programs or projects. It addresses the key question of most program administrators and donor agency staff: has the program achieved its objectives in terms of change at the population level? It goes an additional step in asking to what extent the observed change is attributable to the program.
Although “results” are of paramount importance, a comprehensive evaluation will also examine the processes involved in carrying out the program. Historically, family planning program evaluation has tended to focus heavily on quantitative outputs at the program level (e.g., number of new acceptors, couple–years of protection [CYP]) or outcomes at the population level (e.g., level of contraceptive prevalence, total fertility rate [TFR]). However, this approach to evaluation treats the program as a “black box.” If the expected results are not obtained, it provides little insight into the reasons. One does not know, for example, what factors contributed to the poor results: inadequate access to service, poor quality of care at service delivery points (SDPs) in the system, lack of information among the target population, stockouts (lack of commodities) in the system? Similarly, if the program is successful, one has little knowledge of what contributed to the success.
In sum, a comprehensive evaluation will examine not only the quantitative outcomes that indicate progress toward program objectives; it will also evaluate the inner workings of the program in terms of functional areas (management, training, commodities and logistics, information, education and communication [IEC], research/evaluation) and service adequacy (access to services and quality of care). This manual focuses on both process and outcomes, with particular emphasis on the latter. Outcomes continue to be the primary concern of many program administrators and donor agencies, especially in an environment of shrinking resources and greater accountability.
1992; Buckner et al., 1995). Why then do we need a new “how–to” document for evaluating FP programs? This document differs from previous evaluation texts in several ways:
Figure I–1
Methodological Approach (How):
■ Study design
■ Indicators
■ Data sources
Implementation Plan (Who, When, with What Funds):
■ Individuals and institutions responsible for different parts of the evaluation
■ Budget
Defining the Scope of Evaluation
Chapter II
measure their contribution to the expected change in population–level outcomes; thus, their objectives are often stated in terms of expected results of activities at the program level.
Evaluators are sometimes faced with a situation where the objectives of the program are not stated in measurable terms. In this case, part of the process of establishing the objectives is to assist in stating the program objectives in terms that lend themselves to evaluation; that is, to “operationalize” the objectives.
DEFINITION: Program
The different types of organized activity common to family planning institutions can be classified as follows. This manual focuses primarily on the first definition, but many of the evaluation principles and tools apply across all three.
National Family Planning Program
Definition: All organized activities designed to promote family planning in the public, private voluntary, and commercial sectors in a given country.
DESCRIBING HOW THE PROGRAM “SHOULD” WORK
Program Components
In its broadest conceptualization, a family planning program can be viewed in terms of four distinct elements: inputs, process (or activities), outputs, and outcomes (Veney and Gorbach, 1993).
■ Program inputs refer to the set of resources (i.e., personnel, facilities, space, equipment and supplies, etc.) that are the raw materials of the program.
components: intermediate outcomes,³ and long–term outcomes.
➤ Intermediate outcomes are the set of results at the population level that are closely and clearly linked to program activities and program level results. The most familiar intermediate outcome of a family planning program is contraceptive use. Changes in intermediate outcomes generally occur within 2–5 years of program inception.
➤ Long–term outcomes refer to the set of results at the population level that are long–term in nature and are produced through the action of intermediate outcomes. Examples of long–term outcomes of family planning programs are changes in fertility and maternal–child health status. Although health and fertility rates can change abruptly in response to external forces, there is generally a considerable time lag (5–10 years) between the inception of the program and the observance of change in these rates.
Inputs (program resources) are fed into processes (program activities), which in turn produce outputs (program results) and ultimately outcomes (changes in population behavior).
Figure II–1
Relationship between Program Components
Input ➔ Process ➔ Output ➔ Outcome
Program based: Input, Process, Output
Population based: Outcome
The first three (inputs, processes, and outputs) relate to activities and results at the program level. Inputs, processes, and outputs are measured with program–based or facility–based data (see Figure II–1). Program–based data come from routine data collection (e.g., service statistics, client and other clinic records, administrative records, commodities shipments, sales) as well as information that is collected on–site where services are delivered (e.g., provider surveys, observation of provider–client interaction, retail audits, mystery clients) or from a follow–up study of clients.
³ Alternatively, these are referred to as “short– to mid–range outcomes.”
DEFINITION: Project
Project (within an institution)
Definition: A specific set or cluster of activities with specific objectives that contribute to the overall objectives of the institution. The different projects of an institution, which may be funded from different sources, collectively constitute its program. A project often has specific attributes in terms of:
■ target population (e.g., adolescents, men);
■ method promoted (e.g., long–acting methods);
■ type of service delivery mechanism used (e.g., community based distribution [CBD], traditional midwives); and
■ constellation of other health and social services provided concurrently (e.g., prenatal).
One may also hear reference to the “population program” of a donor agency. For example, multi–lateral agencies such as the United Nations Population Fund (UNFPA) and IPPF, and bi–lateral development agencies of certain countries (USA, Japan, Germany, England, Sweden, Canada, among others) support a range of activities in the population sector in different countries. A donor agency often provides funding for one or more projects within an institution, and in isolated cases for the entire program of the institution. Although the activities funded by a given donor may differ markedly in nature (e.g., conducting a census, supporting observational travel for key decision–makers, purchasing commodities), the portfolio of population–related activities of a donor agency is often referred to as its “population program.” This manual addresses the evaluation of donor funding to the extent that these funds are channeled into a family planning program or project in a specific developing country.
Programs and projects are closely related. The set of multiple projects of a given institution collectively constitute the institutional program. The various institutional programs collectively make up what we term “the national program.” In turn, these programs are generally funded in part by the government and possibly one or more donor agencies.
Outcomes (intermediate and long–term changes at the population level) are measured with population–based data collected from the catchment area or social group that the program seeks to benefit. This may be a country, a region, or a particular sub–group of the population (e.g., adolescents).
Describing Causal Linkages
Once the program objectives are established, it is important to define how the activities in the program are expected to achieve these objectives. The expected causal sequence is shown in its simplest form in Figure II–1.
The flow diagram (conceptual model) in Figure II–2 indicates the content (factors, activities, results, etc.) that is included under each of the broad categories of input, process, output, and outcome.
Figure II–2 is useful in that it shows the relation between program components and the terms input–process–output–outcome. The diagram in Figure II–2 is appealing because of its simplicity; however, it masks the complexity of real programs. For example, “planning” and “implementation” entail an array of interrelated activities from the different functional areas (management, training, commodities/logistics,
Figure II–2
[Flow diagram not reproduced. Box labels recoverable from the original: Organizational, Implementation, Resources, Functional, Institutionalization, Outputs, Donor Assistance, Service Outputs, Service Utilization, Contraceptive Use, Fertility and Other Impacts.]
Source: Tsui, A.O. and P. D. Gorbach, 1996. Framing Family Planning Program Evaluation: Cause, Logic and Action. The EVALUATION Project, University of North Carolina at Chapel Hill.
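The input–process–output–outcome chain can also be written down as a small data structure, which some evaluators may find helpful when deciding where each measurement belongs. The Python sketch below is illustrative only: the class name, field names, and the `components_measured_at` helper are assumptions introduced here and do not come from the manual. The time lags are the approximate ranges the text cites for intermediate (2–5 years) and long–term (5–10 years) outcomes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """One element of the input-process-output-outcome chain (hypothetical type)."""
    name: str
    level: str  # "program" or "population"
    data_source: str
    lag_years: Optional[Tuple[int, int]] = None  # approximate years before change is seen


# Illustrative encoding of the chain; the structure is an assumption, not the manual's.
FRAMEWORK: List[Component] = [
    Component("inputs", "program", "program-based or facility-based data"),
    Component("processes", "program", "program-based or facility-based data"),
    Component("outputs", "program", "program-based or facility-based data"),
    Component("intermediate outcomes", "population", "population-based data", (2, 5)),
    Component("long-term outcomes", "population", "population-based data", (5, 10)),
]


def components_measured_at(level: str) -> List[str]:
    """Return the names of the components measured at the given level."""
    return [c.name for c in FRAMEWORK if c.level == level]


print(components_measured_at("program"))
print(components_measured_at("population"))
```

Keeping the measurement level and data source next to each component makes it harder to plan, by mistake, a population–level outcome measure from service statistics alone.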
Figure II–3
[Conceptual framework diagram not reproduced. Labels recoverable from the original: Societal and Individual Factors; Value and Demand for Children; FP Demand (Spacing, Limiting); Other Intermediate Variables; Contraceptive Practice; Fertility (Wanted, Unwanted).]
Figure II–4
[Conceptual framework diagram not reproduced. Labels recoverable from the original appear to include: Larger Societal and Political Factors; Political and Administrative Governance; Sectoral Integration; Commodity Acquisition/Distribution; Delivery Strategies; Public–Private (label truncated); Access; Quality; Image/Acceptability; I–E–C; Research and (label truncated).]
expected outcomes relate not only to fertility but also to maternal and child health status and the satisfaction of individual reproductive intentions (as measured by the HARI index⁴).
ESTABLISHING THE OBJECTIVES OF THE EVALUATION
Alternative Types of Evaluation
There are different types of evaluation, each with a different purpose, as outlined in Figure II–5. In designing an evaluation strategy, the evaluator needs to identify the key question(s) that he/she wishes to answer and thus the type of evaluation to conduct.
One of the main objectives of this manual is to clearly differentiate program monitoring and impact assessment. As reflected by the illustrative questions in Figure II–5, program monitoring addresses a number of different questions, one of which is: did change occur? However, without impact assessment, one cannot answer the question: did change occur because of the program?
⁴ HARI is an acronym for “Helping Individuals Achieve Their Reproductive Intentions” (Jain and Bruce, 1994). It measures the extent to which members of the target population achieve their reproductive intentions (e.g., to have another child, to avoid further pregnancy).
Figure II–5
Types of Evaluation
Needs Assessment: What should the program include and how can it best be delivered to meet the needs of the target group?
available to the program in the quantities and at the times specified by the program plan?
Processes
■ Were the scheduled activities carried out as planned?
■ How well were they carried out?
Outputs
■ Did the expected changes occur at the program level, in terms of:
➤ access to services
➤ quality of care
➤ service utilization
Outcomes
■ Did the expected change occur at the population level (not necessarily
Impact Assessment: What and how much change occurred (at the program– or population–level) that is attributable to the program?
The remainder of this chapter compares and contrasts program monitoring versus impact assessment.
Purposes of Program Monitoring
Monitoring⁵ refers to a varied set of evaluation techniques, all of which measure some aspect of program performance. There are two main purposes of program monitoring:
■ to improve programs by identifying those aspects that are working according to plan and those that are in need of mid–course corrections, and
■ to track changes in the services provided (service outputs) and the desired results.
Monitoring includes measuring the current status and change over time in any of the program components.
At the program level:
■ Inputs
■ Outputs: Functional outputs, Service outputs (or service adequacy), Service utilization
⁵ The monitoring of activities related to program–level variables is also called “process evaluation.”
Figure II–6
Inputs
■ What types and level of resources were allocated to this intervention?
■ Were qualified personnel available to implement activities?
■ What was the unit cost of each resource? the total cost of each resource? the total cost of the program?
Outputs: Functional areas (e.g., training)
■ How many persons were trained, by category of personnel?
■ Were trained staff able to perform tasks competently 6 months post–training?
■ What was the cost per participant–day of training?
Outputs: Service outputs
■ Access: Did the number of SDPs providing services increase?
■ Quality: Did the quality of care improve over time?
■ What was the added cost of increasing the numbers of SDPs? of improving the quality of care?
Outputs: Service utilization
■ Did the number of new acceptors or CYP increase over time?
■ Has the percent of clients returning for follow–up appointments increased?
■ What was the added cost associated with the increase in new acceptors? with the increase in CYP?
Outcome: Intermediate outcomes
■ Was there change in the key behavior (e.g., contraceptive prevalence) among the target population?
■ Was there a change in the key behavior (e.g., receiving quality of care) in the target population?
■ What was the increase in costs associated with the change in contraceptive prevalence?
Outcome: Long–term outcomes
■ Did women achieve their reproductive intentions?
■ Did fertility rates change over time?
■ What was the cost of achieving the fertility change?
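Two of the cost questions in Figure II–6 reduce to simple ratios. The Python sketch below shows one plausible way to compute them; it is not a method prescribed by the manual. The function names, the reading of “added cost associated with the increase in new acceptors” as a cost per additional acceptor, and all figures in the usage lines are hypothetical.

```python
def cost_per_participant_day(total_training_cost: float,
                             participants: int,
                             days_per_participant: float) -> float:
    """Training cost divided by the number of participant-days delivered."""
    participant_days = participants * days_per_participant
    if participant_days <= 0:
        raise ValueError("participant-days must be positive")
    return total_training_cost / participant_days


def added_cost_per_additional_unit(cost_before: float, cost_after: float,
                                   units_before: float, units_after: float) -> float:
    """Increase in cost divided by the increase in a result (e.g., new acceptors)."""
    added_units = units_after - units_before
    if added_units <= 0:
        raise ValueError("the result did not increase; the ratio is undefined")
    return (cost_after - cost_before) / added_units


# Hypothetical figures, for illustration only.
print(cost_per_participant_day(12000, participants=40, days_per_participant=5))  # 60.0
print(added_cost_per_additional_unit(50000, 65000, 2000, 2500))                  # 30.0
```

The second function raises an error when the result did not increase, because an added cost per additional unit has no meaning in that case.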
Figure II–7
USAID/Morocco Program
Sub–Goals
■ Reduced population growth rate and increased life expectancy
■ Prudent stewardship of Morocco’s environment
■ Increased income and enhanced economic participation of the lowest two quintiles
Strategic Objectives
■ Reduced fertility and improved health of children under five and women of child–bearing age
■ Promote sustainable use of scarce natural resources and healthier environment
■ Expanded base of stakeholders in the economy
Program Outcomes
■ Increased use of FP/MCH services
■ Improved policy, regulatory and institutional framework for resource management and pollution prevention
■ Enabling policy and regulatory environment for creation and expansion of micro and small enterprises
■ Increased sustainability of FP/MCH services
■ Adoption of improved environmental practices
■ Broadened access to financial resources and services
■ participation in environmental protection
■ opportunities
Figure II–8
Figure II–9
Previous texts and manuals on evaluating the impact of family planning programs have described a number of methods or approaches (United Nations, 1979, 1982, 1985; Hermalin, 1982; Sherris et al., 1985; Buckner et al., 1995). However, the consensus developed by The EVALUATION Project is that only three approaches adequately demonstrate causality (i.e., that observed change is attributable to the program). Described in fuller detail in Chapter IV, these three preferred approaches are as follows:
Randomized experiment: the “pretest/posttest control group design”
This design is widely viewed as “the gold standard” for evaluating impact, because (when implemented appropriately) it answers the question: “what would have happened in the absence of the program?” By comparing the change that occurs in the experimental versus control populations, one can measure the amount of change attributable to the program (“net” of confounding factors). The major limitation to this method relates to the feasibility and political acceptability of conducting experiments.
Data needs: One must obtain data from two groups: those who do and do not receive the intervention; moreover, subjects (or groups of subjects, such as villages) must be randomly assigned to the experimental versus control group. Data (either service statistics or survey data) are collected both “pre” and “post” intervention for the two groups.
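The comparison of change in the experimental and control groups can be expressed as a simple difference of differences. The Python sketch below is a minimal illustration under assumed inputs: the function name and the contraceptive prevalence figures are hypothetical, and a real analysis would also need to account for sampling error.

```python
def net_program_effect(pre_experimental: float, post_experimental: float,
                       pre_control: float, post_control: float) -> float:
    """Change in the experimental group minus change in the control group."""
    change_experimental = post_experimental - pre_experimental
    change_control = post_control - pre_control
    return change_experimental - change_control


# Hypothetical contraceptive prevalence (percent) before and after the intervention.
effect = net_program_effect(pre_experimental=30, post_experimental=42,
                            pre_control=31, post_control=36)
print(effect)  # 7 percentage points: (42 - 30) - (36 - 31)
```

The change observed in the control group (5 points in this example) stands in for what would have happened in the absence of the program, so only the remaining 7 points are attributed to the program.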
Although this design is often criticized for being difficult to implement, one promising approach is the “siting” of interventions. Specifically, if one is at the beginning of a project cycle that will entail new interventions or a pilot study prior to full–scale expansion, it may be possible to allocate randomly the facilities or areas that do and do not receive the intervention, and subsequently compare the results from the two groups. This approach may be particularly useful where resources are too limited to begin the intervention in all areas simultaneously.
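Random allocation of sites takes only a few lines to carry out. The Python sketch below is one way to do it, not a procedure taken from the manual: the function name, the site identifiers, and the use of a fixed seed (so that the allocation can be documented and reproduced) are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def allocate_sites(site_ids: Sequence[str], n_intervention: int,
                   seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Randomly split sites into an intervention group and a comparison group."""
    if not 0 <= n_intervention <= len(site_ids):
        raise ValueError("n_intervention must be between 0 and the number of sites")
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    intervention = sorted(shuffled[:n_intervention])
    comparison = sorted(shuffled[n_intervention:])
    return intervention, comparison


# Hypothetical list of ten service delivery points; five start the intervention first.
sites = [f"SDP-{i:02d}" for i in range(1, 11)]
first_phase, later_phase = allocate_sites(sites, n_intervention=5, seed=1996)
print(first_phase)
print(later_phase)
```

Where resources allow only a phased start, the sites allocated to the later phase can serve as the comparison group until the intervention reaches them.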
Data needs: The data requirements for this type of analysis include surveys of the target population (which can be
the country as a whole) at two points in time conducted in the same sample clusters, as well as data on
the service delivery network at the same two points. (In some cases, it is possible to do similar analyses
from a study at one point in time if information on key variables can be reconstructed retrospectively for
a period several years earlier.) DHS–type surveys provide information on the target population. The
data on the service delivery network can be obtained from the service availability module (SAM) of the
DHS and/or a Situation Analysis study. Few countries currently have household and facility data for the
same set of sample clusters. However, many countries already have one DHS survey with a service
availability module and are planning for a second DHS; in this case, the addition of the second SAM
would provide the necessary data for this approach.
Note: The above methods are listed in an order intended to facilitate presentation, not necessarily in order of
preference. The conditions under which the different methods would be preferred are discussed in Chapter IV.
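The data requirement described above (population surveys in the same sample clusters at two points in time, plus data on the service delivery network at the same two points) amounts to linking two sets of records by cluster and survey round. The Python sketch below illustrates that linkage only; it does not perform the analysis itself. The field names (`cluster_id`, `round`, `cpr`, `sdp_count`) and all values are hypothetical.

```python
from typing import Dict, List


def link_clusters(household_rows: List[Dict], facility_rows: List[Dict]) -> List[Dict]:
    """Attach facility data to household survey data for the same cluster and round."""
    facility_by_key = {(f["cluster_id"], f["round"]): f for f in facility_rows}
    linked = []
    for h in household_rows:
        f = facility_by_key.get((h["cluster_id"], h["round"]))
        if f is None:
            continue  # no facility data for this cluster and round; it cannot be linked
        extra = {k: v for k, v in f.items() if k not in ("cluster_id", "round")}
        linked.append({**h, **extra})
    return linked


# Hypothetical cluster-level summaries from two survey rounds.
households = [
    {"cluster_id": 1, "round": 1, "cpr": 30},
    {"cluster_id": 1, "round": 2, "cpr": 38},
    {"cluster_id": 2, "round": 1, "cpr": 25},
]
facilities = [
    {"cluster_id": 1, "round": 1, "sdp_count": 2},
    {"cluster_id": 1, "round": 2, "sdp_count": 3},
]
for row in link_clusters(households, facilities):
    print(row)
```

In this example cluster 2 is dropped because it has no facility record, which mirrors the point in the text that few countries have household and facility data for the same set of sample clusters.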
Methodological Approach: Program Monitoring
Chapter III
This chapter assumes that the evaluator has completed the first step in developing an evaluation plan: defining the scope of the evaluation (i.e., to monitor program performance only, to measure impact only, or to do both).
If the decision includes program monitoring, the next steps in developing an evaluation plan consist of defining the:
■ primary purpose of monitoring,
■ components (aspects) of program to monitor,
■ study design(s),
■ indicators,
■ sources of data, and
■ format for presenting results.
CLARIFYING THE PRIMARY PURPOSE OF MONITORING
As mentioned in Chapter II, program monitoring has two main purposes:
■ to improve programs by identifying those aspects that are working according to plan and those that are in need of mid–course corrections, and
■ to track (and demonstrate) results at the program or population level.
For example, certain evaluation techniques are designed specifically to improve performance. In an effort to enhance quality of care in family planning programs, a number of countries have experimented with the Client–Oriented Provider–Efficient (COPE) technique, which is a self–assessment tool designed for use at the local level (AVSC International, 1995). The data are NOT aggregated to a higher level, but rather are analyzed by the service providers to identify changes that can take place at the local level to address problems identified by the exercise. This qualitative technique represents a promising approach to improving programs from the bottom upward, but it would not satisfy the needs of a regional program officer in tracking results from the SDPs in the program network.
In contrast, there are techniques that monitor achievements of a program but provide relatively little insight into strengths and weaknesses. One example would be the routine reporting of service statistics (e.g., number of new acceptors, number of clinic visits, number of couple–years of protection, etc.) collected at the level of the SDPs and aggregated at a central level. This type of information is valuable in tracking trends over time, yet alone does not indicate why the program is or isn’t achieving the desired results.
Program administrators and donor agencies are generally interested in both types of monitoring. Donor agencies almost always want the quantitative data on “results,” but are increasingly interested in knowing that the program has some means of obtaining data, often qualitative in nature (e.g., focus groups, in–depth interviews, observation checklists, etc.), that will be used directly for program improvement. To this end, program monitoring often consists of a combination of evaluation activities that collectively provide information on the program as a whole.
IDENTIFYING THE COMPONENTS TO BE MONITORED
The decision as to which components of a program to monitor depends in part on the primary purpose of the evaluation: to improve the program, to track results, or both (see Figure III–1). For the program manager, it is not a question of “one or the other;” he/she will need both. By contrast, donor agencies are often more interested in tracking program–level results (although in general they strongly encourage implementing agencies to carry out evaluation activities intended to identify areas in need of improvement).
Figure III–1
[Figure not reproduced. Column headings recoverable from the original: Purpose; Components.]
Evaluation of the different service operations (“functional areas”) is particularly useful when done early enough in the implementation process to allow for mid–course corrections. Service utilization is generally tracked continuously over the life of the project. Outcomes, by contrast, are generally measured at (two or more) intervals to measure change over time.
Certain points warrant mention:
■ Program monitoring ideally employs both quantitative and qualitative techniques.
■ It is not practical to attempt a detailed evaluation of ALL aspects of the program. Rather, it is important to prioritize those points for which the information will be most useful to the organization and crucial to the success of the program.
■ The use of a conceptual framework to define the pathways to achieving the desired results is equally applicable to projects as to national programs (see box on the following page).
■ Evaluation activities are generally staggered. Some may be done routinely (e.g., collection and reporting of service statistics), others on a periodic basis (e.g., simulated or “mystery” client surveys to assess quality of care), and others as a one–time exercise (e.g., analysis of cost per CYP for different contraceptive methods).
■ The key management staff should take the lead role in deciding what aspects of the program to monitor, not the evaluation specialist.⁷ In this way, evaluation truly serves the needs of the organization.
To arrive at a decision regarding specific aspects of the program to evaluate, it is useful to identify the range of possible topics and then prioritize them. Figure II–6 (on page 18 of the previous chapter) provides a useful framework for considering options. It is replicated in Figure III–2 with the spaces left blank, which an organization could use to identify and prioritize evaluation questions.
⁷ It is important, however, for the evaluation specialist to be closely involved in these discussions, to provide information that may influence the decision (e.g., the approximate costs and time required for different data collection activities, alternative sources of information, the biases inherent in different methods of data collection, and so forth).
Chapter II described in detail the value of a conceptual framework in designing an evaluation of a national program. However, it is equally useful to have a “road map” that describes the pathways to achieving desired results for smaller–scale projects. Although the expected change is generally expressed in terms of a practice or behavior, that may not be the case in some programs, as shown in the example below, on youth and AIDS prevention.
In this example the implicit assumption is that knowledge obtained at a younger age will in fact influence behavior among young people as they become at risk for HIV infection, but the scope of the evaluation (and the period for the project) may not allow for the testing of this assumption. Rather, as reflected in the framework below, the project is designed to increase knowledge of HIV transmission, and the evaluation would focus on this result. The framework is useful in clarifying the pathways to change. Each intermediate result can in turn be monitored to assure that the project is being implemented according to design.
Figure III–2
Inputs
Outputs: Functional areas
➤ Management
➤ Training
➤ Commodities/Logistics
➤ IEC
➤ Research/Evaluation
Outputs: Service utilization
Outcome: Intermediate outcomes
Outcome: Long–term outcomes
comparison to similar programs or the same program at an earlier date (e.g., trends in CYP).
At the risk of oversimplification, program monitoring consists of measuring how well the program is doing in one or more of the “boxes” of the conceptual framework (see Figures II–3 and II–4). The framework illustrates how the program should theoretically work in achieving the desired results at the program and population levels. Program monitoring quantifies what actually occurs at each level (of inputs, processes, outputs, and outcomes). Whereas some might consider the conceptual framework only an academic exercise, in fact it is extremely practical, given that it identifies the areas for which the evaluator may want to select indicators.
A menu of possible indicators for evaluating family planning programs is provided in the Handbook of Indicators for Family Planning Program Evaluation (Bertrand et al., 1994). As family planning programs expand to include other aspects of reproductive health, the potential number of other relevant indicators expands (see Bertrand and Tsui, 1995, which describes indicators for the areas of safe pregnancy, STD/AIDS, women’s nutrition, breastfeeding, and adolescent reproductive health services).
For a given evaluation, one should prioritize indicators based on specific program objectives and select a manageable set of indicators to meet the particular needs of the situation. In short, it is essential to identify the key question(s) to address in the evaluation and to select the indicators accordingly.
measures of contraceptive practice in these areas (though they may not be valid if the population in the area obtains contraceptives elsewhere). Sample surveys generally provide unreliable estimates of abortion due to response bias, in this case the reluctance of respondents to report abortions. In addition, indicators that rely on subjective judgments, for example the quality of program leadership, may be unreliable, as different evaluators may use varying standards to measure that characteristic.
progress in decreasing stockouts, the appropriate indicator is the “percentage of SDPs that encountered a stock–out during the past 12 months” (which could be tracked over time), not “a decrease in the percentage of SDPs that had a stock–out.”

Indicators should be collected on a timely basis. The indicator should provide a measurement for a recent period or at least for the period during which the intervention occurred; also, it should be available at appropriate intervals. Population–based indicators are rarely available annually and often refer to a period of several years before the survey (for example, DHS estimates of fertility typically refer to the three–year period that precedes the survey). While routine program–based data such as service statistics would seem to be a good source for current data, there is often considerable delay in their availability. Finally, some program–based data are now being collected periodically through such methods as Situation Analysis or the DHS Service Availability Module. These instruments provide timely, but not continuous, measures of program functioning.

Factors that Affect the Selection of Indicators

In an ideal world, the evaluator would systematically identify the indicators judged to be most useful for a given evaluation and proceed to collect or acquire the needed data. However, in the field setting, where time, human, and financial resources are in short supply, other factors intervene in the selection of indicators. The following are common factors that enter into the decision.

■ Availability of data needed to measure the indicator
Example: To assess the effects of family planning programs on fertility and health outcomes worldwide, it would be extremely useful to have data for all countries on the sources of funding (donor agencies, local taxes, client fees) for family planning and on costs of providing family planning services. However, such data do not currently exist in readily accessible form. Moreover, it is unclear whether all governments would be willing to open their financial records to outside evaluators for the purpose of collecting this information.

■ Amount of time allotted to the evaluation
Example: Program managers might like to know whether their new approach to counseling NORPLANT® clients results in higher continuation rates. However, if the evaluation of the counseling program is limited by the life of the project (part of which has presumably elapsed), it is impossible to ascertain the long–term effects of this counseling.

■ Financial support available for evaluation
Example: Many IEC directors would like to know the percentage of the target population reached by a given campaign and the reaction of the public to the messages. However, they may not have the resources for conducting a population–based survey. There may be a trade–off between cost on the one hand and validity, reliability, and timeliness on the other hand.

■ Donor agency requirements
Example: The indicator “couple–years of protection” has become the most widely used measure of service utilization in USAID–funded programs, because USAID (as well as IPPF) requires recipient agencies to report this result.

Use of Multiple Indicators

For well–established indicators such as the Total Fertility Rate (TFR) and the Contraceptive Prevalence Rate (CPR), single indicators are usually sufficient. However, there are instances when it may be advisable to use two or more indicators to measure a given result. One such situation is where data quality is suspect; a given result is more credible if the same trend can be demonstrated across two or more indicators.

Secondly, when new program indicators are being introduced, it is useful to have alternative indicators for a given category of result (e.g., to measure the quality of the client–provider interaction). This provides the evaluator with a back–up plan in the event that the data from one source do not materialize or are judged invalid (e.g., respondents misunderstood the question). Finally, the indicators in a given functional area often measure a chain of events, and the use of multiple indicators may be important to developing an understanding of the dynamics along the chain. For example, an IEC program (1) generates a certain number of messages via a certain number
of channels and (2) offers counseling to potential and actual clients who seek services in the expectation that members of the target population will (3) hear messages about family planning, (4) understand the main messages, (5) react positively to the messages, (6) discuss the messages with others, (7) develop a favorable predisposition toward the behavior, such as contraceptive use, (8) become an acceptor, and (9) continue the practice. At each step in the process, the percentage of the target population continuing along this chain can decrease; it is indispensable for the evaluator to identify the pattern of this response on the part of the target population.

In sum, the selection of indicators is based on the purpose of the evaluation: to learn more about a specific functional area, program output, or outcome, and in some cases to meet donor agency requirements. The selection of indicators is dictated by the specific needs and interests of those undertaking the evaluation. Different types of evaluations may be staggered over the five–year (or longer) lifetime of a project.

Operationalizing the Indicators

“Operationalizing the indicators” means identifying how a given behavior or concept will be measured. In the best–case scenario, an indicator is conceptually clear and lends itself to easy, unequivocal measurement. An example would be the number of persons trained in a given year, by category of personnel (physician, nurse, commodities/logistics specialist, etc.).

Unfortunately, very few of the indicators are so simple and straightforward. Rather, even after the evaluator has identified the indicators to be used, he/she tends to be faced with one or more of the following problems in operationalizing them.

■ The measurement of an indicator requires subjective judgment.
Many agree that one of the single most important factors in the successful family planning programs currently in existence is the “quality of program leadership.” To use this indicator, it is imperative to define the characteristics that constitute “leadership,” but the final assessment remains subjective.

Similarly, indicators requiring a judgment of “presence” or “absence” may be difficult, where it is necessary to establish “how much” constitutes “presence.” For example, with regard to the indicator “absence of unwarranted restrictions on users,” one could have a model program in this sense with one small exception. Should that exception count for enough to alter one’s assessment?

If faced with the problem of subjective judgment calls, the evaluator must first decide whether to retain the indicator. If so, then he/she should clarify the criteria used in arriving at the final score or assessment.

■ The rules of measurement are clear but local applications differ from the recommended approach.
For example, the Handbook of Indicators for Family Planning Program Evaluation (Bertrand et al., 1994) recommends defining “number of acceptors new to the institution” as “new only once.” That is, if a person drops out of the institution’s program for several years and eventually returns, then she would NOT be “new to the institution.” However, since some organizations don’t retain records after a five–year period, it may not be feasible to adhere to the recommended definition. In such cases, the evaluator should be very clear how the measurement used differs from standard or recommended practice.

■ The indicator is conceptually clear but the “yardstick” for measuring it is not.
At first blush, the “cost of one month’s supply of contraceptives as a percentage of monthly wages” appears to be clear and measurable. However, as one applies the indicator, certain questions may arise. For example, different contraceptive methods have different costs. What cost should be included: the average cost of all methods available? The average cost weighted by the proportion using the different methods? Moreover, what numbers should be used if the cost of methods varies over the course of the year? What number should be used in the denominator if average monthly wage figures are outdated?

To the extent that evaluators have access to reports by others, it is useful to review how fellow researchers and evaluators have handled similar situations. In the absence of such information, the cardinal rule bears repeating:
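
As a worked illustration of the “yardstick” questions raised above for the cost indicator, the following minimal Python sketch compares two of the options mentioned in the text: the simple average cost of all methods available and the average cost weighted by the proportion using each method. The function names, method names, prices, method shares, and wage figure are all hypothetical and chosen only for illustration; they are not drawn from the manual or from any survey.

```python
def cost_as_percent_of_wage(monthly_cost, monthly_wage):
    """Express a monthly contraceptive cost as a percentage of the monthly wage."""
    return 100.0 * monthly_cost / monthly_wage


def simple_average_cost(cost_by_method):
    """Unweighted average of the monthly cost across all methods available."""
    return sum(cost_by_method.values()) / len(cost_by_method)


def weighted_average_cost(cost_by_method, share_by_method):
    """Average monthly cost weighted by the proportion of users of each method."""
    total_share = sum(share_by_method[m] for m in cost_by_method)
    weighted_sum = sum(cost_by_method[m] * share_by_method[m] for m in cost_by_method)
    return weighted_sum / total_share


# Hypothetical inputs (illustrative only).
cost_by_method = {"pill": 2.0, "condom": 1.0, "injectable": 3.0}   # cost of one month's supply
share_by_method = {"pill": 0.5, "condom": 0.3, "injectable": 0.2}  # proportion of users
monthly_wage = 80.0

unweighted = cost_as_percent_of_wage(simple_average_cost(cost_by_method), monthly_wage)
weighted = cost_as_percent_of_wage(
    weighted_average_cost(cost_by_method, share_by_method), monthly_wage
)
print(f"Unweighted indicator: {unweighted:.3f}% of monthly wage")
print(f"Weighted indicator:   {weighted:.3f}% of monthly wage")
```

The two versions of the indicator differ whenever the method mix is uneven, which is one reason the evaluator should state which definition was used.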
Figure III–3
[Diagram not reproduced. Recoverable labels: Government Offices and Institutions; Independent Organizations (Universities, Research Firms, Management Consultants); National Family Planning Program; Official Policy Documents; Supervision (Training, Research/Evaluation); Service Statistics (Utilization of Service, Distribution of Contraceptives); Special Studies (Surveys, Observation by expert); Administrative Records (Course Evaluations, Commodities MIS).]
is the Service Availability Module (SAM), developed and implemented under the DHS program.

The two types of surveys differ (1) in the data collection instruments used and (2) in the population of SDPs that they describe. For Situation Analysis, the instruments for collecting the data include a series of modules (e.g., inventory of the SDP, observation of provider–client interaction, exit interviews with clients, interviews with service providers). Descriptive results provide program administrators at the central level with important insights into the strengths and weaknesses of the program throughout the country (or geographic area under study). The Situation Analysis is also used to obtain data on the quality of care in family planning programs.

The SAM is a community–based survey, conducted in connection with the household level survey in the DHS (in selected countries). For every sampling cluster used in the study, key informants provide a
list of existing health/family planning facilities. Teams are then dispatched to collect data at the nearest (1) hospital, (2) clinic, (3) health center, (4) pharmacy, and (5) private doctor within a 30–kilometer radius of the center of the cluster.8 The information collected at each location covers the governing structure (private versus public), number and type of staff, infrastructure (equipment, type of construction materials), types of services provided, types of contraceptive methods available, etc. This information is potentially useful for assessing the availability and adequacy of family planning services for a given population, and it can be particularly important in linking changes (improvements) in the family planning supply environment to changes in prevalence over time.

The difference in the sampling used for the two types of surveys is as follows: the Situation Analysis is based on a random sample of SDPs in a country (which may be disproportionately located in urban areas), whereas the SAM data are collected with respect to a random sample of women in the country.9 Thus, the Situation Analysis measures the average SDP, whereas the SAM measures the services available to the average woman in a given country.

To date, facility surveys have been greatly under–utilized for the purposes of evaluating family planning programs. However, their potential utility in assessing impact (in connection with the individual interviews) is discussed in Chapter IV.

Cost studies can also be conducted at facilities. These provide information on the cost of providing different services, including acceptor and follow–up visits by method. Using information on visit patterns and the number of CYP by method, these data can be aggregated to determine the cost per CYP. In addition, the cost of expanding acceptor and follow–up visits as well as the number of CYP can be calculated, taking into consideration the amount of under–utilized capacity.

for this data and, in part, because commercial providers are competing with one another and may be reluctant to share information on the volume and quality of their services.

Information on family planning activities in the private, commercial sector has been measured in some SAMs and Situation Analysis Studies as well as in surveys focused exclusively on private providers. Additionally, some data sources unique to this sector are available to measure the availability of methods and services in the sector. These include:
■ data on contraceptive shipments to distributors and wholesalers;
■ sales at the retail level;
■ audits of retail outlets; and
■ reports from detail men on the availability of family planning services and methods from private doctors and clinics.

Providers in the commercial sector and supporters of social marketing and private sector programs are also concerned with the quality of services provided. Some of the techniques used to assess the quality of services are: (1) mystery shopper studies to determine whether retailers are promoting social marketing products and to assess the quality of the information provided; (2) consumer intercepts to assess consumer satisfaction with the services received; and (3) population–based surveys.

Special Studies

Special studies are generally conducted to respond to a specific need. They may employ quantitative or qualitative research methods. The list of possible special studies is long; illustrative examples include the following:
■ a follow–up of sterilization clients to determine their level of satisfaction with the procedure;
■ focus groups among adolescents attending a given program to assess whether it responds to their interests and needs;
■ a management audit of program documents; and
■ mapping of a community to show where eligible couples live and what method of contraception is used.

Population–based Data

The primary tool for collecting population–based data for family planning programs is the DHS–type survey. Following in the tradition of the World Fertility Survey (WFS) and the Contraceptive Prevalence Survey (CPS), the Demographic and Health Survey (DHS) is generally conducted among a representative sample of women of reproductive age in a given country. In recent years, a number of DHS surveys have also included a sample of men (either an independent sample of men or a sample of husbands of women interviewed for the DHS). The DHS core questionnaire, consisting of some 250 questions, provides detailed information on fertility and family planning, in addition to information on maternal and child health, health services utilization, and related topics (Robey et al., 1992).

We use the term “DHS–type surveys” to underscore that there are other surveys similar to the DHS in content and type of population studied. The Centers for Disease Control and Prevention (CDC) have conducted a number of Reproductive Health Surveys in selected countries of Latin America and other regions of the world. Similarly, certain countries have conducted their own national–level surveys on fertility and related issues.
Figure III–4
[Chart not reproduced. Vertical axis: Number of New Acceptors (0 to 120,000); horizontal axis: Year (1980 to 1993); series: Pill, IUD, Condom.]
Figure III–5
[Chart not reproduced. Axis scale: 0 to 100; categories: Pills, Condoms, Depo Provera**, Foam Tablets*, IUD, Norplant, FP posters displayed*, Pamphlets/other IEC available*, Health talk given incl. FP, Health talk given. One value is marked n/a.]
Source: Miller et al., 1996. “A Comparison of the 1995 and 1989 Kenya Situation Analysis Study Findings,” unpublished manuscript.
Figure III–6
[Chart not reproduced. Vertical axis: 0 to 2000 (unit not recoverable); horizontal axis: Survey and Year (1979, 1983, 1987, 1992); series: Pill, IUD, Condom, Other*. Partial footnote text: “ligation, spermicide and NORPLANT®”.]
Figure III–7
[Charts not reproduced. Recoverable chart labels: Other, Fees, Equipment, Supplies, Donors, Travel, Consultants, Government, Personnel. Recoverable values under “Actual Budget”: Equipment 9.70%, Supplies 7.10%, Travel 7.30%, Personnel 49.90%. A further chart has a vertical axis from 0 to 45,000 and a horizontal axis labeled PHC Service/Activity (Antenatal Care, Family Planning, Immunization, Training).]
Source: Reynolds, J. 1993. Cost Analysis: Primary Health Care Management Advancement Programme, Module 8 Users Guide, Washington, DC: Aga Khan Foundation and University Research Corporation, page 9.
Figure III–8
[Table; column headings not recoverable. Each row is shown with its cells separated by “|”.]
Access to Services | Program records | Number of SDPs offering contraception in a defined geographical area | Quarterly | MOH and private FP association
Costs | Financial data and other program records | Cost of acceptor, follow–up, and discontinuation visits by method | One–time study | Organization responsible for study
SUMMARIZING THE
METHODOLOGICAL APPROACH
Methodological Approach: Impact Assessment
Chapter IV
Methodological
Approach:
Impact Assessment
■ A Friendly Caveat
■ Overview of the Issues
■ Criteria Guiding the Choice of Methodological Approach
■ Preferred Approaches
■ Alternative Approaches
■ Summary
methods intended for readers with limited statistical background, which nonetheless would allow them to judge the merits of using a given technique in their own setting (even if others were to carry out the actual analysis). These general descriptions are followed by a discussion of key methodological issues and a summary of strengths and weaknesses, intended to provide statisticians and demographers with further insights into the methodological implications of different method choices.

CRITERIA GUIDING THE CHOICE OF METHODOLOGICAL APPROACH

Given that the primary purpose of this chapter is to assist readers in choosing among alternative methods or approaches for measuring program impact, it is useful to begin by specifying some criteria for making such choices. In assessing the various methods described in this chapter, the evaluator should take the following criteria into consideration:

■ Exposure to threats to validity10
The single most important criterion in assessing a method is the validity of its estimates of program impact. While all methods are vulnerable to some threats to validity, they vary considerably in terms of the number and types of threats (i.e., confounding factors) to which they are subject.

10 The term “validity” as used here means the extent to which measurements of program impact from a given study design constitute unbiased and unconfounded measures of actual program impact. In lay terms, validity refers to the fact that one is actually measuring the phenomenon one intends to measure.

■ Required assumptions
Assumptions are required of all methods. Some methods require strong assumptions that are rarely valid in actual practice, while others require assumptions that are weaker and more likely to be valid. Methods requiring fewer and less stringent assumptions are to be preferred, other things being equal.

■ Ability to isolate program effects
Ideally, measures of program impact will include only results that are directly attributable to the program. In most settings, factors such as the forces of socioeconomic development, multiple social programs, changing demographic structure, and the presence of non–program family planning activities complicate attempts to measure program impact. Methods differ in the extent to which the evaluator can isolate program effects from the influences of other factors; and methods that are more efficient in doing so are preferred.

■ Cost
This refers to the costs of data collection and analysis. Other things being equal, less costly methods are preferred. In most cases, however, methods differ on other criteria as well as cost, and thus cost–benefit decisions must be made.

■ Data requirements
Methods vary considerably in data requirements. Aside from differences in the volume of data needed, some methods require data that are more difficult to collect and/or are more vulnerable to measurement error than other methods, and thus increase the risk that measurement error may obscure actual program effects or exaggerate the magnitude of impact actually achieved.

■ Insights into causal pathways
Methods vary considerably in the amount of information they provide as to how program inputs are transformed into outputs and outcomes as part of the impact measurement process. Although not required for the measurement of impact, such information provides useful insights as to how programs might be improved in subsequent program cycles.

■ Types of outcome indicators used
Some methods have been designed specifically for the measurement of certain types of outcomes. For example, a number of methods have been developed specifically to measure the fertility impact of family planning programs that are not readily adaptable to measuring impact on other outcomes (e.g., health outcomes). Other methods are more versatile and may be used to measure other types of program results, as well
as results at different levels. Thus, the types of outcomes of interest in a particular evaluation effort will in part dictate choice of method.

■ Degree of program control required
While certain types of research designs (e.g., randomized experiments and to a lesser extent quasi–experimental studies) provide the strongest evidence of program impact, they also require more controlled conditions in terms of the manner in which the program being evaluated and other interventions being implemented during a given study period are undertaken. Other approaches falling under the heading of “non–experimental” studies do not require that programs be implemented in specific ways in order to provide valid measures of program impact, but generally require larger quantities of data and more complicated analysis in order to produce valid findings. The degree to which program activities can realistically be controlled by implementing agencies so as to facilitate impact measurement is thus an important factor in choice of study design.

■ Technical/statistical skills and resources required
While all of the methods considered require basic research and statistical knowledge and skills, some of the methods and approaches require relatively advanced skills and in some cases specialized computer software.

Thus, a fairly large number of factors need to be considered in deciding upon an approach for measuring program impact. To facilitate the choice of an appropriate method, we have classified methods for impact measurement into two categories, reflecting what the authors perceive to be their overall strength based upon the criteria outlined above:

■ Preferred approaches
Methods falling into this category are viewed as being the strongest designs (for reasons indicated below) and are recommended as a first choice wherever possible.

■ Alternative approaches
Where the use of preferred methods is not possible, several useful alternative approaches are available. These methods are based upon less rigorous designs and generally produce less compelling results than the preferred methods, but are capable of producing defensible estimates of program impact under certain circumstances.

The specific methods considered in this chapter and their classification into the two groups defined above are shown in Figure IV–1.

Figure IV–1
Classification of Methods on the Basis of their Overall Utility as “Stand–Alone” Methods for Impact Assessment

Preferred Methods
■ Randomized experiments
■ Quasi–experiments
■ Multilevel regression methods

Alternative Methods
■ Decomposition (proximate determinants model)
■ Prevalence method

Note that the methods listed in Figure IV–1 exclude a number of methods that have been presented and/or applied elsewhere in the literature (United Nations, 1979, 1982, 1985; Chandrasekaran and Hermalin, 1985; Lloyd and Ross, 1989; Buckner et al., 1995). These are not covered in the present document for several reasons. Some of these methods (e.g., the standard couple–years of protection – SCYP, reproductive process analysis, and component projection methods) are based upon facility–level data, and are thus limited in their capacity to measure program impact at the population level.11 The SCYP and reproductive process analysis methods all require data that are only occasionally available on a country–specific basis.

11 SCYP is not to be confused with conventional couple–years of protection (CYP), which is widely used to track program outputs.

Other methods (e.g., standardization, generic decomposition, and fertility projection/trend analysis) are relatively crude
methods that often lead to ambiguous conclusions regarding program impact in actual practice.12 Finally, simulation is viewed as a generic method for analysis that may be meaningfully used in program and strategic planning and as a supplement to the more robust methods, but it is not especially useful as a stand–alone method of impact evaluation.

12 It should be noted, however, that some of the excluded methods may be meaningfully used in conjunction with the more robust methods. Standardization, for example, is often used as a first step in an impact evaluation in order to determine the share of fertility change that is attributable to changes in demographic structure, since this share of fertility change clearly cannot be attributed to family planning program interventions. Similarly, simulation techniques may be used to supplement the information obtained from certain methods (see, for example, the discussion of multilevel regression methods in Section IV) by indicating the magnitude of change in outcome variables that may be expected from specified changes in program inputs or outputs.

Accordingly, attention is limited in the present document to methods and approaches that are viewed by The EVALUATION Project as having the best prospects for producing relatively “clean” measures of program impact.

PREFERRED APPROACHES

Randomized Experiments

Description

It is widely accepted among evaluation researchers that the randomized or “true” experiment is the “gold standard” for measuring what has happened as a result of a program or intervention. The basic idea behind a randomized experiment is quite simple. In a randomized experiment, study subjects or groups are assigned to “treatment” and “control” groups randomly; that is, once the subjects or groups of subjects to be studied have been chosen, some are assigned to the treatment group and some to the control (or comparison) group at random. Given sufficiently large sample size, randomization enhances the chances of unambiguously isolating the effects of a program or intervention by distributing extraneous factors equally across comparison groups; that is, by ensuring that the treatment and control groups are equivalent with respect to all factors other than exposure to the program being evaluated. This conveys an enormous advantage over other methods of measuring program impact. It is for this reason that Rossi and Freeman (1993) refer to the randomized experiment as “the flagship” of evaluation.

Figure IV–2
Diagram of Two Commonly Used Randomized Experiments

Posttest–only control group design
Random Assignment:      Time
Experimental Group      X    O1
Control Group                O2

Pretest–posttest control group design
Random Assignment:      Time
Experimental Group      O1   X    O2
Control Group           O3        O4

Where:
X = program or intervention introduction
O = observations or measurements

Source: Campbell and Stanley, 1963

Design and Analysis

Two of the more commonly used randomized experimental designs are illustrated in Figure IV–2. In the first, the “posttest–only control group design,” it is assumed that randomization has produced treatment and control groups that are equivalent, and it is thus necessary only to compare outcome measures for the treatment and control groups after the program has been operating for a sufficiently long period of time in order to assess the impact of the program or intervention being evaluated. In the second design, the “pretest–posttest control group design,” measurements are taken for both treatment and control groups prior to program implementation and again after a period of time thought to be sufficient for the program to have had its intended impact. By taking “before and after” measurements, the researcher can subsequently correct for the fact
that randomization may not have produced entirely equivalent groups.13

In both designs, the outcome measure(s) for the control group provide(s) an estimate of what would have been observed for the treatment group had the program under study not been implemented. In the posttest–only design, an estimate of program impact is provided by the difference in outcomes between the treatment group and the control group, plus/minus an error component that is taken into account as part of the statistical analysis; that is:

Impact = (O1 – O2) +/– error

where: O1 = outcome measure for the treatment group;
O2 = outcome measure for the control group; and
error = design and measurement errors.14

In the pretest–posttest design, program impact is measured by the observed change in the outcome measure for the treatment group less the observed change for the control group, plus/minus error:

Impact = (O2 – O1) – (O4 – O3) +/– error

where: O1 and O2 = pretest and posttest outcome measures, respectively, for the treatment group; and
O3 and O4 = pretest and posttest outcome measures, respectively, for the control group.

The Taichung experimental study described in Figure IV–3 illustrates how the randomized experiment can be used in an applied setting.

Figure IV–3
Illustrative Example of the Use of a Randomized Experiment for Program Impact Assessment

The Taichung experiment was designed to assess the impact of an effort to increase contraceptive awareness and use in the city of Taichung, Taiwan during the early 1960s. Local areas, or “lins,” in the city were randomly assigned (after geographic stratification with regard to density zones) to one of four experimental groups: (1) Full package, husband and wife: households in this group received home visits by health workers, mailings of information, and neighborhood meetings; (2) Full package, wife only: same intervention as for the first group, excluding the home visit to the husband; (3) Mailings only; and (4) No intervention other than family planning posters that were distributed throughout the city (i.e., the control group). Lins were allocated to experimental groups as follows: (1) n=427, (2) n=427, (3) n=768, and (4) n=767. Pre–intervention levels of contraceptive use were assumed to be equal across the randomized groups, and thus the posttest–only control group design was used.

The posttest observations of contraceptive acceptance rates (i.e., rates per 100 married women aged 20–39 years) for the four experimental groups for selected time periods (all density sectors combined) were as follows:

Contraceptive Acceptance Rates
Experimental Group    13+ months    29+ months
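
The group-level random assignment and the two impact formulas described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the Taichung study: the function names are hypothetical, units are dealt out to groups in equal proportions within each stratum (the study itself used unequal group sizes), the error term is ignored (in practice it is handled in the statistical analysis), and all unit labels and rates are invented for illustration.

```python
import random


def assign_within_strata(units_by_stratum, groups, seed=0):
    """Randomly assign study units (e.g., local areas) to experimental groups,
    separately within each stratum and in equal proportions (an assumption)."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    assignment = {}
    for stratum in sorted(units_by_stratum):
        units = list(units_by_stratum[stratum])
        rng.shuffle(units)  # random order within the stratum
        for position, unit in enumerate(units):
            # Deal the shuffled units out to the groups in turn.
            assignment[unit] = groups[position % len(groups)]
    return assignment


def posttest_only_impact(o1, o2):
    """Impact = O1 - O2: treatment minus control, both measured after the program."""
    return o1 - o2


def pretest_posttest_impact(o1, o2, o3, o4):
    """Impact = (O2 - O1) - (O4 - O3): change in treatment minus change in control."""
    return (o2 - o1) - (o4 - o3)


if __name__ == "__main__":
    # Hypothetical units: eight areas in each of two density strata.
    strata = {
        "high_density": [f"H{i}" for i in range(8)],
        "low_density": [f"L{i}" for i in range(8)],
    }
    assignment = assign_within_strata(strata, ["treatment", "control"], seed=42)
    for unit in sorted(assignment):
        print(unit, assignment[unit])

    # Hypothetical acceptance rates per 100 women (not data from the study).
    print(posttest_only_impact(o1=30.0, o2=18.0))
    print(pretest_posttest_impact(o1=10.0, o2=30.0, o3=9.0, o4=18.0))
```

Assigning within strata keeps each density zone represented in every experimental group, which is the purpose of the geographic stratification mentioned in Figure IV–3.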
Several additional points regarding randomized experiments warrant mention. First, it should be noted that while random assignment of individual study subjects to experimental groups is quite common in clinical trials and smaller–scale studies involving individual program components (e.g., the effects of improved counseling on contraceptive continuation), individual assignment is more difficult and generally infeasible in large, population–based studies. In such studies, randomization is usually carried out at the group level; for example, at the level of villages, municipalities, districts, etc. The random assignment of groups of study subjects to experimental groups is illustrated in Figure IV–3.[15]

Second, it is possible that several program activities or interventions may be evaluated simultaneously in a randomized experiment by including multiple treatment groups in the design, one for each type or variant of “treatment.” This “factorial” design is also illustrated in the example in Figure IV–3.

Third, it is not necessary for all treatment or benefits to be withheld from the control group in order for randomized experiments to be used. This is an important point, since in population–based studies of family planning program impact, it is difficult indeed to find a population without access to some family planning services (that is, a “pure” control group). What will be measured in such cases, however, is differential or incremental program impact; that is, the difference between the identifiable activity(ies) that constitute the intervention or “program” and other programs that may be operating simultaneously. Although this may seem an undesirable scenario to some, the fact that a large percentage of national populations in developing countries have access to some family planning services means that the impact of new programs will be incremental to those programs or services already in existence. In this sense incremental impact is an appropriate measure of what a program has accomplished.

Finally, it should be noted that randomized experiments are generic research designs as opposed to methods developed specifically to measure a particular type of outcome of family planning programs (e.g., methods designed specifically to measure fertility impact). Accordingly, they may be applied to assess program results at several different levels; for example,
■ at the level of population–based outcomes (e.g., in terms of contraceptive prevalence or current fertility);
■ at the level of program outputs (e.g., improvements in service accessibility or quality, an increase in numbers of new acceptors); and
■ at the level of functional areas of service delivery (e.g., the effects of new staff training programs, supervisory systems, or clinic operational procedures on service delivery).

Program results may also be measured in relation to costs; for example, at the program output level, the increase in number of new acceptors may be related to the increase in costs for the different interventions or different packages. For the functional areas, the effects of two training programs may be related to their costs.

Strengths
The primary strengths of randomized experiments may be summarized as follows:
■ Versatility
Randomized experiments may be used to assess the results of program activities at several different levels in addition to overall impact.
■ High internal validity
This design is superior to all other designs for measuring program results in terms of minimizing threats to internal validity.
■ Few assumptions required
The primary assumptions required are that: (a) randomization has produced treatment and control groups that are equivalent, (b) influences external to the study affect both groups equally, (c) all treatment groups (or group members) receive the same “intensity” treatment, and (d) assignment to experimental group does not in itself alter the behavior of service providers or study subjects with respect to the outcomes under study.

[15] Note that where randomization is to be carried out at the group level, the ideal configuration is to have as many small–sized groups as possible.
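The group–level random assignment described above, with one arm per variant of “treatment,” can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the village identifiers, the arm names, and the function assign_groups are hypothetical and do not come from any study cited in this manual.

```python
import random


def assign_groups(group_ids, arms, seed=0):
    """Randomly assign whole groups (e.g., villages) to experimental arms.

    Groups are shuffled and then dealt out round-robin, so the number of
    groups per arm differs by at most one.
    """
    rng = random.Random(seed)      # fixed seed makes the assignment reproducible
    shuffled = list(group_ids)
    rng.shuffle(shuffled)
    return {g: arms[i % len(arms)] for i, g in enumerate(shuffled)}


# Hypothetical study: 20 villages, a control arm, and two treatment variants.
villages = [f"village_{i:02d}" for i in range(1, 21)]
arms = ["control", "treatment_A", "treatment_B"]
assignment = assign_groups(villages, arms, seed=42)
```

Because assignment is made at the village level, every woman in a village belongs to the same arm, and the analysis would need to take that grouping into account.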
influences that affect the experimental groups differently undermine the validity of randomized experiments. Thus, in order for randomized experiments to be a realistic option in measuring program impact at the population level, it is essential to maintain control over the introduction of new interventions over the life of the experimental study.

➤ Variations in treatment
Especially in large–scale programs such as national family planning programs, programs are often implemented differently across geographic areas and/or service providers. For example, one or more program elements may be modified to meet local conditions, or the prescribed program simply may be implemented with varying levels of intensity in different areas. Thus, the measure of program impact will reflect the average impact of all program modifications and variations in intensity, instead of the program as it was designed. Where process evaluations are not undertaken concurrently in order to measure and understand such variability in implementation, this unmeasured variability might lead to misleading inferences as to the magnitude of program impact.

Threats to validity aside, there is at least one other practical consideration that may limit the utility of randomized experiments for measuring program impact: it may not make sense from a programmatic point of view to locate programs or interventions randomly. In fact, programs are often targeted at geographic areas or population subgroups for two diametrically opposed reasons: they are thought to be under–served or especially receptive to the program. While it is possible to conduct randomized experiments within such special populations, the likely result would be to dilute the impact of the program over the short– to medium–term. In such a situation, the need to demonstrate impact may be in competition with the ability to generate it.

To conclude, then, we strongly endorse the use of randomized experiments, but recognize that practical realities often limit their use, particularly in national–level impact studies. A case may be made, however, for the proposition that randomized experiments have been under–utilized in operations research and in studies involving program–level results, and should be the method of choice in such undertakings. Given the limitations of randomized experiments for measuring program results at the national level, however, alternative approaches need to be considered.

Quasi–Experiments

Description
The term “quasi–experiment” refers to a group of experimental research designs in which study subjects or groups of subjects are not randomly assigned. The most commonly–used quasi–experimental designs, “constructed control designs,” follow the same logic and involve the comparison of treatment and control subjects or groups of subjects as in randomized experiments. In other designs, referred to as “reflexive control designs,” treatment group subjects or groups of subjects serve as their own controls and time–series methods are used to measure net program impact (Rossi and Freeman, 1993). Though more vulnerable to threats to validity than randomized experiments, quasi–experiments do not require random assignment to experimental groups and therefore are generally more feasible than randomized experiments.

The numerous quasi–experimental research designs are discussed at length elsewhere (Campbell and Stanley, 1963; Cook and Campbell, 1979; Rossi and Freeman, 1993; Fisher et al., 1991). In this document, we focus attention on a design that has the widest applicability in assessing the impact of family planning programs: the pretest–posttest, non–equivalent control group design.[18]

[18] A number of potentially powerful quasi–experimental designs have been excluded from the discussion because they are unlikely to be widely applicable in the evaluation of family planning programs. For example, the time–series design is relatively strong where a reasonably long time series is available, but such time–series data exist in few countries, and are largely limited to countries where population–based surveillance systems have been implemented (e.g., Matlab, Bangladesh and Cebu, Philippines). A second design, the regression discontinuity design, is perhaps the most powerful of the quasi–experimental designs available, but because the level of program “screening” required for the meaningful application of the design (e.g., the use of income or other eligibility criteria to choose program participants) is unlikely in family planning programs (which are based upon the notion of consumer choice), it is difficult to envision the circumstances under which this design would be applicable. Further details on these designs are provided in Rossi and Freeman (1993) and Cook and Campbell (1979).
Design and Analysis
The basic layout for the pretest–posttest, non–equivalent control group design is identical to that in the pretest–posttest randomized experimental design displayed in Figure IV–2, except that randomization is not used to assign study subjects or groups of subjects to experimental groups. Instead, one or more control (or comparison) groups are identified that are as similar as possible to the treatment group on as many factors as possible. In many applications, treatment and comparison groups are matched with respect to characteristics thought to be associated with the outcome under study (other than, of course, the program or intervention being evaluated). For example, subjects or population subgroups that are as similar as possible to the treatment group with respect to economic status, geographic location, ethnicity, and other characteristics might be purposively chosen to serve as comparison groups. Alternatively, geographic areas and/or population subgroups that are similar to the treatment area/population may be identified and a random sample chosen to serve as a comparison group.

As in randomized experiments, program impact is measured in the non–equivalent control group design by the difference between the change in outcome measures for the treatment group and that for the comparison group, plus or minus random error; that is,

Impact = (O2 – O1) – (O4 – O3) +/– error

where: O1 and O2 = pre– and posttest measures, respectively, for the treatment group;
       O3 and O4 = pre– and posttest measures for the comparison group; and
       error = design effects and random measurement error.

In a quasi–experiment, it is of crucial importance to compensate for differences between treatment and control groups through the application of multivariate statistical methods. Even in matched studies, it is usually necessary to introduce statistical controls in order to control for differences in factors on which it was not possible to match. Experimental group differences that are not adequately controlled will be reflected in the design effect error component above and will directly influence the magnitude of the estimated impact. This is the primary disadvantage of quasi–experiments in comparison with randomized experiments: in randomized experiments, design effects are minimized by random assignment to experimental groups. The validity of quasi–experimental studies thus rests upon the effectiveness with which design effects can be minimized through matching and multivariate analysis.

Illustrative applications of the non–equivalent control group design are provided in Figures IV–4 and IV–5. Figure IV–4 displays results from a relatively strong variant of the design under consideration that features multiple observations of outcome measures both before and after program implementation. Figure IV–5 illustrates the more typical situation where single “pre–” and “post–” intervention measurements are made.

Strengths
The primary strengths of the non–equivalent control group design are:
■ It provides an approximation to a randomized experiment when randomization is not possible.
■ It is versatile. Like randomized experiments, quasi–experiments may be used to measure results at either the population or program levels.
■ When properly designed, controlled, and analyzed, quasi–experiments can provide evidence of program impact that is nearly as strong as randomized experiments and stronger than most non–experimental studies.

Limitations and Practical Considerations
The non–equivalent control group design is subject to the same general assumptions and limitations as randomized experiments outlined earlier (other than those involving randomization). In addition:
■ The design is more vulnerable than randomized experiments to selection bias; that is, that differences in the characteristics of the experimental groups will be correlated with the outcomes under study, thus distorting the impact findings.
■ It relies heavily upon multivariate statistical methods, and is thus sensitive to the use of appropriate statistical models and to the proper treatment of statistical estimation problems.
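As a worked illustration of the impact formula for the non–equivalent control group design, the following Python sketch computes the point estimate (O2 – O1) – (O4 – O3). The prevalence figures are hypothetical and the function name impact_estimate is an assumption made for this example; the sketch ignores the error term and the multivariate adjustment that an actual quasi–experimental analysis would require.

```python
def impact_estimate(o1, o2, o3, o4):
    """Point estimate of program impact, ignoring the error term.

    o1, o2: pretest and posttest measures for the treatment group
    o3, o4: pretest and posttest measures for the comparison group
    """
    change_treatment = o2 - o1
    change_comparison = o4 - o3
    return change_treatment - change_comparison


# Hypothetical contraceptive prevalence (percent of married women).
impact = impact_estimate(o1=20.0, o2=35.0, o3=21.0, o4=27.0)
# Treatment change is 15 points, comparison change is 6 points,
# so the estimated impact is 9 percentage points.
```

Subtracting the comparison–group change removes the part of the treatment–group change that would plausibly have occurred without the program, which is why the pretest difference between the groups (20 versus 21 here) does not by itself bias the estimate.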
Figure IV–4
[Line graph not reproduced: GFR (vertical axis, 0 to 320) by year and quarter (horizontal axis, 1974–1980), with a series labeled “Comparison Areas.”]
Source: Phillips, J. et al., 1982, “The Demographic Impact of the Family Planning–Health Services Project in Matlab, Bangladesh,” Studies in Family Planning 13 (5): 131–140.
As a practical matter, it is often possible in quasi–experimental studies to compensate for experimental group differences on key characteristics through matching and multivariate analysis. A lingering concern, however, is whether experimental groups differ on unobserved factors that influence the outcomes under study. Unlike the distorting effects of differences in factors that are observable/measurable and which can be accounted for through matching and the introduction of control variables in multivariate statistical models, factors that are unobservable (e.g., differential predisposition or motivation) cannot be compensated for in this fashion and can lead to misleading and/or biased estimates of program impact. This “unobserved heterogeneity” factor is in fact a concern in all study designs other than randomized experiments.

Considerable work has gone into the development of statistical methods for measuring and controlling the effects of unobserved factors, and these developments have been incorporated into the regression–based methods for measuring program impact that are discussed next in this chapter. While these developments have not been extensively used in conjunction with quasi–experimental research designs to date, there would not appear to be any reason why they could not be in future research.

Multilevel Regression Methods

Overview
Impact assessments based upon multilevel regression methods fall under the general heading of non–experimental or observational studies; that is, studies in which there are no experimental and control groups per se. Because the approach is non–experimental, the treatments vary from area to area as a result of decision–making processes that are beyond the control of the evaluation researcher. The criteria underlying program
Because of these desirable features, as well as the wide availability of individual–level data from surveys such as the World Fertility Survey and the Demographic and Health Survey, the multilevel approach has largely replaced the areal regression approach in the toolbox of evaluation researchers in recent years. Areal regression methods continue to be used, however, in cross–national analyses (see Bongaarts, 1993, for an illustrative recent application).

In this section, we describe three types of multilevel regression models that may be used for measuring program impact. The basic question that each model attempts to answer is summarized in Figure IV–6.

Figure IV–6
Primary Questions Addressed by the Principal Regression Modeling Approaches for Measuring Program Impact

Model: Basic cross–sectional
Question: Is there a statistical relationship between program and outcome variables (e.g., contraceptive prevalence, fertility) when the effects of other observed factors influencing the outcome(s) are controlled statistically?

Model: Random effects model
Question: Is there a statistical relationship between program and outcome variables when the effects of other observed factors influencing the outcome(s) and unobserved factors that jointly influence program and outcome variables are controlled statistically?

Model: Panel model
Question: To what extent are observed changes in outcome variables associated with changes in program variables when the effects of initial levels of and changes in other factors (both observed and unobserved) are taken into account?

The Basic Cross–Sectional Model
The cross–sectional multilevel model seeks to assess whether there is a statistical relationship between family planning program variables (e.g., presence of a family planning clinic in the community, number of contraceptive methods offered within 30 kilometers of the community) and family planning outcomes, controlling for socioeconomic and other non–program factors. The model uses randomly chosen communities or other areal units (e.g., municipalities, districts, provinces, etc.) and samples of individual women/couples from each community as units of analysis.

To illustrate, we consider a relatively simple cross–sectional model relating program inputs to fertility behavior. A typical model might include three types of variables measured at two levels (i.e., at the individual/household and community levels):[20]
■ factors specific to individual women and households (e.g., age, parity, education, demand for children, household assets, family structure, etc.);
■ factors that are specific to communities or other population aggregations, but are common to all households and individuals within the community (e.g., environmental conditions, community infrastructure, labor market conditions, etc.);[21] and
■ community–level measures of family planning program strength, which are also assumed to be common to all households and individuals in the community (e.g., presence of a fixed clinic providing family planning services in the community, number of family planning methods available at outlets within a specified distance of the community, quality of services).

[20] Individual– and household–level variables are usually treated as being at the same level. Models with more than two levels are possible, but the computer software currently available is limited in certain ways (e.g., the MLn software package uses approximations in 3– and 4–level models and can handle only dichotomous and continuous outcome variables). The interested reader is referred to Woodhouse (1995) for details on the MLn package.
[21] A fourth type of variable is often used in multilevel models: community–level variables that are derived by adding or averaging over individual observations within sample communities. Examples are mean household income or the proportion of households with electricity. This type of variable serves the same purpose as the community–level variables described in the text and has been omitted from the discussion in order to simplify the presentation.
The basic model may be written as:

Outcome Variable = Program Factors + Individual Factors + Community Factors + Interactions Among Factors[22] +/– Error[23].

Of primary interest for program evaluation purposes are the magnitude and statistical significance of the regression parameter(s) of the program variable(s) included in the model and the interactions between program variables and other variables. The regression parameters for the program variables indicate the strength of association between program measures and individual–level outcomes (e.g., contraceptive use, fertility) when the effects of other factors included in the model have been controlled; the interaction terms provide information on whether the program had a larger impact on certain population subgroups than others. For example, the inclusion of interaction terms in the model might allow the evaluator to test whether the provision of CBD points had a greater influence on the contraceptive behavior of lower–income women (who may have less access to fixed facilities) than higher–income women.

An illustrative application of the cross–sectional model to the measurement of the effects of selected characteristics of the family planning supply environment on contraceptive use is provided in Figure IV–7.

Figure IV–7
Illustrative Application of the Cross–Sectional Multilevel Model, Thailand

Study Design
Entwisle et al. (1984) used data from the second round of the Thailand Contraceptive Prevalence Study (CPS2) and multilevel regression methods to assess the effects of availability of family planning program outlets on the likelihood of contraceptive use in rural Thailand. Data collected from 4,956 rural women aged 15–44 years who were married or in union at the time of the survey were used in the analysis. Sample villages were classified into three groups on the basis of their proximity to different types of facilities providing family planning services: (1) villages located near (i.e., within 4 km.) a district health center, (2) villages located near a tambol (i.e., municipality) health center, and (3) villages located near neither type of facility. Individual–level variables used in the analysis included age, education, and desire for more children.

Results
The regression results indicated strong effects of both family planning service availability and individual–level variables, as well as strong interactions between service availability and desire for more children and education. The adjusted odds (calculated from the regression results) of using a modern contraceptive method among women who desire no more children (in comparison with those desiring more children) are shown below for categories of the age and service availability variables.

                                        Age
Service Availability             15–24   25–34   35–44
Near District Health Center       1.90    4.45    5.29
Near Tambol Health Center         2.07    4.06    7.40
> 4 km. From Either               2.04    1.72    4.77

Interpretation
The strongest effects of service availability on contraceptive use were observed among women aged 25–34 years, with women residing near district or tambol health centers more than twice as likely as women more than 4 km. from either to have been using a modern contraceptive method at the time of the survey when other factors are controlled statistically.

Source: Entwisle et al., 1984

An important limitation of the cross–sectional model is the difficulty entailed in sorting out cause–effect relationships from measurements taken at one point in time. In the cross–sectional model, an observed positive relationship between family planning program variables and outcome variables (e.g., contraceptive use, fertility) may be interpreted as meaning that the program has influenced or caused the observed outcome, net of the other variables included in the model. However, it is also possible that the causal pathway actually runs in the opposite direction; that is, that demand for health and family planning services caused services to be located in areas where contraceptive use was predisposed to be high.

Program managers often justifiably locate clinics or other types of outlets so as to meet existing demand for services and/or select locations with more advantaged populations or superior infrastructure (e.g., roads, electricity, etc.). Such factors have no doubt influenced location decisions in many family planning programs, particularly at the early stages of program development. To the extent that the populations served by these facilities are predisposed to higher contraceptive use and lower fertility, there is a danger that impact evaluations based upon cross–sectional data may overstate the actual level of program impact.

Note that the way in which programs are “located” may also lead to underestimating program impact from cross–sectional data; for example, where programs are targeted at areas of high fertility and mortality and/or low contraceptive prevalence.

The central issue is whether family planning program variables should be viewed as “endogenous” or “exogenous” variables in the regression equations.[24] If programs are implemented uniformly or randomly across sub–national units, then program variables may be viewed as being exogenous factors or variables without risk of estimation bias. However, if programs are implemented on a non–uniform or non–random basis according to some type of decision–making process, then the endogeneity problem comes into play. If so, this would result in a violation of the standard regression assumption that variables in the equation and error terms are uncorrelated.

Of particular concern is whether “unobserved” factors that are correlated with the outcomes of interest may have influenced program location decisions. The crucial point is this: if program “location” decisions are made on the basis of factors that cannot be measured and controlled in a statistical model, one may obtain inconsistent and/or biased estimates of program impact. The interested reader is referred to Bollen et al. (1992 and 1995) for fuller discussions of these issues in the context of family planning program impact evaluation.

In assessing impact using multilevel regression models, it is essential to address the issue of endogeneity. Two approaches for doing so, multi–equation random effects models and fixed effects panel models, are described below.[25],[26]

The Multi–Equation Random Effects Model
A number of approaches have been developed for dealing with the statistical problems resulting from unobserved variables in cross–sectional data. The interested reader is referred to Bollen et al. (1992, 1995) and Mroz and Guilkey (1992) for detailed discussions and appraisals of alternative approaches. Here, we focus attention on one approach that The EVALUATION Project finds particularly promising: an adaptation of the random effects model to the types of data that are typically available in DHS–type surveys supplemented by selected family planning program information.

Although random effects models are generally associated with longitudinal (as opposed to cross–sectional) data, recent methodological

[22] The term “interaction” refers to the statistical dependence of a given factor or variable on other variables; for example, the effects of adding CBD points might depend upon the degree of access to fixed clinics in a given setting. Interactions between program and individual factors, between program and community factors, and between individual and community factors are possible.
[23] In multilevel models, both community– and individual–level errors are present. For the more statistically inclined reader, the cross–sectional multilevel model is written out in the form of a regression equation in Appendix A. The reader will note the specification of both community– and individual–level error terms in the regression equations.
[24] Endogenous factors or variables are independent or predictor variables that are determined by the same set of factors or the same decision–making processes that determine the outcome variable being studied. Exogenous factors, by contrast, are variables that are not determined by factors that also influence program variables. For example, labor force participation is endogenous in a regression equation predicting contraceptive use, since both labor force participation and contraceptive use are “choice” variables for individual women that are influenced by common factors (e.g., education, household size and structure). The community wage rate for women would be an example of an exogenous factor in such a regression, since this variable is determined by factors that are not subject to individual choice.
[25] It will be noted that the basic cross–sectional model described above can also be estimated within a random effects framework depending upon the assumptions made about the error terms. The distinguishing feature of the random effects approach described in the next section is the use of multiple equations that are estimated simultaneously.
[26] Note that other approaches are also available but are viewed by The EVALUATION Project as being less promising than the approaches presented in this publication. The interested reader is referred to Mroz and Guilkey (1992) and Bollen et al. (1995) for critical appraisals of these.
Figure IV–8

Study Design
The modified random effects model was recently applied to data from the 1988/89 Zimbabwe Demographic and Health Survey and the 1989/90 Zimbabwe Service Availability Survey in order to measure the effects of access to family planning services on contraceptive use. Survey data on 2,050 currently married women were used in the analysis, along with community–level and family planning service data for 167 communities (i.e., DHS sample clusters). The service data contained measures of physical access to service delivery/supply points, as well as a series of measures of service delivery system preparedness and functioning (e.g., presence of electricity and running water, numbers of staff trained in family planning, availability of contraceptive methods and supplies, and courteousness of staff). The statistical model employed consisted of a system of four equations. The outcome variables for the four equations were: (1) probability that a survey respondent has had “r” births over the course of her reproductive career, (2) probability that a survey respondent has had “r” infant/child deaths over the course of her reproductive career, (3) fertility intentions, and (4) current contraceptive method. The independent variables in the four equations consisted of exogenous individual– and household–level variables, selected community characteristics, and a set of variables measuring various aspects of the supply environment for family planning services (measured at the community level). Each equation also included a parameter representing unobserved factors hypothesized to influence all four outcome variables. The equations were estimated simultaneously using a full–information maximum likelihood estimator.

Results
The following individual–level variables showed statistically significant effects on the use of modern contraceptives in the reduced form equations: respondent’s age, religion, years of education, and whether the respondent resided in a commune. Two community–level variables also emerged as significant determinants of contraceptive use: educational opportunities available in the community and the presence of a CBD point in the community. Most of the unobserved heterogeneity parameters were statistically significant, indicating that their omission from the model would have resulted in biased estimates of the effects of the other variables in the model. Simulations based upon the regression results revealed the following estimated effects of CBD points:

                                 Proportion Using
Variable and Condition      No Method   Modern Method   Traditional Method
Actual Values                  .47           .44               .09
CBD Point in Community         .44           .47               .09
No CBD                         .50           .40               .10

Interpretation
The strongest predictors of modern contraceptive use were individual–level characteristics. Of the family planning program variables tested, only the presence of a CBD point in the community had a significant effect on modern contraceptive use. Simulations indicate that if CBD points were to be established in every community, modern–method contraceptive prevalence would be expected to increase by about 7 percent from observed levels (from .44 to .47) and by 17 percent from the level that would prevail if there were no CBD points (from .40 to .47) when the effects of other factors are taken into account.
Figure IV–9

Study Design
The cross–sectional random effects model was recently applied to data from the 1991 Tanzania DHS and accompanying Service Availability Module. The focus of the analysis was on estimating the fertility impact of family planning in Tanzania over the 1969–1991 period, and was based upon household survey data from 5,215 women residing in 242 rural sample clusters who were under age 35 in 1991. Information on the locations of health facilities, family planning service availability and time of initiation, and other service delivery characteristics was provided by the SAM. Because reliable data series were not available in Tanzania, it was necessary to estimate fertility and child mortality levels and trends from the retrospective histories gathered in the DHS and to establish the dates of initiation of family planning services by “backdating” from the DHS SAM data. Historical figures on government health expenditures were, however, available and were used in the analysis.

The outcome variable used in the study was the probability that a survey respondent had a birth in any given year i during the study period. The following variables were included as predictor or independent variables: (1) age of respondent in year i, (2) education level, (3) whether there was a hospital, health center, and dispensary located within 30 km. of the sample cluster in year i, (4) the district–level child mortality rate for year i, (5) whether family planning services were available at hospitals, health centers, and dispensaries located within 30 km. in year i, (6) the duration of family planning service availability at hospitals, health centers, and dispensaries in year i, and (7) whether family planning services were available at hospitals, health centers, and dispensaries located within 30 km. when the respondent was 12 years old.

Since dispensaries and (to a lesser extent) health centers had been deployed in areas with high child mortality over the past 10–15 years as a matter of policy, it was crucial to account for non–random program placement in the analysis. Accordingly, three separate equations were estimated in which the outcome variable was whether or not there was a hospital, health center, and dispensary (respectively) located within 30 km. of the cluster in year i. Predictor variables in these equations included: (1) the district–level child mortality rate in year i, (2) whether family planning services were offered by other facilities located within 30 km. in year i, (3) government expenditures on health in year i, and (4) the population of the district in which the sample cluster was located in year i. These equations were then estimated simultaneously with the outcome (i.e., conception) equation in order to estimate the impact of the Tanzanian family planning program while controlling for other observed and unobserved factors. A full information maximum likelihood estimation procedure was used in the analysis.
Interpretation: If family planning had been available continuously at all three types of facilities over the study period (with all other factors held constant at observed levels), annual birth probabilities and the mean number of children born per woman would have been 37 percent lower than those observed for this period (0.120 versus 0.181).
each health center, etc.) and comparing the resulting “expected” values of the outcome variable(s) under study with the results based upon actual values for these variables measured in a survey.30
30 Note that in the example in Figure IV–8 based upon Zimbabwe data, an equation for program placement is not included in the model, and it is thus necessary to assume that the family planning program is exogenous. This type of model is useful in examining the contributions of programs to contraceptive use that result indirectly from program effects on fertility intentions.
The Fixed Effects Panel Model
The second alternative approach to dealing with the problem of unobservable factors influencing program placement decisions, the fixed effects panel model, extends the basic ideas of the cross–sectional model to the case where observations are obtained for the same sample of individuals or communities at two or more points in time. Data sets where the same sample of individuals are followed over time are relatively rare. However, where communities or clusters are used as the units of analysis, much of the required data may be obtained from successive Demographic and Health Surveys that include Service Availability Modules undertaken in the same sample clusters.31 This design addresses the question: do communities experiencing the greatest changes in the family planning supply environment between two points in time also show the greatest change in contraceptive use (or other outcomes), controlling for changes in other factors?
31 For example, successive rounds of DHS have been undertaken in the same sample clusters in Morocco (1987, 1992, and 1995) and Tanzania (1991 and 1994), albeit for half–samples in Morocco in 1995 and Tanzania in 1994. In Morocco, an attempt was made to re–interview the same women in 1995 as in 1992, resulting in a “true” panel of individual women.
Figure IV–10
Illustrative Impact Evaluation Design Using the Fixed Effects Panel Model, Tanzania
The multilevel panel design will be used to measure the impact of National Family Planning Program (NFPP) efforts during the 1991–96 period in Tanzania. The period covered by the impact evaluation coincides rather closely with the project period for the USAID/Tanzania Family Planning Services Support (FPSS) Project (1990–97). A Demographic and Health Survey was conducted in 1991/2 and another is planned for 1996. In part to compensate for the limited routine service statistics available, a smaller–scale interim or “mid–term” DHS (known as the Tanzania Knowledge, Attitudes, and Practices Survey, or TKAP) was undertaken in 1994. All three survey rounds (1991/2, 1994, and 1996) have been or will be conducted in the same sample clusters and will include Service Availability Modules to provide measures of program strength/activity (and changes therein) at the cluster level.
The focus of the analysis will be on assessing the nature and magnitude of changes in program service delivery at the community (or cluster) level during the 1991–96 period and the role that such changes play in influencing contraceptive behavior, fertility levels, and other outcomes of interest, controlling for the effects of changes in other factors.
The 1991/92 DHS estimated unmet need for family planning at 30 percent of currently married women. In recent years FPSS has invested heavily in training, as well as commodities and logistics management. Thus, it is anticipated that large effects will be observed in subsequent surveys for “supply environment” variables such as “presence of trained staff,” “number of stock–outs in last six months,” and “number of contraceptive methods offered at the facility.” It is hypothesized that increases in contraceptive use and decreases in unmet need for family planning will be larger in areas where the greatest improvements in the family planning supply environment have taken place.
The basic model may be written as:
Changes in the Outcome Variable = Changes in Program Factors + Changes in Individual Factors + Changes in Community Factors +/– Error.
A key advantage of the panel design is that it permits the use of a particular estimation procedure referred to in the literature as a “fixed effects” estimator. Under this approach, variables or factors are divided into two categories:
■ those that vary during the time period for the evaluation study, or time–varying factors, and
■ those that do not vary during the study period, or time–persistent or fixed factors.
Unobserved factors that may have influenced the allocation of program resources prior to the study period are treated as “fixed” in the model and “differenced out” of the regression equations,
thus reducing the risk of estimation bias.32 This “differencing” procedure is illustrated in Appendix A, where the fixed effects panel model is written out in regression format.
32 The model is focused on measuring the population response to changes in the program during the period of time considered in the evaluation study. Program resource allocation decisions and investments made at earlier points in time are assumed not to be causes of changes in outcome measures during the evaluation study period except through delayed or lagged responses.
The logic of the multilevel panel design and the types of data needed for its application are illustrated in the case of Tanzania in Figure IV–10.
Data Requirements
The data requirements for multilevel models are rather demanding. The following types of data are needed in order to construct sound cross–sectional models of family planning program outcomes:
■ Household– and individual–level survey data (e.g., from DHS–type surveys) on:
➤ demographic and economic characteristics;
➤ fertility preferences and intentions; and
➤ current contraceptive use, fertility, and/or other “outcome” measures.
■ Information on community–level determinants of fertility and fertility demand (usually obtained from community–level surveys):
➤ labor market conditions and wage rates;
➤ community infrastructure; and
➤ demographic indicators for prior years (see below for further discussion).
■ Information on the supply environment for family planning services, usually obtained from facility surveys or censuses such as the DHS Service Availability Module, program statistics, and/or community–level surveys:33
➤ number and types of health and family planning facilities within a fixed distance of each sample cluster (e.g., 30 km.);
➤ services and contraceptive methods available;
➤ length of time services and methods have been available; and
➤ measures of service quality.
33 The Population Council’s Situation Analysis is another example of the type of facility survey that may be used to gather much of the required program–level information.
Especially important in the random effects models is the availability of information on community–level characteristics that may have been important determinants of prior program location/resource allocation decisions (e.g., fertility and mortality levels, standard of living indicators, etc.).
Few (if any) surveys collect the full range of data required for meaningful applications of this approach. However, Demographic and Health Surveys that include a Service Availability Module (SAM) generally contain a good deal of the information needed. These can often be supplemented with the additional program and community–level economic data needed at relatively low marginal cost in order to provide an adequate database for estimating multilevel models. Other surveys with comparable information content may also be used.
The data requirements for applying the fixed effects panel model are largely the same as for the cross–sectional model, with the exception that data at two or more points in time on the same individuals or the same clusters are needed. Ideally, DHS data collection in a given country would be timed to correspond to cycles of family planning program activities (e.g., surveys at the beginning and end of a 5–7 year cycle). In this way, multipurpose surveys such as the DHS could serve as the basic mechanism for gathering the required information for measuring impact during a given program cycle. However, it is not necessary that the timing of rounds of DHS correspond exactly to program cycles. As survey rounds accumulate, the time frame covered by the multilevel panel study could be extended to compare program effects across different stages of program implementation. In addition, conducting cross–sectional multilevel analyses at different time points may be used to show that different inputs are important to achieving impact at different
stages of program development (effects which may not show up when aggregating across program phases).
Strengths
The multilevel regression approach has a number of strengths:
■ Since the approach relates program input measures to outcomes at the community level, it permits the measurement of impact of the program as it is actually implemented.
■ It does not require an experimental design.
■ It provides more detailed information on the pathways through which programs influence contraceptive behavior than any other approach.
Limitations and Practical Considerations
There are also several important limitations and constraints in the use of multilevel regression models:
■ The approach is demanding in terms of data. Practically speaking, large–scale population–based surveys comparable to the DHS are required. This is especially true when panel models are to be estimated, as the units of analysis for such analyses are communities or clusters. Unless a large enough sample of clusters is available, the study design may lack sufficient statistical power to detect program effects of the magnitude that are likely to occur over relatively short periods of time such as five years. For the cross–sectional model, the availability of community–level data on factors that may have influenced prior program location decisions is crucial if unbiased estimates of program impact are to be obtained. Measures of program activity, which usually are derived from facility–based surveys, are also required.
■ The method is sensitive to the use of appropriate statistical models and to the proper treatment of statistical estimation problems. Suitably trained personnel and appropriate computer software are also needed in order to carry out multilevel analyses.
■ Developing and estimating the statistical models require knowledge of the setting. Because the specific variables included in the models will vary from case to case, there is no standard set of variables to apply. Knowledge of the social setting and evolution of health and family planning services is needed to inform the selection of appropriate community level and program variables.
■ The models are sensitive to the timing of program investments in relation to the period of observation in impact assessments. Since some specifications of the random effects model relate family planning program indicators to outcome measures in the recent past (see Figure IV–9 for an illustration of this point), accurate information on the timing of past program changes is crucial to the accurate estimation of program impact. Similarly, since the fixed effects model relates changes in program variables to changes in outcome indicators during specific time intervals, the absence (or minimal levels) of change in program measures during a particular period studied will logically result in estimates of minimal program impact.34 The fixed effects model will also not pick up on program investments made prior to the period under study. For example, if a country had made extensive investments in family planning in the 1970s and early 1980s and had basically maintained strong support of these activities thereafter, the country might exhibit fairly high levels of contraceptive prevalence. Yet if one were to apply the fixed effects model with data from the late 1980s and early 1990s, the analysis might show relatively little impact, because the strong investment from earlier years would not be reflected in the measures of program change. This phenomenon might explain, in part, the relatively small program impact found in the Gertler and Molyneaux (1994) analysis of the Indonesian family planning program during the mid–1980s. Thus, caution needs to be exercised in applying the fixed effects model in countries with relatively advanced programs, where new
34 In one sense, this is a generic evaluation design issue; an analogous problem arises in attempting to measure the outcome of a randomized experiment too soon after program implementation. It is highlighted here in connection with the fixed effects model to emphasize its importance when interpreting the results of impact studies based upon this approach.
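The “differencing” logic behind the fixed effects panel model discussed above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the estimation procedure used in any study cited here: all data are simulated, the effect size of 0.5 and every coefficient are hypothetical, and program changes during the study period are assumed to be unrelated to the unobserved community factor.

```python
import random

random.seed(0)
N_CLUSTERS = 2000
TRUE_EFFECT = 0.5  # hypothetical effect of one unit of program strength

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

prog1, prog2, out1, out2 = [], [], [], []
for _ in range(N_CLUSTERS):
    u = random.gauss(0, 1)             # unobserved, time-persistent community factor
    p1 = 0.8 * u + random.gauss(0, 1)  # earlier placement correlated with u
    p2 = p1 + random.gauss(0, 1)       # program change during the study period
    prog1.append(p1)
    prog2.append(p2)
    out1.append(TRUE_EFFECT * p1 + u + random.gauss(0, 0.5))
    out2.append(TRUE_EFFECT * p2 + u + random.gauss(0, 0.5))

# Single cross-section: the fixed factor u is omitted and biases the slope.
cross_sectional = ols_slope(prog2, out2)

# Changes on changes: u appears in both rounds and cancels in the difference.
d_prog = [b - a for a, b in zip(prog1, prog2)]
d_out = [b - a for a, b in zip(out1, out2)]
first_difference = ols_slope(d_prog, d_out)

print(f"cross-sectional slope:  {cross_sectional:.2f}")
print(f"first-difference slope: {first_difference:.2f}")
```

In this setup the cross–sectional slope overstates the program effect because placement was correlated with the unobserved factor, while the regression of changes on changes recovers a value close to the assumed 0.5. The same setup also shows the limitation noted above: if program strength barely changed between rounds, the differenced regression would have little variation to work with.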
Figure IV–11
Approach: Observations
Randomized Experiments: The “gold standard,” but generally impractical due to difficulties in establishing and maintaining controlled experimental conditions in national–level studies.
Quasi–experiments: More practical since random assignment is not required. However, more vulnerable to selection bias and other threats to validity than randomized experiments. Use of quasi–experimental designs along with methods to control for unobserved heterogeneity holds considerable promise.
Multi–Equation Random Effects: Refinement of the basic cross–sectional model that provides a way to account for the possible endogeneity of program location/placement. Requires additional data on factors or variables that might be correlated with program placement decisions. Statistical estimation procedures are complex.
Fixed Effects Panel Model: Given household and facility data from the same geographic areas at two or more points in time, provides an alternative (and computationally simpler) method for addressing the problem of the endogeneity of program placement. Because sample clusters are the unit of analysis, generally has less statistical power than the other models.
two less preferred but nevertheless useful alternatives are presented:
■ a specific type of decomposition and
■ the prevalence method.
The decomposition approach requires the collection of data at two or more points in time, while the prevalence method is cross–sectional in nature.
Decomposition (Proximate Determinants Model)
Description
Strictly speaking, decomposition is a generic technique of demographic or epidemiologic analysis as opposed to a method designed specifically for program impact evaluation purposes. Here, we focus on a specific method of decomposition, the approach proposed by Bongaarts and Kirmeyer (1980), which has several features that make it a more valuable tool for program impact assessment than generic decomposition methods.
surveys in a given population.38 Other than the availability of data at two or more points in time and information on source of contraceptive services/supply, there are no special design requirements for the use of the method.
The measurement of program impact proceeds from the decomposition of total fecundity into components as proposed by Bongaarts and Kirmeyer (1980) and Bongaarts and Potter (1983). The underlying model is a multiplicative model in which the factors considered are expressed as an index measuring their fertility inhibiting effects; that is, as measures of the extent to which each factor contributes to the difference between total fecundity and total fertility. The model is depicted graphically in Figure IV–12.
In the cross–section, the model may be written as:
TFR = TF * Cm * Cc * Ci * Ca
Where:
TFR = observed total fertility rate;
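As a rough check on how the multiplicative model works, the sketch below multiplies out the indices reported for the two Philippine surveys in Figure IV–13 and compares the result with the reported TFR. It assumes the usual reading of the Bongaarts indices (Cm for marriage, Cc for contraception, Ci for post–partum infecundability, Ca for abortion) and sets Ca to 1 because Figure IV–13 reports no abortion index; the function and variable names are illustrative only.

```python
def model_tfr(tf, cm, cc, ci, ca=1.0):
    """TFR implied by the multiplicative proximate determinants model."""
    return tf * cm * cc * ci * ca

# Values as published in Figure IV-13 (Philippines).
surveys = {
    "1978 survey": dict(tfr=5.60, cm=0.599, cc=0.778, ci=0.761, tf=15.77),
    "1982 survey": dict(tfr=5.28, cm=0.599, cc=0.713, ci=0.778, tf=15.90),
}

implied = {}
for name, s in surveys.items():
    implied[name] = model_tfr(s["tf"], s["cm"], s["cc"], s["ci"])
    print(f"{name}: reported TFR {s['tfr']:.2f}, implied TFR {implied[name]:.2f}")

# Simple proportional change in each component between the two surveys.
first, second = surveys["1978 survey"], surveys["1982 survey"]
pct_change = {k: (second[k] / first[k] - 1) * 100
              for k in ("tfr", "cm", "cc", "ci", "tf")}
for k, v in pct_change.items():
    print(f"{k}: {v:+.1f}%")

# Share of the published contraception effect credited to the program,
# using the approximately 50 percent public sector share cited in the figure.
contraception_effect = -8.5   # percentage points, as published
public_sector_share = 0.50
program_effect = contraception_effect * public_sector_share
print(f"program-attributable change in TFR: {program_effect:.2f} points")
```

The simple ratios computed here are close to, but not identical with, the contributions published in Figure IV–13, which come from the full decomposition procedure rather than from index ratios alone.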
■ The method provides only a measure of gross impact; that is, it does not account for source substitution and program catalytic effects (i.e., increases in non–program contraception that are the result of program promotional efforts).
Figure IV–13
Illustrative Application of the Proximate Determinants Decomposition Method to Successive Surveys in the Philippines
Survey Round and Reference Date of Estimates                      TFR    Cm     Cc     Ci     TF
1978 Republic of the Philippines Fertility Survey (1973–1977)     5.60   0.599  0.778  0.761  15.77
1982 National Demographic Survey (1978–1982)                      5.28   0.599  0.713  0.778  15.90
Change in TFR: Absolute –0.32; Pct. –5.6
Percentage point change in TFR contributed by changes in: Nuptiality 0.0; Contraception –8.5; Infecundability +2.1; Residual +0.8
Interpretation: During the period covered by these two surveys, total fertility declined 5.6 percent. The decline in TFR is explained entirely by an increase in contraceptive use. In fact, if only contraceptive use had changed during this period, TFR would have declined by 8.5 percentage points. However, changing levels of post–partum infecundability resulting from a decline in duration of breastfeeding exerted an upward influence on the TFR of 2.1 percentage points. Unspecified or residual factors also contributed to an increase of total fertility of 0.8 percentage points. The contribution of changes in abortion rates could not be assessed in this example due to the lack of appropriate data. As the survey data indicate that the public sector provided approximately 50 percent of contraceptive services and supplies during this time period, a reduction in the TFR of approximately 4 percent may be attributed to the public sector family planning program.
Source: Casterline et al., 1988.
Prevalence Method
Description
The final method considered, the prevalence method, provides a rough estimate of program impact when the collection of data at two or more points in time and the conduct of “posttest only” randomized experiments are not feasible. The prevalence method is a cross–sectional method designed to take advantage of the wide availability of survey data on contraceptive prevalence in developing country settings. Using survey data on contraceptive prevalence by source of supply or service (i.e., program vs. non–program), current age–specific fertility, and selected proximate determinants of fertility,40 the method estimates the portion of the difference between potential fertility41 and observed fertility that may be attributed to program contraception. This, in turn, may be converted into two estimates of program impact: (a) the reduction in fertility rates and (b) the number of births averted during a specified interval of time (usually a year) resulting from program contraception. The method is based upon the same model of the quantitative relationship between fertility and its proximate determinants described above in connection with the decomposition method (see Bongaarts, 1986, for a full presentation of the method).
40 Specifically, information on proportions of women of reproductive age married or in union and average length of post–partum insusceptibility (often indexed by mean length of breastfeeding) are used. Information on abortion rates may also be used, where available.
41 Potential fertility is defined as the level of fertility that would prevail in a given population in the absence of contraception.
Design and Analysis
The prevalence method is relatively non–demanding in terms of data requirements in the sense that the required data are normally gathered as part of DHS–type surveys. The basic data needed are:
■ estimates of contraceptive prevalence for a specified point in time, by five–year age groups and source of supply;
■ age–specific fertility rates for a given reference period;
■ number of women of reproductive age, in five–year age groups; and
■ total population size.
Setting–specific estimates of use–effectiveness, ideally by method and source, and age–specific
fecundity rates are also useful, but only rarely available. Where such data are not available for a particular setting, standard schedules may be used (Bongaarts, 1986).
Under the method, program impact is measured by the share of the difference between potential fertility and actual fertility during a specified reference period (e.g., the 1 or 3 years prior to a survey) that is accounted for by program contraception.
Two impact measures are normally produced in applications of the method: (1) the reduction in fertility rates (i.e., potential fertility minus observed fertility) attributable to program and non–program contraception, respectively, and (2) births averted by program and non–program contraception.
For illustrative purposes, Figure IV–14 provides estimates of effects on crude birth rates and of
Figure IV–14
Interpretation for a selected country: In the absence of the national family planning program in Indonesia, the crude birth rate
would have been 9.6 births per 1,000 population higher than the recorded 1979 CBR of 33. Non–program contraception reduced the CBR
another 2.0/1,000. Over 1.4 million births are estimated to have been averted in 1979 due to program contraception, and nearly 300,000
more births were averted due to non–program contraception.
Source: Bongaarts, J. 1986. “The Prevalence Method.” in United Nations Manual IX Addendum: The Methodology of Measuring
the Impact of Family Planning Programmes on Fertility. New York: Department of International Economic and Social Affairs,
pp. 9–14 (Table 9).
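The link between the two impact measures in the Indonesian example (a reduction in the crude birth rate and a number of births averted) is a simple scaling by population size. The sketch below shows that conversion. The population figure is an assumption chosen for illustration, since the interpretation above does not state it, and the helper function is hypothetical rather than part of the published method.

```python
def births_averted(cbr_reduction_per_1000: float, population: float) -> float:
    """Convert a reduction in the crude birth rate into births averted in one year."""
    return cbr_reduction_per_1000 / 1000.0 * population

OBSERVED_CBR = 33.0                # recorded 1979 CBR, from the interpretation above
PROGRAM_REDUCTION = 9.6            # births per 1,000 averted by program contraception
NON_PROGRAM_REDUCTION = 2.0        # births per 1,000 averted by non-program contraception
ASSUMED_POPULATION = 146_000_000   # assumption for illustration; not given in the source

program_births = births_averted(PROGRAM_REDUCTION, ASSUMED_POPULATION)
non_program_births = births_averted(NON_PROGRAM_REDUCTION, ASSUMED_POPULATION)

# CBR that would prevail with no contraception at all, under this reading.
potential_cbr = OBSERVED_CBR + PROGRAM_REDUCTION + NON_PROGRAM_REDUCTION

print(f"Births averted by program contraception:     {program_births:,.0f}")
print(f"Births averted by non-program contraception: {non_program_births:,.0f}")
print(f"CBR in the absence of contraception:         {potential_cbr:.1f} per 1,000")
```

With a population of this order of magnitude, the conversion gives results in line with the figures quoted in the interpretation (over 1.4 million and nearly 300,000 births averted).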
Figure IV–15
Columns: Characteristic a; Random Experiments; Quasi Experiments; Multilevel Regression (Single Survey); Multilevel Regression (Panel); Decomposition; Prevalence Method
Developing an Implementation Plan
Chapter V
Developing an
Implementation Plan
The previous chapters in this manual stress the technical aspects of designing an evaluation. However, an evaluation plan will be of little worth if there is not a clearly defined plan for its implementation. This chapter outlines the issues to be covered in developing an implementation plan for either program monitoring or impact assessment.
DEFINING THE INSTITUTIONS AND INDIVIDUALS RESPONSIBLE FOR EVALUATION
Generally, one institution takes the lead in designing and implementing the evaluation of a national family planning program, although it is often necessary to enlist the assistance of other service delivery or research organizations. In some cases the impetus for a large scale evaluation comes from the country itself; in others, it is a donor requirement. Perhaps the most common scenario is that program administrators and donors have a mutual interest in learning whether the program is on track and how it might be further strengthened.
The lead organization is often the major service provider in the country, especially if it is a governmental institution (e.g., the Ministry of Health). Alternatively, the private family planning association may take the lead, especially if it is a major player in service delivery and/or has a strong research/evaluation capability. Whoever has the prime responsibility for the evaluation, it is important to identify and involve other stakeholders.
Involving Key Stakeholders in the Planning Process
To maximize the benefit and utility of evaluating the national family planning program, it is important to include the major stakeholders in the process from the start. “Stakeholders” are all organizations or individuals who could potentially be interested in how the evaluation is carried out, what the results show, and how the information might subsequently be used. Also, the list should include any institutions expected to contribute data to the effort. Such potential stakeholders include the following:
■ Official government offices responsible for monitoring population phenomena, especially in countries that have set demographic targets to be attained:
➤ Ministry of Planning
➤ National Population Council
➤ Other
■ Organizations that provide family planning services, including:
➤ The Ministry of Public Health
➤ The IPPF affiliate
➤ Other major NGOs
➤ Subsidized contraceptive social marketing programs
➤ Private sector firms that market contraceptives
➤ Associations of private providers (local OB–GYN society, midwives association)
■ Donor agencies that support the program
■ Women’s health and other advocacy groups
Ultimately, if the stakeholders do not perceive the data and analysis to be useful for the kinds of decisions they need to make about program design and implementation, the results may never be used for their intended purpose but rather may be ignored or discredited by those the evaluation is intended to assist.
Defining Technical Needs and Identifying Available Sources In–Country
A large scale evaluation of a family planning program, especially if it includes impact assessment, requires technical expertise in study design, preparation of data instruments, supervision of data collection, editing and processing of the information, data analysis, and report preparation.
methods in areas identified by the evaluation to be underserved) may provide an example to other institutions as to how academic research can be applied to improve programs. Moreover, the sense of common purpose developed through the evaluation process may serve to reinforce collaboration in the area of service delivery, even among groups that do not generally work together.
Figure V–I
1 Often done to respond to specific concerns that may not be known at onset of programs. Thus, time and funds should be allocated
for such activities, but the exact nature and timing of the studies should be determined by actual need.
Examples: Staff salaries and fringe benefits, office space, office equipment such as computers and photocopier, vehicles for field work, etc.
■ (If not covered by the above) Estimate the personnel and other direct costs for coordinating the different components of the evaluation.
Example: The salary and fringe benefits of an individual employed to collect and synthesize service statistics from different branches of the program or from different participating agencies.
■ Estimate the costs for each individual data collection activity to be conducted as part of the overall evaluation.
Example: If the evaluation were to consist of (1) monitoring routinely collected service statistics, (2) conducting a situation analysis at the start and end of the five–year project cycle, (3) conducting a DHS at the start and end of the five–year project cycle,45 (4) applying the COPE methodology in 30 SDPs, and (5) analyzing the cost per CYP by type of service delivery mechanism, then it would be important to budget each item separately, taking care not to double–count items that would appear in more than one category, such as the purchase of microcomputers with multiple uses.
45 In the case of Situation Analysis and DHS, the costs will vary greatly by country; budgets need to be developed in conjunction with personnel from these projects with experience in the budgeting process specific to these activities.
■ Estimate the costs for data processing and analysis.
Example: In–house personnel may be able to handle the processing and reporting for routinely collected service statistics. By contrast, there are generally substantial costs associated with the processing of situation analysis, DHS data, and multilevel analyses to assess impact.
■ Estimate the costs for dissemination of results.
Example: As will be described in Chapter VI, there are multiple channels for dissemination. Given the importance of this final step of the process, it is essential that funds be budgeted for this purpose from the start.
Budgeting is a skill that may be unfamiliar to some evaluators. Just as one seeks out specialized assistance for sampling, study design, etc., it may also be desirable to seek out the assistance of those with expertise in budgeting in a given institution to bring reality to the budget estimates and ensure that all line items are included.
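The advice to budget each activity separately while not double–counting shared items can be sketched as a small tally. Every cost figure and line item name below is hypothetical and serves only to show the bookkeeping; real budgets would be built with the institution's own budgeting staff.

```python
from collections import defaultdict

# (activity, line item, cost); all figures are hypothetical.
line_items = [
    ("service statistics", "data clerk salary", 12_000),
    ("service statistics", "microcomputers", 8_000),
    ("situation analysis", "field teams", 40_000),
    ("situation analysis", "microcomputers", 8_000),  # same machines as above
    ("DHS", "field teams", 250_000),
    ("COPE in 30 SDPs", "facilitators", 15_000),
    ("cost per CYP analysis", "analyst time", 10_000),
]
SHARED_ITEMS = {"microcomputers"}  # bought once, used by several activities

by_activity = defaultdict(int)
shared_seen = set()
for activity, item, cost in line_items:
    if item in SHARED_ITEMS:
        if item in shared_seen:
            continue  # already budgeted under shared resources
        shared_seen.add(item)
        by_activity["shared resources"] += cost
    else:
        by_activity[activity] += cost

total = sum(by_activity.values())
naive_total = sum(cost for _, _, cost in line_items)

for activity, cost in by_activity.items():
    print(f"{activity:<24}{cost:>10,}")
print(f"{'total':<24}{total:>10,}")
print(f"double-counting avoided: {naive_total - total:,}")
```

Keeping shared purchases in their own category makes the overlap visible, so each activity's budget can still be reviewed on its own.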
Disseminating and Utilizing the Results
Chapter VI
Disseminating and
Utilizing the Results
“What does this mean to me?” Thus, it is essential for the evaluator not only to present the results, but also to interpret their relevance to ongoing programs. In many cases, the audiences in question will grasp the program implications without their being clearly articulated (e.g., contraceptive prevalence is 50% in urban areas, compared to 20% in rural areas). However, most members of the audience will benefit from having the evaluators state the obvious, if only to reinforce their own interpretation of the findings. Moreover, this approach to the presentation of data makes it seem more practical and useful or “less academic.”

The fourth audience, the research community, will in many cases also be interested in the programmatic implications, especially in journals or conferences that are applied in nature.

ESTABLISHING A FORUM FOR PRESENTING RESULTS THAT TRANSLATE TO ACTION

There are a growing number of well–trained program administrators and program managers who can read a report containing statistics and instantly derive what it all means to their program. However, it is common that those in a position to act on the findings will benefit greatly from the opportunity to:
■ discuss and better understand the findings;
■ internalize this information;
■ reconcile this “new information” with their understanding of the program as it currently operates;
■ verbalize the implications of the findings in their own terms;
■ identify actions that address the situations uncovered by the evaluation; and
■ arrive at a plan of action to capitalize on areas of strength and improve on areas of weakness.

The ideal is a forum that promotes discussion and interaction among key individuals in a position to influence future program direction. The process of group dynamics will cause those present to question the results, seek to understand the underlying factors that explain the findings, and through this process, place greater importance on them. However, there may be two exceptions to the value of collective interpretation of the results. First, if the persons present feel defensive about the results, then this type of session may be counter–productive. Second, if any aspects of the data are of questionable validity, then the discussion may focus more on what’s wrong with the data than what’s wrong with the program.

REMAINING INVOLVED IN THE PROCESS

All too often, evaluators are called in to design and conduct an evaluation; they present their results, conclude with a list of recommendations, and leave. In the ideal scenario, the program administrator or manager will try to incorporate the recommendations in restructuring the program. In many other cases, the old adage holds true: out–of–sight, out–of–mind.

There are three reasons why it is valuable for evaluators to remain in contact with program managers over the period of implementing changes based on the evaluation results.
■ The evaluator is available for further consultation regarding the recommendations. For example, where the results of multilevel regression point to expected changes in outcome indicators that would result from implementing certain program actions, the program administrator may seek further clarification on the modifications in service delivery that would be needed to bring about the change.
■ Regular contact between the program administrator and evaluator serves as an important reminder that the (original) evaluation was done for a purpose: to identify areas for further improvement. In direct contrast to “out–of–sight, out–of–mind,” the presence of the evaluator serves to promote action in the areas identified by the evaluation.
■ The evaluator can assist in setting up subsequent designs to determine whether the instituted changes in service delivery in fact bring about the expected changes in service utilization, contraceptive prevalence, etc. As such, this reinforces evaluation as an ongoing process, not a one–time event.
Chapter VII
Adaptations to
Other Reproductive
Health Interventions
■ STD/HIV Prevention
■ Safe Pregnancy
■ Breastfeeding
■ Women’s Nutrition
■ Adolescent Reproductive Health Services
In the wake of the 1994 Cairo International Conference on Population and Development, “family planning” is rapidly expanding from a narrow focus on contraception to a broader range of reproductive health services. As program administrators and service providers move increasingly into the realm of reproductive health, evaluators must address the question: how does one evaluate these interventions?

This chapter focuses on the following areas of reproductive health: STD/HIV prevention, safe pregnancy, breastfeeding, women’s nutrition, and adolescent reproductive health services. “Adolescent services” refers not to a separate health area, but rather to a specific target population. Nonetheless, with the growing recognition of the need for services for this group, it is useful to consider the implications for evaluating such programs.

The guidelines contained in Chapters I–VI of this manual for evaluating family planning programs are generally applicable to other reproductive health interventions. In terms of program–based measures, the indicators may change but the basic approach is similar across other areas of reproductive health. The evaluator is interested in monitoring:
■ policy environment;
■ the number and types of activities carried out in different functional areas;
■ access to services;
■ quality of care; and
■ level of service utilization (e.g., number of visits, number of new clients, volume of commodities distributed).

By contrast, there are marked differences between family planning and other areas of reproductive health in terms of target population, population–based measures of outcome, and feasibility of data collection. This chapter is organized to cover these issues for each area of reproductive health. It does NOT revisit the issues of demonstrating causality (except in the case of women’s nutrition) discussed in earlier chapters of this manual, which are equally applicable to family planning as well as to other areas of reproductive health.

It is important to recognize that family planning has been one of the most closely evaluated public health interventions in the international health arena. The existing literature reflects over 30 years of concerted effort to find the most methodologically sound yet practical means of evaluating this type of program. By contrast, attempts to evaluate other areas of reproductive health are more recent; much of the work has taken place since the mid–1980s. With greater implementation of reproductive health interventions, evaluators will become more knowledgeable in adapting existing evaluation methodologies to the special circumstances of these other programs.

STD/HIV PREVENTION

Target Population

Reproductive health interventions generally target the population at risk. For certain areas of reproductive health (e.g., family planning, safe pregnancy), this includes all women of relevant age groups in the country. Actual interventions tend to exclude those who by virtue of economic status are able to obtain services through private sources; however, success in a given country is measured in terms of the total population of women, or of married women, in the age range (e.g., contraceptive prevalence, percentage of women delivering under supervised conditions).
With respect to STD/HIV infection, by contrast, not everyone in the general population is at risk. STD interventions target (1) “spread–cluster groups” (those most responsible for maintaining the STD epidemic), (2) symptomatic individuals seeking relief for their symptoms, (3) individuals at high risk for infection due to behavior and/or biologic susceptibility, or (4) those already infected (e.g., unborn children of pregnant women with syphilis). In most countries, it would be a waste of resources to try to reach the general population and a highly ineffective means of reaching those at risk. Indeed, the risk of STD/HIV varies markedly within a given country, region, or even a city. The risk is related to (1) the level of STD/HIV prevalence in a given population or geographical area, and (2) the sexual norms and practices of the sub–populations. STD/HIV interventions tend to be targeted to groups of persons with high risk behaviors: commercial sex workers, truck drivers, migrant workers, and adolescents. Thus, programs need to be evaluated in relation to effects on the behavior of these populations.

In contrast to most other areas of reproductive health that target women of reproductive age, STD/HIV interventions must target both men and women. Some would argue that men are even more important than women, given their role in sexual (and other) decision–making in many countries. DHS–type surveys focus primarily on women of reproductive age, but in recent years have expanded to include males (as an independent sample of randomly selected males or a husband/partner sample). Given that STD/HIV risk behavior appears to be associated with marital or partnership status, it is preferable to collect independent male samples rather than the husband/partner samples in order to generate more valid national estimates of key dependent or independent variables relating to STD/HIV.

In short, there is no standard or conventional target population for STD/HIV interventions. It differs by program or country and must be assessed in each situation.

long–term measure of success should be the level of HIV infection in a given country. Similarly, it would be useful to track changes in the incidence of infection (i.e., rate of new infections).

However, to date it has proven virtually impossible to obtain data on HIV prevalence among a randomly selected sample of the general population in a given country for several reasons.47 First, testing for HIV requires biologic specimens (i.e., blood, urine, or saliva) that are more difficult to collect than the usual verbal responses in the context of a national survey. Second, the expense of this type of survey is considerable. Third and possibly most important, HIV testing poses several ethical dilemmas: can one measure HIV status and not inform respondents of the results? Is it ethical to inform persons that they are sero–positive but not be able to offer counseling or treatment? How does the program deal with the stigmatization that sero–positive individuals would experience in many societies?

Data are sporadically available from facilities that provide pre–natal and/or obstetric services; indeed, many of the commonly quoted data on HIV prevalence are based on this source of information. Although this might seem to be an easy means of obtaining data on HIV prevalence, in fact it has numerous limitations. First, the women using the services are not necessarily representative of the general population. Second, this type of testing would add an additional procedure for facilities that do not routinely test for HIV and are not necessarily set up to do so. Third, from the perspective of the women in delivery, such testing would mean an additional needle prick. Fourth, given that hospital records are often of poor quality, there is no reason to believe data on HIV status would be better, except in the context of a special research project. In sum, whereas data on the sero–status of women in tertiary facilities are useful for policy purposes (to reflect the magnitude of the problem in general terms), these data are not very satisfactory for evaluation purposes (Hassig, 1995).
Finally, while determining HIV prevalence may be the long–term objective, it is not a useful measure for evaluating program impact, as it does not change readily in response to changes in desired practices or behavior. That is, even if it were possible to stop all further transmission of HIV infection, this would not result in an immediate decrease in HIV prevalence. To cite one example based on modeling in a hypothetical high–incidence population, Mertens et al. (1994) estimate that if HIV incidence decreased by 25% over five years due to a successful prevention program, there would be minimal if any decrease in observed HIV prevalence in that same time frame.

In short, sero–prevalence surveys have not been used widely as a means of monitoring the AIDS epidemic and/or evaluating interventions. Even if such data were available, it would be difficult to associate changes in the incidence of HIV infection with program interventions (in the absence of a randomized experiment).

Some have argued that STD prevalence is a useful proxy for HIV infection. Indeed, in Thailand, data on decreases in STD rates are cited as a hopeful sign for decreasing the spread of HIV (Hanenberg et al., 1994). Because there are treatments, if not cures, for most STDs, the ethical dilemmas of testing for STDs are somewhat less than those associated with HIV. However, the technical/operational difficulties are actually more challenging. First, the available screening methods for most STDs (syphilis is the exception) are either expensive or require significant infrastructural support for preservation and transportation of specimens, as well as the preparation and actual testing of samples for presence of infection (e.g., LCR/PCR48 or culture for herpes or gonorrhea). Second, because several STDs are generally found in varying proportions in populations, measurement of a single STD may be inappropriate; evaluators would need to establish an STD profile at the outset if they were to track a single STD for evaluation purposes. Third, some STD screening tests require confirmatory testing to ascertain the nature (e.g., new, old, treated syphilis) of the infection, while some STDs (e.g., herpes) are chronic conditions (like HIV) and an individual’s sero–status will not change even if treatment or behavior change occurs. Fourth, while the existence of therapy/cure options for STDs does make testing for them more theoretically feasible, few countries are well–supplied with sufficient drugs to deal with the problem. Finally, only a few STDs can be identified by relatively easy invasive procedures such as venipuncture or finger prick samples. Most require swabs or other genitally collected specimens, which are not easy to obtain; however, urine–based testing or self–administered swabs offer potential means of making population–based testing more feasible.

Emerging Solutions to these Measurement Problems

There is widespread recognition of the importance of including men in evaluation research on STD/HIV. Indeed, the DHS and CDC Reproductive Health Surveys increasingly include male respondents.

However, the nature and scope of the STD/HIV intervention must be carefully considered when attempting to interpret any data from such surveys for evaluation purposes. For example, a prevention program that focuses on military and university males, migratory laborers such as truckers, and several geographically–defined populations of women at higher risk may have a major impact on the dynamics of the twin epidemics of STD and HIV by changing sexual behavior in those groups, yet yield little if any evidence of change at the population level of the DHS survey.49 Most STD/HIV prevention programs are neither funded nor implemented at a sufficiently high level (i.e., in terms of intensity and coverage) to have “national” impact.

For programs targeted to a specific sub–population, the sentinel site approach is emerging as a promising evaluation strategy. These pilot data collection efforts have focused on non–representative but programmatically relevant populations. In the above example, a program might set up a “behavioral surveillance survey” in the university and military populations that would

48 LCR refers to ligase chain reaction; PCR is a polymerase chain reaction. Both are DNA amplification techniques.

49 Thailand is an exception to this rule. It now appears that if coverage to “spread cluster groups” is high enough, it can have an impact among the general population.
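
The observation attributed above to Mertens et al. (1994), that prevalence responds slowly to a fall in incidence, can be illustrated with a toy calculation. The sketch below is not the Mertens et al. model. It is a minimal discrete-time bookkeeping exercise in Python, and the starting prevalence, the incidence rate, and the annual removal rate are hypothetical values chosen only for illustration.

```python
def project_prevalence(p0, incidence_by_year, removal_rate):
    """Return prevalence at baseline and at the end of each year.

    p0                -- initial proportion infected (hypothetical)
    incidence_by_year -- annual new infections per susceptible person
    removal_rate      -- annual proportion of infected persons who die
    """
    prevalence = [p0]
    for incidence in incidence_by_year:
        p = prevalence[-1]
        new_cases = incidence * (1.0 - p)   # infections among the susceptible
        removals = removal_rate * p         # deaths among the infected
        prevalence.append(p + new_cases - removals)
    return prevalence


# All values below are assumptions for illustration only.
YEARS = 5
P0 = 0.10        # 10% prevalence at baseline
I0 = 0.02        # 2 new infections per 100 susceptibles per year
REMOVAL = 0.08   # 8% of infected persons die each year

# Scenario 1: incidence stays constant (no effective program).
constant = [I0] * YEARS
# Scenario 2: incidence falls linearly to 25% below baseline by year 5.
declining = [I0 * (1 - 0.25 * year / YEARS) for year in range(1, YEARS + 1)]

no_program = project_prevalence(P0, constant, REMOVAL)
with_program = project_prevalence(P0, declining, REMOVAL)

for year, (a, b) in enumerate(zip(no_program, with_program)):
    print(f"year {year}: no program {a:.3f}, with program {b:.3f}")
```

Under these assumed values, prevalence keeps rising in both scenarios because new infections still outnumber deaths among the infected; the program only slows the rise. An evaluator looking at prevalence alone over five years would therefore see no decrease even though incidence fell by a quarter.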
needed to obtain sufficiently precise estimates and the biases associated with households being dissolved as a result of women having died. More promising are “indirect” estimates derived through the “sisterhood method.”50 Although less demanding in terms of sample size than direct estimates, relatively large samples are nevertheless required if geographically disaggregated estimates from the sisterhood method are desired. Another disadvantage is that estimates of the level of maternal mortality produced by the method refer to a point of time in the relatively distant past, usually 10 or more years prior to the survey date.

50 The sisterhood method is based upon reports of cases of sisters of survey respondents who died after age 15 of pregnancy–related causes. The method converts this information into an estimate of the maternal mortality ratio. Details on the method may be found in Graham et al. (1990).

Similar sources of data and general limitations apply to the perinatal mortality rate (the ratio of late fetal deaths plus early neonatal deaths per 1,000 live births during a designated period of time). However, because of recall problems with dates of late fetal and early infant deaths and serious under–reporting of deaths occurring soon after birth in many settings, it is questionable whether surveys can provide a suitable alternative to vital statistics and/or hospital records.

The case–fatality rate, which is defined as the ratio of the number of deaths from specific complications of pregnancy to the number of complicated obstetric cases presenting at a specific health facility during a designated period of time, measures the likelihood that a woman experiencing an obstetric complication will survive once she enters that health facility for treatment. Aggregated across facilities, it provides a direct measure of the capacity of the health system to deal with obstetric emergencies. However, in order to obtain estimates for the entire population, relatively complete coverage of all facilities offering emergency obstetric services is required.

A second group of indicators pertains to the characteristics of late fetal and infant deaths. For example:
■ percent distribution of perinatal deaths by age at time of death;
■ ratio of fresh to macerated stillbirths;
■ birth weight proportionate mortality rate; and
■ birth weight specific mortality rate.

The primary utility of these indicators is to provide information on the relative contributions of failures of different components of the service delivery process (i.e., prenatal care, delivery care, newborn care, infant care) to perinatal mortality and, accordingly, on the component(s) that require further strengthening.

Although both facility and survey data could theoretically be used to measure these indicators, it is questionable whether survey respondents in many developing countries can provide sufficiently accurate information on age/length of gestation at time of death and distinguish fresh from macerated stillbirths. If not, it is necessary to rely upon facility data. Even here, however, there are problems with determining exact ages at time of death. Since most births still are delivered at home in developing countries and birth weights are difficult to measure in the home, facility–based data do not necessarily provide the information needed to determine failures in health services.

The third group of indicators consists of what may be termed “knowledge/coverage” indicators. This group includes a series of typical KAP (e.g., proportion of adults knowledgeable about complications of pregnancy and childbirth, percent with knowledge of location of essential obstetric services, etc.) and “coverage” indicators (e.g., proportion of population residing within one hour’s travel time of a facility offering essential obstetric care [EOC], proportion of women attended at least once during pregnancy by trained medical personnel, etc.). In addition, one indicator that directly links supply and demand for obstetric services has been suggested as being especially useful: met need for emergency obstetric care. This indicator has been operationally defined as the proportion of women estimated to have direct obstetric complications seen in a facility that can provide emergency obstetric care during a specified time period.51

51 See “Indicators for Safe Pregnancy,” The EVALUATION Project (1995), for precise definitions of the terms involved in these indicators and a discussion of their uses.

Most of the knowledge/coverage indicators can be accurately measured on the basis of
conventional sample surveys. The measurement of access to facilities providing emergency obstetric care is greatly enhanced where Service Availability Modules undertaken in conjunction with DHS and/or Situation Analysis Studies have been conducted, as well as where Geographic Information Systems (GIS) are in place. The demand–related measure, although interpreted on a population basis, requires facility–based data on numbers of complications for the numerators. There is also some question as to the validity of the standard assumption that 15% of all pregnancies will result in complications.52

52 See “Indicators for Safe Pregnancy,” The EVALUATION Project (1995), and World Health Organization (1993), “Indicators to Monitor Maternal Health Goals,” for further discussion of this issue.

Emerging Solutions to these Measurement Problems

Pending improvements in vital registration and continued refinement of the sisterhood method hold the greatest promise for the measurement of maternal mortality. Recently, small–area censuses and small “coverage–like” surveys have been proposed in order to provide more current and geographically disaggregated estimates than are provided by the sisterhood method. Such efforts tend, however, to be costly.

One of the key issues under study is whether some indicators currently measured on the basis of facility data can be measured with reasonable accuracy in household surveys. Ways to obtain more accurate information on age/length of gestation at time of death and birth weights in household surveys are much needed. The conduct of “verbal autopsies” on a larger scale, perhaps in conjunction with large household surveys such as the DHS, holds some promise for obtaining true population–based data on a number of the indicators, but requires further testing in the context of measuring safe pregnancy outcomes.53 The Subcommittee on Safe Pregnancy of The EVALUATION Project’s Reproductive Health Indicators Working Group has also proposed an initiative to revise record–keeping at health facilities on cases of pregnancy complications in order to improve the quality of information available for program monitoring and management.

53 “Verbal autopsies” are survey instruments used to gather information on signs and symptoms of illness that preceded the death of a household member reported as having died. Details on the method may be found in Gray et al. (1990) and Kalter et al. (1990).

BREASTFEEDING

Target Population

The target population for breastfeeding programs includes all fertile women of reproductive age who are at risk of conception in a given population. Particular emphasis is placed on women who are pregnant, in the perinatal period, or lactating. There also are sub–groups with special needs: first time mothers, working mothers, and women with previous breastfeeding “failure.” In addition to the specific groups, certain educational or promotional interventions also target entire populations to foster a culture supportive of breastfeeding in homes, communities, workplaces, and health services.

National breastfeeding programs are generally evaluated in reference to the breastfeeding practices among women having given birth recently. By contrast, interventions targeting specific subgroups should be evaluated in reference to members of those subgroups. For example, educational interventions aimed at adolescent and school–aged children could be assessed in terms of changes in knowledge and attitudes among the group.

Outcome Measures and Feasibility of Data Collection

The most recent policy on duration of breastfeeding (UNICEF/UNESCO/WHO/UNFPA, 1994) states that all infants should be fed exclusively on breast milk from birth to 6 months. The World Health Assembly Confirmation of the marketing code for breastmilk substitutes states that supplementary foods should be introduced at about six months.

Breastfeeding programs are generally designed to increase (1) the incidence of breastfeeding, (2) the prevalence of exclusive breastfeeding in the first 6 months of life, and (3) the duration of breastfeeding. Specific interventions may target a particular obstacle to optimal breastfeeding practices, such as (1) lack of knowledge among
mothers and health care providers about the health, nutrition, economic, and child spacing benefits of breastfeeding, (2) birthing and health care protocols which interfere with successful lactation, or (3) absence of lactation management expertise in communities and health services that can assist mothers in successfully initiating breastfeeding at birth. Communication/education programs emphasize avoidance of breastmilk substitutes, delay of introduction of supplementary fluids and foods, and ways to resolve lactation problems should they occur.

Outcome measures for breastfeeding program evaluation frequently include initiation of breastfeeding in the first hour of life, duration of amenorrhea, use of lactational amenorrhea for contraception, exclusive breastfeeding rates, mean duration of breastfeeding, and frequency of feeds.

The preferred sources of data for measuring outcomes related to breastfeeding are the DHS–type survey and simplified cluster survey approaches. In most countries, it is relatively easy to ask questions about breastfeeding in a private interview situation. The greatest measurement problem is recall bias. It may be difficult for women to accurately recall the exact number of feeds given to the baby, even in the previous 24 hours, or the exact age in months when breastfeeding stopped. Moreover, the researcher’s definition of exclusive breastfeeding may differ from the mother’s, and thus indicators need to be carefully operationalized for breastfeeding. However, in comparison to other areas of reproductive health, the outcome indicators can be measured fairly easily and reliably using sample surveys.

If DHS surveys are not available as a source of data for outcome assessment of national programs, the preferred alternative would be a survey among a large, representative sample of women having recently given birth (since recall of breastfeeding practices in the past is notoriously unreliable). In designing such surveys, it is important to conduct preliminary qualitative research on the meanings of terms (e.g., exclusive breastfeeding) and to carefully pretest the instruments. However, the cost of such surveys is high, putting this option beyond the possibilities of most programs.

Although the focus of this manual is on national programs, many breastfeeding programs are sub–national in scope, operating in selected service delivery posts or communities. Evaluation of such programs would not be feasible utilizing national level survey data. If a facility–based or community–based evaluation is to be undertaken, however, it is important to clarify (1) what the specific outcomes/objectives of the program are (e.g., initiation within one hour of birth, exclusive breastfeeding for six months), (2) who will use the information generated, and (3) what purpose the evaluation is to serve. Typical users are program implementers, program funders, and policy makers.

One source of data for obtaining information about neonatal breastfeeding practices (e.g., initiation of breastfeeding during the first hour of life) is the hospital/maternity–based survey or exit interview. Exit interviews may be particularly useful if the target group of the intervention is women who give birth in health facilities. It also is a good strategy for assessing the extent to which hospitals are effectively executing baby–friendly initiatives.

Emerging Solutions to these Measurement Problems

For evaluation purposes, program administrators and funding agencies would ideally like to know the effect of their interventions on the breastfeeding practices of the target population. The ever–growing data bank of DHS surveys provides a promising opportunity for further empirical evaluation of breastfeeding practices at the national or regional level.

Researchers in turn are interested in demonstrating the effect of infant feeding practices on fertility, infant nutrition, infant health, and related outcomes. The latter requires large samples and preferably longitudinal studies that are challenging to conduct and analyze (since the impact of infant feeding practices is also affected by infant’s age, mother’s health, and the general socio–economic conditions of the household).

To date, the evaluation research in this area has treated these two issues separately: (1) the effects of program interventions on breastfeeding practices, and (2) the effects of breastfeeding on fertility and mortality. The challenge remains to
demonstrate empirically in the context of a given study that program interventions not only affect behavior in the medium term but also influence fertility and health status in the long run.

Whereas demonstrating impact remains one objective in the evaluation of breastfeeding interventions, there is also much to be learned from less rigorous methods, which are useful in improving programs at the local level. For example, neonatal and infant health outcomes tracked before and after “baby–friendly” hospital interventions can provide a useful means of assessing their effectiveness among select populations.

WOMEN’S NUTRITION

Target Population

The definition of the target population for women’s nutrition interventions in the context of reproductive health is complicated by the cumulative effects of malnutrition. Poor nutrition during early childhood can and does cause irreversible physical effects. Growth deficits caused in part by malnutrition during childhood are never recovered by many women in the developing world, resulting in shortness of stature, cephalopelvic disproportion, poor pregnancy outcomes, and possibly other physiologic consequences. Therefore, interventions targeted at women’s nutrition problems can potentially cover a broad age spectrum.

The most common forms of malnutrition in the developing world are protein–energy malnutrition and micronutrient deficiencies (vitamin A, iodine, and iron). All of these deficiencies are linked to poverty and illiteracy. However, iodine deficiency may be more geographically defined. Vitamin A deficiency also is determined in part by ecologic factors and traditional dietary habits. Iron deficiency is widespread throughout the developed and developing world.

Most of the resources devoted to women’s nutrition, however, are concentrated on pregnant and lactating women for two main reasons: (1) during pregnancy and lactation, nutritional requirements increase, and (2) maternal nutritional deficiencies during pregnancy can have serious consequences for women and their infants.

Adolescent mothers represent a special target population because they must support not only their own nutritional requirements for growth but their fetus/infant’s as well. Where resources permit, however, women’s nutrition programs increasingly target a broader age range. This is being advocated in part because pre–pregnancy nutritional status is an important determinant of reproductive outcomes, and because women’s health issues are now considered as a priority in their own right.

Outcome Measures and the Feasibility of Data Collection

The long–term objective of nutritional programs is to reduce the incidence and prevalence of protein–energy and micronutrient deficiencies. This is, however, an ambitious undertaking. The immediate causes of nutritional deficiencies are inadequate dietary intake and/or poor utilization of nutrients (often the result of infections and poor health care). Nutritional requirements of women in the developing world also are often elevated by high physical workloads and frequent recurring cycles of pregnancy and breastfeeding.

Protein–energy malnutrition (PEM) is most frequently assessed using anthropometric measures such as weight, height and arm circumference, measures that are now routinely collected as a part of the DHS. Shortness among women (<145 cm) reflects nutritional deficiency during childhood and is therefore unlikely to change during the life of most nutritional programs (except possibly in the case of adolescents). Measures such as weight, weight in relation to height, and arm circumference reflect a woman’s thinness. These measures should be sensitive to nutritional interventions that are targeted to the PEM problem. These maternal anthropometric measures are easily collected, and indicators based on these are widely viewed as valid and reliable.

The assessment of micronutrient status is more difficult than anthropometric status and thus is not a routine part of DHS. Laboratory assays and/or somewhat invasive techniques are required, as micronutrient status is based on biochemical analysis of blood (iron and vitamin A), breast milk (vitamin A) or urine (iodine). Despite technological improvements, these tests still substantially complicate the logistics of field work. Technological
Adaptations to Other Reproductive Health Interventions
programs continue to extend services to young people into their early twenties (Stewart and Eckert, 1995).

The World Health Organization (WHO) has defined adolescence in two stages. All persons between the ages of 10 and 19 are defined as adolescents, with the younger group from 10 to 14 classified as “early adolescence” and 15 to 19 as “late adolescence.” The latter category may be further subdivided into 15–17 and 18–19 brackets, where programmatically appropriate. WHO further suggests that the term “youth” may be used to refer to persons between the ages of 15–24, and “young people” for the entire age group of 10–24 (WHO, 1989).

A second possible dimension used in defining the target population for adolescent programs is marital status. In many countries adolescent programs have evolved to meet the reproductive health needs of unmarried young women who do not feel comfortable using the same services as older (mostly) married women. This focus is particularly appropriate where adolescents become sexually active at a relatively early age and/or marriage is postponed for educational or other reasons. However, in parts of South Asia and the Middle East, there is a need to focus services on young married couples, especially where existing programs ignore the needs of nulliparous women or even make childbirth a condition for getting family planning services (Mensch, 1995).

In short, there is no standard definition of adolescence. The target population used to evaluate adolescent reproductive health interventions should be consistent with the criteria established by the program in question.

Outcome Measures and the Feasibility of Data Collection

Objectives
Adolescent reproductive health programs have multiple objectives that differ from one program to another. For example, some programs are designed primarily as educational interventions to increase knowledge, create awareness, and form positive attitudes; they may have no service component. Others, such as comprehensive health services, may provide multiple services (e.g., family planning, STD treatment, nutritional counseling, shelter from abusive situations, drug counseling, etc.). Still others may offer one or more non–health services, such as income–generating activities, legal services, employment counseling, recreational activities, and so forth. Anecdotal evidence suggests that programs with a broad range of activities may be less controversial than those offering reproductive health services only.

In short, there is no single objective or even set of objectives against which to systematically evaluate adolescent reproductive health services. Rather, the objectives must be defined in relation to specific programs. Moreover, to the extent that reproductive health services are provided in connection with other activities, the evaluation of such programs may also need to track non–health results.

Program– Versus Population–based Measures
If adolescent services were as well developed and far–reaching as are contraceptive services for adult women in many countries of the world, one would expect changes in these population level indicators as a result of program interventions. The reality, however, is that most adolescent programs (where they exist at all) are still in their infancy. They reach a relatively small segment of the population, often limited to the major urban area(s) of the country. While these programs may have a pronounced effect on the individuals they do reach, at present the coverage of such programs is extremely limited.54 Thus, whereas one would expect to find changes in knowledge, attitudes, and behaviors among the clients or participants in such programs, it is unlikely that adolescent programs in their current form are of sufficient magnitude or intensity to affect population–based measures of behavior.

Conceptually, changes at the population level are the long–term goal of most adolescent programs; however, evaluating such programs in terms of changes at the population level would be of questionable utility. In fact, it could even be detrimental to continued support of such efforts, if those without a full understanding of the technical issues were to conclude that adolescent programs “don’t produce results.” In many cases it may be more productive to evaluate adolescent reproductive health services in terms of changes in knowledge, skills, and behaviors among the clients or participants in these programs (Stewart and Eckert, 1995).

Feasibility of Data Collection
Most data collection to date has involved interviews with adolescents (at school, at home, in clinics, etc.), self–administered questionnaires or “tests” (e.g., pretest–posttest to measure knowledge gain from an educational intervention), focus groups, observation in clinical settings, analysis of clinical records, and other standard methods. Special attention must be given to the wording of questions (to be sure that key concepts are presented in the vernacular of adolescents, rather than the terminology of the medical community). Also, special authorization may be required, for example, to administer a questionnaire to adolescents in the school or interview an unmarried teen in her home.

Intermediate outcomes or the behavioral measures of key interest in adolescent programs (e.g., age at first intercourse, use of contraception at first and at last intercourse, unintended pregnancy, self–report of STDs, self–report of drug use, etc.) can be obtained through self–report on surveys. Although there are some issues relating to the validity of the responses given by adolescents to sensitive questions,55 nonetheless this information can be collected through direct interview or self–administered questionnaire.

The feasibility of measuring long–term outcomes varies by reproductive health area, as reflected in the previous sections of this chapter. For example, it is fairly easy to measure age–specific fertility rates or wanted fertility rates among adolescents (if defined as aged 15–19) based on self–report from a DHS–type study. By contrast, it is highly problematic to measure the maternal mortality rate or HIV prevalence among the adolescent population. Nutritional status is easy to measure, but attribution of change to specific program interventions is difficult.

Emerging Solutions to the Measurement Problems
An important first step in evaluating adolescent reproductive health programs is to develop an inventory of the different objectives common to these programs and to define the corresponding intermediate outcomes. Although it may be possible to measure change at the population level in relatively few cases, nonetheless this inventory would represent a useful menu for those involved in evaluating such programs.

Depending on available resources, programs may take one of two (or both) directions: (1) to obtain descriptive data, often of a qualitative nature, on how the program has functioned and where improvements need to be made, and (2) to conduct experiments or quasi–experiments to measure the extent to which a given intervention affects behavior (either among program participants or in special circumstances at the population level). The former will be important in providing feedback to managers about changes they can make in the short–term and at the local level to increase the acceptability of the interventions to clients, potential clients, and the community at large. The latter will be extremely important in demonstrating to the donor community and other interested parties the type and magnitude of change that can be expected from different types of interventions.

To conclude, the evaluation of adolescent reproductive health interventions is still at such an early stage that it is premature to identify “emerging solutions.” Rather, what is promising is the mounting interest in systematically evaluating interventions in this area.

54 One notable exception is the musical video featuring Tatiana and Johnnie that swept through Latin America in the late 1980s; this could aptly be described as an intervention, but not a program per se.

55 Anecdotal evidence suggests that adolescents seem relatively willing to discuss these subjects frankly if the interviewer is of the same sex and also young (under 30) (Morris, 1995).
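The WHO age categories cited in this section (WHO, 1989) overlap, so a single respondent can fall into several of them at once. The short sketch below illustrates one way an evaluator might tag survey respondents with every applicable category before selecting the target population. The function name and the use of age in completed years are assumptions made for illustration; they are not part of the WHO definition or of this manual.

```python
# Age brackets as quoted in the text (WHO, 1989); bounds are inclusive.
# Assumption: age is recorded in completed years.
WHO_BRACKETS = [
    ("adolescent", 10, 19),
    ("early adolescence", 10, 14),
    ("late adolescence", 15, 19),
    ("youth", 15, 24),
    ("young people", 10, 24),
]


def who_age_groups(age):
    """Return every WHO category that applies to a given age.

    Hypothetical helper: the categories overlap, so the result is a
    list rather than a single label.
    """
    return [label for label, low, high in WHO_BRACKETS if low <= age <= high]


if __name__ == "__main__":
    for age in (9, 12, 17, 22, 30):
        print(age, who_age_groups(age))
```

Because there is no standard definition of adolescence, a program with different criteria (for example, one that serves clients into their early twenties or that targets young married couples) would substitute its own brackets or add a marital status filter.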
Chapter VIII
Summary: Checklist of Steps in Designing an Evaluation Plan
This manual is designed to assist professionals in the international family planning/reproductive health community to:

■ Differentiate between the main types of program evaluation: program monitoring and impact assessment;
■ Critically evaluate the strengths and limitations of alternative methods for impact assessment;
■ Assess and select the type(s) of evaluation most appropriate to a given setting;
■ Identify appropriate indicators and sources of data for the evaluation; and
■ Design an evaluation plan outlining study design(s), indicators, and sources of data that serves as a plan of action for subsequent implementation.

The manual reviews a series of steps to cover in designing an evaluation plan. It assumes that the evaluation will include some type of program monitoring. Where the objective is also to assess impact, many of these same steps apply, but the appropriate study design must be identified and implemented, as outlined in Chapter IV.

Although the original mandate of The EVALUATION Project was to evaluate the impact of family planning programs on fertility, the scope of many family planning programs has expanded in recent years to include other areas of reproductive health. To this end, this manual reviews a series of methodological issues (e.g., target population, measurement of outcome indicators, feasibility of data collection) for evaluating programs in other areas of reproductive health (as described in Chapter VII):

■ STD/HIV prevention
■ Safe pregnancy
■ Breastfeeding
■ Women’s nutrition
■ Adolescent reproductive health services

The basic steps to follow for program monitoring and impact assessment can be summarized in a checklist format, as outlined in the boxes on page 94.
Define the Scope of the Evaluation (Chapter II)
❐ Determine the program goals and objectives
❐ Describe how the program “should” work (conceptual model)
❐ Establish the objectives of the evaluation
❐ Outline the scope of the evaluation in the evaluation plan

Define the Methodological Approach: Program Monitoring (Chapter III)
❐ Clarify the primary purpose of monitoring
❐ Identify the components to be monitored
❐ Define relevant indicators
❐ Identify sources of data
❐ Design a format for the presentation of results
❐ Summarize the methodological approach

❐ Assess the feasibility of using one of the preferred approaches
❐ (If one of the preferred approaches is possible) Identify and negotiate the special data needs for the specific country setting
❐ (If the preferred approaches are not feasible) Review the alternative approaches to measuring impact:
   ■ Decomposition (Proximate Determinants Model)
   ■ Prevalence Method
❐ Determine and implement the optimal design for the country–specific circumstances.

Develop an Implementation Plan (Chapter V)
❐ Define the institutions and individuals responsible for the evaluation
❐ Establish a timetable for specific activities
❐ Budget for the evaluation
REFERENCES AND APPENDIX

References
AIDSCAP. 1995. “Application of a Behavioral Surveillance Survey Tool.” AIDSCAP Evaluation Tools,
Module 4. Arlington, VA: Family Health International.
Angeles, G., T.A. Mroz, and D.K. Guilkey. 1995. “Purposive Program Placement and the Estimation
of Program Effects: The Impact of Family Planning Programs in Tunisia.” Paper presented at the
Annual Meeting of the Population Association of America, San Francisco, CA.
AVSC International. 1995. COPE: Client–Oriented Provider Efficient Services. A Process and Tools
for Quality Improvement in Family Planning and Other Reproductive Health Services. New York,
New York.
Bauman, K., C. Viadro, and A. Tsui. 1994. “Use of True Experimental Designs for Family Planning
Program Evaluation: Merits, Problems, and Solutions.” International Family Planning
Perspectives 20(3): 108–113.
Bertrand, J.T., R.J. Magnani, and J.C. Knowles. 1994. Handbook of Indicators for Family Planning
Program Evaluation. Chapel Hill, NC: The EVALUATION Project.
Bertrand, J.T., R. Santiso, S.H. Linder, and M.A. Pineda. 1987. “Evaluation of a Communications
Program to Increase Adoption of Vasectomy in Guatemala.” Studies in Family Planning
18(6):361–370.
Bertrand, J.T. and A. Tsui (eds.). 1995. Indicators for Reproductive Health Program Evaluation.
Chapel Hill, NC: The EVALUATION Project.
Bogue, D.J. 1970. Family Planning Improvement through Evaluation: A Manual of Basic Principles.
Community and Family Study Center, University of Chicago, Manual No. 1.
Bollen, K.A., D.K. Guilkey, and T.A. Mroz. 1995. “Binary Outcomes and Endogenous Explanatory
Variables: Tests and Solutions with an Application to the Demand for Contraceptive Use in
Tunisia.” Demography 32(1):111–31.
Bollen, K.A., D. Guilkey, and T.A. Mroz. 1992. “Methods for Evaluating the Impact of Family
Planning Programs in Structural Models.” Chapel Hill, NC: The EVALUATION Project.
Bongaarts, J. 1993. “The Fertility Impact of Family Planning Programs,” New York: Population
Council, Working Paper No. 47.
Bongaarts, J. 1986. “The Prevalence Method.” in United Nations Manual IX Addendum: The
Methodology of Measuring the Impact of Family Planning Programmes on Fertility. New York:
Department of International Economic and Social Affairs, pp. 9–14.
Bongaarts, J. and S. Kirmeyer. 1980. “Estimating the Impact of Contraceptive Prevalence on Fertility:
Aggregate and Age–Specific Versions of a Model.” in Hermalin, A. and B. Entwisle (eds.) The
Role of Surveys in the Analysis of Family Planning Programs. Liege, Belgium: Ordina Editions,
pp. 381–408.
Bongaarts, J. and R. Potter. 1983. Fertility, Biology, and Behavior: An Analysis of the Proximate
Determinants. New York: Academic Press.
Brown, S.S. and L. Eisenberg (eds). 1995. The Best Intentions: Unintended Pregnancy and the
Well–being of Children and Families. Washington: National Academy Press.
Buckner, B., A.O. Tsui, K. McKaig, and A.I. Hermalin. 1995. A Guide to Methods of Family Planning
Evaluation: 1965–1990. Chapel Hill, NC: The EVALUATION Project.
Campbell, D. and J. Stanley. 1963. Experimental and Quasi–Experimental Designs in Research. Boston,
MA: Houghton Mifflin Co.
Casterline, J., L. Domingo, and Z. Zablan. 1988. “Trends in Fertility in the Philippines: An Integrated
Analysis of Four National Surveys.” Manila, Philippines: University of the Philippines Population
Institute.
Chamratrithirong, A., P. Prasartkul, and A. Bennett. 1986. “Multivariate Areal Analysis of the Efficiency
of the Family Planning Programme and Its Impact on Fertility in Thailand.” Bangkok: Economic and
Social Commission for Asia and the Pacific, Asia Population Studies Series No. 68.
Chandrasekaran, C. and A. I. Hermalin. 1985. Measuring the Effects of Family Planning Programs on
Fertility. Dolhain, Belgium: Ordina Editions.
Cook, T. and D. Campbell. 1979. Quasi–Experimentation: Design and Analysis Issues for Field Settings.
Boston, MA: Houghton Mifflin Co.
Entwisle, B., A.I. Hermalin, P. Kamnuansilpa, and A. Chamratrithirong. 1984. “A Multi–Level Model of
Family Planning Availability and Contraceptive Use in Rural Thailand.” Demography 21(4): 559–74.
Fisher, A., J. Laing, J. Stoeckel, and J. Townsend. 1991. Handbook for Family Planning Operations
Research. New York: Population Council.
Fisher, A., B. Mensch, R. Miller, S. Askew, A. Jain, C. Ndeti, L. Ndhlovu, and P. Tapsoba. 1992. Guidelines
and Instruments for a Family Planning Situation Analysis Study. New York: The Population Council.
Freedman, R. and J.Y. Takeshita. 1969. Family Planning in Taiwan. Princeton, NJ: Princeton University
Press.
Garate, M.R., et al. 1993. “Comparison Between Two Payment Models to Physicians in Two Private Family
Planning Agencies in Peru: Final Report.” Lima: The Population Council, December.
Garcia–Nuñez, J. 1992. Improving Family Planning Evaluation: a Step–by–step Guide for Managers and
Evaluators. West Hartford, Connecticut: Kumarian Press, Inc.
Gertler, P. and J. Molyneaux. 1994. “How Economic Development and Family Planning Programs
Combined to Reduce Indonesian Fertility.” Demography, 31(1): 33–63.
Graham, W., W. Brass, and R.W. Snow. 1989. “Estimating Maternal Mortality.” Studies in Family
Planning 20(3):125–35.
Gray, R.H., H.D. Kalter, and P. Barass. 1990. “The Use of Verbal Autopsy Methods to Determine Selected
Causes of Death in Children.” Occasional Paper No. 10. Baltimore, MD: Institute for International
Programs, Johns Hopkins University.
Grosskurth, H., F. Mosha, J. Todd, et al. 1995. “Impact of Improved Treatment of Sexually Transmitted
Diseases on HIV Infection in Rural Tanzania: Randomized Control Trial.” Lancet 346:530–536.
Guilkey, D. and S. Cochrane. 1994. “Zimbabwe: Determinants of Contraceptive Use at the Leading
Edge of Fertility Transition in Sub–Saharan Africa.” Chapel Hill, NC: Carolina Population Center.
Hermalin, A. 1979. “Multivariate Areal Analysis.” in United Nations Manual IX: The Methodology of
Measuring the Impact of Family Planning Programmes on Fertility. New York: Department of
International Economic and Social Affairs, pp. 97–111.
Hermalin, A. 1975. “Regression Analysis of Areal Data.” in Chandrasekaran, C. and A. Hermalin (eds.),
Measuring the Effect of Family Planning Programs on Fertility. Dolhain, Belgium: Ordina Editions,
pp. 245–99.
Hermalin, A. 1982. “Some Cautions in the Use and Interpretation of Regression Analysis for the
Evaluation of Family Planning Programs.” in United Nations Evaluation of the Impact of Family
Planning Programmes on Fertility: Sources of Variance. New York: Department of International
Economic and Social Affairs, pp. 265–67.
Jain, A. and J. Bruce. 1994. “A Reproductive Health Approach to the Objectives and Assessment of
Family Planning Programs,” in Sen, G. et al. (eds.), Population Policies Reconsidered: Health,
Empowerment, and Rights, Boston, MA: Harvard University Press. pp. 103–209.
Janowitz, B. and J. H. Bratt. 1992. “Costs of Family Planning Services: A Critique of the Literature.”
International Family Planning Perspectives, 18: 137–144.
Janowitz, B. and J.H. Bratt. 1994. Methods for Costing Family Planning Services, New York: UNFPA,
and Research Triangle Park, NC: Family Health International.
Jolly, C. and J. Gribble. 1993. “The Proximate Determinants of Fertility.” in Foote, K., K. Hill, and L.
Martin (eds.) Demographic Change in Sub–Saharan Africa. Washington, DC: National
Academy Press.
Kalter, H.D., R.H. Gray, R.E. Black, and S.A. Gultiano. 1990. “Validation of Postmortem Interviews to
Ascertain Selected Causes of Death in Children.” International Journal of Epidemiology, 19:380–6.
Koblinsky, M., K. McLaurin, P. Russell–Brown, and P. Gorbach (eds.). 1995. “Final Report of the
Subcommittee on Safe Pregnancy.” in Indicators for Reproductive Health Program Evaluation.
Chapel Hill, NC: The EVALUATION Project.
Lloyd, C. and J. Ross. 1989. “Methods for Measuring the Fertility Impact of Family Planning Programs:
The Experience of the Last Decade.” Research Division Working Papers, No. 7. NY: The Population
Council.
McInerney, M. and C. de la Quintana. 1994. “A Comparative Study of Three Strategies to Improve the
Sustainability of a Bolivian Family Planning Provider.” La Paz: The Population Council.
Mensch, B., A. Jain, et al. 1994. “Assessing the Impact of Family Planning Services on Contraceptive
Use in Peru: A Case Study Linking Situation Analysis Data to the DHS.” Paper presented at the 1994
Annual Meeting of the Population Association of America, Miami, FL.
Mertens, T., M. Carael, P. Sato, J. Cleland, H. Ward, and G.D. Smith. 1994. “Prevention Indicators
for Evaluating the Progress of National AIDS Programmes.” AIDS 8:1359–1369.
Miller, R., K. Miller, L. Ndhlovu, J. Solo, and O. Achola. 1996. “A Comparison of the 1995 and 1989
Kenya Situation Analysis Study Findings.” New York: The Population Council (unpublished
manuscript).
Mroz, T.A., and D.K. Guilkey. 1992. “Discrete Factor Approximations for Use in Simultaneous
Equation Models with Both Continuous and Discrete Endogenous Variables.” Chapel Hill, NC:
The EVALUATION Project.
Newman, J. 1988. “A Stochastic Dynamic Model of Fertility.” In Schultz, T.P. Research in Population
Economics. New York: JAI Press, Inc., pp. 41–68.
Ojeda, G., R. Murad, and F. Leon. 1994. “Testing Pricing/Payment Systems to Improve Access and
Cost–Recovery from Norplant®: Final Report.” Lima: The Population Council, May.
Phillips, J.F., W.S. Stinson, S. Bhatia, M. Rahman, and J. Chakraboty. 1982. “The Demographic Impact
of the Family Planning–Health Services Project in Matlab, Bangladesh.” Studies in Family Planning
13(5):131–140.
Poston, D., and B. Chu. 1987. “Socioeconomic Development, Family Planning, and Fertility in China.”
Demography. 24(4): 531–51.
Reinis, K. 1992. “The Impact of Proximate Determinants of Fertility: Evaluating Bongaarts’s and Hobcraft
and Little’s Methods of Estimation.” Population Studies, 46(2): 309–326.
Reynolds, J. 1993. Cost Analysis, Primary Health Care Management Advancement Programme, Module
8, Users Guide. Washington, DC: Aga Khan Foundation and University Research Corporation.
Reynolds, J. 1970. A Framework for the Design of Family Planning Program Evaluation. International
Institute for the Study of Human Reproduction: New York.
Reynolds, J. and K. C. Gaspari. 1985. Cost–effective Analysis. Chevy Chase: Primary Health Care
Operations Research Project (PRICOR).
Robey, B., S.O. Rutstein, L. Morris, and R. Blackburn. 1992. “The Reproductive Revolution: New Survey
Findings.” Population Reports, Series M. No.11.
Rossi, P.H., and H. Freeman. 1993. Evaluation: A Systematic Approach. Newbury Park, CA: Sage
Publications.
Rossi, P.H., J.D. Wright, and A.B. Anderson. 1983. Handbook of Survey Research. New York:
Academic Press.
Senderowitz, J. 1995. Adolescent Health: Reassessing the Passage to Adulthood. World Bank Discussion
Paper. Washington, DC: World Bank.
Sherris, J.D., K.A. London, S.H. Moore, J.H. Pile, and W.B. Watson. 1985. “The Impact of Family
Planning Programs on Fertility.” Population Reports, XIII, 1:J733–J771.
Stewart, L. and E. Eckert (eds.). 1995. “Indicators for Adolescent Reproductive Health Services,” in Tsui,
A. and J. Bertrand (eds.), Indicators for Reproductive Health Program Evaluation. Chapel Hill, NC:
The EVALUATION Project.
Suarez, E. and C. Brambila. 1994. “Cost Analysis of Family Planning Services in Private Family Planning
Programs, FEMAP, Mexico: Final Report.” Mexico City: The Population Council, June.
Tsui, A.O. and P.D. Gorbach, forthcoming in 1996. Framing Family Planning Program Evaluation:
Cause, Logic and Action. Chapel Hill, NC: The EVALUATION Project.
United Nations. 1982. Evaluation of the Impact of Family Planning Programmes on Fertility: Sources
of Variance. New York: Department of International Economic and Social Affairs.
United Nations. 1986. Manual IX Addendum: The Methodology of Measuring the Impact of Family
Planning Programmes on Fertility. New York: Department of International Economic and Social
Affairs.
United Nations. 1979. Manual IX: The Methodology of Measuring the Impact of Family Planning
Programmes on Fertility. New York: Department of International Economic and Social Affairs.
United Nations. 1985. Studies to Enhance the Evaluation of Family Planning Programmes. New York:
Department of International Economic and Social Affairs.
USAID (United States Agency for International Development). 1995. “The Agency’s Strategic
Framework and Indicators 1995/96,” Performance Measurement and Evaluation Division, Center
for Development Information and Evaluation, Bureau for Policy and Program Coordination.
Veney, J.E. and P. Gorbach. 1993. “Definitions for Program Evaluation Terms.” Chapel Hill, NC: The
EVALUATION Project (Working Paper Series No. WP–TR–01).
Vian, T. 1993. “Analyzing Costs for Management Decisions,” Family Planning Manager 2(2):1–18.
Wawer, M.J., R.H. Gray, T.C. Quinn, N.K. Sewankambo, F. Wabwire–Mangen, D. Serwadda, and L. Paxton.
1995. “Design and Feasibility of Population–based Mass STD Treatment, Rural Rakai District,
Uganda.” Paper presented at the 1995 Annual Meeting of the International Society for STD
Research, New Orleans, LA, August 1995.
Wishik, S.M., and K.H. Chen. 1973. Couple–Year of Protection: A Measure of Family Planning Program
Output. International Institute for the Study of Human Reproduction: New York.
Woodhouse, G. 1995. A Guide to MLn for New Users. London: University of London, Institute of
Education.
World Health Organization. 1989. “Contribution to the Working Paper of UNESCO.” Compiled for the
World Youth Congress, Barcelona.
World Health Organization. 1993. Indicators to Monitor Maternal Health Goals. Report of a Technical
Working Group, Geneva, Nov. 8–12, 1993.
Appendix A
MULTILEVEL MODEL REGRESSION FORMATS
The cross–sectional equations for two survey rounds may be written as follows (for the sake of simplicity, all interaction terms have been omitted):

\[ Y_{ij1} = \alpha + \beta P_{i1} + \Gamma Z_{i1} + \gamma X_{ij1} + \mu_{i1} + \varepsilon_{ij1}, \quad \text{and} \]
\[ Y_{ij2} = \alpha + \beta P_{i2} + \Gamma Z_{i2} + \gamma X_{ij2} + \mu_{i2} + \varepsilon_{ij2} \]

where:
$Y_{ij}$ = time–varying outcomes;
$P_{i}$ = time–varying program variables;
$Z_{i}$ = time–varying community characteristics;
$X_{ij}$ = time–varying individual characteristics;
$\mu_{i}$ = fixed unobservable community–level characteristics;
$\varepsilon_{ij}$ = random error;
the final subscripts 1 and 2 refer to survey rounds; and
$\alpha$, $\beta$, $\Gamma$, and $\gamma$ are parameters to be estimated.

By differencing the two equations, we obtain:

\[ Y_{ij2} - Y_{ij1} = \beta (P_{i2} - P_{i1}) + \Gamma (Z_{i2} - Z_{i1}) + \gamma (X_{ij2} - X_{ij1}) + (\varepsilon_{ij2} - \varepsilon_{ij1}) \]

Because the fixed community–level characteristics are invariant during the study period ($\mu_{i1} = \mu_{i2}$), they drop out of the difference equation, as does the common intercept $\alpha$. Of primary interest for program evaluation purposes is the $\beta$ parameter, the coefficient on the change in program variables, which measures the relative importance of changes in program variables in explaining observed changes in outcome variables during the time period studied. 56
56 Note that where the multilevel panel model is applied to successive surveys in the same sample clusters,
changes in individual/household–level variables pertain to aggregate changes in these characteristics at the
community level.
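The value of differencing can be illustrated with a small simulation. The sketch below is not the estimation procedure used by The EVALUATION Project; it is a deliberately simplified, assumed setup with one program variable, one observation per community, and no Z or X terms. All function names and parameter values are hypothetical. Programs are placed preferentially in communities with high unobserved fixed characteristics, so a single cross–section overstates the program effect, while the differenced equation recovers it because the fixed term cancels.

```python
import random


def ols_slope(x, y):
    """Slope from an ordinary least-squares fit of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def first_difference_slope(p1, p2, y1, y2):
    """Slope of (y2 - y1) on (p2 - p1), fitted without an intercept.

    No intercept is needed here because a constant shared by both
    rounds cancels when the two equations are differenced.
    """
    dp = [b - a for a, b in zip(p1, p2)]
    dy = [b - a for a, b in zip(y1, y2)]
    return sum(a * b for a, b in zip(dp, dy)) / sum(a * a for a in dp)


def simulate(seed=0, n=500, alpha=1.0, beta=2.0):
    """Return (cross-sectional slope, first-difference slope).

    Assumed data-generating process, for illustration only:
    Y_it = alpha + beta * P_it + mu_i + error, with P correlated with mu.
    """
    rng = random.Random(seed)
    mu = [rng.gauss(0.0, 1.0) for _ in range(n)]         # fixed, unobserved
    p1 = [m + rng.gauss(0.0, 1.0) for m in mu]           # purposive placement
    p2 = [p + rng.uniform(0.0, 1.0) for p in p1]         # program expansion
    y1 = [alpha + beta * p + m + rng.gauss(0.0, 0.1) for p, m in zip(p1, mu)]
    y2 = [alpha + beta * p + m + rng.gauss(0.0, 0.1) for p, m in zip(p2, mu)]
    return ols_slope(p1, y1), first_difference_slope(p1, p2, y1, y2)


if __name__ == "__main__":
    naive, differenced = simulate()
    print("cross-sectional estimate:", round(naive, 3))
    print("first-difference estimate:", round(differenced, 3))
```

In this setup the true program coefficient is 2.0. The cross–sectional estimate is biased upward because the unobserved community term is correlated with program placement, whereas the first–difference estimate should land close to the true value.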