Making Smart Policy: Using Impact Evaluation for Policy Making
Case Studies on Evaluations that Influenced Policy

June 2009

Acknowledgement

This publication reviews the experiences presented in the conference "Making Smart Policy:
Using Impact Evaluation for Policymaking", held in January 2008. The editors, Michael
Bamberger and Angeli Kirk, would like to thank the presenters for their thoughtful and
honest insight into their impact evaluation experiences: Orazio Attanasio, Antonie de Kemp,
Jocelyne Delarue, Pascaline Dupas, Joseph Eilor, Deon Filmer, Emanuela Galasso, John
Hoddinott, Michael Kremer, Emmanuel Skoufias, Miguel Urquiola, Dominique Van De
Walle, and Adam Wagstaff. They also thank the chairs of each of the four parallel sessions: Judy
Baker, sustainable development; Halsey Rogers, education; Norbert Schady, CCTs; and
Charles Teller, health; as well as Elizabeth King, who chaired the plenary session that
brought together the lessons from the parallel sessions. This note was task managed by
Emmanuel Skoufias.




TABLE OF CONTENTS

1. OVERVIEW
     A. PRESENTATION FORMAT
     B. CONCEPTUALIZING UTILIZATION AND INFLUENCE
     C. REVIEWING THE EVIDENCE: HOW WERE THE EVALUATIONS UTILIZED AND WHAT KINDS OF
        INFLUENCE DID THEY HAVE?
     D. FACTORS AFFECTING EVALUATION UTILIZATION AND INFLUENCE

2. EDUCATION
     A. INTRODUCTION
     B. GETTING GIRLS INTO SCHOOL: EVIDENCE FROM A SCHOLARSHIP PROGRAM IN CAMBODIA
     C. IMPACT EVALUATION OF PRIMARY EDUCATION IN UGANDA
     D. THE EFFECTS OF GENERALIZED SCHOOL CHOICE ON ACHIEVEMENT AND STRATIFICATION:
        EVIDENCE FROM CHILE'S VOUCHER PROGRAM

3. ANTI-POVERTY AND CONDITIONAL CASH TRANSFER (CCT) PROGRAMS
     A. INTRODUCTION
     B. EVALUATING A CONDITIONAL CASH TRANSFER PROGRAM: THE EXPERIENCE OF FAMILIAS EN
        ACCION IN COLOMBIA
     C. THE ROLE OF IMPACT EVALUATION IN THE PROGRESA/OPORTUNIDADES PROGRAM OF MEXICO
     D. ASSESSING SOCIAL PROTECTION TO THE POOR: EVIDENCE FROM ARGENTINA

4. HEALTH
     A. EVALUATION OF INSECTICIDE-TREATED NETS IN KENYA
     B. KENYAN DEWORMING EXPERIMENT
     C. CHINA: VOLUNTARY HEALTH INSURANCE SCHEME

5. SUSTAINABLE DEVELOPMENT
     A. INTRODUCTION
     B. IMPACT EVALUATIONS OF MICROFINANCE INSTITUTIONS IN MADAGASCAR AND MOROCCO
     C. ETHIOPIA'S FOOD SECURITY PROGRAM
     D. RURAL ROADS IN VIETNAM

6. LESSONS LEARNED: STRENGTHENING THE UTILIZATION AND INFLUENCE OF IMPACT EVALUATION
     A. HOW ARE IMPACT EVALUATIONS USED?
     B. WHAT KINDS OF INFLUENCE CAN IMPACT EVALUATIONS HAVE?
     C. GUIDELINES FOR STRENGTHENING EVALUATION UTILIZATION AND INFLUENCE
     D. STRATEGIC CONSIDERATIONS IN PROMOTING THE UTILIZATION OF IMPACT EVALUATIONS

ANNEX 1




    1. Overview
Impact evaluation has blossomed in recent years as a powerful tool for enhancing
development effectiveness. The numbers of both evaluations and methodologies have
multiplied very quickly. This growth, however, has been uneven both geographically and
across sectors, leading to questions of how to bolster impact evaluation in regions and
sectors where it is least common and perhaps most needed. Additionally, as methods
mature and the collection of evidence accumulates, the conversation is expanding to
include reflection on how we - development practitioners, policy makers, and researchers
alike - can ensure that impact evaluation reaches its potential for influencing project and
policy design. A key question is, how can we strategically use scarce evaluation
resources more effectively? That is, how can we ensure that impact evaluations are better
utilized and more influential?

To explore these issues, the World Bank, with support from DFID and the Government of
the Netherlands, held a conference, "Making Smart Policy: Using Impact Evaluation for
Policymaking", in January 2008.[1] One session - Evidence and Use: Parallel Sector
Sessions - brought together 12 case studies to ground the discussion in concrete examples
of impact evaluations that have been completed and to provide researchers' perspectives
on the ways in which they had been influential - or not - and why. Evidence and Use
comprised four separate thematic sessions: education, conditional cash transfers (CCTs),
health and sustainable development.

This publication reviews the experiences presented in the conference session and draws
lessons concerning different ways that impact evaluations are utilized and how they can
contribute to improving program design and policy formulation. The overview chapter
begins by describing the structure and general content of the conference presentations and
proposing a framework for considering utilization and influence. It then briefly describes
the evaluations and pulls together the most salient examples of how they were used and
the type of influence they had. Finally, lessons are drawn on ways to enhance evaluation
utilization and its contribution to program design and policy.

It should be noted that the primary purpose of the report is not the discussion of the
impact evaluation methodology. Nevertheless, the overview chapter includes a chart
summarizing the evaluation designs and findings, as there can be linkages among the
policy questions being addressed, how an evaluation was designed, the findings and how
they were communicated, and how the evaluation contributed to program design and
policy formulation.

The remaining chapters are devoted to more in-depth syntheses of the case studies
presented in the workshop with respect to the evaluation of education, anti-poverty
programs, health, and sustainable development, with a final chapter on lessons learned.



[1] The conference website, with videos of the sessions and supplementary material, may be found at:
www.worldbank.org/iepolicyconference.


    A. Presentation format

Presenters in each session of Evidence and Use: Parallel Sector Sessions were asked to
reflect on an impact evaluation experience. As well as briefly describing the project,
evaluation design, and findings, the speakers discussed the dissemination process, how
the evaluation findings were utilized, and what kinds of influence they had. Interestingly,
while the focus was meant to be on utilization and impact rather than project details or
evaluation technique, the distinction proved somewhat artificial, as details of the design
and context of both the project and the evaluation were often central to the use and
influence or lack thereof. After the presentations, the groups reflected on the general
lessons that could be drawn from the case studies concerning the different kinds of
contributions that evaluations can make to program management and policy formulation.
Guidelines were then proposed on ways to increase the utilization and influence of
evaluations for development programs and policies.

There is an important caveat: this report does not offer an "impact evaluation of impact
evaluations". It is difficult to interpret associations between the conduct of an evaluation
and its recommendations on the one hand, and changes in program design or an increased
use of research by policymakers on the other, as causal relations. (For example, did
evaluations of the education system in Uganda lead to increased appreciation and use of
the management information system, or was the evaluation conducted and used because
there was already an awareness of the value of research and statistics?) The evidence and
recommendations concerning evaluation utilization are drawn from the impressions and
observations presented by the researchers who conducted the evaluations and the
subsequent discussions with workshop participants. In only one case (Uganda Education
for All) was a representative of the host country partner agency present. None of the
evaluators had conducted systematic studies on the utilization of their evaluations (such
as interviews with stakeholders), and no kind of attribution analysis was conducted. It is
quite possible that evaluators may not be fully aware of how the evaluations were used,
and their reflections on their experiences might introduce a certain bias.


Box 1: The programs, the evaluation designs and the main findings
Table 1 (end of chapter) describes the programs, the key evaluation questions and the main
findings of each evaluation, and Table 2 summarizes the evaluation designs. More details are
given in the following chapters.

Education. The objectives of the education programs in Cambodia and Uganda were to increase
school enrolment and retention for low-income students, particularly girls; and in the case of
Uganda to also improve education quality. The program in Chile, which already had very high
enrolment rates, was intended to improve quality for low-income students through increased
access to private education. In addition, all of the programs sought to enhance the efficiency of
program management. Each of the evaluations was also intended to assess the effectiveness of
specific interventions such as vouchers, scholarships and management training, in enhancing
enrolment and/or improving quality.




Impact evaluation designs included retrospective comparisons, regression discontinuity, using
data from management information systems to measure changes over the life of the project, and
using secondary data to match project and comparison groups through propensity score matching.

The findings showed that in both Cambodia and Uganda enrolment and retention increased for
low-income families. However, the quality of education remained low, although pilot projects in
Uganda focusing on management training showed promising results with respect to quality
improvement. In Chile, contrary to popular belief, there was no evidence that vouchers improved
educational outcomes. However, the "sorting" mechanisms that resulted from the scholarship
programs meant that better qualified students tended to move to private schools, an outcome that
was not intended and that had negative consequences for public schools and perhaps for
low-income students.

Anti-poverty programs. The programs in Mexico (PROGRESA/Oportunidades) and Colombia
(Familias en Accion) were conditional cash transfer (CCT) programs providing cash payments to
low-income families on the condition that their children enrolled in school and went for regular
health check-ups (and in the case of Mexico also received nutritional supplements). The two
programs were quite similar in many ways, and in fact both the program design and the
evaluation design of Familias en Accion drew on the experience of the Mexican programs. The
Argentina Emergency Safety Net (the "Jefes") program provided cash payments to under-
employed heads of low-income households to mitigate the impact of the 2000-2002 economic
crisis. Household heads of poor families received monthly cash payments on the condition that
they attended education or training programs or participated in community public works
programs. While the Jefes program could also be considered a CCT as beneficiaries were
theoretically required to attend training or participate in community improvement projects, in
practice this requirement was often not enforced and the Safety Net was widely considered as an
entitlement program (i.e. participants were entitled to receive the payments without any
conditionality).

All three evaluations used experimental or strong quasi-experimental designs. Mexico used
randomized control trials (RCT) for selection of beneficiaries at each phase. Colombia and
Argentina each used propensity score matching (PSM); in Colombia, recipients were matched to
households in ineligible areas and, in Argentina, participants were matched to applicants who
had not yet been chosen to participate.

The Colombia and Mexico evaluations both found that CCTs increased school enrolment and
access to health services. All three evaluations found that they were effective in reducing the
proportion of the population below the poverty line or, in the case of Argentina, effective in
preventing families from falling below the poverty line. However, the impacts varied by factors
such as student age and urban/rural location.

Health. The health programs comprised insecticide-treated mosquito nets in Kenya to reduce the
incidence of malaria; school deworming in Kenya to reduce school absenteeism due to sickness;
and a health insurance scheme in China. The goals of the China program were to reduce out-of-
pocket expenses by patients, to encourage greater use of preventive care, to reduce excessive use
of high-tech services and to encourage the use of health services.

The two Kenyan programs used randomized control trial evaluation designs. The China
evaluation was integrated with a large government health sector evaluation and used double-
difference analysis with propensity score matched samples.
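
As an illustration of the double-difference logic mentioned above, the following minimal Python
sketch compares the change in an outcome for a participant group with the change over the same
period for a matched comparison group. The function name, the variable names and all figures are
hypothetical and are not drawn from the China evaluation or any other case study; the matching
of the two samples is assumed to have been done beforehand.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def double_difference(treated_before, treated_after,
                      comparison_before, comparison_after):
    """Double-difference estimate: the change in the participant group
    minus the change in the matched comparison group."""
    change_treated = mean(treated_after) - mean(treated_before)
    change_comparison = mean(comparison_after) - mean(comparison_before)
    return change_treated - change_comparison


# Hypothetical outcome data (e.g. clinic visits per household per year).
# These numbers are invented purely for illustration.
insured_before = [2.0, 3.0, 1.0, 2.0]
insured_after = [4.0, 3.0, 3.0, 2.0]
matched_before = [2.0, 2.0, 1.0, 3.0]
matched_after = [2.0, 3.0, 2.0, 3.0]

estimate = double_difference(insured_before, insured_after,
                             matched_before, matched_after)
print(f"Double-difference estimate: {estimate:.2f}")
```

Subtracting the comparison group's change removes trends that affected both groups alike, so the
remaining difference is attributed to the program only under the assumption that the two groups
would otherwise have followed parallel paths.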



The evaluations of both Kenyan programs found that ability to pay was a key factor in utilization.
Efforts to introduce cost-recovery significantly reduced coverage: in the case of the insecticide
net program, free distribution resulted in a 63 per cent coverage rate, compared to 14 per cent
at the highest price. Deworming participation also dropped dramatically when parents
were asked to pay even small amounts. The evaluation of the China health insurance program
found that utilization had increased but that out-of-pocket payments did not decrease. Facility
data showed that revenue had increased more than utilization. The results showed that medical
insurance is not guaranteed to decrease expenses, leading to questions about the level of care
provided and whether services were selected because of medical necessity or for revenues.

Sustainable development. The programs comprised microfinance programs in Morocco and
Madagascar, food security in Ethiopia and the rehabilitation of rural roads in Viet Nam. All four
programs were intended to achieve sustainable reductions in poverty.

Three of the evaluation designs used retrospective comparisons with different levels of rigor in
the matching of the project and comparison group samples. The fourth (Morocco) used
randomized control trials.

The findings showed that the Viet Nam roads program was successful in diversifying and
strengthening livelihoods but the scope was more limited than planned. The Ethiopia Food
Security program also achieved its main objectives but failed to achieve integration with other
complementary programs. Microfinance in Madagascar was not found to have an impact on
economic trends among clients. Findings for microfinance in Morocco are still forthcoming.


    B. Conceptualizing utilization and influence

When assessing the use and utility of an evaluation, it is helpful to consider two
components: we term them "utilization" and "influence."

Utilization: How were the evaluation findings (and even the process) used - by whom and
for what purpose? The first uses that generally come to mind are those related to impact
evaluation as an assessment tool. For example, one may conduct an evaluation in order
to:
    - monitor project implementation,
    - measure the benefits of an existing program and check for unanticipated side
      effects,
    - assess the distribution of participation and benefits across different segments of
      the target population,
    - make informed changes and improvements to an ongoing project,
    - test options for the design of a project that will be implemented in the future, and
    - compare the cost-effectiveness or benefit/cost ratio of alternative programs for
      budget planning purposes.

In practice, however, impact evaluations are also very commonly used as a political tool.
They are frequently employed to:
    - provide support for decisions that agencies have already decided upon or would
      like to make,
    - mobilize political support for high profile or controversial programs,
    - provide independent support (the international prestige and perceived
      independence of the evaluator is often important) for terminating a politically
      sensitive program, and
    - provide political or managerial accountability.

In fact, in the end it is likely to be the potential political benefit or detriment that causes
decision makers to embrace or avoid evaluations. As a result, those who would like to
promote impact evaluation as an assessment and learning tool will have to be fully aware
of the given political context and navigate strategically.

Influence: In assessing the influence of an impact evaluation, there are a number of
aspects one might consider:
    - What causes or facilitates an impact evaluation's influence? It is important to
      remember that it is not only the findings of an impact evaluation that can have an
      impact. The decision to conduct an evaluation, the choice of methodology, and
      how the findings are disseminated and used can all have important consequences,
      some anticipated, others not; some desired and others not. For example, the
      decision to conduct an evaluation using a randomized control trial can influence
      who benefits from the program, how different treatments and implementation
      strategies are prioritized, what is measured and the criteria used to decide if the
      program has achieved its objectives.[2] In other cases, if findings are presented in a
      manner that is too technically complex for their audience, decision makers may
      either misinterpret the findings, leading to misinformed choices, or ignore the
      findings altogether.

    - Where can the evaluation's influence be seen? Some possibilities include
      administrative realms such as program design and scope, or the political realm in
      the form of popular support for a program or its associated politicians. One may
      also consider the resulting perceptions and understanding of impact evaluation, by
      policymakers and project administrators as well as by researchers who conduct
      future evaluations. For high profile programs, the influence of the evaluation may
      also be seen in how the debate on the program is framed in the mass media.

    - How much influence did the evaluation have on the decisions and actions of
      managers, planners and policymakers? Did it have a major influence, or did it
      only corroborate what was already known or support decisions that had already
      been made? That is, to what degree have any decisions actually been made
      differently as a result: has the impact evaluation had any impact? Decision-makers
      are exposed to many different sources of information, advice and
      pressure, of which the evaluation is only one, and usually not the most
      significant.

[2] A frequently cited example from the US was the decision to assess the performance of schools under the
No Child Left Behind program in terms of academic performance measured through end-of-year tests.
This meant that many schools were forced to modify their curricula to allow more time to coach children in
how to take the tests, often resulting in reduced time for physical education, arts, and music.


While utilization and influence are distinct as concepts, in practice they are often,
though not necessarily, found to overlap. For example, if an evaluation is utilized to
determine the most effective project design, then the influence may be that a future
project is chosen based on strong evidence rather than on other criteria. On the other
hand, there are times when the influence of an evaluation does not reflect its utilization,
such as a number of cases in which an evaluation was used to gain political support (the
utilization) but in the process, impact evaluation in general came to be viewed as an
important and even necessary tool (the influence).


   C. Reviewing the evidence: how were the evaluations utilized and what
      kinds of influence did they have ?

The following brings together the utilization and influence that were observed in the 12
case studies. In most cases the information is based on the perceptions and experience of
the evaluators themselves, although in one case a representative of the government client
agency was also present. The types of use and influence seen in the presented cases can
be broadly grouped into three categories: project implementation and administration,
political support, and the process and culture of evaluation itself.

Project implementation and administration:
    - Evaluations were often used for the design of future programs. They provided
      specific operational guidance or general guidance for the strategic focus. They
      often helped identify logistical and administrative problems that had been
      overlooked.
    - The Ethiopian food security evaluation identified a number of process failures,
      although it still found positive impacts, and authorities found it useful to have
      learned that there were process problems, as these were practical issues that could
      be addressed. In Ethiopia and in China's health insurance evaluation, "bad news"
      was delivered sufficiently early so that it didn't just condemn a completed project
      but instead provided practical guidance for improvements.
    - Several cases were cited where the extensive dissemination of evaluation findings
      also served to raise the profile of the category of programs being evaluated.
      Examples include deworming and conditional cash transfers.
    - Evaluations help clients understand their programs in a broader context.
      Evaluations helped identify broader systemic implications of programs and
      contributed to understanding of local contextual factors affecting how projects
      operate in different districts or locations.
    - Several evaluations have made specific contributions to choices among policy
      alternatives. For example, two health evaluations in Kenya helped convince
      government and donors to provide free anti-malarial bednets and deworming in
      schools rather than to seek cost-recovery by charging.




Political support:
    - Evaluations are often used to justify continued funding for a program, or to ensure
      political support for a new or expanded program. The evaluations of the first
      CCTs in Mexico and Colombia are both considered to have helped convince new
      administrations to continue high profile programs started by their predecessors.
    - In several cases an evaluation was used to justify a new program, even when in
      fact the evaluation findings did not support this new program (for example,
      expansion of the Colombian Familias en Accion program from rural to urban
      areas).

Culture of and capacity for impact evaluation:
    - Evaluations that are favorably received by clients often lead to increased interest
      in further evaluations. Well designed and implemented evaluations have helped
      legitimize evaluation as a useful planning or policymaking tool. Initially many
      clients or local districts were either skeptical about an evaluation's utility or were
      afraid that the findings would be too negative or critical. In several cases attitudes
      became more positive and utilization increased as the evaluations progressed. Not
      surprisingly, it was much easier to gain acceptance for the evaluation process and
      findings when the findings were mainly positive. Well received evaluations often
      lead to follow-up evaluations to assess more specific issues that had been
      identified.
    - There were, however, examples where initial negative findings created reluctance
      to accept or use an evaluation, but where attitudes gradually became more
      favorable. The health insurance evaluation in China was very poorly received in
      the beginning because it showed negative results on the primary objective of
      reducing out-of-pocket health care expenditures (though positive results for a
      secondary objective of increasing use of health care services). In the end, though,
      authorities accepted the results and were able to use them to make some reforms
      (especially increased funding), and the process seemed to have increased general
      acceptance of impact evaluation as a tool.
    - Several well designed and well received evaluations have contributed to the
      development of a culture of evaluation and a move towards the institutionalization
      of evaluations rather than the ad hoc and fortuitous way in which earlier
      evaluations were selected and funded. Once the benefits of well designed
      evaluations became understood, this helped raise expectations concerning the
      level of rigor required in future evaluations. Methodologies such as randomized
      control trials, double-difference designs, or regression discontinuity provided
      models that were then replicated in other program areas.
    - Where several sequential evaluations were conducted, the effect on client
      attitudes toward and use of evaluation is cumulative, and clients have learned to
      demand the kinds of information that they need and can use.
    - A strengthened culture of evaluation can also stimulate evaluation capacity
      development, in some cases by strengthening government research agencies such as
      the statistics bureau, in other cases through training to improve the quality of
      monitoring data collection and use.




Figure 1: The influence and utilization of impact evaluations

    Created demand for further and more rigorous evaluations
        - Demand for follow-up studies
        - Demand for new evaluations
        - Demand for more rigorous evaluations
        - Demand to develop evaluation systems for particular sectors
        - Demand for methodologies to evaluate pilot projects
        - Appreciation of the value of independent, external evaluations

    Strengthening impact evaluation methodology
        - Development of more rigorous evaluations
        - Strengthened MIS and data quality
        - Demonstrated the value of contextual analysis
        - Introducing evaluation capacity development
        - Institutionalization of evaluation
        - Strengthening quality of international evaluation

    Strengthening program design and implementation
        - Strengthening program implementation
        - Strengthening the design of future projects
        - Assessing the cost-effectiveness of alternative interventions
        - Implementation and evaluation of pilot projects before launching major
          interventions

    Providing evidence to support programs and justify replication
        - Providing evidence to respond to critics
        - Justifying continuation of program
        - Justifying continued inclusion of particular component
        - Raised profile of a program/intervention

    Providing evidence to challenge a program
        - Used by donors to challenge program they do not agree with

    Involves wider group of stakeholders
        - Provides basis for stakeholder engagement with policymakers and implementers


    D. Factors affecting evaluation utilization and influence

The following is a synthesis of the broad range of factors identified in the presentations
as potentially affecting evaluation utilization.

Timing and focus on priority stakeholder issues:
     The evaluation must be timely and focus on priority issues for key stakeholders.
       This ensures there is a receptive audience. Timing often presents a trade-off. On
       the one hand, an evaluation designed to provide fast results is relevant for the
       project at hand: findings arrive in time to change the project design, while the
       project still has the attention of policymakers. On the other hand, evaluations that
       take longer to complete may be of higher quality and can look for longer term effects
       on the design of future projects and policies.
    The evaluator must be opportunistic, taking advantage of funding opportunities,
       or the interest of key stakeholders. Several countries that have progressed toward
       the institutionalization of evaluation at the national or sector level began with
       opportunistic selection of their first impact evaluations (see footnote 3).
     The evaluator should always be on the look-out for "quick wins": evaluations
       that can be conducted quickly and economically and that provide information on
       an issue of immediate concern. Showing the practical utility of impact evaluations
       can build up confidence and interest before moving on to broader and more
       complex evaluations.
    Also, there is value in firsts. Pioneer studies may not only be useful for showing
       the impact of the intervention, but in a broader context they may also change
       expectations about what can and should be evaluated or advance the methods that
       can be used. Again, even less-than-ideal evaluations that are first or early in their
       context may contribute by building interest in and capacity for impact evaluation.
    A series of sequential evaluations gradually builds interest, ownership and
       utilization.

Effective dissemination
     Rapid, broad and well targeted dissemination is an important determinant of
        utilization. One reason that many sound and potentially useful evaluations are
        never used is that very few people have ever seen them.
     Making data available to the academic community is also an important way of
        broadening interest and support for evaluations and also of legitimizing the
        methodologies (assuming they stand up to academic critiques as have
        PROGRESA and Familias en Accion).


3 See IEG (2008) Institutionalizing Impact Evaluation within the Framework of a Monitoring and
Evaluation System. The Education for All evaluations in Uganda were cited as an example of
institutionalization at the sector level and the SINERGIA evaluation program under the Planning
Department in Colombia is an example of institutionalization of a national impact evaluation system. The
report is available at:
http://lnweb90.worldbank.org/oed/oeddoclib.nsf/DocUNIDViewForJavaSearch/E629534B7C677EA78525754700715CB8/$file/inst_ie_framework_me.pdf,
or at www.worldbank.org/ieg/ecd.
Figure 2: Factors affecting evaluation utilization and influence

    Factors facilitating evaluation utilization and influence:
        Timeliness of the evaluation
        Focus on priority issues for stakeholders
        Effective communication and dissemination strategies
        Active engagement of national counterparts
        Demonstrate the utility of evaluation as a tool for policy makers and planners
        Convince agencies of the importance of quality M&E data
        Ensure methodological rigor and use of methods considered credible by stakeholders
        Promote demand for more rigorous evaluation methods
        Positive and non-threatening evaluation findings
        Promote systematic evaluation capacity development strategy
        Demonstrate the transparency and independence of the evaluation
        Donor pressures can (sometimes) help strengthen evaluation design
        Controversial/unexpected findings can sometimes stimulate interest in further
        research/evaluations

    Challenges to evaluation utilization and influence:
        Delays in start of data collection can affect the quality and utilization of an evaluation
        Multiple donors can affect communication and coordination, and can make it more
        difficult to agree on evaluation design
        Variations in the technical level of counterparts make it more difficult to define the
        technical level of evaluation reports
        More rigorous evaluations may delay delivery of findings and clients may lose interest
        or withdraw support
        Staff turnover in donor and national agencies can reduce continuity and reduce support
        Initially approved evaluation funds are often cut over time
        Not all agencies welcome the accountability that evaluations can bring

     Providing rapid feedback to government on issues such as the extent of corruption or
        other "hot" topics enhances utilization.
    Continuous and targeted communication builds interest and confidence and also
        ensures "no surprises" when the final report and recommendations are submitted.
        This also allows controversial or sensitive findings to be gradually introduced.
        Trust and open lines of communication are important confidence builders.
    Where there is existing demand for a particular evaluation, the results may
        partially disseminate themselves and may be more likely to be used.

Clear and well communicated messages
    Clarity and comprehensibility increase use. It helps when the evaluation results
       point to clear policy implications. This may also apply to the comprehension of
       methods. While stakeholders may be willing to "trust the experts" if an evaluation
       offers results that support what they want to hear, there may be a reasonable
       tendency to distrust results, and particularly methods, that they don't
       understand.

Active engagement with national counterparts
     The active involvement of national agencies in identifying the need for an
        evaluation, commissioning it, and deciding which international consultants to use
        is central to utilization.
     Close cooperation with national counterpart agencies proves critical in several
        ways. It gives ownership of the evaluation to stakeholders and helps ensure the
        evaluation focuses on important issues. It often increases quality by taking
        advantage of local knowledge and in several cases reduces costs (an important
        factor in gaining support) by combining with other ongoing studies. This
        cooperation can enable evaluators to modify the initial evaluation design to reflect
        concerns of clients, for example by changing a politically sensitive randomized
        design to a strong quasi-experimental design.
     Involving a wide range of stakeholders is also an important determinant of
        utilization. This can be achieved through consultative planning mechanisms,
        dissemination and ensuring that local as well as national level agencies are
        consulted.
     In some contexts (such as the China health insurance scheme), the involvement of
        the national statistical agency increases the government's trust: the results and
        the process have been better accepted when overseen and presented by the
        statistics agency.

Demonstrating the value of evaluation as a political and policymaking tool
   When evaluation is seen as a useful political tool, this greatly enhances utilization.
     For example, managers or policymakers often welcome specific evidence to
     respond to critics or to support continued funding or program expansion.
     Evaluation can also be seen as a way to provide more objective criticism of an
     unpopular program.
   Once the potential uses of planning tools such as cost-effectiveness analysis are
     understood, this increases the demand for, and use of, evaluations. Evaluations


       can also demonstrate the practical value of good monitoring data, and increased
       attention to monitoring in turn generates demand for further evaluations. When
       evaluations show planners better ways to achieve development objectives, such as
       ensuring services reach the poor, this increases utilization and influence.
     Increasing concerns about corruption or poor service delivery have also been an
       important factor in government decisions to commission evaluations. In some
       cases, a new administration wishes to demonstrate its transparency and
       accountability or to use the evaluation to point out weaknesses in how previous
       administrations had managed projects.
     Evaluations that focus on local contextual issues (i.e. that are directly relevant to
       the work of districts and local agencies) are much more likely to be used.

The methodological quality of the evaluation and credibility of the international
evaluators
    High quality of an evaluation is likely to increase its usefulness and influence.
       Quality improves the robustness of the findings and their policy implications and
       may assist in dissemination (especially in terms of publication). However, an
       impact evaluation of compromised quality may still be useful if it can provide
       timely and relevant insight or if it ventures into new territory: new techniques,
       less-evaluated subject matter, or in a context where relevant stakeholders have
       less experience with impact evaluations.
    The credibility of international evaluators, particularly when they are seen as not
       tied to funding agencies, can help legitimize high profile evaluations and enhance
       their utilization.
     In some cases the use of what are considered "state of the art" evaluation methods,
       such as randomized controlled trials, can raise the profile of evaluation (and the
       agencies that use it) and increase utilization.
    New and innovative evaluations often attract more interest and support than the
       repetition of routine evaluations.
    On the other hand, while studies on the "frontier" may be more novel or attract
       more attention, subsequent related studies may be useful in confirming
       controversial findings and building a body of knowledge that is more accepted
       than a single study, especially a single study with unpopular findings.
    Evaluation methods, in addition to being methodologically sound, must also be
       understood and accepted by clients. Different stakeholders may have different
       methodological preferences.

Positive and non-threatening findings
    Positive evaluations, or those that support the views of key stakeholders, have an
       increased likelihood of being used. This is not surprising, and one of the
       reasons is that many agencies were either fearful of the negative consequences of
       evaluation or (to be honest) considered evaluation a waste of time (particularly
       the time of busy managers) or money. Once stakeholders appreciated that
       evaluations were not threatening and were actually producing useful findings,
       agencies became more willing to request and use evaluations and gradually



       to accept negative findings, or even to solicit evaluations to look at areas where
       programs were not going well.
     There is always demand for results that confirm what people want to hear. There
       may be some benefit in taking advantage of opportunities to present good results,
       especially if it helps the process of getting stakeholders to understand and
       appreciate the role of impact evaluation. Sometimes, though, demand can be built
       despite less-positive results, through special efforts to target the relevant stakeholders.
     Concerns over potential negative results, bad publicity, or improper handling of
       the results may reduce demand; sensitivity, trust-building, and creative
       arrangements may help overcome these fears.

Evaluation capacity development
    Evaluation capacity, especially at a local level, is an important factor in the
       quality of an impact evaluation that also affects the ability of stakeholders to
       demand, understand, trust, and utilize the results.
    Capacity building is an iterative process and may improve both demand and
       quality.

Pursuing easy wins alongside harder challenges
    The most effective strategy for developing a strong culture of evaluation may be
       two-pronged: opportunism where there are "easy wins" (willing partners, high
       capacity, good data, good results, etc.), since these may require less effort and
       fewer resources and may generate familiarity with the process; and at the same
       time "chipping away" systematically at the harder problems where there is less
       capacity or less tradition of evaluation.




Table 1: The evaluation questions and the main findings for each of the evaluations

EDUCATION PROGRAMS

1. Cambodia: Japanese Fund for Poverty Reduction [JFPR]: Secondary School Scholarship Fund
   Goals: Increase enrolment and retention of girls from poor families in lower secondary schools
   Evaluation questions:
      Do scholarships increase enrollment of girls from low-income families in secondary school?
      Do scholarships increase retention?
   Main findings:
      Scholarship recipients had significantly lower socio-economic status than non-recipients
       (so program was reaching the target group)
      Recipients had approximately 30 per cent higher enrolment and retention than non-recipients
      Effect size much higher than similar programs in other countries (e.g. Progresa in Mexico)

2. Cambodia: World Bank Girls Secondary School Scholarship Fund [follow-up to JFPR program]
   Goals: Improve targeting of low-income girls
   Evaluation questions:
      Assess program impacts on:
         o   effectiveness of providing larger scholarships to poorer girls
         o   retention
         o   learning
         o   inter-household issues
         o   child labor
   Main findings:
      Similar to the JFPR project (increased enrolment and retention but no effect on learning
       or the quality of education)

3. Uganda: Universal Primary Education (UPE)
   Goals: Test the effectiveness of improved management
   Evaluation questions:
      Trends in attendance and learning since 2000
      Determinants of trends
      Size and cost-effectiveness of each intervention
      Use of MIS for evaluation
   Main findings:
      Progress in access to education
      Effectiveness of investments in teachers, classrooms, books and other facilities
      School management important
      Investments more effective if combined with improved management
      Quality of primary education remains poor and absenteeism and drop-outs high

4. Uganda: UPE. Pilot program in Masindi District
   Goals: Test the effectiveness of improved management
   Evaluation questions:
      Effects of improved management
      How does this enhance other interventions?
   Main findings:
      Educational performance in project schools:
         o   50-60 per cent better than control schools outside the district
         o   35 per cent better than Masindi schools not covered by the project

5. Chile: Vouchers for private schools
   Goals: Improve quality of education by providing low-income students access to private
   education and stimulating public schools to perform better
   Evaluation questions:
      Assessing the effects of vouchers on the quality of education
      Were changes due to improved quality or to skimming off better students from the public
       schools?
   Main findings:
      No evidence that vouchers and increased choice improved educational outcomes
      Vouchers did lead to sorting, as better students from public schools were more likely to
       move to private schools

CONDITIONAL CASH TRANSFERS [CCT] AND POVERTY REDUCTION PROGRAMS

1. Familias en Accion: Colombia. Conditional cash transfers promoting children's health and
   primary and secondary school enrolment
   Goals: Short-term poverty reduction through cash transfers. Long-term investment in human
   capital development through increasing access to health and education
   Evaluation questions:
      Cost-effectiveness of increasing access of poor children to health and education
      Effectiveness of targeting mechanisms in reaching the low-income target population
      Replicability of programs on a large scale
      Replicability in urban areas of programs developed in rural areas
   Main findings:
      Increased primary school enrolment in rural but not urban areas
      Increased secondary school enrolment in both rural and urban areas
      Some improvement in rural nutrition but very limited impact in urban areas
      Influence on diarrhea in rural but not urban areas

2. Progresa/Oportunidades: Mexico. Conditional cash transfers promoting children's health,
   nutrition and education.
   Goals: As for Colombia
   Evaluation questions:
      Are CCTs cost-effective in increasing access of poor children to health and education?
      Effectiveness of key program components:
      Direct monetary transfers versus in-kind grants
      Targeting the extremely poor versus all families
      New, standard targeting procedures versus existing program client lists
      Transfers to households versus to communities
      Non-discretionary rules for whole country versus flexibility for local authorities
      Directing benefits directly to women versus to household head
      Program impacts on fertility
      Criteria for defining size of transfer
      Merits of family co-responsibility and certification
   Main findings:
      Poverty targeting worked well (see footnote 4)
      PROGRESA reduces by 10% people living below poverty line
      Positive impact on school enrolment for boys and girls
      Children entering school earlier, less grade repetition and better grade progression
      Younger children have become more robust against illness
      Women's role in household decision-making increases
      Estimated cost-benefit ratio of 27%

3. Jefes de Familia, Emergency Safety Net Program: Argentina. Cash transfer for unemployed
   household heads with dependent children
   Goals: Short term goal, using monthly cash transfers to stop families falling into poverty.
   Longer term goal of developing skills to facilitate re-entry into the labor market.
   Evaluation questions:
      Effectiveness of cash transfers as an emergency measure to aid poor families
      Are programs cost-effective, efficiently managed and relatively free of corruption?
      Effectiveness of targeting procedures. Did they reach the intended groups?
      How did households respond to the program? Labor force participation, labor supply and
       household division of labor
      Impact on household income
      Impact on aggregate rates of poverty
   Main findings:
     Findings on program performance
      Eligibility criteria were poorly enforced, particularly with respect to women not in the
       labor force
      Targeting worked well in practice as eligibility criteria correlated with structural
       poverty
     Findings on program impact
      Prevented 10% of families falling into extreme poverty
      Net income gains equal to 50-65% of cash transfer
      Foregone income greater for previously employed and for household head than for spouse
      2.5% drop in aggregate unemployment rate

HEALTH

1. Kenya: Bed net distribution experiment: Free vs. Cost-Recovery
   Goals: Increased distribution and use of insecticide-treated nets
   Evaluation questions:
      Is free distribution or cost-recovery more effective for increasing distribution and use
       of nets?
      How price elastic is demand?
   Main findings:
      Cost recovery did not increase distribution or use
      Cost recovery appears to reduce demand

2. Kenya: Deworming treatment and worm-prevention health messages
   Goals: Reduced worm infections, increased prevention behaviors, improved schooling outcomes
   Evaluation questions:
      Does (school-based) deworming improve worm load?
      Does it improve schooling outcomes?
      Do health messages on worm-prevention induce the preferred behaviors?
      How does cost-sharing affect uptake?
      How does social learning affect uptake?
   Main findings:
      Deworming pills reduce worm loads among treated children and children nearby.
      School attendance increased; drop-outs decreased.
      There were no changes in worm-prevention behaviors.
      Cost-sharing reduced uptake.
      Social learning (knowing others who had taken the treatment previously) seemed to reduce
       uptake.

3. China: Voluntary Health Insurance Scheme
   Goals: Reduced out of pocket healthcare expenditures, increased utilization of needed health
   services
   Evaluation questions:
      Does the health insurance scheme reduce out of pocket expenditures?
      Does it increase use of services?
   Main findings:
      Increased household utilization of health services
      No reduction in out-of-pocket payments

SUSTAINABILITY

1. Madagascar: ADeFI Microfinance Institution. Provides credit to small businesses
   Goals: Assist very small, small and medium business to develop their activities
   Evaluation questions:
      Does participation in microfinance improve financial turnover, production, value added,
       staff, capital and labor productivity and capital productivity?
   Main findings:
      No impact found

2. Morocco: Al Amana Microfinance. Provides credit to urban areas; expanding into rural areas
   Goals: Provide access to credit for impoverished people
   Evaluation questions:
      Activities and sales of enterprises
   Main findings:
      Uptake rates were low
      Additional results still pending

3. Ethiopia: Food Security Program. Labor-intensive public works safety-net program,
   unconditional transfers for certain vulnerable groups, agricultural assistance and technologies
   Goals: Improved food security and the well-being of chronically food-insecure people in
   rural areas
   Evaluation questions:
      Effectiveness of targeting and delivery of benefits
      Impacts on food security and asset growth
      Were constructed assets considered useful by stakeholders?
   Main findings:
      Targeting was successful
      Food security was improved
      Assets constructed through the public works projects were considered useful
      Increased borrowing for productive purposes
      Increased use of agricultural technologies
      Frequent payment delivery delays
      Little overlap among program components, despite intentions

4. Vietnam: Rural Roads (1997-2001)
   Goals: Rehabilitation of rural roads to commune centers, to link communities to markets and
   reduce poverty
   Evaluation questions:
      Did the project fund achieve what it intended: did resources supplement or substitute
       for local resources?
      Impact on market and institutional development
   Main findings:
      Fewer km of rehabilitated roads than were intended
      More new roads built
      Improved quality of roads
      Access to markets, goods, and services increased
      Livelihood diversification
      Increased primary school completion
      Some short-, some longer-term effects
      Larger impacts in poorer communes

4 The PROGRESA findings were not reported in the conference but were taken from IFPRI (2002)
PROGRESA: Breaking the Cycle of Poverty.




Table 2: Summary of the evaluation designs
Sector            Evaluation designs
Education            1.   Regression analysis to control for socio-economic differences between the two
programs                  groups or to compare groups above and below the eligibility cut-off point for
(see Table 1 for          the maximum $60 scholarship
details)             2.   Propensity score matching to create ex-post control group
                     3.   Quasi-experimental designs in which schools receiving project interventions
                          are compared with schools outside the district; and with schools in treatment
                          districts not receiving the interventions
                     4.   Retrospective (post-test) comparison of scholarship recipients and non-
                          recipients
                     5.   Secondary data sets were used to increase the number of indicators (MIS data)
                          and to analyze learning scores, household socio-economic characteristics, child
                          labor and inter-household issues
                     6.   Triangulation among indicators
                     7.   When programs covered the whole country: natural restrictions or differences
                          in geographical distribution (for example of private schools) used to create
                          comparator group
                     8.   Average school productivity in each commune (district) compared for private
                          and public schools and average productivity estimated for all schools
Conditional cash     1.   Randomized selection of beneficiary communities (RCT) for each phase of
transfers    and          project
poverty reduction    2.   Pre-test/post-test comparison group design using propensity score matching
programs                  and with measurement after one and four years
                     3.   Comparison group divided into those who had started receiving cash transfers
                          before the baseline and those who had not
                     4.   A propensity-score matching (PSM) design was used with households eligible
                          to be selected for Phase 2 being used as the control group for Phase 1
                     5.   Formal surveys combined with structured and semi-structured interviews,
                          focus groups and workshops
Health               1.   Randomization of treatments
                     2.   Randomization, using phased-in project implementation
                     3.   Double difference with matching
                     4.   Integrated into the government's own evaluation and was done in collaboration
                          with government staff
Sustainable          1.   Randomized control trial
development          2.   Double difference with propensity score and/or judgmental matching
                          techniques
                     3.   First evaluation: ex-post matching of beneficiaries and non-beneficiaries
                     4.   Second evaluation: double difference: theoretically robust but high attrition
                          rates left low statistical significance in the results
                     5.   Beneficiaries and non-beneficiaries compared using retrospective data
                     6.   Controls for local conditions, events over time, etc
                     7.   Pre-program baseline data compared with follow-up rounds in three different
                          years
    Note: This table summarizes the range of designs used by the evaluations in each sector. The
    following chapters provide more details on the specific design used for each of the evaluations.
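
The "double difference" design listed in Table 2 can be illustrated with a minimal sketch. It compares the change over time in the beneficiary group with the change over time in the comparison group. The function name and all figures below are hypothetical and are not taken from any of the evaluations; a real analysis would also use matching or regression controls, as the table indicates.

```python
from statistics import mean


def double_difference(treated_before, treated_after,
                      comparison_before, comparison_after):
    """Return the double-difference (difference-in-differences) estimate.

    Each argument is a list of outcome values for one group in one period.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    # The comparison group's change stands in for what would have happened
    # to beneficiaries without the program.
    return treated_change - comparison_change


# Hypothetical enrolment rates (percent), for illustration only.
treated_before = [50.0, 54.0, 52.0]
treated_after = [66.0, 70.0, 68.0]
comparison_before = [51.0, 55.0, 53.0]
comparison_after = [57.0, 61.0, 59.0]

impact = double_difference(treated_before, treated_after,
                           comparison_before, comparison_after)
print(impact)
```

In this toy example the beneficiary group improves by 16 points and the comparison group by 6 points, so the estimated impact is 10 points. The estimate is only credible if the two groups would have followed similar trends in the absence of the program.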
 Table 3: Examples of the influence and use of the evaluations
                                    Use                                        Examples
Created    demand    for Created demand for further evaluations                 Cambodia education
further       and more                                                          Follow-up micro-finance project - Morocco
rigorous evaluations      Promoted controversy in the academic field and        Education - Chile
                          encouraged further research
                          Created demand for methodologies to evaluate pilot    Follow-up urban project, CCT-Colombia
                          projects
                          Generated follow-up studies                             Assessing service delivery: Food security - Ethiopia
                                                                                  Health insurance - China
                        Increased appreciation of the need for independent,       Ethiopia
                        external evaluation                                       Mexico
                        Helped introduce impact evaluation to particular          Rural roads - Vietnam
                        sectors
Strengthened quality of Strengthened MIS and data quality                       Demonstrated to local districts the importance of good
impact evaluations                                                              data - Uganda education
                          Encouraged more rigorous evaluation as a standard     Cambodia education
                          component of new programs
                          Led to evaluation capacity building                   Health insurance - China
                                                                                Statistics agency and government - Ethiopia
                          Raised the standards for evaluation                   Methods and questionnaires used in other road
                                                                                evaluations - Rural Roads, Vietnam
                          Institutionalized impact evaluation systems           Social sector evaluation systems introduced (Mexico)
                                                                                Created a culture of evaluation (Ethiopia)
                          Enhanced the role and rigor of impact evaluation      CCT-Mexico and Colombia
                          internationally
Contributed to program Identified which components are/are not effective         Raised interest in incorporating scholarship programs in
design             and and improved program operation                            government projects - Education, Cambodia
implementation                                                                   Showed investment in program management more cost-
                                                                                 effective than building classrooms or hiring more
                                                                                 teachers - Education, Uganda
                                                                                 CCT-Mexico
                         Improved design of future projects                      Follow-up urban project: CCT-Colombia
                                                                                 Smaller grants for primary school: CCT-Colombia
                                                                                 CCT-Mexico
                         Convinced agencies to design and test pilot projects    Follow-up urban project: CCT-Colombia
                         before going to scale                                   Self-employment program - Argentina
                         Broadened program and policy options                    New labor market intervention options - Argentina
                        Identified administrative and logistical problems        Food security - Ethiopia
                        that had been overlooked
Provided evidence    to Provided evidence to respond to program critics          Education - Uganda
support programs        Provided evidence to support programs and justify        CCT-Mexico and Colombia
                        continuation under new government                        Emergency Program-Argentina
                        Evaluations used to justify new programs even            Government used findings to justify expansion to urban
                        when findings did not support this                       areas: CCT-Colombia
                                                                                 New self-employment program - Argentina
                         Provided evidence to continue components agencies       Community day care centers: CCT-Colombia
                         had planned to cut
                         Contributed to replication in other countries.          CCT-Mexico and Colombia
                         Helped agencies decide between alternative              Free distribution of mosquito nets - Kenya/ Somalia
                         strategies
                         Raised the visibility of programs                       Deworming now commonly discussed among
                                                                                 international agencies such as WHO and World Bank
Provided evidence to                                                             Extension of CCT to urban areas - Colombia
challenge programs                                                               Self-employment programs - Argentina
Involved wider group of Provided basis for engagement with policymakers          Education - Uganda
stakeholders            and implementers




Table 4: Factors affecting evaluation utilization and influence
A. Factors facilitating evaluation utilization and influence                               Examples
Timeliness                      a. The evaluation must be commissioned and the findings    a. There was a demand for information on the
                                   produced when there is current interest in the issues   questions being addressed (Madagascar)
                                   being studied                                           b. An interim evaluation report allowed for mid-
                                b. At least preliminary findings should be available in    program changes (Ethiopia)
                                   time to make adjustments to program implementation
Focus on the client's priority a. The evaluation incorporated local contextual data        a. The focus on local contextual issues
issues                          b. Contributed to current policy debate                    demonstrated the practical utility of the findings at
                                c. Impact reduced when the evaluation does not focus on    the district level (Uganda)
                                priority concerns of stakeholders                          b. Many national and international agencies were
                                                                                           already debating the merits of cost-recovery versus
                                                                                           free distribution of bednets (Kenya)
                                                                                           c. The evaluation focused on economic issues
                                                                                           rather than the social and behavioral factors of
                                                                                           concern to government (Madagascar)
Effective communication and a. Rapid and wide-spread dissemination of findings             a. Data was available on the Internet and was widely
dissemination strategies    b. Clear and well communicated messages                        used in academic publications (Mexico)
                            c. "No surprises". Ongoing communications and periodic         b. The evaluation was rigorous but the findings
                               one-on-one meetings to keep stakeholders informed of        were communicated in a very technical way that
                               the progress and initial findings of the evaluation         was difficult for non-specialists to understand
                                                                                           (Madagascar)
                                                                                           c. Due to frequent interactions between evaluators
                                                                                           and stakeholders the latter became more
                                                                                           comfortable with the evaluation process (Ethiopia)
Active     engagement    with a.   National agencies involved in design and                a(i). Evaluators revised evaluation design in
national counterparts             implementation of the evaluation                         response to concerns about RCT (Cambodia)
                               b. Provided mechanism for greater stakeholder               a(ii). The evaluation was commissioned by
                                  involvement                                              policymakers not donors (Mexico)
                               c. Reducing costs through coordination with ongoing         a(iii). Ministry of Labor and Bureau of Statistics
                                  national surveys                                         actively involved (Argentina)




                                   d. Evaluation integrated into      ongoing      government a(iv). Defined as "A true partnership from the
                                      evaluation/research program                             beginning" (Madagascar)
                                                                                              a(v). In-country team facilitated communication of
                                                                                              evaluation progress and findings (Ethiopia)
                                                                                              a(vi). Close cooperation with the Statistics Bureau
                                                                                              in preparing the survey instrument (Ethiopia)
                                                                                              b. Uganda
                                                                                              c. Piggy-backing the evaluation with a Ministry of
                                                                                              Labor survey (Argentina)
                                                                                              d. Evaluation also conducted in collaboration with
                                                                                              national agency staff (China). Evaluators recognized
                                                                                              government concern about data security, so the
                                                                                              analysis was done on government computers.
Ability to demonstrate the         a. Findings were of practical utility to managers and a. Ministry had for the first time specific evidence
value of evaluation as a           policymakers                                               to respond to critics (Uganda education)
political and policy tool          b. Cost-effectiveness analysis proved a useful tool        b. Uganda
                                   c. Findings demonstrating that programs were reaching the c. Colombia
                                   low-income target populations were important to
                                   policymakers
Demonstrated the value of          a.    The practical utility of the evaluation findings a. Local districts saw for the first time the practical
good quality data                  demonstrated the value of good quality data                utility of good evaluation data (Uganda)
The methodological quality of      a. Rigorous methodology "set the bar" for other countries a. The rigor of the Progresa evaluations and the
the evaluation and the                who felt the need to replicate these standards          surrounding publicity convinced Colombia of the
credibility of the international   b. Credibility and independence of the international need to use equally rigorous evaluations
evaluators                            evaluators                                              b. Mexico
                                   c. The use of innovative evaluation methodologies c. This was the first time an RCT had been
                                      creates interest                                        conducted on micro-finance (Madagascar)
Demand for more rigorous           a. Policymakers and line ministries aware of the need for a. Large number of road projects had not been
evaluation methodologies               more rigorous evaluation methods                       evaluated (Vietnam)
Positive and non-threatening       a.     Initial evaluations produced positive findings, a(i). Cambodia education
findings                           encouraging agencies to support further evaluation         a(ii). Uganda
                                                                                              a(iii). Colombia
                                                                                              a(iv). Mexico




Evaluation capacity                a. Sequential evaluations permitted national agencies to a. Cambodia education
development                        develop capacity over time
Unexpected            findings     a. Controversial findings stimulated interest in further a. Findings challenged conventional wisdom by
stimulated interest in further     research                                                         showing there had been no improvement in school
research                                                                                            performance (Chile)
Demonstrated transparency of       a.     The ability to demonstrate transparency and a(i). Mexico
the evaluations                    professional rigor was important in countries where earlier a(ii). Argentina
                                   programs had been criticized for corruption and a(iii). Colombia
                                   politicization
Donor pressures                    a. Donors need rigorous data to justify continuation or a. Argentina
                                   expansion of program                                             b. In Colombia, donors pressured government to
                                   b. Donor pressure to ensure collection of baseline data to delay start of program in some areas to permit
                                   increase credibility of findings                                 collection of baseline data
B. Challenges to evaluation utilization and influence
Data collection often does not happen until the program has a. This affects the quality of the evaluation
been operating for some time
Multiple donors                                                     a. Affects communication and coordination
                                                                    b. May be difficult to reach consensus on evaluation design
Variations in technical expertise of stakeholders                   a. Difficult to present findings at the right technical level
Tensions between donors and government                              a. Can affect willingness to support evaluation
                                                                    b. Difficult to reach consensus on evaluation designs
Long time before results are available                              a. Outcomes and impacts cannot be measured for a long time. This reduces
                                                                    interest of many stakeholders
                                                                    b. Low technical capacity of partners may slow the process of data collection,
                                                                    analysis and dissemination
Project staff turnover                                              a. People who are interested leave and replacements may not be as interested
Funds for evaluation may be reduced                                 a. Originally approved evaluation funds may be reduced as project develops
Not everyone wants accountability                                   a. Evaluation may be seen as a threat




   2. Education
   A. Introduction

The education workshop discussed the evaluation of projects in Cambodia, Uganda, and
Chile. The objectives of the education programs in Cambodia and Uganda were to
increase school enrolment and retention for low-income students, particularly girls; and
in the case of Uganda to also improve education quality. The program in Chile, which
already had very high enrolment rates, was intended to improve quality for low-income
students through increased access to private education. In addition, all of the programs
sought to enhance the efficiency of program management. An overview of the education
projects, the key evaluation questions, the main evaluation findings, the evaluation
designs, how the evaluations were utilized, their influence on project implementation and
policy, and the factors affecting utilization is presented in Chapter 1 (Tables 1-4).
This chapter provides more detail on each of the education evaluations.

   B. Getting girls into school: Evidence from a scholarship program in
      Cambodia

The program

The program being evaluated was the Japan Fund for Poverty Reduction (JFPR)
scholarship program in Cambodia. The program, which began in 2004, awarded
scholarships to poor girls who were completing 6th grade, and who wished to enter
secondary school. The program tested the efficacy of scholarships as a way to increase
secondary school enrolment among girls from low-income families and to encourage
them to complete the full three years of lower-secondary school. The rationale for the
program is the large literature documenting associations between female education and a
variety of social outcomes (e.g., health, nutrition, fertility, and child mortality).

The program covered 15% of all secondary schools and in each a maximum of 45 girls
were awarded scholarships. The $45 scholarship was quite large compared to the mean
per capita GDP of $300. The "scholarship" program was in fact a conditional cash
transfer provided to the family on the condition that the girl is enrolled in school,
maintains a passing grade and maintains a high attendance rate.

A follow-up World Bank scholarship program was also discussed in the workshop. This
had similar objectives to JFPR, but was able to use a more sophisticated targeting system
as students were also scored on the probability of drop-out.

The evaluation (see Table 5 at the end of the chapter)

The purpose of the evaluation was to test the effectiveness of a scholarship/conditional
cash transfer program in increasing the transition of girls from 6th grade primary school to
the first year of lower-secondary school. As the evaluation was not commissioned until
late in the project, a retrospective (ex-post) evaluation design was used. Two sources of
data were used: application forms for the scholarship program (information on parental
education, household composition, ownership of assets, housing materials, and distance
to the nearest secondary school) and data on school enrolment and attendance collected
during unannounced school visits. The analysis compared scholarship recipients (the
"treated" group) and non-recipients (the "comparator" group) using regression models.

The evaluation of the follow-up World Bank project used a Regression Discontinuity
Design. Girls just above the cut-off line for $60 scholarship eligibility were compared
with girls just below the line. The evaluation had access to a richer database and was
also able to look at learning, intrahousehold issues and child labor.
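
The logic of the regression discontinuity comparison can be sketched as follows. This is a simplified illustration, not the evaluators' actual estimation: it compares mean outcomes in a narrow band on either side of the cut-off, whereas a full analysis would typically fit regressions on each side. The scores, the cut-off of 50, the bandwidth, and the assumption that girls scoring at or above the cut-off are eligible are all hypothetical.

```python
from statistics import mean


def rd_estimate(records, cutoff, bandwidth):
    """Compare mean outcomes just above and just below an eligibility cut-off.

    records: list of (score, outcome) pairs, one per applicant.
    Assumes (hypothetically) that scores at or above the cut-off are eligible.
    """
    above = [y for s, y in records if cutoff <= s < cutoff + bandwidth]
    below = [y for s, y in records if cutoff - bandwidth <= s < cutoff]
    if not above or not below:
        raise ValueError("need observations on both sides of the cut-off")
    return mean(above) - mean(below)


# Hypothetical (targeting score, enrolled = 1 / not enrolled = 0) pairs.
records = [
    (40, 0), (46, 1), (47, 0), (48, 1), (49, 0),
    (50, 1), (51, 1), (52, 1), (53, 0), (60, 1),
]

effect = rd_estimate(records, cutoff=50, bandwidth=5)
print(effect)
```

Here 75 percent of the girls just above the cut-off are enrolled, against 50 percent of those just below, giving an estimated effect of 0.25. Because girls on either side of the line are very similar, the difference can be attributed to the scholarship with fewer assumptions than a simple recipient/non-recipient comparison requires.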

The evaluation findings

Scholarship recipients had significantly lower socio-economic status than non-recipients,
confirming that the program had been successful in targeting poorer girls. After
controlling for household characteristics, it was found that girls receiving scholarships
had an almost 30 per cent higher attendance and enrolment rate than non-recipients, and
that the effects of the program were greatest for the most disadvantaged girls: those who were
poorer, whose parents had less education, and who lived further from school. These program
effects compare favorably with similar programs in other countries. For example, the highly
regarded PROGRESA program in Mexico was estimated to have increased the transition from
6th grade to 7th grade (the first year of secondary school) by only 11.1 percent.

The preliminary findings from the follow-up World Bank evaluation also showed that the
scholarships affected attendance but did not improve learning.

Evaluation utilization and influence (see Table 6 at the end of the chapter)

The retrospective evaluation of the JFPR, even though it was "messy" because of the
limited access to baseline data, did "create an appetite that engendered a demand for the
kind of more rigorous evaluation" that was implemented for the follow-up project. The
cumulative effect of the two evaluations was twofold: the Government is planning to
incorporate some of the evaluation design features in its own scholarship program, and to
incorporate a rigorous evaluation into a large fast-track catalytic fund grant.

     "At the beginning there was no appetite for evaluation. There was no demand for it.
     There was no appreciation of it. There was no capacity for it. And while we have
     overcome some of these barriers, I still think there is a limited capacity to
     understand and use evaluation directly."
     Deon Filmer. Development Research Group. The World Bank

Several factors increased utilization of the first evaluations and stimulated interest in
more rigorous future evaluations. First, even though the original design of the JFPR did
not include an impact evaluation, the methodologically "messy" retrospective evaluation
was able to produce useful findings in a short period of time. It identified operational
issues to address in the
subsequent projects and created an appetite for more rigorous evaluations. Second, the
fact that the first evaluation showed the project had some positive results created interest
among national stakeholders in the use of evaluation as a management tool. If the
evaluation had not found any positive results it might have been more difficult to
convince stakeholders to support future evaluations.

Third, the program and evaluation teams worked closely with government to prepare a
program design that would facilitate a strong evaluation and produce findings that could
be used by policymakers. The Bank's willingness to replace the original RCT with a
rigorous but politically less sensitive quasi-experimental design built confidence and
increased the likelihood that the results would be utilized.

Several lessons were identified with respect to evaluation utilization. Developing a
demand for and a capacity to generate and use rigorous impact evaluations is a long
process that evolves over the course of several evaluations. The process will often be
opportunistic, taking advantage of interest and opportunities, even though the first
evaluations may be "messy". It is also essential to work closely with national
counterparts, to be responsive to political concerns, and to take every opportunity
to strengthen national evaluation capacity. Finally, in cases where a clear
selection cut-off point can be defined and implemented (in this case the score on a
poverty/probability of drop-out scale), the regression discontinuity design (RD) can
provide a methodologically strong design while avoiding political and ethical concerns
about RCTs. There are quite a few programs where RD designs could be considered.

   C. Impact Evaluation of Primary Education in Uganda

The program

The purpose of the evaluation was to assess the effectiveness of a number of
interventions introduced into the primary education system between 2000 and 2006 and
contributing to the national goal of Universal Primary Education. The interventions
included: management improvements, infrastructure, teaching materials and increased
number and quality of teachers. These interventions form part of the national "full
coverage" education services but were also tested in more depth in the Masindi District
Education Development Project.

The evaluation

The central evaluation questions were: How have school attendance and learning
achievement developed since 2000? What were the main determinants of these
developments? Which interventions have the largest and most cost-effective impact on
educational outputs? How effectively has the Management Information System been
used for purposes of evaluation?

The evaluation was conducted at two levels: nation-wide and in the Masindi District.
The evaluation was based on a program theory intervention model that identified four
sets of interventions (school management, infrastructure, teaching materials and teachers)
that would enhance school performance through improving access and learning
achievement; and in turn produce a set of welfare outcomes. The outcomes would be
affected by local contextual factors that could affect results in each district.

Given the countrywide coverage of the education programs, the many different donors
and agencies involved and the large number of contextual factors in each region, it was
difficult to define a counterfactual. A number of different approaches were therefore used:
combining different databases to increase the range of variables included in the analysis;
using triangulation to obtain independent estimates of key indicators; using natural
restrictions (e.g., remote rural areas where well-educated parents do not have a choice of
selecting schools with smaller class sizes); and using propensity score matching to create
ex-post comparator groups comparable with the intervention groups. In Masindi, a
quasi-experimental design was used where schools receiving the project interventions were
compared both with a comparator group from outside the district and with schools in the
district that did not participate in the project.
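
The matching step mentioned above can be illustrated with a minimal sketch. It assumes that a propensity score has already been estimated for every school (for example with a logistic regression on school characteristics) and pairs each project school with the comparison school whose score is closest. The function name, the caliper of 0.05 and all data values are hypothetical and are not taken from the Uganda evaluation.

```python
from statistics import mean


def match_and_estimate(treated, comparison, caliper=0.05):
    """Nearest-neighbour matching (with replacement) on a pre-estimated score.

    Each unit is a (propensity_score, outcome) pair. Returns the average
    outcome gap between treated units and their matched comparison units,
    and the number of treated units matched within the caliper.
    """
    gaps = []
    for score, outcome in treated:
        # Comparison unit whose propensity score is closest to this one.
        nearest = min(comparison, key=lambda unit: abs(unit[0] - score))
        # Discard matches that are too far apart to be comparable.
        if abs(nearest[0] - score) <= caliper:
            gaps.append(outcome - nearest[1])
    if not gaps:
        raise ValueError("no treated unit has a match within the caliper")
    return mean(gaps), len(gaps)


# Hypothetical data: (propensity score, average test score) per school.
project_schools = [(0.80, 62.0), (0.62, 58.0), (0.40, 55.0)]
other_schools = [(0.78, 50.0), (0.60, 49.0), (0.42, 47.0), (0.10, 40.0)]

effect, n_matched = match_and_estimate(project_schools, other_schools)
print(f"matched {n_matched} schools, estimated effect = {effect:.1f} points")
```

The caliper drops project schools that have no sufficiently similar comparison school, which is one common way to keep the ex-post comparator group comparable with the intervention group.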

The evaluation findings

The main findings of the evaluation were the following:
   • Uganda has made enormous progress in improving access to primary education.
   • The analysis confirmed the effectiveness of investments in teachers, classrooms,
     books and other school facilities. It also confirmed that high pupil-teacher ratios
     and high pupil-classroom ratios have a negative effect on learning achievements.
   • There are also significant effects from teacher education and training.
   • Head teacher qualification is also important.
   • Investments in teachers, classrooms and books are more effective when combined
     with improvements in school and district management. Privately funded schools,
     which are generally better managed, outperform government schools by 40%.
   • The quality of primary education remains poor, and absenteeism and dropout pose
     serious threats to the efficiency and effectiveness of primary education.
   • The in-depth evaluation of the Masindi District Project found that educational
     performance in project schools was 50-60% better than in the comparator
     group from surrounding districts, and 35% better than in other schools in Masindi.

Impact of the evaluation

The report was disseminated in a number of ways, including presentations in stakeholder
workshops. A presentation at the National Stakeholder Conference in Kampala in 2007,
held to discuss measures to promote the quality of primary education, resulted in a pilot
project being launched in 10 districts with a "rigorous impact evaluation strategy". At the
same event, a follow-up evaluation was discussed. The final report was also sent to the
parliament and stakeholders in Uganda.

During the workshop, the Director of the Education Planning Department of the Ugandan
Ministry of Education and Sports identified a number of domestic effects of this
evaluation. At the local level, the evaluation elicited a very positive response from
district-level officials, who said this was the first time they had received effective
feedback about one of their programs.

At the national level, this was the first time the Ministry of Education could respond to
Parliament providing concrete evidence of the impacts and cost-effectiveness of the
education programs, and refuting criticisms that the money would have been better spent
on other social programs. In particular, the evaluation showed that improved management
could have a greater impact on education outcomes than simply building more
classrooms and hiring more teachers. By providing an objective basis for engagement
with policy makers and implementers, the evaluation encouraged the involvement of a
wider range of stakeholders in education sector activities.

The evaluation has also improved the quality and effectiveness of the Education
Management Information System (EMIS). Demonstrating how the information can be
used in an evaluation has encouraged central agencies and local authorities to improve
the quality of the data they collect. The evaluation also demonstrated the importance of
contextual analysis to complement and go beyond the statistical data to understand the
particular characteristics of each region and how these affect educational performance.

In the Netherlands, the report was published and sent to the parliament. The results of the
report were used in the Netherlands in an extensive evaluation of (Dutch) Africa policy in
2008. One of the workshops of the conference confirmed the importance of management
in schools. Also in the Netherlands, there has been a discussion of the low level of
achievements in primary schools in Uganda. These findings coming out of the impact
evaluation have grounded this broader "quality of primary education" discussion, linking
demands to improve pupil and teacher attendance and the reduction of absenteeism to an
improvement of the management in schools.

In both countries, the evaluation has contributed to an interest in impact evaluation as a
management tool. In Uganda, the evaluation contributed to the aforementioned initiative to
enhance the quality of primary education, with impact evaluation as one strategy for
evidence-based policy formulation and decision-making. Moreover, several officers have
followed a course on impact evaluation, and one officer is doing a PhD on the impact of
interventions in the education sector. The Ministry of Education and Sports (MoES) and
IOB have started a new (impact) evaluation. This evaluation analyses the impact of
primary education on the future of boys and girls through further education and
employment opportunities.


   D. The Effects of Generalized School Choice on Achievement and
      Stratification: Evidence from Chile's Voucher Program

The program

In 1981, Chile introduced nationwide school choice, providing vouchers to any student
wishing to attend private school. More than 1,000 private schools entered the market, the
private enrollment rate increased by 20 percentage points, mainly in larger, urban and
wealthier communities, and a very competitive private school market developed.

The evaluation

The evaluation examined the widely held belief that providing vouchers and permitting
parents to transfer their children to private schools will increase the effectiveness of the
educational system. Two hypotheses were examined: first, that private schools are more
effective, so that allowing children to move to private schools will increase efficiency; and
second, that schools respond to incentives, so that the provision of vouchers will also
encourage public schools to become more effective to avoid losing their students.

The evaluation collected data on most of the 300 communes. Each commune has an
autonomous government that manages schools and public services, an average
population of 39,000 and an average of 27 schools, of which 18 were public, 7 were private
voucher schools and 2 were tuition-charging private schools. Three outcome measures were
used: mathematics and language test scores; repetition rates; and years of schooling
among 10-15 year olds. Students' socioeconomic status was measured using Ministry of
Education data, which classify schools based on parents' education, and the national
household survey (CASEN), which identifies the school attended by each child
covered by the survey, permitting the creation of detailed socioeconomic profiles of
schools.

Two methodological challenges were addressed. The first was how to separate the effects
of school productivity from the effects of sorting (the "best" students leave the public
schools for the private schools, increasing average performance in private
schools even without any increase in productivity). Sorting could produce gains in private
schools by depressing performance in public schools, both by skimming off the best
students and by reducing peer pressure to perform well in public schools. This problem
was partially resolved by computing average productivity effects for all schools in each
commune; while this cannot control for peer effects, it does net out the "direct" effect
of changes in each sector's student composition.

The second challenge concerned how to define an adequate counterfactual for a
nationwide program for which all students are eligible to apply. The evaluation took advantage
of the fact that private sector voucher schools expanded more rapidly in some markets, so
that markets with slower voucher school growth could be used to approximate the
counterfactual. This approach has limitations, including the effect of pre-existing
differences in the characteristics of different markets, differential concurrent trends, and
heterogeneous treatment effects that might affect private entry and subsequent
achievement growth. Several procedures were used to partially control for these factors
(see footnote 5).
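
The market-comparison logic can be sketched as a simple cross-commune regression: the change in each commune's average outcome (computed over all schools, public and private, so that sorting between sectors nets out) is regressed on the change in its private enrollment share. This is only an illustration under simplifying assumptions. It omits the trend controls and instrumental variables the evaluation used, and the function name and all data values are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data, one entry per commune:
# change in private enrollment share (percentage points) and change in
# the commune-wide average test score over the same period.
private_share_change = [2.0, 5.0, 10.0, 20.0, 30.0]
avg_score_change = [1.0, 0.5, 1.5, 0.8, 1.2]

slope = ols_slope(private_share_change, avg_score_change)
print(f"score change per extra point of private share: {slope:.4f}")
```

A slope close to zero would indicate that communes with faster private school growth did not see larger gains in average outcomes, which is the pattern of result the evaluation reports.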

5 Procedures included: controls for pre-existing and concurrent trends, and the identification of instrumental
variables that affect the extent of private entry but are ideally uncorrelated with trends in academic
outcomes or with the productivity advantage of the private sector.




Findings of the evaluation

There was no evidence that choice improved average test scores, repetition rates, or
years of schooling, but the voucher program did lead to increased sorting as the "best"
students in public schools left for the private sector schools.

The evaluation made two contributions to the school choice debate. First, it pointed out
the difficulties in determining to what extent observed improvements in school
performance in voucher schools can be attributed to increased productivity and to what
extent this is due to sorting (skimming off the best students from the public schools).
Second, it appears that, due to these complicating factors, the positive effects of school
vouchers may be less than claimed by many advocates. The authors stress their findings
are exploratory and should not be interpreted as claiming that voucher programs do not
work. They do, however, emphasize the need to understand the effects of interventions
such as vouchers on the whole of the educational system, and that negative as well as
positive consequences must be considered.

Evaluation utilization and influence

To disseminate the findings of the impact evaluation, the paper was published in the
Journal of Public Economics and presented in academic and policy conferences. In Chile
itself, newspapers discussed the evaluation and a couple of them interviewed the authors.

The main impact of the evaluation was to promote controversy in the academic literature
and to stimulate more evaluations. The academic community tended to agree with the
finding that vouchers increase stratification, but several authors challenged the finding
that there was no improvement in school performance, as this goes against established
theory. It is not clear, however, what impact, if any, the evaluation had in Chile. While
electoral candidates have focused on the lack of quality and problems of stratification in
the education system, it is not clear whether they were influenced by the evaluation, as the
government was already publishing similar statistics on the education sector.




Table 5: The evaluation questions and the evaluation designs for each of the education evaluations

1. Cambodia: Japan Fund for Poverty Reduction: Secondary School Scholarship Fund
   Goals: Increase enrolment and retention of girls from poor families in lower secondary schools
   Evaluation questions:
     • Do scholarships increase enrollment of girls from low-income families in secondary school?
     • Do scholarships increase retention?
   Evaluation design:
     • Retrospective (post-test) comparison of scholarship recipients and non-recipients
     • Regression analysis to control for socio-economic differences between the two groups

2. Cambodia: World Bank Girls Secondary School Scholarship Fund [follow-up to JFPR program]
   Two levels of scholarship for poorest ($60) and next poorest ($45) girls
   Goals: Improve targeting of low-income girls
   Evaluation questions: Assess program impacts on:
     • Effectiveness of providing larger scholarships to poorer girls
     • Retention
     • Learning
     • Inter-household issues
     • Child labor
   Evaluation design:
     • Regression discontinuity design comparing groups above and below the eligibility cut-off
       point for the maximum $60 scholarship
     • Richer data set also permitted analysis of learning, child labor and inter-household issues

3. Uganda: Universal Primary Education (UPE)
   Goals: Improve attendance and quality of education through better management,
   infrastructure, teaching materials and more and better trained teachers
   Evaluation questions:
     • Trends in attendance and learning since 2000
     • Determinants of trends
     • Size and cost-effectiveness of each intervention
     • Use of MIS for evaluation
   Evaluation design:
     • Using MIS and other databases to increase number of indicators
     • Triangulation among indicators
     • Using natural restrictions as form of control
     • Propensity score matching to create ex-post control group

4. Uganda: UPE. Pilot program in Masindi District
   Goals: Test the effectiveness of improved management
   Evaluation questions:
     • Effects of improved management
     • How does this enhance other interventions?
   Evaluation design:
     • Quasi-experimental design in which Masindi District schools receiving project interventions
       were compared with schools outside the district, and with Masindi District schools not
       receiving the interventions

5. Chile: Vouchers for private schools
   Goals: Improve the quality of education by providing low-income students access to private
   education and stimulating public schools to perform better
   Evaluation questions:
     • Assessing the effects of vouchers on the quality of education
     • Determining whether changes are due to improved quality or to skimming off the better
       students from the public schools
   Evaluation design:
     • Secondary data used to measure math and language scores, repetition rates, average years
       of schooling and socioeconomic status
     • Average school productivity in each commune (district) compared for private and public
       schools and average productivity estimated for all schools
     • Selection of comparison group difficult as program covered whole country, but design took
       advantage of the fact that private voucher schools grew more rapidly in some areas than in
       others




Table 6: The possible effects and influence of each education evaluation and the reasons why the evaluations were influential

Cambodia secondary school scholarship projects: [Summary of Japan Fund for Poverty Reduction and World Bank projects]
   Influence/effects of the evaluation:
     1. Created a demand for further evaluations
     2. Evaluation findings raised government interest to incorporate scholarship
        component in their own project
     3. Government decided to include rigorous evaluation component in their own projects
   Reasons why influential:
     • The first evaluation produced positive findings which encouraged central and local
       government to support further evaluations
     • The evaluation team worked closely with government on the design and changed the
       proposed RCT design in response to political concerns. This increased the government
       feeling of ownership of the evaluation
     • The sequential evaluations enabled government experience and capacity to gradually
       develop over time

Uganda Universal Primary Education: [Summary of the National Program interventions and the Masindi District Pilot Project]
   Influence/effects of the evaluation:
     1. The Ministry of Education was able to respond to Parliament with concrete evidence
        demonstrating the impacts and cost-effectiveness of the education programs. This helped
        defend the programs from the criticisms that the money would have been better spent on
        other social programs
     2. Previously there had been strong pressure to build more classrooms and recruit more
        teachers, but the evaluation showed it is often more cost-effective to invest in
        improving management of the education programs
     3. The evaluation involved a wider range of stakeholders, by providing a basis for
        engagement with policy makers and implementers
     4. The evaluation improved the quality and effectiveness of the Education Management
        Information System (EMIS): showing how data can be used encouraged agencies to improve
        the quality of data collection
     5. The Masindi District evaluation demonstrated the importance of contextual analysis
        going beyond statistical data to understand how the particular characteristics of each
        region affect educational performance
   Reasons why influential:
     • The evaluation provided, for the first time, specific evidence and arguments to respond
       to critics. This enhanced the Ministry's awareness of the value of evaluation
     • Cost-effectiveness analysis was seen to be a powerful tool, both for political and
       planning purposes
     • The evaluation created positive response from district officials as this was the first
       time they have received feedback about their programs
     • Provided a mechanism for greater stakeholder involvement
     • Demonstrated how the EMIS could be used, encouraging agencies to pay greater attention
       to the quality of the information to put into the MIS
     • The national level evaluation found positive findings and made the District feel more
       comfortable working with the evaluation
     • The study addressed local contextual factors, making the evaluation approach and
       findings relevant and easy to understand by District officials

Chile: Vouchers for private schools
   Influence/effects of the evaluation:
     1. The finding that there was no improvement in school performance was quite
        controversial and may have stimulated further academic research
     2. Not clear whether the evaluation had any influence in Chile as the Government was
        already publishing extensive data on the program, and the issues of quality and
        accessibility to low-income families were already being discussed by politicians




   3. Anti-poverty and conditional cash transfer (CCT)
      programs

   A. Introduction

The second session discussed the evaluation of three large anti-poverty and conditional
cash transfer (CCT) programs in Colombia, Mexico and Argentina. The Mexico and
Colombia programs both provided cash transfers to low-income families with children on
the condition that children enrolled in school and had regular health check-ups and
vaccinations. The Argentina Emergency Safety Net program provided cash transfers to
unemployed household heads to reduce the risk of families falling below the poverty line,
with the requirement of spending four hours per day in community work programs,
training or education. However, there was considerable flexibility concerning how strictly
the requirements were enforced by each municipality. Chapter 1 (Tables 1-4) presents an
overview of the anti-poverty and conditional cash transfer projects: the key evaluation
questions, the evaluation designs, the main evaluation findings, how the evaluations were
utilized, their influence on project implementation and policy, and the factors affecting
utilization. This chapter provides more detail on each of these evaluations.


   B. Evaluating a Conditional Cash Transfer Program: The Experience of
      Familias en Accion in Colombia

The program

Familias en Accion (FeA) is a conditional cash transfer (CCT) program launched in
Colombia in 2001 and funded by the World Bank and the Inter-American Development
Bank. It promoted increased access to health and education by providing monthly grants
to poor families on the condition that children were brought to the local clinic for regular
health check-ups and vaccinations and that children attended school regularly. All
payments were made to the mother on the assumption that the money was more likely to
benefit children. The program operated in municipalities with populations of less than
100,000 and required a bank branch to which funds could be transferred. Beneficiaries
were selected from the lowest stratum of the social security register (Sisben).

The evaluation (see Table 7 at the end of the chapter)

A pre-test/post-test comparison group design was used with the comparator groups
selected from municipalities ineligible to participate in the program, in most cases
because there was no bank branch to handle the funds transfer. The availability of good
secondary data permitted the use of propensity score matching to reduce sample selection
bias.

The baseline studies were conducted in 2002, with follow-ups in 2003 and 2006. A total
of 57 project and 65 control municipalities were sampled, with approximately 100
interviews per municipality. Political pressures due to the upcoming elections forced
FeA to advance the program launch, and families in a number of municipalities had
already received payments before the baseline study was conducted. The World Bank
and IDB were able to convince the Government to delay the program launch in some areas
until the baseline could be conducted. Consequently, the baseline sample was divided into
two groups: those who had not received any payments prior to the baseline and those who had.
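
One common way to analyze a pre-test/post-test comparison group design is a difference-in-differences calculation: the change over time in the comparison municipalities is subtracted from the change in the program municipalities. The sketch below illustrates that arithmetic only. It is not necessarily the estimator used in the FeA evaluation, which also relied on propensity score matching, and the function name and all figures are hypothetical.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """Change in the program group minus change in the comparison group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    comp_change = mean(comp_post) - mean(comp_pre)
    return treat_change - comp_change


# Hypothetical school enrolment rates, one value per municipality.
program_baseline = [0.70, 0.72, 0.68]
program_followup = [0.80, 0.81, 0.79]
comparison_baseline = [0.69, 0.71, 0.70]
comparison_followup = [0.73, 0.74, 0.72]

effect = diff_in_diff(
    program_baseline, program_followup, comparison_baseline, comparison_followup
)
print(f"estimated program effect on enrolment: {effect:.2f}")
```

The calculation depends on a baseline measured before any payments are made, which is why the early payments in some municipalities led to the baseline sample being split into two groups.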

The evaluation findings

The first follow-up study (2003). Positive results could already be seen at this early stage,
particularly in rural areas. The evaluation attracted a lot of attention, and findings were
widely disseminated through a major conference in 2004 and newspaper editorials. The
results of the second follow-up (2006) were similar to those of the 2003 study: there was
increased primary school enrolment (8-12 year olds) in rural but not urban areas; increased
secondary school enrolment (12-17 age group) in rural and urban areas; some
improvements in nutritional status in rural areas (see footnote 6) but not in urban areas
(see footnote 7); and an impact on diarrhea occurrence for younger rural children, but not
for children over 36 months in either rural or urban areas. A major concern was the lack of
effects on anemia, which affects half of all poor children. Reservations were expressed in
the report and in conversations with policymakers concerning the extent to which findings
from the small municipalities could be extrapolated to urban areas.

Evaluation utilization and influence (see Table 8 at the end of the chapter)

The Government used the results of the evaluation to justify expansion of the program to
the urban areas despite the fact that the evaluation found the program much less effective
in urban areas. The government was strongly committed for political reasons to
expanding the program to urban areas, and wide publicity was given to the positive
results of the program in rural areas to justify the urban expansion, which increased
total beneficiaries from 400,000 to 1.5 million. Although the evaluation did not justify
the urban expansion, it did encourage redesigning of various program components. Most
importantly, the earlier competition with Hogares de Bienestar Comunitria (HBC) was
transformed into broad-based cooperation. In addition, small pilot interventions were
introduced to refine program implementation, replacing the earlier approach of starting
the program on a massive scale without time for adequate testing.




6 There were improvements in rural areas in height per age, chronic malnutrition and weight per age, but
not in global malnutrition or weight per height.
7 When the urban population was disaggregated into two age groups, improvement was found on one
indicator for the under 36 month population (probability of global malnutrition) but on none for the over
36 months age group.


Lessons learned

Evaluators must adapt evaluation designs to political realities when deciding what
evaluation strategies will be both technically sound and politically feasible. Evaluations
of large, politically sensitive programs should be designed at an early stage, before the
programs have developed a large constituency and become resistant to questioning of
their goals and methods. Evaluations should begin early in the program with greater use
being made of small pilot projects to assess operational procedures and viability for
expansion. Finally, the Colombian experience showed that multilateral agencies can have
an important role to play in promoting evaluation and ensuring it is technically sound.

   C. The Role of Impact Evaluation in the PROGRESA/ Oportunidades
      Program of Mexico

The program

PROGRESA, now renamed Oportunidades, is a conditional cash transfer (CCT) program
that provides cash directly to low-income families on the condition that children attend
school regularly and family members visit health centers. PROGRESA was one of the
first CCT programs in Latin America and was influential in the design of later programs
in other countries. The program had two main objectives: to produce short-run effects on
poverty through cash transfers, and to contribute to long-run poverty alleviation through
investment in human capital (i.e. education, health and nutrition). The focus is on
children because early interventions have much higher returns over the life-cycle.
Payments were made to the mother to increase the likelihood that children would benefit.

The program included a number of innovative features, several of which were considered
quite controversial at the time, and all of which were assessed by the evaluations. Some
of the measures included: (a) direct monetary transfers instead of providing vouchers or
food in-kind, or improving supply-side services; (b) the program targeted the
extremely/structurally poor rather than all families; (c) PROGRESA developed a single
national roster of beneficiaries rather than working from existing lists; (d) transfers were
given directly to households rather than communities; (e) uniform, non-discretionary
rules were introduced for the whole country; and (f) there was a requirement of family
co-responsibility and certification.

The evaluation

The program began in 1997, ahead of the 2000 Presidential elections, and there was
pressure from the ruling party (PRI) to ensure that the findings of the evaluation would be
available prior to the election. When Vicente Fox was elected in 2000, the new
administration continued to support a rigorous, independent evaluation to provide
objective evidence that their programs were more effective and transparent than those of
the PRI regime that had been in power for more than 70 years. The rigorous and
expensive evaluation systems were justified on three grounds: (a) Economic: to improve
the design and effectiveness of the programs and to compare the impacts and
cost-effectiveness of different programs; (b) Social: to increase transparency and
accountability; and (c) Political: to increase the credibility of the programs,
which, combined with increased transparency and accountability, helped break with
past practices, such as political influence in beneficiary selection.

The evaluation design: The program was implemented in phases, and for each phase
beneficiary communities were selected randomly, with non-selected communities
providing an unbiased comparator group. Randomization was politically acceptable
because communities not selected in one phase were likely to be included in the next
phase. The government was also strongly committed to the use of a rigorous,
state-of-the-art evaluation design to ensure credibility of the findings. In total, 24,077
households were interviewed in 320 treatment and 186 control communities. Families were
interviewed at the start of the program and at several points during implementation, avoiding
problems of linear extrapolation when only one post-test measurement is made.
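
The random selection of communities for a phase can be sketched as follows. The community counts (320 treatment and 186 control) come from the text above, but the community identifiers, the function name and the seed are hypothetical, and the sketch does not reproduce the actual PROGRESA selection procedure.

```python
import random


def assign_phases(communities, n_first_phase, seed=0):
    """Randomly pick the communities that enter the program in the first phase.

    Communities not picked serve as the comparison group until a later phase.
    A fixed seed makes the draw reproducible and auditable.
    """
    rng = random.Random(seed)
    first_phase = set(rng.sample(communities, n_first_phase))
    return {
        name: ("treatment" if name in first_phase else "control")
        for name in communities
    }


# Hypothetical identifiers for the 506 evaluation communities.
communities = [f"community_{i:03d}" for i in range(506)]
assignment = assign_phases(communities, n_first_phase=320)

n_treatment = sum(1 for group in assignment.values() if group == "treatment")
n_control = len(assignment) - n_treatment
print(n_treatment, n_control)
```

Because selection depends only on the random draw, the two groups should be similar on average in both observed and unobserved characteristics, which is what makes the non-selected communities an unbiased comparator group.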

Main findings of the evaluation

The following are some of the findings highlighted in a 2002 IFPRI report on the first
post-test evaluation (see footnote 8):
    • Poverty targeting worked well.
    • PROGRESA reduced the number of people living below the poverty line by 10%.
    • There was a positive impact on school enrolment for boys and girls.
    • Children entered school earlier, with less grade repetition and better grade
      progression.
    • Younger children became more robust against illness.
    • Women's role in household decision-making increased.
    • The program had an estimated cost-benefit ratio of 27%.

Evaluation utilization and influence

According to the presentation, the evaluation had the following kinds of influence:
    Continuation of the program under a new administration. The independence,
      credibility and positive outcomes of the early stages of the evaluation
      significantly contributed to the program's continuation under the new
      administration.
    Improved operational performance. The early operations reports identified
      implementation issues, such as delivery of food supplements and intra-household
      conflicts, and issues with targeting rules that were addressed as the program
      evolved.
    Contributed to program expansion to urban areas. A youth job creation program
      (Jovenes con Oportunidades) created income generating opportunities for poor
      households through preferential access to microcredit, housing improvements,
      adult education and social/health insurance.


8 IFPRI (2002). PROGRESA: Breaking the Cycle of Poverty.


       Contributed to the development of a more systematic policy evaluation approach
       in Mexico. This move was formalized by the creation of CONEVAL (Council for
       Program Evaluation) in 2006.
       Enhanced policy evaluation internationally. The evaluation findings were
       available on the internet and were used widely by academic researchers. The
       design and findings were able to withstand critical scrutiny, greatly enhancing the
       credibility and influence of the findings.
       Contributed to the initiation of CCT programs in many other countries. Similar
       programs, most of which have been influenced to some extent by PROGRESA,
       have been started in at least 10 other countries.


   D. Assessing Social Protection to the Poor: Evidence from Argentina

The program

During 2001-2002, Argentina suffered one of its worst macroeconomic crises in recent
history, and in January 2002 the government launched an Emergency Safety Net Program
(the "Jefes" program), co-financed by the World Bank, which by the end of 2002 reached
2 million beneficiaries. To ensure the program was only attractive to the poor,
beneficiaries had to spend 4 hours per day on community work or education programs.
The program targeted unemployed heads of household with children under 18,
who received a cash transfer of 150 pesos (approximately US$ 50) per month. The
program was decentralized, with the details of eligibility and work/training requirements
decided at the local level, which led to accusations of political manipulation and made it
more difficult to introduce standardized implementation procedures.

The evaluation

This was a large and high priority program that was being rapidly scaled-up. In addition
to the need to learn about the effectiveness of the program, a rigorous and transparent
evaluation was also required to address accusations of abuses and implementation
problems. The World Bank also required empirical evidence to justify its financing. The
cost of the evaluation was significantly reduced by piggy-backing the evaluation on an
existing labor force survey. The policy questions addressed by the evaluation included:
How effective was the Jefes program as a rapid and targeted poverty alleviation program
and a safety net? Did the program reach the intended groups and how did they respond?
Did it mitigate income loss due to the crisis and stop families falling into poverty or
extreme poverty? How did it affect aggregate poverty and unemployment rates?

Evaluation methodology: The Ministry of Labor and the Statistical Institute agreed to add
questions about program participation to their panel sample. The central research
question was to estimate the net impact of the 150 peso monthly transfer on beneficiary
household income. Net effect was expected to be less than the gross transfer due to the
opportunity cost of foregone earnings. The control group was defined as household
heads who had applied for the program but had not yet been accepted. As the project was
implemented in phases, it was possible to use a "pipeline" evaluation design, with the
control group for phase 1 comprising households selected for phase 2. This reduced
selection bias because the control group also wished to participate, so their motivation was
similar to that of the phase 1 participants.
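
The logic of the pipeline comparison can be sketched in a few lines of Python. This is only an illustration: the household incomes below are hypothetical figures chosen for the example, not data from the evaluation, and the function name net_impact is our own. Only the 150 peso transfer comes from the text. The sketch compares mean income of phase 1 participants with that of phase 2 applicants, and treats the shortfall relative to the gross transfer as forgone earnings.

```python
from statistics import mean

TRANSFER = 150  # pesos per month, as stated in the text


def net_impact(participant_incomes, pipeline_incomes):
    """Difference in mean household income between phase 1 participants
    and the phase 2 applicants who serve as the comparison group."""
    return mean(participant_incomes) - mean(pipeline_incomes)


# Hypothetical monthly household incomes in pesos (illustrative only).
phase1_participants = [400, 380, 420, 360]
phase2_applicants = [310, 300, 320, 270]

impact = net_impact(phase1_participants, phase2_applicants)
forgone = TRANSFER - impact  # earnings given up in order to participate
share_of_transfer = impact / TRANSFER

print(f"Net impact: {impact} pesos ({share_of_transfer:.0%} of the transfer)")
print(f"Implied forgone earnings: {forgone} pesos")
```

A simple difference in means is used here for clarity; the text does not specify the estimator used in the actual evaluation.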

The evaluation findings

Evaluation findings on program performance: The evaluation found the eligibility
criteria were poorly enforced, due in part to the practical difficulties of defining
employment status in a country with a large informal sector. However, the targeting
procedure worked quite well in practice as the eligibility criteria were correlated with
structural poverty and 70 percent of beneficiaries had household per capita income in the
lowest two deciles. The survey data was compared with program administrative data to
check on allegations of fraud and ghost participants, as well as the practical difficulties of
defining and implementing the targeting criteria. While some of the claims of abuse were
corroborated, no evidence was found to substantiate many of the accusations.

Evaluation findings with respect to program impacts: About 10% of participants would
have fallen below the food poverty line in the absence of the program. Many,
particularly male participants, had to forego other income to participate and net income
gains were equivalent to between one half and two thirds of the cash transfer. The effect
of the program on aggregate poverty rates was quite small and the impact on extreme
poverty was only marginal. When the economy began to bounce back in 2003, this
significantly increased the opportunity cost of continued participation in the program, and
the net gains to those remaining in the program dropped from two thirds to one half of the
cash transfer. Half of those exiting the program found employment while about one third
(mainly women) returned to their previous economic inactivity.

Evaluation utilization and influence

The rigorous evaluation was made possible by a combination of factors. First, it
provided rapid information on the implementation effectiveness of this high priority
program which increased the support of the Ministry of Labor and the Statistics Bureau.
The relatively low cost of the evaluation also made the decision to commission the
evaluation easier. Given the controversial start of the program with the allegations of
corruption and poor administration, there was also strong pressure from the World Bank
to conduct an evaluation to justify continued funding. These factors, combined with the
rapid, though limited, dissemination to local counterparts meant the evaluation was able
to influence government policy in a number of areas: it helped justify continued financing
for the Jefes program; identified future policy options, including new supply and
demand-side labor market options; pressured the Ministry of Social Development to
incorporate more rigorous evaluation and encouraged government to do this for other
programs.




Lessons learned

This experience showed that a well designed evaluation can give credibility to a program
and can provide useful and rapid operational feedback and policy guidance. This can be
particularly useful when emergency programs must respond rapidly to challenging
circumstances or when allegations of inefficiency or corruption must be investigated.
Close cooperation with national agencies is critical for creating ownership, acceptance
and utilization of the findings, for improving the technical quality of the evaluation and
for reducing costs through piggy-backing on an existing survey.




Table 7: The evaluation questions and the evaluation designs for each of the anti-poverty and conditional cash transfer [CCT] programs

1. Colombia: Familias en Accion. CCT promoting children's health and primary and secondary school enrolment
Goals: Short-term poverty reduction through cash transfers. Long-term investment in human capital
development through increasing access to health and education

Evaluation questions:
    Are CCTs cost-effective in increasing access of poor children to health and education?
    Effectiveness of targeting mechanisms in reaching low-income populations
    Feasibility of large-scale replicability of programs
    Urban replicability of programs successfully implemented in rural areas

Evaluation design:
    Pre-test/post-test comparison group design using propensity score matching
    Comparison group divided into those who had started receiving cash transfers before the
      baseline and those who had not
    Post-test measurements after one year and four years

2. Mexico: PROGRESA/Oportunidades. CCTs promoting children's health, nutrition and education
Goals: As for Colombia

Evaluation questions:
    Are CCTs cost-effective in increasing access of poor children to health and education?
    Effectiveness of key program components:
        Direct monetary transfers versus in-kind grants
        Targeting the extremely poor versus all families
        New, standard targeting procedures versus existing program client lists
        Transfers to households versus communities
        Non-discretionary rules for whole country versus administrative flexibility for
          local authorities
        Directing benefits to women versus to household head
        Criteria for defining size of transfer

Evaluation design:
    Randomized selection of communities for each phase of project
    Pipeline design using families not selected for a given phase as the control for that phase
    24,077 households interviewed in 320 treatment and 186 control communities
    Formal surveys were combined with structured and semi-structured interviews, focus groups
      and workshops

3. Argentina: Jefes de Familia, Emergency Safety Net Program. Cash transfer for unemployed household
heads with dependent children
Goals: Short-term goal using monthly cash transfers to stop families falling into poverty.
Longer-term goal of giving skills to facilitate re-entry into the labor market

Evaluation questions:
    Effectiveness of cash transfers as an emergency measure to aid poor families
    Are programs cost-effective, efficiently managed and relatively free of corruption?
    Effectiveness of targeting procedures. Did they reach the intended groups?
    How did households respond to the program in terms of labor force participation, labor
      supply and household division of labor?
    Impact on household income
    Impact on aggregate rates of poverty

Evaluation design:
    A pipeline treatment/control group design was used with households selected for Phase 2
      being used as the control group for Phase 1.

Table 8: The possible effects and influence of each CCT evaluation and the reasons why they were influential

Familias en Acción (FeA): Colombia

Influence/effects of the evaluation:
    1. Evaluation influential in convincing the new Government to continue the program that
       had been started by its predecessor
    2. Findings used to justify expansion to the urban areas [even though the evaluation
       findings had shown little impact in urban areas]
    3. Findings used to make adjustments to the design and implementation of the urban program
    4. Convinced planners to give smaller grants for attending primary school
    5. Community day care centers (HCB) integrated into the program rather than being
       eliminated as previously planned
    6. Small scale pilot programs were incorporated to test implementation strategies before
       going to full scale

Reasons why influential:
    The widespread publicity given to the findings of the PROGRESA evaluations convinced
      Colombian policymakers of the need to introduce an equally rigorous evaluation of FeA
    Findings from Phase 1 were widely disseminated and, because of the credibility of the
      international evaluators, widely accepted
    Findings were largely positive, making them politically more acceptable
    The findings showed FeA was effective in providing service access for the low-income
      population, and that it was possible to develop transparent and independent systems
      for ensuring accountability (a Presidential priority)
    Pressure from donors concerned that the evaluation findings did not justify the rapid
      urban expansion encouraged government to incorporate a rigorous evaluation into the
      expanded urban program

PROGRESA/Oportunidades CCT: Mexico

Influence/effects of the evaluation:
    1. Influenced the continuation of the program under a new Administration
    2. Improved operational performance
    3. Improved program design
    4. Contributed to the introduction of more systematic social program evaluation in Mexico
    5. Enhanced the role and rigor of policy evaluation internationally
    6. Contributed to the promotion of CCT programs in many other countries

Reasons why influential:
    The evaluation was commissioned by Mexican policy-makers, not donors
    IFPRI's reputation and independence ensured credibility of the evaluation
    Data was available on the Internet and was used in academic publications, increasing
      international familiarity with methodology and findings
    Findings were rapidly and widely disseminated inside and outside Mexico
    The findings were strongly positive, making it easier for them to be used
    The evaluation showed the new administration how to ensure transparency and show a
      break with past politicization of major programs

Emergency Safety Net Program ("Jefes"): Argentina

Influence/effects of the evaluation:
    1. Helped justify continued financing for the Jefes program
    2. Provided feedback on future program and policy design and broadened the range of
       policy options on supply and demand-side labor market interventions
    3. The evaluation was used by the Ministry of Social Development to justify a new
       self-employment program (even though the evaluation findings did not support this)
    4. Encouraged the government to build an evaluation component into the new
       self-employment program and to use evaluation at an early stage of the new program
       to assess viability of scaling-up

Reasons why influential:
    Pressure from donors for rigorous evaluation to justify continued funding
    Accusations about corruption and poor implementation created pressure to include an
      independent and rigorous evaluation component. The rapid feedback on these concerns
      was considered valuable by Government
    The active involvement of the Ministry of Labor and the Statistics Institute
      strengthened understanding of the local context and how the program operated
    Piggy-backing the evaluation on a Ministry of Labor survey greatly reduced cost and
      time requirements and strengthened local ownership
    Evaluations were found useful by donors and government both to justify programs they
      supported and to criticize those they did not




   4. Health
The third session discussed the evaluation of three health interventions: insecticide-
treated nets and deworming, both in Kenya, and a health insurance scheme in China. The
Kenyan projects were both randomized, with the nets experiment distributing insecticide-
treated nets to pregnant women at prenatal clinics for free or at subsidized prices, and the
deworming intervention offering treatment to children at schools for free or with cost-
sharing. The Chinese health insurance scheme was meant to reduce out-of-pocket
expenditures for health care. An overview of the health projects, the key evaluation
questions, the main evaluation findings, the evaluation designs, how the evaluations
were utilized, their influence on project implementation and policy, and the factors
affecting utilization is presented in Chapter 1 (Tables 1-4). This chapter provides
more detail on each of these evaluations.


   A. Evaluation of insecticide-treated nets in Kenya

The program and evaluation

This study explored the relative benefits of free distribution and cost recovery practices
for maximizing coverage and usage of health products, specifically anti-malarial
insecticide-treated nets. In particular, there was interest in the competing effects of higher
prices reducing willingness or ability to pay and higher prices increasing the perceived
value of the product, potentially reducing resource wastage. The experiment randomized
the price of nets offered in 20 prenatal clinics from zero to 40 shillings ($0.60), subsidizing
the price by 100 to 90 percent, and then compared the uptake and usage of the nets at
various prices. The primary evaluation interests were the effects of the various prices on
demand for acquiring the nets and on usage a few months after acquiring a net. Uptake
and usage rates were multiplied together to measure "effective coverage". The evaluation
did not explore any potential loss of quality or service that might occur in association
with eliminating cost-sharing outside of the experimental framework.
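
The "effective coverage" measure and the subsidy arithmetic can be written out as a short Python sketch. The function names are our own, and the uptake and usage rates in the example are hypothetical values chosen only to show the calculation; the text reports the combined coverage rates but not the separate uptake and usage rates. The 40 shilling price and 90 percent subsidy are taken from the text.

```python
def effective_coverage(uptake_rate, usage_rate):
    """Share of the target group that both acquires a net and uses it."""
    return uptake_rate * usage_rate


def implied_full_price(price_paid, subsidy_percent):
    """Unsubsidized price implied by the price paid and the subsidy level."""
    return price_paid * 100 / (100 - subsidy_percent)


# Hypothetical rates, for illustration only: 80% of women take a net
# and 50% of those who take one are found using it a few months later.
example_coverage = effective_coverage(0.80, 0.50)

# From the text: the highest price, 40 shillings, is a 90 percent subsidy.
full_price = implied_full_price(40, 90)

print(f"Effective coverage: {example_coverage:.0%}")
print(f"Implied unsubsidized price: {full_price:.0f} shillings")
```

Taken together, the 40 shilling price and the 90 percent subsidy imply an unsubsidized price of about 400 shillings per net.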

The evaluation findings

The evaluation found that demand drops very quickly as prices increase, and the highest
price offered in the experiment is still lower than the common cost-sharing price. On the
other hand, usage didn't vary much across price paid. Combining uptake and usage, free
distribution led to a 63 percent coverage rate compared to 14 percent for the highest price
group. Women who paid for the nets were not found to have worse health at the time of
the prenatal visit, suggesting that ability to pay may be more of a limiting factor than
need or willingness to pay and thus that full subsidies may be most effective in
maximizing the effective coverage rate. However, it was noted that in Kenya, where the
experiment was carried out, extensive efforts have resulted in most of the population
being familiar with the benefits of nets.

Evaluation utilization and influence

At the time of the presentation, dissemination was still in the early phases. The results
were first presented to the Ministry of Health. The news was well received, as they
already preferred free distribution, but they noted that they would have to work to find
funding and to convince donors and NGOs in the area.

To some degree, the evaluation has disseminated itself because of high existing demand
for the evidence. For example, DFID contacted the authors before the paper was
finished for help in deciding whether to give away or sell nets in Somalia, so the
evaluation had an immediate impact on other projects, as well as on the organization's official views.

There seems to have been a mixed response among private foundations. In particular, there
were rumors of methodological critiques from a major net-distributing organization.
The local branch of the same organization, however, reported
that they were very pleased with the evaluation and its results. They were changing their
model to dispense nets for free, and the evaluation would help them defend their choice.
Since then, the local branch has helped disseminate the study.

This presentation noted that some of the broader influence of impact evaluations may be
harder to track, such as when evaluations contribute to a larger body of evidence. An
individual evaluation may not be entirely conclusive, but conclusions drawn from an
accumulation of evidence may be more difficult to refute.

Lessons learned

This experience showed that when an evaluation addresses an existing demand for
evidence, the results may partially disseminate themselves, as interested audiences seek
out the information.

Also, it seems that people tend to trust or distrust evidence based on what they already
believe, looking for results that confirm what they believe and looking for ways to
discredit contrary information. Perhaps one reason is that it is difficult to distinguish
between good and bad evidence. Currently, there is much ongoing work to provide
training in measurement and evaluation for donors and policymakers: when individuals
have a greater understanding of impact evaluation, they may be better able to recognize
differing qualities of evidence they come into contact with, allowing individual
evaluations to have greater impact.

Similarly, people may not trust evidence (especially evidence contrary to their beliefs)
that comes from methods they do not understand, so training in or exposure to impact
evaluation as well as the use of easy-to-understand methods may make evaluation results
more convincing.

In the end, an individual evaluation may not be entirely conclusive, but conclusions
drawn from an accumulation of evidence may be more difficult to refute.



   B. Kenyan Deworming Experiment

The program

Because intestinal worms, which can create health problems such as anemia, are
expensive to diagnose but inexpensive to treat, the WHO recommends mass treatment in
schools where there is high worm prevalence. Implementation often has been difficult,
however, because of overlaps between ministries of health and education, as well as some
question about the prioritization of worms relative to other health interventions,
especially in the absence of evidence on educational benefits. Also, since treatment
should be readministered every six months because of reinfection, worm treatment has
not always been appealing from a sustainability point of view.

Between 1998 and 2001, the Dutch NGO Internationaal Christelijk Steunfonds Africa
(ICS) and the Busia District Ministry of Health implemented the Primary School
Deworming Project in 75 rural schools in Western Kenya. The intervention included
deworming treatment and some worm-prevention health messages.

The evaluation

Taking advantage of the fact that treatment was rolled out in three phases to
accommodate financial and administrative constraints, assignment of schools to each of
the phases was randomized for evaluation purposes. The final phase introduced cost-
sharing to compare uptake between cost-sharing and free distribution. The data came
from student and school questionnaires fielded in early 1998, 1999, and 2001 and a
parent questionnaire added in 2001. The evaluation compared the groups that had been
treated to those who would be treated in later rounds, with the last round adding the cost-
sharing element to part of the early treatment groups.
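
A phased randomization of this kind can be sketched as follows. This is a minimal illustration that assumes a simple random shuffle of 75 school identifiers into three equal groups; the text does not describe the exact assignment procedure used in the project, and the function name, seed and identifiers are hypothetical.

```python
import random


def assign_phases(school_ids, n_phases=3, seed=0):
    """Randomly split schools into phase groups of (near-)equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled schools out to the phases in turn.
    return {
        phase + 1: sorted(shuffled[phase::n_phases])
        for phase in range(n_phases)
    }


# 75 schools, as in the project; the identifiers 1..75 are placeholders.
phases = assign_phases(range(1, 76))

for phase, schools in phases.items():
    print(f"Phase {phase}: {len(schools)} schools")
```

Schools assigned to later phases serve as the comparison group for schools treated earlier, until their own treatment begins.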

The evaluation findings

The results showed that treatment increased school participation by an average of 15
percent of the school year, through a combination of reduced dropouts and higher attendance rates.
Benefits accrued both to treated children and to children nearby, because of
reduced worm loads in the area. Indicators related to the health messages, such as clean
hands, wearing shoes, and not swimming in the lake, showed no impact. Cost-sharing
decreased program uptake from 70 to 15 percent. Overall, deworming was found to be a
cost-effective way of increasing school attendance, though no impact was found on
test scores. The evaluation did not speculate on any trade-offs between eliminating cost-
sharing and maintaining service quality with reduced available funds.




Evaluation utilization and influence

The findings were first disseminated to ICS and local school officials and the Ministry of
Education. As a result, ICS expanded the deworming program from 75 schools to an
entire province instead of moving to other types of programs. The Ministry of Education
incorporated deworming into the national education plan and dropped the cost-sharing
component.

The evaluation has been actively disseminated outside of Kenya as well, with joint efforts
among the researchers, the Poverty Action Lab, the World Bank, and others. The
evaluation has appeared in academic and policy publications and has generated
significant interest in academic and development circles. At the same time, it has reached
mass media, with mention by the US president and in the NY Times, for example.

As a result, a low-profile health challenge has received more recognition as both a health
intervention and an education intervention. At the same time, only a fraction of those who
need deworming get it, and it is still not high profile compared to other health concerns
such as malaria or HIV/AIDS, and there remain bureaucratic challenges and the simple
fact that treatment must be repeated.

Lessons learned

Achieving such widespread coverage may partially be the product of a fortunate
combination of factors, namely high-quality evaluation, high-profile authors, and surprising
and compelling findings (deworming increases school attendance and even has spillovers
to children who aren't treated), but considerable and cooperative advocacy efforts play
a key role as well.

Perhaps it is not surprising that, in this case, the stakeholders were willing to use the
findings, given their previous willingness to randomize for evaluation purposes. That is,
working with cooperative stakeholders may increase the likelihood that the evaluation
will be influential.

Despite the combination of favorable factors and cooperative stakeholders, deworming is
unlikely to become a global health priority. It seems that the maximum influence an
evaluation can have may be limited by the nature of the problem or intervention it
addresses.

   C. China: Voluntary Health Insurance Scheme

The program

After the collapse of China's Cooperative Medical Scheme in the 1980s, health facilities
were allowed to charge government-determined prices for certain high-tech services in
order to cross-subsidize more basic care. There was evidence, however, that these
changes led to over-application of high-tech services and high out-of-pocket payments,
reducing use of needed services. In 2003, therefore, the government of China
implemented the New Cooperative Medical Scheme, a voluntary (but heavily
"promoted") rural primary health insurance scheme, to reduce out-of-pocket payments
and encourage use of care. While the scheme is heavily subsidized, the coverage is fairly
small relative to average annual health expenditures in rural China. Initially introduced in
three counties per province, the scheme is meant to reach the entire country by 2010.

The evaluation

Initially there was some resistance to having an external evaluation because the
government was conducting its own evaluation but, in the end, an agreement was reached
that the external impact evaluation would be conducted cooperatively with government
statisticians and would serve as an input into the government's evaluation. The
government statisticians had strong survey experience despite limited familiarity with
impact evaluation techniques.

The key evaluation questions focused on utilization of inpatient and outpatient services,
out-of-pocket expenditures, and facility revenues. To examine these, the external team
preferred a double difference with matching approach, using non-participant counties for
comparison. The government counterpart preferred not to survey non-participant
counties, using a comparison between insured and uninsured in participant counties and
regression analysis to control for differences. In the end, data were collected in
participant and non-participant counties, but because of non-comparability, the final
analysis used a double difference with matching between insured and uninsured in
participant counties.
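
The double-difference calculation at the core of this approach can be sketched in Python. The sketch leaves out the matching step, and the visit counts are hypothetical numbers invented for the illustration, not results from the evaluation. The function name is our own.

```python
from statistics import mean


def double_difference(treated_before, treated_after,
                      control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical outpatient visits per person per year (illustrative only).
insured_before = [2, 3, 4]
insured_after = [4, 5, 6]
uninsured_before = [2, 3, 4]
uninsured_after = [3, 4, 5]

estimate = double_difference(insured_before, insured_after,
                             uninsured_before, uninsured_after)
print(f"Double-difference estimate: {estimate} visits per year")
```

Subtracting the change among the uninsured removes trends that affect both groups alike. In a fuller version, matching would first restrict the comparison to uninsured households that resemble the insured ones on observed characteristics.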

The evaluation findings

The findings from household data showed that utilization had increased, but out-of-
pocket payments did not decrease. Facilities data confirmed the increased utilization of
services and found that revenues had increased more than utilization. These results
showed that medical insurance is not guaranteed to decrease expenses, leading to
questions about the level of care provided and whether services were selected because of
medical necessity or for revenues.

Evaluation utilization and influence

Dissemination efforts included a report for the Chinese government in Chinese that
included the joint findings and analysis by the government statisticians as well as a
jointly-written scientific journal article. Initially, the findings were not well-received,
perhaps because of the news that the primary objective of reducing out-of-pocket
payments had not been achieved; however, after internal discussions to review and
explain the findings, the government became more comfortable with the results, and the
report was incorporated into the larger government evaluation. In order to address program and
wider health problems raised in the evaluation, a committee was formed to include
various ministries, international organizations, and other consultants and, in January
2008, the government announced a number of reform measures that included additional
funding for the health insurance scheme.

Another important impact of this evaluation was capacity building in government,
especially among the statisticians. They reported that the cooperative experience had not
only taught them impact evaluation principles but had also given them hands-on practice
working on a real evaluation. It has also generated more government interest in using
impact evaluation in the future, leading to additional study of impact evaluation methods
and consideration in the design of future surveys.

Lessons learned

An essential lesson in this evaluation is the value of the relationship between the
stakeholders and the evaluators. The choice of partners is important, and there has to be
a relationship of trust. In some cases, the trust may already exist, especially when there is
already high familiarity with impact evaluation. This is not always true, however, and
even where a government or organization is comfortable with impact evaluation, there
may be other concerns about the potential results. In these situations - perhaps in any
situation - it is necessary to take time and effort to build trust and to handle the process
and the results with sensitivity. When there are "bad" results, the proper means and
context for presentation and discussion may make the difference between a rejection or
suppression of the results and beneficial reforms and future use of impact evaluations and
other evidence for policy making.

Cooperation with the "clients" of an evaluation cannot begin too early. In this case,
involving the government in the choice of survey design helped to ensure there was
comfort with the evaluation methods and eventually the results, increasing utilization.

Cooperation with local counterparts builds not only trust but also capacity. Skills
and lessons learned during one impact evaluation can be applied to future evaluations,
and clients may begin to seek out new opportunities to apply these skills.




Table 9: Summary of the programs evaluated, the evaluation questions, design and main findings - HEALTH

Program                                         Evaluation questions                                     Evaluation design
1. Kenya: Bed net distribution experiment:       Is free distribution or cost-recovery more               Randomization of insecticide-treated
Free vs. Cost-Recovery                          effective for increasing distribution and use of nets?   net prices at prenatal clinics
Goals: Increased distribution and use of         How price elastic is demand?
insecticide-treated nets
2. Kenya: Deworming treatment and worm-          Does (school-based) deworming reduce worm                Randomization, using phased-in
prevention health messages                      load?                                                    project implementation
Goals: Reduced worm infections, increased        Does it improve schooling outcomes?
prevention behaviors, improved schooling         Do health messages on worm-prevention induce
outcomes                                        the preferred behaviors?
                                                 How does cost-sharing affect uptake?
                                                 How does social learning affect uptake?
3. China: Voluntary Health Insurance Scheme      Does the health insurance scheme reduce out of             Double difference with matching
Goals: Reduced out of pocket healthcare         pocket expenditures?                                        Completed as an input into the
expenditures, increased utilization of needed    Does it increase use of services?                          government's own evaluation and
health services                                                                                             was done in collaboration with
                                                                                                            government staff
Table 10: The possible effects and influence of each evaluation and the reasons why the evaluations were influential -
HEALTH
Influence/ effects of the evaluation                                 Reasons why influential
Kenya - Insecticide-treated nets
1. Reinforced government's and NGO's decision to distribute nets for free      There was already interest in the subject - some organizations and donors
2. Influenced donor's (DFID) choice between free and cost-sharing             were trying to decide whether to pursue free distribution or cost-sharing, and
   distribution of nets in Somalia                                            others were looking for evidence to support the decisions they had made
3. Results seem to have been questioned among some private foundations,        Results contrary to some existing preferences may have led to questions on
   possibly limiting impact                                                   methodological soundness
Kenya - Deworming
1. Program has been expanded, and government has discontinued cost-            There was a combination of a high-quality evaluation, high-profile
   sharing practices                                                          authors, and compelling findings
2. Deworming has become more commonly-discussed among international            There have been concerted advocacy efforts to promote findings
   organizations such as WHO, World Bank, IMF
3. Deworming is now considered an education intervention


China - Health Insurance
1. Training and capacity building in impact evaluation analysis for Chinese    Completed as an input into the government's own evaluation and was
   government team                                                            done in collaboration with government staff
2. A committee was formed for follow-up and reforms have been                  Evaluators were flexible in addressing government concerns about the security
   announced; more resources have been allocated to the program               of their information (for example, analysis was done using government data on
                                                                              government computers)




   5. Sustainable Development

   A. Introduction

The fourth session discussed the evaluation of three sets of sustainable development
interventions: microfinance programs that provide small loans to poor individuals in
Madagascar and Morocco; the Food Security Program implemented as a safety net
response to a drought in Ethiopia; and the rehabilitation of rural roads in Vietnam. An
overview of the sustainable development projects, the key evaluation questions, the main
evaluation findings, the evaluation designs, how the evaluations were utilized, their
influence on project implementation and policy, and the factors affecting utilization is
presented in Chapter 1 (Tables 1-4). This chapter provides more detail on each of these
evaluations.

   B. Impact Evaluations of Microfinance Institutions in Madagascar and
      Morocco

The programs

ADEFI and Al Amana are two microfinance institutions in Madagascar and Morocco,
respectively, that receive financing from the French Development Agency (AFD).
ADEFI (Action pour le Développement et le Financement des Micro-Entreprises) was
created in 1995. With six regional branches and 31 commercial agencies, it is a mutualist
scheme that provides loans and savings services to urban micro-businesses in
Madagascar.

Al Amana is the largest microfinance institution in Morocco. It served only urban
microenterprises when it opened in 1998, and the decision was later made to explore
expansion into rural areas. Starting in 2006, 60 new branches were opened in 80 rural
districts. Of two peripheral villages in each district, one was randomly assigned a
branch, and the other was phased in a year later.
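
As a rough illustration of this kind of phased-in design, the sketch below randomly picks which village in each pair receives a branch first. The district and village names, the function name assign_pairs, and the fixed seed are hypothetical and are not taken from the evaluation itself.

```python
import random


def assign_pairs(village_pairs, seed=0):
    """Randomly choose, within each pair, which village gets a branch first.

    village_pairs maps a district name to a (village_a, village_b) tuple.
    Returns a dict mapping each district to its first and later village.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    assignment = {}
    for district, (village_a, village_b) in village_pairs.items():
        first = rng.choice([village_a, village_b])
        later = village_b if first == village_a else village_a
        assignment[district] = {"first_year": first, "phased_in_later": later}
    return assignment


# Hypothetical districts and villages, for illustration only.
pairs = {
    "District 1": ("Village A", "Village B"),
    "District 2": ("Village C", "Village D"),
}
assignment = assign_pairs(pairs)
for district, result in assignment.items():
    print(district, result)
```

Under such a design, the villages phased in later can serve as the comparison group during the first year.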

The evaluations

For ADEFI, there were actually two evaluations. A first iteration was done without any
specific evaluation questions in mind and was not considered rigorous enough, so a
second impact evaluation was conducted using a double difference approach against a
counterfactual group of non-client micro-businesses. Key evaluation questions for the
second evaluation involved the impact of microfinance on indicators such as financial
turnover, production, value added, staff, capital and labor productivity.
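
A double difference estimate can be sketched in a few lines: it is the change in the clients' average outcome minus the change in the comparison group's average outcome over the same period. The function name and all figures below are hypothetical, invented purely for illustration; they are not data from the ADEFI evaluation, and the sketch omits the matching and statistical testing a real evaluation would need.

```python
from statistics import mean


def double_difference(treated_before, treated_after, control_before, control_after):
    """Change in the treated group's mean minus change in the comparison group's mean."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical turnover figures (arbitrary units), invented for illustration only.
clients_before = [100, 120, 80]
clients_after = [130, 150, 110]
non_clients_before = [90, 110, 100]
non_clients_after = [100, 120, 110]

estimate = double_difference(
    clients_before, clients_after, non_clients_before, non_clients_after
)
print(estimate)
```

Subtracting the comparison group's change removes trends that affect both groups alike, which is why the approach needs data on both groups before and after the program.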

The subsequent Al Amana impact evaluation was commissioned by the organization
itself, as it wanted to know the benefit of expanding into rural areas. In particular, the
evaluation considers effects on agricultural and non-agricultural activities, income and
expenditures, and household security. The expansion process was designed to allow for a
nationally-representative randomized control trial evaluation approach. As such, the
study was the first of its kind for microfinance. Data included a pre-program survey and
follow-ups after one and two years.

The evaluation findings

The results of the second ADEFI evaluation showed no impact on the participating
micro-businesses. There were concerns, however, that high attrition led to low statistical
significance and thus few policy implications. Thus far, the findings of the Al Amana
evaluation have shown low program uptake, but the rest of the results are still pending.

Evaluation utilization and influence

The ADEFI evaluation itself seems to have had only minimal impact. First, it was
disseminated only to direct stakeholders, ADEFI and AFD, with very little
readership within AFD. The bigger problem, however, seems to have been the content of
the evaluation itself, which failed to appeal to its intended audience. The organization
was more interested in social and behavioral impacts than in economic ones, and the staff
considered the methods "too statistical". The lack of clear policy implications meant that
only a few minor, less-central recommendations were implemented.

In this case, however, the evaluation process proved to be useful. Lessons learned in
Madagascar were applied to the evaluation in Morocco, which was considered to be
much more successful. In Morocco, ongoing dissemination has involved regular meetings
with AFD's operational unit in charge of microfinance projects and with other
microfinance institutions in Morocco, as well as interim reports published and
posted on AFD's and Al Amana's websites. Conferences have been held for microfinance
practitioners. Planned dissemination includes an additional conference in Morocco,
policy briefs and working papers, and academic articles.

Even before the final results are delivered, the Al Amana evaluation has provided some
useful operational feedback and generated interest in gathering additional evidence. From
the preliminary finding that take-up has been lower than expected in rural areas, Al
Amana plans to adapt the design of its loans. Interest in the study has prompted a
complementary study to investigate how rural households finance activities, to better
design new financial products.

Lessons learned

A number of factors contributed to the Al Amana evaluation's influence. First, the subject was
relevant and timely. The organization was looking to extend credit to rural areas, and
there was a question about the need and the benefits. There were no other RCT studies on
microfinance, so the evaluation benefited from a degree of novelty and recognized rigor.




Second, there was "true partnership from the beginning". The organization whose project
was being evaluated wanted the impact evaluation, and there were regular meetings
among stakeholders. In particular, it proved useful to have geographic proximity between
the research team and the organization, improving communication.

Third, an emphasis on clarity and rigor meant that methods and results were trustworthy
and understandable. Randomization is widely accepted and the technique is not difficult
to explain. From the beginning of the evaluation, precise questions were identified, so the
impact evaluation could be well focused.

Finally, dissemination and high visibility were prioritized. Active dissemination was
planned from the beginning, and the choice of institution and evaluators was strategic. Al
Amana is a leading microfinance organization and the largest in Morocco, and evaluators
included high-level academics, to ensure that results would be published and read
internationally.


   C. Ethiopia's Food Security Program

The program

In response to a drought in 2001-2002, the Ethiopian government chose to reform the
delivery and quality of the safety net system for vulnerable populations. The resulting
Food Security Program comprises the Productive Safety Net Program, involving labor-
intensive public works to construct productive community assets and a small number of
cash transfers to particular vulnerable groups, and the aptly-named Other Food Security
Program, providing agricultural assistance. The combination of these two components
was meant to provide a safety net for emergency need while hopefully building long-term
productivity and thus reducing poverty and vulnerability.

The evaluation

Because a log frame had been completed for the program, there was ready information on
what it was intended to achieve and how, clarifying the evaluation questions: "It was
clear to us what we were being asked to measure, what outcomes that were of particular
interest to the government". These measures included the project objectives: food
security, asset growth, and perceived usefulness of the works being constructed by the
program. The evaluation also offered quick feedback on the implementation process -
that is, it investigated the effectiveness of the targeting mechanism and the degree to
which payments had actually been delivered to the intended recipients.

Because there was some debate in the beginning about whether to conduct an impact
evaluation, no baseline data were collected before the program began. As a second-best
option, the evaluators collected retrospective data from beneficiaries and non-
beneficiaries. The evaluation then used a double difference with matching on the
retrospective data. Between the lack of true baseline data and the fact that purposive
targeting with national coverage does not lend itself to the identification of an easy
counterfactual, the rigor of the evaluation may have been less than ideal.

The evaluation findings

In terms of process, the evaluation found that targeting was effective, and the assets
constructed through the public works projects (roads, soil and water conservation) were
considered useful. However, there were notable delivery shortcomings, including delays
in payments and a lack of overlap among program components. These created some
questions about how to define "participation" for evaluation purposes.

As for outcomes, the evaluation found evidence that food security was being improved, and
among program participants there were increases in borrowing for productive purposes
and use of agricultural technologies. It did not appear, however, that household assets had
grown.

Evaluation utilization and influence

The Ethiopian impact evaluation was particularly interactive among the stakeholders
throughout the process: in commissioning the research, setting priorities, and
dissemination. Having many donors meant that coordination and communication among
them and with the government and the evaluators were difficult. At the same time, it was
hard to reach everyone with the same presentation because of differing levels of
expertise. These challenges are common ones. In this case, the team chose to deal with
them by using frequent meetings, many of them one-on-one, which built trust and
understanding. Having an in-country team member helped facilitate consistent
communication. Also, because all parties were involved throughout the entire process -
"no surprises" - results were considered fairly binding and acceptable.

The interim evaluation was able to have direct and immediate influence on the program
itself, in part because there was a receptive audience of government and donors who
wanted to understand the results in order to help the program succeed. It shifted attention
away from political matters and toward some of the administrative and logistical
practicalities that were being overlooked (such as when to graduate participants). The
results it offered were relevant, guided by the government's own log frame. It provided
the results in a timely manner, too - before the program had ended. As a result, there
were adjustments in procedures and measures, as well as follow-up studies to explore
delivery challenges.

Beyond impacts on the program, the evaluation experience increased government
appreciation of having external evaluations to better understand programs, contributing to
more of a "culture of evaluation". Also, the evaluation team worked closely with the
Statistics Agency in implementing the questionnaire, building the capacity of the
Statistics Agency and government.




Lessons learned

Starting the evaluation data collection after the program had begun may have reduced the quality
of data collected and thus the insights that could be gained from this evaluation.
However, given the set of limitations, this case may be a particularly good example of
how, with careful management and particular attention to communication and
dissemination, even a less-than-ideal evaluation can prove to be very useful, especially if
it is timely and addresses urgent questions.


   D. Rural Roads in Vietnam

The program

Between 1997 and 2001, the Vietnam Rural Transport Project I was designed to
distribute funds for the rehabilitation of rural roads to commune centers in 18 provinces,
with the objectives of linking communities to markets and reducing poverty. Participant
communes were selected by the provinces, within minimum population-density
requirements and maximum cost limits.

The evaluation

The evaluation explored whether or not the project funded what it intended (that is,
whether resources supplemented or substituted for local resources designated for roads
and road rehabilitation) and the impact on market and institutional development, as well
as whether roads might stimulate local markets or increase access to more distant markets.
Key measurements included outputs such as kilometers of rehabilitated roads, new roads
(which the project was not intending to fund), and road quality, as well as outcomes such
as access to markets, livelihoods, and even school completion. It also examined
heterogeneity of impact, particularly whether there were diminishing returns for villages
that started off better off.

The evaluation employed a double-difference approach with propensity score matching and
included controls for local conditions and events over time. Data came from the Survey
of Impacts of Rural Roads in Vietnam panel of 200 communes and 3000 households and
included a pre-program baseline in 1997 and follow-up rounds in 1999, 2001, and 2003.

The evaluation findings

The evaluation showed that the project had resulted in more kilometers of rehabilitated
roads, though fewer than were expected. More new roads had been built as well, however,
suggesting that the additional funding allocated to roads had primarily "stuck" to the
roads sector, though not completely to rehabilitation as intended. The project improved
road quality as well. The overall effect was that the project funding resulted in additional
spending on roads instead of displacing regular spending on roads. As a result, access to
markets, goods, and services increased, and there has been livelihood diversification.
Primary school completion has also improved. Some impacts such as demand for
unskilled labor appeared in the short term but disappeared in the longer term; other
impacts took longer to appear. In general, poorer communities experienced larger
impacts.

Evaluation utilization and influence

Dissemination of this impact evaluation thus far has included a number of published
academic and working papers and presentations in Washington, DC; at the Ministry of
Planning and Development in Mozambique; and at the Transport Research Board 2008
Annual Meeting.

The impact of the evaluation on the project itself has been low. This type of evaluation
takes a long time, especially when capacity is low. In general, however, no matter how
much time may be required for data collection and analysis, some projects generate
impacts that take time to manifest. Additionally, the decision was made to take time to
make the evaluation thorough and improve accuracy.

In this case, the benefits of the evaluation are accruing to other projects and other
evaluations because of its subject and quality. Compared to those in health and education,
impact evaluations of rural roads - and infrastructure interventions in general - are more
difficult and therefore much less common. The implementation and dissemination of this
evaluation have thus helped integrate impact evaluation into the roads sector and generated
interest in other infrastructure evaluations: there has been high demand for information on
the methods and data needs. Practitioners may be able to relate better to evaluations of
interventions more similar to their own, with methodological considerations and
constraints more similar to those that they face.

The quality of the evaluation has also raised the standard for rural road evaluations,
which is expected to increase the quality of information future evaluations will provide.
At the same time, it has made it easier for others to follow its example. Methods and
questionnaires have been used for other road evaluations.

Lessons learned

For impact evaluations, there is often a trade-off between speed and quality. The impact
of higher quality evaluations may not be seen in the actual intervention being evaluated,
but the benefits may extend to other projects and evaluations by pushing the frontier of
what can be evaluated and how and by setting new expectations for evaluation quality.

Timing may create another concern, if a project or evaluation takes long enough to
undergo staff changes. Staff may have little incentive to start an evaluation that they will
not be around to see completed (or get credit for) and, alternatively, one person interested
in impact evaluation may be replaced by someone with different priorities. Staff turnover
may thus result in low team interest, and low team interest often results in low
government interest in evaluation.



Also, there may be special challenges that can suppress demand in sectors and regions
that are less commonly evaluated. Where there is little habit or "culture of evaluation",
there may be less funding and less pressure to evaluate, and perhaps higher resistance to
accountability. It may require special efforts to begin to build a culture of evaluation.




Table 11: Summary of the programs evaluated, the evaluation questions, design and main findings - SDN
Program                                      Evaluation questions                          Evaluation design
1. Madagascar - ADEFI Microfinance Institution:           Does participation in microfinance improve            First evaluation: ex-post matching of beneficiaries and
Provides credit to small businesses                      financial turnover, production, value added, staff,   non-beneficiaries. Not considered rigorous enough
Goals: Assist very small, small and medium businesses    capital, labor productivity and capital                Second evaluation: double difference. More
to develop their activities                              productivity?                                         robust, but results were of little use - high attrition
                                                                                                               rates left low statistical significance in the results
2. Morocco - Al Amana Microfinance: Provides                 Activities and sales of enterprises                Randomized control trial
credit to urban areas; expanding into rural areas
Goals: Provide access to credit for impoverished
people
3. Ethiopia - Food Security Program: Labor-              Process:                                               Double difference with matching techniques
intensive public works safety-net program,                Effective targeting                                   Beneficiaries and non-beneficiaries were
unconditional transfers for certain vulnerable groups,    Delivery of benefits                                 compared using retrospective data
agricultural assistance and technologies                 Project objectives:                                    There        were    challenges    in    defining
Goals: Improved food security and the well-being of       From project log frame                               "beneficiaries" because of program delivery gaps
chronically food-insecure people in rural areas           Food security
                                                          Asset growth
                                                          Perception that assets being constructed were
                                                         useful
4. Vietnam: Rural Roads (1997-2001)                       Did the project fund what it intended - did           Double-difference with propensity score
Goals: Rehabilitation of rural roads to commune          resources supplement or substitute for local          matching
centers, to link communities to markets and reduce       resources?                                             Controls for local conditions, events over
poverty                                                   Impact on market and institutional                   time, etc.
                                                         development                                            Used pre-program baseline data in 1997 and
                                                          Heterogeneity of impact                              follow-up rounds in 1999, 2001, and 2003
Table 12: The possible effects and influence of each evaluation and the reasons why the evaluations were influential -
SDN
Influence/ effects of the evaluation                                             Reasons why influential or not
Madagascar Microfinance: ADEFI and AFD
1. Limited policy use                                                             No dissemination beyond direct stakeholders, no efforts to plan for
2. A few project changes were made, but these were not strongly linked to        dissemination from the beginning
    recommendations made by the evaluation                                        No clear message for policy implications. Methods and results were hard
3. Lessons from the evaluation process were learned for a second set of          to understand, and "too statistical" for the institution's management to
    impact evaluations in Morocco                                                understand
                                                                                  Evaluation asked different questions than what the stakeholders were
                                                                                 interested in
                                                                                  The impact evaluation went unread by donors, for the most part
                                                                                  These lessons were learned for the next evaluation
Morocco Microfinance: Al Amana and AFD
1. Initial results have led to adaptations in Al Amana's loans                    Subject was relevant and timely, there was existing demand
2. Interest in the study has prompted a complementary study to investigate        Was first RCT on microfinance
   how rural households finance activities, to design better financial products   "True partnership from the beginning" - There was much cooperation and
3. [Final results of evaluation still pending]                                   communication among stakeholders, and there was an in-country evaluation
                                                                                 team member
                                                                                  Clear and rigorous evaluation methods
                                                                                  Dissemination and visibility are prioritized
Ethiopia: Food Security Program
1. The interim evaluation brought attention to some of the administrative         Timing: an interim evaluation allowed for mid-program changes
    and logistical practicalities that were being overlooked                      There was a receptive audience of government and donors that wanted to
2. Generated follow up studies to explore delivery challenges                    understand the results, because of their commitment to see the program
3. Sparked administrative dialogue on practicalities not considered initially    succeed
    (such as when to graduate participants), led to adjustment of procedures      "No surprises": ongoing communication, frequent and often one-on-one
    and measures                                                                 meetings, with space to address concerns. As a result, stakeholders became
4. Increased government appreciation of having an external evaluation            more comfortable with each other, the evaluators, and the results
5. Contributed to more of a "culture of evaluation"                               Close collaboration with stakeholders in the evaluation process, allowing
6. Capacity building for the Statistics Agency and government                    for heavy inputs into the design
7. Results were considered fairly binding and acceptable                         Challenges:
                                                                                  Lack of baseline may have reduced the quality of data collected and thus
                                                                                 the insights that could be gained from the evaluation
                                                                                  Many donors meant that coordination and communication were more
                                                                                 difficult. Variation in expertise made it hard to present "at the right level"
                                                                                  Government and donor tensions




Vietnam: Rural Roads Rehabilitation Project
1. Has helped introduce impact evaluation to the infrastructure sector     There has been high demand for dissemination on the methods and data
2. Raises the standards for rural road evaluations                        needs for rural road evaluations, especially because infrastructure evaluations
3. Methods and questionnaires have been used for other road evaluations   have been rare
4. Low impact on the project itself                                        The infrastructure sector practitioners may have a harder time relating to
5. Dissemination is still in the early phases                             more commonly and easily conducted types of evaluations, such as health
                                                                          Challenges:
                                                                           This type of evaluation takes a long time, especially when capacity is low,
                                                                          but also more generally when impacts take time to manifest
                                                                           Project staff turnover means that one person may be interested in the
                                                                          evaluation while the next may not. Low team interest leads to low government
                                                                          interest
                                                                           There were less funds and less pressure for impact evaluation when the
                                                                          project started
                                                                           Not everyone wants accountability!




    6. Lessons learned: Strengthening the Utilization and
       Influence of Impact Evaluation
    A. How are impact evaluations used?

Impact evaluations can be used as an assessment tool to help strengthen project and
program design by providing a more systematic, rigorous, and quantifiable assessment of
how a project has performed, what it has achieved (compared to its intended objectives),
who has and has not benefited, and how the costs of producing the benefits compare with
alternative ways of using the resources.

Impact evaluations are also used as a political tool: to provide support for decisions that
agencies have already made or would like to make, to mobilize political support
for high profile or controversial programs, and to provide political or managerial
accountability. This last function has been important in countries where new
administrations were seeking to introduce transparency into the design and
implementation of high profile, politically attractive programs. Impact evaluations can
also provide independent corroboration and political cover for terminating politically
sensitive programs, in which case the international prestige and independence of the
evaluator have been found to be important. Ultimately, it is likely to be the potential
political benefit or detriment that causes decision makers to embrace or avoid
evaluations, and those who would like to promote impact evaluation as an assessment and
learning tool will have to be fully aware of the political context and navigate it
strategically.

    B. What kinds of influence can impact evaluations have?

The twelve impact evaluations discussed in this report were utilized and had influence in
three broad areas: project implementation and administration; providing political support
for or against a program; and promoting a culture of evaluation and strengthening
national capacity to commission, implement, and use evaluations. It is not only the
findings of an impact evaluation that can have an impact. The decision to conduct an
evaluation, the choice of methodology, and how the findings are disseminated and used
can all have important consequences – some anticipated, others not; some desired,
others not. For example, the decision to conduct an evaluation using a randomized
control trial can influence who benefits from the program, how different treatments and
implementation strategies are prioritized, what is measured, and the criteria used to
decide whether the program has achieved its objectives.[9]

The influence of evaluations can be seen in administrative realms, such as program design
and scope, or in the political realm, in the form of popular support for a program or its
associated politicians. Understanding the role of impact evaluation is also a process that
evolves as managers, policymakers and other stakeholders become more familiar with
how evaluations are formulated, implemented and used. For high profile programs, the
influence of the evaluation may also be seen in how the debate on the program is framed
in the mass media.

[9] A frequently cited example from the US was the decision to assess the performance of schools under the
No Child Left Behind program in terms of academic performance measured through end-of-year tests.
This meant that many schools were forced to modify their curricula to allow more time to coach children in
how to take the tests, often resulting in reduced time for physical education, arts, and music.

    C. Guidelines for strengthening evaluation utilization and influence

The following is a synthesis of the broad range of factors identified in the presentations
as potentially affecting evaluation utilization.

Timing and focus on priority stakeholder issues:
 • The evaluation must be timely and focus on priority issues for key stakeholders.
   Timing often presents a trade-off. On the one hand, an evaluation can be designed to
   provide fast results relevant to the project at hand, in time to make changes in project
   design and while the project still has the attention of policymakers. On the other hand,
   evaluations that take longer to complete may be of higher quality and can look for
   longer-term effects on the design of future projects and policies.
 • Cooperation with the "clients" of an evaluation cannot begin too early. In one case,
   involving the government in the choice of survey design helped to ensure there was
   comfort with the evaluation methods and eventually the results – increasing
   utilization.
 • The evaluator must be opportunistic, taking advantage of funding opportunities or
   the interest of key stakeholders. The evaluators must work closely with national
   counterparts and be responsive to political concerns. Several countries that have
   progressed toward the institutionalization of evaluation at the national or sector level
   began with opportunistic selection of their first impact evaluations.[10]
 • The evaluator should always be on the look-out for "quick wins" – evaluations that
   can be conducted quickly and economically and that provide information on issues of
   immediate concern. Showing the practical utility of impact evaluations can build up
   confidence and interest before moving on to broader and more complex evaluations.
 • There is value in firsts. Pioneer studies may not only show the impact of the
   intervention, but in a broader context they may also change expectations about what
   can and should be evaluated or advance the methods that can be used. Even
   less-than-ideal evaluations that are the first or early in their context can build interest
   and capacity for impact evaluation.
 • A series of sequential evaluations gradually builds interest, ownership and utilization.
 • For impact evaluations, there is often a trade-off between speed and quality. The
   impact of higher quality evaluations may not be seen in the actual intervention being
   evaluated, but the benefits may extend to other projects and evaluations by pushing
   the frontier of what can be evaluated and how, and by setting new expectations for
   evaluation quality.
 • Timing may create another concern. If there are likely to be staff changes before an
   evaluation is completed, staff may have little incentive to start an evaluation that they
   will not see completed (or get credit for). Alternatively, one person interested in
   impact evaluation may be replaced by someone with different priorities.
 • Starting the evaluation data collection late in the project cycle may reduce data
   quality and the insights that could be gained. However, with careful management and
   particular attention to communication and dissemination, even a less-than-ideal
   evaluation can prove to be very useful, especially if it is timely and addresses urgent
   questions.
 • There may also be special challenges that suppress demand in sectors and regions
   that are less commonly evaluated. Where there is little habit or "culture of
   evaluation", there may be less funding and less pressure to evaluate, and perhaps
   higher resistance to accountability. It may require special efforts to begin to build a
   culture of evaluation.

[10] See IEG (2008), Institutionalizing Impact Evaluation within the Framework of a Monitoring and
Evaluation System. The Education for All evaluations in Uganda were cited as an example of
institutionalization at the sector level, and the SINERGIA evaluation program under the Planning
Department in Colombia as an example of institutionalization of a national impact evaluation system. The
report is available at:
http://lnweb90.worldbank.org/oed/oeddoclib.nsf/DocUNIDViewForJavaSearch/E629534B7C677EA78525754700715CB8/$file/inst_ie_framework_me.pdf

Clear and well communicated messages
 • Clarity and comprehensibility increase use. It helps when the evaluation results point
   to clear policy implications. This may also apply to the comprehension of methods.
   While stakeholders may be willing to "trust the experts" if an evaluation offers results
   that support what they want to hear, there may be a reasonable tendency to distrust
   results – and particularly methods – that they don't understand.
 • People tend to trust or distrust evidence based on what they already believe, looking
   for results that confirm what they believe and looking for ways to discredit contrary
   information. Perhaps one reason is that it is difficult to distinguish between good and
   bad evidence. Currently, there is much ongoing work to provide training in
   measurement and evaluation for donors and policymakers: when individuals have a
   greater understanding of impact evaluation, they may be better able to recognize
   differing qualities of evidence, allowing individual evaluations to have greater
   impact.

Effective dissemination
 • Rapid, broad and well targeted dissemination strategies are important determinants of
   utilization. One reason that many sound and potentially useful evaluations are never
   used is that very few people have ever seen them.
 • Providing rapid feedback to government on issues such as the extent of corruption or
   other "hot" topics enhances utilization.
 • Continuous and targeted communication builds interest and confidence and also
   ensures "no surprises" when the final report and recommendations are submitted.
   This also allows controversial or sensitive findings to be gradually introduced. Trust
   and open lines of communication are important confidence builders.
 • An individual evaluation will rarely be entirely conclusive, but conclusions drawn
   from an accumulation of evidence may be more difficult to refute.
 • The choice of the institution and the evaluators can contribute to dissemination and
   credibility of the findings.
 • Making data available to the academic community is also an important way of
   broadening interest and support for evaluations and also of legitimizing the
   methodologies (assuming they stand up to academic critiques as have PROGRESA
   and Familias en Accion).

Positive and non-threatening findings
 • Positive evaluations, or those that support the views of key stakeholders, are more
   likely to be used. While this is not surprising, one of the reasons is that
   many agencies either fear the negative consequences of evaluation or
   consider evaluation a waste of time (particularly the time of busy managers) or
   money. Once stakeholders have appreciated that evaluations are not threatening
   and actually produce useful findings, agencies have become more willing to
   request and use evaluations and gradually to accept negative findings – or even to
   solicit evaluations to look at areas where programs were not going well.
 • There is always demand for results that confirm what people want to hear. Concerns
   over potential negative results, bad publicity, or improper handling of the results may
   reduce demand; sensitivity, trust-building, and creative arrangements may help
   overcome these fears. Consequently, there may be some benefit in taking advantage
   of opportunities to present good results, especially if doing so helps
   stakeholders to understand and appreciate the role of impact evaluation.

Active engagement with national counterparts
 • The active involvement of national agencies in identifying the need for an evaluation,
   commissioning it, and deciding which international consultants to use is central to
   utilization. It gives ownership of the evaluation to stakeholders and helps ensure the
   evaluation focuses on important issues. It often increases quality by taking advantage
   of local knowledge and in several cases reduces costs (an important factor in gaining
   support) by combining with other ongoing studies.
 • This cooperation can enable evaluators to modify the initial evaluation design to
   reflect concerns of clients – for example, changing a politically sensitive randomized
   design to a strong quasi-experimental design.
 • Involving a wide range of stakeholders is also an important determinant of utilization.
   This can be achieved through consultative planning mechanisms, dissemination and
   ensuring that local as well as national level agencies are consulted.
 • In some contexts, the involvement of the national statistical agency increases the
   government's trust, and the results and the process have been better accepted when
   overseen and presented by the statistics agency.

Demonstrating the value of evaluation as a political and policymaking tool and adapting
the design to the national and local political contexts
 • When evaluation is seen as a useful political tool, this greatly enhances utilization.
   For example, managers or policymakers often welcome specific evidence to respond
   to critics or to support continued funding or program expansion. Evaluation can also
   be seen as a way to provide more objective criticism of an unpopular program.
 • Once the potential uses of planning tools such as cost-effectiveness analysis are
   understood, this increases the demand for, and use of, evaluations. Evaluations can
   also demonstrate the practical value of good monitoring data, and increased attention
   to monitoring in turn generates demand for further evaluations. When evaluations
   show planners better ways to achieve development objectives, such as ensuring
   services reach the poor, this increases utilization and influence.
 • Increasing concerns about corruption or poor service delivery have also been an
   important factor in government decisions to commission evaluations. In some cases,
   a new administration wishes to demonstrate its transparency and accountability or to
   use the evaluation to point out weaknesses in how previous administrations had
   managed projects.
 • Evaluations that focus on local contextual issues (i.e. that are directly relevant to the
   work of districts and local agencies) are much more likely to be used.
 • In cases where a clear selection cut-off point can be defined and
   implemented (e.g. the score on a poverty or probability of school drop-out scale), the
   regression discontinuity design (RD) can provide a methodologically strong design
   while avoiding political and ethical concerns about RCTs.
 • Evaluators must adapt evaluation designs to political realities when deciding what
   evaluation strategies will be both technically sound and politically feasible.
 • Evaluations of large, politically sensitive programs should be designed at an early
   stage before the programs have developed a large constituency and become resistant
   to questioning of their goals and methods. Evaluations should begin early in the
   program with greater use being made of small pilot projects to assess operational
   procedures and viability for expansion.
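
For readers less familiar with the regression discontinuity design mentioned in the list above, the following is a minimal illustrative sketch, not a method taken from any of the case studies. It assumes a hypothetical program that enrols every household whose poverty score falls below a cut-off, simulates data with a known effect, and estimates the jump in the outcome at the cut-off by fitting a separate straight line on each side. The function names, the simulated data, the cut-off of 50 and the bandwidth of 10 are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp RD estimate: the jump in the fitted outcome at the cutoff.

    Assumption: units scoring below the cutoff are treated. A separate line
    is fitted on each side, using only observations within `bandwidth` of
    the cutoff. Scores are centred on the cutoff, so each intercept is the
    predicted outcome exactly at the cutoff.
    """
    treated = [(s - cutoff, y) for s, y in zip(scores, outcomes)
               if cutoff - bandwidth <= s < cutoff]
    control = [(s - cutoff, y) for s, y in zip(scores, outcomes)
               if cutoff <= s <= cutoff + bandwidth]
    intercept_treated, _ = fit_line(*zip(*treated))
    intercept_control, _ = fit_line(*zip(*control))
    return intercept_treated - intercept_control


def simulate(n=20000, cutoff=50.0, true_effect=5.0, seed=0):
    """Hypothetical data: the outcome rises with the score, plus a program effect."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        score = rng.uniform(0, 100)
        effect = true_effect if score < cutoff else 0.0
        outcomes.append(20 + 0.3 * score + effect + rng.gauss(0, 2))
        scores.append(score)
    return scores, outcomes


scores, outcomes = simulate()
estimate = rd_estimate(scores, outcomes, cutoff=50.0, bandwidth=10.0)
print(f"Estimated effect at the cut-off: {estimate:.2f} (simulated true effect: 5.0)")
```

The sketch compares households just below and just above the cut-off, which is why no household needs to be randomly excluded from the program. In practice the choice of bandwidth and functional form matters a great deal, and the estimate applies only to units near the cut-off.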

The methodological quality of the evaluation and credibility of the international
evaluators
 • A high quality evaluation is likely to be more useful and influential. Quality
   improves the robustness of the findings and their policy implications and may assist
   in dissemination (especially in terms of publication). However, an impact evaluation
   of compromised quality may still be useful if it can provide timely and relevant
   insight or if it ventures into new territory: new techniques, less-evaluated subject
   matter, or a context where relevant stakeholders have less experience with impact
   evaluations.
 • The credibility of international evaluators, particularly when they are seen as not tied
   to funding agencies, can help legitimize high profile evaluations and enhance their
   utilization.
 • In some cases, the use of what are considered "state of the art" evaluation methods
   such as randomized control trials can raise the profile of evaluation (and the agencies
   that use it) and increase utilization.
 • New and innovative evaluations often attract more interest and support than the
   repetition of routine evaluations.
 • On the other hand, while studies on the "frontier" may be more novel or attract more
   attention, subsequent related studies may be useful in confirming controversial
   findings and building a body of knowledge that is more accepted than a single study,
   especially a single study with unpopular findings.
 • Evaluation methods, in addition to being methodologically sound, must also be
   understood and accepted by clients. Different stakeholders may have different
   methodological preferences.

Evaluation capacity development
 • Evaluation capacity, especially at a local level, is an important factor in the quality of
   an impact evaluation that also affects the ability of stakeholders to demand,
   understand, trust, and utilize the results.
 • Capacity building is an iterative process and may improve both demand and quality.

   D. Strategic considerations in promoting the utilization of impact
      evaluations

Many of the evaluations cited in this report were selected opportunistically, depending on
the availability of donor funding and technical support and the interest of a particular
agency, or even a small group of champions within the agency. While individual
evaluations may have made a useful contribution, the cases illustrate that the effects and
benefits are often cumulative, and utilization and government buy-in tend to increase
where there is a sequence of evaluations. In several cases, the first evaluation was
methodologically weak (for example, being commissioned late in the project and relying
on retrospective data collection methods for reconstructing the baseline), but when the
findings were found useful by the national counterparts, this generated demand for
subsequent and more rigorous evaluations.

Effective utilization of impact evaluations is an incremental process, with the full benefits
realized only once a number of useful evaluations have been conducted.
Policymakers, planners, managers and funding agencies gradually gain confidence in the
value of impact evaluation once they have seen some of the practical benefits and have
learned that some of their initial concerns and reservations were not fully justified. A key
element in successful utilization is developing a system for selecting
evaluations that address key policy issues and for the analysis, dissemination, and utilization
of the results. All of these considerations require the institutionalization of an impact
evaluation system with strong buy-in from key stakeholders and with a powerful central
government champion, usually the ministry of finance or planning.

Institutionalizing Impact Evaluation within the Framework of a Monitoring and
Evaluation System (IEG 2008) identifies a number of different paths towards the
institutionalization of impact evaluation. It points out that the utility and influence of
many methodologically sound evaluations have been limited because they were treated
as one-off evaluations and did not form part of a systematic strategy for selecting
evaluations that addressed priority policy issues or that were linked into national budget
and strategic planning. The IEG report argues that methodologically sound and potentially
useful impact evaluations do not automatically ensure the development of an evaluation
system, and that the creation of such a system requires a strong commitment on the part
of government agencies and donors over a long period of time.

The present publication corroborates many of the findings of the IEG study. In addition
to the recommendations and guidelines presented in the previous sections, the discussion
of the evaluation presentations[11] raised the following issues:

      • It is important to identify and support impact evaluations that can provide findings
        and knowledge that will be useful to a broader audience than the project agency
        whose programs are being evaluated.
      • The role of the evaluator should be clarified. Should evaluators become advocates for
        the adoption of the evaluation findings (for example, the free distribution of
        anti-malaria or deworming treatments) or should their role be limited to the collection
        and analysis of data that the evaluation clients will interpret? While many clients
        require the evaluator to present recommendations, there is a concern in the
        evaluation profession that the requirement to present recommendations may lead
        to a bias in how the findings are presented (and particularly to ignoring findings that
        do not support the recommendations).
      • There is also a challenge when academics are asked to provide recommendations.
        The academic researcher is trained to present caveats rather than to come to firm
        conclusions. Also, the academic has a different set of incentives, and she or he is
        often judged on the number of publications (in journals that require the use of
        particular methodologies and give less value to policy recommendations based on
        the best available, but less rigorous, evidence).
      • The previous point relates to a concern that the influential role of academic
        researchers in the program evaluation field means that many evaluations are
        method-driven rather than policy-driven. This criticism has often been leveled at
        advocates of randomized control trials, who are seen as ignoring important policy
        evaluations where it is not possible to use rigorous methods, in favor of
        evaluations that are less useful to policymakers and planners but where it is
        possible to use randomized designs.
      • Further to this point was the recommendation that there is a need to consider rules
        and procedures for defining acceptable standards of evidence. Different fields,
        such as health and drug research, may traditionally use different standards of
        evidence and proof than those used in other fields such as conditional cash
        transfers and poverty analysis. Is it possible to define generally accepted
        standards of evidence that can apply in all sectors?
      • The question of standards of evidence also applies to the increasing use of mixed
        method evaluation designs that recognize and seek to reconcile the different
        criteria of evidence and proof conventionally used in quantitative and qualitative
        research.
      • A final point concerned the question of whether all evaluation results should be
        disseminated. For example, if the success of an evaluation depends on close
        cooperation of national counterpart agencies, should there be situations in which
        these agencies can decide whether and when certain findings should be
        disseminated? There are other situations in which potentially important but
        controversial findings may be based on weak evidence (for example with small
        sample sizes and low statistical power). While researchers may understand that
        such findings must be interpreted with caution, the mass media or political
        supporters or critics of a program may ignore these caveats, perhaps jumping to
        conclusions that a program should be terminated or an innovative approach
        should receive major funding.

[11] These considerations draw primarily from Michael Kremer's reflections during his presentation on the
Kenyan deworming evaluation.
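
To make the concern about small sample sizes and low statistical power more concrete, the sketch below uses the standard normal approximation for comparing the mean outcome of two equally sized groups. It is an illustration only and is not drawn from any of the evaluations discussed: the function names, the effect size of 0.2 standard deviations and the sample sizes are assumptions, and the calculation ignores clustering, attrition and other design features that a real evaluation would need to take into account.

```python
from math import ceil, sqrt
from statistics import NormalDist


def power_two_arm(n_per_arm, effect, sd, alpha=0.05):
    """Approximate power of a two-sided test comparing two group means.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    standard_normal = NormalDist()
    standard_error = sd * sqrt(2.0 / n_per_arm)
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    return standard_normal.cdf(effect / standard_error - critical_value)


def n_per_arm_for_power(effect, sd, power=0.8, alpha=0.05):
    """Approximate sample size per group needed to reach the target power."""
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    power_quantile = standard_normal.inv_cdf(power)
    return ceil(2 * ((critical_value + power_quantile) * sd / effect) ** 2)


# Hypothetical example: a true effect of 0.2 standard deviations.
small_study_power = power_two_arm(n_per_arm=50, effect=0.2, sd=1.0)
required_n = n_per_arm_for_power(effect=0.2, sd=1.0, power=0.8)
print(f"Power with 50 units per group: {small_study_power:.2f}")
print(f"Units per group needed for 80% power: {required_n}")
```

Under these assumptions, a study with 50 units per group would detect a real effect of this size only in a minority of cases, so both a "no impact" finding and a striking positive finding from such a study deserve the caution described above.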




Annex 1 The case studies
All of the case studies are available as video presentations, and (except where indicated)
the presentations are also available in PowerPoint on the conference website:
www.worldbank.org/iepolicyconference.

Education
Deon Filmer. Promoting Lower Secondary School Attendance: The Impact of the CESSP
      Scholarship Program in Cambodia.
Miguel Urquiola. The Effects of Generalized School Choice on Achievement and
      Stratification: Evidence from Chile's Voucher Program.
Antonie de Kemp and Joseph Eilor. Impact of Primary Education in Uganda [Video
      presentation only].

Anti-Poverty Programs and Conditional Cash Transfers
Emmanuel Skoufias. The Role of Impact Evaluation in the PROGRESA/Oportunidades
       Program of Mexico.
Orazio Attanasio. Evaluating a Conditional Cash Transfer: The Experience of Familias
       en Accion in Colombia.
Emanuela Galasso. Assessing Social Protection to the Poor: Evidence from Argentina.

Health
Adam Wagstaff. An Impact Evaluation of a Health Insurance Scheme in China.
Pascaline Dupas. Free Distribution or Cost-Sharing? Evidence from a Randomized
       Malaria Prevention Experiment (Bednets – Kenya).
Michael Kremer. Evaluating a Primary School Deworming Program in Kenya [Video
       presentation only].

Sustainable Development
Dominique Van De Walle. Making Smart Policy: Using Impact Evaluations of Rural
       Roads (Vietnam).
Jocelyne Delarue. The Impact Evaluation of MicroFinance Projects and their Expected
       Use (Madagascar and Morocco).
John Hoddinott. Ethiopia's Food Security Program [Video presentation only].

Reporting Back from the Sector Sessions and Lessons Learned
Norbert Schady. Impact Evaluation of Anti-Poverty Programs and Conditional Cash
       Transfers.
       http://siteresources.worldbank.org/INTISPMA/Resources/Training-Events-and-Materials/449365-1199828589096/NorbertSchady.pdf
Halsey Rogers. The Impact of Impact Evaluations: Lessons from the Education Sector.
       http://siteresources.worldbank.org/INTISPMA/Resources/Training-Events-and-Materials/449365-1199828589096/HalseyRogers.pdf


