Policy Research Working Paper 9041

Integrating Value for Money and Impact Evaluations: Issues, Institutions, and Opportunities

Elizabeth D. Brown
Jeffery C. Tanner

Independent Evaluation Group
October 2019

Abstract

This mixed methods study investigates why fewer than one in five impact evaluations integrates a value-for-money analysis of the development intervention being evaluated. This study distills four main insights from combined analysis of 33 semi-structured and unstructured interviews, surveys of 497 policymakers and 16 journal editors, and portfolio analyses of World Bank and worldwide impact evaluations. The study finds that low levels of training in cost data collection and analysis methods, together with a lack of standardization of the value-for-money assumptions (e.g., time horizons, discount rates, and economic or financial cost accounting), limit value-for-money integration into impact evaluations. Further eroding researchers' incentives, demand for cost evidence from the journals that publish impact evaluations is mixed. Ill-defined standards of rigor undermine editors' capacity to evaluate the quality of value-for-money analysis when it is integrated with impact evaluation evidence. Institutional funders of impact evaluations do not consistently demand that cost analysis be integrated into their funded evaluations. This study finds no evidence in support of the myth that policymakers do not demand cost evidence. Rather, it finds that researchers have few ways of knowing what kind of analysis policymakers need and when they need it. Improving the stock of impact evaluators who are cross-trained in value-for-money methods, establishing standards for what constitutes rigor in costing, resolving methodological issues, and improving linkages between policymakers and researchers would lead to greater integration of value-for-money methods in impact evaluations.

This paper is a product of the Independent Evaluation Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at jtanner@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Integrating Value for Money and Impact Evaluations: Issues, Institutions, and Opportunities

Elizabeth D. Brown
Jeffery C. Tanner

Key Words: value for money, impact evaluation, cost-effectiveness analysis, cost-benefit analysis, international development
JEL Codes: O1 (Economic Development); O10 (Economic Development: General)

Table of Contents
1. Background ..... 1
2. Evaluation Questions and Strategy ..... 4
   Approach ..... 4
   Data ..... 5
   Methods ..... 8
3. Findings ..... 14
4. Discussion and Conclusion ..... 25

Box
Box 2.1. Defining Value for Money ..... 4

Figures
Figure 1.1 Impact Evaluations Published Per Year (1990–2015) ..... 1
Figure 2.1 Composition of Policymaker Sample ..... 7

Tables
Table 3.1 Why is VFM so Infrequently Incorporated into Impact Evaluations? ..... 15
Table 3.2 Willingness to Pay for Impact, Cost, and VFM Information ..... 17

Abbreviations and Acronyms
3ie International Initiative for Impact Evaluation
BACO best available charitable option
CBA cost-benefit analysis
CBCSE Center for Benefit-Cost Studies of Education
CEA cost-effectiveness analysis
CUA cost-utility analysis
DFID U.K. Department for International Development
DHS Demographic and Health Surveys
DIME Development Impact Evaluation
ERR economic rate of return
GEA general efficiency analysis
IE impact evaluation
IEG Independent Evaluation Group
IER Impact Evaluation Repository
LSMS Living Standards Measurement Survey
MCC Millennium Challenge Corporation
NGO nongovernmental organization
SIEF Strategic Impact Evaluation Fund
SROI social return on investment
USAID U.S. Agency for International Development
VFM value for money

All currency amounts are in U.S. dollars unless otherwise indicated.

Acknowledgments

The authors wish to thank Patrick McEwan for contributing his advice and knowledge throughout the project's duration, and Hank Levin, John Strand, and Howard White, who provided views and comments on an earlier draft. Aliza Marcus and Joost de Laat graciously facilitated access to the contact list of Strategic Impact Evaluation Fund (SIEF) subscribers. We are grateful to the 400+ survey respondents from that list. The authors are grateful for feedback from Joy Behrens, Diana Epstein, David Evans, and Emmett Keeler in developing the policymaker survey, and for the assistance of Holly Blagrave in its implementation. The authors also thank those who met with them for telephone, in-person, or email interviews, including Juan Belt, Logan Brenzel, Annette Brown, Laura Chioda, Joost de Laat, Shanta Devarajan, Markus Goldstein, Penny Hawkins, Sarah Lane, Ariana Legovini, Ruth Levine, Gideon Lukens, Manny Jimenez, Temina Madon, Meghan Mahoney, David McKenzie, Jack Molyneaux, Owen Ozier, Jyotsna Puri, Dan Rosenbaum, Adam Ross, Justin Sandefur, Lyn Squire, Miguel Szekely, Caitlyn Tulloch, Edit Velenyi, Damian Walker, Howard White, and Keith Wood.
The authors also wish to thank the editors of the top seven journals that publish impact evaluations in international development who participated in the Journal Editor Survey: American Economic Journal: Applied Economics; Economic Development and Cultural Change; Journal of Development Economics; Journal of Development Effectiveness; World Development; Quarterly Journal of Economics; and World Bank Economic Review. Maria MacDicken assisted in implementing the survey of journal editors. Erik Bloom, Marie Gaarder, Richard Scobey, Mark Sundberg, and Nicholas York provided insights and encouragement throughout. Yunsun Li and Karol Acon Monge provided excellent research assistance. Yezena Yimer gave unflagging administrative and clerical support. This work originated as an unpublished manuscript developed under the auspices of the Independent Evaluation Group (IEG) with the financial support of the government of Sweden, for which we are especially grateful. The findings, interpretations, and conclusions are the authors' own and should not be attributed to the World Bank, its Executive Board of Directors, or any of its member countries.

1. Background

This paper considers the evidence and incentives for producing impact evaluations that integrate cost[1] data capture and value-for-money analysis for the purposes of accurately describing the cost and efficiency of an intervention. In this analysis, Value for Money refers to methodologies that measure efficiency, including Cost-Effectiveness Analysis (CEA), Cost-Benefit Analysis (CBA), Cost-Utility Analysis (CUA), Social Return on Investment (SROI), rank correlation, and basic efficiency resource analyses.[2]

For more than a decade, observers have increasingly looked to impact evaluation (IE) methods as a means to evaluate the attributable effects of real-world interventions implemented by development agencies and developing nations (see White and Bamberger, 2008, for example). More recently, however, the confluence of four trends offers unprecedented potential to dramatically increase the production and policy relevance of impact evaluation evidence.

Figure 1.1 Impact Evaluations Published Per Year (1990–2015)
Source: Sabet & Brown (2018)

[1] For the balance of this study, in order to clarify the meaning of the word "cost," we capitalize Cost when it is used to denote the output of an analytic process, as in conducting a Cost study, referring to a Cost analysis, or providing policymakers with Cost reporting. We use lowercase when it is used as an input into such an analytic exercise, as in cost data, cost elements, costs of activities, or cost information. We also use lowercase when cost could be used interchangeably with "price tag," as in the cost of Cost-related research, or when it could be used as a verb, as in to cost the financial impact of an intervention. The distinction can become somewhat nebulous, but it is made in an attempt to clarify concepts and emphasize Cost analysis.
[2] The VFM concept also explicitly considers equity. This project limited the scope of its VFM investigation to the efficiency methods described in this paragraph.
First, the number of IEs on issues important to international development is rising: over the past 15 years, the number of published international development impact evaluations increased from no more than 50 studies per year before 2000 to between 400 and 500 studies per year between 2013 and 2015, despite a plateau in production after 2012[3] (Sabet and Brown 2018; Cameron, Mishra, and Brown 2015; Savedoff 2013). Second, the internal validity—quality—of impact evaluations also has increased. There are more prospective evaluations of well-defined interventions, more field experiments, and more carefully designed quasi-experiments (IEG 2012). Third, impact evaluations are increasingly asking explicitly comparative questions on relative effectiveness by evaluating multiple treatment arms (Muralidharan and Sundararaman 2011). Finally, as the number of impact evaluations is rising, so too is the number of systematic reviews (White and Masset 2018), which rely on the existence of a range of primary studies in order to uncover generalizable findings.

All four elements—rapid growth in the number of IEs, improvements in internal validity, an increase in IEs with multiple treatment arms allowing for comparability, and an expansion in the number of systematic reviews—create more entry points through which to integrate VFM analysis into project design and decision making. Yet as the stock of evidence grows, observers have noted that most published evaluations do not contain the cost information needed for cost-effectiveness analysis (Dhaliwal et al. 2013; McEwan 2012) or other analyses of a project's Value for Money.

Institutional production of Value for Money analyses has also waned. For example, the use of economic rates of return (ERR) at the World Bank and other multilateral development agencies has faltered and fallen out of favor. Whereas around 70 percent of the World Bank's investment projects contained an ERR in the 1970s, about 30 percent did so in the early 2000s (IEG 2010).

International development scholars whose work integrates impact evaluations with Value for Money analysis have noted the underutilization of CBA, CEA, and other efficiency analyses in the published impact evaluation literature (McEwan 2012; Dhaliwal et al. 2013). However, no analysis to date has determined the percentage of published impact evaluations that incorporate efficiency analysis, and there has been little systematic investigation of the observed decline in VFM application. A 2008 review found that inconsistent use of language; lack of common measures; lack of quality data on social impacts, outcomes, outputs, and program cost; lack of incentives for transparency; and the expense of cost measurement all contribute to low levels of Cost reporting in the social sector (Tuan 2008).

Veteran CBA observers note that the low application of VFM analysis in impact evaluation studies, and its often-dubious quality when applied, has occurred within the context of waning interest in CBA and Cost-efficiency methods more generally. This downward trend is mirrored in the quality of economic analyses in the World Bank's project appraisal documents. In 2007 and 2008, just over half of the included economic analyses were of "acceptable" or "good" quality; identical analysis performed in the 1990s found that roughly 70 percent met that standard (IEG 2010).

[3] Data are from 3ie's Impact Evaluation Repository. The repository includes published impact evaluations of development interventions carried out in low- and middle-income countries that use experimental or quasi-experimental estimation strategies with a credible counterfactual.
3ie developed the repository using a systematic search of over 31 academic databases "in health, economics, public policy and the social sciences provided by platforms such as Ovid, EbscoHost and ProQuest, and libraries and websites from select research organisations and academic institutions." More than 84,000 potential studies published between 1981 and 2015 were identified and screened, with more than 4,200 (from over 120 countries) selected for inclusion in the repository.

Even so, there are recent examples of influential projects that sought to integrate impact evaluation results with cost-effectiveness analysis to collectively set policy priorities, as with the Disease Control Priorities Project (Jamison et al. 2006). Arguably, as impact evaluation methods continue to mature in sectors like education, governance, agriculture, and health and nutrition, evaluators may consider how to re-institutionalize VFM practices.

A major goal of IE studies is to provide scientific evidence on the policies and programs that do and do not work to improve development outcomes. However, if IEs do not include program or policy cost information, resource-constrained policymakers will have limited evidence to guide their selection of efficient programs and policies, or to consider the cost implications of scaling, replicating, or reproducing programs and policies found to be effective. Cost-benefit analysis (CBA) and cost-effectiveness analysis (CEA) would be useful for (i) identifying whether the social benefits of a single intervention exceed its social costs; (ii) comparing the worth of interventions with different (monetizable) outcomes; and (iii) comparing the relative worth of interventions that share common outcome(s). CEA and CBA help identify which intervention produces a given amount of outcomes for the least cost. Including VFM analysis may help policymakers understand and use evidence from the growing number of IE field studies in their program decisions by identifying which interventions produce the most outcome for a given cost (White 2014; Dhaliwal et al. 2013; McEwan 2012). Such evidence can provide insight that is "counter to common sense, popular appeal, and traditional ideas" (Levin 2001).

Including VFM directly in IEs yields at least two advantages over producing them separately. First, collecting cost data at the time of the intervention and of the data collection on IE outcomes is likely to be less onerous in terms of expense and time than reconstructing program costs ex post. Second, evaluators, policymakers, and other stakeholders can use the evidence to make more efficient decisions when high-quality, rigorous estimates of impact also include an estimate of their Cost.

2. Evaluation Questions and Strategy

Despite the potential gains from including VFM analysis in impact evaluations, such integration seems to be the exception rather than the rule. To understand why, we used mixed methods to answer three evaluation questions:

1. How frequently is Value for Money analysis incorporated into published impact evaluations?
2. What are the existing incentives and barriers faced by producers and users, both as individuals and institutions, for VFM incorporation?
3. What are some options to overcome the challenges for the integration of Value for Money analysis into impact evaluations?
Approach

We investigated these questions from the perspective of those who produce impact evaluation estimates and reports (IE producers) and those who use, or could use, those estimates and reports in decision-making for international development policy work (IE consumers). This report defines IE producers as those in the production line of impact evaluations: they commission, fund, carry out, or communicate evaluations and findings. IE consumers include those who use evaluation results to change behavior, make funding decisions, select programs, or legitimize predetermined behavior. IE consumers include impact evaluation beneficiaries as well as policymakers and decisionmakers.

Box 2.1. Defining Value for Money

The definition of Value for Money is contested. Our working definition relates specifically to the use of VFM in impact evaluations. In practice, impact evaluations most often use efficiency analyses such as cost-benefit or cost-effectiveness analysis. We adapted a definition from that used by the Department for International Development (DFID) after an extensive review of VFM field methods. DFID defines VFM to include the four "E's"—measurement of a program's economy, efficiency, effectiveness, and equity—though in this paper most of the discussion centers on using the quantified attributable benefits identified in impact evaluations as the numerator and costs as the denominator to establish efficiency. The ratio of Impact to Cost yields an estimate of efficiency, recognizing that there are many ways to arrive at estimates of each of those broad constructs.

There are times, of course, when institutions may act as both producers and consumers of both impact evaluations and Cost analyses. For example, a large agency (such as USAID) or multilateral bank (such as the World Bank) could potentially use the evidence from evaluations of vaccine programs that it funded to determine the most effective way of distributing vaccines in resource-constrained countries.

We worked through a simple framework of the process by which evaluative evidence is created, disseminated, and applied. That is, we solicited views from impact evaluation producers, journal editors, and policymakers, in addition to producing the first estimate of the frequency with which efficiency analysis appears in published impact evaluations. This paper combines results from exploratory research that queried individual impact evaluators, VFM experts, and policymakers, as well as representatives of international development multilateral, bilateral, philanthropic, and research institutions and of academic peer-reviewed journals, using semi-structured and unstructured interviews, surveys, a small randomized control trial, and portfolio analyses of World Bank and worldwide impact evaluations.

Data

Data for the evaluation were procured through five data collection activities:

NEAR CENSUS OF WORLD BANK IMPACT EVALUATIONS – A data set of 30 World Bank impact evaluations with VFM analyses was constructed from a sample of 168 World Bank impact evaluations produced between 2000 and June 2010.[4] Of the 168 impact evaluations, 40 were found to contain "a simple comparison of costs with benefits, cost-benefits analysis, economic rate of return, or cost-effectiveness analysis across treatment types or programs" (IEG 2012). Three were dropped because they were falsely tagged as including efficiency analysis when they did not, and two because they did not report the information needed to make a determination.
Five IE publications could not be found. The remaining 30 World Bank IEs with VFM analysis were reviewed in detail.

[4] A list of the World Bank impact evaluations that were included in the sample is available from the authors.

SAMPLE OF GLOBAL IMPACT EVALUATIONS FROM THE DATABASE OF THE INTERNATIONAL INITIATIVE FOR IMPACT EVALUATION – We obtained a random sample of 236 impact evaluations from 3ie's Impact Evaluation Repository (IER), representing 10 percent of IER studies published between 1986 and 2012. The IER records all published impact evaluations of the effectiveness of development interventions that were identified through a systematic search of over 30 databases, search engines, and websites. Included interventions must have been carried out in low- or middle-income countries using recognized experimental or quasi-experimental estimation strategies.

SEMI-STRUCTURED AND UNSTRUCTURED INTERVIEWS WITH IE PRODUCERS AND CONSUMERS, INDIVIDUAL RESEARCHERS, INSTITUTIONAL REPRESENTATIVES, AND POLICYMAKERS – Thirty-three face-to-face or telephone interviews were conducted with a purposive sample of impact evaluators, impact evaluation funders, policymakers, and individuals specializing in cost-benefit and cost-effectiveness methodologies to learn about the range of institutional barriers to, and incentives for, integrating Value for Money analysis into impact evaluations.

SURVEY OF POLICYMAKERS – In partnership with the World Bank's Strategic Impact Evaluation Fund (SIEF), we designed a survey that was emailed to 3,623 policy-minded individuals on a contact list maintained by SIEF. These individuals were asked to respond to five questions related to Value for Money to understand their willingness to pay for Impact, Cost, and VFM studies. The survey had 497 respondents who answered at least one question; the analytic sample is composed of 407 individuals who answered at least one of the "willingness to pay" questions. The survey asked respondents about their role within their organizations and then mapped those roles onto three functions within the policy-making process: Advisors (researchers, academics, evaluators, and consultants), Decisionmakers (high-level executives within government or international or local NGOs), and those who Execute those decisions (government functionaries, project managers), plus a residual category for "Other" (e.g., teachers, students, or librarians).

Of the analytic sample, two-thirds were male. Respondents were well educated: nearly one-quarter had a Ph.D., almost two-thirds had a master's degree, 1 in 10 had a bachelor's degree, and less than 1 percent had anything below a bachelor's degree. They also tended to be well established in their careers: 60 percent were between 35 and 55 years old, 17 percent were older than that, and 23 percent were between 25 and 34. The income levels and regions of the countries they professionally work on are shown in figure 2.1, as are the types of institutions where they work and their roles within those institutions.

Figure 2.1 Composition of Policymaker Sample
Panels: Economic Level of Country of Primary Professional Focus; Region of Country of Primary Professional Focus; Policymakers' Role; Institution Type Where Employed
Source: IEG, SIEF policymaker survey fielded in December 2015.
Note: LMIC: lower-middle-income country; UMIC: upper-middle-income country; LIC: low-income country; HIC: high-income country.
Specifics on World Bank country classifications can be found at http://data.worldbank.org/about/country-and-lending-groups.

Country income levels are rather evenly represented, with more than 4 of 5 respondents working in or on developing countries. Respondents are most likely to work in a government post or in academia. Half of the respondent population identifies as having an advisory role, and more than 40 percent report a policy role in making or executing decisions. Overall, the respondents provide a well-rounded sample of the many stages and places of policymaking.

The 14 percent response rate denotes a self-selected group of those interested in responding to a request for a "short 7-10 minute survey to help the World Bank learn how people like you think about and use the evaluations that SIEF and other groups at the World Bank produce." The SIEF list itself is a self-selected group of individuals involved in the policy-making process who have identified themselves as having an interest in impact evaluations.

JOURNAL SUBMISSION REQUIREMENTS AND SURVEY OF JOURNAL EDITORS – We collected the stated submission requirements from the websites of seven journals—The American Economic Journal: Applied Economics; Economic Development and Cultural Change; The Journal of Development Economics; The Journal of Development Effectiveness; World Development; The Quarterly Journal of Economics; and The World Bank Economic Review—and responses to an email survey of each journal's editorial staff. The stated submission requirements describe each journal's expected standards of rigor, topics of interest to its readership, and whether the journal accepts empirical or theoretical subjects. None revealed any preference for a specific methodological approach, including approaches such as CBA, CEA, or other VFM methods. Given the absence of formal guidance on VFM methods, we developed a five-question survey to explore journal editors' perspectives and opinions and the journals' de facto policies and practices:

1. Do the journal's stated editorial policies or submission requirements discuss cost-effectiveness, cost-benefit, and Value for Money analysis?
2. Are there formal or informal editorial practices governing the inclusion of cost-effectiveness, cost-benefit, and Value for Money analysis in published impact evaluations? If so, what are they?
3. Do submissions with cost-effectiveness, cost-benefit, and Value for Money analysis receive special consideration?
4. Why is VFM so infrequently incorporated into impact evaluations?
5. What opinions do journal editors have regarding whether published impact evaluations should or should not include VFM analyses?

The survey was sent to 44 editors by email in December 2015. Sixteen journal editors responded to the survey, a 36 percent response rate representing six of the seven journals contacted.

Methods

The team employed a mix of qualitative and quantitative methods to answer the research questions. We applied:

• Qualitative methods to separately analyze responses from the 33 semi-structured and unstructured interviews; the 16 responses to the journal editor survey; and the set of 30 World Bank impact evaluations that contained a Value for Money analysis.
• Statistical methods to separately analyze the sample of 236 impact evaluations drawn from 3ie's repository; the 30 World Bank impact evaluations that contained a Value for Money analysis; and the 497 responses to the policymaker survey.
ASSESSMENT OF WORLD BANK IMPACT EVALUATIONS CONTAINING VFM ANALYSES

We developed a framework to analyze the frequency, type, and transparency of VFM analyses found in the sample of 168 IEs produced by the World Bank. Information from the subset of 30 IEs found to contain any kind of VFM analysis was tabulated in a Microsoft Excel spreadsheet for analysis by Sector Board, business line, method, and transparency of reporting. The analysis generated three outcome categories corresponding to the type of VFM analysis observed in the studies—CBA, CEA, or general efficiency discussion (some VFM content, but no indicators meeting the CBA or CEA criteria)—plus a residual "no VFM" category, because a small number of tagged studies were found to contain no VFM analysis.

An impact evaluation was classified as having a cost-benefit analysis (CBA) if it included a comparison of estimates of Costs and Benefits (with underlying cost and benefit data) and one or several of the following indicators: benefit-to-cost ratio, cost-to-benefit ratio, economic rate of return/internal rate of return, financial internal rate of return, present value, or net present value. The studies classified as CBA vary in the extent of their reporting, from quick, one-paragraph "back of the envelope" calculations to more elaborate exercises. For example, we classified as a CBA the analysis in "Evaluating Preschool Programs when Length of Exposure to the Program Varies: A Nonparametric Approach," implemented in Bolivia. This IE includes benefit-cost ratios in the range of 2.28 to 3.66, resulting from sensitivity analysis of the program's benefits under different assumptions. We classified "Does Management Matter? Evidence from India" as a "back of the envelope" CBA. Firms included in the study did not want to report internal accounts information, so the evaluation's authors reported a 130 percent rate of return in one year based on an analysis of the program's main cost (the cost of the consultancy firm's services) and an estimate of other costs. In a third example, we classified as CBA "Contracting-Out Dialysis in Romania: What Was the Impact?," which compares the total program Cost with the total savings of the benefitting public entity.

An impact evaluation was classified as having a cost-effectiveness analysis when it compared two or more alternative programs measuring the same outcome. Cost comparisons to programs with different outcomes were excluded from CEA classification. We classified as CEA two Cambodian experiments that presented the cost per unit increase in student promotion, reduction in dropout, and gains in literacy and numeracy achievement, compared with the same impacts in a control group. We classified less direct comparisons as CEA if a comparison was present. For example, one CEA compared the marginal cost per child of an intervention that provided information to parents in Pakistan to the cost of programs with the same outcome in low-income countries.

The general efficiency analysis (GEA) designation was applied when the IE included an efficiency discussion that did not qualify as either CBA or CEA or as another VFM method.[5] We classified IEs that reported average intervention costs, cost per beneficiary, the share of administrative cost, and other non-comparative discussions of Costs and Benefits as GEA.

[5] Cost-utility analysis, social return on investment, basic efficiency analysis, multi-criteria appraisal, rank correlation of cost versus impact, cost minimization analysis.
For example, we classified as GEA "Seaweed Farming in Indonesia" based, in part, on the following statement: "SEAplant's cost has been high relative to benefits derived to-date. IFC committed roughly $1.9 million to the SEAplant program through 30 June 2006, and this investment has not yet yielded significant tangible returns in terms of either increased earnings for farmers or the introduction of value-added processing."

We documented the cost indicators and ingredients by type (e.g., opportunity costs, discount rates, timeframes, inflation, and exchange rates between local currency and the currency used in the analysis) for each IE reviewed in detail. Transparency was evaluated along six dimensions: 1) methods clarity (i.e., statement of a specific, identifiable method); 2) cost reporting (i.e., listing of ingredients and their value); 3) analysis (i.e., level of aggregation, unit cost per impact, unit cost per input, treatment of opportunity cost); 4) Cost adjustment reporting (i.e., reporting discount rates where appropriate, inflation adjustments, and currency exchange rates); 5) rationale (i.e., explicit rationale for a VFM analysis); and 6) study limitations (i.e., reporting of partial analyses, such as the exclusion of costs related to some benefits derived from the program, the use of approximate costs, data quality issues, missing data, or focusing on only a short period for accounting benefits).

FREQUENCY OF VFM IN GLOBAL IMPACT EVALUATIONS

To estimate the frequency of VFM analysis in global impact evaluations, we first developed a set of keywords to represent each of the VFM methods based on an extensive review of the VFM literature. Keyword sets were developed to represent cost-effectiveness analysis, cost-benefit analysis, cost-utility analysis, financial analysis, social return on investment, basic efficiency resource analysis, multi-criteria appraisal, and rank correlation of cost versus impact. A text analytics algorithm was run over the full text of each of the 236 impact evaluation studies to generate tables of keyword frequencies.

This method also was applied to a "benchmark sample" of 65 studies drawn from a database of 168 World Bank Group impact evaluation studies published between 2002 and 2011 (the World Bank Group impact evaluation database). The benchmark sample was formed by combining IEs from two groups of World Bank impact evaluations. The first group contains 35 IEs identified as including some kind of (unverified) efficiency analysis. The second group was pulled randomly from the remaining (untagged) 131 studies contained in the near census of World Bank impact evaluations described above. The outcome variable for the untagged impact evaluations is "No VFM."

Since IEs in the first and second groups do not have the same probability of being selected into the benchmark sample, we applied a sampling weight. The probability of selection into the benchmark sample is 1 for the 35 tagged studies; for the 30 untagged studies it is 30/131 = 0.229. Hence the sampling weight, calculated by taking the inverse of the selection probability, is different for tagged studies (weight = 1.057) and untagged studies (weight = 4.37).
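As a minimal sketch of these two steps (keyword counting over each study's full text, and inverse-probability weighting of the benchmark sample), the following Python fragment uses illustrative keyword lists and variable names; the study's actual keyword sets are not reproduced here. Note that a raw inverse of the stated selection probabilities gives a tagged-study weight of exactly 1.0, so the reported 1.057 presumably reflects an additional normalization not spelled out in the text.

```python
import re

# Illustrative keyword groups (placeholders, not the study's actual lists).
KEYWORD_GROUPS = {
    "cba": ["cost-benefit", "benefit-cost", "rate of return", "net present value"],
    "cea": ["cost-effectiveness", "cost effective", "cost per unit"],
    "other_vfm": ["cost-utility", "social return on investment", "efficiency analysis"],
}

def keyword_frequencies(full_text: str) -> dict:
    """Count occurrences of each keyword group in one study's full text."""
    text = full_text.lower()
    return {
        group: sum(len(re.findall(re.escape(kw), text)) for kw in keywords)
        for group, keywords in KEYWORD_GROUPS.items()
    }

# Inverse-probability sampling weights for the benchmark sample:
# tagged studies enter with certainty; untagged studies were subsampled.
p_tagged = 1.0               # all 35 tagged studies selected
p_untagged = 30 / 131        # 30 of 131 untagged studies selected (~0.229)
w_tagged = 1 / p_tagged      # = 1.0 before any normalization
w_untagged = 1 / p_untagged  # ~4.37, matching the reported weight

print(keyword_frequencies("A cost-benefit analysis implies a 12% rate of return."))
print(round(w_tagged, 3), round(w_untagged, 2))
```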
To estimate the (unknown) proportion of impact evaluations in the 3ie sample that contains any kind of efficiency analysis, we first ran an ordered logistic regression on the benchmark sample with the sampling weights applied, where the categorical outcome variable is defined as:

        { 3  if the study contains a formal CBA or CEA
  VFM = { 2  if the study contains only a general efficiency analysis
        { 1  if the study contains no VFM analysis

The explanatory variables originate from keyword frequencies indicative of the presence of (formal) VFM analysis. We grouped the keywords into three mutually exclusive categories: (i) return on investment/cost-benefit analysis, (ii) cost-effectiveness, and (iii) decision analysis of other kinds. Next, we constructed three explanatory variables[6] as the total frequency of the words mapped to each category. We predicted the outcome for the 3ie IEs using the estimated parameters from the benchmark regression and explanatory variables constructed in the same way for the 3ie studies. This requires the assumption that the relationship between the VFM outcomes and keyword frequencies in the World Bank's IE portfolio also holds in the 3ie sample. The probability of having any kind of VFM analysis in the 3ie sample is 1 − Pr(VFM = 1).

[6] Alternative constructions of explanatory variables include: (i) using total frequencies of all words/terms as the single regressor; (ii) grouping the words/terms into six groups, giving rise to six instead of three explanatory variables; and (iii) deriving regressors from principal component analysis. Regression and prediction results are qualitatively similar. The three-variable model has the advantage of being parsimonious while also distinguishing between formal VFM methods (e.g., CEA and CBA) and informal VFM discussions.

PRODUCER AND CONSUMER PERSPECTIVES ON VFM IN IE

The team analyzed responses from eight initial unstructured interviews to identify key topics related to VFM in IE, which informed the design of the semi-structured interview protocol. The semi-structured interview protocol covered: (i) the interviewee's perspective on the importance of VFM in IE, (ii) their institution's formal or informal rules governing the production and use of VFM in IEs, (iii) sources of demand for VFM in IE, and (iv) actions that the interviewee thought would increase the production or consumption of VFM in IEs.

All interview responses were recorded using notes taken during the interview and transcribed into electronic format for analysis. A single analyst reviewed and coded responses according to emergent themes in the data. The analysis explored the multiple reasons respondents gave in response to a question by enumerating unique responses and tabulating response frequencies. Because the statistical representativeness of the key informant sample is unknown, we do not engage in statistical analysis for this sample. Rather, this purposive sample was constructed to parsimoniously cover the range of roles of those engaged in the production chain of IEs. Ten of the selected interviewees worked at multilateral agencies, seven worked at international NGOs, six worked at bilateral agencies, four were from large foundations active in international development, and two apiece worked at a research institute and in the Executive Office of the President. Finally, IEG interviewed one individual currently working in an academic institution.
The sample included eight individuals whose primary role was to design and develop analytical tools, guidelines, and templates for economic efficiency analysis, such as cost-benefit, cost-effectiveness, impact modeling, and Costing studies. Ten interviewees were primarily involved in designing and carrying out impact evaluations at a research institute, a multilateral agency, or a nongovernmental organization. Ten interviewees primarily served in the capacity of a funder; their decision-making often included determining the basis for awarding grants to impact evaluation teams, monitoring progress on the evaluations, reviewing impact evaluation results, and disseminating findings. Five interviewees functioned primarily as policymakers. Two of these provided training, assistance, review, and support to agencies in fulfilling government-wide policies related to efficiency analysis and evidence standards. Two worked in large, highly influential research institutes or NGOs that help to define international standards of evidence for impact evaluations in international development.

Many of the individuals interviewed had a secondary role of influencing or setting their own institution's policies. For example, all the funders interviewed also consume and use the results of the impact evaluations and Cost analyses to inform grant-making decisions and institutional policies. Likewise, many impact evaluators use evaluation results to recommend policies and programs, or to identify areas for further and future research.

POLICYMAKERS' WILLINGNESS TO PAY FOR COST AND IMPACT INFORMATION

Our online survey of more than 400 individuals involved in the policymaking process in developing nations presented a hypothetical scenario in which an unspecified education program had generated test score improvements of 0–20 percent in other countries. Respondents were told that they had an opportunity to implement the program for up to 10,000 students in their country, but that they could also choose to do an impact evaluation to learn the program's true effects in their setting. Respondents were asked how many of the 10,000 students they would be willing to have not receive the program in order to be able to pay for the impact evaluation. A similar question was then asked about the number of beneficiaries they would forgo in order to do a Costing study. Out of concern for framing effects, the order of the first and second of these questions—on impacts and Costs alone—was randomized. Simple t-tests revealed that order did not have an effect on responses for willingness to pay for either Effectiveness or Cost studies.
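A minimal sketch of this order-effect check, and of the paired within-respondent comparisons reported later in Table 3.2, using simulated placeholder data (the variable names and values are illustrative, not the survey's actual responses):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder responses: students forgone (0-10,000) as willingness to pay
# for a Cost study, split by the randomized order of the first two questions.
wtp_cost_saw_impact_first = rng.integers(0, 10_001, size=200)
wtp_cost_saw_cost_first = rng.integers(0, 10_001, size=207)

# Order-effect check: two-sample t-test across the randomized order groups.
t, p = stats.ttest_ind(wtp_cost_saw_impact_first, wtp_cost_saw_cost_first,
                       equal_var=False)
print(f"order effect on WTP for a Cost study: t = {t:.2f}, p = {p:.3f}")

# Within-respondent comparisons (as in Table 3.2) pair each respondent's
# answers across question types and use a paired t-test.
wtp_ie = rng.integers(0, 10_001, size=353)
wtp_cost = rng.integers(0, 10_001, size=353)
t, p = stats.ttest_rel(wtp_ie, wtp_cost)
print(f"IE vs Cost (paired): t = {t:.2f}, p = {p:.3f}")
```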
Sixteen of 44 journal editors of the seven journals that publish the most impact evaluations in international development responded to the survey, a 36 percent response rate. Qualitative and descriptive analyses of the journal editors’ responses are provided in the findings section. 13 3. Findings Each of the three evaluation questions posed by the report is addressed by triangulating analysis obtained across data collection activities. QUESTION 1: HOW FREQUENTLY IS VALUE FOR MONEY ANALYSIS INCORPORATED INTO PUBLISHED IMPACT EVALUATIONS? We estimate that 18.9 percent of the World Bank’s impact evaluations include any kind of VFM analysis, while the predicted proportion in the 3ie data set is 14.1 percent. Our prediction indicates that the 3ie sample has a somewhat lower, yet not statistically different, proportion of VFM studies compared to the sample of World Bank IEs. The estimated percentage of IEs with any kind of VFM analysis in the 3ie sample has not changed much over time. The estimated percentage of IEs that include VFM was: • 16.8 percent of IEs published prior to 2004; • 11 percent of IEs published between 2005 and 2008; and • 14.5 percent of IEs published between 2009 and 2012. A majority of the World Bank’s IEs with efficiency analysis —around 80 percent— conducted a CBA or CEA, and the remaining conducted a general efficiency analysis. The quality of cost ingredients reporting was mixed in the sample of 30 World Bank IEs. About 13 percent of the CEAs and CBAs listed only the main cost components, whereas just over half listed several cost ingredients; and the remaining third simply reported a Cost estimate without detailing much of what went into it. About 40 percent of World Bank IEs classified as category II (those with any kind of VFM) reported the cost of a program in relation to the number of beneficiaries—a common metric used in VFM analyses. Fewer than 20 percent of CEA and CBA studies reported on unit cost per impact. These percentages are very low, considering that they are included in impact evaluations where at least one measure of impact is supposed to be very available to be used in these comparisons. We find that just 25 percent of the CEAs and CBAs reported opportunity costs. However, because policy makers craft decisions based on real budgets and financial costs, excluding opportunity costs may be theoretically imperfect but practically correct. The transparency of reporting is very low among the 30 World Bank IEs that included any kind of VFM analysis. Descriptions of the Costing data, methods and assumptions are often vague or incomplete. Just 13 percent reported unit costs of inputs, and only 17 percent reported unit costs per impact. Only 8 percent of IE projects conducted over a year reported discount rates and only 21 percent reported adjusting for inflation. There was also a lack of consistency in reporting discount rates, currency conversions, and inflation adjustments. For example, 58 percent of the analyses did not report the exchange rate used to convert from local to the currency of analysis. In addition, very few studies—around 13 percent—provided 14 an explicit rationale for including Cost Efficiency analysis and just under half explicitly reported the limitations of the Costing exercise. Our analysis suggests the results of available Cost Efficiency analyses from WB impact evaluations should be used carefully. Such analyses often quantify a portion of the program’s benefits, and frequently suffer from missing information and missing cost data. 
In sum, our analysis reveals the need to elevate reporting transparency and to develop reporting standards so that the available evidence can accumulate and be used to improve the use of resources in development cooperation. In addition, our analysis suggests the need to examine the root causes underlying the wide variation in cost-efficiency reporting across evaluators. It is difficult to know why VFM analysis is so infrequent in the absence of further inputs from IE producers and consumers.

QUESTION 2: WHAT ARE THE EXISTING INCENTIVES AND BARRIERS FACED BY PRODUCERS AND USERS, BOTH AS INDIVIDUALS AND INSTITUTIONS, FOR VFM INCORPORATION?

The descriptive analytic findings on the state of VFM in IE are symptoms of underlying problems faced in their production, dissemination, and use. We found three primary reasons for the underproduction of VFM in IE, organized around three principal actors: impact evaluators, policymakers, and institutions.

Finding 1: The expected payoff to producing efficiency analyses is dulled by the (inaccurate) perception that policymaker demand for VFM is low, when it is more accurately described as uncertain.

As shown in Table 3.1, our sample of IE producers most frequently cite a lack of policymaker demand as the reason for the little VFM observed in IEs. IE-producing interviewees supported this argument with three main points. First, interviewees questioned the cost data's relevance to a policymaker's individual contextual considerations. Second, many interviewees felt that the political calculus, rather than a project's investment case, dominates a policymaker's decision-making; this view of political economy dynamics undercuts incentives to generate efficiency analysis and often detrimentally influences the analysis itself. Third, IE and VFM producers felt uncertain about what policymakers need and want to know and when they need to know it; furthermore, producers felt unsure about how to go about gaining that insight.

Table 3.1 Why is VFM so Infrequently Incorporated into Impact Evaluations?

Main reasons                                                        IE         IE      CE/CBA         Policy-   Sum
                                                                    Evaluator  Funder  Methodologist  maker
Policymakers do not demand VFM analysis                                 6         3         3             3      15
There is little incentive for academic impact evaluators
  to produce VFM                                                        4         3         3             3      14
Impact evaluators lack consensus on how to apply CEA and CBA
  methods; there is no agreement on the "right" approach                6         4         –             –      10
Carrying out the cost data collection and analysis is costly
  to the projects                                                       3         5         –             –       8
Obtaining cost data can be challenging                                  2         4         1             –       7
Measuring the impact or benefit is the evaluator's
  first-order concern                                                   2         3         1             –       6
Large institutions implement VFM in IEs inconsistently
  across their programs                                                 3         2         –             –       5
Institutional conflicts and interests stand in the way of applying
  VFM to institutional decision-making                                  2         1         1             –       4
The skills of VFM and IE experts differ and there is not a lot of
  integration across projects or within institutions                    1         2         1             –       4
The importance of VFM will increase as more large-scale
  randomized evaluations are carried out                                2         –         –             –       2

Source: Analysis of unstructured and semi-structured interview responses to the question: "Thinking about the field, generally, why do you think VFM is so infrequently incorporated into impact evaluations?"

Even so, producers suggested that the political calculus differed greatly depending on the amount and the quality of the evidence provided.
Validating this point is the fact that impact evaluations have flourished at current levels of policymaker interest, and that quantitative analysis of the policymaker survey data[7] showed that policymakers' willingness to pay for causal information on benefits is not much higher than their willingness to pay for Cost information. Analysis of the 407 policymaker survey respondents in the study sample revealed that respondents are willing to pay about 10 percent more for an impact evaluation study than for a Cost study, but the observed difference is not statistically significant. Likewise, the average willingness to pay for VFM is higher than the willingness to pay to know a project's effects alone, but again this difference is not statistically significant. However, respondents indicated they are willing to pay a 13 percent premium for a study that presents VFM (Costs and Benefits) over a Cost study alone—a statistically significant difference at the p < 0.01 threshold. Together, these results imply that although decisionmakers are most interested in what works, there is also clear demand to know bang for buck.

[7] Based on a 14 percent response rate of nearly 4,000 individuals on the mailing list of the World Bank's Strategic Impact Evaluation Fund. These individuals self-selected for the mailing list based on their interest in impact evaluations—which underscores the parity found between willingness to pay for effectiveness information and Cost information. Because a sampling frame of those involved in the policy-making process does not exist, the external validity of these results, as with all findings in this evaluation, is indicative even if not conclusive.

Table 3.2 Willingness to Pay for Impact, Cost, and VFM Information

Paired t-tests:
WTP          Obs   Mean 1    Mean 2    Diff.    Std. Err.   t value   p value
IE - Cost    353   2,685.2   2,428.9    256.2     139.4       1.85     0.067
IE - VFM     344   2,674.8   2,755.4    -80.6     109.3      -0.75     0.461
Cost - VFM   345   2,434.0   2,757.2   -323.1     123.9      -2.60     0.009

This exercise abstracts from the true cost of conducting any of these three kinds of study and sets aside the fact that Cost analyses are generally an order of magnitude (or more) cheaper than an impact evaluation.[8] Those involved in the policy process indicate that a study that combines impact and Value for Money should be less costly than the sum of doing those two activities individually.

While we find no evidence of survey or framing effects on WTP for Cost or Effectiveness studies, we do find that respondents who were primed by the IE question are willing to pay more for a VFM study than those who randomly received the question on willingness to pay for Cost first. Those who saw Impact first are willing to pay 629 units (~26 percent) more for VFM than those who saw Cost first.[9] Respondents indicated that they valued the combination of Cost and benefit analysis more than they valued either Cost or benefits individually. These results hold for nearly every subgroup of functionary in the policy-making process in the survey—advisors, decisionmakers, implementers. These findings undermine the notion, often articulated in the interviews, that a lack of policymaker demand is responsible for the lack of VFM analysis.

Finding 2: Researchers perceive significant cost but little incentive to produce VFM in impact evaluations.

The second most frequent reason for so little VFM is that researchers and evaluators who produce IEs are offered little incentive to integrate Cost analysis. The costs of VFM include uncertainties that affect the time and effort required to carry out rigorous evaluation of policy and program Cost. For example, the lack of consensus on how to apply CEA and CBA methods, the lack of agreement on the "right" approach, the challenge of obtaining Cost data in the first place, and the effort required to analyze those data all disincentivize VFM production.
The costs of VFM include uncertainties that affect the time and effort required to carry out rigorous evaluation of policy and program Cost. For example, the lack of consensus on how to apply CEA and CBA methods; the lack of agreement on the "right" approach; and the challenges of obtaining Cost data in the first place and effort to analyze the data all disincentivize VFM production. 8 Our own experience and discussions with Cost analysis experts and IE producers put the price of a Cost study at $20,000- $40,000, while the price tag for an impact evaluation typically runs from $400,000-$2.5M. 9 Perhaps priming with the higher cost IE item first leads to a proclivity to spend more for combined products. It would be useful to repeat this experiment to understand with greater certainty whether the observed framing effect is merely a statistical artifact (e.g. a Type I error). 17 One of the driving incentives for IE producers—whether in academia or (to a lesser degree) in international organizations—is the opportunity to publish in professional journals. From a researcher’s perspective, there is little incentive to take time out of the preferred activity— measuring benefits—to collect high quality cost data and work through tricky cost estimates and assumptions to achieve a VFM analysis with the same level of rigor as the IE if it is not rewarded. Evidence from our journal editor survey indicates that demand for VFM from top journals is mixed at best. Perhaps most illustrative of the low esteem held for VFM by journals is the fact that while nearly all of the top economics and economic development journals that we investigated had statements on the standards of estimating benefits (often termed effects, outcomes, results, and so on), none addressed quality control criteria for Cost estimates, much less efficiency. Additionally, 13 of 14 journal editors surveyed reported neither formal nor informal editorial practices governing inclusion of (CEA, CBA, or other VFM analysis) in published impact evaluations. Twelve of the journal editors did not give special consideration to VFM analysis—an IE with VFM is generally no more likely to be published in the top academic journals than is an IE without VFM analysis. Journal editors mainly thought that Value for Money is so infrequently incorporated into impact evaluations because so few studies can obtain high quality cost data. “While such calculations are helpful in ‘ball parking’ the magnitude of an intervention, and whether it is ‘worth it,’ I suspect most researchers don't believe that the estimated numbers are of sufficient precision or reliability to require people to put dollar figures on interventions.” Several journal editors noted methodological issues and a lack of standardization could cause the analysis to be rejected: “Because careful valuation of Costs and benefits involves difficult decisions on certain parameters, such as shadow prices, discount rates and welfare weights, which may be arguable, and thus expose authors to additional rejection risks from referees.” Two editors said they will ask authors to remove VFM analyses. Two others noted that CBA, if done carefully, could stand on its own and deserved to—rather than be crowded in with impact estimates. Still, about half of the editors indicated that VFM should be included in IEs. 
Collectively, the journal editors provided three stipulations for including VFM analysis: each case would need to be considered on its own merits; the analysis would have to be done well and convincingly; and the journal's quality standards would have to be met. Editors indicated that VFM analysis with a high level of rigor could be reported independently, although not necessarily in a top journal—the currency by which many IE producers exercise influence and secure tenure. Indeed, although journal editors raised the notion that CBA, if done properly, can potentially stand on its own, there are few if any examples of this in those editors' own journals. Scrutiny for VFM quality in journals is de facto left to peer reviewers, who are almost certain to be inconsistent in the standards that they apply. That inconsistency adds another layer of uncertainty—and so risk—for a prospective author.

Even if coordination of methodological standards were resolved, challenges to external validity would remain. As pointed out by Evans and Popova (2016), program Costs differ greatly across contexts even for the same type of intervention. These Cost differences naturally result in large variations in efficiency estimates. Uncertainty in effect estimates, recall bias in expenditures, and scale can substantially influence cost-effectiveness estimates (Evans and Popova 2016). Even so, transparency in Cost reporting and assumptions can improve cross-context comparisons and usefully quantify the nature of the variation in implementation costs. Better still, multi-arm evaluations within the same context can largely evade such thorny issues.

Impact evaluators in general seem hungry for templates, checklists, and third parties to coordinate and guide their efforts. Such tools have the potential to effectively lower the effort required to build and justify new models and to assuage the coordination concerns of evaluators. However, there is a lack of harmonization between the institutions that commonly utilize such tools, leaving unresolved the coordination failure that prevents consistent and comparable methods. Even so, if developed with a core consensus, a revised toolkit has the potential to crystalize accepted standards. To address the concerns about generalizability and sensitivity to specifications, these tools will likely need to focus on a particular sector (or even intervention or outcome); allow users to explicitly model uncertainty and choose between reasonable methodological alternatives; and transparently adjust the parameter values of the model's assumptions to more closely align with the particulars of a specified target context and scale.[10] Until such models are available, systematic reviews and league tables such as those done at JPAL[11] and in the innovative work in the state of Washington, USA,[12] can provide policymakers with useful information on relative cost-effectiveness.

Beyond methodological considerations, data access and quality problems were also cited as significant barriers. Evaluators frequently cite difficulty in extracting reliable expense information—even from administrative data—to reflect the financial cost considerations most relevant to policymakers. Though hardly limited to the World Bank, low levels of baseline financial data collection at the World Bank impede the ability to conduct CBA at either the start or the end of projects (IEG 2010). Economic costs are even more challenging. Data on opportunity costs are rare, notwithstanding their centrality to economic analysis.
Even more challenging than these considerations of basic cost data are the important issues of apportioning cost (and often benefit) components when an evaluated project is part of a larger multi-arm effort with multiple outcomes and benefits.

These data and methodological challenges reflect an apparent lack of up-front planning to conduct VFM analysis. While the measurement of benefits is now approached in impact evaluations with careful planning well before implementation begins, it is unusual for cost analysis to receive the same forethought. Impact evaluators frequently face both a lack of cost data and a lack of standards for performing analyses. Moreover, some economists—members of the discipline that produces most of the IE work in international development—report that cost analysis does not "feel like" microeconomics; this perception, together with a lack of training in cost analysis methods and the challenge of getting the analysis right even when the data are available, makes the cost of doing the analysis greater than the benefit from the researcher's perspective (Evans 2016). Together, these factors expose researchers to the risk of costly critique from peer reviewers, convincing many to avoid the exercise altogether.

When it is performed, VFM analysis often seems to come as an afterthought, with cost data inquiries sometimes made well after a project has been closed. This is reflected in the general lack of detail on VFM analysis in impact evaluations, including those done at the World Bank and on Bank projects, where cost data should be easier to procure and cost analysis should be more valued than in purely academic exercises. For example, of the fewer than 19 percent of World Bank impact evaluations that include any type of Value for Money analysis, only 13 percent—or less than 2.5 percent of all World Bank impact evaluations—reported unit costs of inputs. When cost data are not explicitly available, they are estimated, sometimes by borrowing from other studies, but more frequently without meaningful detail on how the costs were estimated. Such ex post data collection is prone to recall bias and to underestimating expenditures (Evans and Popova 2016). About two-fifths of the World Bank's impact evaluations do not indicate which cost elements are included in the analysis. Apart from the very real challenges of using accurate cost data, it is extremely rare for World Bank IE-VFM studies to indicate the parameters used for basic assumptions in cost analysis: discount rates, exchange rates, or time horizons, for example. In general, there is a clear disconnect between the careful planning of the estimation of benefits through impact evaluations and the suboptimal quality of the reporting of cost information. This may explain the hesitancy of journal editors to include such analysis (or of institutional funders of IEs to require it) as a matter of course. The combination of these weak incentives and relatively high costs likely accounts for much of the reason why the Value for Money of development interventions is so infrequently calculated using impact evaluation estimates.
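To see why those unreported parameters matter, consider a minimal sketch, again with invented figures rather than data from any World Bank project, of how the choice of discount rate alone changes the present value of an identical cost stream:

```python
def present_value(annual_costs, discount_rate):
    """Discount a stream of annual costs (year 0 = today) to present value.
    The discount rate and the time horizon (the length of the stream) are
    exactly the assumptions this paper finds are rarely reported."""
    return sum(c / (1 + discount_rate) ** t for t, c in enumerate(annual_costs))

# Hypothetical program costing $100,000 per year over a 10-year horizon.
stream = [100_000] * 10
for rate in (0.03, 0.05, 0.10):  # three defensible discount rates
    print(f"r = {rate:.0%}: PV of costs = ${present_value(stream, rate):,.0f}")
# Prints roughly $878,611, $810,782, and $675,902: the same cost stream is
# worth about 23 percent less at r = 10% than at r = 3%, so an unreported
# rate can silently move a VFM verdict.
```

Transparent reporting of such parameters does not resolve disagreement about their "right" values, but it at least lets a reader rerun the calculation under alternative assumptions.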
Finding 3: At the institutional level, political considerations and significant heterogeneity in VFM approaches and methods constrain greater inclusion of VFM in impact evaluations.

Individual evaluators and institutional representatives largely agreed that most of the demand for VFM is internal to institutions that both produce and use impact evaluations. However, even for those institutions, there are real challenges to embedding Value for Money analysis within impact evaluations. Political interests and inconsistency in the application of guidelines and methods are clear friction points.

Veteran CEA and CBA methodologists pointed to institutional interests that work against the strict application of cost-effectiveness thresholds to internal decision-making about programs. While the World Bank has largely diluted the role of Value for Money considerations in its funding decisions (IEG 2010), institutions with clear guidance on integrating Value for Money as a key aspect of their decision process—such as the Millennium Challenge Corporation (MCC)—face difficult internal discussions on whether to fund (or continue funding) a project when an economic rate of return does not meet the institution's threshold. All interviewed institutions with a mandate to do Value for Money assessments reported that VFM analysts often face pressure to find positive results, or risk being ignored in decision-making when they do not. Politicized decision-making tends to yield politicized CBA. Many institutions struggle with how to publicize ex post VFM results based on impact evaluations that do not yield the hoped-for results.

Variation between and within institutions in how Value for Money methods are applied contributes to the coordination failure undermining the production of VFM in impact evaluations. The challenges of integrating VFM and IE methodologies typically arise in two forms—applying guidelines consistently across groups within an organization and fostering active lines of communication between IE and VFM specialists—across three classes of producer-user institutions: large development institutions (such as multilateral development banks); bilateral and executive agencies (such as the U.K. Department for International Development [DFID], the U.S. Agency for International Development [USAID], and MCC); and nongovernmental organization (NGO) impact evaluator/funders (such as the Gates or Hewlett Foundations, JPAL, or 3ie).

Larger development institutions tend to struggle more with the challenges of consistent guidelines and active communication between VFM and IE specialists. Although the large development institutions (especially the World Bank and the Inter-American Development Bank) have codified guidance on the inclusion of efficiency analysis throughout various stages of reporting,13 they often fail to generate consistent guidance about how to integrate VFM in impact evaluations. As a result, VFM can become prioritized by some groups and deemphasized by others within the same institution. For example, even though World Bank leadership has called for increased use of CBA in its impact evaluations, there is considerable variation in the take-up of that directive among the several IE hubs at the World Bank. One World Bank IE hub has implemented an explicit requirement that all impact evaluations funded through it also implement a CBA.
To aid in this, that hub has collaborated with other highly regarded organizations to generate a new note to assist in capturing cost data for rigorous cost analyses. In contrast, another impact evaluation hub at the World Bank has cited a lack of budget and staff expertise in its decision not to incorporate VFM analysis in its impact evaluation work, despite being endowed by a large trust fund from a donor keenly interested in VFM. In general, larger institutions tend to struggle with facilitating collaboration between impact evaluators and those who develop VFM analyses; indeed, these groups are often located in very different departments and rarely communicate.

13 For example, the World Bank's Project Appraisal Reports (OPSPQ 2013a), Implementation Completion Reports (elaborated when a project is closed), and Project Evaluation reports (OPCS 2006) all give guidance on efficiency analysis. Moreover, IEG's guidelines for post-completion assessments, Project Performance Assessment Reports (PPARs), also include guidance for evaluating the use of efficiency analysis in the project (IEG 2013). Finally, the World Bank also has an Economic Analysis Guidance Note (OPSPQ 2013b) that provides more specific guidelines on this issue.

The bilateral and executive agencies that we interviewed were selected for having an institutional track record of doing both VFM analyses and impact evaluations. Although each has institutionalized VFM within its projects and programs, or is subject to regulations that demand VFM, general VFM policies do not necessarily translate into more VFM analysis in impact evaluations in practice. Each of the government agencies that has such an institutional requirement governs evaluation at "arm's length" to retain independence. As a result, agencies have decentralized the decision about whether to carry out an impact evaluation on any given program, resulting in uneven production of VFM based on IE results within any given institution. The MCC tends to do better than most in overcoming the VFM-IE integration challenge. Roughly half of MCC's projects are impact evaluated. Although these evaluations are always done by third-party contractors, MCC economists are engaged in the design and monitoring of both the intervention and the impact evaluation, and they are responsible for developing an economic rate of return (ERR) model before a project is approved and then revisiting that model using the effectiveness results and cost data gathered from the impact evaluation.

VFM appears more frequently in the work of NGO impact evaluation producers. These institutions are perhaps the most progressive with respect to integrating and institutionalizing VFM into impact evaluations. Representatives reported formal, institutional practices and guidance governing the inclusion of VFM. Even so, not every impact evaluation included a VFM analysis, either because the guidance applies only to certain pools of funds or because it was difficult to enforce under all the scenarios in which grants were made.

QUESTION 3: WHAT ARE SOME OPTIONS TO OVERCOME THE CHALLENGES FOR THE INTEGRATION OF VALUE FOR MONEY ANALYSIS INTO IMPACT EVALUATIONS?

Respondents' proposals for increasing VFM in impact evaluations ran along three main channels, directed at the entire ecosystem of VFM production and consumption.
1. The most prevalent response to the question of how to increase the production of Value for Money analysis in impact evaluations was the need to develop closer ties to policymakers in order to understand their demands for information. Respondents proposed more efforts to increase demand for VFM from policymakers through an improved understanding of their political pressures, their needs for research and information, and the general space in which they make decisions. Such outreach could take place through workshops on how VFM can inform evidence-based policy decisions, or it could use other strategies described in Dhaliwal and Tulloch (2012).

2. Lowering VFM research costs was another top priority. Respondents proposed (i) investing in and promoting standardized methods (including through information and training on existing standards); (ii) refining methods that resolve challenges often faced by impact evaluators; (iii) organizing effective peer review by donors; (iv) organizing existing research (through systematic reviews or league tables); (v) promoting "operationally relevant" VFM analysis in IEs performed in more policy-oriented settings such as the World Bank, USAID, and DFID, among others; and (vi) promoting the creation of interactive efficiency models and tools as research around a particular sector becomes more plentiful, which can also go some distance toward assuaging external validity concerns by allowing assumption parameters to be adjusted to fit a targeted intervention context.

3. Communicating findings and facilitating discussion and agreement on methodological issues was a final priority. IE-oriented NGOs and communities of practice within and across IE funders can play a significant role in expanding the "market of ideas" around VFM-IE issues. Academic journals can also contribute by publishing clear guidelines on accepted standards for Value for Money analysis, to which their peer reviewers would be expected to hold fast, and publication outlets can signal the improved likelihood of acceptance that VFM analysis brings to an impact evaluation.

These proposals highlight and attempt to address the challenges of integrating Value for Money analysis into impact evaluations. Other solutions likely exist. The goal of this paper is to motivate further dialogue among policymakers, impact evaluators, funders, and journal editors to overcome the existing structural barriers and weak incentives and to produce more, and higher-quality, work on the important topic of efficiency. If achieved, the integration of VFM and IEs can make significant contributions to guiding local and global policy decisions on selectively investing in international development interventions, leading to more rapid reductions in poverty, improved economic growth, and greater human welfare. In short, development practitioners will be able to do more, faster, with existing development budgets.

4. Discussion and Conclusion

This paper explored the challenges of, and potential solutions to, integrating Value for Money analysis into impact evaluations. It found that current levels of integration are low: we estimate that fewer than one in five impact evaluations includes any type of Value for Money assessment. Several formidable challenges account for this low level of production.
Impact evaluators believe that demand from policymakers is low; evidence presented here, however, reveals no statistically significant difference between policymakers' willingness to pay for IE evidence and their willingness to pay for cost evidence—despite the fact that IEs are more than 10 times more expensive to produce. This implies that cost analyses and VFM exercises are relative bargains in the eyes of policymakers. Yet IE funders' demand for VFM in IE is mixed and is often driven by the preferences of their donors, whose reporting requirements for VFM appear to be inconsistent across funding windows and recipients. Academic researchers have little incentive to produce VFM in IE, as neither publication outlets nor other professional considerations give significant additional weight to including VFM; on the contrary, academics face non-negligible disincentives. Finally, VFM analysis receives little forethought at the survey design phase, and data collection on costs is expensive and less precise when done after a project closes.

Perhaps most important, there is a lack of cohesive guidance and tools for applying VFM methods, and a lack of acceptance of those methods among impact evaluators, whose skill sets are generally thin on VFM methods to start with. Templates can help, but specialized expertise is needed to appropriately adapt such templates to the evaluation research design. Even when an institution does have formal guidelines that cover efficiency analysis in the formulation of IEs, the application of those guidelines is often uneven, and decisions on whether impact evaluations receive a Value for Money analysis are often uncoordinated at the project level. VFM analysis can increase the value of an IE as a public good. And as with most public goods, provision is suboptimal when decisions are made individually and in isolation, and when there is no recourse for internalizing externalities.

In addition to low levels of inclusion, VFM analyses in IEs often lack transparency and comparability. Opaque reporting of methods, data sources, prices, and discount rates makes it difficult for consumers to establish comparability and accurately interpret the results of CEA and CBA performed in different settings and countries.

Several reviewers of this paper noted that the World Bank, under increasing pressure to deliver to the last mile and to improve results while reducing costs in its applied policy work, has a unique opportunity to advance VFM in IE practice using the following instruments:

• Training workshops for staff and country representatives, sustained through continuing guidance and technical assistance;14
• Cost platforms or templates, together with training on those platforms as needed, to assist users in the appropriate application of VFM methods;
• Greater dissemination and promotion of studies that use CEA and CBA methods, so that evaluators and authorities become accustomed to their methods, findings, and implications;
• Codification of (or even more pointed guidance on) a common set of methods that is sufficiently flexible to allow for reasonable alternatives, specified with a discussion of the basis for deviations from the "standard" and the probable consequences.
A standard set of defaults could also be used where the user is not sure of the best assumptions;
• Bolstered vigilance on the quality of efficiency calculations in Project Appraisal Documents and Implementation Completion Reports;
• Expectations of sensitivity analysis on deviations from standard methods and assumptions; and
• Consensus on standard outcome measures and specifications for their construction. These may be derived from the Demographic and Health Surveys (DHS) and the Living Standards Measurement Survey (LSMS), for example.

Relatedly, the field needs policy-relevant standard numerators for CEA in different sectors, as has been done with the disability-adjusted life year (DALY) in health or months of additional schooling in education.

If some evaluators opine that demand for VFM from policymakers does not appear terribly strong, it is useful to recall that impact evaluations were not initially in high demand by policymakers either. Instead, the demand for impact evaluation largely came from development funding agencies and actors interested in basic research. The former have become increasingly interested in VFM in recent times, in part as a result of domestic resource constraints leading to increased scrutiny of the value achieved from the aid budget. As for the academic world, the demand for VFM, as reflected in its coverage in academic journals, is still decidedly weaker than the demand for impact evaluations. There is reason to believe that this could change. One of the appeals of VFM is the ability to compare multiple intervention options. As more impact evaluations are generated around a theme, efficacy comparisons become more feasible—through systematic reviews and other vehicles. Subsequently, the academic conversation around impact evaluations will likely turn to two frontiers: generalizability and efficiency.

14 For example, a training program sponsored by the Center for Benefit-Cost Studies of Education, funded by the Institute of Education Sciences in the U.S. Department of Education, includes five days of intensive, hands-on training in cost analysis, with technical training in shadow pricing and sensitivity analysis, all based around an open-source platform for performing cost analysis called CostOut (Teachers College, Columbia University 2016).

In the near term, the generalizability question can be partially resolved by examining the similarities in contextual factors between the intervention evaluated by an IE and the specific context in which a replication of that intervention is being considered. However, this hinges on the transparent reporting of relevant contextual factors. As Waddington et al. (2012) point out, the context specificity of a single study is a strength in generating locally relevant insights and a weakness when looking to draw more generalizable conclusions. In aggregate, there is a generalizable robustness for interventions that have consistently demonstrated a meaningful effect across multiple contexts. Systematic reviews can help: for a given topic, they accumulate all available evidence that passes risk-of-bias assessments and present the current state of knowledge on the efficacy of an intervention or interventions, and the number of systematic reviews is increasing.15 On the efficiency frontier, the question of Value for Money is ripe for examination, especially because clear identification of effects is a necessary condition for ascribing efficiency.
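As a stylized illustration of the kind of comparison that becomes possible once enough evaluations accumulate, a league table simply ranks interventions by cost per unit of a standard outcome measure. The sketch below uses entirely hypothetical interventions and figures; real league tables, such as JPAL's, require careful harmonization of cost ingredients, exchange rates, and discounting before such ratios are comparable.

```python
# Hypothetical league-table entries: (intervention, cost per pupil in USD,
# effect in standard deviations of test scores). All figures are invented.
studies = [
    ("Intervention A", 4.50, 0.08),
    ("Intervention B", 50.00, 0.15),
    ("Intervention C", 12.00, 0.02),
]

# Rank by cost per standard deviation of learning gained (lower is better).
for name, cost, effect in sorted(studies, key=lambda s: s[1] / s[2]):
    print(f"{name}: ${cost / effect:,.0f} per SD gained")
```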
In his 2001 treatment of a very similar question—the lack of cost-effectiveness analysis in education—cost-effectiveness veteran Hank Levin drew on 30 years of experience to offer three possible explanations: a lack of training, a lack of effects, and a lack of demand by policymakers (Levin 2001). Since that time, the proliferation of impact evaluations and systematic reviews has made large strides on the issue of a lack of effects. And as indicated by the research in the present study, policymakers appear to be nearly as concerned about costs as they are about effects. The component of production with the greatest gap is the capacity of evaluators to do VFM work. If this gap can be closed through improved training, Value for Money may see expanded implementation and integration in impact evaluations.

The challenges and barriers to greater production and use of VFM analyses in impact evaluations chronicled in this paper can be overcome by adopting the approaches outlined here: giving greater voice to policymakers' needs, increasing development agencies' calls for and use of VFM, increasing incentives for impact evaluators by (inter alia) lowering expected costs, and reducing frictions between the supply of and demand for policy-relevant Value for Money analysis in impact evaluations.

15 For an excellent primer on how to do a systematic review, see Waddington et al. (2012). To find systematic reviews, see the Campbell Collaboration (http://www.campbellcollaboration.org/international_development/index.php), the Cochrane Collaboration (http://www.cochrane.org/), 3ie (http://www.3ieimpact.org/en/evidence/systematic-reviews/), and IEG (http://ieg.worldbank.org).

5. References

Acumen Fund. 2007. The Best Available Charitable Option. New York: Acumen Fund.

Andrabi, Tahir, Jishnu Das, and Asim Ijaz Khwaja. 2014. "Report Cards: The Impact of Providing School and Child Test Scores on Educational Markets." Mimeo.

Bjorkman, Martina, and Jakob Svensson. 2009. "Power to the People: Evidence from a Randomized Field Experiment on Community-Based Monitoring in Uganda." Quarterly Journal of Economics 124 (2): 735–769.

Boardman, Anthony E., Wendy Mallery, and Aidan Vining. 1994. "Learning from Ex Ante/Ex Post Cost-Benefit Comparisons: The Coquihalla Highway Example." Socio-Economic Planning Sciences 28 (2): 69–84.

Cameron, Drew B., Anjini Mishra, and Annette N. Brown. 2015. "The Growth of Impact Evaluation for International Development: How Much Have We Learned?" Journal of Development Effectiveness, April: 1–21. doi:10.1080/19439342.2015.1034156.

Cerdan-Infantes, Pedro, and Christel Vermeersch. 2013. "More Time Is Better: An Evaluation of the Full-Time School Program in Uruguay." The World Bank's Gender Impact Evaluation Database, Washington, DC. http://documents.worldbank.org/curated/en/2013/08/18329366/more-time-better-evaluation-full-time-school-program-uruguay.

DFID (U.K. Department for International Development). 2011. DFID's Approach to Value for Money (VfM). London: DFID.

Dhaliwal, I., E. Duflo, R. Glennerster, and C. Tulloch. 2012. "Comparative Cost-Effectiveness Analysis to Inform Policy in Developing Countries: A General Framework with Applications for Education." In Paul Glewwe, ed., Education Policy in Developing Countries. Chicago: University of Chicago Press. Also available at http://www.povertyactionlab.org/publication/cost-effectiveness.

Dhaliwal, I., and C. Tulloch. 2012.
"From Research to Policy: Using Evidence from Impact Evaluations to Inform Development Policy." Journal of Development Effectiveness 4 (4): 515–536. doi:10.1080/19439342.2012.716857. Also available at http://www.povertyactionlab.org/publication/research-policy.

Evans, David. 2016. "Why Don't Economists Do Cost Analysis in Their Impact Evaluations?" Development Impact (blog). https://blogs.worldbank.org/impactevaluations/why-don-t-economists-do-cost-analysis-their-impact-evaluations.

Evans, David K., and Anna Popova. 2016. "Cost-Effectiveness Analysis in Development: Accounting for Local Costs and Noisy Impacts." World Development 77: 262–276.

Fleming, Farida. 2013. "Evaluation Methods for Assessing Value for Money." Better Evaluation Working Group Paper, Australasian Evaluation Society, Perth, Australia.

Gaarder, Marie M., and Bertha Briceño. 2010. "Institutionalisation of Government Evaluation: Balancing Trade-Offs." Working Paper 8, International Initiative for Impact Evaluation, New Delhi.

Hewlett Foundation. 2008. Making Every Dollar Count: How Expected Return Can Transform Philanthropy. Menlo Park, Calif.: William and Flora Hewlett Foundation.

IEG (Independent Evaluation Group). 2010. Cost-Benefit Analysis in World Bank Projects. Washington, DC: World Bank. http://ieg.worldbank.org/Data/reports/cba_full_report1.pdf.

IEG (Independent Evaluation Group). 2012. World Bank Group Impact Evaluations: Relevance and Effectiveness. Washington, DC: World Bank Group. http://ieg.worldbank.org/Data/reports/impact_eval_report.pdf.

IEG (Independent Evaluation Group). 2013. Guidelines for Reviewing World Bank Implementation Completion and Results Reports: A Manual for Evaluators. Washington, DC: World Bank.

Jamison, Dean T., et al., eds. 2006. Disease Control Priorities in Developing Countries. 2nd ed. New York: Oxford University Press.

Kremer, Michael, Conner Brannen, and Rachel Glennerster. 2013. "The Challenge of Education and Learning in the Developing World." Science 340 (6130): 297–300.

Levin, Henry M. 2001. "Waiting for Godot: Cost-Effectiveness Analysis in Education." New Directions for Evaluation 90.

Levin, Henry M., Patrick J. McEwan, Clive Belfield, Brooks A. Bowden, and Robert Shand. 2018. Economic Evaluation in Education: Cost-Effectiveness and Benefit-Cost Analysis. 3rd ed. SAGE Publications.

McEwan, Patrick J. 2012. "Cost-Effectiveness Analysis of Education and Health Interventions in Developing Countries." Journal of Development Effectiveness 4 (2): 189–213.

Muralidharan, Karthik, and Venkatesh Sundararaman. 2011. "Teacher Performance Pay: Experimental Evidence from India." Journal of Political Economy 119 (1): 39–77.

New Economics Foundation. 2004. Social Return on Investment: Valuing What Matters. London: New Economics Foundation.

Olken, Benjamin A. 2007. "Monitoring Corruption: Evidence from a Field Experiment in Indonesia." Journal of Political Economy 115 (2): 200–249.

OPCS (Operations Policy and Country Services). 2006. Implementation Completion and Results Report Guidelines. Washington, DC: World Bank.

OPSPQ (Operations Policy and Quality Department). 2013a. Investment Project Financing: Preparing the Project Appraisal Document (PAD). Washington, DC: World Bank.

OPSPQ (Operations Policy and Quality Department). 2013b. Investment Project Financing: Economic Analysis Guidance Note. Washington, DC: World Bank.

Sabet, Shayda Mae, and Annette N. Brown. 2018. "Is Impact Evaluation Still on the Rise?
The New Trends in 2010–2015." Journal of Development Effectiveness 10 (3): 291–304. https://doi.org/10.1080/19439342.2018.1483414.

Savedoff, W. 2013. "Impact Evaluation: Where Have We Been? Where Are We Going?" Presentation at the CGD-3ie conference, Center for Global Development, Washington, DC, July 17. http://www.cgdev.org/sites/default/files/Savedoff.pdf.

Stokey, Edith, and Richard Zeckhauser. 1978. A Primer for Policy Analysis. New York: W.W. Norton & Company.

Teachers College, Columbia University. 2016. "CBCSE Methods Training." Center for Benefit-Cost Studies of Education. http://cbcse.org/ (accessed September 8, 2016).

Tuan, Melinda T. 2008. "Measuring and/or Estimating Social Value: Insights into Eight Integrated Cost Approaches." Prepared for the Bill & Melinda Gates Foundation, Washington, DC.

Waddington, Hugh, Howard White, Birte Snilstveit, Jorge Garcia Hombrados, Martina Vojtkova, Philip Davies, Ami Bhavsar, John Eyers, Tracey Perez Koehlmoos, Mark Petticrew, Jeffrey C. Valentine, and Peter Tugwell. 2012. "How to Do a Good Systematic Review of Effects in International Development: A Tool Kit." Journal of Development Effectiveness 4 (3): 359–387.

Weyrauch, Vanesa, and Gala Diaz Langou. 2011. "Sound Expectations: From Impact Evaluations to Policy Change." Working Paper 12, International Initiative for Impact Evaluation, April.

White, Howard. 2014. "Current Challenges in Impact Evaluation." European Journal of Development Research 26 (1): 18–30.

White, Howard, and Michael Bamberger. 2008. "Introduction: Impact Evaluation in Official Development Agencies." IDS Bulletin 39 (1): 1–11.

White, Howard, and Edoardo Masset. 2018. "The Rise of Impact Evaluations and Challenges Which CEDIL Is to Address." Journal of Development Effectiveness 10 (4): 393–399. https://doi.org/10.1080/19439342.2018.1539387.