AUTHOR ACCEPTED MANUSCRIPT
FINAL PUBLICATION INFORMATION

Good Countries or Good Projects? Macro and Micro Correlates of World Bank Project Performance

The definitive version of the text was subsequently published in the Journal of Development Economics, 105 (November 2013), 2013-07-13, by Elsevier.

THE FINAL PUBLISHED VERSION OF THIS ARTICLE IS AVAILABLE ON THE PUBLISHER’S PLATFORM

This Author Accepted Manuscript is copyrighted by the World Bank and published by Elsevier. It is posted here by agreement between them. Changes resulting from the publishing process—such as editing, corrections, structural formatting, and other quality control mechanisms—may not be reflected in this version of the text.

You may download, copy, and distribute this Author Accepted Manuscript for noncommercial purposes. Your license is limited by the following restrictions:
(1) You may use this Author Accepted Manuscript for noncommercial purposes only under a CC BY-NC-ND 3.0 Unported license http://creativecommons.org/licenses/by-nc-nd/3.0/.
(2) The integrity of the work and identification of the author, copyright owner, and publisher must be preserved in any copy.
(3) You must attribute this Author Accepted Manuscript in the following format: This is an Author Accepted Manuscript of an Article by Denizer, Cevdet; Kaufmann, Daniel; Kraay, Aart. Good Countries or Good Projects? Macro and Micro Correlates of World Bank Project Performance © World Bank, published in the Journal of Development Economics, 105 (November 2013), 2013-07-13, http://creativecommons.org/licenses/by-nc-nd/3.0/.

© 2013 The World Bank

Good Countries or Good Projects?
Macro and Micro Correlates of World Bank Project Performance

Cevdet Denizer (Center for Economics and Econometrics, Bogazici University)
Daniel Kaufmann (Revenue Watch and Brookings Institution)
Aart Kraay (World Bank)

April 2013

Abstract: This paper investigates macro and micro correlates of aid-financed development project outcomes, using data from over 6,000 World Bank projects evaluated between 1983 and 2011. Country-level "macro" measures of the quality of policies and institutions are strongly correlated with project outcomes, consistent with the view that country-level performance matters for aid effectiveness. However, a striking feature of the data is that the success of individual development projects varies much more within countries than it does between countries. A large set of project-level "micro" variables, including project size, project length, the effort devoted to project preparation and supervision, and early-warning indicators that flag problematic projects during the implementation stage, accounts for some of this within-country variation in project outcomes. Measures of World Bank project manager quality also matter significantly for the ultimate project outcomes. We discuss the implications of these findings for donor policies aimed at aid effectiveness.

Keywords: aid effectiveness, World Bank projects

cd.cee@boun.edu.tr, dkaufmann@brookings.edu, akraay@worldbank.org. We are very grateful to Jaime Zaldivar and Kartheek Kandikuppa for their assistance in retrieving project-level data from the World Bank’s databases, and to Swati Raychaudhuri for tireless data entry. Thanks also to Martha Ainsworth, Antonella Bassani, Jaime Biderman, Jean-Jacques Dethier, Marguerite Duponchel, Patricia Geli, Homi Kharas, Alex McKenzie, Hoveida Nobakht, Kyle Peters, Hari Prasad, Lant Pritchett, Veronika Penciakova, Luis Serven, Andrew Warner, and seminar participants at the World Bank and the IMF for helpful discussions.
Support from the Concessional Finance and Partnerships Vice Presidency of the World Bank is gratefully acknowledged. The views expressed here are the authors’, and do not reflect those of the Brookings Institution, the World Bank, its Executive Directors, or the countries they represent.

1. Introduction

A vast empirical literature has sought to answer the question of when foreign aid is effective in achieving its desired objectives. One influential strand of this literature has focused on the aggregate country-level impact of aid, typically on GDP growth, and has necessarily also focused on country-level factors that determine the aggregate effects of development assistance.1 However, recognizing that most foreign aid is provided in the form of individual aid-financed development projects,2 another influential strand of the literature has focused on aid effectiveness at the project level. During the 1970s and 1980s, this typically took the form of calculations of economic rates of return on aid projects, either prospectively in order to justify financing particular projects, or to assess their effectiveness after the fact. More recently, a large literature has used more rigorous impact evaluation techniques, often in the form of randomized controlled trials, to understand the effects of particular aid-financed interventions at the level of individual projects. Out of necessity, this project-level literature has for the most part focused on project-level factors that matter for the success or failure of individual projects. In this paper, we use a very large dataset of over 6,000 World Bank projects, implemented in 130 developing countries since the 1970s, to simultaneously investigate the relative importance of country-level "macro" factors and project-level "micro" factors in driving project-level outcomes.
Our effort to bridge the gap between the country-level and project-level aid effectiveness literatures is motivated by the observation that, while country-level factors are important for aid project outcomes, these outcomes vary much more across projects within countries than they do between countries. This implies that both project-level factors (which often are at least in part under the control of the aid agency implementing the project) and country-level characteristics (which typically are beyond the control of aid donors) need to be taken into account when assessing project performance, and aid effectiveness more generally. Our measure of project-level success consists of a subjective assessment of the extent to which individual World Bank projects were able to attain their intended development objectives. These ratings are generated through internal World Bank project management and evaluation procedures, which we describe in more detail below.

1 This line of research has produced a wide variety of conflicting results, to the point where Temple (2010) suggests that it “must be regarded as a work in progress”. Recent assessments over the past decade range from cautiously optimistic (Burnside and Dollar, 2001; Clemens, Radelet, and Bhavnani, 2004; Hansen and Tarp, 2000; Minoiu and Reddy, 2009; Arndt, Jones, and Tarp, 2010); to ambivalent (Roodman, 2007); to skeptical and pessimistic (Easterly, Levine, and Roodman, 2004; Doucouliagos and Paldam, 2008; Rajan and Subramanian, 2008).

2 The subdivision of aid into individual development projects is quantitatively important. For example, AidData, the largest existing compendium of project-level aid data, records over 1 million individual development finance 'activities' over the past 50 years (http://irtheoryandpractice.wm.edu/projects/plaid/).
While we acknowledge upfront that these ratings are highly imperfect indicators of the ultimate effects of projects, we will for terminological convenience refer to these ratings as "project outcomes". In addition, we share with the rest of the project-level literature the important limitation that the average effectiveness of individual aid projects may well not coincide with the aggregate impacts of aid. For example, there may be complementarities between individual aid projects, or between aid- and non-aid-financed projects, that contribute to a greater aggregate impact than any individual aid project. Conversely, to the extent that aid money is fungible, even highly successful aid-financed projects may have the side effect of freeing up resources for less beneficial forms of recipient-government spending, so that the aggregate impact of aid is less than the project-level evidence would suggest. With these qualifications in mind, we first document a set of robust partial correlations between project outcomes and basic measures of country-level policy and institutional quality observed over the life of the project. This echoes other findings in the literature on macro-level determinants of aid effectiveness, which emphasize the role of country-level proxies for the quality of policies and institutions in driving project outcomes. However, enthusiasm for this finding on the importance of country-level variables for project outcomes needs to be tempered by the observation that roughly 80 percent of the total variation in project outcomes in our sample occurs across projects within countries, rather than between countries. This basic observation suggests that there are large returns to gathering and studying potential project-level correlates of project outcomes, which have largely been overlooked in the cross-country literature on aid effectiveness.
We draw extensively on the World Bank’s internal databases to extract three categories of such project-level variables: (1) basic project characteristics such as the size and sector of the project, and the amount of resources devoted to its preparation and supervision; (2) potential early-warning indicators of project success retrieved from the World Bank’s administrative processes for monitoring and implementing active projects; and (3) information on the identity of the World Bank staff member responsible for the project. We find that several project-level variables, such as project size, project length, the extent of preparation and supervision costs, delays in starting projects, and whether the project was restructured or was flagged as problematic early in the life of the project, are significant correlates of project-level outcomes. However, interpreting these partial correlations is complicated by the fact that many of the project-level characteristics we observe are not randomly assigned to projects, but rather reflect deliberate choices by those responsible for designing and implementing the project. For example, more challenging projects might require greater supervision by World Bank staff, and might also be more likely to result in unsatisfactory outcomes. While we lack a plausibly exogenous source of variation in project characteristics that can be used to pin down causal effects, we make an extensive effort to document and interpret the size of the likely biases due to this endogeneity problem. In the final section of the paper we explore the role of differences in the quality of World Bank staff assigned to manage projects (known as "task team leaders") in explaining variation in project performance. We study this question in a reduced sample of projects where we have information on the identity of the task team leader, and we also have meaningful variation in project outcomes across both countries and task team leaders.
Our main finding here is that task team leader fixed effects are of comparable importance to country fixed effects in accounting for the variation in project outcomes, suggesting a strong role for task team leader-specific characteristics in driving project outcomes. We also document that task team leader quality (as proxied by the average outcome rating on all the other projects managed by the same staff member) is a highly significant predictor of project outcomes. Our results are based on the analysis of projects of just one aid donor, the World Bank. Despite this particular institutional focus, we believe that the evidence in this paper has broader implications for aid effectiveness beyond the World Bank itself. The World Bank is one of the largest single aid donors in the world, and its basic model of financing and implementing specific aid projects is by far the most common mode of aid delivery among all aid donors. While each aid donor has its own mechanisms for allocating resources across countries, for identifying specific aid projects to finance within countries, and for determining the management, implementation, supervision, and evaluation of these projects, a few implications of our findings are plausibly relevant to the wider aid community. The first is basic and not very new, though it is confirmed by the updated and expanded work in this paper: targeting aid to countries with better policies and institutions pays off, as rates of project success are significantly higher in countries with good policy, as measured by the CPIA ratings. However, the very large heterogeneity in project performance within countries suggests that policies to improve aid effectiveness could focus more on project-level factors in addition to country-level factors.
These include the factors that make individual projects difficult to restructure or cancel outright even after early indications of problems arise, as well as those that underlie the large differences in project performance across task managers that we observe in the data.

The rest of this paper proceeds as follows. In the next section we briefly summarize related literature that has also studied the World Bank project-level data we work with here. In Section 3 we describe the project-level outcome data in detail. Sections 4 and 5 contain our main empirical results on the links between country- and project-level characteristics and project outcomes. Section 6 addresses the problem of unobserved project characteristics in driving our results, while Section 7 documents the importance of task team leader characteristics in explaining project outcomes. Section 8 offers concluding remarks and a discussion of the implications of these findings for policies to improve aid effectiveness.

2. Related Literature

This paper is not the first to study the correlates of individual World Bank project outcomes. In earlier contributions, Isham, Kaufmann and Pritchett (1997) and Isham and Kaufmann (1999) studied the determinants of project-level estimated ex-post economic rates of return. Both of these papers focused primarily on country-level factors affecting project returns, notably the role of democracy and civil liberties in the first, and the role of sound macroeconomic policies in the second. Many subsequent papers have similarly focused on country-level determinants of project performance, typically focusing on country-level averages of the same project success measure we use here.
For instance, Dollar and Levin (2005) estimate a series of cross-country regressions of country-average project success ratings on a set of country-level explanatory variables, emphasizing the role of differences in institutional quality in driving cross-country differences in average project performance. Guillaumont and Laajaj (2006) focus on country-level volatility in accounting for project-level success, while Chauvet, Collier, and Duponchel (2010) emphasize country-level conflict measures. In addition, Dreher, Klasen, Vreeland, and Werker (2010) focus on the effect of political influence in project approval decisions (as proxied by a country-level variable capturing whether the country benefitting from the project was a rotating member of the UN Security Council) on project outcomes. Finally, World Bank (2010, Chapter 7) studies the impact of trends in country-level macro variables, such as growth and market-oriented reforms, on variation over time in project-level Economic Rates of Return (ERRs). Despite the very large project-level variation in the data, only a handful of previous papers have sought to link this project-level variation in outcomes to project-level explanatory variables. Deininger, Squire, and Basu (1998) primarily focus on the effect that the volume of pre-existing country-level economic analysis has on the success of projects, but also contrast this with a project-level variable measuring the time spent by World Bank staff on project supervision. Dollar and Svensson (2000) focus on a small set of structural adjustment projects and investigate the role of both country-level political economy factors and a number of project-level factors, such as project preparation and supervision time and the number of conditions associated with the loan, in determining the ultimate success of structural adjustment operations.
Kilby (2000) examines the role of staff supervision in determining project outcomes, but focuses on a set of interim outcome measures gathered over the course of project implementation, rather than the ex post outcome measures used in most other papers, including this one. Chauvet, Collier, and Fuster (2006) also emphasize supervision, and document the differential effect it has on project outcomes in countries with strong and weak governance. More recently, Kilby (2011, 2012) documents the effect of political influence on World Bank project preparation times, and the subsequent impact of preparation times on project outcomes. Pohl and Mihaljek (1998) focus less on project outcomes themselves, and more on the discrepancy between ex ante and ex post estimated economic rates of return at the project level. Finally, while not focused on World Bank projects, our emphasis on the distinction between country-level and project-level correlates of project outcomes is shared with Khwaja (2009), who investigates the role of project-level and community-level characteristics in determining the success of individual small infrastructure projects undertaken in a set of communities in northern Pakistan. He documents that community-level constraints to successful project performance can be alleviated by better design at the level of individual projects, thus enabling "good" projects in "bad" communities. We contribute to this literature in two main ways. This paper is the first, to our knowledge, to emphasize the relative importance of the within-country variation in project performance relative to the between-country variation. This is important because, as described above, much of the previous empirical literature has relied on country-level variables to explain project-level success, even though country-level variation accounts for only about one-fifth of the variation in project outcomes. 
From a policy perspective, this observation is particularly relevant, given most aid donors’ focus on country-level factors for determining conditionality and eligibility for aid programs. Second, relative to the existing literature, we assemble a much larger and novel set of project-level variables in our effort to account for this very large project-level heterogeneity in outcomes. We do so by including data from the Bank's own internal monitoring indicators over the course of project implementation, as well as information on the identity of task managers, which have not yet been exploited in the academic literature on project outcomes.

3. Project Outcome Data

In order to understand the data on project outcomes used in this paper, some institutional background is helpful. The activities of the World Bank are organized by project. For example, a project might consist of an agreement to build a particular piece of infrastructure, to fund teacher or health worker training, to support a particular health intervention, or a myriad of other potential development-oriented government actions that the World Bank finances. In some cases, projects simply take the form of budget support to recipient countries. A document describing the project is prepared by World Bank staff and includes a proposed amount of World Bank funding. A key ingredient in this initial document is a statement of the project’s “development objective,” which summarizes what the project intends to achieve. This development objective is important because subsequent project evaluations rate the performance of the project relative to this objective. Once the project is approved by the Board of Executive Directors of the World Bank, it is implemented over several years, with project-related spending financed by disbursements on loans and/or grants provided by the World Bank, often with co-financing from other donors and the recipient government.
On the World Bank side, each project is staffed by a project team led by a "task team leader". At least twice a year, task team leaders are required to report on the status of the projects for which they are responsible, by completing an Implementation Status and Results Report. As discussed below, these reports provide us with a rich set of project-level variables measured over the life of the project. Once the project is complete, the task team leader produces an Implementation Completion Report, which includes a subjective assessment of the degree to which the project was successful in meeting its development objective. These ratings are reviewed by World Bank management for the country and/or region where the project took place, and can be thought of as an initial "self-evaluation" by World Bank staff and management of the project. After 1995, all such self-evaluations were also subject to an additional layer of validation by the Independent Evaluation Group (IEG) of the World Bank, based on available project documentation. These desk reviews were variously known as “Evaluation Summaries” or “Evaluation Memoranda”. In addition to these desk reviews, IEG performs a more detailed ex-post evaluation of about 25 percent of projects completed each year, known as “Project Performance Audit Reports”. These typically occur several years after project completion – the mean lag in our core sample between project completion and the completion of these detailed IEG evaluations is 3.4 years. These more elaborate reviews often involve substantial additional analysis, including visits to the project site for follow-up data gathering and analysis. These evaluations also explicitly rate projects in terms of their success in attaining their stated development objectives. We construct our project outcome variable by taking the rating from the most detailed evaluation available for each project.
Specifically, we rely on the project outcome rating from the Project Performance Audit Reports for those projects for which these detailed evaluations were completed. For projects not subject to this audit, we rely on the IEG desk review of the Implementation Completion Report (for all projects completed after 1995), or otherwise on the Implementation Completion Report itself. Projects evaluated prior to 1995 are assigned a binary satisfactory/unsatisfactory rating, while projects rated after 1995 are scored on a six-point scale ranging from highly unsatisfactory to highly satisfactory. We define two datasets based on this break in the evaluation scale. The first combines all available projects and converts the six-point rating scale into a binary indicator during the post-1995 period (with all projects rated as "moderately satisfactory/satisfactory/highly satisfactory" classified as "satisfactory"). The second dataset consists only of projects evaluated since 1995, and uses the six-point outcome rating. Our largest dataset covering projects evaluated between 1983 and 2011 consists of 6,569 projects, of which 2,148 performance ratings are based on detailed IEG reviews over the period 1983-2012, another 3,117 ratings are based on IEG desk reviews since 1995, and the remaining 1,304 pre-1995 ratings are based on Implementation Completion Reports alone. The smaller dataset covering projects evaluated between 1995 and 2011 consists of 4,191 projects, of which 1,044 are based on detailed IEG reviews and the remainder are lighter IEG desk reviews of Implementation Completion Reports.3 In the data, projects are mapped to up to five different sectors, with percentages indicating the fraction of the project falling in each sector. We uniquely assign projects to sectors based on the largest of these sectoral assignments.
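The recoding of the post-1995 six-point scale to the binary indicator, and the assignment of each project to its largest sector, can be sketched as follows. This is a minimal illustration: the category labels and data shapes are our assumptions, not the Bank's internal encoding.

```python
# Assumed ordering of the six-point IEG outcome scale used after 1995:
SCALE = [
    "highly unsatisfactory", "unsatisfactory", "moderately unsatisfactory",
    "moderately satisfactory", "satisfactory", "highly satisfactory",
]

def to_binary(rating):
    """Collapse a six-point rating to the binary indicator: the top three
    categories map to 1 (satisfactory), the bottom three to 0."""
    return 1 if SCALE.index(rating) >= 3 else 0

def major_sector(shares):
    """Assign a project to the sector with the largest percentage share
    (projects are mapped to up to five sectors with percentages)."""
    return max(shares, key=shares.get)

# Hypothetical project: rated "moderately satisfactory", 60% transport.
binary_rating = to_binary("moderately satisfactory")        # 1
sector = major_sector({"transport": 60, "education": 40})   # "transport"
```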
Table 1 reports the distribution of these projects across sectors, while Table 2 reports summary statistics on the project outcome ratings, as well as on all of the explanatory variables that we discuss in more detail below.

3 Since all completed World Bank projects will have either an Implementation Completion Report, or an IEG desk review, or a full IEG evaluation, our dataset in principle consists of the universe of all completed World Bank projects. We do however lose some projects due to unavailability of data on some of our explanatory variables. The full IEG evaluation dataset we work with consists of 8,168 projects with some kind of an evaluation. We focus on the subset of 7,342 projects evaluated since 1983, since most of our project-level data is scarce for earlier projects. In the post-1983 period, we lose a further 776 projects due to unavailability of either country-level or basic project-level data described in more detail in the next section.

There are a variety of natural concerns about the credibility of these project outcome ratings. A basic concern is that they explicitly measure success in attaining the stated “development objective” of each project, rather than relative to some common standard across projects and over time. In part this is a natural consequence of the wide sectoral diversity of projects that the World Bank finances -- it would be quite difficult to define a common standard against which to evaluate the outcomes of a road-building project, a teacher-training project, and a civil service reform project, for example. It is also quite plausible that the standards for setting development objectives have evolved over the nearly 30 years covered by our dataset.
And finally, the standards for evaluating success relative to a given development objective may also have been evolving over time. To account for this possibility, we construct a set of dummy variables corresponding to the five-year period in which the project was approved and in which the project was evaluated (i.e. 1980-84, 1985-89, etc.). We include these dummies, their interactions with a set of sector dummies, and the sector dummies themselves, in all specifications. A second obvious concern might be that project outcome ratings based on Implementation Completion Reports alone primarily reflect the view of the task team leader, who may not be fully candid about the shortcomings of the projects for which s/he is responsible. As a first check of whether this concern is important, Figure 1 graphs the fraction of projects rated as “satisfactory” over time for each of the three types of evaluations (projects are organized here by year of evaluation). During the period up to 1995 we can compare the Implementation Completion Report-based reviews with the detailed IEG evaluations. This first look at the data reveals little difference in the average rating across these two types of evaluations. Similarly, during the period after 1995, average ratings on projects receiving detailed IEG evaluations do not appear to be very different from those based on the lighter IEG desk reviews.4 Nevertheless, in all of our empirical specifications that follow, we include dummy variables for the type of evaluation to capture the possibility of differences in average outcome ratings across projects. Yet another concern might be the credibility of the IEG evaluations themselves. On the one hand, several factors point to their plausibility. The IEG is formally independent of the rest of the Bank’s management and reports directly to the World Bank's Board.
Its review procedures are developed independently, its staff is experienced with evaluation issues, and it has the ability to draw on cross-country and cross-project experience to inform project assessments and apply common standards. Moreover, since the 1990s most IEG evaluations have been public, and IEG pays close attention to comments and criticisms of outside experts, civil society groups, and academia. On the other hand, IEG is primarily staffed by current and future Bank staff and there is some rotation in and out of IEG, although this turnover is considerably lower than in other parts of the World Bank. There are also likely various informal channels of communication between IEG and World Bank staff which may affect the ratings process. While the full independence of IEG evaluations cannot be directly verified or contradicted, we simply note this as a potential question regarding the reliability of the outcome measures. Our overall impression is that while the evaluation outcome data described here are far from perfect, they arguably capture the experience and insights of many World Bank and IEG staff on how well projects have fared. And of course, even well-measured individual project outcomes will not be fully informative about the overall aggregate development impact of aid, as there may be complementarities between projects, as well as potential scope for aid-financed spending crowding in (or out) other sorts of public spending. Nevertheless, while surely there is considerable remaining measurement error in the outcome measures, it is still useful to investigate a range of country-level and project-level factors that are associated with these outcomes.

4 A separate issue is whether evaluations reviewed or produced by IEG result in lower scores for the same project than the initial evaluation completed by the task team leader. There is some evidence that this is the case unconditionally, when comparing projects for which both evaluations are available.
We discuss these factors next.

4. Country-Level Correlates of Project Performance

We begin by considering a small set of core country-level variables that have been identified in the literature as important correlates of project outcomes. We measure the quality of country-level policies and institutions using the Country Policy and Institutional Assessment (CPIA) ratings of the World Bank. This is a useful summary of the types of country-level policies and institutions that may matter for project performance, emphasized for example by Isham and Kaufmann (1999) and Dollar and Levin (2005). This measure of policy is also of particular interest in our context, given that concessional loans and grants from the World Bank are allocated across countries according to a formula that strongly rewards better CPIA performance.5 This allocation formula reflects the expectation that World Bank-financed projects will have better outcomes in countries with better policy performance. We also include real per capita GDP growth as a crude proxy for macroeconomic shocks. Finally, we consider the role of civil liberties and political rights as country-level correlates of project performance, as emphasized by Isham, Kaufmann and Pritchett (1997). We measure these using the sum of the Freedom House scores of civil liberties and political rights. Our unit of observation is a project, which typically lasts several years – the median length of a project in our sample is 6 years, and 10 percent of projects last 9 years or more.

5 The Performance-Based Allocation formula used to allocate IDA resources is essentially a geometric average of per capita income and the CPIA, with an exponent of 5 on the latter. This implies that a country with fairly good performance (a CPIA score of 4) would get more than four times as much aid per capita as a country at the same per capita income level but with fairly weak performance (a CPIA score of 3), i.e. (4/3)^5 ≈ 4.2.
For each project, we calculate the annual average of each of these country-level correlates of performance over the life of the project (from approval to evaluation), and use it as an explanatory variable for project performance. We report these results in Table 3. The first five columns of this table refer to the full set of projects evaluated between 1983 and 2011, and use the binary satisfactory/unsatisfactory project outcome rating, while the second set of columns refers to the 1995-2011 period, and uses the six-point scale for project outcomes. As discussed above, all specifications include a full set of sector dummies, sector-by-approval period dummies, and sector-by-evaluation period dummies to control for potentially evolving standards in setting project development objectives, as well as in the evaluation of projects relative to these objectives. We also include dummy variables to capture the type of the evaluation, as well as the lag (in years) between project completion and the date of the evaluation. To conserve on space, we do not report the estimated coefficients on these controls for evaluation characteristics.6 Finally, we cluster all standard errors at the country-evaluation year level to allow for potential correlation of residuals within these groups. We obtain very similar standard errors clustering at the sector-evaluation year level, and at the sector-region level. Turning to the country-level variables, we find that all of them enter with the expected signs, and in most cases also statistically significantly. Higher rates of economic growth, and better policy performance, are significantly positively correlated with project performance in all specifications. Consistent with the findings of Isham and Kaufmann (1999), we find that greater political freedoms, as measured by the Freedom House ratings, are also positively associated with project outcomes.
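The project-life averaging of country covariates can be sketched as follows. This is a minimal illustration under our own assumptions about the data layout (a yearly series keyed by calendar year); the paper's actual data construction may differ in details such as the treatment of missing years.

```python
def project_life_average(yearly_values, approval_year, evaluation_year):
    """Average a yearly country-level series (e.g. CPIA, growth, Freedom
    House score) over the life of a project, from approval to evaluation
    inclusive, skipping years with no observation."""
    vals = [
        yearly_values[y]
        for y in range(approval_year, evaluation_year + 1)
        if y in yearly_values
    ]
    return sum(vals) / len(vals) if vals else None

# Hypothetical CPIA series for one country, for a 1990-1993 project:
cpia = {1990: 3.0, 1991: 3.5, 1992: 4.0, 1993: 4.0}
avg_cpia = project_life_average(cpia, 1990, 1993)  # 3.625
```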
However, this effect loses significance in the specifications which control for policy performance and economic growth. The magnitude of the estimated coefficients is also noteworthy. In column (10), for example, an increase in the CPIA score of one point (on the six-point scale) is associated with roughly a half-point increase in the project outcome rating (also on a six-point scale). The results in columns (1) to (4) and (6) to (9) are based on ordinary least squares estimation. We do this primarily for ease of interpretation of some of our findings, and particularly the role of between-country versus within-country variation in project performance that we emphasize below. For completeness, however, in columns (5) and (10) we report results based on conceptually more appropriate probit and ordered-probit regressions, reflecting the binary nature of the satisfactory/unsatisfactory project outcome ratings used in the 1983-2011 sample, and the ordered categorical nature of the six-point project outcome ratings used in the 1995-2011 sample. Naturally, the estimated magnitudes of the coefficients change somewhat in these nonlinear specifications. However, the pattern of significance of parameter estimates is identical to that observed in the corresponding OLS specifications in columns (4) and (9).

Footnote 6: Briefly, a longer delay between project completion and project evaluation is significantly associated with worse project outcome ratings. Detailed IEG PPAR reviews on average assign slightly higher outcome ratings to projects than the lighter IEG desk reviews. Turning to the sector dummies, there is some evidence that transport and education sector projects score higher than average, although as discussed earlier it is hard to determine whether this reflects differences in actual project outcomes or alternatively different standards for project evaluation in these sectors.
An important observation here is that although these country-level variables are generally significant correlates of project performance, they jointly have rather modest explanatory power. The R-squareds of the regressions in Table 3 range from 0.07 to 0.14. Moreover, much of this in turn reflects the contribution of the controls for evaluation characteristics and the dummy variables capturing sector, sector-by-approval-period, and sector-by-evaluation-period effects, rather than the explanatory power of the country-level variables themselves. For example, a regression of project outcomes on these dummy variables alone delivers an R-squared of 0.068 in the post-1983 sample, and of 0.067 in the post-1995 sample. This suggests that the incremental explanatory power of the country-level variables, while statistically significant, is not large. To document more systematically the limited potential role of country-level variables in accounting for project outcomes, for each year between 1985 and 2005 we select the set of projects that were active in that year, and regress the binary project outcome rating of these projects on a set of country fixed effects. The R-squared from these regressions captures the share of variance in project outcomes in a given year that is due to any type of country-level factors. The R-squareds of these regressions average 0.18 across these 21 years. Performing a similar exercise over the period 1995-2005, and using the six-point project outcome ratings, delivers an average R-squared of 0.20. Overall, this suggests that only about 20 percent of the variation in project outcomes can be accounted for by country-level characteristics. This striking feature of the data motivates our analysis in the following sections of a large set of project-level characteristics that can potentially account for the remaining 80 percent of variation in project outcomes that occurs across projects within countries.
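The fixed-effects variance decomposition described above can be sketched in a few lines: with country dummies as the only regressors, OLS fitted values are country means, so the R-squared is exactly the between-country share of outcome variance. The data below are invented for illustration.

```python
# Sketch of the variance-decomposition exercise: regressing project
# outcomes on country dummies alone. With only group dummies, OLS fitted
# values are group means, so the R-squared equals the between-country
# share of the outcome variance. The sample data are made up.

def country_fe_r2(outcomes_by_country: dict[str, list[float]]) -> float:
    all_y = [y for ys in outcomes_by_country.values() for y in ys]
    grand_mean = sum(all_y) / len(all_y)
    sst = sum((y - grand_mean) ** 2 for y in all_y)
    ssr = sum(
        (y - sum(ys) / len(ys)) ** 2  # deviation from own country's mean
        for ys in outcomes_by_country.values() for y in ys
    )
    return 1.0 - ssr / sst  # between-country share of variance

# Hypothetical binary satisfactory/unsatisfactory ratings by country:
sample = {"A": [1, 1, 0, 1], "B": [0, 1, 0, 0], "C": [1, 1, 1, 0]}
print(round(country_fe_r2(sample), 3))  # 8/35 ~ 0.229
```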
5. Project-Level Correlates of Project Performance

As we have seen in the previous section, country-level variables can explain at most 20 percent of the total variation in project outcomes observed in the data. In this section we turn to a selection of project-level variables in an attempt to explain some of the substantial remaining 80 percent of the variation in project outcomes. We begin by documenting the relationship between a number of basic project characteristics and project outcomes in Table 4. Before delving into these, we note first that our findings on the magnitude and significance of country-level factors from the previous section are essentially unchanged when we add these project-level variables. As can be seen from the first three rows of Table 4, per capita GDP growth and CPIA scores continue to be significantly associated with project outcomes, while the Freedom House measure of civil liberties and political rights is not. A first basic project characteristic is whether the project finances specific expenditures or instead provides general budget support. Nearly 90 percent of projects in our sample, accounting for about three-quarters of total lending volumes, are "investment projects" that finance specific expenditures. The remaining projects are currently referred to as "development policy lending" and capture general budget support. This category also includes earlier forms of budget support, such as the "structural adjustment loans" common during the 1980s and 1990s. We add a dummy variable indicating whether the project is an investment project, to pick up any average differences in project outcomes across these two types of projects. The evidence here is rather mixed.
In the 1983-2011 sample, there is a weakly significant positive relationship in the specification controlling for all other project characteristics, indicating that investment projects on average perform slightly better than non-investment projects (columns (8) and (9)). However, this relationship loses significance in the multivariate specifications in the post-1995 sample (columns (17) and (18)). Undue project complexity is often suggested as a reason why projects ultimately turn out to have unsatisfactory outcomes. Project complexity is difficult to measure systematically across a large sample of projects, and so we consider three imperfect proxies for different dimensions of complexity. The first dimension is the extent to which a project spans multiple sectors. Recall that our sectoral classification of projects is based on data which assigns a percentage of each project to up to five major sectors. We exploit this to measure the (lack of) complexity as the largest share of the project assigned to a single sector, i.e. higher values indicate less dispersion of the project across sectors, and presumably also less complexity. A second potential dimension of complexity is the extent to which a project is novel, in the sense of financing activities in a country that the World Bank had not previously financed. While we do not have any direct measures of project novelty, it is possible to identify sequences of projects that are follow-ups of previous projects, and so presumably are less novel than the original project in the sequence. Specifically, many projects form part of a sequence of closely-related projects, and this is reflected in the project's name, e.g. Botswana Education I, II, and III. We refer to the second and higher of such projects in this sequence as "repeater" projects.
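A minimal sketch of how such name-based sequence numbers might be recovered: the paper describes searching project names for numbers and roman numerals, but its exact matching rules are not spelled out, so the regex and roman-numeral parser below are illustrative assumptions.

```python
import re

# Rough sketch of flagging "repeater" projects from names like
# "Botswana Education III" by looking for a trailing arabic number or
# roman numeral. The paper's actual matching procedure may differ;
# this regex and parser are illustrative assumptions.

_SEQ = re.compile(r"\b(?:(\d+)|([IVX]+))\s*$")
_ROMAN = {"I": 1, "V": 5, "X": 10}

def _from_roman(s: str) -> int:
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        v = _ROMAN[ch]
        # Subtract when a smaller numeral precedes a larger one (e.g. IV).
        total += -v if nxt in _ROMAN and _ROMAN[nxt] > v else v
    return total

def sequence_number(project_name: str) -> int:
    """Return the project's number in its sequence (1 if none found)."""
    m = _SEQ.search(project_name.strip())
    if not m:
        return 1
    return int(m.group(1)) if m.group(1) else _from_roman(m.group(2))

def is_repeater(project_name: str) -> bool:
    return sequence_number(project_name) >= 2

print(is_repeater("Botswana Education III"))       # True (3rd in sequence)
print(is_repeater("Rural Roads Rehabilitation"))   # False
```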
To identify such projects, we searched the names of all projects for numbers and roman numerals, and created a dummy variable equal to one if a project is the second or higher in such a sequence of projects, and zero otherwise.7 Our prior is that "repeater" projects are less complex than non-repeater projects. A third and final proxy for project complexity is simply the size of the project itself, measured as the logarithm of total World Bank financing of the project, with the presumption that larger projects are more complex.8 The estimated relationship between these various proxies for project complexity and project outcomes is surprisingly mixed. On the one hand, there is a clear and statistically significant negative correlation between project size and project outcomes: larger projects on average obtain worse outcome ratings. On the other hand, a more direct proxy for complexity, the repeater project variable, is not significantly correlated with project outcomes in any of the specifications. Finally, the measure of sectoral concentration is, contrary to our expectations, negatively correlated with project outcomes. Projects that are more concentrated in a single sector tend to get lower project outcome ratings. Overall, these results suggest a lack of a systematic relationship between project outcomes and these proxies for project complexity.

Footnote 7: In our sample, 23 percent of projects are classified as "repeaters". Among "repeater" projects, the median project is second in its sequence, and 10 percent of repeater projects are fifth or higher in their corresponding sequences. As a robustness check, we created an alternative variable containing the number of the project in the sequence of projects to which it belongs for all repeater projects, and zero otherwise. This more continuous measure of project novelty gives similar results to those reported here using the binary classification.
We are grateful to an anonymous referee for the suggestion of looking at "repeater" projects.

Footnote 8: Many World Bank-financed projects also involve cofinancing from other donors, or from the recipient government, so that the total cost of the project may be larger than the amount of financing provided by the World Bank. Nevertheless, the correlation between log total project cost and log World Bank financing is 0.96.

In Table 4 we also consider three further project characteristics. We have information on the start and end dates of each project, which together give us the length of time required to implement the project. We use the logarithm of this length (in years) as a measure of project duration. We also have data on preparation and supervision costs for each project. Preparation costs consist of World Bank staff time and travel costs devoted to designing the project prior to its approval by the World Bank's Board. Supervision costs consist of staff time and travel costs associated with monitoring the implementation of the project following its approval. We express these as a fraction of the total size of the project, and use the logarithm of these measures as additional correlates of project outcomes. All three variables are negatively correlated with project outcome ratings, and in the case of project length and supervision costs, this correlation is significant in most specifications. At first glance, this runs counter to the reasonable prior belief that devoting greater resources to designing projects and monitoring their implementation should result in better project outcomes and a faster pace of project completion. However, a common feature of these last three variables is that they are quite likely to respond to unobserved (by us) project characteristics that also matter for project outcomes.
For example, a "difficult" project might very well take longer to complete, require greater preparation and supervision, and also may be more likely to turn out to be unsatisfactory.9 This endogeneity problem means that the negative partial correlations we observe between these project characteristics and project outcomes cannot be interpreted as a causal effect of the former on the latter. This concern also applies to a number of additional project characteristics we consider in the next set of results. We address this interpretation problem in greater detail in the next section of the paper. We next turn to a second set of project-level variables that can be thought of as potential "early- warning" indicators of eventual problems with a project. The first of these is simply the length of the delay between the approval date of the project, and the date at which disbursements begin. This "project effectiveness delay" is commonly thought of as a leading indicator of project difficulties, to the extent that delays in the actual start of a project reflect inadequate planning prior to project approval, and possibly also lack of borrower commitment to the project. However, it is also plausible that delaying project implementation in order to remedy deficiencies in project preparation can lead to 9 Indeed, a particular concern relevant to the earlier discussion is that high preparation and supervision costs may well be proxying for project complexity better than the other proxies discussed above. As a robustness check, we tried re-estimating the specifications in columns (8) and (17) of Table 4, dropping the preparation and supervision variables. This however did not alter the size and significance of the estimated coefficients on the direct proxies for complexity. 14 better ultimate project outcomes, so that the sign of the relation between project effectiveness delays and outcomes is a priori ambiguous. 
The remaining three early-warning indicators are obtained from the Implementation Status Report project monitoring process described earlier in Section 3. We have retrieved data from the end-of-fiscal-year Implementation Status Report for every project, and for each year during the life of the project. These reports provide information reported by the task manager on a variety of interim measures of project performance, on an annual basis, over the life of the project. Each year, the implementation status of the project is rated relative to the ultimate development objective, and if this rating is unsatisfactory, the project is flagged as a “problem project”. In addition, task managers indicate with a series of 12 flags whether there are concerns about specific dimensions of project performance, including problems with financial management, compliance with safeguards, quality of monitoring and evaluation, legal issues, etc. If three or more of these flags are raised at any one point in time, the project is identified as a “potential problem project”. Beyond these two summary flags, we also have information in the form of a binary variable indicating whether a project was restructured during that year of the project. Restructurings are relatively rare in our sample, and occur when there is a major change to the development objective of the project requiring approval of the World Bank's Board of Directors. While in principle this flag data is a rich source of information on leading indicators of project performance, it needs to be treated with some caution. Some of the flags are automatically triggered (for example, by objectively-measured disbursement delays, or by lags between project approval and the start of work on the project), but the decision to raise others is at the discretion of task managers, who for natural reasons may be reluctant to do so. 
This could be due to optimism about the ultimate outcome of the project, or reputational concerns on the part of the task manager and/or counterparts. Indeed, a perennial concern for World Bank management has been the frequency of projects exhibiting “disconnect” – projects that were rated as satisfactory throughout the implementation process but were then ultimately rated as unsatisfactory upon completion. Despite these caveats, these flags are an important set of candidate predictors of project success since they are routinely generated, and are readily available to World Bank decision makers who can in principle act on them over the course of project implementation in order to improve the ultimate outcome of the project. For each project, we construct a set of dummy variables indicating whether the project was flagged as a "problem" or "potential problem" project in the first half of the project implementation period, measured in calendar years. For example, for a project lasting 6 years from approval to completion, we construct dummy variables indicating whether these flags were raised in the first three years of the project. We then investigate whether these “early warning” flags are related to the eventual outcome of the project. Creating this lag between the measured flags and the completion of the project is important for two reasons. First, our primary interest in these flags is as potential leading indicators of eventual project outcomes. In particular, we would like to investigate whether flagging a project as a “problem” or “potential problem” early in its life creates opportunities or incentives to take remedial steps to turn the project around – this is after all the point of having a process for monitoring projects over the course of implementation. Second, we would like to avoid any mechanical link between the flags and the ultimate project outcome rating.
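The first-half flag construction described above can be sketched as follows, assuming per-year flag indicators over the life of the project; how an odd-length implementation period is split is our assumption.

```python
# Sketch of the "early warning" dummy construction: given per-year flag
# indicators over a project's life, record whether the flag was raised
# in the first half of the implementation period, measured in calendar
# years. The rounding rule for odd-length projects is our assumption.

def flagged_in_first_half(flags_by_year: list[bool]) -> bool:
    """flags_by_year[t] is True if the flag was raised in year t+1 of the
    project. The first half is the first len//2 years, e.g. years 1-3 of
    a 6-year project."""
    half = len(flags_by_year) // 2
    return any(flags_by_year[:half])

# A 6-year project flagged as a "problem project" only in year 5:
print(flagged_in_first_half([False, False, False, False, True, False]))  # False
# The same project flagged in year 2 instead:
print(flagged_in_first_half([False, True, False, False, False, False]))  # True
```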
Consider for example the “problem project” flag, which is supposed to be raised in the project monitoring process if the project is not making satisfactory progress towards its development objective. This criterion is very similar to the ultimate project outcome rating which, as discussed above, captures the extent to which the project was able to meet its development objective. Finally, following the same logic, we also construct a variable indicating whether a project was restructured in the first half of its life. The results for these early-warning indicator variables are reported in Table 5. Since the early-warning variables are mostly available for projects in the later 1995-2011 sample, we report results only for this period. Before discussing these results, we first briefly note that including these variables has little effect on our conclusions regarding the country-level and basic project-level variables first analyzed in Table 3 and Table 4. As before, per capita GDP growth and CPIA scores are significantly positively associated with project outcomes. Similarly, project dispersion across sectors, project size, and supervision costs all continue to be significantly associated with project outcomes across all specifications reported in Table 5. Turning to the early-warning variables, delays between project approval and first disbursement are consistently significantly negatively correlated with project outcomes. This is consistent with the conventional wisdom that project effectiveness delays are a signal of incipient problems with the project, and that these persist over the life of the project, as reflected in lower average project outcome ratings.10 Another finding is that project restructurings early in the life of a project are strongly significantly associated with better project outcomes in the last two columns, where all the early-warning indicators are included together in the regression. We also find that the problem and potential problem flags are both negatively correlated with ultimate project outcomes, and this correlation is strongly significant for the problem project flag. This finding suggests that even when early-warning flags are raised through the Implementation Status and Results report in the first half of the life of the project, it is difficult to turn around problematic projects in order to achieve satisfactory outcomes. One way of seeing this directly is to consider the persistence of problem project flags. Consider the sample of 3,283 projects covered in Table 5. Of these, 899 (about 27 percent) were flagged as problem projects in the first half of their lives. Of these, 546 projects were also flagged as problem projects in the second half of their lives, representing 60 percent of those initially flagged. This persistence is also found in the final project outcomes that comprise our main dependent variable of interest. Unconditionally, projects that are flagged as a problem in the first half of their implementation period have only a 59 percent chance of yielding satisfactory results, while projects that are not flagged in their first half have a 77 percent chance of turning out satisfactorily. Interestingly, projects that are deemed problematic during their first half, but not during their second (indicating that initial problems have been resolved), have an 84 percent chance of ultimately being deemed satisfactory.

Footnote 10: A potential concern here is that the initial disbursement delay is correlated with the project length variable discussed previously, since the former is a component of the latter. However, re-estimating the specifications in column (5) of Table 5 does not affect the sign or significance of the coefficient on the initial disbursement delay.
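As a back-of-envelope check, the persistence shares quoted above follow directly from the reported counts:

```python
# Reproducing the persistence figures from the counts reported in the
# text: 3,283 projects, 899 flagged as problem projects in the first
# half of their lives, 546 of those flagged again in the second half.

total_projects = 3283
flagged_first_half = 899
also_flagged_second_half = 546

share_flagged = flagged_first_half / total_projects
persistence = also_flagged_second_half / flagged_first_half

print(f"{share_flagged:.1%} of projects flagged in first half")    # 27.4%
print(f"{persistence:.1%} of those still flagged in second half")  # 60.7%
```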
As discussed earlier, an important caveat about several of these early-warning variables is that they may simply be proxying for unobserved project characteristics that ultimately matter for project outcomes. For example, difficult or challenging projects may be more likely to obtain poor project outcome ratings upon completion, and they may also be more likely to be flagged as a problem project, or be restructured, or require greater time between project approval and the commencement of disbursements. This endogeneity problem suggests that we may be underestimating any positive direct effects of these variables on project outcomes. We turn to this issue in greater detail in the next section.

6. Role of Unobserved Project-Level Factors

In the previous section, we noted that the interpretation of the partial correlation between several of our explanatory variables and project outcomes was clouded by the fact that these variables might themselves be responding to unobserved project-level factors that also matter for project outcomes.11 Ideally, we would like to address this concern by some combination of (a) finding more and better measurable proxies for project quality in order to reduce the scope for such omitted variable bias, and/or (b) finding some plausibly exogenous source of project-level variation in these explanatory variables that could be used as an instrument. However, this ideal strategy is unlikely to be feasible in the data on project characteristics that we have. Since we are working with purely observational data, there is no scope to manipulate the assignment of factors such as preparation and supervision to projects. And among the observed data on project characteristics that we have, it would be very difficult to justify classifying some variables as instruments that influence project outcomes only through their effects on other potentially endogenous variables, and that have no direct effects on outcomes.
Instead, our goal in this section is much more modest: to simply quantify the magnitude of the likely biases in our OLS estimates due to these unobserved and potentially confounding effects. Our strategy is to use Bayesian methods to formally specify a range of reasonable prior beliefs about the importance of these confounding variables, and then explore quantitatively how much these priors influence posterior inferences about the slope coefficients of interest. To see how this can be done, consider the following example:

$$y_i = \beta x_i + \epsilon_i \qquad (1)$$

where $y_i$ is the measure of project outcomes, $x_i$ is a potentially endogenous project characteristic, and $\epsilon_i$ is an error term capturing the remaining variation in project outcomes as measured by IEG. For notational convenience, assume that all other explanatory variables have been partialled out of the relationship between $y_i$ and $x_i$, as we will also do below in the empirical implementation.12

Footnote 11: We are of course not the first to note the potential endogeneity of project variables such as preparation and supervision costs. This difficulty is noted also in Deininger, Squire and Basu (1998) and Dollar and Svensson (2000), who propose using various country and project characteristics as instruments for supervision costs. However, it is difficult to justify the required exclusion restriction that these variables matter for project outcomes only through their effects on supervision. Kilby (2000) relates lagged supervision to within-project changes over time in interim measures of project outcomes taken from Implementation Status and Results report sequences. Looking at lags and changes over time in project performance partially mitigates concerns about unobserved project characteristics driving both variables. Kilby (2012) studies the role of project preparation on project outcomes, and proposes instrumenting for preparation time using country-level measures of political influence of donors on recipients.
In addition to the usual concerns about justifying the validity of the exclusion restriction (which requires that political influence matters for project outcomes only through project preparation time), a further drawback of this approach is that, by relying on country-level variation in the instrument, it cannot account for the substantial within-country across-project variation in project outcomes that we stress in this paper.

Footnote 12: It is worth acknowledging at the outset that this partialling out of all of the other variables involves the implicit assumption that all these other variables are in fact exogenous in the sense of being uncorrelated with the error term. This is a strong assumption that may well not be valid, but is necessary to focus attention on the possible biases associated with the endogeneity of the variable in question.

Our concern here is that part of the residual variation in project outcome ratings is due to unobserved project quality, $q_i$, which is also correlated with the observed project characteristic, $x_i$. Specifically, assume that:

$$x_i = \rho_{xq}\, q_i + \sqrt{1-\rho_{xq}^2}\, u_i, \qquad \epsilon_i = \sigma_\epsilon \left( \rho_{\epsilon q}\, q_i + \sqrt{1-\rho_{\epsilon q}^2}\, v_i \right) \qquad (2)$$

where $q_i$, $u_i$ and $v_i$ are independently and identically distributed with mean zero and standard deviation one, and $\sigma_\epsilon$ is the standard deviation of the error term in the regression. The parameter $\rho_{xq}$ captures the correlation between the observed project characteristic and unobserved project quality, while the parameter $\rho_{\epsilon q}$ captures the correlation between the error term and project quality. Choosing units such that higher values of $q_i$ correspond to better quality projects, we assume that $\rho_{\epsilon q}$ is positive, i.e. conditional on the observed project characteristic, $x_i$, better quality projects also receive better IEG ratings. The bias in the OLS estimator of $\beta$ naturally depends on these two parameters, i.e.

$$\text{plim}\, \hat{\beta}_{OLS} = \beta + \rho_{xq}\, \rho_{\epsilon q}\, \sigma_\epsilon \qquad (3)$$

Concretely, suppose that $x_i$ is a measure of project supervision. It seems plausible that better quality projects require less supervision, i.e. $\rho_{xq} < 0$. As a result, we expect that the OLS estimator of $\beta$ will be biased downwards.
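A small Monte Carlo sketch can confirm this omitted-variable-bias logic: when both the regressor and the error load on an unobserved quality factor, the OLS slope converges to the true coefficient plus the product of the two correlations times the error standard deviation. All numerical values below are arbitrary illustrations, not estimates from the paper's data.

```python
import math
import random

# Monte Carlo check of the omitted-variable-bias formula: with x and the
# error both loading on unobserved quality q,
#   plim(beta_OLS) = beta + rho_xq * rho_eq * sigma.
# Parameter values are arbitrary; rho_xq < 0 mimics the supervision case.

random.seed(0)
n, beta, sigma = 200_000, 0.5, 1.0
rho_xq, rho_eq = -0.25, 1.0  # x falls with quality; ratings track quality

x, y = [], []
for _ in range(n):
    q, u, v = (random.gauss(0, 1) for _ in range(3))
    xi = rho_xq * q + math.sqrt(1 - rho_xq**2) * u
    eps = sigma * (rho_eq * q + math.sqrt(1 - rho_eq**2) * v)
    x.append(xi)
    y.append(beta * xi + eps)

# OLS slope through the origin (x has mean zero by construction):
ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
print(round(ols, 3), beta + rho_xq * rho_eq * sigma)  # both ~0.25
```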
In addition, this downward bias will be stronger the better IEG ratings track true project quality, i.e. the larger is $\rho_{\epsilon q}$. This is nothing more than the standard intuition for omitted variable bias from introductory econometrics. While the direction of the omitted variable bias is obvious, the more interesting question that we address in this section concerns its size. Specifically, how large is the omitted variable bias relative to the OLS point estimate of $\beta$? Is this bias sufficiently large that the true impact of supervision on project outcomes is positive, even though the corresponding OLS point estimate is negative? Is this positive effect statistically significant? The difficulty in answering these questions is that the size of the OLS bias depends on the unknown (and un-estimable) parameters $\rho_{xq}$ and $\rho_{\epsilon q}$, as well as on the variance of the error term, which is also not consistently estimated using OLS in the presence of omitted variable bias.

Footnote 12 (continued): More generally, these biases will also depend on (a) how correlated the variable in question is with the other right-hand-side variables, and (b) how correlated these are with the error term. The other maintained assumption through this section is that the linear specification is appropriate. However, as we have seen, probit and ordered-probit specifications deliver similar results to the linear models in Table 4 and Table 5.

In order to quantify this bias, we use techniques developed in Kraay (2012) to document the consequences for inference of uncertainty about exclusion restrictions in linear instrumental variables (IV) regression models.13 That paper considered the case of IV estimation of a linear regression like Equation (1) in which the instrument, $z_i$, may be correlated with the error term, i.e. $\text{corr}(z_i, \epsilon_i) = \rho_{z\epsilon} \neq 0$, where $\rho_{z\epsilon}$ captures potential violations of the exclusion restriction.
It then showed how to use standard Bayesian techniques to incorporate prior uncertainty about the validity of the exclusion restriction, captured by a non-degenerate prior distribution for $\rho_{z\epsilon}$, into inferences about the structural parameter of interest, $\beta$. These results map directly into the present OLS setting, in which the regressor serves as its own instrument, and the omitted variable problem is reflected in a likely non-zero correlation between $x_i$ and the error term, $\epsilon_i$. We consider a range of prior beliefs about the correlation between $x_i$ and the error term, i.e. about the product $\rho_{xq}\, \rho_{\epsilon q}$. Since the bias in the OLS estimator depends on the product of these two correlations, there is little to be gained from considering them separately. Accordingly, we begin by fixing $\rho_{\epsilon q} = 1$, i.e. this amounts to a benchmark assumption that IEG ratings fully reflect unobserved (by us) project quality. We then specify alternative priors for $\rho_{xq}$, i.e. the correlation between project supervision and project quality. Specifically, we assume that $\rho_{xq}$ has a symmetric, bell-shaped prior over the support $[2\bar{\rho}, 0]$, for different values of the prior mean $\bar{\rho} < 0$. Figure 2 shows examples of such priors, which are symmetric, bell-shaped, centered on $\bar{\rho}$, and have a constant coefficient of variation, i.e. prior uncertainty (as measured by the standard deviation of the prior) is a fixed fraction of the mean of the prior distribution. Using this prior over $\rho_{xq}$, combined with standard diffuse prior assumptions over the remaining parameters, we calculate the posterior distribution of $\beta$, and summarize it with its 2.5th, 50th, and 97.5th percentiles. The results are reported in Table 6, for the coefficients on the potentially-endogenous variables of interest in columns (14) to (16) of Table 4 and columns (1) to (4) of Table 5. Consider for example project supervision costs, which in our simple OLS specifications were negatively and significantly correlated with project outcomes, with a slope coefficient of -0.068.
As discussed earlier, this negative correlation to some extent reflects the fact that problematic projects require greater supervision, and in addition are more likely to ultimately have unsatisfactory outcome ratings. This correlation between unobserved (by us) project quality and project supervision effort is captured by the parameter $\rho_{xq}$.

Footnote 13: For a related approach in a non-Bayesian setting, see Kiviet (2013).

The first column of Table 6 begins with the benchmark assumption that $\rho_{xq} = 0$. Under this assumption, the OLS estimate uncovers the true impact of project supervision costs on project outcomes, and accordingly the posterior distribution of $\beta$ is centered on the OLS estimate, and the 2.5th and 97.5th percentiles of the posterior distribution are comparable to the 95 percent confidence interval reported in Table 4 for this parameter.14 Moving to the right in Table 6, we consider successively more negative average prior beliefs about the correlation between supervision costs and unobserved project quality. For example, the middle column of Table 6 considers the case where our prior is that the correlation between supervision costs and project quality, $\rho_{xq}$, ranges from -0.5 to 0, and is on average -0.25. Under this assumption, the posterior median estimate of the effect of supervision costs on project outcomes is higher than the OLS estimate (reflecting the bias correction implied by Equation (3)), and is now 0.174. Moreover, the posterior 95 percent confidence interval no longer includes zero, suggesting that this estimated impact is significantly positive. We find a similar pattern of results for several of the other potentially-endogenous project characteristics reported in Table 6. For example, the somewhat counterintuitive negative partial correlations between project length, preparation costs, and the potential problem flag become positive with only modestly strong prior beliefs about the importance of unobserved project quality in driving the OLS relationship.
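The family of priors described around Figure 2 (symmetric, bell-shaped, support $[2\bar{\rho}, 0]$, constant coefficient of variation) can be illustrated with a rescaled symmetric Beta distribution; the Beta choice and shape parameter below are our assumptions, not necessarily the ones used in the paper.

```python
import math

# Illustrative sketch of the prior class around Figure 2: take
# rho_xq = 2*mu*B with B ~ Beta(a, a), which is symmetric and bell-shaped
# on [2*mu, 0] with mean mu and a standard deviation proportional to
# |mu|, i.e. a constant coefficient of variation. The Beta(a, a) choice
# is our assumption for illustration only.

def prior_sd(mu: float, a: float) -> float:
    """Std. dev. of rho_xq when rho_xq = 2*mu*B, B ~ Beta(a, a)."""
    beta_sd = math.sqrt(a * a / ((2 * a) ** 2 * (2 * a + 1)))  # sd of Beta(a, a)
    return abs(2 * mu) * beta_sd

for mu in (-0.1, -0.25, -0.4):
    cv = prior_sd(mu, a=8.0) / abs(mu)
    print(mu, round(cv, 3))  # the coefficient of variation is constant in mu
```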
For all three variables, a prior that the correlation between these variables and project quality averages just -0.25, and ranges from -0.50 to 0, is sufficient to retrieve a more plausible significantly positive relationship between these variables and project outcomes. Not surprisingly, the positive estimated effect of restructurings also becomes substantially stronger if our prior belief is that these are features of poor-quality projects. The effect of the problem project flag on project outcomes stands out as requiring a much stronger negative prior correlation with unobserved project quality in order to retrieve a positive effect of this flag on ultimate outcomes. For example, only in the fourth column of Table 6 do we find a significantly positive effect of raising a problem flag in the first half of a project's life on ultimate project outcomes. However, such a strong prior does not seem unreasonable, given that problem project flags are raised specifically in response to difficulties encountered in meeting the development objective of the project. 14 The two confidence intervals are not identical, since those reported in the main results reflect clustering of the standard errors at the country-year level, while this clustering of errors is not taken into account in the analysis of endogeneity here. However, this difference is minimal. For project supervision, for example, the 95 percent confidence interval implied by the results in Table 4 extends from -0.10 to -0.036, while the corresponding confidence interval in the first column of Table 6 extends from -0.097 to -0.040. Overall, these results suggest that only fairly modest prior beliefs about the endogenous response of these project variables to unobserved project quality are needed to overturn the counterintuitive negative partial correlations uncovered by OLS. However, there are three important caveats to this conclusion.
The first caveat is that prior uncertainty about the strength of this endogeneity problem substantially magnifies posterior uncertainty about estimated effects. This can be seen in the substantially-wider confidence intervals as we move to the right in Table 6. This is a natural consequence of accepting, through a formal prior, the fact that we are uncertain about the extent to which these observed project characteristics respond to unobserved project quality. The second caveat is that we have thus far assumed that r_Qε = 1, i.e. that the evaluations of project outcomes perfectly reflect "true" project quality. Since the correlation between the regressor and the error term is equal to the product r_XQ·r_Qε, if we assume that r_Qε < 1, consistent with the plausible assumption of at least some measurement error in the IEG project outcome ratings, we would need to appeal to a much stronger correlation between the regressors and unobserved project quality in order to obtain the results shown in Table 6. The third caveat is perhaps the most important, and is best explained in the context of a specific variable, such as project supervision. We have seen that the negative OLS estimates of the effects of project supervision on project outcomes are likely to be biased downwards, due to the fact that difficult projects are naturally more likely to require more supervision, and at the same time are more likely to receive poor outcome ratings. We have also seen that only modestly strong prior beliefs about the likely endogenous response of supervision to unobserved project quality are sufficient to overturn this negative correlation and generate a statistically-significant positive effect of supervision on project outcomes. However, this does not imply that supervision is sufficiently effective that it can turn "bad" projects into "good" projects, at least on average in the sample of projects we are considering.
To see this, consider the effect of a one-standard-deviation worsening of unobserved project quality, Q, in Equation (2). On the one hand, this worsens measured project outcomes by σε, the standard deviation of the error term. On the other hand, it raises supervision effort by |r_XQ|·σX, where σX is the standard deviation of supervision (recall that r_XQ < 0), which in turn improves project outcomes by β·|r_XQ|·σX. The overall effect of these two factors is β·|r_XQ|·σX - σε. If this expression is negative, then the positive effect of increased supervision is not sufficient to counteract the deterioration in project quality, and project outcomes will be worse despite increased supervision. It is straightforward to show that, as long as (a) β > 0, as is the case for modestly strong priors about r_XQ, and (b) the observed correlation between supervision and outcomes is negative, as it is in the data, then the overall effect of worse project quality on project outcomes is negative, i.e. enhanced supervision alone is on average insufficient to turn "bad" projects into "good" projects.15 The same argument applies to the other potentially endogenous project characteristics analyzed in Table 6. 7. Role of Task Team Leaders As noted above, a great deal of the variation in project outcomes occurs across projects within individual countries. So far, we have investigated the contribution of a range of project characteristics and early-warning signals to explaining this variation. However, our efforts in this respect have been only modestly successful, in that the project-level variables explain only a small part of the very substantial project-level variation in outcomes. In this section, we investigate another potentially-important factor driving project performance, which is the identity of the task team leader (TTL) responsible for the project. We have data on the staff identification number of the World Bank task team leader at the time of completion, for 3,925 of the 4,191 projects included in our sample of projects evaluated between 1995 and 2011.
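The sign of this net effect can be verified with a line or two of arithmetic. The numbers below are hypothetical stand-ins chosen only to satisfy the two conditions in the text (a positive structural effect and a negative observed supervision-outcome covariance); they are not the paper's estimated moments.

```python
# Hypothetical moments: structural slope beta, correlation between
# supervision and unobserved quality r_xq, and standard deviations of
# supervision (s_x) and of the outcome-equation error (s_e).
beta, r_xq, s_x, s_e = 0.17, -0.25, 1.0, 1.0

# Observed covariance between supervision and outcomes: beta*Var(X)
# plus the endogeneity term. Condition (b): negative in the data.
cov_xy = beta * s_x**2 + r_xq * s_x * s_e
assert cov_xy < 0

# Net effect of a one-s.d. worsening of project quality: induced extra
# supervision (beta*|r_xq|*s_x) minus the direct deterioration (s_e).
net = beta * abs(r_xq) * s_x - s_e
assert net < 0  # supervision response does not fully offset worse quality
```

The inequality holds for any moments satisfying the two conditions: a negative covariance implies β·σX < |r_XQ|·σε, and multiplying through by |r_XQ| ≤ 1 gives β·|r_XQ|·σX < σε.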
In addition, for a smaller set of 3,187 projects, we also have information on the identity of the TTL at the time of each Implementation Status Report (ISR) for each project. This gives us a time series on the identity of the TTL over the life of each project. We begin with some simple data description to motivate the potential role of TTL effects in accounting for the variation in project performance. We first restrict our sample to projects managed by TTLs who have been responsible for at least two projects, and in at least two different countries, so that we can potentially differentiate TTL effects from country effects. This results in a sample of 2,407 projects where the TTL at completion has worked on at least two projects, and in more than one country. A total of 711 TTLs and 137 countries are represented in this sample. In this sample, country fixed effects account for only 17 percent of the variation in project outcomes, consistent with what we saw earlier for the full set of projects. The top panel of Table 7 considers the additional role of TTL fixed effects in accounting for project outcomes. It uses a standard two-way ANOVA table to decompose the variation in project outcomes in this sample into variation due to TTL and country effects. Specifically, the ANOVA table corresponds to a regression of project outcomes on a full set of country dummies and a full set of TTL dummies. 15 To see this, note that the covariance between supervision and project outcomes is β·σX² + r_XQ·σX·σε, which is negative in the data. As long as β > 0, this implies β·σX < |r_XQ|·σε, and hence β·|r_XQ|·σX < |r_XQ|²·σε ≤ σε. Not surprisingly, adding 710 additional dummy variables to capture TTL effects results in a substantial increase in model fit -- the model now accounts for 47 percent of the variation in project outcomes. More interestingly, TTL effects are jointly highly statistically significant.
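The two-way decomposition just described amounts to regressing outcomes on country dummies with and without TTL dummies added, and reading the incremental fit off the R-squared. A minimal sketch on simulated data (the sample sizes, effect sizes, and variable names below are invented for illustration, not taken from the paper's sample):

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(y, X):
    """R-squared from an OLS regression of y on X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# Simulated stand-in for the project sample: each project has a country
# and a TTL, and outcomes load on both sets of effects plus noise.
n, n_country, n_ttl = 2000, 50, 400
country = rng.integers(0, n_country, size=n)
ttl = rng.integers(0, n_ttl, size=n)
y = (rng.normal(size=n_country)[country]
     + rng.normal(size=n_ttl)[ttl]
     + rng.normal(scale=2.0, size=n))

# Dummy matrices (one category dropped to avoid the intercept trap).
D_c = (country[:, None] == np.arange(1, n_country)).astype(float)
D_t = (ttl[:, None] == np.arange(1, n_ttl)).astype(float)

r2_country = r_squared(y, D_c)                        # country effects only
r2_both = r_squared(y, np.column_stack([D_c, D_t]))   # add TTL effects
```

Because the country-only model is nested in the two-way model, `r2_both` can only exceed `r2_country`; the question in the text is whether the increment, and the associated F-test on the TTL dummies, is large.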
In terms of magnitudes, the TTL effects are also substantial: the mean squared variation of TTL effects (which adjusts for the fact that there are more than four times as many TTL dummies as country dummies) is about two-thirds the size of the mean squared variation of country effects. The bottom panel of Table 7 considers the potential distinct effects of the TTL at the time of project approval and the TTL at the time of project completion. In this panel, we further restrict attention to a much smaller set of 846 projects where we have information on the initial and final TTLs, and in addition, the initial and final TTLs are different people, so that we can separately identify initial and final TTL effects. This sample consists of projects involving 510 distinct initial TTLs, 232 distinct final TTLs, and implemented in 47 different countries. In this sample, both initial and final TTL effects are highly significant (at the one and five percent levels, respectively), while country effects are no longer significant. In terms of magnitudes, as measured by mean squared variation, initial TTL effects are slightly more important than final TTL effects, and nearly twice as important as country effects. Overall, the evidence in this table is consistent with the idea that the identity of the TTL plays an important role in accounting for project outcomes, on the same order of magnitude as country effects. We investigate the role of TTL effects more systematically in Table 8. We begin by defining a proxy for TTL quality as the average project outcome rating for all of the other projects managed by the same TTL, excluding the project in question. To capture country effects, we consider, as before, the average over the life of the project of the country-level CPIA score. In column (1) we consider a parsimonious regression of project outcomes on these two variables.
Consistent with the ANOVA tables discussed above, both TTL quality and country quality are highly significant correlates of project outcomes. Moreover, their standardized magnitudes are also quite similar. To see this, note that moving from the 25th percentile of projects to the 75th percentile of projects as ranked by the CPIA score during project implementation is an increase of 0.5 points on the 6-point CPIA scale (from 3.1 to 3.6). The same movement from the 25th to the 75th percentile of projects as ranked by TTL quality is an increase of 1.25 points (from 3.5 to 4.75). Combining these changes in country and TTL quality with the estimated coefficients implies changes in project outcomes of 0.22 and 0.23 points, respectively. In column (2) we consider an alternative proxy for TTL quality, based on the average performance of the TTL on all projects prior to the one in question. The sample size is considerably smaller, since out of necessity we drop the first project for each TTL in the dataset. However, we still find a highly significant effect of TTL quality, and the magnitude of this effect is similar to the benchmark results in the first column. A potential shortcoming of the results so far is that we are assessing TTL quality based on the performance of projects for which the individual was the TTL at the time of project completion. This discards information on projects the individual managed at some point during the life of the project, but not at the time of completion. To remedy this, we draw on the detailed information we have on the identity of the TTL of projects at the time of each ISR. Specifically, we measure TTL quality as a weighted average of the outcomes on each of the projects which s/he managed, with weights proportional to the number of ISRs on which the individual is identified as the TTL.
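The leave-one-out proxy for TTL quality used in column (1) can be computed in a single pass over the data. The sketch below uses a hypothetical (project_id, ttl_id, outcome) record layout standing in for the paper's dataset:

```python
from collections import defaultdict

def leave_one_out_quality(projects):
    """For each project, the average outcome of all *other* projects
    managed by the same TTL (None when the TTL has no other projects).

    `projects` is an iterable of (project_id, ttl_id, outcome) tuples --
    a hypothetical schema, not the paper's actual data layout."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, ttl, outcome in projects:
        sums[ttl] += outcome
        counts[ttl] += 1
    quality = {}
    for pid, ttl, outcome in projects:
        if counts[ttl] > 1:
            # Subtract the project's own outcome before averaging.
            quality[pid] = (sums[ttl] - outcome) / (counts[ttl] - 1)
        else:
            quality[pid] = None
    return quality

sample = [("p1", "ttl_a", 5), ("p2", "ttl_a", 3), ("p3", "ttl_a", 4),
          ("p4", "ttl_b", 6)]
q = leave_one_out_quality(sample)
# q["p1"] == 3.5 (average of 3 and 4); q["p4"] is None (no other projects)
```

The exclusion of the project's own rating is what keeps the proxy from mechanically correlating with the dependent variable; the ISR-weighted variant in column (3) replaces the simple average with one weighted by ISR counts.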
The results, shown in column (3), again indicate a very strong TTL effect on project outcomes, with a magnitude similar to that in the previous two columns. We can also use the detailed information on the identity of the TTL at the time of each ISR over the life of the project to assess whether the TTL at the beginning of the project has a different effect on the ultimate outcome than the TTL at the end of the project. For all projects for which we have a full ISR sequence, we take the quality of the TTL at the beginning and at the end of the project, and enter both variables together. Again, we find highly significant partial correlations between initial and final TTL quality and ultimate project outcomes. However, there is no discernible difference in the size of the estimated effect of initial and final TTL quality. In column (5) we investigate the role of TTL turnover. A common anecdotal concern is that excessive TTL turnover leads to worse project outcomes, as information and continuity are lost each time a new TTL assumes responsibility for the project. Our data on the time series of TTLs by project allows us to investigate this hypothesis quantitatively. Specifically, for each project, we calculate the number of TTLs observed over the course of its ISR sequence. We then normalize this by the total number of ISRs for the project, to adjust for differences in project length. This measure of TTL turnover is strongly significantly negatively associated with project outcomes. The estimated magnitude of the effect is also non-trivial. To put this in perspective, note that the median project in this sample has two TTLs, lasts 6 years, and has 12 ISR reports. Moving from two to three TTLs for this typical project means that our measure of TTL turnover increases from 2/12 to 3/12, i.e. by 0.08. Combining this with the estimated coefficient implies a worsening in project outcomes of 0.10 points.
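The turnover measure just described is the count of distinct TTLs over a project's ISR sequence divided by the number of ISRs. A minimal sketch, with a slope of roughly -1.2 backed out from the worked example in the text (0.10 points per 1/12 increase in turnover) rather than read from Table 8:

```python
def ttl_turnover(isr_ttls):
    """Distinct TTLs observed over a project's ISR sequence, divided by
    the number of ISRs (the turnover measure described in the text)."""
    if not isr_ttls:
        raise ValueError("project has no ISRs")
    return len(set(isr_ttls)) / len(isr_ttls)

# The text's median project has 12 ISRs; compare two vs. three TTLs.
two = ttl_turnover(["a"] * 6 + ["b"] * 6)                 # 2/12
three = ttl_turnover(["a"] * 6 + ["b"] * 3 + ["c"] * 3)   # 3/12
# Back-of-envelope slope implied by the text's example (not the
# coefficient reported in Table 8): about -1.2 outcome points per
# unit of turnover, so one extra TTL costs roughly 0.10 points.
effect = -1.2 * (three - two)
```

Normalizing by the ISR count is what keeps long projects, which mechanically accumulate more TTLs, from dominating the measure.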
For comparison purposes, this is roughly half the estimated impact of moving from the first to the third quartile of either TTL or country quality.16 Finally, we have also assembled data on the identity of the IEG staff member responsible for the evaluation of the project, for a subset of projects in the post-1995 period.17 This enables us to assess possible evaluator effects in driving project outcomes. It is possible, for example, that some evaluators are "tough" in the sense of giving on average lower scores to projects than their colleagues who are "easy". And to the extent that there is any purposeful matching of "tough" evaluators with high- (or low-) quality TTLs, this could bias our estimates of the role of TTL quality in driving project outcomes. We measure evaluator "toughness" analogously to TTL quality, by calculating the average project outcome rating on all the other projects scored by the same evaluator, excluding the project in question. The results in column (6) suggest a significant effect of evaluator "toughness" on project ratings. However, when in columns (7) and (8) we add all of the other explanatory variables discussed earlier in the paper, this apparent effect of evaluator "toughness" vanishes. A likely explanation for this is that evaluators tend to specialize in projects in different sectors, with different average project success rates. More importantly, in columns (7) and (8) we find that even after controlling for all the preceding correlates of project quality, TTL quality remains a highly significant correlate of project outcomes. While the estimated magnitudes of the effects of TTL quality and country quality (as proxied by the CPIA) decline somewhat as we add more control variables (compare for example columns (1) and (7) in Table 8), their relative magnitude remains similar across specifications.
And as noted earlier, in standardized terms this suggests that the effect of an improvement in TTL quality on project outcomes is of a similar order of magnitude to the effect of better country quality on project outcomes. 16 As with several of the previous results we have discussed, there is a potentially important endogeneity problem here as well. To the extent that the identity of the TTL of the project changes in response to some unobserved factors driving poor performance (for example, management appointing a new TTL in the hopes of turning around a problematic project), we would expect the OLS results to yield a downward-biased estimate of the effects of turnover on project performance. 17 These data come from two sources. For all projects since 1995 subject to IEG desk reviews, we received anonymized data on the identity of the desk reviewer from IEG's internal evaluation database. IEG does not, however, electronically track the identity of the IEG staff responsible for the more detailed Project Performance Audit Reports (PPARs). We therefore manually coded this information by retrieving the documents from the Bank's electronic archives for 1,150 PPARs completed since 1995. We then constructed dummy variables for the identity of the reviewer based on the full name of the reviewer for the PPARs, and on the anonymized data for the IEG desk reviews. While there is overlap between these two groups of reviewers, we unfortunately cannot take this into account in the construction of the reviewer dummies since the latter group is anonymized. We therefore implicitly assume that the same IEG staff member applies potentially different standards in their desk reviews and their more detailed PPAR reviews. Overall, the results in this section suggest that there is an important "human factor" in driving project outcomes that is associated with TTL characteristics. There are, however, at least two important caveats to this finding.
The first is that observing TTL effects in project outcomes does not tell us specifically which characteristics of project managers are responsible for the variation we observe in average project outcomes across TTLs. This suggests that there is considerable scope for further analysis to identify characteristics, such as education, experience, and career path, that might matter for project outcomes. It is even possible that these TTL effects do not reflect TTL effort or qualifications, but rather the process of assignment of projects to TTLs. For example, it is possible that some TTLs are "well-connected" and use their influence to choose "easy" projects that are more likely to have good outcomes, or to set "unambitious" development objectives that are then easy to attain.18 The second caveat is that TTL effects surely are not the only "human factor" driving project outcomes. Quite plausibly, there may also be variation in project outcomes associated with differences in managerial quality at levels above that of the task manager, for example reflecting the abilities and priorities of the relevant country or regional managers in the World Bank's hierarchy. For lack of systematic data, we have also not studied the role of counterparts in the country in which the project is located. Anecdotally at least, the identity and skill-set of the specific counterpart agency, or even of individuals in these agencies, matters a great deal for project outcomes as well. 8. Interpretation, Implications, and Conclusions We have analyzed correlates of project outcomes for a very large set of World Bank projects since the early 1980s, distinguishing between country-level correlates of country-average project performance, and project-level correlates of the variation in project outcomes within countries. This distinction is important, as roughly 80 percent of the variation in project outcomes occurs across projects within countries, rather than between countries.
Consistent with the existing literature, we find that country-level variables, most notably the CPIA measure of policy and institutional quality, are robust partial correlates of country-level variation in project performance. This basic finding underscores the importance of country-level selectivity in aid allocation. In the case of the World Bank, this country-level selectivity is primarily implemented through the Performance Based Allocation system for IDA resources, which emphasizes "macro" country-level measures of policy and institutional quality such as the CPIA in determining the cross-country allocation of aid. 18 On this last point, however, there are some safeguards in place. During the concept stage, project teams are routinely questioned about the "realism" of project objectives, and this is further reviewed during the project appraisal stage. In addition, in recent years at least, Bank staff from the Quality Assurance Group (QAG) with global expertise in project preparation and implementation review the project and advise the team and management about various aspects of it, including its development objectives. However, since most of the variation in project outcomes occurs within countries across projects, the bulk of our effort is devoted to exploring project-level "micro" variables that could potentially account for some of this variation. For example, we find that restructured projects perform better than average following their restructuring, underscoring the effectiveness of this particular intervention to turn around underperforming projects. On the other hand, we consistently find a statistically significant negative partial correlation between project preparation and supervision expenditures and project outcomes, as well as significant negative partial correlations of project effectiveness delays, and of the early-warning indicators flagging "problem" and "potential problem" projects, with ultimate project outcomes.
Interpreting these negative correlations is complicated by the fact that difficult projects are both more likely to be unsuccessful in attaining their development objectives, and also more likely to require greater preparation and supervision and to trigger early warning flags and accompanying remedial actions. Convincingly addressing this endogeneity problem is difficult, but we have shown that only modestly-strong prior beliefs about the strength of the feedback effect from project quality to preparation and supervision, for example, are sufficient to generate an intuitively-plausible significant positive estimated effect of these interventions on project outcomes. However, these estimated effects are not sufficiently large that intrinsically "bad" projects can -- on average -- be turned around to yield successful outcomes through greater preparation or supervision. This suggests that there are returns to (a) improving the process of identifying and selecting projects at the very beginning of the project cycle, as well as (b) strengthening supervision and responses to problem project flags, including potentially more frequent use of project restructurings. Some of our findings also call into question some of the conventional wisdom regarding determinants of project outcomes. For example, a commonly-held view is that more complex projects are less likely to turn out to be successful. Yet, of the three proxies for project complexity that we have studied, we find only some evidence that larger -- and so possibly more complex -- projects are less likely to be successful. On the other hand, greater dispersion of a project across sectors is in fact significantly associated with better project outcomes, and whether a project is a "repeater" project or not does not seem to matter much for outcomes. Another finding with important policy implications is that task team leader characteristics are significantly correlated with project outcomes.
Simple analysis of variance suggests that task team leader fixed effects are of comparable importance to country fixed effects in accounting for the variation in project outcomes observed in the data. More specifically, task team leader quality, as proxied by average project outcomes in the rest of a task team leader's portfolio, is strongly significantly correlated with project outcomes. One immediate policy implication of this finding is the importance of internal practices to develop and propagate task manager skills in order to ensure better project outcomes. More generally, there may be returns to taking task team leader characteristics into account when making aid allocation decisions, to complement the current practice of allocating aid based on country characteristics. One approach might be to consider strengthening incentives for task team leaders with a record of success to work in countries where average project performance has been poor. Another might be to consider allocating aid in part to task team leaders rather than countries. For example, a portion of total World Bank assistance could be allocated through a fund to which task team leaders could submit "proposals" for development projects, much in the same way that researchers submit grant proposals to finance research projects. The criteria for judging proposals could then explicitly consider not just the usual merits of the project itself (including the characteristics of the country in which the project is to be implemented, the consistency of the project with the country's development strategy, the degree of country "ownership" of the project, etc.), but also the track record of the task team leader proposing the project. This would give decisionmakers flexibility to consider an appropriate weighting of project and task team leader characteristics when selecting projects for financing. 
The final policy implication comes from the humbling fact that, even after accounting for a wide range of micro and macro variables, much of the variation in project performance remains unexplained: in our core specifications we can account for only between 13 and 16 percent of the variation in measured project outcomes. Part of this low explanatory power may simply be due to measurement error in the IEG assessments of project outcomes, pointing to the importance of developing more robust tools for capturing project performance. But at the same time, much of this variation is likely to be real, reflecting a wide range of as-yet-unmeasured factors at both the country and project levels. Developing empirical proxies for these other factors, and thinking creatively about how to use them to design selectivity at both the country and project levels, will ultimately help to improve overall aid effectiveness, not just for the World Bank, but for other aid donors that finance and implement project-based aid as well. References Arndt, Channing, Sam Jones, and Finn Tarp (2010). "Aid, Growth, and Development: Have We Come Full Circle?" Journal of Globalization and Development. 1(2): 1-27. Burnside, Craig, and David Dollar (2000). "Aid, Policies, and Growth". American Economic Review. 90(4): 847-868. Chauvet, Lisa, Paul Collier, and Marguerite Duponchel (2010). "What Explains Aid Project Success in Post-Conflict Situations?". World Bank Policy Research Working Paper No. 5418. Chauvet, Lisa, Paul Collier, and Andreas Fuster (2006). "Supervision and Project Performance: A Principal-Agent Approach". Manuscript, DIAL. Clemens, Michael, Steven Radelet, and Rikhil Bhavnani (2012). "Counting Chickens When They Hatch: Timing and the Effects of Aid on Growth". The Economic Journal. 122: 590-619. Deininger, Klaus, Lyn Squire, and Swati Basu (1998). "Does Economic Analysis Improve the Quality of Foreign Assistance?". World Bank Economic Review.
12(3): 385-418. Dollar, David, and Jakob Svensson (2000). "What Explains the Success or Failure of Structural Adjustment Programmes?". The Economic Journal. 110: 894-917. Dollar, David, and Victoria Levin (2005). "Sowing and Reaping: Institutional Quality and Project Outcomes in Developing Countries". World Bank Policy Research Working Paper No. 3524. Doucouliagos, Hristos, and Martin Paldam (2009). "The Aid Effectiveness Literature: The Sad Results of 40 Years of Research". Journal of Economic Surveys. 23(3): 433-461. Dreher, Axel, Stephan Klasen, James Raymond Vreeland, and Eric Werker (2010). "The Costs of Favouritism: Is Politically-Driven Aid Less Effective?". CESifo Working Paper No. 2993. Easterly, William, Ross Levine, and David Roodman (2004). "Aid, Policies, and Growth: A Comment". American Economic Review. 94(3): 774-780. Gelb, Alan (2010). "How Can Donors Create Incentives for Results and Flexibility for Fragile States? A Proposal for IDA". Center for Global Development Working Paper No. 227. Guillaumont, Patrick, and Rachid Laajaj (2006). "When Instability Increases the Effectiveness of Aid Projects". World Bank Policy Research Working Paper No. 4034. Hansen, Henrik, and Finn Tarp (2000). "Aid Effectiveness Disputed". Journal of International Development. 12: 375-398. Isham, Jonathan, and Daniel Kaufmann (1999). "The Forgotten Rationale for Policy Reform: The Productivity of Investment Projects". Quarterly Journal of Economics. 114(1): 149-184. Isham, Jonathan, Daniel Kaufmann, and Lant Pritchett (1997). "Civil Liberties, Democracy, and the Performance of Government Projects". World Bank Economic Review. 11(2): 219-242. Khwaja, Asim Ijaz (2009). "Can Good Projects Succeed in Bad Communities?" Journal of Public Economics. 93: 899-916. Kiviet, Jan (2013). "Identification and Inference in a Simultaneous Equation Under Alternative Information Sets and Sampling Schemes". The Econometrics Journal. 16: S24-S59. Kilby, Christopher (2000).
"Supervision and Performance: The Case of World Bank Projects". Journal of Development Economics. 62: 233-259. Kilby, Christopher (2011). "The Political Economy of Project Preparation: An Empirical Analysis of World Bank Projects". Villanova School of Business Economics Working Paper No. 14. Kilby, Christopher (2012). "Assessing the Contribution of Donor Agencies to Aid Effectiveness: The Impact of World Bank Preparation on Project Outcomes". Villanova School of Business Economics Working Paper No. 20. Kraay, Aart (2012). "Instrumental Variables Regressions with Uncertain Exclusion Restrictions: A Bayesian Approach". Journal of Applied Econometrics. 27: 108-128. Minoiu, Camelia, and Sanjay Reddy (2009). "Development Aid and Economic Growth: A Positive Long-Run Relation". IMF Working Paper 09/118. Pohl, Gerhard, and Dubravko Mihaljek (1998). "Project Evaluation and Uncertainty in Practice: A Statistical Analysis of Rate-of-Return Divergences in 1,015 World Bank Projects". World Bank Economic Review. 6(2): 255-257. Rajan, Raghuram, and Arvind Subramanian (2008). "Aid and Growth: What Does the Cross-Country Evidence Really Show?" Review of Economics and Statistics. 90(4): 643-665. Roodman, David (2007). "The Anarchy of Numbers: Aid, Development, and Cross-Country Empirics". World Bank Economic Review. 21(2): 255-277. Temple, Jonathan (2010). "Aid and Conditionality", in Rodrik, D., and Rosenzweig, M. (eds.), Handbook of Development Economics, Volume 5. Elsevier. 4415-4523. Wane, Waly (2004). "The Quality of Foreign Aid: Country Selectivity or Donor Incentives?". World Bank Policy Research Working Paper No. 3325. World Bank (2010). "Cost-Benefit Analysis in World Bank Projects". Independent Evaluation Group.
Figure 1: Average Satisfactory Ratings Over Time, By Type of Evaluation

[Figure: three series plotted by evaluation year, 1980-2015, on a 0-1 scale: Detailed IEG Review, IEG Desk Review, and Implementation Completion Report.]

Notes: This figure shows the average evaluation score (on a 0=unsuccessful/1=successful scale) across all projects corresponding to the indicated evaluation type, by year in which the evaluation occurred.

Figure 2: Priors for Omitted Variable Bias

[Figure: prior densities for fmin = -0.25, -0.50, -0.75, and -1.00, plotted against the correlation between regressor and error term (r) over the range -1 to 0.]

Notes: This figure shows alternative prior densities for the correlation between the regressor and the error term (r). The priors are Beta[5,5] densities over the support [fmin, 0], for the indicated values of fmin. More negative values of fmin correspond to stronger prior beliefs that observed project variables respond to unobserved project quality.

Table 1: Distribution of Projects Across Sectors

                        1983-2011 Sample       1995-2011 Sample
                        Number   % of Total    Number   % of Total
Agriculture             1,263    19.23         524      12.5
Transport               770      11.72         469      11.19
Public Administration   974      14.83         892      21.28
Energy                  691      10.52         388      9.26
Education               607      9.24          428      10.21
Finance                 429      6.53          245      5.85
Water                   456      6.94          330      7.87
Industry                477      7.26          325      7.75
Health                  525      7.99          475      11.33
Other                   377      5.74          115      2.74
Total                   6,569    100           4,191    100

Note: This table reports the distribution of the number of World Bank projects across the 10 indicated major sectors. The two sets of columns refer to projects evaluated between 1983 and 2011, and between 1995 and 2011.

Table 2: Summary Statistics

Notes: This table reports summary statistics on measured project outcomes (on a 0/1 and 1-6 scale), as well as summary statistics on all of the correlates of project outcomes reported in Table 3, Table 4, and Table 5. The two sets of columns refer to projects evaluated between 1983 and 2011, and between 1995 and 2011.
Table 3: Country-Level Variables and Project Outcomes

Panel A: All Projects Evaluated 1983-2011 (dependent variable: Sat/Unsat)

                                    (1)        (2)        (3)        (4)        (5)
Real GDP Per Capita Growth       3.036***                           1.915***   6.832***
                                 (13.82)                            (8.53)     (8.40)
CPIA Rating                                 0.168***                0.118***   0.379***
                                            (15.77)                 (9.70)     (9.31)
Freedom House Rating                                    0.0179***   0.00434    0.00634
                                                        (3.64)      (0.99)     (0.43)
Number of Observations            6569       6569       6569        6569       6569
R-Squared                         0.104      0.112      0.071       0.122

Panel B: All Projects Evaluated 1995-2011 (dependent variable: 1-6 Rating)

                                    (6)        (7)        (8)        (9)        (10)
Real GDP Per Capita Growth       8.147***                           4.839***   4.567***
                                 (10.25)                            (6.36)     (6.15)
CPIA Rating                                 0.660***                0.533***   0.513***
                                            (15.97)                 (10.81)    (10.60)
Freedom House Rating                                    0.0607***   0.0143     0.0101
                                                        (3.31)      (0.88)     (0.66)
Number of Observations            4191       4191       4191        4191       4191
R-Squared                         0.103      0.132      0.072       0.143

Sector Dummies                      Y          Y          Y           Y          Y
Sector x Evaluation Period Dummies  Y          Y          Y           Y          Y
Sector x Approval Period Dummies    Y          Y          Y           Y          Y
Estimation Method                  OLS        OLS        OLS         OLS     Probit/O-Probit

Note: *** (**) (*) denotes significance at the 1 (5) (10) percent level. T-statistics based on heteroskedasticity-consistent standard errors clustered at the country-evaluation-year level are reported in parentheses. All regressions are estimated pooling all projects in the indicated sample, and including sector fixed effects, sector times approval-period fixed effects, and sector times evaluation-period fixed effects. Panel A refers to the full set of projects evaluated over the period 1983-2011, and Panel B refers to the subset of projects evaluated between 1995 and 2011.
Table 4: Project-Level Variables and Project Outcomes

Panel A: All Projects Evaluated 1983-2011 (dependent variable: Sat/Unsat)

                                       (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
Real GDP Per Capita Growth          1.921***   1.945***   1.913***   1.918***   1.932***   1.912***   1.899***   1.928***   6.949***
                                    (8.56)     (8.67)     (8.52)     (8.52)     (8.56)     (8.54)     (8.51)     (8.54)     (8.36)
CPIA Rating                         0.118***   0.119***   0.118***   0.119***   0.116***   0.116***   0.111***   0.112***   0.364***
                                    (9.70)     (9.81)     (9.69)     (9.54)     (9.55)     (9.41)     (8.86)     (8.90)     (8.68)
Freedom House Rating                0.00431    0.00400    0.00441    0.00435    0.00419    0.00404    0.00412    0.00320    0.00247
                                    (0.99)     (0.92)     (1.01)     (0.99)     (0.95)     (0.93)     (0.94)     (0.73)     (0.17)
Dummy for Investment Projects      -0.0122                                                                       0.0489*    0.170*
                                   (-0.51)                                                                       (1.73)     (1.90)
Share of Project in Largest Sector            -0.00105***                                                       -0.00111*** -0.00347***
                                              (-3.15)                                                           (-3.31)    (-3.04)
Dummy for Repeater Projects                               0.00747                                                0.00323    0.0183
                                                          (0.59)                                                 (0.25)     (0.43)
Log(Total Project Size)                                             -0.00269                                    -0.0486*** -0.192***
                                                                    (-0.50)                                     (-4.46)    (-4.80)
Project Length (years)                                                         -0.0132***                       -0.00523   -0.0161
                                                                               (-3.46)                          (-1.12)    (-1.04)
Log(Preparation Costs/Total Size)                                                         -0.00634              -0.00664   -0.0174
                                                                                          (-1.20)               (-0.83)    (-0.63)
Log(Supervision Costs/Total Size)                                                                    -0.0143*** -0.0479*** -0.194***
                                                                                                     (-2.82)    (-4.55)    (-4.52)
Number of Observations               6569       6569       6569       6569       6569       6569       6569       6569       6569
R-Squared                            0.122      0.124      0.122      0.122      0.124      0.122      0.124      0.130
Estimation Method                    OLS        OLS        OLS        OLS        OLS        OLS        OLS        OLS       Probit

Panel B: All Projects Evaluated 1995-2011 (dependent variable: 1-6 Rating)

                                       (10)       (11)       (12)       (13)       (14)       (15)       (16)       (17)       (18)
Real GDP Per Capita Growth          4.967***   4.964***   4.839***   4.853***   4.878***   4.918***   4.897***   5.008***   4.747***
                                    (6.53)     (6.48)     (6.36)     (6.41)     (6.43)     (6.55)     (6.55)     (6.60)     (6.40)
CPIA Rating                         0.531***   0.543***   0.532***   0.523***   0.527***   0.509***   0.486***   0.499***   0.483***
                                    (10.75)    (11.01)    (10.80)    (10.27)    (10.71)    (10.19)    (9.56)     (9.77)     (9.72)
Freedom House Rating                0.0141     0.0127     0.0145     0.0155     0.0126     0.0148     0.0176     0.0108     0.00632
                                    (0.87)     (0.79)     (0.89)     (0.95)     (0.77)     (0.92)     (1.10)     (0.67)     (0.41)
Dummy for Investment Projects      -0.203***                                                                     0.0771     0.115
                                   (-2.74)                                                                       (0.81)     (1.28)
Share of Project in Largest Sector            -0.00329***                                                       -0.00250*** -0.00305***
                                              (-3.61)                                                           (-3.28)    (-2.85)
Dummy for Repeater Projects                               0.0143                                                -0.0126    -0.0286
                                                          (0.31)                                                (-0.27)    (-0.64)
Log(Total Project Size)                                              0.0193                                     -0.136***  -0.132***
                                                                     (1.06)                                     (-3.72)    (-3.74)
Project Length (years)                                                         -0.0640***                       -0.0307**  -0.0443***
                                                                               (-5.38)                          (-2.11)    (-3.26)
Log(Preparation Costs/Total Size)                                                         -0.0500***            -0.0419    -0.0458*
                                                                                          (-2.68)               (-1.46)    (-1.67)
Log(Supervision Costs/Total Size)                                                                    -0.0681*** -0.137***  -0.134***
                                                                                                     (-4.20)    (-3.93)    (-3.97)
Number of Observations               4191       4191       4191       4191       4191       4191       4191       4191       4191
R-Squared                            0.144      0.146      0.143      0.143      0.148      0.145      0.147      0.156
Estimation Method                    OLS        OLS        OLS        OLS        OLS        OLS        OLS        OLS       O-Probit

All columns in both panels include sector dummies, sector x evaluation period dummies, and sector x approval period dummies.

Note: *** (**) (*) denotes significance at the 1 (5) (10) percent level. T-statistics based on heteroskedasticity-consistent standard errors clustered at the country-evaluation-year level are reported in parentheses. All regressions are estimated pooling all projects in the indicated sample, and including sector fixed effects, sector times approval-period fixed effects, and sector times evaluation-period fixed effects. Columns (1)-(9) refer to the full set of projects evaluated over the period 1983-2011, and Columns (10)-(18) refer to the subset of projects evaluated between 1995 and 2011.
Table 5: Early Warning Indicators

All Projects Evaluated 1995-2011 (dependent variable: 1-6 Rating)

                                        (1)         (2)         (3)         (4)         (5)         (6)
Real GDP Per Capita Growth           5.618***    5.910***    5.525***    5.830***    5.363***    5.055***
                                     (5.71)      (5.96)      (5.64)      (5.90)      (5.52)      (5.38)
CPIA Rating                          0.457***    0.453***    0.407***    0.419***    0.389***    0.400***
                                     (6.98)      (6.90)      (6.27)      (6.25)      (5.89)      (6.32)
Freedom House Rating                 0.0226      0.0227      0.0269      0.0256      0.0277      0.0200
                                     (1.25)      (1.25)      (1.48)      (1.40)      (1.52)      (1.15)
Dummy for Investment Projects        0.0608      0.0312     -0.0254      0.0111     -0.00585     0.0285
                                     (0.46)      (0.23)     (-0.19)      (0.08)     (-0.04)      (0.23)
Share of Project in Largest Sector  -0.00313*** -0.00347*** -0.00353*** -0.00343*** -0.00339*** -0.00289***
                                    (-2.91)     (-3.26)     (-3.32)     (-3.23)     (-3.17)     (-2.83)
Dummy for Repeater Projects          0.0262      0.0187      0.000206    0.0195      0.00690     0.00146
                                     (0.45)      (0.32)      (0.00)      (0.33)      (0.12)      (0.03)
Log(Total Project Size)             -0.236***   -0.220***   -0.170***   -0.211***   -0.181***   -0.185***
                                    (-4.13)     (-3.88)     (-3.00)     (-3.72)     (-3.15)     (-3.42)
Project Length (years)               0.00915    -0.00359     0.0194      0.00196     0.0261      0.0103
                                     (0.50)     (-0.20)      (1.09)      (0.11)      (1.43)      (0.59)
Log(Preparation Costs/Total Size)   -0.0247     -0.0258     -0.0343     -0.0245     -0.0337     -0.0370
                                    (-0.77)     (-0.81)     (-1.11)     (-0.78)     (-1.08)     (-1.22)
Log(Supervision Costs/Total Size)   -0.256***   -0.245***   -0.188***   -0.238***   -0.192***   -0.197***
                                    (-4.04)     (-3.86)     (-2.95)     (-3.74)     (-3.01)     (-3.23)
Time from Approval to First         -0.145***                                       -0.0990**   -0.0977**
  Disbursement (quarters)           (-3.33)                                         (-2.27)     (-2.36)
Dummy for Restructuring During                   0.103                               0.253**     0.276***
  First Half of Project                          (0.94)                              (2.31)      (2.64)
Dummy for Problem Project Flag                              -0.392***               -0.387***   -0.351***
  During First Half of Project                              (-7.52)                 (-7.37)     (-7.42)
Dummy for Potential Problem Flag                                        -0.138**    -0.0940     -0.0639
  During First Half of Project                                          (-2.34)     (-1.59)     (-1.18)
Number of Observations               3283        3283        3283        3283        3283        3283
R-Squared                            0.160       0.157       0.174       0.159       0.178
Estimation Method                    OLS         OLS         OLS         OLS         OLS        O-Probit

All columns include sector dummies, sector x evaluation period dummies, and sector x approval period dummies.

Note: *** (**) (*) denotes significance at the 1 (5) (10) percent level. T-statistics based on heteroskedasticity-consistent standard errors clustered at the country-evaluation-year level are reported in parentheses. All regressions are estimated pooling all projects in the indicated sample, and including sector fixed effects, sector times approval-period fixed effects, and sector times evaluation-period fixed effects.

Table 6: Robustness to Endogeneity

Range of prior values for the correlation between regressor and error term (r):

                                                     [0,0]   [-0.25,0]  [-0.50,0]  [-0.75,0]  [-1.00,0]
1) Project Length (years)
   P025                                             -0.087     -0.029      0.017      0.057      0.098
   P50                                              -0.064      0.032      0.135      0.248      0.380
   P975                                             -0.041      0.094      0.265      0.503      0.927
2) Log(Preparation Costs/Total Size)
   P025                                             -0.082     -0.002      0.059      0.117      0.180
   P50                                              -0.050      0.083      0.222      0.376      0.558
   P975                                             -0.018      0.168      0.402      0.722      1.289
3) Log(Supervision Costs/Total Size)
   P025                                             -0.097     -0.023      0.029      0.082      0.137
   P50                                              -0.068      0.051      0.174      0.310      0.480
   P975                                             -0.040      0.127      0.341      0.620      1.135
4) Time from Approval to First Disbursement (quarters)
   P025                                             -0.227     -0.036      0.094      0.234      0.358
   P50                                              -0.146      0.153      0.465      0.811      1.225
   P975                                             -0.065      0.345      0.877      1.587      2.841
5) Dummy for Restructuring During First Half of Project
   P025                                             -0.128      0.408      0.780      1.171      1.491
   P50                                               0.102      0.956      1.856      2.841      3.959
   P975                                              0.338      1.521      3.031      5.028      8.900
6) Dummy for Problem Project Flag During First Half of Project
   P025                                             -0.484     -0.272     -0.119      0.042      0.189
   P50                                              -0.393     -0.052      0.309      0.696      1.180
   P975                                             -0.300      0.173      0.781      1.584      3.054
7) Dummy for Potential Problem Flag During First Half of Project
   P025                                             -0.241      0.002      0.180      0.344      0.505
   P50                                              -0.139      0.249      0.657      1.112      1.639
   P975                                             -0.032      0.507      1.187      2.103      3.832

Notes: This table reports 2.5th, 50th, and 97.5th percentiles of the posterior distribution for the slope coefficient on the potentially-endogenous variables of interest in columns (14) to (16) of Table 4 and
columns (1) to (5) of Table 5, for the range of prior values for the correlation between the regressor and the error term indicated in the column headings. The distance from the 2.5th to the 97.5th percentile is a Bayesian 95 percent highest posterior density interval, analogous to a standard 95 percent confidence interval. Results are based on 10,000 draws from the posterior distribution of the slope coefficient of interest.

Table 7: Task Team Leader and Country Effects -- Analysis of Variance of Project Outcomes

                          Partial Sum   Degrees of   Mean Sum
                          of Squares    Freedom      of Squares   F-Statistic   Prob>F
Panel A: 2,407 Projects with TTL at Completion (R-Squared = 0.47)
Final TTL Effects            1096          710          1.54          1.30        0.00
Country Effects               316          136          2.32          1.96        0.00
Residual                     1848         1560          1.18
Total                        3529         2406

Panel B: 846 Projects with Different TTL at Approval and Completion (R-Squared = 0.96)
Initial TTL Effects           630          509          1.24          1.71        0.01
Final TTL Effects             245          231          1.06          1.45        0.04
Country Effects                31           46          0.67          0.93        0.60
Residual                       43           59
Total                        1030          845

Notes: This table reports a standard ANOVA table for the two indicated samples of projects. The variation in project outcomes is decomposed into variation attributable to country fixed effects, task team leader (TTL) effects, and residual variation. Panel A considers only the identity of the TTL at project completion, while Panel B allows for fixed effects corresponding to the initial and final TTL of each project.
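The mean squares and F-statistics in Table 7 follow mechanically from the partial sums of squares and degrees of freedom. The sketch below is our own illustration of that arithmetic (the helper name anova_row is ours, not from the paper):

```python
def anova_row(partial_ss, df, resid_ss, resid_df):
    """Mean square and F-statistic for one effect in an ANOVA
    decomposition, tested against the residual mean square."""
    mean_sq = partial_ss / df                 # e.g. 1096 / 710, roughly 1.54
    resid_mean_sq = resid_ss / resid_df       # e.g. 1848 / 1560, roughly 1.18
    return mean_sq, mean_sq / resid_mean_sq   # F = MS(effect) / MS(residual)

# Panel A of Table 7: final-TTL effects tested against the residual
ms, f = anova_row(1096, 710, 1848, 1560)      # ms ~ 1.54, f ~ 1.30
```

Running the same calculation on the country-effects row (316 on 136 df) reproduces the reported mean square of 2.32 and F-statistic of 1.96.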
Table 8: Task Manager and Country Effects, 1995-2011

All Projects Evaluated 1995-2011 (dependent variable: 1-6 Rating). Coefficients are reported with t-statistics in parentheses, listed in the order of the columns (1)-(8) in which each variable enters:

CPIA Rating [all columns (1)-(8)]: 0.539*** (10.63); 0.504*** (8.26); 0.413*** (6.73); 0.363*** (5.97); 0.471*** (7.84); 0.458*** (7.39); 0.307*** (3.29); 0.366*** (3.69)
TTL Quality (Average Outcome on all Other Projects): 0.180*** (6.29); 0.167*** (5.19); 0.131*** (3.99); 0.104*** (2.75); 0.109*** (2.81)
TTL Quality (Average Outcome on all Previous Projects): 0.148*** (5.07)
TTL Quality (ISR-Weighted Average Outcome on all Other Projects): 0.188*** (4.32)
TTL Quality (At Project Approval): 0.184*** (4.79)
TTL Quality (At Project Completion): 0.176*** (4.23)
TTL Turnover (Number of TTLs per ISR): -1.282*** (-6.08); -1.648*** (-4.83); -1.833*** (-5.29)
Evaluator "Toughness" (Average Outcome of all Other Projects Rated by Same Evaluator): 0.271*** (3.50); 0.0433 (0.51); 0.0587 (0.66)

Number of Observations, columns (1)-(8): 2407; 1706; 1783; 1783; 1895; 1672; 1265; 1265
R-Squared, columns (1)-(7): 0.084; 0.082; 0.049; 0.081; 0.089; 0.059; 0.227
Sector, sector x evaluation period, and sector x approval period dummies: included in columns (7) and (8) only
Controls Included: columns (7) and (8) only
Estimation Method: OLS in columns (1)-(7); O-Probit in column (8)

Note: *** (**) (*) denotes significance at the 1 (5) (10) percent level. T-statistics based on heteroskedasticity-consistent standard errors clustered at the country-evaluation-year level are reported in parentheses. All regressions are estimated pooling all projects in the indicated sample for which relevant data on the identity of the task team leader are available. The specifications in columns (7) and (8) include sector fixed effects, sector times approval-period fixed effects, sector times evaluation-period fixed effects, and all of the country and project characteristics included in Table 5.
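The headline TTL quality measure in Table 8 is a leave-one-out average: the mean outcome of all of a task team leader's other evaluated projects, excluding the project being explained. A minimal sketch of that construction (our own illustration; the function name and variables are hypothetical):

```python
def ttl_quality(outcomes, project_index):
    """Leave-one-out TTL quality: the average outcome over all of the
    TTL's OTHER evaluated projects, excluding the project at hand.

    outcomes: list of outcome ratings for one TTL's projects (1-6 scale).
    project_index: position of the project whose quality measure we want.
    """
    others = [y for i, y in enumerate(outcomes) if i != project_index]
    # A TTL with a single evaluated project has no "other" projects,
    # so the measure is undefined for that observation.
    return sum(others) / len(others) if others else None

# A TTL with three projects rated 4, 5, and 6: the quality measure
# attached to the first project is the mean of the other two, 5.5.
print(ttl_quality([4, 5, 6], 0))  # 5.5
```

Leaving out the project's own outcome avoids building the dependent variable directly into the regressor.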