Brenner et al. BMC Health Services Research 2014, 14:180
http://www.biomedcentral.com/1472-6963/14/180

STUDY PROTOCOL Open Access

Design of an impact evaluation using a mixed methods model – an explanatory assessment of the effects of results-based financing mechanisms on maternal healthcare services in Malawi

Stephan Brenner1, Adamson S Muula2, Paul Jacob Robyn3, Till Bärnighausen4,5, Malabika Sarker1, Don P Mathanga2, Thomas Bossert4 and Manuela De Allegri1*

Abstract

Background: In this article we present a study design to evaluate the causal impact of providing supply-side performance-based financing incentives in combination with a demand-side cash transfer component on equitable access to and quality of maternal and neonatal healthcare services. This intervention is introduced to selected emergency obstetric care facilities and catchment area populations in four districts in Malawi. We here describe and discuss our study protocol with regard to the research aims, the local implementation context, and our rationale for selecting a mixed methods explanatory design with a quasi-experimental quantitative component.

Design: The quantitative research component consists of a controlled pre- and post-test design with multiple post-test measurements. This allows us to quantitatively measure ‘equitable access to healthcare services’ at the community level and ‘healthcare quality’ at the health facility level. Guided by a theoretical framework of causal relationships, we determined a number of input, process, and output indicators to evaluate both intended and unintended effects of the intervention. Overall causal impact estimates will result from a difference-in-difference analysis comparing selected indicators across intervention and control facilities/catchment populations over time. To further explain heterogeneity of quantitatively observed effects and to understand the experiential dimensions of financial incentives on clients and providers, we designed a qualitative component in line with the overall explanatory mixed methods approach. This component consists of in-depth interviews and focus group discussions with providers, service users, non-users, and policy stakeholders. In this explanatory design, a comprehensive understanding of expected and unexpected effects of the intervention on both access and quality will emerge through careful triangulation at two levels: across multiple quantitative elements and across quantitative and qualitative elements.

Discussion: Combining a traditional quasi-experimental controlled pre- and post-test design with an explanatory mixed methods model permits an additional assessment of organizational and behavioral changes affecting complex processes. Through this impact evaluation approach, our design will not only create robust evidence measures for the outcome of interest, but also generate insights on how and why the investigated interventions produce certain intended and unintended effects, and thus allow for a more in-depth evaluation.

Keywords: Mixed methods, Impact evaluation, Performance-based incentives, Study design

* Correspondence: manuela.de.allegri@urz.uni-heidelberg.de
1 Institute of Public Health, Ruprecht-Karls-University, Heidelberg, Germany
Full list of author information is available at the end of the article

© 2014 Brenner et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

The strategic purchasing of healthcare services, together with the generation of sufficient financial resources for health and adequate risk pooling mechanisms, represents an essential function of any health care financing system [1]. Otherwise defined as “financing of the supply side”, the purchasing function determines what services are bought, in what quantity, for whom, from which providers, and according to what payment modalities [2]. In general, purchasing arrangements are expected to set the right incentives for providers to deliver the adequate amount of quality healthcare services to all those entitled to receive care within existing coverage conditions [3].

In low- and middle-income countries (LMICs), health care purchasing models traditionally comprise mixtures of direct or indirect input-based arrangements (i.e. salaries, commodities, capital investments) directly covered and managed by national governments, and direct output-based fee-for-service payments (i.e. formal and informal user fees) [4,5]. Health care purchasing models in which public sector providers heavily rely on input-based financing do not create strong enough incentives to deliver a sufficient quantity and/or quality of services. These models have long been proven to suffer from a number of flaws and inefficiencies, including high rates of provider absenteeism, poor quality of service delivery, and frequent drug shortages [6-8]. In addition, the system’s extreme dependency on direct user payments shifts the main responsibility to cover healthcare costs to the ill at point of service use. Especially in LMICs, the combination of weak provider incentives and high fee-for-service user payments further widens already existing gaps in service coverage by making care inaccessible to many communities [9].

In response to the alarming gaps in coverage and quality of care observed across LMICs, a new set of purchasing models has emerged as a potential alternative to traditional input-based financing arrangements. A common feature underlying these new purchasing models is the focus on health service outputs. Encompassing a variety of implementation experiences, outputs (defined in terms of quantity and/or quality of services delivered) function as the basis against which health authorities determine and authorize provider payments. As such, these output-based financing arrangements are commonly labelled Results-Based Financing (RBF) or Performance-Based Incentives (PBI) [10]. Such PBI aim at counteracting the flaws of input-based financing (and when coupled with additional complementary interventions also the flaws of user fees) by steering the purchasing function, so that providers are incentivized to provide the high-quality service outputs necessary to meet the community’s health care needs.

Traditional input-based forms of healthcare financing tend to invest relatively large amounts of funds into numerous healthcare input factors, such as health facility infrastructure, healthcare personnel, technical equipment and supplies. In contrast, PBI strategies tie financial incentives directly to the expected healthcare outputs. To do so, PBI models introduce contractual frameworks that define not only the roles and responsibilities of purchasers and providers, but also clearly outline the output targets and output-dependent incentives, the result verification processes confirming the delivery of such outputs, and the payment mechanisms in response to the obtained results [11].

In LMICs, PBI have been introduced to improve the quality of the services delivered and to increase health service utilization. Most PBI strategies have a service quality focus and incentivize healthcare providers to adhere to clinical standards, to participate in training and accreditation programs, or to respect patient-centeredness, which are all considered pathways of supply-driven service utilization leading to improved health outcomes [12]. Besides the targeting of providers, in some instances, PBI have also been used to counteract health service under-utilization through specific targeting of client demand and service availability [13].

Two common forms of PBI are Performance-Based Financing (PBF) and Conditional Cash Transfers (CCT). By definition, PBF programs target the supply and delivery of healthcare services by incentivizing healthcare providers, either as individuals or in the form of entire health facilities. Financial rewards are usually paid in the form of salary top-ups based on fee-for-service payments in relation to quality service outputs [10,14,15]. CCT programs target demand for or utilization of healthcare services. Their beneficiaries are healthcare users who are incentivized to enroll into specific health programs or to comply with certain health-related behaviors. Direct financial payments to users are thus related to the degree of compliance [10,16].

Although PBI programs are a promising feature in public sector health service regulation, there is only limited evidence of their effectiveness on healthcare outcomes in Sub-Saharan Africa. Since PBI programs incentivize mainly quantity or quality of healthcare outputs, effects on healthcare outcomes are more difficult to capture and depend on how predictive an output measure is for an expected outcome measure [17]. As healthcare outcomes are only indirectly linked to what can be directly influenced by a single provider’s performance or service user’s behavior, studies analyzing the impact of PBI programs focus on service output measures related to utilization rates or number of cases treated. With regard to maternal and child healthcare service outputs, evidence from PBF pilots demonstrated that introduction of financial incentives based on provider performance improves healthcare quality, increases service utilization, creates more efficient financial and organizational management structures, and restricts corruption [18-21]. In other instances, some evidence is available to show that PBF programs further contribute to inequalities in access, encourage healthcare providers to adopt ‘gaming’ behaviors, lead to neglect of non-incentivized healthcare services, or are too cost-intensive in terms of their long-term sustainability [22,23]. In addition, there is still limited understanding of the effects of performance-based incentives on the intrinsic and extrinsic motivation of healthcare providers [24,25]. Given the variety in PBI implementation and evaluation, strong evidence on the impact of PBI programs on both healthcare outputs and outcomes in LMICs remains extremely meager [26,27].

As outlined above, a vast majority of recent PBI impact evaluations in LMICs have been based on purely quantitative research designs. In light of the current need for stronger evidence on the relationship between PBI and health outcomes, especially in sub-Saharan African countries, the purpose of this article is to describe a rigorous mixed methods research protocol designed to evaluate the causal impact of a PBI program currently rolled out in Malawi on health service structures, processes, and outputs. The study design uses a sequential explanatory mixed methods approach developed to assess the causal relationship between a set of PBF and CCT incentives and a number of maternal health service outcomes. In the following sections, we outline the PBI program’s implementation context and its influence on our impact evaluation design. We then further describe the overall study design by illustrating each of the proposed study components. We also define the rationale and purpose behind each component as they relate to the current PBI evidence gaps. Finally, we discuss the advantages of an explanatory mixed methods design in addressing a number of essential characteristics in the evaluation process of a health system intervention. With the strategic application of quantitative and qualitative methods, as in our suggested research design, we provide an example of an impact evaluation approach that allows a broad evaluation focus with a high yield of robust measurements of the underlying treatment effect. In sharing this protocol we aim to provide an example of how qualitative methods can be integrated into commonly used quantitative impact evaluation designs.

Design

Study setting

Malawi, like many sub-Saharan countries, is not on track to meet its targets for Millennium Development Goal five. In 2010, Malawi’s maternal mortality ratio (675/100,000 live births) and neonatal mortality rate (31/1,000 live births) were among the highest in the world [28]. Poor managerial and organizational quality of care together with lack of adequate resources is largely responsible for delays in care which ultimately result in the death of women and their newborns [29]. The Essential Health Package introduced in 2004 lists maternal care among those health services that should be provided free of charge in Malawi, which led to an increase in maternal health service coverage and utilization in the following years [30,31]. In 2010, 46% of pregnant women were found to attend at least four antenatal care visits and 71% of pregnant women delivered in the presence of a skilled attendant [28]. In spite of these relatively high coverage rates, quality of care deficits due to persistent financial and geographical barriers, ongoing deficiencies in human resources for health, and frequent stock-outs of essential equipment and drugs still contribute to poor maternal health outcomes [26].

With the ultimate objective of reducing maternal and neonatal mortality, the RBF4MNH Initiative was introduced by the Ministry of Health (MoH) of Malawi with financial support of the Norwegian and German governments. Options Consultancy Services was contracted by the MoH for technical support in implementing and monitoring the project. The RBF4MNH Initiative seeks to improve quality of maternal and neonatal healthcare (MNHC) delivery and utilization in public and private not-for-profit health facilities [32]. The Initiative’s primary objective is to increase the number of deliveries that take place under skilled attendance in district-level and rural health facilities with high quality maternal and neonatal services provision. For this purpose, the Initiative currently targets 17 emergency obstetric and neonatal care (EmONC) facilities, of which 13 operate at the basic (rural health centers) and four at the comprehensive (district hospitals) level. These 17 facilities were selected out of all 33 facilities that are supposed to provide EmONC services according to WHO service coverage criteria (i.e. one comprehensive and four basic EmONC facilities per 500,000 population [33]) as identified by the MoH within four districts (Balaka, Dedza, Mchinji, and Ntcheu). Facility selection was carried out jointly by the District Health Management Teams (DHMT), the head of the Reproductive Health Unit (RHU) of the MoH and the Options team. In a first step, a broad quality assessment was conducted based on a number of performance indicators related to four main health service functions: a) leadership, b) resource management, c) environmental safety, and d) service provision. In a second step, only those facilities with maternal care services performing all required EmONC signal functions, operating day and night, having at least three qualified staff in place, meeting WHO-recommended population coverage criteria, and having a functional referral system in place were selected into the intervention.

Programmatically the RBF4MNH Initiative is divided into three major components: a) a basic infrastructural upgrade of the 17 facilities including architectural modification to extend available space, replacement and provision of essential equipment, and maintenance of critical supply chains necessary in sustaining EmONC minimum standards; b) a supply-side PBF intervention consisting of quality-based performance agreements between the RHU on the one side and targeted facilities and DHMT on the other; and c) a demand-side CCT intervention consisting of monetary compensations to pregnant women for the recovery of expenses directly related to accessing and staying at target facilities during and at least 48 hours after childbirth.

The initial phase of the infrastructural upgrade component was completed prior to the official introduction of the PBF and CCT program components, but was still ongoing for some facilities due to unforeseen administrative and logistic delays. In early April 2013 performance agreements were signed between the MoH, the 17 health facilities, and the four respective DHMT. Within these agreements, performance rewards are linked to quantity and quality indicators. All indicators are directly or indirectly related to the Initiative’s primary outcome to increase the number of hospital-based deliveries of good quality and can be divided into two groups: a) core indicators to determine health facility and DHMT reward payments based on the achievement of set targets; and b) quality indicators to deflate reward payments based on deficits in performance quality. The core indicators measure among others the quantity of facility-based deliveries at BEmONC level facilities, HIV screening tests offered, PMTCT treatments provided, maternal death audits conducted, sufficient stocking of necessary medicines, and the timely completion and submission of HMIS reports. The quality indicators measure quality aspects of technical care during labor, delivery, and newborn care provided (e.g. use of partograph during first stage of labor, use of oxytocic drugs during third stage of labor, use of magnesium sulfate in cases of (pre-)eclampsia, supplementation of vitamin A to newborns), but also assess the level of provider adherence to routine service processes (e.g. patient feed-back mechanisms, equipment repair protocols, infection control guidelines). Indicators receive different weights that are used in the calculation of financial rewards.

Within the agreements, performance targets for each facility and DHMT are set individually. Performance reports on core and quality indicators are submitted by each health facility and DHMT to the RHU, and reported data is afterwards verified by an external verification agent. The first verification was organized as peer review of districts to facilitate joint learning. Based on the verified results and in relation to the achievement of the targets the entire staff (maternity unit plus other clinical units) of district hospitals (CEmONC), the entire staff at health centers (BEmONC), and the DHMTs are rewarded. These financial rewards are earmarked in a way to ensure both facility investments (30%) and salary top-ups (70%), which average 15–25% of health staff’s total salary envelope. Facilities are free to use the facility portion of the rewards to finance any infrastructural improvements independent of direct relevance to MNHC delivery. The rewards of the DHMTs are not only based on the achievements of the selected health facilities but also on the achievements of the district as a whole, so that DHMT support is not targeted towards single facilities. Verification and payment cycles are scheduled to occur every six months.

Concomitantly with the supply-side rewarding scheme a demand-side scheme was introduced in July 2013. The demand-side intervention consists of CCTs targeted towards pregnant women living in the EmONC catchment areas of the 17 selected facilities. The CCTs are intended to support women a) to present to the intervention facilities for delivery in time; and b) to remain under skilled maternal care observation at these facilities for the initial 48-hour post-partum period. The financial support is considered as cash contributions towards costs incurred by delivering in a health facility, such as expenses related to transport to and from the facility, food while staying at the facility, and essential childbirth items (blankets, wrapping cloth). Enrollment into the CCT scheme occurs during a woman’s first antenatal care visit at the respective health facility. Upon enrollment, all eligible women (i.e. permanent residence in the EmONC catchment area) are given a cash transfer card to keep with them until the day of delivery. Following initial enrollment, Health Surveillance Assistants (HSA) at the community level will verify each woman’s eligibility based on her residential status. Upon delivery at a target facility, all enrolled women are given a cash amount consisting of: a) a fixed amount to cover essential childbirth expenses; b) a variable amount covering transport expenses based on the actual distance between health facility and a woman’s residence; and c) a fixed amount for each 24 hours, up to a total of 48 hours following childbirth, that a woman stays under clinical observation at the facility, to cover food and opportunity costs (loss of productivity being away from home).

The PBF supply-side incentives and demand-side CCTs are expected to increase the number of facility-based deliveries through their combined effect on improved quality, through changes in providers’ motivation and proactivity, and the removal of financial barriers to access. Accordingly, any observed changes in institutional delivery rates can, from an implementation point of view, be directly linked to positive improvements in current MNHC service delivery and service utilization (i.e. program outputs).
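The reward and cash-transfer rules described in the study setting can be made concrete with a small sketch. Only the 30%/70% split between facility investments and salary top-ups, and the three-part structure of the CCT (fixed childbirth amount, distance-based transport amount, fixed amount per 24 hours up to 48 hours), come from the text. The function names, the multiplicative quality deflator, the linear per-kilometre rate, and all monetary values are hypothetical assumptions made for illustration; the protocol does not specify these formulas.

```python
def split_reward(earned_reward, quality_score, facility_share=0.30):
    """Deflate an earned reward by a quality score and split the result.

    Assumption: the quality deflation is modelled as a simple multiplication
    by a score in [0, 1]; the protocol only says quality indicators "deflate"
    payments. The 30% facility share is taken from the text.
    Returns (facility_investment, salary_top_ups).
    """
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must lie between 0 and 1")
    paid = earned_reward * quality_score
    facility_part = paid * facility_share
    return facility_part, paid - facility_part


def cct_payout(distance_km, hours_observed, fixed_items, rate_per_km, per_24h):
    """Cash transfer paid to an enrolled woman after a facility delivery.

    a) fixed amount for essential childbirth items
    b) transport amount, assumed here to be linear in distance
    c) fixed amount per completed 24 hours of observation, capped at 48 hours
    """
    completed_days = min(int(hours_observed // 24), 2)
    return fixed_items + rate_per_km * distance_km + per_24h * completed_days


# Hypothetical amounts in an arbitrary currency unit.
print(split_reward(earned_reward=1000.0, quality_score=0.8))
print(cct_payout(distance_km=10, hours_observed=50,
                 fixed_items=1000, rate_per_km=50, per_24h=500))
```

Keeping the deflator and the split in one function mirrors the order in the text: the core indicators determine the earned reward, the quality indicators reduce it, and the reduced amount is then earmarked.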
Research objectives

The impact evaluation design presented in this article was developed by a multi-institutional team of researchers. As this design is used for an external evaluation process, all researchers are independent of the MoH and its implementation team. This impact evaluation study is independently funded by USAID/TRAction and by the Norwegian Government. The evaluation study was conceptualized to assess the impact of the RBF4MNH Initiative for a period of approximately 24 months after implementation.

Given the relatively short assessment period, the size of the study population, and the resulting power to detect a reasonable effect size, it is not feasible for our evaluation design to use population-based indicators, such as maternal mortality ratios and neonatal mortality rates, as outcome measures. Such long-term indicators are not sensitive enough to capture the expected PBI effects, especially since the results incentivized by the RBF4MNH Initiative directly target health care outputs, not outcomes or impacts. Our focus therefore remains on short- and mid-term structural, procedural, and output measures to sufficiently capture any intermediate effects resulting from the PBI intervention, which are all understood to contribute to the reduction of maternal and neonatal mortality. The main study objectives are to evaluate the impact of the RBF4MNH Initiative on measures of quantity and quality related to the delivery and utilization of maternal and neonatal healthcare services. For this purpose, we chose quality of care and access to care as outputs of interest as they are closely related to maternal and neonatal health outcomes, such as maternal and neonatal mortality, morbidity, and disability [34,35]. Furthermore, both service quality and service accessibility are under direct influence of the health system [36].

In the light of the current PBI evidence gap in Sub-Saharan Africa, we also follow a number of additional research objectives that more directly address the current scientific discussion on performance incentives in health financing. First of all, we introduce an assessment of the extent of negative PBI effects on the delivery of health services not directly targeted by financial incentives. For this we broaden the study focus beyond structure, process and output measures related to obstetric care only to also include other health services along the continuum of maternal care, namely antenatal care (ANC) and postnatal care (PNC) services. Second, since research on provider motivation is still limited, we included an assessment of the interaction between financial incentives and provider behavior based on experiential accounts of the working environment in response to a PBF package. Last, as the RBF4MNH Initiative offers a CCT package independent of any specific pro-poor targeting strategy, we also consider an assessment of the distributional impact among clients of different socioeconomic backgrounds as relevant.

Based on the main study objectives and in conjunction with the additional research objectives, we defined the following specific research aims:

• Specific research aim 1: To establish the effect of supply-side and demand-side incentives on quality of health care services in Malawi.
  It is our hypothesis that the RBF4MNH supply-side incentives to maternal care providers and DHMTs have a positive effect on the quality of obstetric services. We anticipate that the extent to which the performance-based incentives create expected changes in service quality will depend, positively or negatively, on the level of service utilization produced by the demand-side incentives. We expect that those districts where supply and demand of quality of care is met most optimally will demonstrate the most pronounced outcome measures.

• Specific research aim 2: To establish the effect of supply-side and demand-side incentives on the utilization of maternal healthcare services in Malawi.
  It is our hypothesis that the demand-side incentives to pregnant women have a positive effect on the utilization of facility-based obstetric care services. We anticipate that the extent to which the CCT incentives change health-seeking behaviors of women will be more pronounced in districts where access barriers are relatively high. We also expect that in those districts where improvements in quality of service delivery in response to the supply-side incentives are most successful, outcome measures for service utilization will be most pronounced.

• Specific research aim 3: To establish the effect of supply-side and demand-side incentives on the access to and quality of not directly incentivized maternal care services.
  It is our hypothesis that the strong focus on incentivizing quality of care and utilization of obstetric services will not affect the quality of other maternal care services to a significant extent. We anticipate that only those aspects of care similar to all three services, such as stocking of essential medicines or timely submitted HMIS reports, might have some impact on the quality of ANC and PNC services. Within obstetric care service delivery processes we expect some providers (least satisfied, least trained) to focus mainly on those activities directly related to incentivized outputs.

• Specific research aim 4: To establish the effect of supply-side and demand-side incentives on the experiential dimensions of healthcare quality among maternal care providers and clients.
  It is our hypothesis that the combination of supply-side and demand-side incentives will generate different varieties of reactions and responses based on individuals’ experiences with incentives and reward systems. We anticipate positive as well as negative individual experiences of providers with performance incentives with regard to workload, job satisfaction, or motivation. We also expect similarly broad experiences of clients with CCT incentives in relation to changes in perceived quality of care, service accessibility, or health-seeking behavior.

• Specific research aim 5: To establish the extent to which the demand-side incentives generate equitable access to care for pregnant women.
  It is our hypothesis that the CCTs will increase utilization of obstetric services for pregnant women currently facing financial access barriers, and thus result in more equitable utilization. Still, we expect persisting additional barriers, financial and non-financial, to prevent some women from enrollment into the CCT scheme prior to childbirth.

Conceptual framework

Both quality of care and service utilization are challenging to assess since each one of them represents a complex theoretical construct. For our impact evaluation approach, we chose the following conceptual models to allow for a comprehensive assessment of the expected impact of the intervention on quality and utilization patterns. The resulting conceptual framework underlying our evaluation study is illustrated in Figure 1.

• Quality of care, according to the Donabedian model [37], results from a sequence of three elementary steps. First, a structural element comprising service-related technical and human input factors; second, a process element comprising technical and interpersonal activities needed to transform structural elements into actual healthcare outputs; and third, an output element comprising health service products generated by the input and process elements. Based on this model, healthcare quality can be defined as an outcome product that is dependent on both sufficient input and efficient process factors. An additional qualitative approach to this model will serve to elucidate the experiential dimensions of service delivery, understanding how care is delivered and why [38]. In particular, the qualitative element will explore the social and cultural setting of service delivery, shedding light on why providers manage the clinical encounter the way they do, what the facilitating and hindering elements to the delivery of quality care are (within and beyond the PBF intervention), and what elements are responsible for motivation and satisfaction (within and beyond the PBF intervention).

• To frame healthcare utilization, we adapted Andersen’s behavioral model of health services use [39]. Based on this model, health care utilization results from determinants of access. Access to care is in turn further defined along a number of contextual and individual characteristics: a) predisposing characteristics such as demographic and social structures, individual health beliefs, or the role of a sick person within a community; b) enabling characteristics such as income, insurance coverage, user fees, travel and waiting times; and c) need characteristics such as the perceived urgency or prior experience with a given health problem. In this model, access is a prerequisite for healthcare utilization. Access is understood as equitable when healthcare utilization solely depends on individual need irrespective of other factors such as age, sex, income, or ethnicity [40].

Figure 1 Conceptual framework of quality of care and utilization.

Theory of change

Guided by the research aims and based on the selected conceptual framework, we used our hypotheses about the cause-effect relationships to further outline a comprehensive theory of change. As shown in Figure 2, the theory of change allows us to map the causal chains between supply- and demand-side incentives and to identify some of the expected intended and unintended effects.

Figure 2 Causal chains addressed by the impact evaluation design.

The supply-side incentives affect both the structural and procedural elements of healthcare quality positively. Following this causal chain of events, providers’ autonomy, motivation and satisfaction increase. Ultimately, improved effectiveness, more timeliness, and higher quality of EmONC service delivery result. Since these supply-side incentives only target quality outputs of obstetric care services, the cause-effect relationship can play out in two possible ways: either with the effect of positive quality outcomes not only in obstetric care but also in the delivery of ANC and PNC services, as is imaginable in rural health facilities where the entire continuum of MCH is provided by the same cadre of health professionals, or with the effect that the focus on obstetric care only leads to neglect of service quality of the non-incentivized ANC and PNC services. Alternatively, the supply-side incentives carry the risk of crowding out providers’ intrinsic motivation. In this causal chain the loss of altruistic behavior and work ethics, with active manipulation of the rewarding system for personal gain, ultimately does not yield any positive changes in service quality outcomes.
In addition, the demand-side incentives remove financial access barriers and potentially allow more pregnant women to utilize obstetric services. Following this causal chain of events, individual and household expenditures related to facility-based deliveries are lowered as expenditures associated with transport, childbirth equipment, and opportunity costs are compensated. Furthermore, since enrollment of women into the CCT program is organized through ANC visits, the number of pregnant women who attend ANC services increases. Provided quality of care is perceived as high, as more women deliver at health facilities, the utilization of PNC and other services within the MCH continuum rises as a result of patient education and trust in established provider-patient relationships. Since these demand- and supply-side incentives are applied independently of each other, the effects on utilization might exceed the capacity for quality service delivery. In this causal chain, the demand-side incentives affect pregnant women’s health-seeking behavior to the extent that they not only stay the intended 48 hours postpartum at the facility, but also present to the maternity services days prior to labor onset. As facilities’ capacity in maternal waiting homes, maternity beds, and midwives is limited, service over-utilization is likely to result, which over time can have negative implications for the level of quality provided. Alternatively, if pregnant women’s perception of service quality remains low, removal of financial access barriers alone is not necessarily sufficient to increase utilization. Following this causal chain, little or even no change in utilization of facility-based obstetric care services is a likely consequence.

Mixed methods study design
The methodological framework of our impact evaluation follows an explanatory mixed methods design. In mixed methods research, ‘explanatory’ describes the purposeful inclusion of qualitative methods of data collection and analysis to “explain” the quantitative results [41]. In an explanatory mixed methods design, quantitative research components dominate over qualitative ones. This fact makes explanatory models very suitable for impact evaluations, as impact measures are usually of quantitative nature. Nevertheless, analyzing quantitative data in the presence of qualitative information supplies additional input for the interpretation of overall results. In an explanatory mixed methods approach, therefore, following quantitative research components with qualitative ones provides additional information on unexpected or unexplainable results. In line with the rationale of an explanatory mixed methods design, qualitative data collection in our study follows quantitative data collection at mid-term and endpoint. Given that the focus of the qualitative work is on “explaining” the quantitatively measured changes produced by the intervention, there is no need for qualitative data collection at baseline, that is, before the intervention has even started. Figure 3 schematically displays the anticipated sequence of the research components within our explanatory mixed methods design.

Figure 3 Sequential explanatory mixed methods design. QUANT = dominant quantitative study component, qual = sequential qualitative study component.

The rationale behind selecting an explanatory mixed methods design is guided by our research aims. Using quantitative and qualitative methods sequentially to explain study results allows us: a) to comprehensively capture the complexity of the impact measures (i.e. quality of care, utilization); b) to keep a broader scientific scope to investigate intended and unintended effects; and c) to yield sufficient credibility and validity of the resulting impact estimates. In addition, the qualitative information will be particularly helpful to illuminate the heterogeneity in effects we expect to observe across facilities, communities, and households. A better understanding of relevant contextual elements facilitating or hindering change yields valuable information, as it allows us to unravel under which conditions PBI schemes can be expected to produce which results.

As an explanatory mixed methods design relies heavily on the robustness of quantitative data, the set-up of the quantitative research component is a crucial factor. For the purpose of our impact evaluation, we structured the quantitative research component based on a quasi-experimental design [42] in the form of a controlled pre- and post-test design with two post-test measurements. This allows us to collect data at baseline (prior to the implementation of incentives), at mid-term (approximately one year after the incentives are in place), and at endpoint (towards the end of the impact evaluation funding period). Our quantitative design is ‘controlled’ since we collect and compare data from both intervention sites (i.e. RBF4MNH-targeted EmONC facilities and EmONC catchment areas) and control sites (i.e. non-targeted EmONC facilities and corresponding EmONC catchment areas) during each of the three data collection rounds.
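A controlled design with one baseline and two follow-up rounds lends itself to a difference-in-differences comparison, the analysis named in the abstract. The sketch below is a minimal illustration only: the function names and all indicator values are hypothetical and not taken from the study, and it uses simple differences in group means, whereas the actual analysis may well rely on regression models with covariates.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change at intervention sites minus change at control sites (group means)."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))


# Hypothetical facility-level values of one indicator, e.g. the share of
# facility-based deliveries. These numbers are invented for illustration.
rounds = {
    "baseline": {"intervention": [0.60, 0.50, 0.70], "control": [0.60, 0.60, 0.60]},
    "mid-term": {"intervention": [0.70, 0.60, 0.80], "control": [0.65, 0.65, 0.65]},
    "endpoint": {"intervention": [0.80, 0.70, 0.90], "control": [0.70, 0.70, 0.70]},
}


def did_by_round(rounds, reference="baseline"):
    """DID estimate of each follow-up round relative to the reference round."""
    base = rounds[reference]
    return {
        name: did_estimate(
            base["intervention"], data["intervention"],
            base["control"], data["control"],
        )
        for name, data in rounds.items()
        if name != reference
    }


estimates = did_by_round(rounds)
for name, value in estimates.items():
    print(f"{name}: {value:+.3f}")
```

With these invented values, the estimate is roughly 0.05 at mid-term and 0.10 at endpoint. Such a difference can only be read as an intervention effect if both groups would otherwise have followed similar trends over time.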
As described earlier, of the 33 facilities authorized to provide EmONC in the concerned districts, 17 were non-randomly included in the RBF4MNH intervention. The remaining 16 facilities were chosen as control sites for the impact evaluation study. We use a controlled pre- and post-test design to assess quantitative effects because randomization was not possible due to implementation considerations. As for the selection of the control sites, the implementation process followed by the RBF4MNH Initiative as well as the presence of various other reproductive health intervention programs and pilots throughout Malawi led the research team to decide on the 16 control sites within the current intervention districts as the most feasible option.

Study components
To better conceptualize and operationalize the various elements of our mixed methods impact evaluation design, we divided the overall work into three study components. The first study component focuses on all quantitative aspects of quality of care based on data collected at the facility level. The second study component focuses on quantitative aspects of health service utilization based on data collected at the community level. A third cross-cutting qualitative study component complements the two quantitative components, in line with the overall mixed methods design described above. We purposely focus on multiple quantitative and qualitative indicators, because we expect that a complex health system intervention, such as a PBF coupled with CCT, will produce multiple effects. We do not expect change to be homogeneous across all indicators, but rather expect that while the intervention may succeed in producing change on some dimensions of utilization and quality, it may fail to do so on other dimensions. This discrepancy is not per se problematic, but needs to be reported to adequately inform policy makers on the changes which can be expected from the application of a combined PBF and CCT intervention. The two quantitative study components keep the conceptual division of our research as outlined in the theoretical framework (see Figure 1). The qualitative study component in contrast follows an emerging pattern rooted in the grounded theory approach [43] in response to the preliminary findings yielded by the two quantitative study components. Final interpretation of results will rely on the joint appraisal of the findings stemming from the quantitative and qualitative study components.

Study component 1
The first study component relates to the quantitative assessment of service quality and relies on five data collection activities: a) an infrastructural assessment at the health facility level; b) a series of systematic observations of provider-patient encounters for selected ANC and delivery services; c) a systematic review of patient records; d) a series of provider interviews among maternity staff; and e) a series of patient exit interviews conducted at point of exit after ANC, delivery, and PNC service use. Each of the five data collection activities covers a number of quantitative indicators that follow our research aims, the conceptual framework, and the theory of change described earlier. These quantitative indicators can be divided into four thematic groups: a) infrastructure indicators measuring the availability, accessibility, and functionality of facility structures, medications, clinical equipment, and human resources in respect to EmONC, ANC, and PNC; b) process indicators measuring the adequacy of technical and interpersonal clinical performance in respect to EmONC, ANC, and PNC; c) output indicators measuring the quality of immediate service deliverables related to EmONC, ANC, and PNC; and d) perception indicators measuring providers’ and clients’ experience related to aspects of EmONC, ANC, and PNC service delivery and service utilization. Table 1 provides an overview of Study Component 1 including the relevant facility-based data sources, the data collection instruments, and the key quality of care measures.

Table 1 Overview of the quality of care study component

Study component: Quality of Care: Structural & Input elements

  Data collection activity (data source): Assessment of facility infrastructure (health facility, heads of services)
  Tool used: Structured observation survey
  Key outcome measures:
  • Adequacy of infrastructural and organizational set-up in relation to EmOC standards
  • Availability, accessibility, functionality of materials and equipment in obstetric care, ANC, PNC services
  • Availability, accessibility of clinical guidelines and protocols related to obstetric care, ANC, PNC
  • Availability, accessibility of essential drugs related to obstetric care, ANC, PNC

  Data collection activity (data source): Assessment of providers’ professional qualification & technical knowledge (maternity care providers)
  Tool used: Structured interview survey (using clinical vignettes)
  Key outcome measures:
  • Number & type of provider training activities
  • Level of providers’ technical knowledge on EmOC, ANC, PNC

Study component: Quality of Care: Process & Outcome elements

  Data collection activity (data source): Assessment of provider-patient encounters (obstetric care visits, ANC visits)
  Tool used: Structured observation survey
  Key outcome measures:
  • Clinical case management (clinical assessment, diagnosis, treatment) of obstetric, pregnant, and newborn patients

  Data collection activity (data source): Assessment of health facility records (maternity registers, patient charts)
  Tool used: Structured observation survey
  Key outcome measures:
  • Timely identification of obstetric/neonatal complication (patient assessment, diagnostic procedures)
  • Timely supportive and definitive management of obstetric/neonatal complications (intravenous fluids, oxygen, antibiotics, blood transfusion, C-sections)
  • Case outcomes (length of stay, fatality, disability)

Study component: Quality of Care: Experiential elements

  Data collection activity (data source): Assessment of providers’ workload, motivation, satisfaction (maternity care providers)
  Tool used: Structured interview survey
  Key outcome measures:
  • Provider’s role, responsibility, workload
  • Provider’s training background and appreciation
  • Provider’s compensation and incentives
  • Provider’s satisfaction and motivation

  Data collection activity (data source): Assessment of clients’ perception, satisfaction, experience (women attending obstetric care, ANC, or PNC services)
  Tool used: Structured exit interview survey
  Key outcome measures:
  • Clients’ demographic and socioeconomic information
  • Type of care received at healthcare facility
  • Type of care received at outside formal health sector
  • Clients’ perception of healthcare services received
  • Clients’ knowledge retention related to danger signs
  • Clients’ satisfaction of healthcare services received

EmOC = Emergency Obstetric Care, ANC = Antenatal Care, PNC = Postnatal Care, QUAN = quantitative, qual = qualitative.

For each of the five data collection activities, we developed ad hoc data collection instruments. All data collection instruments in this study component were designed to collect quantitative data based on the quality of care indicators outlined above.

To conduct the infrastructural health facility assessments we developed a structured checklist to collect information on building structure, human resources, essential medicines and equipment in line with national guidelines and international recommendations regarding the provision of EmONC and ANC. The checklists are designed to identify not only the availability, but also the level of functionality and accessibility of technical equipment and the overall service organization.

A different set of structured checklists was developed for the systematic observation of provider-patient encounters during obstetric and ANC service provision. These clinical care checklists collect information on the level of quality of routine service processes during patient encounters. There are separate checklists for routine obstetric and routine ANC service activities to identify how clinical tests, medical procedures, patient interview topics, and prescribed drugs adhere to current national treatment standards for each of the examined services.

A third type of structured checklist was developed to conduct the review of patient records. These checklists are structured to systematically extract clinical documentation from facility-stored patient charts to retrospectively assess the quality of care that has been delivered to patients with obstetric complications (i.e. pre-eclampsia, eclampsia, hemorrhage, prolonged or obstructed labor, neonatal asphyxia). These checklists are designed to identify the extent to which clinical performance (i.e. procedures, interventions, and treatment options) adheres to national guidelines. The information collected by this systematic chart review differs from the information obtained by the direct clinical care observations in that we try to capture quality of care aspects directly relevant to obstetric emergency cases. This focus on the management of obstetric complications is otherwise not possible with direct clinical observations, due to the relatively small numbers that can feasibly be observed.

To conduct provider interviews, we developed a structured questionnaire to assess the effect of RBF4MNH incentives on working conditions, motivation and satisfaction of facility-based maternity staff. In addition, this provider survey instrument is designed to also identify objective information on providers’ prerequisites for technical aspects of quality of care, such as training and clinical knowledge levels, by using clinical vignettes [44]. These vignettes are developed to replicate clinical scenarios common to obstetric, ANC, and PNC service delivery and reflect classic presentations of health complaints together with pertinent clinical findings. All vignettes are tailored to the epidemiological profile of Malawi and are aligned with national care protocols.

Another set of structured questionnaires was developed to conduct exit interviews with patients seen at the obstetric, ANC, and PNC services of each facility. These questionnaires collect information on how service quality is perceived by those using the services, to further determine aspects of client satisfaction with the way services are delivered. Clients’ views on quality will complement the quality of care information obtained through the infrastructural, clinical, and record review checklists.

Sampling techniques differ for each of the data collection activities listed above. Most sampling frames included in this study component consist of small overall figures, such as the total number of health facilities (33) and the total number of maternity care providers (approximately 1–2 for each of the 29 rural facility clusters and 6–10 in the 4 second-level facilities).

For the infrastructural facility assessments and the structured provider surveys, either facility level or maternal care provider level is considered as entry point for sampling. For these two data collection activities, full samples are used in order to include all available study units or study subjects. The sampling approach for the direct clinical observations needs to take into account the actual occurrence of observed events (deliveries might not occur on a daily basis, ANC clinics take place only on certain days of the week), the variation in length of some of these events (deliveries might need to be observed over multiple hours), the variations in willingness of subjects to consent (i.e. providers and patients involved in a case), and the available number and length of stay of interviewer teams at each of the facilities. For these reasons, and in line with previous research [45,38], we expect each data collection team to spend three consecutive days at each health facility. All cases encountered during this time period which consent to participate in the observations are included.

For the client exit interviews we expect the same limitations in case frequency with women exiting obstetric services after childbirth as outlined for the direct observations. Sampling of participants for obstetric care exit interviews therefore follows the same sampling techniques as indicated above. As ANC and PNC services are only provided on specific weekdays at most health facilities, we expect higher numbers of patients attending these services within a confined time period. For client interviews of women exiting ANC and PNC services we therefore anticipate obtaining systematic random samples of service users at each facility during these clinic days.

The sampling approach for the patient record review follows sample size estimations found in the literature [46] and is based on the number of indicators used in the checklist (Table 1). Thus, the target samples include 30 maternal chart reviews per facility in order to reach a total sample size of approximately 900 reviews. Medical records are selected following a two-stage sampling procedure. First, based on the case-logs kept in each facility’s maternity unit, all cases diagnosed with obstetric complications, prolonged admissions, or fatal outcomes within the preceding three months are identified. For each eligible case, identifying information (i.e. date of presentation, patient name, medical registration number) is separately listed by the review team. Second, out of these generated lists random samples of 30 patient cases are drawn and their medical records retrieved from the facility’s medical record office based on the identifying information.

Study component 2
The second study component relies on a population-based household survey to provide a quantitative assessment of utilization patterns across the entire spectrum of maternal care services. In line with our overall conceptual framework and specific research objectives, the survey collects information on women’s use of maternal care services (ANC, delivery, and PNC, including family planning) and the out-of-pocket expenditure incurred in the process of seeking care. In addition, in order to allow for a more scrupulous evaluation of factors associated with health service utilization and out-of-pocket expenditure, the survey gathers information on the household’s socio-demographic and economic profile. This is especially important given our intention to measure the distributional impact of the intervention among socioeconomically different client groups in relation to both utilization and out-of-pocket spending. The quantitative indicators in this component can be divided into two main groups: a) demand indicators measuring various determinants of maternal care utilization during and after pregnancy among women living in the EmONC catchment areas; and b) socioeconomic indicators measuring numerous determinants of income, property, and social status of women and their households to allow for assessment of equity aspects in service utilization among households in the EmONC catchment areas. Table 2 provides an overview of Study Component 2 including the relevant household-based data sources, the data collection instrument used, and the key access and utilization outcome measures.

The household survey exclusively targets households where at least one woman has completed a pregnancy in the prior twelve months. To identify the women, we apply a three-stage cluster sampling procedure [47]. First, we define clusters; then, within each cluster, we identify relevant Enumeration Areas (EA); and then, within each EA, we identify households that meet our selection criteria (i.e. having at least one woman who has completed a pregnancy in the prior twelve months). In line with the RBF4MNH intervention and with the overall health system structure, we define clusters as the EmONC catchment area of the 33 facilities present in the four districts. Within the cluster, we opted to use EAs rather than villages as initial starting point, because we could not retrieve complete information on the villages contained within a given EmONC catchment area. The limited number of clusters represents an important constraining factor in relation to sample size calculations. Assuming an intra-cluster correlation coefficient of 0.04 and a power of 0.8, a
sample of 1800 women allows us to identify a significant impact on the primary study outcome (i.e. utilization of facility-based delivery) if the increase between baseline and endpoint is at least 20%. A total sample of 1800 households entails interviewing approximately 25 to 28 women in each EA.

Table 2 Overview of the service utilization study component

Study component: Utilization: Health-seeking elements

  Data collection activity (data source): Assessment of demand for maternal care services (women)
  Tool used: Structured interview survey
  Key outcome measures:
  • Proportion of women in catchment areas with facility-based deliveries
  • Proportion of women in catchment areas with four ANC visits during pregnancy
  • Proportion of mothers in catchment areas with first ANC visit during first pregnancy trimester
  • Proportion of women and newborns in catchment areas with at least one PNC visit

Study component: Utilization: Livelihood asset elements

  Data collection activity (data source): Assessment of household socioeconomic status (women)
  Tool used: Structured interview survey
  Key outcome measures:
  • Equity in distribution of facility-based deliveries among households in catchment areas
  • Equity in distribution of number and timing of ANC visits among households in catchment areas
  • Equity in distribution of number and timing of PNC visits among households in catchment areas

ANC = Antenatal Care, PNC = Postnatal Care, QUAN = quantitative, qual = qualitative.

The analytical approach for quantitative Study Components 1 and 2 is identical. Data analysis will rely on the computation of a difference-in-differences (DID) model for each of the outcomes of interest. In line with the overall conceptual model, this yields multiple estimates with the expectation that the intervention may induce change on some indicators, but not on others. The DID approach represents the most suitable analytical model in the absence of randomization, as it allows us to systematically estimate the extent to which intervention and control groups mature differently over time. Different maturation over time in measured variables can thus be identified across the three data collection time points, while the resulting true effect estimates can be compared to the defined counterfactual [48,49].

To estimate the true causal effect of the intervention (RBF4MNH) on the multiple quality of care and health service utilization measures between baseline, mid-term, and endpoint, the DID approach assumes that, apart from changes caused by the intervention, indicators follow parallel trends in intervention and control sites. In situations where the parallel trend assumption does not fully hold, incorrect estimation of the true effect will result. In our case, effect estimation is strengthened by the fact that the analytical model relies on multiple post-test measurements, at mid-term and endpoint, rather than a simple before-and-after design as in many other impact evaluations. Multiple post-test measurements allow for a more precise estimation of the effect as the observable trend in change due to the intervention can be better identified, minimizing the risk that the observed change is simply the product of a secular trend [50-52].

Study component 3
The third study component refers to the qualitative assessment of both quality of care and service utilization. As mentioned earlier in the text and in line with the overall mixed methods explanatory design [41], the qualitative component relies on a grounded theory approach [43], in which qualitative information is coded, compared and re-categorized as new themes or issues emerge.

This study component consists of qualitative data collection activities and relies on a mixture of non-participant observations, in-depth interviews, and Focus Group Discussions (FGDs) with both users and providers of maternity care services [53]. An additional set of key informant interviews with policy and implementing stakeholders is planned to shed light on the overall socio-political context that characterized the introduction of the intervention, allowing the research team to further understand observed results. As mentioned earlier, qualitative data collection activities are largely intended to explain the heterogeneity we expect to observe across facilities and communities in relation to all the outcomes observed quantitatively. This is considered to lead to a more comprehensive understanding of the underlying cause-effect relationships, compared to what would otherwise be possible through an exclusive use of quantitative methods. In other words, the qualitative component is shaped in a way to fill the knowledge gaps identified by the quantitative components [41].

Sampling approaches for the qualitative study component differ from those in the quantitative components. Both users and providers of healthcare services are sampled purposively to ensure the theoretical relevance of the selected samples in relation to the themes to be explored [53]. To the extent possible given financial and pragmatic constraints, qualitative data collection continues until saturation and redundancy are reached [54,55]. In line with the grounded theory principle, sampling techniques are geared towards an emerging design, allowing for the inclusion of new constituencies of respondents should new relevant themes emerge as we progress through qualitative data collection [43].

While the specific themes to be explored will be defined only once the preliminary quantitative analysis is completed, more general attention is given to exploring the experience of providing and receiving care under the new service purchasing scheme. Providers are probed to reflect on if and how the RBF4MNH intervention has contributed to improving their working conditions, to increasing their motivation, and to enabling them to provide quality services to their communities. In addition, users and potential users of care (i.e. women in need of maternal care services, but not using them) are probed to reflect on if and how the RBF4MNH intervention, in the experience of the communities, has facilitated equal financial access to maternity services and has improved the quality of these services. While a comprehensive process evaluation assessing the fidelity of implementation [56,57] is not possible given the financial resources at our disposal, the use of non-participant observation as well as the interviews with providers and communities planned within the framework of our impact evaluation nevertheless allow us to identify potential gaps in the implementation process and to understand how such gaps relate to the observed effects.

All interviews and FGDs are conducted in the local languages by trained research assistants working under the direct supervision of the research team. All verbal material (interviews and FGDs) is tape-recorded, fully transcribed, and translated into English for analysis. Transcripts and translations are checked for content consistency and accuracy. For quality assurance reasons, all non-participant observations are carried out by members of the research team with specific training in taking accurate memos that will later serve as formal material for the overall process analysis. Analysis of the qualitative information is carried out with support of the software NVivo [58]. In line with our underlying grounded theory approach, qualitative analysis will rely on an inductive standard comparison method [43]. The analysis begins with a first reading of the memos and transcripts to acquire familiarity with the data. Categories and sub-categories are developed, modified and extended on the basis of the themes that emerge as the analysis proceeds. Links between categories are identified to illuminate the understanding of the research question. Analyst triangulation is applied across all qualitative data sets. At least two independent researchers conduct the analysis separately and only compare and contrast their findings at a later stage. An additional valuable source of triangulation is provided by comparing findings across data sources (interviews, FGDs, and observations) and across respondents (policy stakeholders, providers, and users). When needed, the research team will refer back to the quantitative analysis to elucidate understanding of the emerging qualitative findings and vice versa. As indicated earlier in the description of the overall mixed methods approach, the final result interpretation and the subsequent policy recommendations emerge once both analytical processes are completed and quantitative and qualitative findings are brought together.

Ethical approval
The study protocol, including all of its quantitative and qualitative tools, received ethical approval by the Ethics Committee of the Faculty of Medicine of the University of Heidelberg, Germany. In addition, the individual study components and their specific quantitative and qualitative tools (including interviews and clinical chart reviews) underwent ethical approval by the College of Medicine Research and Ethics Committee (COMREC), the ethical board located at the College of Medicine, Malawi. Prior to ethical approval by the University of Heidelberg, the study protocol underwent a multidisciplinary competitive peer review process coordinated by the United States Agency for International Development (USAID) Translating Research Into Action (TRAction) Project and its awardee, the University Research Company (URC), and was approved for funding under Subagreement No. FY12-G01-6990. The qualitative component of our study protocol is reported in conformity with the Qualitative Research Review Guidelines (RATS), required in a BMC publication.

Discussion
Impact evaluations serve multiple purposes: to create empirical evidence, to advise project management, to guide policy decisions, and to inform budget allocations [59]. The recent promotion of PBI schemes represents a good example of how an innovative policy in health systems development, although scientifically backed up by only meager evidence, receives a lot of attention from implementing organizations, national health policy makers, and international funding agencies [60,61]. To tap into existing knowledge and to generate stronger evidence are the goals of this impact evaluation. To provide comprehensive and robust results that fill at least some of the identified knowledge gaps in the PBI literature, our study design closely follows the implementation process of a PBI scheme. Our choice of a mixed methods design rests on the awareness that understanding the processes through which health interventions produce change is as important as measuring the actual change produced. It follows that a comprehensive assessment of health interventions can only be achieved by coupling quantitative methods with qualitative ones as part of a systematic mixed-methods study design [62]. As a guiding principle for impact evaluation studies, the International Initiative for Impact Evaluation (3ie) proposes that evaluation research not only focus on aspects that seem to work, but also supply explanations on the why (or why not) [63]. In this understanding, impact evaluation designs have to focus on both outcomes and processes by considering each of the following six key elements: 1) the underlying causal chain; 2) context-related factors; 3) anticipation of impact heterogeneity; 4) a credible counterfactual; 5) counterfactual and factual analysis of results; and 6) the use of mixed methods.

Our approach to impact evaluation explicitly attempts to address all six elements. To start with, to unravel the underlying causal chain, we adopt a theory-based approach to impact evaluation [64]. We do this through the development of a theory of change closely linked to the initial conceptual framework which, as indicated in the literature, serves both to guide the initial research aims and to inform the development of the study design. To increase its robustness, the initial conceptual framework defining the concepts of utilization and quality is rooted in existing literature [37,39], but was later integrated into one single theory of change which allows us to merge and map all possible causal chains related to the introduction of the RBF4MNH intervention. During this theoretical process, it was important to us to also accommodate specific issues related to the Malawian context that potentially reflect the different factors that determine the causal chains leading to service quality and utilization, such as

methods design explicitly allows us to explore and incorporate information on how and why social, economic, and cultural factors shape the success or failure of provider-targeted performance incentives, especially with regard to molding intrinsically and extrinsically motivated behavior [66,67]. Gaining a deeper understanding of healthcare worker motivation is thus not only important in relation to quality service delivery, but also sheds more light on health worker attrition and retention in general, which is of special importance within the Malawian context with its extreme human resources for health crisis and the recently implemented policies to counteract this situation [68]. Similarly, the application of a mixed methods design allows us to unravel the contextual factors surrounding the implementation of the CCT and their potential effect on increasing equity or inequity [69]. Not only for our study purpose, but also in the broader frame of Malawi’s current poverty reduction strategies, a deeper understanding of such contextual factors modulating utilization and accessibility of reproductive health services will be of value to a variety of national population development programs [70].

We expect impact heterogeneity to emerge in our study in relation to two factors. First, we expect the intervention to produce different effects across the various concerned facilities. In line with what is stated in the paragraph above, we are aware that specific contextual elements in each district and in each EmONC catchment area will interact with the intervention to produce differential effects. We are aware that since the selection of the facilities receiving the intervention was non-random, chances that heterogeneous effects will be observed are even higher.
the extreme shortage of skilled health care providers, re- Again, we return to the application of a mixed methods curring stock-outs for essential medicines and equipment, design as a tool to understand heterogeneity, unraveling the poorly developed referral system, the large portion of key success as well as key failure factors. Second, as out- rural population, the under-funded user fee exemption lined in the methods section, we do not expect the inter- policy, or the serious economic crisis the nation has been vention to produce uniform results across all selected facing for years [65]. indicators. It would be naïve to imagine that the interven- Understanding the context as an essential part of the tion could only work to produce exactly and exclusively evaluation approach, we allow contextual elements to the initially expected results. Unlike prior evaluations, enter our design on multiple levels. First, in our theory which focused on a restricted number of indicators closely of change, we explicitly consider how social, political, aligned with the ones set by the program itself [18,20], cultural, and economical contextual elements may inter- we explicitly chose to monitor a broader range of ser- fere with the intervention to produce both intended and vices with the aim of capturing unexpected effects. The unintended effects. Second, on a thematic level, some comprehensive nature of our evaluation, including both contextual determinants are subject to those research providers (at multiple levels of care) and users, while questions (e.g. satisfaction with working environment or targeting multiple outcome indicators, is therefore more perception of service quality) that directly aim at evalu- likely to yield heterogeneous results. Thus, we are more ating relationship and interaction between financial likely to capture both successes and failures of the inter- incentives and provider motivation or access inequities. vention. 
The challenge, but also an additional opportunity, Third, on a methodological level, the use of an explana- ahead is the appraisal of heterogeneity across outcome tory mixed methods design provides the opportunity to measures in the light of the complementary qualitative purposefully investigate contextual elements that might data. In combination, this yields more robust evidence as have been overseen initially. For instance, the mixed to what changes the intervention is able to induce or not, Brenner et al. BMC Health Services Research 2014, 14:180 Page 15 of 17 http://www.biomedcentral.com/1472-6963/14/180 which is facilitated by triangulation across the multiple In conclusion, the study protocol presented here is an data sources available within the study [53]. example for a rigorous mixed methods impact evaluation Counterfactual-based analysis requires reference to a design that addresses all characteristics desirable for comparison group that allows to estimate how the out- theory-based impact evaluation research. Building the come measure would have changed in the treated popula- impact evaluation on an explanatory mixed methods de- tion in absence of the intervention [42]. Identifying a sign offers the opportunity to address our research aims proper counterfactual is essential to estimate the cause- and conceptual framework comprehensively, especially effect relationship, i.e. being able to attribute the observed since our outcome measures – healthcare quality and ser- change to the intervention under study [71,72]. Identifying vice utilization – require a multi-dimensional approach. a robust counterfactual for the quantitative study compo- The operationalization of our research design in three nents was the most challenging aspect of our impact components – a quantitative quality of care, a quantitative evaluation design. 
The team was left to follow the inter- service utilization, and a qualitative cross-cutting study vention design, with no option to propose randomization component – follows the thematic, but also the concep- (potentially yielding a trial) or allocation of the interven- tual structure of the study. As the aim of this impact tion exclusively to the facilities above a given quality score evaluation is to generate evidence on performance incen- (potentially yielding a regression-discontinuity design) tives, we feel confident that this explanatory mixed [51]. The pre- and post-test design, allowing for the appli- methods design will be able to contribute to existing evi- cation of DID modelling techniques, was the only feasible dence by addressing the current knowledge gaps related to option. While we are aware that a fully experimental de- the effect of output-based financing models, but will also sign would have been preferable from a scientific point of generates specific information relevant for the Malawian view, we are confident that the DID analytical model will health policy context. allow for sufficiently accurate estimation of the effect while controlling for history and maturation bias [42], Competing interests The authors declare that they have no competing interests. thanks also to the application of multiple post-test mea- sures. At the same time, simply following the intervention Authors’ contributions team’s plans and aligning our design with their decisions All authors participated in the conceptualization of the study, the gained us the respect and support of those implementing development of the research objectives and relevant theory of change. MDA developed the overall mixed methods design, with contribution from PJR, the program. AM, TBo, and SB. MDA, PJR, and TBa were responsible for the quantitative We opted to select controls located within the districts design. MDA and MS were responsible for the qualitative design. SB for two primary reasons. 
First, this is considered preferable developed the quality of care study component and all related indicators with contributions from AM and MS. MDA and PJR developed the health from a scientific point of view, since the expectation is service utilization component and all related indicators with contributions that facilities (and catchment communities) within a same from DM. SB and MDA drafted the manuscript with contribution from all district are more likely to be similar than facilities (and authors. All authors read and approved the final manuscript. catchment communities) across districts. This limits the Acknowledgements risks that underlying contextual differences are responsible This research project is made possible through Translating Research into for the observed changes, rather than the intervention per Action, TRAction, and is funded by United States Agency for International se. Second, it was impossible to select control sites beyond Development (USAID) under cooperative agreement number GHS-A-00-09- 00015-00. The project team includes prime recipient, University Research Co., the district boundaries and to be sure that these control LLC (URC), Harvard University School of Public Health (HSPH), and sub-recipient sites would not be affected by another maternal and neo- research organization, University of Heidelberg. natal health program in the very near future, making it This research project is co-funded through a grant by the Norwegian Ministry of Foreign Affairs to the Government of Malawi under Programme impossible for us to attribute causal effects. Still, choosing Title MWI 12/0010 Effect Evaluation of Performance Based Financing in as controls facilities and catchment communities located Health Sector. The Malawi College of Medicine as implementing institution is within the same districts introduces the risk of a potential recipient of this grant. 
The research team would like to thank the RBF4MNH Initiative implementing spill-over effect from the intervention areas [73], while team of the Reproductive Health Unit of the Malawi Ministry of Health, as any contamination of intervention sites due to proximity well as the support team from Options Consultancy Services for allowing an to non-intervention sites may lead to under-estimation of open dialogue between project implementation and impact evaluation which was vital to the definition of the study protocol and its alignment our effect measurements. In addition, close proximity with the implementation process. between interventions and control areas might bias re- The authors would like to thank Aurélia Souares and Gerald Leppert for their sponses or behavior of both clients and providers in valuable contributions during the early design phases; Julia Lohmann, Jobiba Chinkhumba, Christabel Kambala and Jacob Mazalale for their contributions the control areas. Still, we feel confident that given the during research tool development, and Albrecht Jahn for his support during multiple sources of data available within the study, we the preparation of the research protocol. Additional thanks to Julia Lohmann will be able to control for potential bias by relying on for her critical revision of the intellectual contact. We would also like to thank the anonymous reviewers who took the care to read our proposal an extensive triangulation process at the analytical application to URC/TRAction in 2011 and to provide useful comments to stage of our work. improve it. Brenner et al. BMC Health Services Research 2014, 14:180 Page 16 of 17 http://www.biomedcentral.com/1472-6963/14/180 Author details services in Rwandan health centres: 3-year experience. Trop Med Int 1 Institute of Public Health, Ruprecht-Karls-University, Heidelberg, Germany. Health 2009, 14:830–837. 2 Department of Community Health, University of Malawi, College of 20. 
Soeters R, Peerenboom PB, Mushagalusa P, Kimanuka C: Performance-based Medicine, Blantyre, Malawi. 3The World Bank, Washington, DC, USA. financing experiment improved health care in the democratic republic of 4 Department of Global Health and Population, Harvard School of Public Congo. Health Aff 2011, 30:1518–1527. Health, Boston, Massachusetts, United States of America. 5Wellcome Trust 21. Gorter A, Ir P, Meessen B: Evidence review: results-based financing of Africa Centre for Health and Population Studies, University of KwaZulu-Natal, maternal and newborn health care in low- and middle-income countries. Mtubatuba, South Africa. In Report for the project Programme to Foster Innovation, Learning and Evidence in Health Programmes of the German Development Cooperation: Received: 7 February 2014 Accepted: 7 April 2014 study commissioned and funded by the German federal ministry for economic Published: 22 April 2014 cooperation and development (BMZ) through the sector project PROFILE at GIZ - Deutsche Gesellschaft für Internationale Zusammenarbeit. Eschborn: German Health Practice Collection; 2013. References 22. Lundberg M: Client satisfaction and perceived quality of primary health 1. Kutzin J: Towards universal health care coverage: a goal-oriented care in Uganda. In Are you being served? new tools for measuring service framework for policy analysis. In Health, nutrition and population discussion delivery. Edited by Amin S, Dasb J, Goldstein M. Washington D.C: The World paper. Edited by Preker AS. Washington D.C: The World Bank; 2000. Bank; 2008:313–341. 2. Gottret PE, Schieber G: Health financing revisited: a practitioner’s guide. 23. Kalk A, Paul FA, Grabosch E: ‘Paying for performance’ in Rwanda: does it Washington, D.C: The World Bank; 2006. pay off? Trop Med Int Health 2010, 15:182–190. 3. World Health Assembly Resolution 58.33: Sustainable health financing, 24. 
Songstad NG, Lindkvist I, Moland KM, Chimhutu V, Blystad A: Assessing universal coverage and social health insurance. Geneva: World Health performance enhancing tools: experiences with the open performance Organization; 2005. review and appraisal system (OPRAS) and expectations towards 4. Laxminarayan R, Mills AJ, Breman JG, Measham AR, Alleyne G, Claeson M, payment for performance (P4P) in the public health sector in Tanzania. Jha P, Musgrove P, Chow J, Shahid-Salles S, Jamison DT: Advancement of Global Health 2012, 8:33–45. global health: key messages from the disease control priorities project. 25. Leonard KL, Masatu MC: Professionalism and the know-do gap: exploring Lancet 2006, 367:1193–1208. intrinsic motivation among health workers in Tanzania. Health Econ 2010, 5. World Health Organization: The world health report 2010 - health systems 19:1461–1477. financing: the path to universal coverage. Geneva: World Health Organization; 26. Witter S, Fretheim A, Kessy FL, Lindahl AK: Paying for performance to 2010. improve the delivery of health interventions in low- and middle-income 6. Chaudhury N, Hammer J, Kremer M, Muralidharan K, Rogers FH: Missing in countries. Cochrane Database Syst Rev 2012, 2:1–81. action: teacher and health worker absence in developing countries. 27. Fretheim A, Witter S, Lindahl AK, Olsen IT: Performance-based financing in J Econ Perspect 2006, 20:91–116. low- and middle-income countries: still more questions than answers. 7. Hutchinson PL, Do M, Agha S: Measuring client satisfaction and the Bull World Health Organ 2012, 90:559–559A. quality of family planning services: a comparative analysis of public and private health facilities in Tanzania, Kenya and Ghana. BMC Health Serv 28. National Statistical Office and ICF Macro: Malawi demographic and health Res 2011, 11:203–219. survey 2010. Zomba: National Statistical Office; 2011. 8. Gauthier B: PETS-QSDS in sub-Saharan Africa: a stocktaking study. 29. 
Kongnyuy EJ, Mlava G, van den Broek N: Facility-based maternal death In Report for the project measuring progress in public services delivery: 7 review in three districts in the central region of Malawi: an analysis of September 2006. Washington D.C: The World Bank; 2006. causes and characteristics of maternal deaths. Womens Health Issues 2009, 9. Gupta S, Verhoeven M, Tiongson ER: Public spending on health care and 19:14–20. the poor. Health Econ 2003, 12:685–696. 30. Carlson C, Boivin M, Chirwa A, Chirwa S, Chitalu F, Hoare G, Huelsmann M, 10. Musgrove P: Financial and other rewards for good performance or Ilunga W, Maleta K, Marsden A, Martineau T, Minett C, Mlambala A, von results: a guided tour of concepts and terms and a short glossary. In The Massow F, Njie H, Olson IT: Malawi health SWAp mid-term review: World Bank health results innovation trust fund - results-based financing for summary report. In Norad collected reviews, vol 22. Malawi: Norad: health: 14 September 2010. Washington D.C: The World Bank; 2011. Commissioned by the Ministry of Health; 2008. 11. Toonen J, van der Wal B: Results-based financing in healthcare: developing an 31. Bowie C, Mwase T: Assessing the use of an essential health package in a RBF approach for healthcare in different contexts: the case of Mali and Ghana. sector wide approach in Malawi. Health Research Policy and Systems 2011, Amsterdam: KIT Publishers; 2012. 9:4–13. 12. Ergo A, Paina L, Morgan L, Eichler R: Creating stronger incentives for 32. RBF4MNH Options Office: Inception report results based financing for high-quality health care in low- and middle-income countries. In USAID maternal and neonatal health (RBF4MNH initiative) 2012–2014. Lilongwe: maternal and child health integrated program. Washington D.C: USAID; 2012. Reproductive Health Unit; 2012. 13. Eichler R, Levine R, Performance-Based Incentives Working Group: 33. World Health Organization: Monitoring emergency obstetric care: a handbook. 
Performance incentives for global health - potential and pitfalls. Washington Geneva: World Health Organization; 2009. D.C: Center for Global Development; 2009. 34. Ronsmans C, Graham WJ: Maternal mortality: who, when, where, and 14. Ireland M, Paul E, Dujardin B: Can performance-based financing be used why. Lancet 2006, 368:1189–1200. to reform health systems in developing countries. Bull World Health 35. Filippi V, Ronsmans C, Campbell OM, Graham WJ, Mills A, Borghi J, Koblinsky Organ 2011, 89:695–698. M, Osrin D: Maternal health in poor countries: the broader context and a 15. Meessen B, Soucat A, Sekabaraga C: Performance-based financing: just a call for action. Lancet 2006, 368:1535–1541. donor fad or a catalyst towards comprehensive health-care reform? 36. World Health Organization: Everybody’s business: strengthening health systems Bull World Health Organ 2011, 89:153–156. to improve health outcomes: WHO’s framework for action. Geneva: World 16. Ranganathan M, Legarde M: Promoting healthy behaviours and improving Health Organization; 2007. health outcomes in low and middle income coutnries: a review of the 37. Donabedian A: The quality of care: how can it be assessed? JAMA 1988, impact of conditional cash transfer programmes. Prev Med 2012, 55:S95–S105. 260:1743–1748. 17. Giuffrida A, Gravelle H, Roland M: Measuring quality of care with routine 38. Conrad P, De Allegri M, Moses A, Larsson EC, Neuhann F, Müller O, Sarker M: data: avoiding confusion between performance indicators and health Antenatal care services in rural Uganda missed opportunities for outcomes. BMJ 1999, 319:94–98. good-quality care. Qual Health Res 2012, 22:619–629. 18. Basinga P, Gertler PJ, Binagwaho A, Soucat AL, Sturdy J, Vermeersch CM: 39. Andersen RM: Revisiting the behavioral model and access to medical Effect on maternal and child health services in Rwanda of payment to care: does it matter? J Health Soc Behav 1995, 36:1–10. 
primary health-care providers for performance: an impact evaluation. 40. Aday LA, Andersen RM: Equity of access to medical care: a conceptual Lancet 2011, 377:1421–1428. and empirical overview. Med Decis Making 1981, 19:4–27. 19. Rusa L, Ngirabega J de D, Janssen W, Van Bastelaere S, Porignon D, 41. Creswell JW, Clark PVL: Designing and conducting mixed methods research. Vandenbulcke W: Performance-based financing for better quality of 2nd edition. Thousand Oaks: SAGE Publications; 2010. Brenner et al. BMC Health Services Research 2014, 14:180 Page 17 of 17 http://www.biomedcentral.com/1472-6963/14/180 42. Shadish WR, Cook TD, Campbell DT: Experimental and quasi-experimental 70. Population Unit, Ministry of Development Planning and Cooperation: RAPID: designs for generalized causal inference. Belmont: Wadsworth Cengage population and development in Malawi. Lilongwe: Ministry of Development Learning; 2001. Planning and Cooperation; 2010. 43. Glaser BG, Strauss AL: The discovery of grounded theory: strategies for 71. White H: A contribution to current debates in impact. Evaluation 2010, qualitative research. Hawthorne: Aldine de Gruyter; 1999. 16:153–164. 44. Peabody JW: Comparison of vignettes, standardized patients, and chart 72. Morgan SL: Counterfactuals and causal inference: methods and principles for abstraction: a prospective validation study of 3 methods for measuring social research. New York: Cambridge University Press; 2007. quality. JAMA 2000, 283:1715–1722. 73. Puffer S, Torgerson D, Watson J: Evidence for risk of bias in cluster 45. Sarker M, Schmid G, Larsson E, Kirenga S, De Allegri M, Neuhann F, Mbunda randomised trials: review of recent trials published in three general T, Lekule I, Müller O: Quality of antenatal care in rural southern Tanzania: medical journals. BMJ 2003, 327:785–789. a reality check. BMC Res Notes 2010, 2010(3):209–315. 46. 
Gearing RE, Mian IA, Barber J, Ickowicz A: A methodology for conducting doi:10.1186/1472-6963-14-180 retrospective chart review research in child and adolescent psychiatry. Cite this article as: Brenner et al.: Design of an impact evaluation using J Can Acad Child Adolesc Psychiatry 2006, 15:126–134. a mixed methods model – an explanatory assessment of the effects of 47. O’Donnell O, Van Doorslaer E, Wagstaff A, Lindelow M: Analyzing health results-based financing mechanisms on maternal healthcare services in equity using household survey data: a guide to techniques and their Malawi. BMC Health Services Research 2014 14:180. implementation. In WBI learning resource series. Washington, D.C: The World Bank; 2008. 48. Donald SG, Lang K: Inference with difference-in-differences and other panel data. Rev Econ Stat 2007, 89:221–233. 49. Athey S, Imbens GW: Identification and inference in nonlinear difference-in-difference models. Econometrica 2006, 74:431–497. 50. Victora CG, Black RE, Boerma JT, Bryce J: Measuring impact in the millennium development goal era and beyond: a new approach to large-scale effectiveness evaluations. Lancet 2011, 377:85–95. 51. Shadish WR, Cook TD: The renaissance of field experimentation in evaluating interventions. Annu Rev Psychol 2009, 60:607–629. 52. Winship C, Morgan SL: The estimation of causal effects from observational data. Annu Rev Sociol 1999, 25:659–707. 53. Patton MQ: Qualitative research & evaluation methods. Thousand Oaks: SAGE Publications; 2002. 54. Bowen GA: Naturalistic inquiry and the saturation concept: a research note. Qual Res 2008, 8:137–152. 55. Tuckett AG: Qualitative research sampling: the very real complexities. Nurse Researcher 2004, 12:47–61. 56. Harachi TW, Abbott RD, Catalano RF, Haggerty KP, Fleming CB: Opening the black box: using process evaluation measures to assess implementation and theory building. Am J Community Psychol 1999, 27:711–731. 57. 
Oakley A, Strange V, Bonell C, Allen E, Stephenson J: Process evaluation in randomised controlled trials of complex interventions. BMJ 2006, 332:413–416. 58. QSR International Pty Ltd: NVivo qualitative data analysis software; 2012. Version 10. 59. Bamberger M: The evaluation of international development programs: a view from the front. Am J Eval 2000, 21:95–102. 60. Beane CR, Hobbs SH, Thirumurthy H: Exploring the potential for using results-based financing to address non-communicable diseases in low- and middle-income countries. BMC Public Health 2013, 13:92–100. 61. Hurley R: Funding aid according to outcomes can improve health in poor countries, seminar hears. BMJ 2011, 342:d2322. 62. Bamberger M: Introduction to mixed methods in impact evaluation. In Impact evaluation guidance notes. Washington D.C: InterAction; 2012. 63. White H: Theory-based impact evaluation: principles and practice. In 3ie working papers. New Delhi: International Initiative for Impact Evaluation (3ie); 2009. 64. Chen H-T, Rossi PH: The theory-driven approach to validity. Evaluation and Program Planning 1987, 10:95–103. 65. World Health Organization: Country cooperation strategy brief Malawi. Submit your next manuscript to BioMed Central Geneva: World Health Organization; 2013. and take full advantage of: 66. Prytherch H, Kagone M, Aninanya GA, Williams JE, Kakoko DC, Leshabari MT, Ye M, Marx M, Sauerborn R: Motivation and incentives of rural maternal and • Convenient online submission neonatal health care providers: a comparison of qualitative findings from Burkina Faso, Ghana and Tanzania. BMC Health Serv Res 2013, 13:149–163. • Thorough peer review 67. Ssengooba F, McPake B, Palmer N: Why performance-based contracting • No space constraints or color figure charges failed in Uganda: an ‘open-box’ evaluation of a complex health system intervention. Soc Sci Med 2012, 75:377–383. • Immediate publication on acceptance 68. Observatory AHW: Human resources for health country profile - Malawi. 
• Inclusion in PubMed, CAS, Scopus and Google Scholar Brazzaville: Republic of Congo; 2009. • Research which is freely available for redistribution 69. Victora CG, Wagstaff A, Schellenberg JA, Gwatkin D, Claeson M, Habicht JP: Applying an equity lens to child health and mortality: more of the same is not enough. Lancet 2003, 362:233–241. Submit your manuscript at www.biomedcentral.com/submit