WPS8622

Policy Research Working Paper 8622

Inequality of Opportunity in Education: Accounting for the Contributions of Sibs, Schools and Sorting across East Africa

Paul Anand, Jere R. Behrman, Hai-Anh H. Dang, Sam Jones

Development Economics, Development Research Group
October 2018

Abstract

Inequalities in the opportunity to obtain a good education in low-income countries are widely understood to be related to household resources and schooling quality. Yet, to date, most researchers have investigated the contributions of these two factors separately. This paper considers them jointly, paying special attention to their covariation, which indicates whether schools exacerbate or compensate for existing household-based inequalities. The paper develops a new variance decomposition framework and applies it to data on more than one million children in three low-income East African countries. The empirical results show that although household factors account for a significant share of total test score variation, variation in school quality and positive sorting between households and schools are, together, no less important. The analysis also finds evidence of substantial geographical heterogeneity in schooling quality. The paper concludes that promoting equity in education in East Africa requires policies that go beyond raising average school quality and should attend to the distribution of school quality as well as assortative matching between households and schools.

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/research. The authors may be contacted at hdang@worldbank.org and jones@wider.unu.edu.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Inequality of Opportunity in Education: Accounting for the Contributions of Sibs, Schools and Sorting across East Africa

Paul Anand, Jere R. Behrman, Hai-Anh H. Dang, Sam Jones*

JEL: D6, H0, I2, O1
Key words: inequality of opportunity, education achievement, decomposition, household, school, sorting, Africa

*Jones (jones@wider.unu.edu; corresponding author) is a Research Fellow with UNU-WIDER; Dang (hdang@worldbank.org; corresponding author) is an economist in the Survey Unit, Development Data Group, World Bank, and a non-resident senior research fellow with Vietnam’s Academy of Social Sciences; Behrman is the William R. Kenan, Jr. Professor of Economics at the University of Pennsylvania; Paul Anand is a Professor at the Open University and a Research Associate in the Department of Social Policy and Intervention at Oxford University and at the London School of Economics. We would like to thank Jed Friedman, Ha Nguyen, and Ron Smith for helpful comments on earlier versions. We are grateful to the UK Department for International Development for funding assistance through its Strategic Research Program (SRP).
1 Introduction

The inability of some children in low-income countries to access quality schooling is a matter of concern, both for economic efficiency and for social justice. If able children do not achieve their educational potential, countries face potentially significant losses relative to the counterfactual in which all children have an equal opportunity to develop their talents and skills. Likewise, there are good reasons based on social justice to ensure that development is as equitable a process as it can be (Sen, 2002). In a widely cited overview, Corak (2013) shows that interactions between family circumstances, labor markets and public policies ‘all structure a child’s opportunities’ and concludes that there is a need for policies that promote children’s human capital in a way that offers relatively greater benefits to the relatively disadvantaged. Although that overview focuses mainly on higher-income countries, there is little reason to think its conclusions do not apply to lower-income countries. Indeed, international comparisons of educational achievement highlight large gaps between students from richer and poorer countries, as well as substantial within-country gaps in both grade attainment and learning outcomes (e.g., Dabalen, 2015; Sandefur, 2018).

Following Roemer (1996), efforts to promote equity in education have typically focused on tackling inequalities that can be traced to differences in opportunities, defined as the circumstances that lie beyond the control of individual children, rather than those due to effort or personal choice.1 Existing research in this domain has primarily operationalized inequality of opportunity in education (IOE) as the (proportional) contribution of the home environment to inequalities in educational outcomes. Studies from a range of higher-income countries suggest that IOE is surprisingly high, with upwards of 40% of variation in schooling outcomes being associated with given household circumstances (Björklund and Salvanes, 2011).
A more limited number of studies for developing countries also indicate that differences in family circumstances account for a material share of differences in both grade attainment and achievement (Ferreira et al., 2008; Ferreira and Gignoux, 2014).

An important limitation of previous studies is that they mostly capture IOE via a single factor, such as home circumstances. However, family circumstances may be only one of several factors that can be considered part of the ‘given’ circumstances that children face. Access to quality schooling is no less important a determinant of children’s educational achievement and yet, in general, this factor is also beyond the control of individual children and many (disadvantaged) households. Schools in developing countries are often ill-equipped and some do not even fulfil their basic functions, with teachers sometimes not coming to school to teach (Glewwe and Muralidharan, 2016). In addition, even in richer countries, other characteristics related to the often exogenous organization and financing of school systems (e.g., use of ability grouping) have been shown to influence the magnitude of inequalities in final achievement (Rivkin et al., 2005; Hanushek and Rivkin, 2006).

1 See Roemer (1996, 2002), who notes that equality of opportunity is the most universally supported conception of justice in advanced societies. Within Sen’s (1985) framework it is possible to view household factors as having not just a direct effect on educational achievement but also as helping a child access outside educational resources and conditioning the child’s ability to convert school quality into scholastic outcomes.

In this paper, therefore, we seek to provide a more comprehensive understanding of the magnitude and sources of inequality of opportunity in education.
To do so, we develop a framework that jointly accounts for the contributions of both school and household factors, as well as their covariance, to variation in learning outcomes. The framework is then applied to a rich micro-data set for over 1 million children from three East African countries (Kenya, Tanzania and Uganda). In doing so, the paper makes three contributions to the IOE literature.

Firstly, starting from the idea that, in a developing-country context, IOE may not be attributable to family circumstances alone, we add to the literature by quantifying the distinct contributions of both households and schools to variation in learning outcomes. To our knowledge, this is the first paper to do so in a developing-country context. More generally, the existing literature offers very few studies that examine both of these factors, particularly for several countries at the same time as attempted here; and, of the few that do, most focus on richer countries.2

Secondly, we advance the field by investigating the interactions (covariance) of school and household effects. In a purely theoretical IOE setting, these two types of effects may exist independently of each other and be exogenously given to the household. But in reality, even in a lower-income country context, some (richer) households may be able to select (better) schools through their (stronger) resources and social connections. Put differently, there is likely to be some sorting (assortative matching) between households and schools. Identifying the magnitude of this sorting effect can help policy makers reduce inequalities, for example, by setting school zoning or mobility policies that ensure similar chances of access to high-quality schools for all households, including the most disadvantaged.

Thirdly, we build on the existing variance-decomposition framework in the established IOE literature.
An unresolved challenge in the present setting, where more than one unobserved factor is considered, concerns how to specify the between-factor covariance. Indeed, the ‘true’ nature of this covariance typically cannot be identified unambiguously from the underlying data, and different empirical approaches effectively adopt contrasting a priori assumptions regarding how the covariance is allocated across the factors.3 We propose a new empirical procedure that not only permits simultaneous estimation of two unobserved fixed latent factors (household and school effects), but also allows alternative assumptions regarding their covariance to be handled in a transparent fashion. This provides (strict) bounds on the variance contributions of interest. Furthermore, in our empirical implementation, we validate (cross-test) our results, showing that extreme assumptions about the between-factor covariance can be ruled out.

2 We return to a more detailed discussion of the literature in the next section.
3 See, e.g., Gibbons et al. (2014) for a related study in the field of urban economics.

The rest of this paper consists of four sections. Section 2 sets out a general framework to account for IOE and uses it to review the findings of previous studies. It also describes the empirical strategy we use to identify the household and school factors, treating both as unobserved effects, and proposes a simple mechanism that offers a unified variance decomposition strategy, covering the full range of alternative assumptions regarding the between-factor covariance. Section 3 applies the proposed approach to extensive test score data on over 1,000,000 school-aged children from three East African countries (Kenya, mainland Tanzania and Uganda).
It compares variance-decomposition results from three choices of the initialization parameter, each of which corresponds to an intuitive characterization of the between-factor covariance. Using both unconditional and conditional models, we show that household and school circumstances jointly account for nearly half of the variance in test scores (normalized by age). However, consistent with the limitations of past studies noted above, households cannot be considered the primary or only source of IOE. More specifically, we find that the upper-bound variance share attributable to schools is generally as large as the upper bound attributable to households, which in itself is indicative of a positive association between the latent factors. And under our preferred model specification, we find evidence of substantial positive sorting, whereby higher ‘quality’ households tend to be matched to higher-quality schools. In Section 4 we validate the main findings using alternative estimation procedures and investigate heterogeneity in the variance decomposition. We find systematic patterns in the level of inequality and in the magnitudes of the variance components across different sub-groups and geographical locations, including a larger contribution of sorting in more disadvantaged locations. Section 5 concludes and reflects on our findings.

2 Analytical framework

2.1 Accounting model

A general framework for the analysis of inequality in education splits the process generating (differences in) test scores into the effect of given circumstances (opportunities) and the effect of other factors (idiosyncratic effects, effort, preferences, etc.). This suggests an educational production function of the following form:

$$ t_{ijk} = f(h_j, s_k) + e_{ijk} \qquad (1) $$

where $t$ is a measure of educational achievement (e.g., test scores), and the indexes $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, H$ and $k = 1, 2, \ldots, S$ refer to individual children, families and schools, respectively.
Thus, $f(\cdot)$ captures the contribution of given household and school circumstances, and $e$ captures remaining individual variation that is treated as orthogonal to the former; i.e., $\mathrm{E}[e_{ijk} \mid h_j, s_k] = 0$.4 Following our concern to parse out the respective contributions of households and schools to IOE, $h_j$ is defined as a comprehensive metric of the contribution of all factors shared by children in the same household (hereafter sibs), and $s_k$ is defined as a comprehensive metric of the contribution of the given school (and grade) to their learning.5 Note that, provided not all sibs attend the same school and/or grade, the household factor does not nest the school factor; that is, they are crossed.6

Equation (1) defines test score levels, from which various metrics of inequality have been proposed. Within the domain of education, the simple variance of outcomes is widely used. As Ferreira and Gignoux (2014) note, the variance is ordinally invariant to the standardization procedures often used to express test scores on a comparable scale. Also, the linear additive property of the variance makes it straightforward to isolate the contributions of individual components to the overall variance. However, even if we assume a linear form for $f$, the test score variance can be decomposed in various ways, depending on what assumptions are made about the relations between its constituent elements. Table 1 describes four main cases of the relationship between $h$ and $s$.7 Each row sets out an assumed underlying data-generating process (model), which incorporates specific assumptions about the level and variance of $t$. The model in the first row assumes that the household and school factors make independent, additive contributions to outcomes (test scores). In terms of the associated outcome variance, this imposes the assumption $\mathrm{E}[h_j s_k] = 0$, which rules out the possibility of any correlation between the household and school effects.

4 This last assumption may seem strong, but it facilitates our primary interest in identifying the relevant contributions of latent household and school factors. Without additional individual-level controls (see below), individual characteristics (e.g., ability) that are correlated with h and s will be absorbed by the latter factors. Even so, there are individual factors that can be expected to be orthogonal to the estimated terms. For example, Behrman et al. (1994) use monozygotic (identical) and dizygotic (fraternal) twins in the United States to identify individual-specific factors (orthogonal to the household factors) and find that these account for about a quarter of the variance in adult earnings.
5 Throughout, the observed school and household effect vectors are taken to have dimension N × 1, where N is the number of children (observations). However, the number of unique household and school effects is strictly less than N. Indeed, assuming no singletons, at most H + S − 1 < N fixed effects can be estimated.
6 This design stands in contrast to the conventional estimation of neighborhood effects (e.g., Solon, 2000), where neighborhoods nest households, meaning that the variance contribution of the former cannot be larger than that of the latter (when treated as unobserved effects).
7 While a number of studies report conditional variances, Table 1 ignores any conditioning variables. However, in our empirical application these are included (see Section 3.2).

The zero-covariance assumption embedded in Row 1 appears restrictive. Sorting or clustering of households has been identified in a wide range of domains (Davidoff, 2005; Combes et al., 2008).
Indeed, even if school quality were allocated randomly at time zero and held fixed thereafter, households’ demand for better schools may bid up local house prices over time, stimulating residential sorting and generating a positive correlation between household income and school quality (e.g., Hanushek and Yilmaz, 2007). Other behavioral responses of families toward school quality support the possibility of sorting into schools, and evidence from a number of countries suggests that certain teachers prefer to teach certain kinds of children or to reside in specific kinds of neighborhoods (Jackson, 2009, 2013; Pop-Eleches and Urquiola, 2013). Thus, pre-existing clustering of households by income or ethnicity may stimulate across-location sorting of teacher quality. In either case, the assumption of a zero covariance between household and school factors in Row 1 becomes untenable, and an unrestricted linear model would apply (Row 2). One interpretation of the unrestricted linear (sorting) model is that the household and school factors have no direct mutual effects; i.e., while they are separately pre-determined, they become correlated through ex post processes of sorting. But this is not the only mechanism that could generate a correlation between household and school effects. Some part of the school effect may reflect the (mean) contribution of constituent households, such as when households make direct financial or time commitments to school functioning. This kind of mechanism is also suggested by certain versions of cream-skimming models, where average peer quality in a school (or class) is driven by household characteristics, which in turn directly influences individual achievement (Walsh, 2009).
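As a minimal numerical sketch of the variance accounting behind Table 1, assuming the linear form $t = h + s + e$ and simulated (not Uwezo) data, the Python snippet below splits the variance of simulated scores into household, school, sorting ($2\,\mathrm{Cov}(h, s)$) and residual shares. The function name `variance_shares`, the latent `quality` variable and all parameter values are hypothetical; the point is only that the Row 1 model drops the sorting term, whereas the unrestricted linear model of Row 2 keeps it.

```python
import numpy as np

def variance_shares(h, s, e):
    """Shares of Var(t), t = h + s + e, due to households, schools, sorting and residual.

    Inputs are child-level arrays, so every moment is weighted by the number of children.
    """
    t = h + s + e
    var_t = np.var(t)
    c = np.cov(np.vstack([h, s, e]), bias=True)     # population (ddof=0) covariance matrix
    return {
        "household": c[0, 0] / var_t,
        "school": c[1, 1] / var_t,
        "sorting": 2 * c[0, 1] / var_t,              # the term that Row 1 sets to zero
        "residual": c[2, 2] / var_t,
        "e_cross": 2 * (c[0, 2] + c[1, 2]) / var_t,  # zero in expectation if e is orthogonal
    }

# Simulated example with positive sorting (all parameter values are hypothetical)
rng = np.random.default_rng(1)
n = 100_000
quality = rng.normal(size=n)                         # latent household advantage
h = 0.6 * quality
s = 0.3 * quality + 0.5 * rng.normal(size=n)         # better-off households attend better schools
e = rng.normal(size=n)

shares = variance_shares(h, s, e)
print({k: round(float(v), 3) for k, v in shares.items()})
```

Because sample cross-moments with $e$ are not exactly zero, they are reported separately (`e_cross`) so that the shares sum to one; dropping the `sorting` entry, as the Row 1 model implicitly does, would leave a visible gap in this example.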
A strong version of this is captured in the third row of Table 1, which assumes that $s$ can be partitioned into a component that is oblique or parallel to $h$ and an orthogonal component $\nu$, with own variance $\sigma^2_\nu$:

$$ s_k = \lambda h_j + \nu_k \quad \forall\, j \in k, \qquad \mathrm{E}[h_j \nu_k] = 0 \qquad (2) $$

where $\lambda$ is a scalar and $j \in k$ denotes the households with children in school (grade) $k$. Applying this expression, the second column of Row 3 gives a strict upper bound on the variance contribution due to households.8 The corollary is given in Row 4, where household effects are assumed to be (partial) reflections of pre-given school effects, plus an orthogonal component. Note that in both these cases, the observed covariance between household and school effects is attributed wholly to one of the two factors. In other words, the contribution of sorting to the variance of test scores is moot under the assumption of a direct (one-way) causal relationship between the two factors. In this sense, alternative ex ante assumptions about the structure of the covariance are used to pin down the variance contributions of the two factors.

8 To go from equation (2) to the model in Row 3 we have made the simplifying assumption that $h_j$ is constant within each school/grade $k$. Where this is not the case, it can be shown that the variance contribution due to households will be somewhat smaller but remains an upper bound.

Reflecting on Table 1, previous literature has frequently estimated IOE via some variant of the household upper-bound model (Row 3). Concretely, a number of studies treat family effects as a single fixed unobserved factor and omit any consideration of school effects. Björklund and Salvanes (2011) describe this approach and show how, under this set-up, the relative variance contribution of households is equal to the correlation in outcomes between siblings. The same authors summarize estimates of sibling correlations in various developed countries. Excluding estimates for twins, these range from 0.24 in former East Germany to over 0.60 in the USA.
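To see why the Row 3 model delivers an upper bound for households, one can substitute the partition of the school effect into a linear version of equation (1). Writing the Row 3 model as $s = \lambda h + \nu$ for a scalar $\lambda$ (our notation), assuming $\nu$ and $e$ are orthogonal to $h$ and to each other, and suppressing subscripts, a short derivation is:

```latex
\begin{aligned}
t &= h + s + e = (1 + \lambda)\,h + \nu + e, \\
\operatorname{Var}(t) &= (1 + \lambda)^2 \operatorname{Var}(h) + \operatorname{Var}(\nu) + \operatorname{Var}(e), \\
(1 + \lambda)^2 \operatorname{Var}(h) &= \operatorname{Var}(h) + \operatorname{Var}(\lambda h) + 2\,\operatorname{Cov}(h, s),
\qquad \text{since } \operatorname{Cov}(h, s) = \lambda \operatorname{Var}(h).
\end{aligned}
```

The household term therefore absorbs its own variance, the part of the school variance that is parallel to $h$, and the entire covariance (sorting) term, leaving only $\operatorname{Var}(\nu)$ to schools; swapping the roles of $h$ and $s$ gives the Row 4 counterpart.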
While many of these estimates are based on grades of completed schooling, Mazumder (2011) gives estimates of sibling correlations in various learning domains based on direct tests of children in the USA. His estimates are of the same broad magnitude, ranging from approximately 0.35 to 0.50. For the UK, Nicoletti and Rabe (2013) analyze results from compulsory national tests and find somewhat larger sibling correlations (>0.50). Estimates of sibling correlations in developing countries are scarce, mainly reflecting data constraints. An exception is Behrman et al. (2001), who find that the sib correlation in terms of completed grades of schooling lies between around 0.30 and 0.60 across Latin American countries. To get around data constraints, an alternative approach is to identify a number of observed proxies for family effects, estimate their relationship to the outcome variable (using regression techniques), and then derive the variance of their fitted contribution(s). For example, Ferreira and Gignoux (2014) do so using ten variables as proxies for family background, including parental education, father’s occupation, access to books at home, and migration status. Similarly, Schütz et al. (2008) use the number of books at home as their main proxy for the effect of background variables. However, since this approach amounts to a partition of the household effect into an observed and an unobserved component, the variance attributable to the observed component alone can be expected to capture only part of the (upper-bound) household contribution. This is borne out empirically: the variance contribution of family background estimated via sibling correlations is typically much higher than that found in studies using observed proxies. This suggests that these observed factors, such as the level of parental education, are rarely comprehensive (see also Behrman and Rosenzweig, 2004; Freeman and Viarengo, 2014).
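The contrast between sibling correlations and observed proxies can be illustrated with a small Python simulation. It assumes a one-factor household model with no separate school effect (as in the studies just reviewed) and hypothetical parameter values. The correlation between two siblings’ scores recovers the full household share $\operatorname{Var}(h)/\operatorname{Var}(t)$, in line with the Björklund and Salvanes (2011) result, whereas the fitted variance from a regression on observed proxies captures only the part of $h$ the proxies explain. The two proxy columns merely stand in for variables such as parental education or books at home; they are not the variables used in the cited studies.

```python
import numpy as np

rng = np.random.default_rng(3)
n_households = 100_000

# Hypothetical household effect: part explained by two observed proxies, part unobserved
x = rng.normal(size=(n_households, 2))                 # e.g., parental education, books at home
u = rng.normal(scale=0.5, size=n_households)           # unobserved household factors
h = x @ np.array([0.3, 0.2]) + u                       # Var(h) = 0.09 + 0.04 + 0.25 = 0.38

# Two siblings per household share h but have independent idiosyncratic terms
t1 = h + rng.normal(scale=0.8, size=n_households)      # Var(e) = 0.64, so Var(t) = 1.02
t2 = h + rng.normal(scale=0.8, size=n_households)

# (a) Sibling correlation: targets Var(h) / Var(t) = 0.38 / 1.02
sib_corr = np.corrcoef(t1, t2)[0, 1]

# (b) Proxy approach: OLS of scores on the proxies, then the fitted variance share
X = np.column_stack([np.ones(n_households), x])
beta, *_ = np.linalg.lstsq(X, t1, rcond=None)
r2_proxies = np.var(X @ beta) / np.var(t1)             # targets 0.13 / 1.02

print(f"sibling correlation {sib_corr:.3f}, proxy R-squared {r2_proxies:.3f}")
```

In this setup the proxy-based share falls well short of the sibling correlation simply because $u$ is unobserved, which mirrors the empirical pattern described above.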
School (or teacher) effects are of interest as they point to possible differences in school quality. A large number of studies seek to assess the magnitude of such effects (e.g., Pritchett and Viarengo, 2015; Sass et al., 2012; Hanushek and Rivkin, 2006), in some cases adjusting for family background. However, studies of this sort are rare for low-income contexts,9 and even fewer attempt to provide a multi-country analysis, as we do here. Furthermore, just a handful explicitly estimate the variance contributions of both schools and households (e.g., Carneiro, 2008), and since most such studies are school-based, they generally rely on a relatively limited set of observed proxies for family background.10 For instance, Freeman and Viarengo (2014) use PISA data to investigate the (sources of) variance in school effects. They report that a regression of test scores on school dummies (only) explains around two-thirds of the variation in the data, while a limited set of observed family background variables accounts for just one-third, after controlling for school effects (also via dummies). The general point emerging from the above (albeit brief) review is that existing studies have generally focused on estimating the contribution of either households or schools to variation in educational outcomes. These are of substantive interest, but such estimates correspond to special cases where the contribution of any covariance (sorting) between these effects is effectively absorbed into the main factor under consideration. Furthermore, even where such upper-bound estimates are tightened by introducing additional controls, such as proxies for one of the sets of effects, past studies have not reported the full variance decomposition incorporating the implied covariance between households and schools.
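The point that one-factor estimates absorb the covariance can be checked numerically. The Python sketch below uses simulated data in which the sorting rule, group sizes and variances are all hypothetical: it builds crossed household and school effects with positive sorting, then computes the R-squared of a one-way fixed-effects fit on households and, separately, on schools. Each exceeds the corresponding factor’s own variance share, because each absorbs the sorting term, some variation belonging to the other factor, and some sampling noise.

```python
import numpy as np

def one_way_share(t, groups):
    """R-squared of a one-way fixed-effects fit: variance of group means over variance of t."""
    means = np.bincount(groups, weights=t) / np.bincount(groups)
    return np.var(means[groups]) / np.var(t)

rng = np.random.default_rng(4)
n_hh, n_sch, kids = 20_000, 500, 3
h_hh = rng.normal(scale=0.6, size=n_hh)                  # household effects
s_sch = np.sort(rng.normal(scale=0.6, size=n_sch))       # school effects, ordered low to high
hh = np.repeat(np.arange(n_hh), kids)                    # household id of each child

# Hypothetical sorting rule: rank children by a noisy signal of household quality and fill
# schools in rank order, so siblings can end up in different schools (a crossed design)
signal = h_hh[hh] + rng.normal(size=hh.size)
rank = np.argsort(np.argsort(signal))
sch = rank * n_sch // hh.size                            # 120 children per school

h, s = h_hh[hh], s_sch[sch]
t = h + s + rng.normal(size=hh.size)
var_t = np.var(t)

shares = {
    "household_upper": one_way_share(t, hh), "household_own": np.var(h) / var_t,
    "school_upper": one_way_share(t, sch), "school_own": np.var(s) / var_t,
    "sorting": 2 * np.cov(h, s, bias=True)[0, 1] / var_t,
}
print({k: round(float(v), 3) for k, v in shares.items()})
```

Under positive sorting both one-way shares are inflated at the same time, so they can sum to more than the joint contribution of the two factors; this is one way to read an upper-bound school share that is as large as the household one.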
However, it is precisely this covariance that can be of critical interest: it tells us about the extent to which schools (or educational policies more generally) exacerbate or compensate for differences deriving from given family background. In light of this gap, the next subsection outlines how a more complete decomposition can be elaborated.

2.2 Decomposition methods

Assuming that the household and school factors are not fully observed, decomposing their variance contributions is non-trivial. However, as hinted above, estimating the upper bounds for each factor is straightforward: they can be derived simply by treating each factor (separately) either as a fixed or as a random effect. Variants of this approach have been applied extensively (e.g., Raaum et al., 2006; Lindahl, 2011; Gibbons et al., 2014) and, under additional assumptions, can also be used to identify approximate lower bounds on the variance contribution of the second factor. Similarly, random-effects (mixed-linear) approaches may be used, but these also place specific restrictions on the covariance of the unobserved factors.

9 See Behrman and Birdsall (1983) and Dang and Glewwe (2018) for two studies that respectively investigate school quality in Brazil and Vietnam.
10 School-based studies are frequently limited to one or two grades and therefore do not contain data on multiple siblings. However, such studies do tend to provide rich data on school-specific characteristics.

In Section 3.3 below we implement such upper-bound and mixed-effects approaches. However, for now, our interest is in a procedure consistent with an unrestricted linear model. This requires that we treat both latent factors as fixed effects with a (possibly) non-zero covariance; i.e., we are in a two-way fixed-effects setting. Simultaneous estimation of crossed factors raises distinct technical challenges.
In most applications, including here, the dimensions of the effects are extremely high and their design is unbalanced. Consequently, standard approaches, such as including a full set of dummy variables, are not computationally feasible. Also, because the model is over-parameterized, normalization restrictions are required for it to be identified (Mittag, 2012). Following the contributions of Abowd et al. (2002), among others, various solutions to these challenges have been proposed. Guimarães and Portugal (2010) show how a partitioned iterative algorithm can be optimized to solve the normal equations of a least-squares problem including multiple high-dimensional fixed effects. The algorithm avoids the problem of inverting a large sparse matrix and can provide direct estimates of the fixed effects.11

Two more specific challenges arise if the properties of the fixed effects are of standalone interest, as here. First, as the fixed effects are estimated with error, estimates of their variances will be biased upwards. Thus, variance shares calculated directly from the fixed-effects estimates will tend to overstate their importance relative to the residual component (Koedel et al., 2015). Second, Andrews et al. (2008) demonstrate that the covariance of the estimated fixed-effects vectors tends to be biased downwards. This is driven by a quasi-mechanical relation whereby, if one factor (e.g., household effects) is over-estimated, then on average the other factor (e.g., schools) will be under-estimated (see also Andrews et al., 2012). Intuitively, this reflects the general problem of model over-parameterization, and the magnitude of the bias tends to be larger where fewer observations are available to estimate each effect. Addressing these challenges remains an active area of research. Nonetheless, in Appendix A we set out the details of our proposed solutions.
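The first of these challenges, and a simple version of the kind of observation-count-based correction mentioned next, can be illustrated with one-way household effects in Python (simulated data, hypothetical variances). Each estimated effect is a household mean and so carries sampling noise with variance $\sigma^2_e / n_j$; subtracting the average of this noise variance from the raw variance of the estimates approximately recovers the true variance. This is a stylized sketch that weights households equally; it is not the procedure detailed in Appendix A.

```python
import numpy as np

rng = np.random.default_rng(5)
n_hh, var_h, var_e = 50_000, 0.3, 1.0
sizes = rng.integers(2, 5, size=n_hh)              # 2 to 4 children per household, no singletons
hh = np.repeat(np.arange(n_hh), sizes)             # household id of each child
h_true = rng.normal(scale=np.sqrt(var_h), size=n_hh)
t = h_true[hh] + rng.normal(scale=np.sqrt(var_e), size=hh.size)

# One-way fixed-effects estimates are household means of t
h_hat = np.bincount(hh, weights=t) / sizes
naive_var = np.var(h_hat)                          # biased upwards: includes sampling noise

# Residual variance from within-household deviations (degrees-of-freedom corrected)
resid = t - h_hat[hh]
sigma2_e = np.sum(resid ** 2) / (hh.size - n_hh)

# Remove the average sampling variance of an estimated effect, sigma2_e / n_j
corrected_var = naive_var - np.mean(sigma2_e / sizes)
print(f"true {var_h:.3f}, naive {naive_var:.3f}, corrected {corrected_var:.3f}")
```

The bias is largest where $n_j$ is small, which is why households with only two or three sampled children inflate the naive variance so much more than large schools would.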
To correct for measurement error, we apply a conventional procedure that shrinks the variance contribution of estimated factors in accordance with the number of observations available to estimate each effect (see also Stanek et al., 1999; Koedel et al., 2015). To deal with downward bias in the between-factor covariance, we propose a novel approach. Looking ‘under the hood’ of the iterative algorithm reveals that part of the bias is driven by how starting values for the fixed effects are calculated. In previous applications, extreme initializations have been employed that apportion the (residual) variation in the outcome to a single (presumed dominant) factor, effectively treating the second factor as orthogonal. Our insight is that these starting assumptions about the fixed effects are important because they become effectively locked in from one iteration to the next. That is, the starting values represent a crucial identifying assumption for the variance decomposition. Furthermore, we show that the assumed form of the initial between-factor covariance can be explicitly controlled via a single initialization parameter. Denoted $\omega \in [0, 1]$, this parameter controls how the (residual) variation in the outcome is apportioned across the two factors to start the iteration procedure. In keeping with the earlier discussion (Table 1), we hypothesize that the corner values $\omega \in \{0, 1\}$ will correspond to upper-bound factor models, such as those described in Rows 3 and 4 of Table 1, which rule out between-factor sorting. However, an agnostic or midpoint choice, $\omega = 0.5$, initially apportions the variation roughly equally across both factors. So, a corollary hypothesis is that this choice is likely to yield an upper bound on the magnitude of the between-factor covariance (sorting).

11 Their procedure is implemented in Stata in the user-written reghdfe command (Correia, 2017).
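A stylized Python sketch can make the role of starting values concrete. It implements a plain alternating-projection (zig-zag) least-squares scheme in the spirit of the partitioned iterative algorithm discussed above, not the reghdfe implementation, together with a hypothetical initialization in which school effects start at $(1 - \omega)$ times the school means of the demeaned outcome. The symbol $\omega$, the function names and the simulated data are our assumptions, not the exact rule in Appendix A. In this sketch, lock-in arises because household and school effects are identified only up to a constant within each connected household-school component, and the iterations never change the component-level mean of the school effects from its starting value. Fitted values are therefore identical for every $\omega$, but the covariance of the two estimated factors is not.

```python
import numpy as np

def group_mean(values, groups):
    """Mean of `values` within each group id, broadcast back to observations."""
    return (np.bincount(groups, weights=values) / np.bincount(groups))[groups]

def two_way_fe(t, hh, sch, omega, tol=1e-12, max_iter=50_000):
    """Zig-zag least-squares estimates of crossed household (hh) and school (sch) effects.

    omega is a hypothetical initialization parameter: omega = 1 starts the school effects
    at zero (variation loaded on households); omega = 0 starts them at the full school means.
    Households are updated first, so the limit depends on the starting school effects.
    """
    r = t - t.mean()
    h = omega * group_mean(r, hh)
    s = (1 - omega) * group_mean(r, sch)
    for _ in range(max_iter):
        h_new = group_mean(r - s, hh)          # household step, given school effects
        s_new = group_mean(r - h_new, sch)     # school step, given updated household effects
        change = max(np.abs(h_new - h).max(), np.abs(s_new - s).max())
        h, s = h_new, s_new
        if change < tol:
            break
    return h, s

# Simulated data: small groups of households and schools linked by siblings
rng = np.random.default_rng(6)
n_comp, hh_per, sch_per = 300, 5, 2
hh_ids, sch_ids = [], []
for c in range(n_comp):
    for j in range(hh_per):
        n_kids = int(rng.integers(2, 4))                  # 2 or 3 children per household
        hh_ids += [c * hh_per + j] * n_kids
        sch_ids += list(c * sch_per + rng.integers(0, sch_per, size=n_kids))
hh = np.array(hh_ids)
sch_raw = np.array(sch_ids)

n_h, n_s = n_comp * hh_per, n_comp * sch_per
shift = rng.normal(size=n_comp)                          # group-level differences in h and s
h_true = shift[np.arange(n_h) // hh_per] + rng.normal(scale=0.5, size=n_h)
s_true = shift[np.arange(n_s) // sch_per] + rng.normal(scale=0.5, size=n_s)
t = h_true[hh] + s_true[sch_raw] + rng.normal(size=hh.size)
sch = np.unique(sch_raw, return_inverse=True)[1]         # relabel schools as 0..S-1

results = {}
for omega in (0.0, 0.5, 1.0):
    h, s = two_way_fe(t, hh, sch, omega)
    results[omega] = (h, s)
    print(f"omega={omega}: Var(h)={np.var(h):.3f}  Var(s)={np.var(s):.3f}  "
          f"Cov(h,s)={np.cov(h, s, bias=True)[0, 1]:.3f}")
```

In this sketch the estimated covariance equals a common within-component term plus $\omega(1 - \omega)$ times the variance of the component means of the outcome, so it peaks at $\omega = 0.5$, consistent with the corollary hypothesis stated above.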
As such, we do not propose a single correction for the potential bias in the estimates of the between-factor covariance. Instead, we provide a unified approach to the estimation of two-way fixed effects that makes it possible to impose, in a transparent way, alternative (starting) assumptions about the relationship between the two factors, including their covariance.12 As we show, this serves to bound the estimates of interest.

12 Note that extensions to more than two factors are possible in theory but add substantially to the number of covariance terms to be estimated, as well as to the dimensionality of the choice space for the initialization process.

3 Application to East Africa

The previous section set out a general factor model for thinking about IOE and a unified empirical approach to decompose the variance within a two-way fixed-effects context. In the remainder of the paper we implement and compare results from three main choices for the fixed-effects initialization parameter, and validate our results by considering both alternative empirical methods and a wider range of choices for this parameter. Based on our preferred results, we also go on to investigate heterogeneity in the variance contributions.

3.1 Data

Our application of these methods uses test score data from East Africa. Since 2010, the Uwezo initiative has undertaken large-scale household surveys in Kenya, mainland Tanzania and Uganda (for further details and comparison to other regional assessments see Uwezo, 2012; Jones et al., 2014). The approach adopted by Uwezo has been inspired by exercises carried out in India by the Assessment Survey Evaluation Research Centre (ASER), which has surveyed the literacy and numeracy abilities of over 500,000 children each year since 2005. As with ASER, the target population of the Uwezo surveys has been children (residing in households) aged between the official school-starting age and 16.
The surveys have been designed to be representative at both national and district levels, based on the administrative classifications in the most recently available population census.13 Excluding the initial surveys, five rounds of the Uwezo surveys have been completed (2011-2015) and are used here. In each assessment, the surveys collected information at the household level covering household characteristics and the demographic and educational details of all resident children (e.g., age, gender, whether or not attending school). Within each household, the children of school age were individually administered a set of basic oral literacy and numeracy tests. These tests have been based on a common template, but have been tailored to each country and varied by survey round. Specifically, in each round and country, local experts have taken the template and developed item content to reflect competencies stipulated in the national curriculum at the grade 2 level. That is, the tests are anchored to skills that should be achieved by the majority of pupils after two years of completed schooling. The literacy and numeracy tests (the Uwezo tests) are described in detail in Jones et al. (2014). The literacy tests refer to the national languages of instruction in which pupils are tested at the end of primary school, i.e., English and Kiswahili in Tanzania and Kenya, and English only in Uganda. Importantly, the Uwezo tests are not adapted to the children's ages or their completed level of schooling. Given that they focus on basic competencies, it is unsurprising that there are strong age-related differences, which affect both the level and the variance of scores between age cohorts. From the present perspective, this between-cohort variation can be considered unwanted noise (see Mazumder, 2008).14 Therefore, to construct an overall metric of achievement, we transform the raw integer scores on the individual tests in three steps.
First, each score is standardized by age, such that the individual test scores have means of zero and standard deviations of 100 for each age group. Second, we calculate weighted means of the age-standardized scores on each test, placing equal weight on the literacy and numeracy components. This gives a synthetic or overall test score, the primary outcome of interest hereafter. Third, to facilitate interpretation and to address potential differences in test difficulty between countries and rounds, we normalize this measure for each country and round such that the final standardized score has an overall mean of zero and a standard deviation of one.15 Table 2 reports regional means and standard deviations of the test scores, calculated at each step. Column I reports the weighted means of the raw competency tests (before any standardization); column II reports the age-standardized versions; and column III reports the final measures. As can be seen, moving from the second to the third metric constitutes a simple monotone transformation. The test scores reported in Table 2 refer to the final sample used in the subsequent analysis, which pools all survey rounds. This is a slightly reduced version of the original Uwezo sample. Specifically, we dropped observations that can be perfectly predicted using either household or school fixed effects, i.e., all singletons were removed. The objective of restricting the data in this way is to mitigate upward bias in the variance contribution estimates.

Footnote 13: In some survey rounds, however, administrative difficulties meant that certain districts could not be surveyed. Throughout, (adjusted) survey weights are used that take these implementation issues into account.

Footnote 14: We recognize that the contribution of different variance components may vary across age cohorts, and we investigate this in our empirical analysis.
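The three transformation steps can be sketched as follows. This is a minimal illustration under stated assumptions: a single literacy score per child (where two literacy languages are tested they would first need to be combined), no survey weights, and population rather than sample standard deviations. The function names are hypothetical.

```python
# Hypothetical helpers: a sketch of the three-step score transformation,
# not the authors' code. Survey weights are ignored here.
from collections import defaultdict
from statistics import fmean, pstdev


def standardize_within(values, groups, scale=1.0):
    """Standardize values to mean 0 and s.d. `scale` within each group."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    moments = {g: (fmean(vs), pstdev(vs)) for g, vs in by_group.items()}
    out = []
    for v, g in zip(values, groups):
        mu, sd = moments[g]
        # Groups with no variation are set to 0 rather than dividing by zero.
        out.append(scale * (v - mu) / sd if sd > 0 else 0.0)
    return out


def overall_score(lit, num, age, country_round):
    """Three-step transformation of raw literacy and numeracy scores."""
    # Step 1: age-standardize each test (mean 0, s.d. 100 within age group).
    lit_z = standardize_within(lit, age, scale=100.0)
    num_z = standardize_within(num, age, scale=100.0)
    # Step 2: equal-weight composite of the literacy and numeracy components.
    composite = [0.5 * a + 0.5 * b for a, b in zip(lit_z, num_z)]
    # Step 3: normalize within each country-round to mean 0, s.d. 1.
    return standardize_within(composite, country_round, scale=1.0)
```

Because step 3 is a linear rescaling within each country-round, the ordering of children within a country-round is the same under the composite from step 2 and the final score.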
As per the methodological discussion of Section 2.2, the analytical focus is on the variance components of the test score, and there is no evidence to suggest that these dropped observations are distributed in a systematic pattern over regions or districts.16 The (sample) standard deviations of the test scores, which can be directly interpreted as normalized measures of educational inequality (Van de Werfhorst and Mijs, 2010; Hanushek and Wößmann, 2006), are reported in parentheses in the table. Notably, the rank position of each region according to its test score variance is largely preserved, regardless of the transformation applied. To implement the variance decomposition, the household and school indexes must be defined. The former is trivial: unique indexes are ascribed to all sibs in the same household in each year.17 The school effects are less straightforward. In the present data, detailed information about the particular school each child attends is not provided. Nonetheless, we can identify the grade of attendance and the location of the household. Consequently, for each enumeration area (containing approximately 20 surveyed households), we categorize children into one of three school-grade categories based on their highest grade of completed or current schooling, namely: grades 1-2; grades 3-4; and grades 5 or higher. For children who have never attended school, we hypothesize that the quality of schools (teachers) available in their locale may affect their ability in numeracy and literacy. This may work directly, through the choice not to attend school, as well as indirectly through both sibling and peer effects; for example, what other children learn can spill over to non-attenders.
In order to allow for these effects in the data, we allocate never-attenders to the median grade category (index k) of children with the same age and location.18 Additionally, individual-level controls can be included in the variance decomposition (see Section 3.3) to account for children's specific educational status. Overall, this classification ensures that children within each household are unlikely to share the same school-grade index, i.e., the school-grade and household effects are crossed as opposed to nested. A downside is that if there are multiple schools of the same type in a given enumeration area, or if children travel to another location to attend school, then children may be incorrectly treated as attending the same school. Consequently, school-grade fixed effects capture the average quality of schools of a given type in a given location for each schooling level, but children may be subject to classification error.19 Descriptive statistics for the data set are reported in Table 3, which shows the number of unique children (i), households (j) and school-grade effects (k) covered in the data set. Additionally, the table reports summary statistics of child characteristics (age, gender), schooling status indicators, and a normalized measure of socio-economic status (SES) based on observed household assets.

Footnote 15: Due to the absence of equating or anchor items in the Uwezo tests, we cannot distinguish between changes in test difficulty over time and changes in (average) learning outcomes. However, from the perspective of a variance decomposition, standardization by year ensures that the variance contributions retain a consistent meaning in relation to the overall distribution of outcomes in each age cohort and year.

Footnote 16: Details available on request from the authors.

Footnote 17: The Uwezo surveys are cross-sectional in nature and no explicit attempt is made to track the same children over time.
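A minimal sketch of the school-grade index construction, combined with the singleton filter mentioned in Section 3.1, is given below. It assumes grade categories coded 1 to 3, uses the lower median so that a never-attender's allocated category is one actually observed among local peers, leaves children with no attending peer of the same age unallocated (and so drops them), and removes singletons repeatedly until none remain, as high-dimensional fixed-effects routines commonly do. All names are hypothetical.

```python
# Hypothetical helpers: a sketch of index construction and singleton
# removal, not the authors' code.
from collections import Counter, defaultdict
from statistics import median_low


def school_grade_index(ea, age, grade_cat):
    """Assign each child an (enumeration area, grade category) index.

    grade_cat holds 1, 2 or 3 for grades 1-2, 3-4 and 5+, or None for
    never-attenders, who take the median category of attending children
    of the same age in the same enumeration area.
    """
    cats = defaultdict(list)
    for e, a, c in zip(ea, age, grade_cat):
        if c is not None:
            cats[(e, a)].append(c)
    index = []
    for e, a, c in zip(ea, age, grade_cat):
        if c is None:
            peers = cats.get((e, a))
            # median_low keeps the result an observed category; None if no peers.
            c = median_low(peers) if peers else None
        index.append((e, c) if c is not None else None)
    return index


def drop_singletons(hh, school):
    """Flag rows to keep after repeatedly removing household or school singletons."""
    keep = [s is not None for s in school]
    while True:
        hh_n = Counter(h for h, k in zip(hh, keep) if k)
        sc_n = Counter(s for s, k in zip(school, keep) if k)
        new_keep = [k and hh_n[h] > 1 and sc_n[s] > 1
                    for h, s, k in zip(hh, school, keep)]
        if new_keep == keep:
            return keep
        keep = new_keep
```

Dropping one singleton can create another (a household left with a single child, or a school-grade cell left with one pupil), which is why the filter loops until the kept set stops changing.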
Overall, these statistics indicate that the sample is comprehensive and balanced (by age and gender).20 They also reveal that, although the vast majority of children are attending school, there are systematic differences in (mean) school status among regions within each country, and these seem to relate closely to differences in mean socio-economic status. For instance, in Kenya, there is an average difference of 1.5 grades between the (wealthier) Central and (poorer) North Eastern regions. These differences also appear to map into large differences in the magnitude of inequality in test scores.

3.2 Unconditional decomposition

The first empirical model we implement closely follows the framework outlined in Section 2, incorporating only the two fixed effects of interest. As described earlier, we focus on three initializations of our implementation of the partitioned iterative algorithm: (1) π = 0, which first allocates variation in the outcome to the household effect and the residual to the school effect; (2) π = 0.5, which is agnostic about how the variation should be initially allocated; and (3) π = 1, which first allocates variation in the outcome to the school effect and the residual to the household effect. Since we are primarily interested in converging on stable estimates of the fixed effects, we stop each run of the algorithm when the estimated scalars on the plugged-in fixed effects (see Appendix A) are both not significantly different from one, and the absolute change in root mean square error between iterations falls below 0.01. Convergence is typically achieved in fewer than 20 iterations.

A summary of the main results is set out in Table 4, which reports the absolute and relative variance contributions. Four main insights stand out. First, the different initializations have the hypothesized effects, with the variance components varying in the expected directions. More specifically, the household and school variance contributions, but not their covariance, vary monotonically with π, with the extreme values of π corresponding to upper/lower bounds (as hypothesized). Concretely, when π = 0 the household share is largest (around 35% in relative terms), the household-school sorting share is close to 0%, and the school share is moderate (around ¼ of the household share, or 7%). When π = 1, the magnitudes of the household and school shares are roughly reversed, although the household component (around 13%) remains somewhat larger than the school component under π = 0, and the sorting component remains close to zero. This switch directly reflects the opposite ways in which the between-factor covariance is allocated: under the extreme choices we assume either that all covariance emanates from the causal effect of households on schools (π = 0) or the reverse (π = 1). In contrast to these two extreme choices, the agnostic initialization does not assume that one effect is uniquely driven by the other (see Table 1); rather, it retains the covariance (sorting) term as a separate and substantial contribution. In turn, the results show that this estimator is less affected by the mechanical negative covariance bias associated with more conventional two-way fixed-effects estimators (Andrews et al., 2012; Gaure, 2014).

Footnote 18: We examine the robustness of this set-up in our empirical analysis.

Footnote 19: Where the 'true' schooling effects are mutually independent, misclassification error in the definition of the fixed effects would attenuate their estimates, biasing the variance contribution downward.

Footnote 20: Average ages are higher in Tanzania as the starting school age is seven, compared to six in the other countries.
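The three initializations and the RMSE stopping rule described in Section 3.2 can be illustrated with a minimal sketch of alternating projections for the two-way model. Several choices here are assumptions made for illustration only: school-grade effects start at π times their mean outcome (the rule actually used is set out in Appendix A), the first household step then absorbs the remainder, the significance test on the plugged-in scalars is omitted, and all names are hypothetical.

```python
# Hypothetical sketch of a pi-initialized two-way fixed-effects solver;
# not the authors' implementation.
from collections import defaultdict
from math import sqrt
from statistics import fmean


def group_means(values, groups):
    """Mean of `values` within each group label."""
    acc = defaultdict(list)
    for v, g in zip(values, groups):
        acc[g].append(v)
    return {g: fmean(vs) for g, vs in acc.items()}


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean((x - ma) * (z - mb) for x, z in zip(a, b))


def two_way_fe(y, hh, sch, pi, tol=0.01, max_iter=100):
    """Alternating projections for y_i = h[hh_i] + s[sch_i] + e_i.

    Assumed start: school-grade cells get pi times their mean outcome, so
    pi = 0 gives households first call on the variation and pi = 1 gives
    it to schools.
    """
    s = {k: pi * m for k, m in group_means(y, sch).items()}
    prev_rmse = None
    for _ in range(max_iter):
        # Household step: average the outcome net of current school effects.
        h = group_means([yi - s[k] for yi, k in zip(y, sch)], hh)
        # School step: average the outcome net of updated household effects.
        s = group_means([yi - h[j] for yi, j in zip(y, hh)], sch)
        resid = [yi - h[j] - s[k] for yi, j, k in zip(y, hh, sch)]
        rmse = sqrt(fmean(r * r for r in resid))
        # Stop once the RMSE changes by less than tol between sweeps.
        if prev_rmse is not None and abs(prev_rmse - rmse) < tol:
            break
        prev_rmse = rmse
    return [h[j] for j in hh], [s[k] for k in sch], resid


def decompose(h_i, s_i, resid):
    """Child-level components; 2 * Cov(h, s) is the sorting term."""
    return {
        "household": cov(h_i, h_i),
        "school": cov(s_i, s_i),
        "sorting": 2 * cov(h_i, s_i),
        "residual": cov(resid, resid),
    }
```

In this sketch, each school step leaves the child-weighted mean of the school effects within a connected set of households and schools unchanged, so the share of each set's mean outcome assigned to schools stays at π. Where the data split into many disconnected sets (for instance, if each enumeration area forms its own set), that starting split carries through to the variance decomposition, which is one way to see how starting values can become locked in.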
Indeed, in our case, this initialization indicates that sorting between households and schools accounts for up to around 8% of the variation in test score outcomes (an effect size of 0.28 standard deviation units). Comparable estimates of the sorting contribution in other contexts are rare, but the magnitudes found here are broadly in line with the un-shrunken estimates of Carneiro (2008) for Portugal. The immediate implication is that the allocation of children and/or teachers to schools tends to aggravate rather than compensate for existing (familial) inequalities. In turn, this suggests there is ample scope for policies that promote more equal access to schools of similar quality. Second, the results suggest that households are not necessarily the primary source of IOE. Such a conclusion naturally holds under the household upper-bound model (π = 0), but we have highlighted that this model rests on specific assumptions that rule out the contribution of sorting. When sorting is explicitly allowed through the midpoint initialization (π = 0.5), we find that households and schools make approximately equal contributions to outcome inequality, each accounting for around 15% of the total variance. That is, by excluding the contribution of sorting, previous studies using household upper-bound models may well have overstated the relative importance of households to outcome inequalities. In turn, and contrary to the proposition that schools contribute little to test score outcomes, we find that differences among schools are a material source of inequalities in educational achievement. Moreover, the variance attributable to schools (grades) is not an order of magnitude lower than that found elsewhere (Hanushek and Rivkin, 2012; Azam and Kingdon, 2015), including in studies that report upper-bound school variance estimates (e.g., Pritchett and Viarengo, 2015).
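The pairing of a variance share with an effect size in standard deviation units appears to follow from the final score having unit variance: the standard deviation attributable to a component is then the square root of its variance share (for instance, √0.08 ≈ 0.28). A minimal sketch of this conversion follows; the function name and the signed treatment of negative covariance shares are assumed conventions, not taken from the paper.

```python
# Hypothetical helper: converts a variance share to s.d. units.
from math import copysign, sqrt


def sd_units(share, total_sd=1.0):
    """Convert a variance share into standard-deviation units.

    With a unit-variance outcome, a component explaining `share` of the
    variance has a standard deviation of sqrt(share). Negative covariance
    shares are mapped to -sqrt(|share|), a convention assumed here.
    """
    return copysign(sqrt(abs(share)), share) * total_sd
```

The same conversion reproduces, for example, the 32% and 28% household shares reported for Uganda in Section 3.3 and their 0.57 and 0.53 s.d. equivalents.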
Third, summing the estimated variance contributions, we note that IOE is substantial across all countries. The residual or unexplained component, which roughly captures effort and preferences for education, accounts for a little over half of the total outcome variance. This indicates that equalizing educational opportunities would be expected to reduce the variance of test score outcomes by at least 40%. Relative to the existing literature on IOE, these magnitudes are substantial. Part of this may reflect the more comprehensive approach we have adopted: we cover multiple sources of IOE and do not rely exclusively on (partial) observed proxies. Again, an implication is that previous studies may well be underestimating IOE. Fourth, while the broad pattern of results described above holds across the three countries, there are also some differences. Figure 1 shows that the relative variance estimates for Tanzania stand out, suggesting a generally larger contribution of household and sorting effects and a smaller contribution from schools. A complete interpretation of these differences falls outside the scope of the current analysis. However, it hints at heterogeneity that may extend below the national level, to which we turn later in Section 4.1.

3.3 Conditional decomposition

The specification considered above abstracts from a range of (observed) child characteristics, such as gender and school enrollment status. Where these are correlated with either the school or household fixed effects, the previous estimates may be contaminated by omitted variable bias. To address this, we extend the simple unconditional model in two directions. First, we add an individual-specific component. Without longitudinal data, we cannot treat this as a latent variable.
Instead, we partition this component into observed and unobserved parts, $c_i = x_i'\beta + \tilde{c}_i$, where the vector $x_i$ contains five dummy variables that take a value of one if children are female, are the first born (oldest observed child), are currently enrolled in school, have never enrolled, or are attending private school. The unobserved individual component $\tilde{c}_i$ remains in the error term. Second, the preceding definition of school fixed effects is somewhat crude. Specifically, children who have never attended school are allocated to the same unit (school-grade effect) as their local peers, and we do not distinguish between school types (private versus public). Increasing the number of school fixed effects is problematic due to the limited number of available observations. Nonetheless, we can extend the specification to allow each of the existing school-grade effects to vary by a multiple (i.e., to be scaled upwards or downwards) across children attending public schools, children attending private schools, and never-attenders. Putting these extensions together yields the following empirical specification:

$$y_i = x_i'\beta + h_j + s_k\,(1 + \gamma_1 d^{N}_i + \gamma_2 d^{P}_i) + \varepsilon_i \tag{3}$$

where $d^{N}_i$ and $d^{P}_i$ are both elements of $x_i$, being the dummy variables for never-attenders and private school pupils respectively. Note that this specification nests a test of whether there is any spillover of school quality from local peers attending school to non-attenders: positive spillovers obtain as long as $\gamma_1 > -1$. We can also test the extent to which variation in school quality between public and private schools is correlated across locations: we cannot reject that they are (positively) correlated if $\hat{\gamma}_2 > -1$, i.e., $1 + \hat{\gamma}_2 > 0$. Under our proposed partitioned iterative algorithm (see Section 2; Appendix A), inclusion of these interaction terms is straightforward. For the purposes of the variance decomposition, however, the additional terms demand consideration of multiple extra covariance terms.
To simplify matters, for each individual we calculate the individual-specific aggregate or final value of the school fixed effect, $\hat{s}_i$, which absorbs the estimated contribution(s) of the interaction terms. For instance, in the case of never-attenders, the final estimated school effect is calculated as $\hat{s}_i = (1 + \hat{\gamma}_1)\hat{s}_k$, where $\hat{\gamma}_1$ is the estimated coefficient on the never-attender interaction. Using these final school effect estimates, the remaining covariance terms are subsequently estimated. Thus, the variance decomposition we report now contains seven elements:

$$\mathrm{Var}(y_i) \equiv \sigma^2_x + \sigma^2_h + \sigma^2_s + \sigma^2_\varepsilon + 2\Sigma_{xh} + 2\Sigma_{xs} + 2\Sigma_{hs} \tag{4}$$

$$1 \equiv \frac{\sigma^2_x + \sigma^2_h + \sigma^2_s + \sigma^2_\varepsilon + 2\Sigma_{xh} + 2\Sigma_{xs} + 2\Sigma_{hs}}{\mathrm{Var}(y_i)} \tag{5}$$

where $\sigma^2_a$ denotes the variance of component $a$ and $\Sigma_{ab}$ the covariance of components $a$ and $b$, with $x$, $h$, $s$ and $\varepsilon$ referring to the observed individual component $x_i'\hat{\beta}$, the household effect $\hat{h}_j$, the final school effect $\hat{s}_i$ and the residual, respectively.

Turning to the results, it is informative to begin with the regression output from the three estimators. These are summarized in Tables B1-B3 in Appendix B, treating each of the three countries separately. For purposes of comparison, column (1) is a naïve estimator, which is simply an OLS regression of the baseline model ignoring the fixed effects; column (2) reports results from a conventional implementation of the partitioned iterative algorithm incorporating the two fixed effects of interest (based on the user-written reghdfe command in Stata, due to Correia, 2017), where the household effect is initially swept out of the regression (for speed). Columns (3) to (5) report the regression results associated with the three alternative initializations, now including the interaction terms. Three main points merit attention. First, the estimated regression coefficients and the overall coefficients of determination (R-squared) change materially when moving from column (1) to column (2), indicating that the latent fixed effects are both relevant and correlated with the observed covariates; e.g., in Kenya, introducing the fixed effects increases the model R-squared by more than 50 percentage points.
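The construction of the final school effect and the seven-element decomposition can be sketched as below. In this sketch gamma1 and gamma2 denote the coefficients on the never-attender and private-school interactions (names assumed here for illustration), the inputs are child-level estimates taken as given, and covariances with the residual are left out, as in the seven elements listed in the text; the shares therefore sum to one only when the residual is orthogonal to the fitted components.

```python
# Hypothetical helpers: a sketch of the conditional variance decomposition,
# not the authors' code.
from math import sqrt
from statistics import fmean


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean((x - ma) * (z - mb) for x, z in zip(a, b))


def final_school_effect(s_k, d_never, d_private, gamma1, gamma2):
    """Child-specific school effect s_k * (1 + gamma1*d_never + gamma2*d_private)."""
    return [s * (1 + gamma1 * dn + gamma2 * dp)
            for s, dn, dp in zip(s_k, d_never, d_private)]


def seven_term_decomposition(xb, h, s, e):
    """Absolute terms, shares of Var(y), and the household-school correlation."""
    y = [a + b + c + d for a, b, c, d in zip(xb, h, s, e)]
    var_y = cov(y, y)
    terms = {
        "individual": cov(xb, xb),
        "household": cov(h, h),
        "school": cov(s, s),
        "residual": cov(e, e),
        "2cov(individual, household)": 2 * cov(xb, h),
        "2cov(individual, school)": 2 * cov(xb, s),
        "2cov(household, school)": 2 * cov(h, s),
    }
    shares = {name: value / var_y for name, value in terms.items()}
    # Between-factor correlation, as reported alongside the decompositions.
    rho_hs = cov(h, s) / sqrt(cov(h, h) * cov(s, s))
    return terms, shares, rho_hs
```

With gamma1 = -0.5, a never-attender's school effect is half that of local attenders, mirroring the Uganda example discussed in the text.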
Second, comparing results across columns (2)-(5), the reported regression coefficients are statistically indistinguishable from each other and the R-squared statistics do not change.21 Together, this indicates that when the fixed effects are not of inherent interest (e.g., are treated as nuisance parameters), the choice of initialization is unimportant. That is, the initial allocation of the covariance across the fixed effects is material for the variance decomposition (as established in Section 3.2), but not for the levels regression estimates. Additionally, the similarity of the coefficient estimates supports the specific implementation of the iterative algorithm employed here: regardless of the initialization parameter, the procedure yields regression coefficients that are consistent with established two-way fixed-effects approaches. Third, the coefficients on the interaction terms are generally negative and statistically significant. In particular, and as might be expected, the school effects are substantially scaled downwards (shrunk toward zero) for children who have never attended school; e.g., under the midpoint initialization, the school effect is halved in Uganda for those without school experience. However, in no case does the final school effect approximate zero, suggesting the presence of some learning spillovers from attenders to non-attenders. It follows that improvements in school quality may well have a broader effect on achievement, beyond the children directly exposed to them. At the same time, we find much less of a systematic difference in the (level of the) school effects between public and private schools in the same locations. This implies that these effects are correlated, in that relatively better (worse) private schools tend to be found alongside relatively better (worse) public schools, and that there is a roughly similar amount of heterogeneity in private and public school quality (see further below).
Nonetheless, marked differences remain in the average level contributions of private schools. This is given by the coefficient on the private attendance variable, which is positive in all countries and ranges from 0.10 to 0.20 standard deviation units. Sticking with the estimates from the partitioned iterative algorithm based on the extended specification, Table 5 reports the associated variance decompositions, in which the joint contribution of the individual terms is aggregated (see Appendix Tables B4-B6 for the complete disaggregation). An immediate observation is that the previously excluded contribution of the individual terms is material, accounting for between 4% and 9% of the total test score variance (or 0.19 to 0.30 standard deviation units). Furthermore, the extended specification suggests that the contributions of both the household and school factors are moderately smaller than in the unconditional results. For example, in Uganda the upper-bound estimate (π = 0) for the household effect contribution drops from 32% (0.57 s.d. units) to 28% (0.53 s.d. units) when we condition on the individual characteristics. Similarly, the sorting contribution at the midpoint initialization declines marginally under the conditional model, to around 6% (0.13 s.d. units). At the same time, only the household and school variance contributions, plus the corresponding sorting term, are sensitive to the choice of initialization. Thus, consistent with the regression outputs (Tables B1-B3), the contributions of the individual component and the residual remain stable across the values of π.

Footnote 21: Estimates for the 'never enrolled' term do vary across estimators. However, this is due to the inclusion of the interaction term. When this is dropped there are no remaining differences. (Results available on request.)
In sum, these results suggest that there is substantial continuity between the unconditional and conditional (extended) models, but the latter provides a more nuanced picture of the variance components, allowing for some variation due to individual characteristics. Moreover, the main substantive insights from the unconditional model are retained here: IOE is substantial and is unlikely to be due only to the household component.

4 Validation and Further Analysis

The previous subsections reported results from our proposed general estimator of a two-way fixed-effects model, which provides a practical and unified framework for the decomposition of variance under alternative covariance assumptions. As hypothesized, the results effectively bound the main estimates of interest: π = 0 gives upper-bound estimates for the household contribution; π = 0.5 gives upper-bound estimates for the sorting component; and π = 1 gives upper-bound estimates for the school contribution. To validate these results, we pursue two strategies. First, we consider a broad range of values for π (in increments of 0.1) and plot the corresponding relative variance contributions. These results, shown in Figure 2, confirm that the chosen values do yield upper/lower bounds on the main components. Furthermore, the figures indicate that the lower-bound household share in all countries is moderately greater than the lower-bound school share, and that the sorting component follows a shallow inverted-U shape. Indeed, π = 0.5 only corresponds to an approximate upper bound on the between-factor covariance. Marginally larger point estimates are obtained when π = 0.4 in two countries (Kenya and Uganda); however, such differences are within sampling variation. In fact, broadly similar estimates of the sorting contribution are found across the range π ∈ [0.33, 0.66].
This supports the notion that a positive and non-trivial contribution of sorting obtains over a material domain of the choice space. Second, we compare our results to those obtained from three alternative approaches.22 First, we consider a standard household upper-bound model (denoted UBH), estimated via a high-dimensional fixed-effects estimator (also a partitioned algorithm), but where only a single fixed effect for households is included. In doing so, the contribution of schools is initially ignored. However, consistent with the data-generating process set out in row 3 of Table 1, the remaining orthogonal contribution associated with schools can be derived by taking school-wise averages of the estimated residuals from the first-step regression. This amounts to a two-step estimation process. For the second approach, an upper bound on the school effects (model UBS) can be estimated in similar fashion. Given the available data, a distinctive feature here is that we are able to include observed household covariates (for a similar approach see Solon, 2000; Raaum, 2006). Splitting the latent household effect into observed and unobserved components, $h_j = h^{o}_j + h^{u}_j$ with $h^{o}_j = z_j'\delta$, where $z_j$ is a vector of observed household characteristics, the first-step model corresponds to the following specification (see also Carneiro, 2008):

$$y_i = x_i'\beta + z_j'\delta + s_k + \tilde{e}_i \tag{6}$$

where $\tilde{e}_i = h^{u}_j + \varepsilon_i$. Explicit inclusion of the observed household effects in equation (6) means the school effects can be considered conditional on these proxies; thus, their estimated variance should yield a tight(er) upper bound. Deriving the unobserved household component from the first-stage residuals as before, i.e., $\hat{h}^{u}_j = \frac{1}{n_j}\sum_{i \mid j} \hat{\tilde{e}}_i$ with $n_j$ the number of children in household $j$, this approach implies an estimate of the lower-bound variance contribution of household factors, $\mathrm{Var}(\hat{h}^{u}_j + z_j'\hat{\delta})$. As with the first model, with estimates of each component in hand, the corresponding covariance terms can also be derived. The third procedure adopts a different strategy.
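The two-step logic of the upper-bound models can be sketched without covariates: fit one fixed effect as group means of the outcome, then recover the other as group-wise means of the first-step residuals. Passing households first gives a covariate-free analogue of UBH; swapping the roles gives a stripped-down UBS, which omits the observed household covariates that the model in equation (6) includes. The function names are hypothetical, and with only one fixed effect and no covariates the first-step estimator reduces to group means.

```python
# Hypothetical helpers: a covariate-free sketch of the two-step
# upper-bound models, not the authors' code.
from collections import defaultdict
from statistics import fmean


def group_means(values, groups):
    """Mean of `values` within each group label."""
    acc = defaultdict(list)
    for v, g in zip(values, groups):
        acc[g].append(v)
    return {g: fmean(vs) for g, vs in acc.items()}


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean((x - ma) * (z - mb) for x, z in zip(a, b))


def two_step_upper_bound(y, first, second):
    """Step 1 fits only the `first` effect; step 2 averages its residuals by `second`."""
    f = group_means(y, first)
    f_i = [f[g] for g in first]
    r1 = [yi - fi for yi, fi in zip(y, f_i)]
    s = group_means(r1, second)
    s_i = [s[g] for g in second]
    e = [ri - si for ri, si in zip(r1, s_i)]
    return {
        "first": cov(f_i, f_i),
        "second": cov(s_i, s_i),
        "sorting": 2 * cov(f_i, s_i),
        "residual": cov(e, e),
    }


# Usage: UBH-style call with households first, UBS-style call with schools first.
# ubh = two_step_upper_bound(y, household_ids, school_ids)
# ubs = two_step_upper_bound(y, school_ids, household_ids)
```

In a balanced toy design the sorting term from this two-step procedure is exactly zero; in unbalanced real data it need not be, which is where the two-step estimates and the extreme initializations can diverge.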
Recognizing that the unobserved household component omitted from equation (6) (later derived from the residual) can be considered a random effect, simultaneous estimation using a mixed-linear model is viable, treating both unobserved components (the unobserved household component $h^{u}_j$ and the school effect $s_k$) as random effects. In this case, the empirical model to be estimated is:

$$y_i = x_i'\beta + z_j'\delta + [h^{u}_j] + [s_k] + \varepsilon_i \tag{7}$$

where $z_j'\delta$ captures observed household characteristics and the square brackets $[\cdot]$ indicate effects treated as random latent variables. A potential drawback of this strategy is that some parametric structure on the random effects and their covariances must be imposed. In principle, the two random effects can each be treated as correlated with the elements included as fixed effects (e.g., the vectors $x_i$ and $z_j$). In practice, however, such unrestricted covariance structures are not only computationally intensive but also show poor convergence properties in large datasets (Gurka, 2011; Chirwa, 2014). Thus, typically at least some covariance restrictions need to be applied in the estimation procedure, and the random effects must be treated as mutually orthogonal. Together, these assumptions mean that the population variance components estimated via a mixed-linear model may not adequately approximate those of the true data-generating process under the linear unrestricted model. Offsetting this concern, the properties of the predicted effects from a mixed-linear model do not necessarily reflect the restrictions imposed on the population model.

Footnote 22: As indicated below (see equation 6), our implementations of the alternative approaches omit the interactions with the school effect. However, this does not materially affect the broad direction of results.
As discussed in Bates (2010) (see also Stanek et al., 1999), the best linear unbiased predictors (BLUPs) of random effects in these estimators are derived from the residuals of the estimated model and can be understood as conditional means (i.e., conditional on the observed data and the model parameter estimates). Consequently, since they represent an (optimal) approximation to the unit-specific values of the latent variables in the observed sample, they do not necessarily conform to the properties assumed to hold in the population. For example, because shrinkage is applied in deriving the BLUPs, the variance of the predicted conditional means of the random effects tends to be lower than the corresponding population variance estimates. Similarly, and as we find here, in order to provide an optimal fit to the actual data, the same BLUPs may not be mutually orthogonal. This suggests that in the present case, whereas imposing a zero covariance restriction may be too strong, a variance decomposition based on the properties of the random-effects BLUPs and the estimated fixed effects from a mixed-linear model can nonetheless remain informative (and less restrictive) for the specific sample in hand. Appendix Tables B4-B6 compare results from our modified partitioned iterative algorithm (as earlier) and the three alternative estimators described in this subsection. In all cases, the household and school upper-bound variance components, derived from the (two-step) UBH and UBS models respectively, are highly comparable in magnitude to the same estimates from the corresponding modified partitioned iterative algorithm. For instance, in Tanzania, the absolute variance contribution due to households is 0.59 standard deviation units when π = 0, compared with 0.58 under the UBH model. Similarly, in Kenya, the school variance contribution is 0.53 units when π = 1, and 0.51 units under the UBS model.
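Why the variance of BLUPs tends to fall below the population variance can be seen in the simplest case: a single random effect with known variance components. A two-way model with crossed random effects, as in equation (7), requires an iterative solver that this sketch does not attempt. The function name, the known-variance assumption, and the use of the grand mean in place of the GLS intercept are simplifications made for illustration.

```python
# Hypothetical helper: one-way random-effects BLUPs with known variances;
# a simplification, not the mixed-linear model used in the paper.
from collections import defaultdict
from statistics import fmean


def one_way_blups(y, groups, sigma2_u, sigma2_e):
    """BLUPs of one-way random effects given the variance components.

    Each group's mean deviation is multiplied by the reliability
    sigma2_u / (sigma2_u + sigma2_e / n_g), which lies below one, so the
    predictions are shrunk toward zero. The grand mean stands in for the
    GLS estimate of the intercept (a simplification).
    """
    mu = fmean(y)
    acc = defaultdict(list)
    for v, g in zip(y, groups):
        acc[g].append(v)
    blups = {}
    for g, vs in acc.items():
        reliability = sigma2_u / (sigma2_u + sigma2_e / len(vs))
        blups[g] = reliability * (fmean(vs) - mu)
    return blups
```

With two observations per group, sigma2_u = 1 and sigma2_e = 2, every group's mean deviation is halved, so the predicted effects vary less than the raw group means.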
In other words, these results confirm that extreme choices for π map to corner assumptions about how the between-factor covariance is allocated. Despite the similarity of the upper-bound estimates, the corresponding lower bounds in each model are not so similar. Again taking the example of Tanzania, the estimate of the school variance component is 0.12 standard deviation units under the (two-step) UBH model but 0.25 for π = 0. Since the individual and residual terms are (in aggregate) also similar under these two models, the difference lies in the covariance term, which is systematically positive under the two-step estimates but always much closer to zero under the extreme initializations of the modified partitioned iterative algorithm. This supports the contention that the latter approach produces systematically smaller estimates of the sorting component. Furthermore, estimates from these extreme choices do not appear very credible. Recall that the UBS model includes proxies for the household terms, and this model yields estimates of the magnitude of sorting that are always positive and significantly greater than those based on the extreme initializations. For instance, the between-factor correlation (ρ) in Table B4 (Kenya) is 1% when π = 0 but 9% under the two-step procedure (model UBH). That is, as the two-step upper-bound models point to the presence of positive sorting, the absence of this finding under the extreme initializations of the partitioned iterative algorithm supports the contention that such initializations suffer from a mechanical negative bias in the sorting (covariance) term. Finally, estimates based on the BLUPs from the mixed-linear model closely resemble the results from the agnostic initialization across the various components.
The resemblance between the mixed-linear and agnostic-initialization estimates is perhaps most stark for the sorting component, which is also larger in magnitude under the mixed-linear approach than under either the UBH or UBS methods (or the extreme initializations). Admittedly, the contribution of the school effects is somewhat larger under the mixed-linear model than under the agnostic initialization, but this partly reflects the influence of the interaction terms. Thus, while we are not in a position to claim that the agnostic initialization corresponds to the ‘true’ data-generating process, it appears to provide the most reasonable and well-supported (conditional) variance decomposition in the present case.

4.1 Sub-group heterogeneity

Having validated our proposed methodology, we now investigate heterogeneity in the variance components, focusing on the preferred agnostic initialization (upper-bound sorting model) with the extended specification. To do so, we re-run the variance decomposition based on the earlier regression estimates, but instead of reporting whole-sample results, we stratify individuals by various characteristics (gender, age group, schooling level, attendance at a public or private school, SES, and maternal schooling).23 Appendix Tables B7–B9 report these findings.

23 This is based on the same aggregate (regression) results reported previously, the difference being that the variance components are simply calculated separately for each sub-group. Other stratifying variables, such as the survey year, were also examined but were not found to be of substantive interest (results available on request).

Notwithstanding differences among the three countries, there generally appear to be systematic differences in the magnitude and (relative) contributions of the components to learning inequality. In particular, we observe greater inequality among boys (versus girls), among children in lower grades (or out of school) and among children from poorer households. There is also evidence that IOE is (relatively) more significant among younger children; e.g., in Kenya the residual accounts for 55% of the variance for children aged 6–8, but over 66% for those aged 12 and above. Also, with the exception of Uganda, IOE due to schooling tends to be larger in both absolute and relative terms among children attending public rather than private schools; e.g., in Kenya, the variance contribution of schools is 0.38 standard deviation units (17%) for children attending public schools, but 0.30 (13.5%) for children attending private schools, implying less heterogeneity in learning outcomes across schools in the private sector.24

24 What may account for this finding lies beyond the scope of the present paper. However, private schools are more common in urban areas, where there is greater competition (choice) in school provision.

Looking more closely at children who have not attended school (indicated in the tables by level zero of the grade category), two features are of interest. Consistent with the interaction terms in the regression estimates reported in Tables B1–B3, the (relative) variance contribution of schools among non-attenders is considerably lower than among attenders. Even so, the sorting term remains positive and significant for non-attenders; in Kenya it is in fact larger in magnitude (in both absolute and relative terms) than for school attenders. This points to a direct effect of school quality on the decision to (ever) enroll in school, which seems to disadvantage children from less privileged backgrounds. Figure B2 and Appendix Tables B10–B13 confirm, for alternative choices of the initialization parameter, both the smaller contribution of schools and the larger (relative) contribution of households among non-attenders versus attenders, now using only sub-samples in which all children in the same family share the same schooling status. Moreover, since the magnitude of learning inequalities is considerably lower among children who attend school than among those who do not, we conclude that even in low-income contexts such as East Africa, access to schooling does go some way toward addressing learning inequalities.

4.2 Spatial heterogeneity

A further form of heterogeneity is across geographical locations. Arguably, this is particularly relevant for policy, as it speaks to the possibility of targeted interventions. It is also motivated by the educational differentials within each country shown in Table 2, which indicate large differences in the mean levels and variances of test scores across regions; in some cases, these differences are larger than those associated with low socio-economic status alone. In this spirit, Tables 6 and 7 report the variance decomposition at the regional level using the preferred choice π = 0.5, together with the upper-bound estimates for the household and school effects (first two columns). As above, we find substantial heterogeneity in the size and relative importance of the different factors. This is most stark in Kenya, where the absolute variance contribution of different factors can vary by a factor of around four.
For instance, under the preferred agnostic initialization, the absolute contribution of household factors in the Central region is 0.30 standard deviation units, which is smaller than the absolute contribution of sorting in the North Eastern region (0.35 units). Similarly, we see very large differences in the variance contribution of schools, ranging from 0.26 (Central) to 0.46 units (North Eastern). In relative terms (Table 7) these differences are less pronounced, but material differences in the contributions of schools and sorting remain between regions (e.g., the sorting component accounts for between 2% and 7% of the variance across Kenyan regions). We also note that the regions containing the capital city of each country (defined here as Central in Kenya and Uganda, and Dar Es Salaam in Tanzania) tend to display comparatively low overall test score inequalities, as well as larger contributions from household effects and smaller contributions from both school and sorting effects (in relative terms). This is consistent with the notion that the capital cities provide more equal access to schools of similar quality. Finally, we proceed to an analysis at the district level. Figure 3 shows the cumulative distributions of the relative variance shares of the various components, taken from the preferred estimates. These confirm substantial variation within each country, but they also suggest that country-specific differences remain evident, especially in the contributions of the household and school effects: the distribution functions display (near) first-order dominance. To investigate whether these geographic differences are systematic, we regress the absolute and relative variance shares of the same components on a number of district-level characteristics (essentially, means taken from the same dataset).
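The sub-group, regional and district-level results all rest on recomputing the same decomposition within each stratum, starting from a single set of regression estimates. A minimal sketch of that step follows. It assumes (hypothetically) that, for each child, the fitted observed-covariate term xβ, the household effect h, the school effect s and the residual e are available as aligned lists, and that the ‘individual (all)’ component is Var(xβ) + 2Cov(xβ, h) + 2Cov(xβ, s); any remaining within-stratum cross-products are lumped into the residual so that shares sum to 100. These conventions, and all names below, are our assumptions rather than the authors' code; each stratum is assumed to have non-constant scores.

```python
import math
from collections import defaultdict


def _cov(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


def signed_sd(v):
    """Report a (possibly negative) variance term in s.d. units, keeping its sign."""
    return math.copysign(math.sqrt(abs(v)), v)


def decompose(xb, h, s, e):
    """Variance components of y = xb + h + s + e under the assumed conventions."""
    y = [a + b + c + d for a, b, c, d in zip(xb, h, s, e)]
    total = _cov(y, y)
    comps = {
        "individual": _cov(xb, xb) + 2 * _cov(xb, h) + 2 * _cov(xb, s),
        "household": _cov(h, h),
        "school": _cov(s, s),
        "sorting": 2 * _cov(h, s),
    }
    # Residual absorbs Var(e) and any within-stratum covariances involving e
    comps["residual"] = total - sum(comps.values())
    return total, comps


def decompose_by_group(groups, xb, h, s, e):
    """Recompute the decomposition separately within each stratum (e.g. district)."""
    index = defaultdict(list)
    for i, g in enumerate(groups):
        index[g].append(i)
    results = {}
    for g, rows in index.items():
        def pick(values):
            return [values[i] for i in rows]

        total, comps = decompose(pick(xb), pick(h), pick(s), pick(e))
        results[g] = {
            name: {"abs_sd": signed_sd(v), "rel_pct": 100 * v / total}
            for name, v in comps.items()
        }
    return results
```

Because only the grouping changes, the same fitted components can be re-aggregated by gender, grade, region or district without re-estimating the model, which is the point made in footnote 23.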
The district-level regression, which is intended only to indicate conditional correlations, is reported in Table 8, treating all countries together. Three points stand out. First, part of the heterogeneity in the variance decomposition seems systematic. Aside from the country fixed effects, whether the mother has any schooling and whether the child attends a private school show significant correlations with the different variance components. In absolute terms (super-column I), a higher prevalence of mothers with no schooling is associated with larger contributions from all components; i.e., more widespread maternal schooling maps to lower educational inequalities, which is also consistent with the children of unschooled mothers being more likely to be out of school. Districts with a higher prevalence of children attending private school show higher inequalities associated with household factors, but lower relative inequalities associated with schools. The latter provides further indicative evidence that (greater) competition across schools may help narrow the distribution of schooling effects. Furthermore, as indicated by the interaction term, this effect seems most acute at lower grade levels; i.e., the contribution of private primary schools to learning outcomes is especially homogeneous in early grades. Second, and relatedly, sorting effects appear to be larger, in both absolute and relative terms, in districts with a higher prevalence of disadvantaged households (i.e., those with lower average SES values and a higher proportion of mothers with no education). The data do not indicate what lies behind this: it may be due to greater clustering or residential segregation in these districts, or to a less equal distribution of school quality. However, this result reinforces the case for a deeper investigation of sorting processes. Third, country differences persist after controlling for other covariates.
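The absolute (super-column I) and relative (super-column II) shares used above are two views of the same decomposition. Reading Tables 6 and 7 together, they appear to be linked by simple arithmetic: squaring each absolute contribution (keeping the sign, since the sorting term can be negative) and dividing by the squared score s.d. gives the percentage share, and the ‘Correl.’ column appears consistent with ρ = (2Σ/2)/(σ_h σ_s). This is our reading of the published tables rather than a formula stated in this section; the function names are hypothetical, and the sketch checks the reading against the Kenya ‘All’ row of Table 6.

```python
import math


def to_variance(sd_units):
    """Convert a signed s.d.-unit contribution back to a (signed) variance term."""
    return math.copysign(sd_units ** 2, sd_units)


def relative_shares(abs_components, score_sd):
    """Percentage share of each component in the total score variance."""
    total = score_sd ** 2
    return {name: 100 * to_variance(v) / total for name, v in abs_components.items()}


def household_school_corr(house_sd, school_sd, sorting_sd):
    """Correlation implied when the sorting component equals 2 * Cov(h, s)."""
    cov = to_variance(sorting_sd) / 2
    return cov / (house_sd * school_sd)


# Kenya, all regions, pi = 0.5 (Table 6), in standard deviation units
kenya_all = {"individual": 0.28, "household": 0.38, "school": 0.38,
             "sorting": 0.24, "residual": 0.76}
shares = relative_shares(kenya_all, score_sd=1.00)
rho = household_school_corr(0.38, 0.38, 0.24)

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name:>10}: {pct:5.1f}%")
    print(f"implied correlation: {rho:.2f}")
```

This prints shares of 7.8, 14.4, 14.4, 5.8 and 57.8%, against 7.9, 14.6, 14.1, 5.6 and 57.7% in Table 7; the small gaps are what one would expect from rounding the absolute values to two decimals. The implied correlation of 0.20 matches the ‘Correl.’ column, and the same rule maps the negative sorting entry for Kenya at π = 0 in Table 4 (-0.03 s.d. units) to -0.09%.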
One interpretation is that this persistence reflects differences in the overall organization of schooling systems (macro-policies), and that these differences matter for educational inequalities. This would be in line with previous studies finding that policy differences, such as ability tracking and the allocation of teachers among schools, can contribute to national differences in educational inequalities (Hanushek and Wößmann, 2006; Van de Werfhorst and Mijs, 2010). This agenda merits further attention in the context of East Africa.

5 Conclusion

The purpose of this paper is to shed light on the sources of educational inequalities in East Africa. Starting from the proposition that household circumstances, school factors and assortative matching (sorting) are all potentially important components of IOE, we seek to parse out their respective contributions to the observed variation in test scores. To do so, we review various variance decomposition procedures and suggest how a partitioned iterative algorithm can be used to estimate the relevant variance components, treating the two effects of interest as fixed latent variables. To address a key technical challenge of estimation, namely the mechanical negative bias in the correlation between the fixed effects, we elaborate a unified procedure that controls how the fixed effects are estimated (initialized) and that maps to alternative assumptions about the between-factor covariance. We apply the approach to rich test score data from East Africa, from which three main findings stand out. First, we confirm that how the fixed effects are initialized under a partitioned iterative algorithm matters for subsequent estimates of the variance components. At the same time, the choice of initialization does not affect the regression estimates (coefficients) for the included observed covariates.
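One mechanism that would produce this pattern (identical fitted values and coefficients, but initialization-dependent variance components) can be shown with a toy example. We assume, purely for illustration, that within each connected set of households and schools a constant can be shifted between the household and school effects without changing fitted values, and that an initialization parameter π assigns a share π of each set's combined effect to the school and 1 − π to the household. This is a hypothetical stand-in with made-up numbers, not the authors' partitioned iterative algorithm.

```python
import statistics


def split_effects(combined, pi):
    """Hypothetical initialization: share pi of each combined effect goes to the
    school, the remaining (1 - pi) to the household."""
    household = [(1 - pi) * m for m in combined]
    school = [pi * m for m in combined]
    return household, school


def variance_components(household, school):
    """Var(h), Var(s) and the sorting term 2*Cov(h, s), using population moments."""
    mh, ms = statistics.fmean(household), statistics.fmean(school)
    cov = sum((a - mh) * (b - ms) for a, b in zip(household, school)) / len(household)
    return {
        "household": statistics.pvariance(household),
        "school": statistics.pvariance(school),
        "sorting": 2 * cov,
    }


# Hypothetical combined (h + s) effects for five connected sets
combined = [-1.2, -0.4, 0.0, 0.5, 1.1]

if __name__ == "__main__":
    for pi in (0.0, 0.25, 0.5, 0.75, 1.0):
        h, s = split_effects(combined, pi)
        fitted = [a + b for a, b in zip(h, s)]
        # Fitted values do not depend on pi ...
        assert all(abs(f - m) < 1e-12 for f, m in zip(fitted, combined))
        # ... but the variance split does
        print(pi, {k: round(v, 3) for k, v in variance_components(h, s).items()})
```

In this toy, the household variance falls from Var(m) at π = 0 to zero at π = 1, the school variance does the reverse, and the sorting term equals 2π(1 − π)Var(m): zero at both corners and largest at π = 0.5. Qualitatively, this mirrors the bounds described in the text, although the real algorithm and data are far richer.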
The invariance of the coefficients implies that the choice of initialization is immaterial when the fixed effects are treated as nuisance parameters; but if researchers wish to extract and interpret the latent fixed effects, the initialization choice is fundamental. Second, our proposed unified approach to estimation, in which different initializations of the latent fixed effects are examined, provides bounds on their respective variance contributions. That is, extreme choices of the initialization parameter yield upper and lower bounds on the household and school variance contributions, and a midpoint (agnostic) initialization provides an (approximate) upper bound on the between-factor covariance, which is interpreted as the contribution of sorting. Methodologically, these insights are validated using both conventional (single) fixed-effects and mixed-linear models. Consistent with the existing literature, we confirm that the extreme (corner) initializations of the fixed effects bias the covariance term toward zero. However, we show that the agnostic (midpoint) initialization substantially mitigates this problem and yields estimates of sorting that are material, positive and comparable in magnitude to those from a mixed-linear model. Third, taking the agnostic initialization as our preferred approach, we find that household factors are an important source of inequality in educational opportunity. However, once school effects and sorting are also accounted for, family effects are not decisive, contributing only around 15% of the variance in outcomes. Indeed, unlike the findings of the neighborhood-effects literature (e.g., Solon et al., 2000), we find that the combined school and sorting components are generally larger than the standalone contribution of households. This means that conventional upper-bound estimates of the contribution of household factors (e.g., as captured by simple sibling correlations) may well overstate the unique contribution of family circumstances.
This finding also indicates that, despite low average learning outcomes, variation in school quality is substantial and positive sorting (matching) between households and schools aggravates existing learning inequalities. This conclusion is supported by evidence of substantial spatial heterogeneity in the variance components, in which regional differences in poverty and parental education play a role. Pulling these findings together, we find that inequality in educational opportunity is substantial, accounting for almost half of all variation in test scores. However, given the importance of schools and sorting within this total, it follows that educational (school) reforms that alter the distribution of school quality, such as through the allocation of teachers across schools, can enhance opportunities for the most disadvantaged.

References

Abowd, J.M., Creecy, R.H. and Kramarz, F. (2002). Computing person and firm effects using linked longitudinal employer-employee data. Longitudinal Employer-Household Dynamics Technical Papers 2002-06, Center for Economic Studies, U.S. Census Bureau.
Ahn, S. and Fessler, J.A. (2003). Standard errors of mean, variance, and standard deviation estimators. Technical report, EECS Department, University of Michigan.
Andrews, M.J., Gill, L., Schank, T. and Upward, R. (2008). High wage workers and low wage firms: negative assortative matching or limited mobility bias? Journal of the Royal Statistical Society: Series A (Statistics in Society), 171(3):673–697.
Andrews, M.J., Gill, L., Schank, T. and Upward, R. (2012). High wage workers match with high wage firms: Clear evidence of the effects of limited mobility bias. Economics Letters, 117(3):824–827.
Azam, M. and Kingdon, G.G. (2015). Assessing teacher quality in India. Journal of Development Economics, 117:74–83.
Bates, D.M. (2010). lme4: Mixed-Effects Modeling with R. New York: Springer.
Behrman, J.R. and Birdsall, N. (1983). The quality of schooling: quantity alone is misleading. The American Economic Review, 73(5):928–946.
Behrman, J.R., Gaviria, A. and Székely, M. (2001). Intergenerational mobility in Latin America. Economía, 2(1):1–31.
Behrman, J.R., Rosenzweig, M.R. and Taubman, P. (1994). Endowments and the allocation of schooling in the family and in the marriage market: the twins experiment. Journal of Political Economy, 102(6):1131–1174.
Behrman, J.R. and Rosenzweig, M.R. (2004). Returns to birthweight. Review of Economics and Statistics, 86(2):586–601.
Björklund, A. and Salvanes, K.G. (2011). Education and family background: Mechanisms and policies. In E. Hanushek, S. Machin and L. Wößmann (Eds.), Handbook of the Economics of Education, volume 3, chapter 3, pp. 201–247. Elsevier.
Carneiro, P. (2008). Equality of opportunity and educational achievement in Portugal. Portuguese Economic Journal, 7(1):17–41.
Chirwa, E.D., Griffiths, P.L., Maleta, K., Norris, S.A. and Cameron, N. (2014). Multilevel modelling of longitudinal child growth data from the Birth-to-Twenty Cohort: a comparison of growth models. Annals of Human Biology, 41(2):168–179.
Combes, P.P., Duranton, G. and Gobillon, L. (2008). Spatial wage disparities: Sorting matters! Journal of Urban Economics, 63(2):723–742.
Corak, M. (2013). Income inequality, equality of opportunity, and intergenerational mobility. Journal of Economic Perspectives, 27(3):79–102.
Correia, S. (2017). reghdfe: Stata module for linear and instrumental-variable/GMM regression absorbing multiple levels of fixed effects. Statistical Software Components s457874, Boston College Department of Economics. URL https://EconPapers.repec.org/RePEc:boc:bocode:s457874.
Dabalen, A., Narayan, A., Saavedra-Chanduvi, J. and Hoyos Suarez, A., with Abras, A. and Tiwari, S. (2015). Do African Children Have an Equal Chance? A Human Opportunity Report for Sub-Saharan Africa. Directions in Development. Washington, DC: World Bank. doi:10.1596/978-1-4648-0332-1.
Dang, H.-A. and Glewwe, P. (2018). Well begun, but aiming higher: A review of Vietnam's education trends in the past 20 years and emerging challenges. Journal of Development Studies, 54(7):1171–1195.
Davidoff, T. (2005). Income sorting: Measurement and decomposition. Journal of Urban Economics, 58(2):289–303.
Ferreira, F.H. and Gignoux, J. (2014). The measurement of educational inequality: Achievement and opportunity. The World Bank Economic Review, 28(2):210–246.
Ferreira, F.H., Molinas Vega, J.R., Paes de Barros, R. and Saavedra Chanduvi, J. (2008). Measuring inequality of opportunities in Latin America and the Caribbean. The World Bank.
Freeman, R.B. and Viarengo, M. (2014). School and family effects on educational outcomes across countries. Economic Policy, 29(79):395–446.
Gaure, S. (2014). Correlation bias correction in two-way fixed effects linear regression. Stat, 3(1):379–390.
Gibbons, S., Overman, H.G. and Pelkonen, P. (2014). Area disparities in Britain: Understanding the contribution of people vs. place through variance decompositions. Oxford Bulletin of Economics and Statistics, 76(5):745–763.
Glewwe, P. and Muralidharan, K. (2016). Improving education outcomes in developing countries: Evidence, knowledge gaps, and policy implications. In E.A. Hanushek, S. Machin and L. Woessmann (Eds.), Handbook of the Economics of Education, volume 5. Amsterdam: North-Holland.
Guimarães, P. and Portugal, P. (2010). A simple feasible procedure to fit models with high-dimensional fixed effects. Stata Journal, 10(4):628.
Gurka, M.J., Edwards, L.J. and Muller, K.E. (2011). Avoiding bias in mixed model inference for fixed effects. Statistics in Medicine, 30(22):2696–2707.
Hanushek, E. and Rivkin, S. (2006). Teacher quality. Handbook of the Economics of Education, 2:1051–1078.
Hanushek, E.A. and Rivkin, S.G. (2012). The distribution of teacher quality and implications for policy. Annual Review of Economics, 4(1):131–157.
Hanushek, E. and Wößmann, L. (2006). Does educational tracking affect performance and inequality? Differences-in-differences evidence across countries. The Economic Journal, 116(510):C63–C76.
Hanushek, E. and Yilmaz, K. (2007). The complementarity of Tiebout and Alonso. Journal of Housing Economics, 16(2):243–261.
Jackson, C.K. (2009). Student demographics, teacher sorting, and teacher quality: Evidence from the end of school desegregation. Journal of Labor Economics, 27(2):213–256.
Jackson, C.K. (2013). Match quality, worker productivity, and worker mobility: Direct evidence from teachers. Review of Economics and Statistics, 95(4):1096–1116.
Jones, S., Schipper, Y., Ruto, S. and Rajani, R. (2014). Can your child read and count? Measuring learning outcomes in East Africa. Journal of African Economies, 23(5):643–672.
Koedel, C., Mihaly, K. and Rockoff, J.E. (2015). Value-added modeling: A review. Economics of Education Review, 47:180–195.
Lindahl, L. (2011). A comparison of family and neighborhood effects on grades, test scores, educational attainment and income – evidence from Sweden. The Journal of Economic Inequality, 9(2):207–226.
Mazumder, B. (2008). Sibling similarities and economic inequality in the US. Journal of Population Economics, 21(3):685–701.
Mazumder, B. (2011). Family and community influences on health and socioeconomic status: sibling correlations over the life course. The B.E. Journal of Economic Analysis & Policy, 11(3):Article 1.
Mittag, N. (2012). New methods to estimate models with large sets of fixed effects with an application to matched employer-employee data from Germany. FDZ Methodenreport 201201, Institute for Employment Research, Nuremberg, Germany.
Nicoletti, C. and Rabe, B. (2013). Inequality in pupils' test scores: How much do family, sibling type and neighbourhood matter? Economica, 80(318):197–218.
Pop-Eleches, C. and Urquiola, M. (2013). Going to a better school: Effects and behavioral responses. The American Economic Review, 103(4):1289–1324.
Pritchett, L. and Viarengo, M. (2015). Does public sector control reduce variance in school quality? Education Economics, 23(5):557–576.
Raaum, O., Salvanes, K.G. and Sørensen, E.Ø. (2006). The neighbourhood is not what it used to be. The Economic Journal, 116(508):200–222.
Rivkin, S.G., Hanushek, E.A. and Kain, J.F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2):417–458.
Roemer, J.E. (1996). Theories of distributive justice. Cambridge, MA: Harvard University Press.
Roemer, J.E. (2002). Equality of opportunity: A progress report. Social Choice and Welfare, 19(2):455–471.
Sandefur, J. (2018). Internationally comparable mathematics scores for fourteen African countries. Economics of Education Review, 62:267–286.
Sass, T.R., Hannaway, J., Xu, Z., Figlio, D.N. and Feng, L. (2012). Value added of teachers in high-poverty schools and lower poverty schools. Journal of Urban Economics, 72(2):104–122.
Schütz, G., Ursprung, H.W. and Wößmann, L. (2008). Education policy and equality of opportunity. Kyklos, 61(2):279–308.
Sen, A. (1985). A sociological approach to the measurement of poverty: a reply to Professor Peter Townsend. Oxford Economic Papers, 37(4):669–676.
Sen, A. (2002). Why health equity? Health Economics, 11(8):659–666.
Smyth, G.K. (1996). Partitioned algorithms for maximum likelihood and other non-linear estimation. Statistics and Computing, 6(3):201–216.
Solon, G., Page, M.E. and Duncan, G.J. (2000). Correlations between neighboring children in their subsequent educational attainment. Review of Economics and Statistics, 82(3):383–392.
Stanek, E.J., Well, A. and Ockene, I. (1999). Why not routinely use best linear unbiased predictors (BLUPs) as estimates of cholesterol, per cent fat from Kcal and physical activity? Statistics in Medicine, 18(21):2943–2959.
Uwezo (2012). Are our children learning? Literacy and numeracy across East Africa. Technical report, Uwezo. URL www.uwezo.net/wp-content/uploads/2012/08/RO_2012_UwezoEastAfricaREport.pdf.
Walsh, P. (2009). Effects of school choice on the margin: The cream is already skimmed. Economics of Education Review, 28(2):227–236.
Van de Werfhorst, H.G. and Mijs, J.J. (2010). Achievement inequality and the institutional structure of educational systems: A comparative perspective. Annual Review of Sociology, 36:407–428.

Figure 1: Relative variance shares, by estimator
[Bar chart with three panels (π = 0, π = 0.5, π = 1); horizontal axis: % (0–50); bars for KE, TZ and UG; legend includes Individual (all) and Sorting.]
Note: bars indicate relative shares reported in Table 4 for different models/estimators; component ‘individual (all)’ aggregates the three components including the observed individual characteristics; ‘sorting’ is the household-school covariance term; KE is Kenya; TZ is Tanzania (mainland); and UG is Uganda. Source: own calculations.

Figure 2: Relative variance shares, by values of π
[Line charts with panels (a) KE, (b) TZ and (c) UG; vertical axis: % contribution (0–35); horizontal axis: π (0–1); series: Household, School, Sorting.]
Note: ‘sorting’ is the household-school covariance term; KE is Kenya; TZ is Tanzania (mainland); and UG is Uganda. Source: own calculations.
Figure 3: Cumulative distribution functions of relative variance shares, by country
[Four panels: Individual (all), Household, School, Sorting (house.-school); vertical axis: cumulative probability; horizontal axis: relative variance share (%); one curve each for KE, TZ and UG.]
Note: sub-figures show cumulative distribution functions (by country) for the relative variance shares calculated at the district level; component ‘individual (all)’ aggregates the three components including the observed individual characteristics; KE is Kenya; TZ is Tanzania (mainland); and UG is Uganda. Source: own calculations.

Table 1: Alternative test score data-generating processes
Model                        Description
1  Restricted linear         Independent households & schools
2  Unrestricted linear       Correlated household & school factors (includes the covariance term 2Σ)
3  Household upper bound     Households dominate school effects
4  School upper bound        Schools dominate household effects

Table 2: Description of synthetic test scores, by country & region
                        (1) Raw means       (2) Age-standardized   (3) Normalized
Country & Region        Mean   (St. dev.)   Mean    (St. dev.)     Mean    (St. dev.)
KE  Central             4.99   (1.52)        0.30   (0.67)          0.37   (0.73)
    Coast               4.25   (1.98)       -0.13   (0.93)         -0.10   (1.02)
    Eastern             4.51   (1.84)       -0.00   (0.87)          0.04   (0.95)
    North Eastern       3.67   (2.12)       -0.47   (1.23)         -0.48   (1.35)
    Nyanza              4.39   (1.90)       -0.05   (0.83)         -0.02   (0.91)
    Rift Valley         4.36   (1.95)       -0.06   (0.97)         -0.03   (1.06)
    Western             4.23   (1.98)       -0.16   (0.89)         -0.13   (0.98)
    All                 4.42   (1.91)       -0.04   (0.91)         -0.00   (1.00)
TZ  Arusha              3.69   (1.78)        0.15   (0.84)          0.20   (0.97)
    Dar Es Salaam       3.98   (1.64)        0.29   (0.76)          0.36   (0.88)
    Iringa              3.41   (1.83)       -0.00   (0.85)          0.03   (0.99)
    Kagera              3.24   (1.86)       -0.10   (0.87)         -0.09   (1.00)
    Kigoma              2.92   (1.90)       -0.25   (0.88)         -0.26   (1.02)
    Ruvuma              3.36   (1.79)       -0.03   (0.81)         -0.01   (0.94)
    Singida             3.47   (1.83)        0.03   (0.83)          0.07   (0.96)
    Tabora              2.94   (1.95)       -0.25   (0.91)         -0.26   (1.06)
    Tanga               3.43   (1.83)        0.01   (0.83)          0.04   (0.96)
    All                 3.37   (1.86)       -0.03   (0.86)         -0.00   (1.00)
UG  Central             3.43   (1.87)        0.27   (0.89)          0.31   (1.00)
    Eastern             2.78   (1.89)       -0.15   (0.84)         -0.16   (0.94)
    Northern            2.57   (1.90)       -0.28   (0.88)         -0.30   (0.98)
    Western             3.13   (1.91)        0.06   (0.88)          0.08   (0.98)
    All                 3.00   (1.92)       -0.01   (0.90)         -0.00   (1.00)
Note: synthetic test scores combine achievement in literacy and numeracy, as described in the text; ‘age-standardized’ scores are standardized within each age group (for each survey round and country) for the reference group defined as all children who are currently enrolled or have completed primary school; ‘normalized’ centers the age-standardized scores to have a mean of zero and standard deviation of one for the reference group in each country; removes mean level differences between districts; survey rounds are pooled; KE is Kenya; TZ is Tanzania (mainland); UG is Uganda; regions in Tanzania and Kenya are aggregated for clarity of presentation (for details see Appendix C). Source: own calculations.
Table 3: Descriptive sample statistics, by country & region
                        Index count
Country & Region        i          j          k         Age    Female   SES     Attend   Grade
KE  Central             37,870     15,194     6,353     11.2   50.6      62.5   95.5     5.1
    Coast               49,921     17,013     5,084     11.1   49.9     -17.8   87.6     4.0
    Eastern             84,133     30,596     10,160    11.2   50.0     -23.6   93.5     4.6
    North Eastern       54,132     16,605     3,501     11.0   43.4     -61.7   80.7     3.6
    Nyanza              76,471     27,014     8,410     11.2   49.5     -18.4   92.3     4.5
    Rift Valley         169,672    58,737     17,095    11.1   49.0     -13.3   90.3     4.3
    Western             83,720     28,392     8,296     11.2   50.0     -10.8   92.9     4.4
    All                 555,919    193,551    58,899    11.1   49.4      -6.6   91.5     4.5
TZ  Arusha              52,883     19,979     6,592     11.6   48.8      10.8   88.5     4.4
    Dar Es Salaam       23,025     8,948      3,392     11.7   51.1      68.2   91.0     4.4
    Iringa              48,617     19,414     6,981     11.6   49.9      -7.8   85.8     4.0
    Kagera              50,476     17,815     5,921     11.6   49.6     -14.1   82.6     3.7
    Kigoma              32,462     11,801     3,728     11.6   49.2     -28.6   80.8     3.6
    Ruvuma              29,867     12,233     4,920     11.7   49.5     -16.0   86.6     4.1
    Singida             32,269     12,379     4,168     11.6   49.7      -7.7   85.8     4.2
    Tabora              49,936     17,289     5,311     11.5   49.1     -19.6   78.5     3.5
    Tanga               39,928     15,071     5,010     11.6   48.6      -8.9   86.3     3.9
    All                 359,463    134,929    46,023    11.6   49.5      -3.1   84.8     4.0
UG  Central             64,077     20,650     6,796     11.0   49.8      51.9   92.5     3.8
    Eastern             120,142    36,183     9,762     11.1   49.2     -21.9   94.7     3.7
    Northern            102,723    32,629     8,588     11.2   47.9     -38.5   86.3     3.2
    Western             75,186     24,955     7,593     11.1   50.0     -12.8   91.7     3.5
    All                 362,128    114,417    32,739    11.1   49.3      -2.4   91.6     3.6
Note: regions in Tanzania and Kenya are aggregated for clarity of presentation (see Appendix C); KE is Kenya; TZ is Tanzania (mainland); UG is Uganda; i, j, k refer to the number of unique observations for the individual, household and school-grade effects respectively; remaining columns are regional means (age, highest grade) or proportions; survey rounds are pooled. Source: own calculations.
Table 4: Unconditional absolute and relative variance contributions, alternative choices of π
                         Absolute (s.d. units)        Relative (%)
Component     π          Kenya  Tanzania  Uganda      Kenya    Tanzania  Uganda
Household     0           0.57   0.64      0.57       32.42    40.61     32.14
              0.5         0.41   0.45      0.42       17.11    20.7      17.63
              1           0.36   0.39      0.37       12.99    15.04     13.53
School        0           0.28   0.24      0.27        8.11     5.92      7.43
              0.5         0.38   0.36      0.36       14.37    13.24     13.25
              1           0.57   0.59      0.54       32.08    34.38     29.64
Sorting       0          -0.03  -0.11      0.04       -0.09    -1.14      0.18
              0.5         0.27   0.29      0.27        7.46     8.38      7.3
              1          -0.03  -0.08      0.05       -0.11    -0.71      0.25
Residual      .           0.77   0.74      0.77       58.55    54.53     59.56
Total         .           1.00   1.00      1.00      100.00   100.00    100.00
Note: following the logic of equations (4) and (5), but excluding the individual controls, the table sets out the absolute and relative variance contributions attributable to each component of the test score, reported in standard deviation units and percent respectively; different initializations of the partitioned iterative algorithm are indicated by column π; standard errors not shown, but available on request. Source: own calculations.

Table 5: Conditional absolute and relative variance contributions, alternative choices of π
                         Absolute (s.d. units)        Relative (%)
Component     π          Kenya  Tanzania  Uganda      Kenya    Tanzania  Uganda
Individual    0           0.29   0.29      0.19        8.14     8.59      3.58
              0.5         0.28   0.29      0.19        7.91     8.65      3.51
              1           0.28   0.30      0.19        8.09     8.90      3.62
Household     0           0.52   0.59      0.53       26.63    34.31     28.03
              0.5         0.38   0.42      0.39       14.58    17.61     15.57
              1           0.34   0.36      0.35       11.26    12.80     12.22
School        0           0.29   0.25      0.31        8.44     6.23      9.49
              0.5         0.38   0.36      0.40       14.14    12.91     15.77
              1           0.53   0.55      0.55       28.45    30.74     30.39
Sorting       0           0.04  -0.08      0.10        0.18    -0.72      0.96
              0.5         0.24   0.26      0.25        5.62     6.78      6.38
              1          -0.06  -0.09     -0.04       -0.31    -0.84     -0.13
Residual      .           0.75   0.72      0.75       55.62    51.34     56.87
Total         .           1.00   1.00      1.00      100.00   100.00    100.00
Note: following equations (4) and (5), the table sets out the absolute and relative variance contributions attributable to each component of the test score, reported in standard deviation units and percent respectively; different initializations of the partitioned iterative algorithm are indicated by column π; ‘individual’ component aggregates all observed individual effect components; full details found in Appendix Tables B4–B6. Source: own calculations.

Table 6: Absolute variance contributions (in s.d. units), by country & region
                        UBH      UBS      π = 0.5
Country & Region        House.   School   Indiv.  House.  School  Sorting (2Σ)  Resid.  Score  Correl.
KE  Central             0.38     0.34     0.15    0.30    0.26    0.11          0.59    0.73   0.08
    Coast               0.54     0.50     0.31    0.39    0.38    0.24          0.77    1.02   0.19
    Eastern             0.50     0.49     0.23    0.37    0.37    0.21          0.73    0.95   0.17
    North Eastern       0.72     0.71     0.44    0.49    0.46    0.35          1.03    1.35   0.27
    Nyanza              0.46     0.46     0.22    0.35    0.37    0.16          0.70    0.91   0.10
    Rift Valley         0.57     0.54     0.33    0.40    0.39    0.27          0.80    1.06   0.23
    Western             0.51     0.49     0.20    0.40    0.39    0.18          0.76    0.98   0.11
    All                 0.53     0.51     0.28    0.38    0.38    0.24          0.76    1.00   0.20
TZ  Arusha              0.59     0.54     0.22    0.42    0.35    0.27          0.72    0.97   0.25
    Dar Es Salaam       0.51     0.44     0.24    0.37    0.31    0.20          0.67    0.88   0.19
    Iringa              0.58     0.54     0.29    0.41    0.36    0.26          0.73    0.99   0.23
    Kagera              0.58     0.50     0.31    0.44    0.35    0.24          0.74    1.00   0.19
    Kigoma              0.60     0.53     0.32    0.45    0.37    0.25          0.74    1.02   0.19
    Ruvuma              0.54     0.50     0.23    0.40    0.35    0.22          0.71    0.94   0.17
    Singida             0.54     0.50     0.27    0.40    0.35    0.23          0.73    0.96   0.19
    Tabora              0.61     0.54     0.34    0.44    0.37    0.26          0.77    1.06   0.21
    Tanga               0.55     0.52     0.26    0.40    0.36    0.24          0.71    0.96   0.19
    All                 0.58     0.53     0.29    0.42    0.36    0.26          0.74    1.00   0.22
UG  Central             0.55     0.50     0.19    0.39    0.38    0.24          0.77    1.00   0.18
    Eastern             0.51     0.49     0.12    0.38    0.37    0.22          0.73    0.94   0.17
    Northern            0.51     0.53     0.17    0.37    0.40    0.22          0.76    0.98   0.16
    Western             0.56     0.53     0.15    0.41    0.40    0.23          0.75    0.98   0.17
    All                 0.56     0.53     0.19    0.39    0.40    0.25          0.77    1.00   0.20
Note: top-level column indicates the model, where UBH is the upper bound household model, UBS is the upper bound
school model and π = 0.5 is the (preferred) PIA estimator; Indiv. is the aggregate of all observed individual effect components; Correl. is the estimated correlation between household and school effects; all other components are as before; values are reported in standard deviation units; regions in Tanzania and Kenya are aggregated for clarity of presentation (see Appendix C); KE is Kenya; TZ is Tanzania (mainland); UG is Uganda. Source: own calculations.

Table 7: Relative variance shares (%), by country & region

| Country | Region | UBH: House. | UBS: School | π = 0.5: Indiv. | π = 0.5: House. | π = 0.5: School | π = 0.5: Sorting | π = 0.5: Resid. | π = 0.5: Score |
|---|---|---|---|---|---|---|---|---|---|
| KE | Central | 26.6 | 21.9 | 4.5 | 16.4 | 12.5 | 2.3 | 64.4 | 100.0 |
| | Coast | 27.9 | 24.4 | 9.1 | 14.8 | 13.9 | 5.4 | 56.7 | 100.0 |
| | Eastern | 28.2 | 26.6 | 6.1 | 15.0 | 14.9 | 5.0 | 58.9 | 100.0 |
| | North Eastern | 28.8 | 27.5 | 10.5 | 13.4 | 11.4 | 6.7 | 58.0 | 100.0 |
| | Nyanza | 25.7 | 25.8 | 5.7 | 15.0 | 16.7 | 3.3 | 59.4 | 100.0 |
| | Rift Valley | 28.7 | 26.2 | 9.7 | 14.1 | 13.4 | 6.4 | 56.4 | 100.0 |
| | Western | 27.4 | 25.1 | 4.2 | 16.4 | 15.6 | 3.6 | 60.2 | 100.0 |
| | All | 28.2 | 25.8 | 7.9 | 14.6 | 14.1 | 5.6 | 57.7 | 100.0 |
| TZ | Arusha | 37.4 | 30.8 | 5.1 | 18.8 | 13.2 | 8.0 | 55.0 | 100.0 |
| | Dar Es Salaam | 32.9 | 25.3 | 7.1 | 17.2 | 12.1 | 5.4 | 58.1 | 100.0 |
| | Iringa | 34.4 | 29.5 | 8.8 | 17.2 | 13.4 | 6.9 | 53.8 | 100.0 |
| | Kagera | 33.5 | 25.0 | 9.5 | 18.9 | 11.9 | 5.8 | 54.0 | 100.0 |
| | Kigoma | 34.1 | 27.1 | 9.9 | 19.2 | 12.9 | 5.9 | 52.0 | 100.0 |
| | Ruvuma | 33.5 | 28.7 | 5.9 | 17.9 | 14.0 | 5.4 | 56.9 | 100.0 |
| | Singida | 31.8 | 26.8 | 7.9 | 16.9 | 12.9 | 5.6 | 56.7 | 100.0 |
| | Tabora | 32.8 | 26.2 | 10.4 | 17.5 | 12.1 | 6.2 | 53.8 | 100.0 |
| | Tanga | 33.4 | 29.5 | 7.6 | 17.7 | 14.5 | 6.2 | 54.1 | 100.0 |
| | All | 34.1 | 27.9 | 8.7 | 17.6 | 12.9 | 6.8 | 54.0 | 100.0 |
| UG | Central | 30.5 | 25.4 | 3.8 | 15.7 | 14.9 | 5.6 | 60.0 | 100.0 |
| | Eastern | 30.1 | 27.1 | 1.6 | 16.4 | 15.5 | 5.4 | 61.2 | 100.0 |
| | Northern | 27.6 | 28.9 | 3.1 | 14.6 | 16.4 | 5.0 | 60.9 | 100.0 |
| | Western | 32.4 | 29.7 | 2.5 | 17.3 | 16.4 | 5.7 | 58.0 | 100.0 |
| | All | 31.0 | 28.1 | 3.5 | 15.6 | 15.8 | 6.4 | 58.8 | 100.0 |

Note: the column prefix indicates the model, where UBH is the upper bound household model, UBS is the upper bound school model and π = 0.5 is the (preferred) PIA estimator; Indiv. is the aggregate of all observed individual effect components; all other components are as before; regions in Tanzania and Kenya are aggregated for clarity of presentation (see Appendix C); KE is Kenya; TZ is Tanzania
(mainland); UG is Uganda. Source: own calculations.

Table 8: Analysis of systematic patterns in variance components, by district ((I) absolute shares; (II) relative shares)

| Variable | (I) Indiv. | (I) House. | (I) School | (I) Sorting | (I) Resid. | (II) Indiv. | (II) House. | (II) School | (II) Sorting | (II) Resid. |
|---|---|---|---|---|---|---|---|---|---|---|
| Female | -0.00 (0.21) | -0.13 (0.16) | -0.33** (0.14) | -0.32** (0.14) | -0.49** (0.22) | 1.31 (5.63) | 10.29 (8.95) | -9.45 (8.97) | -6.11 (4.70) | 3.96 (13.16) |
| Never enrolled | 0.12 (0.28) | 0.19 (0.19) | -0.30 (0.19) | -0.20 (0.18) | 0.36 (0.25) | -4.84 (10.75) | 14.15 (9.16) | -26.92*** (9.97) | -10.00* (6.01) | 27.61 (17.09) |
| Currently attending | -0.78*** (0.20) | -0.06 (0.14) | 0.07 (0.15) | -0.12 (0.16) | -0.08 (0.17) | -33.32*** (8.29) | 8.35 (8.01) | 12.96 (8.64) | -2.18 (5.85) | 14.19 (14.73) |
| Highest grade | -0.01 (0.02) | -0.00 (0.01) | -0.06*** (0.01) | -0.02 (0.01) | 0.01 (0.02) | -0.77 (0.48) | 0.91 (0.68) | -4.21*** (0.70) | -0.49 (0.37) | 4.57*** (0.99) |
| Attends private sch. | 0.30*** (0.05) | 0.16*** (0.04) | 0.02 (0.04) | 0.07** (0.03) | 0.18*** (0.06) | 6.91*** (1.44) | 4.58** (2.24) | -7.15*** (2.30) | 0.33 (1.04) | -4.68* (2.66) |
| Private × grade | -0.20** (0.08) | -0.04 (0.07) | 0.10 (0.06) | -0.04 (0.05) | 0.06 (0.11) | -9.69*** (2.36) | -3.94 (4.09) | 8.50*** (2.71) | -1.09 (1.63) | 6.22 (4.43) |
| SES index | -0.00 (0.01) | -0.01 (0.01) | -0.02 (0.01) | -0.01 (0.01) | -0.01 (0.02) | 0.86* (0.45) | -0.21 (0.75) | -0.96* (0.58) | -0.52 (0.34) | 0.83 (0.95) |
| Mother no schooling | 0.15*** (0.02) | 0.07*** (0.02) | 0.07*** (0.02) | 0.15*** (0.02) | 0.25*** (0.03) | 2.57*** (0.91) | -3.37*** (1.08) | -3.41*** (1.31) | 2.99*** (0.72) | 1.22 (1.78) |
| Test score (percentile) | 0.07 (0.11) | -0.24*** (0.07) | -0.04 (0.07) | -0.02 (0.07) | -0.31*** (0.11) | 3.29 (3.73) | -7.88* (4.60) | 8.42** (3.98) | 2.49 (2.43) | -6.32 (6.91) |
| Tanzania | 0.02*** (0.01) | 0.04*** (0.00) | -0.01*** (0.00) | 0.03*** (0.01) | -0.02** (0.01) | 1.48*** (0.24) | 2.91*** (0.28) | -1.52*** (0.31) | 1.28*** (0.20) | -4.15*** (0.45) |
| Uganda | -0.09*** (0.01) | 0.02** (0.01) | 0.02*** (0.01) | 0.02*** (0.00) | 0.01 (0.01) | -3.38*** (0.18) | 0.91*** (0.32) | 1.13*** (0.34) | 0.84*** (0.17) | 0.51 (0.42) |
| Constant | 0.23*** (0.00) | 0.37*** (0.00) | 0.35*** (0.00) | 0.18*** (0.00) | 0.72*** (0.01) | 6.03*** (0.12) | 15.67*** (0.19) | 14.25*** (0.23) | 3.65*** (0.12) | 60.40*** (0.27) |
| Obs. | 434 | 434 | 434 | 434 | 434 | 434 | 434 | 434 | 434 | 434 |
| R² (adj.) | 0.79 | 0.52 | 0.50 | 0.53 | 0.69 | 0.79 | 0.31 | 0.46 | 0.33 | 0.41 |
| RMSE | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 | 1.56 | 2.10 | 2.22 | 1.35 | 3.05 |

Note: the table sets out OLS regression results for the conditional correlates of the district-level variance component estimates, based on the preferred estimator π = 0.5; the dependent variable is indicated in the columns, where the absolute share is in standard deviation units; robust standard errors are reported in parentheses. Source: own calculations.

SUPPLEMENTARY MATERIAL

A Technical details on empirical methods

As noted in the main text, two primary challenges arise when estimating high-dimensional two-way fixed effects. The first is bias from measurement error, which is commonly addressed with empirical Bayes shrinkage. This involves adjusting each estimated effect toward a common prior, with the degree of adjustment increasing in the estimated noise-to-signal ratio of the original estimates. Following Stanek et al. (1999), and to ensure consistency across the various methods deployed, we apply the approach typically used to adjust predictions of random effects. Concretely, for a given estimated fixed effect (e.g., the household effect $\hat{h}_j$) we shrink it toward a global mean as follows:

$$\tilde{h}_j = \bar{h} + \frac{\sigma^2_h}{\sigma^2_h + \sigma^2_\varepsilon/(n_j - 1)}\left(\hat{h}_j - \bar{h}\right) \qquad \text{(A1)}$$

where $n_j$ is the number of children used to estimate effect $j$, so that $n_j - 1$ is the effective degrees of freedom available to estimate each of the effects; $\sigma^2_h$ is the variance of the estimated effect; $\sigma^2_\varepsilon$ is the estimated residual variance; and $\bar{h}$ is the population mean, typically zero under conventional normalization restrictions.

The second challenge is the (mechanical) negative covariance bias of the two estimated fixed effects.
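Returning briefly to the first challenge, the shrinkage in equation (A1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `eb_shrink` is hypothetical, the two variance inputs are taken as given rather than estimated, and each effect must rest on at least two children so that the $n_j - 1$ degrees of freedom are positive.

```python
import numpy as np

def eb_shrink(effects, n_obs, sigma2_effect, sigma2_resid, prior_mean=0.0):
    """Shrink estimated fixed effects toward prior_mean, as in (A1).

    effects       : estimated effects (e.g. one per household)
    n_obs         : number of children behind each effect (n_j)
    sigma2_effect : variance of the estimated effects (sigma^2_h)
    sigma2_resid  : estimated residual variance (sigma^2_epsilon)
    """
    effects = np.asarray(effects, dtype=float)
    dof = np.asarray(n_obs, dtype=float) - 1.0   # effective degrees of freedom n_j - 1
    if np.any(dof <= 0):
        raise ValueError("each effect needs at least two observations")
    # Reliability weight in (0, 1]: close to 1 when the effect is precisely estimated.
    weight = sigma2_effect / (sigma2_effect + sigma2_resid / dof)
    return prior_mean + weight * (effects - prior_mean)

# Two hypothetical households with equal noise but different numbers of children.
shrunk = eb_shrink([1.0, -2.0], n_obs=[2, 11], sigma2_effect=1.0, sigma2_resid=1.0)
print(shrunk)
```

With these illustrative inputs the two-child household is pulled halfway toward zero, while the eleven-child household keeps about 91% of its estimated deviation. Because every effect is multiplied by a positive factor no larger than one, the procedure preserves the ordering within each set of effects, which is why the text expects it to do little about the covariance bias.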
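This negative covariance bias may be partially mitigated by the aforementioned empirical Bayes shrinkage. However, since that procedure simply rescales both sets of effects by a (varying) scalar bounded between zero and one, it should have little effect on their correlation. A closer look at the nature of the bias indicates that it may be driven, at least in part, by how the fixed effects are initialized in the iterative algorithm. Although the latent fixed effects are adjusted iteratively based on model residuals, the assumed starting values for the two effects fundamentally determine their final estimated levels and variance shares. For the unrestricted linear model (without additional controls), the regression specification (estimated via simple OLS) used in the first step of the iterative algorithm is just:

$$y_i = \beta^h_1\,\hat{h}_{j(i),0} + \beta^s_1\,\hat{s}_{k(i),0} + \varepsilon_{i,1} \qquad \text{(A2)}$$

where $y_i$ is the test score of child $i$, and $j(i)$ and $k(i)$ index the child's household and school; $\beta^h_1$ and $\beta^s_1$ are parameters to be estimated; $\hat{h}_{j,0}$ and $\hat{s}_{k,0}$ are initial estimates of the household and school fixed effects (see below); and the numeric indexes in the subscripts denote the iteration number. In the second step, each set of effects is updated with the group means of the residual from equation (A2) and the model is re-estimated. In general, at iteration $m$:

$$\hat{h}_{j,m} = \hat{\beta}^h_m\,\hat{h}_{j,m-1} + \frac{1}{n_j}\sum_{i \mid j}\hat{\varepsilon}_{i,m}, \qquad \hat{s}_{k,m} = \hat{\beta}^s_m\,\hat{s}_{k,m-1} + \frac{1}{n_k}\sum_{i \mid k}\hat{\varepsilon}_{i,m}, \qquad y_i = \beta^h_{m+1}\,\hat{h}_{j(i),m} + \beta^s_{m+1}\,\hat{s}_{k(i),m} + \varepsilon_{i,m+1} \qquad \text{(A3)}$$

where $n_j$ and $n_k$ are the numbers of children in household $j$ and school $k$, and $i \mid j$ denotes the children in household $j$. The algorithm iterates until a convergence criterion is reached, for example $\left|\sum_i \hat{\varepsilon}^2_{i,m+1} - \sum_i \hat{\varepsilon}^2_{i,m}\right| < \delta$ for a small tolerance $\delta$.

It follows that the starting values for the two fixed effects that enter equation (A2) are fundamental. Typically, these are approximated using the group-specific means of the residuals from a (zero-step) naïve model. Continuing with our simple case without additional covariates, the general expression for these is:

$$\hat{h}_{j,0} = \frac{1}{n_j}\sum_{i \mid j}\Big(y_i - \pi\,\frac{1}{n_{k(i)}}\sum_{i' \mid k(i)} y_{i'}\Big), \qquad \hat{s}_{k,0} = \frac{1}{n_k}\sum_{i \mid k}\Big(y_i - (1-\pi)\,\frac{1}{n_{j(i)}}\sum_{i' \mid j(i)} y_{i'}\Big)$$

where $\pi \in [0, 1]$ serves as an initialization scalar that apportions the variation in $y$ across the household and school effects. For instance, if $\pi = 0$ the observed variation in $y$ is allocated primarily to the initial estimate of the household fixed effect, $\hat{h}_{j,0} = \frac{1}{n_j}\sum_{i \mid j} y_i$; in turn, the initial estimate of the school fixed effect captures only the residual variation in $y$, averaged by school index $k$.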
The corner initialization ($\pi = 0$ or $\pi = 1$) is also implicit when one of the fixed effects is initially 'swept out' of the regression, leaving the iterative adjustment to focus on the remaining effect (Guimaraes and Portugal 2010). This clarifies the concern raised above that any over-estimation (upward bias) of the initial values for one factor will be mechanically reflected in an under-estimation of the other (and vice versa). Indeed, since the starting values are derived directly from the dependent variable (or residuals thereof), they always contain relevant information and effectively become locked in as the algorithm proceeds; that is, the regression estimates of $\beta^h_{m+1}$ and $\beta^s_{m+1}$ from equation (A3) should always be close to one. These mechanics show that the initialization of the fixed effects embeds specific presumptions about how variation in the outcome is allocated across the fixed effects. Our working hypothesis is that this translates into specific assumptions about the form of the between-factor covariance. Specifically, because extreme choices for the initial values ($\pi = 0$ or $\pi = 1$) treat the second effect as a residual term, the implicit assumption is that the two factors are orthogonal. These corner choices are therefore expected to correspond to upper bound models in which one factor is dominant (Rows 3 and 4 of Table 1). A midpoint or agnostic choice ($\pi = 0.5$), by contrast, is likely to behave differently from the extreme choices: by giving equal weight to both effects in the initialization, the effects are no longer assumed a priori to be orthogonal, which corresponds to a case in which sorting (between-factor covariance) is not ruled out from the outset.
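The working hypothesis that corner initializations imply orthogonal effects while $\pi = 0.5$ admits sorting can be illustrated in a toy setting. The sketch below rests on loudly stated assumptions and is not the paper's estimator: it uses the $\pi$-weighted starting values described above but replaces the OLS rescaling step of (A2)-(A3) with plain alternating (Gauss-Seidel) updates of the two sets of effects; it simulates a hypothetical design in which every household sends all its children to a single school, so household and school effects are not separately identified; and the helper names `group_mean`, `pia_sketch` and `variance_shares` are invented for the example.

```python
import numpy as np

def group_mean(values, groups):
    """Mean of `values` within each group, broadcast back to every child."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return (sums / counts)[groups]

def pia_sketch(y, hh, sch, pi, tol=1e-12, max_iter=1000):
    """Household (h) and school (s) effects from a pi-weighted start.

    Assumption: alternating (Gauss-Seidel) updates stand in for the
    OLS rescaling step of equations (A2)-(A3).
    """
    # pi-weighted starting values: pi = 0 gives the raw variation to h first.
    h = group_mean(y - pi * group_mean(y, sch), hh)
    s = group_mean(y - (1.0 - pi) * group_mean(y, hh), sch)
    ssr_old = np.sum((y - h - s) ** 2)
    for _ in range(max_iter):
        h = group_mean(y - s, hh)       # household means of the partial residual
        s = group_mean(y - h, sch)      # school means of the partial residual
        ssr = np.sum((y - h - s) ** 2)
        if abs(ssr_old - ssr) < tol:    # stop once the SSR no longer changes
            break
        ssr_old = ssr
    return h, s

def variance_shares(y, h, s):
    """Child-level relative variance contributions, in percent."""
    var_y = np.var(y)
    return {
        "household": 100 * np.var(h) / var_y,
        "school": 100 * np.var(s) / var_y,
        "sorting": 100 * 2 * np.cov(h, s, bias=True)[0, 1] / var_y,
        "residual": 100 * np.var(y - h - s) / var_y,
    }

# Hypothetical nested design: 6 schools x 5 households x 3 children.
rng = np.random.default_rng(0)
n_sch, hh_per_sch, kids = 6, 5, 3
sch = np.repeat(np.arange(n_sch), hh_per_sch * kids)   # school of each child
hh = np.repeat(np.arange(n_sch * hh_per_sch), kids)     # household of each child
y = (rng.normal(size=n_sch)[sch]
     + rng.normal(size=n_sch * hh_per_sch)[hh]
     + rng.normal(size=sch.size))
y = (y - y.mean()) / y.std()                            # standardized test score

fits, shares = {}, {}
for pi in (0.0, 0.5, 1.0):
    fits[pi] = pia_sketch(y, hh, sch, pi)
    shares[pi] = variance_shares(y, *fits[pi])
    print(pi, {k: round(v, 2) for k, v in shares[pi].items()})
```

In this nested design the fitted values $h + s$ equal the household means whatever $\pi$ is chosen, so the fit is identical across runs and only the split changes. With $\pi = 0$ the school share collapses to zero; with $\pi = 1$ the household effect keeps only within-school deviations; in both corners the sorting term is zero up to rounding. With $\pi = 0.5$ the school-level variation is split between the two effects, producing a positive covariance equal to one quarter of the child-level variance of school means.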
B Additional figures and tables

[Figure B1: Relative unconditional variance shares, by estimator. Panels: π = 0, π = 0.5, π = 1; bars by country (KE, TZ, UG); horizontal axis: %.]

Note: bars indicate relative variance contributions based on the same variance decomposition reported in Tables B4-B6, but without individual-specific controls; 'sorting' is the household-school covariance term; KE is Kenya; TZ is Tanzania (mainland); and UG is Uganda. Source: own calculations.

[Figure B2: Relative variance shares, by estimator and schooling status. Panels: π = 0, π = 0.5, π = 1; bars by country (KE, TZ, UG) and schooling status (Never, Now); horizontal axis: %; legend: Individual (all), Household, School, Sorting.]

Note: bars indicate relative variance contributions based on the same variance decomposition reported in Tables B4-B6, computed separately for school-age children who are either out of school ('never') or attending school ('now'); 'sorting' is the household-school covariance term; KE is Kenya; TZ is Tanzania (mainland); and UG is Uganda. Source: own calculations.
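The tables and figures report each component on two scales. The reported numbers appear consistent with a simple mapping, offered here as our reading of the tables rather than a restatement of equations (4)-(5), which are not reproduced in this appendix: with the test score standardized to unit variance, the absolute contribution (s.d. units) is the signed square root of the variance component, and the relative share is that component as a percentage of the total variance. The hypothetical helper below checks this against the Kenya entries for π = 0 in Table 4.

```python
import math

def absolute_from_relative(share_pct, total_var=1.0):
    """Signed s.d.-unit contribution implied by a relative variance share (%).

    Assumes the shares are percentages of `total_var`, the variance of
    the (standardized) test score.
    """
    component_var = share_pct / 100.0 * total_var
    # Covariance (sorting) terms can be negative, so keep the component's sign.
    return math.copysign(math.sqrt(abs(component_var)), component_var)

# Relative shares for Kenya with pi = 0, taken from Table 4.
kenya_pi0 = {"household": 32.42, "school": 8.11, "sorting": -0.09, "residual": 58.55}
for name, pct in kenya_pi0.items():
    print(name, round(absolute_from_relative(pct), 2))
```

The printed values (0.57, 0.28, -0.03 and 0.77) reproduce the absolute entries of Table 4 for Kenya at π = 0. The signed square root explains why small negative sorting shares appear as small negative standard-deviation contributions.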
Table B1: Regression results for alternative models/estimators, Kenya

| Variable | (1) Naïve | (2) reghdfe | (3) π = 0 | (4) π = 0.5 | (5) π = 1 | (6) UBH | (7) UBS |
|---|---|---|---|---|---|---|---|
| Child is female | 0.10 (0.00) | 0.06 (0.00) | 0.06 (0.00) | 0.06 (0.00) | 0.06 (0.00) | 0.08 (0.00) | 0.07 (0.00) |
| Oldest sib | 0.06 (0.00) | -0.15 (0.00) | -0.15 (0.00) | -0.15 (0.00) | -0.15 (0.00) | 0.04 (0.00) | -0.17 (0.00) |
| Never enrolled | -0.52 (0.02) | -0.15 (0.02) | -0.24 (0.02) | -0.22 (0.02) | -0.20 (0.02) | -0.24 (0.02) | -0.19 (0.02) |
| Currently enrolled | 0.69 (0.02) | 0.48 (0.02) | 0.47 (0.02) | 0.47 (0.02) | 0.48 (0.02) | 0.49 (0.02) | 0.59 (0.02) |
| Attends private | 0.35 (0.01) | 0.13 (0.01) | 0.14 (0.00) | 0.15 (0.00) | 0.16 (0.00) | 0.09 (0.01) | 0.22 (0.01) |
| Obs. | 555,919 | 555,919 | 555,919 | 555,919 | 555,919 | 555,919 | 555,919 |
| R² | 0.13 | 0.72 | 0.72 | 0.72 | 0.72 | 0.64 | 0.48 |

Interaction terms (reported for three of the seven columns): Never ×: -0.37 (0.02), -0.21 (0.02), -0.10 (0.01); Private ×: -0.19 (0.01), -0.16 (0.01), -0.11 (0.01).

Note: columns refer to different estimators/models; 'Naïve' is simple OLS excluding fixed effects; reghdfe reports results based on the (improved, accelerated) partitioned iterative algorithm due to Correia (2017), in which household effects enter first (using the Stata command of the same name); columns π = (0, 0.5, 1) are taken from the partitioned iterative algorithm set out in the text; UBH is the household upper bound model (excludes school effects) and UBS is the school upper bound model, in which household effects are proxied by observed characteristics (not shown); all reported coefficients are significantly different from zero; cluster robust standard errors are reported in parentheses. Source: own calculations.
Table B2: Regression results for alternative models/estimators, Tanzania

| Variable | (1) Naïve | (2) reghdfe | (3) π = 0 | (4) π = 0.5 | (5) π = 1 | (6) UBH | (7) UBS |
|---|---|---|---|---|---|---|---|
| Child is female | 0.08 (0.00) | 0.04 (0.01) | 0.05 (0.00) | 0.04 (0.00) | 0.05 (0.00) | 0.06 (0.00) | 0.05 (0.00) |
| Oldest sib | 0.05 (0.01) | -0.13 (0.01) | -0.12 (0.00) | -0.13 (0.00) | -0.12 (0.00) | 0.01 (0.00) | -0.14 (0.00) |
| Never enrolled | -0.29 (0.02) | 0.02 (0.02) | -0.02 (0.01) | -0.06 (0.01) | -0.07 (0.01) | -0.06 (0.02) | -0.07 (0.02) |
| Currently enrolled | 0.74 (0.02) | 0.58 (0.02) | 0.58 (0.01) | 0.58 (0.01) | 0.58 (0.01) | 0.56 (0.02) | 0.67 (0.01) |
| Attends private | 0.28 (0.01) | 0.10 (0.01) | 0.10 (0.01) | 0.10 (0.01) | 0.11 (0.01) | 0.11 (0.01) | 0.09 (0.01) |
| Obs. | 359,463 | 359,463 | 359,463 | 359,463 | 359,463 | 359,463 | 359,463 |
| R² | 0.12 | 0.77 | 0.77 | 0.77 | 0.77 | 0.69 | 0.51 |

Interaction terms (reported for three of the seven columns): Never ×: -0.31 (0.02), -0.35 (0.02), -0.25 (0.01); Private ×: 0.02 (0.02), -0.02 (0.02), -0.02 (0.01).

Note: columns refer to different estimators/models; 'Naïve' is simple OLS excluding fixed effects; reghdfe reports results based on the (improved, accelerated) partitioned iterative algorithm due to Correia (2017), in which household effects enter first (using the Stata command of the same name); columns π = (0, 0.5, 1) are taken from the partitioned iterative algorithm set out in the text; UBH is the household upper bound model (excludes school effects) and UBS is the school upper bound model, in which household effects are proxied by observed characteristics (not shown); all reported coefficients are significantly different from zero; cluster robust standard errors are reported in parentheses. Source: own calculations.
Table B3: Regression results for alternative models/estimators, Uganda

| Variable | (1) Naïve | (2) reghdfe | (3) π = 0 | (4) π = 0.5 | (5) π = 1 | (6) UBH | (7) UBS |
|---|---|---|---|---|---|---|---|
| Child is female | 0.06 (0.01) | 0.02 (0.00) | 0.03 (0.00) | 0.02 (0.00) | 0.02 (0.00) | 0.04 (0.00) | 0.04 (0.00) |
| Oldest sib | 0.09 (0.01) | -0.22 (0.01) | -0.21 (0.00) | -0.21 (0.00) | -0.21 (0.00) | 0.05 (0.01) | -0.23 (0.01) |
| Never enrolled | -0.16 (0.02) | 0.07 (0.02) | -0.02 (0.02) | -0.06 (0.02) | -0.04 (0.02) | -0.01 (0.02) | 0.06 (0.02) |
| Currently enrolled | 0.48 (0.02) | 0.35 (0.02) | 0.34 (0.01) | 0.34 (0.01) | 0.35 (0.01) | 0.39 (0.02) | 0.43 (0.01) |
| Attends private | 0.43 (0.01) | 0.18 (0.01) | 0.19 (0.00) | 0.19 (0.00) | 0.20 (0.00) | 0.12 (0.01) | 0.25 (0.01) |
| Obs. | 362,128 | 362,128 | 362,128 | 362,128 | 362,128 | 362,128 | 362,128 |
| R² | 0.08 | 0.71 | 0.71 | 0.71 | 0.71 | 0.62 | 0.45 |

Interaction terms (reported for three of the seven columns): Never ×: -0.48 (0.04), -0.55 (0.03), -0.43 (0.02); Private ×: -0.03 (0.01), -0.02 (0.01), -0.01 (0.01).

Note: columns refer to different estimators/models; 'Naïve' is simple OLS excluding fixed effects; reghdfe reports results based on the (improved, accelerated) partitioned iterative algorithm due to Correia (2017), in which household effects enter first (using the Stata command of the same name); columns π = (0, 0.5, 1) are taken from the partitioned iterative algorithm set out in the text; UBH is the household upper bound model (excludes school effects) and UBS is the school upper bound model, in which household effects are proxied by observed characteristics (not shown); all reported coefficients are significantly different from zero; cluster robust standard errors are reported in parentheses. Source: own calculations.

Table B4: Full variance decomposition for KE (absolute contributions in s.d. units; relative shares in %)

| Component | Abs. (a) π = 0 | Abs. (b) π = 0.5 | Abs. (c) π = 1 | Abs. (d) UBH | Abs. (e) UBS | Abs. (f) MLM | Rel. (a) π = 0 | Rel. (b) π = 0.5 | Rel. (c) π = 1 | Rel. (d) UBH | Rel. (e) UBS | Rel. (f) MLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Individual | 0.21 (0.00) | 0.21 (0.00) | 0.20 (0.00) | 0.21 (0.00) | 0.24 (0.00) | 0.19 (0.00) | 4.39 (0.06) | 4.27 (0.06) | 4.16 (0.05) | 4.38 (0.06) | 5.73 (0.06) | 3.63 (0.05) |
| Household | 0.52 (0.00) | 0.38 (0.00) | 0.34 (0.00) | 0.53 (0.00) | 0.29 (0.00) | 0.37 (0.00) | 26.63 (0.19) | 14.58 (0.14) | 11.26 (0.13) | 28.22 (0.20) | 8.54 (0.11) | 13.79 (0.14) |
| School | 0.29 (0.00) | 0.38 (0.00) | 0.53 (0.00) | 0.14 (0.00) | 0.51 (0.00) | 0.42 (0.00) | 8.44 (0.18) | 14.14 (0.23) | 28.45 (0.33) | 2.00 (0.09) | 25.75 (0.31) | 17.81 (0.26) |
| Sorting: 2×Cov(household, school) | 0.04 (0.00) | 0.24 (0.00) | -0.06 (0.00) | 0.12 (0.00) | 0.16 (0.00) | 0.25 (0.00) | 0.18 (0.02) | 5.62 (0.11) | -0.31 (0.03) | 1.39 (0.06) | 2.48 (0.07) | 6.41 (0.12) |
| 2×Cov(individual, household) | 0.19 (0.00) | 0.15 (0.00) | 0.12 (0.00) | 0.21 (0.00) | 0.11 (0.00) | 0.12 (0.00) | 3.44 (0.06) | 2.28 (0.05) | 1.46 (0.04) | 4.52 (0.07) | 1.19 (0.03) | 1.48 (0.04) |
| 2×Cov(individual, school) | 0.06 (0.00) | 0.12 (0.00) | 0.16 (0.00) | 0.07 (0.00) | 0.17 (0.00) | 0.14 (0.00) | 0.31 (0.02) | 1.36 (0.04) | 2.47 (0.06) | 0.50 (0.03) | 2.90 (0.07) | 2.02 (0.05) |
| Residual | 0.75 (0.00) | 0.76 (0.00) | 0.72 (0.00) | 0.77 (0.00) | 0.73 (0.00) | 0.74 (0.00) | 56.61 (0.20) | 57.75 (0.20) | 52.51 (0.19) | 58.98 (0.21) | 53.40 (0.20) | 54.86 (0.20) |
| Total (test score) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| ρhs | 0.01 (0.00) | 0.20 (0.00) | -0.01 (0.00) | 0.09 (0.00) | 0.08 (0.00) | 0.20 (0.00) | | | | | | |
Note: following equations (3b) and (3c), the table sets out the absolute and relative variance shares attributable to each component of the test score; absolute shares are in standard deviation units; different models/estimators are indicated in the columns: (a) to (c) refer to results from a partitioned iterative algorithm for different choices of initialization scalar π, (d) is the household upper bound, (e) is the school upper bound, and (f) is a mixed linear model; ρhs is the estimated correlation coefficient between household and school effects; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

Table B5: Full variance decomposition for TZ (absolute contributions in s.d. units; relative shares in %)

| Component | Abs. (a) π = 0 | Abs. (b) π = 0.5 | Abs. (c) π = 1 | Abs. (d) UBH | Abs. (e) UBS | Abs. (f) MLM | Rel. (a) π = 0 | Rel. (b) π = 0.5 | Rel. (c) π = 1 | Rel. (d) UBH | Rel. (e) UBS | Rel. (f) MLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Individual | 0.23 (0.00) | 0.23 (0.00) | 0.23 (0.00) | 0.22 (0.00) | 0.27 (0.00) | 0.19 (0.00) | 5.06 (0.08) | 5.38 (0.08) | 5.51 (0.08) | 4.84 (0.07) | 7.11 (0.09) | 3.63 (0.06) |
| Household | 0.59 (0.00) | 0.42 (0.00) | 0.36 (0.00) | 0.58 (0.00) | 0.30 (0.00) | 0.43 (0.00) | 34.31 (0.26) | 17.61 (0.19) | 12.80 (0.16) | 34.11 (0.26) | 9.25 (0.14) | 18.55 (0.19) |
| School | 0.25 (0.00) | 0.36 (0.00) | 0.55 (0.00) | 0.12 (0.00) | 0.53 (0.00) | 0.42 (0.00) | 6.23 (0.17) | 12.91 (0.25) | 30.74 (0.39) | 1.53 (0.09) | 27.90 (0.37) | 17.24 (0.29) |
| Sorting: 2×Cov(household, school) | -0.08 (0.00) | 0.26 (0.00) | -0.09 (0.00) | 0.10 (0.00) | 0.16 (0.00) | 0.28 (0.00) | -0.72 (0.05) | 6.78 (0.14) | -0.84 (0.05) | 0.92 (0.05) | 2.63 (0.09) | 7.74 (0.15) |
| 2×Cov(individual, household) | 0.18 (0.00) | 0.16 (0.00) | 0.13 (0.00) | 0.21 (0.00) | 0.10 (0.00) | 0.11 (0.00) | 3.16 (0.07) | 2.41 (0.06) | 1.71 (0.05) | 4.29 (0.08) | 1.03 (0.04) | 1.15 (0.04) |
| 2×Cov(individual, school) | 0.06 (0.00) | 0.09 (0.00) | 0.13 (0.00) | 0.04 (0.00) | 0.15 (0.00) | 0.16 (0.00) | 0.37 (0.03) | 0.87 (0.04) | 1.67 (0.06) | 0.16 (0.02) | 2.22 (0.07) | 2.53 (0.07) |
| Residual | 0.72 (0.00) | 0.74 (0.00) | 0.70 (0.00) | 0.74 (0.00) | 0.71 (0.00) | 0.70 (0.00) | 51.59 (0.24) | 54.04 (0.25) | 48.39 (0.23) | 54.15 (0.25) | 49.85 (0.24) | 49.17 (0.23) |
| Total (test score) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| ρhs | -0.02 (0.00) | 0.22 (0.00) | -0.02 (0.00) | 0.06 (0.00) | 0.08 (0.00) | 0.22 (0.00) | | | | | | |

Note: following equations (3b) and (3c), the table sets out the absolute and relative variance shares attributable to each component of the test score; absolute shares are in standard deviation units; different models/estimators are indicated in the columns: (a) to (c) refer to results from a partitioned iterative algorithm for different choices of initialization scalar π, (d) is the household upper bound, (e) is the school upper bound, and (f) is a mixed linear model; ρhs is the estimated correlation coefficient between household and school effects; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

Table B6: Full variance decomposition for UG (absolute contributions in s.d. units; relative shares in %)

| Component | Abs. (a) π = 0 | Abs. (b) π = 0.5 | Abs. (c) π = 1 | Abs. (d) UBH | Abs. (e) UBS | Abs. (f) MLM | Rel. (a) π = 0 | Rel. (b) π = 0.5 | Rel. (c) π = 1 | Rel. (d) UBH | Rel. (e) UBS | Rel. (f) MLM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Individual | 0.17 (0.00) | 0.18 (0.00) | 0.18 (0.00) | 0.13 (0.00) | 0.20 (0.00) | 0.19 (0.00) | 2.94 (0.06) | 3.12 (0.06) | 3.13 (0.06) | 1.75 (0.04) | 4.05 (0.07) | 3.59 (0.06) |
| Household | 0.53 (0.00) | 0.39 (0.00) | 0.35 (0.00) | 0.56 (0.00) | 0.31 (0.00) | 0.38 (0.00) | 28.03 (0.25) | 15.57 (0.19) | 12.22 (0.17) | 31.03 (0.27) | 9.35 (0.15) | 14.72 (0.18) |
| School | 0.31 (0.00) | 0.40 (0.00) | 0.55 (0.00) | 0.14 (0.00) | 0.53 (0.00) | 0.44 (0.00) | 9.49 (0.25) | 15.77 (0.32) | 30.39 (0.45) | 2.03 (0.12) | 28.11 (0.43) | 19.44 (0.36) |
| Sorting: 2×Cov(household, school) | 0.10 (0.00) | 0.25 (0.00) | -0.04 (0.00) | 0.13 (0.00) | 0.18 (0.00) | 0.22 (0.00) | 0.96 (0.06) | 6.38 (0.16) | -0.13 (0.02) | 1.73 (0.08) | 3.15 (0.11) | 4.78 (0.14) |
| 2×Cov(individual, household) | 0.14 (0.00) | 0.11 (0.00) | 0.09 (0.00) | 0.16 (0.00) | 0.09 (0.00) | 0.11 (0.00) | 1.89 (0.05) | 1.31 (0.04) | 0.86 (0.04) | 2.47 (0.06) | 0.74 (0.03) | 1.31 (0.04) |
| 2×Cov(individual, school) | -0.11 (0.00) | -0.10 (0.00) | -0.06 (0.00) | 0.04 (0.00) | -0.08 (0.00) | 0.18 (0.00) | -1.25 (0.05) | -0.92 (0.05) | -0.37 (0.03) | 0.19 (0.02) | -0.59 (0.04) | 3.34 (0.09) |
| Residual | 0.76 (0.00) | 0.77 (0.00) | 0.73 (0.00) | 0.78 (0.00) | 0.74 (0.00) | 0.73 (0.00) | 57.94 (0.25) | 58.77 (0.25) | 53.90 (0.24) | 60.79 (0.26) | 55.20 (0.25) | 52.82 (0.24) |
| Total (test score) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| ρhs | 0.03 (0.00) | 0.20 (0.00) | -0.00 (0.00) | 0.11 (0.00) | 0.10 (0.00) | 0.14 (0.00) | | | | | | |
Note: following equations (3b) and (3c), the table sets out the absolute and relative variance shares attributable to each component of the test score; absolute shares are in standard deviation units; different models/estimators are indicated in the columns: (a) to (c) refer to results from a partitioned iterative algorithm for different choices of initialization scalar π, (d) is the household upper bound, (e) is the school upper bound, and (f) is a mixed linear model; ρhs is the estimated correlation coefficient between household and school effects; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

Table B7: Variance contributions for Kenya, by sub-group (absolute in s.d. units; relative in %)

| Strata | Level | Abs. Score | Abs. Indiv. | Abs. House. | Abs. School | Abs. Sorting | Rel. Indiv. | Rel. House. | Rel. School | Rel. Sorting | Rel. Resid. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | 0 | 1.02 | 0.21 | 0.39 | 0.38 | 0.25 | 4.21 | 14.49 | 13.92 | 5.79 | 61.59 |
| | 1 | 0.98 | 0.20 | 0.38 | 0.37 | 0.23 | 4.28 | 14.76 | 14.46 | 5.43 | 61.06 |
| Age | 6 | 0.98 | 0.29 | 0.38 | 0.39 | 0.21 | 9.08 | 15.38 | 15.66 | 4.83 | 55.05 |
| | 9 | 1.01 | 0.16 | 0.38 | 0.39 | 0.28 | 2.58 | 14.24 | 15.12 | 7.47 | 60.59 |
| | 12 | 1.01 | 0.15 | 0.38 | 0.33 | 0.26 | 2.36 | 14.15 | 10.80 | 6.68 | 66.01 |
| | 15 | 1.01 | 0.18 | 0.38 | 0.30 | 0.24 | 3.08 | 14.38 | 8.96 | 5.63 | 67.95 |
| Grade level | 0 | 1.34 | 0.15 | 0.54 | 0.38 | 0.37 | 1.25 | 16.38 | 8.14 | 7.62 | 66.61 |
| | 1 | 1.07 | 0.10 | 0.41 | 0.39 | 0.23 | 0.87 | 14.76 | 13.29 | 4.47 | 66.61 |
| | 3 | 1.07 | 0.11 | 0.39 | 0.40 | 0.25 | 1.01 | 13.58 | 13.83 | 5.26 | 66.32 |
| | 5 | 0.65 | 0.11 | 0.32 | 0.24 | 0.08 | 3.01 | 23.93 | 13.55 | 1.42 | 58.09 |
| SES tercile | 1 | 0.91 | 0.10 | 0.36 | 0.38 | 0.21 | 1.25 | 15.97 | 17.34 | 5.35 | 60.08 |
| | 2 | 0.82 | 0.08 | 0.33 | 0.30 | 0.14 | 0.92 | 16.67 | 13.50 | 2.90 | 66.01 |
| | 3 | 1.17 | 0.25 | 0.44 | 0.43 | 0.29 | 4.47 | 14.17 | 13.11 | 6.16 | 62.09 |
| Mother primary | 0 | 0.97 | 0.20 | 0.37 | 0.38 | 0.21 | 4.22 | 14.98 | 15.46 | 4.75 | 60.58 |
| | 1 | 0.85 | 0.18 | 0.33 | 0.33 | 0.16 | 4.74 | 15.15 | 14.90 | 3.62 | 61.60 |
| All | | 1.19 | 0.25 | 0.44 | 0.42 | 0.31 | 4.54 | 13.61 | 12.39 | 6.54 | 62.92 |

Note: for each stratifying variable, indicated in the first column, sub-groups (second column) are mutually exclusive and span the entire dataset; female is a dummy variable (0 = male, 1 = female); age and grade levels are grouped (e.g., age level 6 indicates children aged 6-8; age level 15 is children aged 15 and above; grade level 0 contains never enrolled children; grade level 5 is all those with highest grade 5 and above); for SES tercile, level 1 is the poorest group; mother primary takes a value of 1 if the mother has attended primary school; variance components are as per equations (3b)-(3c), and the individual effects are aggregated for simplicity (Indiv.); absolute and relative contributions are as per earlier tables, reported in standard deviation units and percentages respectively. Source: own calculations.

Table B8: Variance contributions for Tanzania, by sub-group (absolute in s.d. units; relative in %)

| Strata | Level | Abs. Score | Abs. Indiv. | Abs. House. | Abs. School | Abs. Sorting | Rel. Indiv. | Rel. House. | Rel. School | Rel. Sorting | Rel. Resid. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | 0 | 1.01 | 0.24 | 0.42 | 0.36 | 0.26 | 5.53 | 17.34 | 12.63 | 6.76 | 57.72 |
| | 1 | 0.98 | 0.22 | 0.42 | 0.36 | 0.26 | 5.17 | 17.99 | 13.27 | 6.82 | 56.76 |
| Age | 6 | 0.98 | 0.28 | 0.42 | 0.36 | 0.23 | 8.03 | 18.57 | 13.54 | 5.74 | 54.12 |
| | 9 | 1.00 | 0.20 | 0.42 | 0.37 | 0.27 | 4.06 | 17.72 | 13.93 | 7.32 | 56.97 |
| | 12 | 1.01 | 0.21 | 0.42 | 0.34 | 0.27 | 4.47 | 17.48 | 11.56 | 7.37 | 59.12 |
| | 15 | 1.02 | 0.25 | 0.42 | 0.32 | 0.27 | 5.95 | 16.76 | 10.04 | 7.15 | 60.10 |
| Grade level | 0 | 1.03 | 0.15 | 0.51 | 0.26 | 0.28 | 2.13 | 24.89 | 6.26 | 7.59 | 59.13 |
| | 1 | 1.02 | 0.15 | 0.43 | 0.37 | 0.23 | 2.13 | 17.68 | 13.51 | 5.19 | 61.50 |
| | 3 | 1.07 | 0.17 | 0.43 | 0.38 | 0.28 | 2.41 | 16.00 | 12.74 | 6.88 | 61.95 |
| | 5 | 0.77 | 0.17 | 0.37 | 0.29 | 0.18 | 4.81 | 22.93 | 13.78 | 5.49 | 52.98 |
| SES tercile | 1 | 0.95 | 0.17 | 0.41 | 0.37 | 0.25 | 3.04 | 18.16 | 15.04 | 6.77 | 56.98 |
| | 2 | 0.89 | 0.07 | 0.38 | 0.32 | 0.25 | 0.56 | 18.52 | 13.04 | 7.72 | 60.16 |
| | 3 | 1.03 | 0.25 | 0.44 | 0.36 | 0.25 | 6.07 | 18.34 | 12.46 | 5.68 | 57.44 |
| Mother primary | 0 | 0.99 | 0.24 | 0.42 | 0.36 | 0.24 | 5.80 | 17.91 | 13.09 | 5.99 | 57.22 |
| | 1 | 0.91 | 0.20 | 0.38 | 0.34 | 0.23 | 5.03 | 17.25 | 13.87 | 6.34 | 57.51 |
| All | | 1.06 | 0.25 | 0.45 | 0.37 | 0.28 | 5.64 | 17.81 | 12.05 | 6.70 | 57.80 |

Note: for each stratifying variable, indicated in the first column, sub-groups (second column) are mutually exclusive and span the entire dataset; female is a dummy variable (0 = male, 1 = female); age and grade levels are grouped (e.g., age level 6 indicates children aged 6-8; age level 15 is children aged 15 and above; grade level 0 contains never enrolled children; grade level 5 is all those with highest grade 5 and above); for SES tercile, level 1 is the poorest group; mother primary takes a value of 1 if the mother has attended primary school; variance components are as per equations (3b)-(3c), and the individual effects are aggregated for simplicity (Indiv.); absolute and relative contributions are as per earlier tables, reported in standard deviation units and percentages respectively. Source: own calculations.

Table B9: Variance contributions for Uganda, by sub-group (absolute in s.d. units; relative in %)

| Strata | Level | Abs. Score | Abs. Indiv. | Abs. House. | Abs. School | Abs. Sorting | Rel. Indiv. | Rel. House. | Rel. School | Rel. Sorting | Rel. Resid. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | 0 | 0.99 | 0.18 | 0.39 | 0.40 | 0.25 | 3.15 | 15.66 | 15.84 | 6.28 | 59.08 |
| | 1 | 1.00 | 0.18 | 0.40 | 0.40 | 0.26 | 3.07 | 15.50 | 15.73 | 6.47 | 59.23 |
| Age | 6 | 0.99 | 0.18 | 0.40 | 0.35 | 0.25 | 3.44 | 16.43 | 12.59 | 6.29 | 61.26 |
| | 9 | 1.00 | 0.14 | 0.39 | 0.39 | 0.29 | 2.06 | 15.33 | 15.40 | 8.63 | 58.58 |
| | 12 | 1.00 | 0.16 | 0.39 | 0.38 | 0.28 | 2.53 | 15.16 | 14.63 | 7.89 | 59.80 |
| | 15 | 1.01 | 0.18 | 0.39 | 0.35 | 0.23 | 3.37 | 14.91 | 11.99 | 5.26 | 64.47 |
| Grade level | 0 | 0.96 | 0.21 | 0.42 | 0.18 | 0.19 | 4.78 | 19.44 | 3.50 | 3.75 | 68.53 |
| | 1 | 0.93 | 0.13 | 0.40 | 0.33 | 0.21 | 1.97 | 17.98 | 12.53 | 4.82 | 62.70 |
| | 3 | 1.08 | 0.15 | 0.40 | 0.38 | 0.27 | 2.04 | 14.14 | 12.28 | 6.41 | 65.14 |
| | 5 | 0.82 | 0.16 | 0.37 | 0.29 | 0.14 | 3.80 | 19.92 | 12.71 | 2.82 | 60.75 |
| SES tercile | 1 | 0.97 | 0.13 | 0.38 | 0.41 | 0.25 | 1.75 | 15.87 | 18.01 | 6.50 | 57.87 |
| | 2 | 0.97 | 0.10 | 0.40 | 0.40 | 0.25 | 0.98 | 16.77 | 16.52 | 6.60 | 59.15 |
| | 3 | 0.97 | 0.18 | 0.39 | 0.39 | 0.22 | 3.35 | 15.79 | 15.95 | 4.99 | 59.91 |
| Mother primary | 0 | 0.96 | 0.17 | 0.39 | 0.39 | 0.23 | 3.17 | 16.22 | 16.73 | 5.76 | 58.12 |
| | 1 | 0.99 | 0.17 | 0.39 | 0.39 | 0.24 | 3.09 | 15.55 | 15.86 | 6.10 | 59.40 |
| All | | 1.01 | 0.19 | 0.40 | 0.39 | 0.24 | 3.58 | 15.44 | 15.12 | 5.77 | 60.09 |

Note: for each stratifying variable, indicated in the first column, sub-groups (second column) are mutually exclusive and span the entire dataset; female is a dummy variable (0 = male, 1 = female); age and grade levels are grouped (e.g., age level 6 indicates children aged 6-8; age level 15 is children aged 15 and above; grade level 0 contains never enrolled children; grade level 5 is all those with highest grade 5 and above); for SES tercile, level 1 is the poorest group; mother primary takes a value of 1 if the mother has attended primary school; variance components are as per equations (3b)-(3c), and the individual effects are aggregated for simplicity (Indiv.); absolute and relative contributions are as per earlier tables, reported in standard deviation units and percentages respectively. Source: own calculations.

Table B10: Summary of absolute variance contributions, alternative choices of π, never enrolled children only (s.d. units; standard errors in parentheses)

| Component | π | Kenya | Tanzania | Uganda |
|---|---|---|---|---|
| Individual | 0 | 0.10 (0.001) | 0.06 (0.002) | 0.17 (0.002) |
| | 0.5 | 0.10 (0.001) | 0.06 (0.002) | 0.17 (0.002) |
| | 1 | 0.10 (0.001) | 0.05 (0.002) | 0.17 (0.002) |
| Household | 0 | 0.92 (0.012) | 0.76 (0.012) | 0.59 (0.011) |
| | 0.5 | 0.64 (0.008) | 0.57 (0.009) | 0.47 (0.009) |
| | 1 | 0.51 (0.007) | 0.46 (0.007) | 0.40 (0.008) |
| School | 0 | 0.20 (0.003) | 0.16 (0.003) | 0.15 (0.003) |
| | 0.5 | 0.44 (0.007) | 0.27 (0.005) | 0.20 (0.004) |
| | 1 | 0.80 (0.012) | 0.51 (0.009) | 0.36 (0.008) |
| Sorting | 0 | 0.11 (0.001) | -0.15 (0.003) | 0.03 (0.001) |
| | 0.5 | 0.49 (0.007) | 0.35 (0.006) | 0.25 (0.005) |
| | 1 | 0.32 (0.004) | 0.30 (0.005) | 0.23 (0.005) |
| Residual | | 1.21 (0.017) | 0.82 (0.014) | 0.83 (0.017) |
| Total | | 1.55 (0.013) | 1.11 (0.011) | 1.03 (0.012) |

Note: the table sets out the absolute variance contribution (in standard deviation units) attributable to each component of the test score, where 'sorting' is the contribution of the between-factor covariance; different initializations of the partitioned iterative algorithm are indicated by column π; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

Table B11: Summary of relative variance contributions, alternative choices of π, never enrolled children only (%; standard errors in parentheses)

| Component | π | Kenya | Tanzania | Uganda |
|---|---|---|---|---|
| Individual | 0 | 0.44 (0.10) | 0.26 (0.22) | 2.66 (0.31) |
| | 0.5 | 0.43 (0.11) | 0.30 (0.21) | 2.75 (0.30) |
| | 1 | 0.41 (0.11) | 0.24 (0.22) | 2.77 (0.30) |
| Household | 0 | 35.23 (0.90) | 46.59 (1.29) | 32.30 (1.28) |
| | 0.5 | 17.31 (0.63) | 26.17 (0.97) | 20.52 (1.02) |
| | 1 | 10.80 (0.50) | 17.18 (0.78) | 14.72 (0.87) |
| School | 0 | 1.68 (0.23) | 2.19 (0.30) | 2.06 (0.37) |
| | 0.5 | 8.10 (0.50) | 5.96 (0.49) | 3.62 (0.49) |
| | 1 | 26.80 (0.90) | 21.22 (0.92) | 12.01 (0.89) |
| Sorting | 0 | 0.47 (0.11) | -1.95 (0.27) | 0.10 (0.08) |
| | 0.5 | 10.01 (0.51) | 9.73 (0.61) | 5.84 (0.58) |
| | 1 | 4.18 (0.33) | 7.33 (0.53) | 4.83 (0.53) |
| Residual | | 61.38 (1.56) | 54.93 (1.85) | 65.28 (2.39) |
| Total | | 100.00 (1.15) | 100.00 (1.44) | 100.00 (1.71) |

Note: the table sets out the relative variance contribution (in percent) attributable to each component of the test score, where 'sorting' is the contribution of the between-factor covariance; different initializations of the partitioned iterative algorithm are indicated by column π; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

Table B12: Summary of absolute variance contributions, alternative choices of π, children attending school only (s.d. units; standard errors in parentheses)

| Component | π | Kenya | Tanzania | Uganda |
|---|---|---|---|---|
| Individual | 0 | 0.05 (0.000) | -0.04 (0.000) | 0.10 (0.001) |
| | 0.5 | 0.04 (0.000) | -0.05 (0.000) | 0.09 (0.001) |
| | 1 | 0.04 (0.000) | -0.05 (0.000) | 0.09 (0.001) |
| Household | 0 | 0.48 (0.002) | 0.56 (0.002) | 0.52 (0.002) |
| | 0.5 | 0.36 (0.001) | 0.39 (0.002) | 0.39 (0.002) |
| | 1 | 0.32 (0.001) | 0.34 (0.001) | 0.35 (0.002) |
| School | 0 | 0.29 (0.002) | 0.25 (0.002) | 0.31 (0.002) |
| | 0.5 | 0.37 (0.002) | 0.36 (0.002) | 0.40 (0.003) |
| | 1 | 0.51 (0.003) | 0.55 (0.004) | 0.56 (0.004) |
| Sorting | 0 | -0.07 (0.000) | -0.11 (0.001) | 0.10 (0.001) |
| | 0.5 | 0.20 (0.001) | 0.24 (0.001) | 0.25 (0.001) |
| | 1 | -0.12 (0.001) | -0.16 (0.001) | -0.07 (0.000) |
| Residual | | 0.68 (0.002) | 0.67 (0.003) | 0.73 (0.003) |
| Total | | 0.88 (0.002) | 0.90 (0.002) | 0.97 (0.002) |

Note: the table sets out the absolute variance contribution (in standard deviation units) attributable to each component of the test score, where 'sorting' is the contribution of the between-factor covariance; different initializations of the partitioned iterative algorithm are indicated by column π; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

Table B13: Summary of relative variance contributions, alternative choices of π, children attending school only (%; standard errors in parentheses)

| Component | π | Kenya | Tanzania | Uganda |
|---|---|---|---|---|
| Individual | 0 | 0.29 (0.06) | -0.24 (0.05) | 1.01 (0.09) |
| | 0.5 | 0.18 (0.06) | -0.28 (0.05) | 0.84 (0.08) |
| | 1 | 0.17 (0.06) | -0.28 (0.05) | 0.91 (0.08) |
| Household | 0 | 29.08 (0.21) | 38.26 (0.31) | 29.16 (0.27) |
| | 0.5 | 16.21 (0.16) | 19.14 (0.22) | 16.04 (0.20) |
| | 1 | 13.04 (0.14) | 14.20 (0.19) | 12.68 (0.18) |
| School | 0 | 11.02 (0.21) | 7.93 (0.20) | 10.44 (0.27) |
| | 0.5 | 17.22 (0.26) | 16.22 (0.29) | 17.30 (0.34) |
| | 1 | 32.87 (0.36) | 37.85 (0.44) | 33.03 (0.47) |
| Sorting | 0 | -0.62 (0.04) | -1.47 (0.07) | 0.96 (0.06) |
| | 0.5 | 5.11 (0.11) | 7.02 (0.16) | 6.72 (0.16) |
| | 1 | -2.01 (0.07) | -2.99 (0.10) | -0.50 (0.05) |
| Residual | | 59.14 (0.38) | 54.88 (0.47) | 57.14 (0.46) |
| Total | | 100.00 (0.29) | 100.00 (0.37) | 100.00 (0.35) |

Note: the table sets out the relative variance contribution (in percent) attributable to each component of the test score, where 'sorting' is the contribution of the between-factor covariance; different initializations of the partitioned iterative algorithm are indicated by column π; standard errors are reported in parentheses, calculated using the asymptotic approximation due to Ahn and Fessler (2003). Source: own calculations.

C List of aggregated regions

| Country | Aggregated region | Actual region | Obs. |
|---|---|---|---|
| KE | Central | Central | 33,306 |
| KE | Central | Nairobi | 4,564 |
| KE | Coast | Coast | 49,921 |
| KE | Eastern | Eastern | 84,133 |
| KE | North Eastern | North Eastern | 54,132 |
| KE | Nyanza | Nyanza | 76,471 |
| KE | Rift Valley | Rift Valley | 169,672 |
| KE | Western | Western | 83,720 |
| TZ | Arusha | Arusha | 18,609 |
| TZ | Arusha | Kilimanjaro | 16,102 |
| TZ | Arusha | Mara | 18,172 |
| TZ | Dar Es Salaam | Dar Es Salaam | 5,889 |
| TZ | Dar Es Salaam | Pwani | 17,136 |
| TZ | Iringa | Dodoma | 16,991 |
| TZ | Iringa | Iringa | 15,594 |
| TZ | Iringa | Morogoro | 13,500 |
| TZ | Iringa | Njombe | 2,532 |
| TZ | Kagera | Geita | 4,600 |
| TZ | Kagera | Kagera | 20,595 |
| TZ | Kagera | Mwanza | 25,281 |
| TZ | Kigoma | Katavi | 1,994 |
| TZ | Kigoma | Kigoma | 14,228 |
| TZ | Kigoma | Rukwa | 16,240 |
| TZ | Ruvuma | Lindi | 10,611 |
| TZ | Ruvuma | Mtwara | 6,594 |
| TZ | Ruvuma | Ruvuma | 12,662 |
| TZ | Singida | Mbeya | 18,232 |
| TZ | Singida | Singida | 14,037 |
| TZ | Tabora | Shinyanga | 25,162 |
| TZ | Tabora | Simiyu | 4,576 |
| TZ | Tabora | Tabora | 20,198 |
| TZ | Tanga | Manyara | 16,836 |
| TZ | Tanga | Tanga | 23,092 |
| UG | Central | Central | 64,077 |
| UG | Eastern | Eastern | 120,142 |
| UG | Northern | Northern | 102,723 |
| UG | Western | Western | 75,186 |