POLICY RESEARCH WORKING PAPER 2956

Survey Compliance and the Distribution of Income

Johan A. Mistiaen
Martin Ravallion

The World Bank
Development Research Group
Poverty Team
January 2003

Abstract

While it is improbable that households with different incomes are equally likely to participate in sample surveys, the lack of data for nonrespondents has hindered efforts to correct for the bias in measures of poverty and inequality. Mistiaen and Ravallion demonstrate how the latent income effect on survey compliance can be estimated using readily available data on response rates across geographic areas. An application using the Current Population Survey for the United States indicates that compliance falls as income rises. Correcting for selective compliance appreciably increases mean income and inequality, but has only a small impact on poverty incidence up to commonly used poverty lines in the United States.

This paper, a product of the Poverty Team, Development Research Group, is part of a larger effort in the group to develop better methods of measuring poverty and inequality from survey data. Copies of the paper are available free from the World Bank, 1818 H Street NW, Washington, DC 20433. Please contact Patricia Sader, room MC3-556, telephone 202-473-3902, fax 202-522-1151, email address psader@worldbank.org. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at jmistiaen@worldbank.org or mravallion@worldbank.org. January 2003. (31 pages)

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished.
The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent.

Produced by the Research Advisory Staff

Survey Compliance and the Distribution of Income

Johan A. Mistiaen and Martin Ravallion[1]
World Bank, Washington DC

Keywords: Survey non-response, income distribution, poverty and inequality measurement.
JEL: C42, D31, D63, I3

[1] For comments on an earlier draft we are grateful to Frank Cowell, Angus Deaton and Dominique van de Walle. These are the views of the authors, and should not be attributed to the World Bank or any affiliated organization. Email addresses: jmistiaen@worldbank.org and mravallion@worldbank.org.

1. Introduction

It is known that errors in the incomes reported in surveys have important implications for measures of poverty and inequality based on those surveys (Van Praag et al., 1983; Chakravarty and Eichhorn, 1994; Ravallion, 1994; Cowell and Victoria-Feser, 1996; Chesher and Schluter, 2002). For example, classical measurement error in the reported incomes of sampled households leads to over-estimation of standard inequality measures (Chakravarty and Eichhorn, 1994). Chesher and Schluter (2002) derive formulae for correcting a number of poverty and inequality measures for multiplicative measurement error in the underlying individual welfare levels, assuming that the sample is representative of the relevant population.

A measurement issue that has received less attention is that some sampled households invariably do not participate in surveys, either because they explicitly refuse to do so or because nobody is at home.
In the literature, this is often called "unit non-response" and is distinct from "item non-response," which occurs when some of the sampled households who agree to participate refuse to answer questions on their incomes. Various imputation/matching methods address item non-response by exploiting the questions that are answered (Lillard et al., 1986; Little and Rubin, 1987). However, that is not an option for unit non-response.

Some surveys make efforts to avoid unit non-response, using "call-backs" to non-responding households and fees paid to those who agree to be interviewed.[2] Nonetheless, the problem is practically unavoidable and non-response rates of 10% or higher are common; indeed, we know of national surveys for which 30% of those sampled did not comply.[3]

[2] On reducing bias using call-backs see Deming (1953), Van Praag et al. (1983), Alho (1990), and Nijman and Verbeek (1992). On the economics of incentive payments see Philipson (1997).

[3] Scott and Steele (2002) report non-response rates for eight countries, ranging from virtually zero to 26%. Holt and Elliot (1991) quote a range of 15-30% for surveys in the UK. Philipson (1997) reports a mean non-response rate of 21% for surveys by the National Opinion Research Center in the U.S.

How does unit non-response affect survey-based measures of poverty and inequality? To the extent that compliance is random, there will be no bias. However, just as income constrains almost all behavior, it undoubtedly matters to choices about compliance with sample assignments. For instance, high-income households might be less likely to participate because of a high opportunity cost of their time or concerns about intrusion in their affairs. The poor too may be underrepresented; some are homeless and hard to reach in standard household survey designs, and some may be physically or socially isolated and thus less easily interviewed.
The presence of income-dependent compliance can bias survey-based estimates of the distribution of income. However, the direction of bias cannot be assessed on a priori grounds; for example, if compliance tends to be lower for both the very poor and the very rich then there will be potentially offsetting effects on measures of the incidence of poverty. Unit non-response may well have an offsetting effect on measured inequality to measurement errors in reported incomes.

The possibility of selective compliance is commonly ignored in practice. There are two exceptions. The first is found in the strand of the literature on measuring poverty and inequality in which the survey mean is replaced by average incomes from national accounts.[4] This approach rests on two key assumptions, namely that the national accounts give a valid estimate of mean household income and that the discrepancy between the two data sources is distribution neutral, implying that one only needs to make an equi-proportionate correction at all levels. Hitherto, little or no evidence has been advanced for or against these assumptions.[5]

[4] This is not common practice in empirical work, but there has been a flurry of recent examples, including Bhalla (2002), Bourguignon and Morrisson (2002) and Sala-i-Martin (2002). While these authors acknowledge that they are making these assumptions for computational convenience, some also defend the method on the grounds that it allows a correction for under-reporting and non-compliance in surveys (Bhalla, 2002; Sala-i-Martin, 2002).

[5] For further discussion (in the context of poverty measurement for India, though the point is more general) see Ravallion (2000). On the discrepancies between estimates of mean consumption from surveys versus national accounts across countries see Ravallion (2002).

A second, more promising approach is based on utilizing geographic or other observable differences in survey response rates.
Atkinson and Micklewright (1983) use regional differences in survey response rates to correct for differential non-response in the U.K. Family Expenditure Survey. The Current Population Survey for the U.S. uses a similar method (Census Bureau, 2000, Chapter 10). These methods assume that the non-compliance problem is ignorable within areas. However, this assumption is essentially ad hoc, with no behavioral basis, and there is no a priori reason why it would be valid; why would compliance be non-random between areas but random within them?

The contribution of the present paper is to show that the ignorability assumption can be relaxed using exactly the same data used in past ad hoc corrections following the second approach. We show that it is possible to identify the latent individual probability of survey compliance as a function of income using the empirical relationship between aggregate compliance rates across areas and mean incomes by percentile groups. Our method recognizes that the empirical percentile group shares are biased given that there is selective compliance. We deal with this problem numerically, by iterating the parameter estimation after revising the empirical shares consistently with the empirical income effect identified at the previous iteration. On convergence, the identified individual compliance probability given income is used to correct for bias in the estimated income distribution.

Our approach deals simultaneously with response bias within and between areas. We are thus able to present the first estimates (to our knowledge) of the bias in measured distributions due to unit non-response. While we only present estimates for one country here, the minimal data requirements of our method should allow a wide range of applications in practice.

We first establish why unit non-response is unlikely to be ignorable using a simple economic model of compliance choice (section 2).
We then examine the model's implications for measures of income poverty and inequality (section 3). This motivates our effort to test for an income effect on compliance. We outline our empirical method in section 4 and then present results for the U.S. (section 5). We offer some conclusions in section 6.

2. Income-dependent survey compliance

Survey participation is a matter of individual choice; nobody is obliged to comply with the statistician's randomized assignment. There is some perceived utility gain from compliance (the satisfaction of doing one's civic duty, for example) but there is a cost as well. Let $y \in [y_P, y_R]$ be household income per person ($y_P$ is the income of the poorest person and $y_R$ that of the richest) and $c(y)$ the cost to the respondent of survey participation (net of any compensation received for participation). We assume that $c'(y) \ge 0$. This can be rationalized by assuming that the opportunity cost of the time required to comply rises with income, while the time itself is roughly independent of income. More precisely, let $\tau$ denote the time required for the survey interview and normalize total available time to unity. Full income is $y = w + \pi$ where $w$ is the wage rate and $\pi$ is non-wage income. The cost of survey participation is then $c(y) = \tau w = \tau(y - \pi)$ with $0 < c'(y) = \tau < 1$. Nonlinearity of $c(y)$ can arise when $\tau$ varies with $y$.

Let utility be $u[y - c(y)d, d]$ where $d = 1$ if one chooses to comply and $d = 0$ if not. The function $u$ is strictly increasing in both arguments. The utility gain from compliance is:

$$g(y) = u[y - c(y), 1] - u(y, 0) \tag{1}$$

with slope:

$$g'(y) = u_y[y - c(y), 1]\,[1 - c'(y)] - u_y(y, 0) \tag{2}$$

where subscripts denote partial derivatives. We assume that the probability of compliance is a strictly increasing common function of the utility gain.

This simple model can generate a wide range of outcomes for the relationship between compliance and income. We consider some special cases.
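Equations (1) and (2) can be made concrete with a small numerical sketch. The functional forms here are illustrative assumptions, not taken from the paper: utility is $u(x, d) = \ln x + A d$, the cost is the linear form $c(y) = \tau(y - \pi)$, and the values of `TAU`, `PI` and `A` are arbitrary. Under these assumptions the gain from compliance falls with income whenever non-wage income is positive.

```python
import math

TAU = 0.1   # assumed interview time as a share of total time (tau)
PI = 5.0    # assumed non-wage income (pi)
A = 0.1     # assumed direct utility from complying (hypothetical parameter)


def cost(y):
    """c(y) = tau * (y - pi): interview time valued at the wage rate."""
    return TAU * (y - PI)


def gain(y):
    """Equation (1) under the assumed utility u(x, d) = ln(x) + A * d."""
    return math.log(y - cost(y)) + A - math.log(y)


def gain_slope(y):
    """Equation (2): u_y[y - c(y), 1] * (1 - c'(y)) - u_y(y, 0), with u_y = 1 / x."""
    return (1.0 - TAU) / (y - cost(y)) - 1.0 / y


incomes = [10.0, 20.0, 40.0, 80.0, 160.0]
for y in incomes:
    print(f"y={y:6.1f}  g(y)={gain(y):+.4f}  g'(y)={gain_slope(y):+.6f}")
```

With `PI = 0` the slope in this parameterization is exactly zero, so the negative gradient here comes entirely from non-wage income making the cost a rising share of income.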
From (2), it is evident that compliance falls monotonically with income if and only if:

$$c'(y) > 1 - \frac{u_y(y, 0)}{u_y[y - c(y), 1]} \quad \text{for all } y$$

A simple case in which this holds is when the cost of participation increases monotonically with income ($c'(y) > 0$) and the marginal utility of income is independent of survey participation, i.e., $u_y(y, 0) = u_y[y - c(y), 1]$. Then $g'(y) = -u_y(\cdot)c'(y) < 0$ for all $y$.

However, the opposite result can also be obtained, whereby compliance rises with income. For example, suppose instead that the cost of participation is independent of income ($c'(y) = 0$), implying that $g'(y) = u_y[y - c(y), 1] - u_y(y, 0)$. If there is diminishing marginal utility of income and utility is separable between income and compliance ($u_y(y, 1) = u_y(y, 0)$) then $g'(y) > 0$; the poor will be less likely to participate. Without separability, the outcome depends on whether compliance raises or lowers the marginal utility of income, which is not obvious on a priori grounds. If compliance leads to a higher marginal utility of income then again $g'(y) > 0$. If it lowers the marginal utility of income then the income effect could go either way. Suppose that the difference in income effect on the marginal utility of income dominates at low incomes, $u_y[-c(y), 1] > u_y(0, 0)$, while the adverse effect of compliance on the marginal utility of income dominates at high $y$, i.e., $u_y[1 - c(y), 1] < u_y(1, 0)$. Then one can again find an inverted-U pattern in which middle-income groups are more likely to participate than either tail of the distribution.

Other special cases can deliver this inverted-U relationship. For instance, assume that: (i) the cost of compliance is non-negative, strictly increasing and convex in income, $c'(y) > 0$, $c''(y) > 0$ with $c'(y_P) = 0$; (ii) utility is separable between income and compliance; and (iii) for the richest person, the cost of participation is negligibly small, i.e., $\lim_{y \to y_R} u_y[y - c(y)] = u_y(y)$.
Then separability implies that we can re-write (2) as:

$$g'(y) = -u_y[y - c(y)]\,c'(y) + u_y[y - c(y)] - u_y(y) \tag{3}$$

The first term on the right-hand side is negative while the second (the difference $u_y[y - c(y)] - u_y(y)$) is positive, given declining marginal utility. At low incomes the second term will dominate (since $c'(y)$ will be small) and hence $g'(y) > 0$ at low $y$. At high incomes, by contrast, the first term will dominate and hence $g'(y) < 0$. In other words, the gains will tend to be highest for middle-income groups.

Notice that in this model, the introduction of a fixed fee paid to those who agree to participate will increase the probability of participation, but it can make the income gradient of compliance even more negative. This will happen if the cost of compliance rises less than one-to-one with income, and there is declining marginal utility of income.

3. Implications for poverty and inequality measures

In exploring the theoretical implications for the distribution of income, we confine attention to the special cases discussed above in which the compliance-income relationship is either monotonic decreasing or an inverted-U shape.

Let $F(y)$ denote the true (unobserved) cumulative distribution function of income $y$ with continuous density function $f(y)$. The sample-based estimate is $\hat{F}(y)$ with corresponding density $\hat{f}(y)$ and we assume that $F(0) = 0$. The true distribution can be derived from the empirical distribution by appropriate re-weighting. The true density function is $f(y) = w(y)\hat{f}(y)$ where the "correction factors" $w(y)$ are the inverse probabilities of compliance, so $w(y) = \phi[g(y)]$ for a strictly decreasing differentiable function $\phi$. The corrected distribution function is:

$$F(y) = \int_{y_P}^{y} w(x)\hat{f}(x)\,dx \tag{4}$$

The expected value of the correction factor is unity, i.e., $\int_{y_P}^{y_R} w(x)\hat{f}(x)\,dx = 1$.

Consider first the case in which compliance falls monotonically with income, i.e., $w'(y) > 0$.
On integrating (4) by parts one obtains the following formula for the difference between the true distribution of income and the empirical distribution:

$$F(y) - \hat{F}(y) = [w(y) - 1]\hat{F}(y) - \int_{y_P}^{y} w'(x)\hat{F}(x)\,dx \tag{5}$$

It is evident that $F(y) < \hat{F}(y)$ for all $y < w^{-1}(1)$. By continuity there must exist an income $\tilde{y}$ defined as the minimum value of $y$ for which $F(y) = \hat{F}(y)$. Following a result proved in Atkinson (1987), the empirical distribution will then overestimate the extent of income poverty for all poverty lines up to $\tilde{y}$ and all additive poverty measures satisfying standard properties. Notice however, that first-order dominance over all $y$ is not guaranteed by the assumptions made so far; values of $y$ for which $F(y) > \hat{F}(y)$ are possible if compliance rates fall to a sufficiently low level at high incomes. This is an empirical question.

Consider instead the inverted-U relationship of compliance with income. There are two points at which no correction to the density function is needed, namely $y_L$ and $y_U$ with $y_L < y_U$, so that $w(y) > 1$ for $y < y_L$ and $y > y_U$. We take $w'(y) < 0$ for all $y < y_L$ and $w'(y) > 0$ for all $y > y_U$, though this can be relaxed somewhat without altering the main results. From (5):

$$F(y_L) - \hat{F}(y_L) = -\int_{y_P}^{y_L} w'(x)\hat{F}(x)\,dx > 0 \tag{6.1}$$

$$F(y_U) - \hat{F}(y_U) = -\int_{y_P}^{y_U} w'(x)\hat{F}(x)\,dx < 0 \tag{6.2}$$

Intuitively, both the incidence of low incomes ($F(y_L)$) and high incomes ($1 - F(y_U)$) are underestimated, given the structure of the income effect on compliance. On noting that:

$$\frac{d[F(y) - \hat{F}(y)]}{dy} = [w(y) - 1]\hat{f}(y) \tag{7}$$

it is evident that the impact of this pattern of income effects on compliance is as represented in Figure 1. By continuity, there must exist a point $y^* \in (y_L, y_U)$ such that $F(y^*) = \hat{F}(y^*)$. Again, for a broad class of poverty measures in the literature and all poverty lines up to $y^*$, the empirical distribution will underestimate the extent of income poverty. Of course, the same holds over the entire support of the distribution if nobody has an income greater than $y^*$ ($f(y) = 0$ for all $y > y^*$).
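The sign pattern in (6.1) and (6.2) can be checked on a toy example. The sketch below is purely illustrative: the sample of 100 equally spaced incomes, the quadratic shape of the correction factor and the curvature `B` are assumptions made for this example, chosen so that the correction factor is U-shaped, positive, and has a sample mean of one.

```python
incomes = list(range(1, 101))      # hypothetical respondents, one per income level
n = len(incomes)
center = sum(incomes) / n
spread = sum((y - center) ** 2 for y in incomes) / n
B = 0.0006                         # assumed curvature of the correction factor


def w(y):
    """U-shaped correction factor; its mean over the sample is one by construction."""
    return 1.0 + B * ((y - center) ** 2 - spread)


def empirical_cdf(y):
    """Unweighted share of respondents with income at or below y."""
    return sum(1 for x in incomes if x <= y) / n


def corrected_cdf(y):
    """Discrete analogue of equation (4): re-weight each respondent by w."""
    return sum(w(x) for x in incomes if x <= y) / n


y_low = center - spread ** 0.5     # w(y_low) = 1
y_high = center + spread ** 0.5    # w(y_high) = 1

for label, y in (("y_L", y_low), ("y_U", y_high)):
    gap = corrected_cdf(y) - empirical_cdf(y)
    print(f"{label} = {y:6.2f}  corrected minus empirical CDF = {gap:+.4f}")
```

In this symmetric example the two distribution functions cross at the middle of the income range, which plays the role of the crossing point between $y_L$ and $y_U$.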
On the other hand, suppose that nobody has an income less than $y^*$ ($f(y) = 0$ for all $y < y^*$). Then the empirical distribution will unambiguously overestimate the extent of poverty (i.e., $F(y) < \hat{F}(y)$ for all $y$).

Though we omit the detailed analysis, similar arguments can be used to show that the impact on measured inequality of an income effect on compliance is also ambiguous, and will depend (inter alia) on the specific measure of inequality used. It is easy to see why if we consider the case in which compliance falls monotonically with income, implying that the mean is underestimated. Consider the poorest and richest persons, with incomes $y_P$ and $y_R$. The survey yields the correct values for these incomes but underestimates the proportion of people who have income $y_R$ and overestimates the proportion with $y_P$. Figure 2 shows how the income effect on compliance affects the Lorenz curve. The bold lines are the segments of the empirical Lorenz curve for the poor and the rich, and the bold dashed lines are the underlying true Lorenz curve. The true slope of the lower segment corresponding to the poorest person is $y_P/\mu$, while the slope of the uppermost segment is $y_R/\mu$, where $\mu$ is mean income. The slopes of both segments of the Lorenz curve will be overestimated by the survey data given that the empirical mean is underestimated ($\mu > \hat{\mu}$) since the higher income groups are underrepresented. By continuity, the true Lorenz curve must intersect the empirical Lorenz curve, implying that the effect on inequality is ambiguous, and will depend crucially on the measure of inequality used. If instead compliance rises with income then one can re-interpret Figure 2 accordingly (bold line is the true Lorenz curve) and see that again there must be an intersection.

4.
Method for estimating the income effect on compliance

While we do not observe the individual probabilities of compliance, we do observe both the aggregate response rates by geographic area and the incomes of complying units. The problem is to infer how individual compliance varies with income from these data. The observed aggregate response rates by area are unconditional means across the (unknown) conditional response rates by level of income. However, the aggregate response rates are not simple unweighted means, since if compliance rates vary with income then the population shares by income level in the survey data actually collected will be wrong.

The fact that we only observe aggregate response rates across geographic areas implies that we must impose some aggregation structure on the problem of estimating the latent individual income effect on compliance. We make two key assumptions.

Firstly we assume that the data can be aggregated in the form of a set of homogeneous income groups with a common number of groups across all geographic areas. The population is divided into n income groups and m geographical areas, called "states" hereafter. For the computational convenience of having a common data structure across all states, we impose the restriction that the number of income groups is identical across states. Since the sample size is unlikely to be constant across regions this also entails that a degree of aggregation is unavoidable. In estimating the parameters of the income effect on compliance, we further ignore income differences within a given (income-state-specific) group of sampled households. Thus the mean incomes of the n by m groups become fixed data points in our method for estimating the income effect on compliance and hence correcting the sample weights for selective compliance.

The second assumption involves aggregation of the latent heterogeneity.
Here we assume that the heterogeneity in compliance at given income can be captured by a common additive area-specific error term. Given that our method relies on the observation of state-specific compliance aggregates only (rather than by income group, which is of course intrinsically unobservable), it is impossible to further decompose the aggregate (state-specific) error term.

Let $P_{ij}$ denote the (unobserved) probability of compliance for a person in income group $i = 1, \ldots, n$ living in state $j = 1, \ldots, m$. The probability of compliance varies with the mean income $y_{ij}$ of group $i$ in state $j$ according to:

$$P_{ij} = P(y_{ij}; \beta) + \varepsilon_{ij} \tag{8}$$

where $P$ is a smooth function with one or more parameters, $\beta$, and $\varepsilon_{ij}$ is a zero-mean error term. We assume the following parametric form:

$$P(y_{ij}; \beta) = L\left(\beta_0 + \beta_1 \ln y_{ij} + \beta_2 (\ln y_{ij})^2\right) \tag{9}$$

where $L(x) = e^x/(1 + e^x)$ is the logistic function. This specification is both sufficiently flexible to test the scenarios developed in section 3 and ensures that the observed mean response rate $P$ is bounded within the unit interval.

While $P_{ij}$ is unknown, we observe the proportion of the population in each state $j$ that are compliant:

$$P_j = \sum_{i=1}^{n} w_{ij} P(y_{ij}; \beta) + \varepsilon_j \tag{10}$$

where $w_{ij}$ is the proportion of the population of state $j$ who belong to income group $i$, and

$$\varepsilon_j \equiv \sum_{i=1}^{n} w_{ij}\varepsilon_{ij} \tag{11}$$

If there was no selective compliance then for equal sized groups (quintiles, say) we have $w_{ij} = 1/n$. With suitable parameterization of the function $P(y_{ij}; \beta)$ we can then estimate (10) using standard econometric methods.

However, selective compliance complicates matters. To correct for this we should be re-weighting the data according to the differences in response rates across income groups, so that the correct weights take the form:

$$w_{ij} = \frac{P_{ij}^{-1}}{\sum_{k=1}^{n} P_{kj}^{-1}} \quad \text{for all } (i, j) \tag{12}$$

We proceed iteratively. First we estimate (10) based on the assumption that compliance is distribution neutral, i.e. $w_{ij}^{0} = 1/n$ for all $(i, j)$, where the superscript "0" refers to the starting value.
This yields a vector of parameter estimates, $\hat{\beta}^0$, and state-specific error terms. However, the error terms by income group are not identified. Under our assumption that the error term is common to all income groups in a given state, we obtain an initial vector of estimated compliance probabilities:

$$\hat{P}_{ij}^{0} = P(y_{ij}; \hat{\beta}^{0}) + \hat{\varepsilon}_{j}^{0} \tag{13}$$

These in turn can be used to re-weight the data for the next iteration using:

$$w_{ij}^{1} = \frac{(\hat{P}_{ij}^{0})^{-1}}{\sum_{k=1}^{n} (\hat{P}_{kj}^{0})^{-1}} \tag{14}$$

We then re-estimate (10) using (14) for these new weights, giving the regression:

$$P_j = \sum_{i=1}^{n} w_{ij}^{1} P(y_{ij}; \beta) + \varepsilon_j \tag{15}$$

This gives revised estimates of the parameters and residuals. We iterate this procedure until the estimated coefficients (and hence the estimated proportions of the population in each income group and area) converge.

Finally, we use the vector of parameter estimates from the last iteration and each complying household's per capita income to infer the latent compliance probability for that household. The inverse of this probability gives the household-specific correction factor that allows us to estimate the corrected income distribution function defined in (4). Notice that this last step does not require the first aggregation assumption described above, which is only used in estimating the parameters and state-specific error terms.

5. Application to the U.S. income distribution

Data on survey response rates across geographical areas are often available from survey producers. A case in point is the March 2001 supplement of the US Current Population Survey (CPS).[6] In addition to detailed data on incomes, the CPS contains geographically referenced information on compliance (Census Bureau, 2000, Chapter 7).
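The iterative procedure of section 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the two-parameter version of (9) with $\beta_2 = 0$, a plain Gauss-Newton routine with step halving written for this sketch, and synthetic noise-free data generated from hypothetical values `true_beta`. All function names and the clamping of estimated probabilities are choices made here for the example.

```python
import math


def logistic(z):
    """L(z) = e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob(beta, y):
    """Equation (9) with beta_2 = 0 (a simplifying assumption of this sketch)."""
    return logistic(beta[0] + beta[1] * math.log(y))


def fitted_rate(beta, ys, ws):
    """Systematic part of equation (10) for one state."""
    return sum(w * prob(beta, y) for w, y in zip(ws, ys))


def ssr(beta, incomes, weights, rates):
    """Sum of squared residuals of (10) across states."""
    return sum((p - fitted_rate(beta, ys, ws)) ** 2
               for ys, ws, p in zip(incomes, weights, rates))


def fit(incomes, weights, rates, beta, max_steps=200):
    """Gauss-Newton with step halving for the least-squares fit of (10)."""
    beta = list(beta)
    for _ in range(max_steps):
        a00 = a01 = a11 = g0 = g1 = 0.0
        for ys, ws, p in zip(incomes, weights, rates):
            d0 = d1 = 0.0                      # derivatives of the fitted rate
            for w, y in zip(ws, ys):
                q = prob(beta, y)
                d0 += w * q * (1.0 - q)
                d1 += w * q * (1.0 - q) * math.log(y)
            r = p - fitted_rate(beta, ys, ws)
            a00, a01, a11 = a00 + d0 * d0, a01 + d0 * d1, a11 + d1 * d1
            g0, g1 = g0 + d0 * r, g1 + d1 * r
        det = a00 * a11 - a01 * a01
        if det <= 0.0:
            break
        s0 = (a11 * g0 - a01 * g1) / det       # solve the 2x2 normal equations
        s1 = (a00 * g1 - a01 * g0) / det
        base, t = ssr(beta, incomes, weights, rates), 1.0
        while t > 1e-8 and ssr([beta[0] + t * s0, beta[1] + t * s1],
                               incomes, weights, rates) > base:
            t *= 0.5                           # halve the step until the fit improves
        beta = [beta[0] + t * s0, beta[1] + t * s1]
        if t * (abs(s0) + abs(s1)) < 1e-12:
            break
    return beta


def estimate(incomes, rates, max_iter=100, tol=1e-8):
    """Iterate between fitting (10)/(15) and re-weighting via (13)-(14)."""
    m, n = len(incomes), len(incomes[0])
    weights = [[1.0 / n] * n for _ in range(m)]            # starting weights 1/n
    mean_rate = sum(rates) / m
    beta = [math.log(mean_rate / (1.0 - mean_rate)), 0.0]
    for _ in range(max_iter):
        new_beta = fit(incomes, weights, rates, beta)
        new_weights = []
        for ys, ws, p in zip(incomes, weights, rates):
            resid = p - fitted_rate(new_beta, ys, ws)      # state-specific error
            # Equation (13), clamped to (0, 1] as a safeguard added in this sketch.
            probs = [min(max(prob(new_beta, y) + resid, 1e-6), 1.0) for y in ys]
            total = sum(1.0 / q for q in probs)
            new_weights.append([(1.0 / q) / total for q in probs])   # equation (14)
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta, weights = new_beta, new_weights
        if change < tol:
            break
    return beta, weights


def correction_factor(beta, y):
    """Household-specific correction factor: inverse of the compliance probability."""
    return 1.0 / prob(beta, y)


# Synthetic, noise-free data: 8 hypothetical states, 5 equal-sized respondent groups.
true_beta = (12.0, -1.0)                                   # hypothetical values
base = [8000.0, 15000.0, 22000.0, 32000.0, 60000.0]
incomes = [[s * y for y in base] for s in (0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.35, 1.5)]
rates = [len(ys) / sum(1.0 / prob(true_beta, y) for y in ys) for ys in incomes]

beta_hat, weights_hat = estimate(incomes, rates)
print("estimated beta:", beta_hat)
```

The synthetic response rates are built so that equal-sized respondent groups are consistent with the weights in (12), which lets the output be compared against `true_beta`. With real data the residuals would not vanish, and the number of income groups would be limited by cell sizes.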
We define non-compliance as what the Census Bureau refers to as "type A non-interviews," which are households assigned for interview but for which no usable data were collected because household members explicitly refused to be interviewed or were absent during the interviewing period.[7] The March 2001 CPS has a sample size of 17,788 households (net of other non-interview types) of which 1,461 were classified as type A non-interviews. In addition, we also treat the 134 households that were interviewed but refused to answer the income questions as non-compliant. Together this implies an overall non-response rate of about 9%.

The CPS has its own procedures for adjusting for non-response (described in Census Bureau, 2000, Chapter 10).[8] In dealing with unit non-response, the CPS assumes that the problem is ignorable once primary sampling units with non-responding households are grouped together within other matched geographic areas (typically within the same state). The Census Bureau acknowledges that this may or may not be valid. The data set only gives one weight (called "final weight") for each household, and that weight reflects various adjustments, including for non-response and sample design. We cannot disentangle the CPS adjustment for non-response from other factors. For this reason, we chose to ignore the CPS weights.

[6] The CPS data and survey methodology details are available from the US Census Bureau and can be accessed on-line at: http://www.census.gov/hhes/www/income.html.

[7] Other types of "non-interviews" refer to cases where the residence was found to be demolished, under construction, etc. These are less likely to bias the income distribution because the household is likely to be no longer at the premises for a variety of reasons that are not correlated with income.

[8] For a critical assessment of the imputation methods used by the Census Bureau in correcting estimates for income non-response see Lillard, Smith and Welch (1986).
So, for the purpose of our exercise, neither our "empirical" nor "corrected" distributions of income have used the CPS weights, though both distributions are household-size weighted.

The sample was designed to be representative of the US at the state level, giving $j = 1, \ldots, 51$ geographical areas. We set a minimum sample size of 30 for any state-income group combination. Since the smallest sample size for any state is 150, this means that we set n=5. Thus, we divide the sample for each state into quintiles, based on the state-level per capita income distribution quintiles. We also test the sensitivity of our results to this assumption.

Non-response rates vary from 3.2% in Alabama to 19.6% in the District of Columbia (Table 1). There is no significant correlation between sample size and compliance rates. State-level average income on the other hand is correlated with compliance, and this correlation is strongest for the top income quintile and weakest for the bottom quintile (see Figure 3). The mean incomes by quintile are also given in Table 1.

The specification in equation (9) did not yield an estimate for $\beta_2$ that was statistically significantly different from zero so we set $\beta_2 = 0$. The linear specification did produce significant parameter estimates at each iteration (Table 2), indicating that higher income negatively affects the propensity to comply.[9] The estimated coefficients (Figure 4) and reweighted shares of the population in each income group in each state (Figure 5) converged up to 3 decimal places after 9 iterations.

[9] For each iteration, we used the standard Gauss-Newton non-linear estimation method and all parameter estimates converged.

Our results indicate that ignoring selective compliance according to income appreciably understates the proportion of the population in the richest income per capita quintile and overstates the population shares in the bottom four quintiles.
The highest income quintile is estimated to comprise 24% of the population after correcting for its lower probability of survey compliance. By contrast, the poorest quintile in the unadjusted data actually comprises 18% of the population.

Table 3 gives the original and corrected mean incomes by 20 equal fractiles (the third column, labeled n=6, will be discussed below). After our correction for selective non-compliance, the overall mean rises by 23%, from $21,576 per capita to $26,454. However, the correction is clearly not distribution-neutral; the proportionate adjustment rises from about 5% at the bottom to over 54% at the top.

Figure 6 gives the Lorenz curves, with enlargements of the extreme lower and upper ends shown in Figure 7. (Focus on the n=5 case; we will explain the n=6 case shortly.) The Lorenz curves intersect as predicted in section 3; thus the qualitative effect on measured inequality cannot be predicted on a priori grounds. However, it is plain from Figure 6 that the predominant effect of our correction is a downward shift of the Lorenz curve, implying higher inequality by most measures. The Gini index increases appreciably from 45.05% to 50.76% on correcting for our estimates of the income effect on compliance.

The effect on the distribution of income per capita in levels can be seen from Figure 8. Naturally, this also reflects the impact on the mean. It can be seen that the impact on poverty incidence is small for poverty lines commonly used in the U.S., giving poverty rates around 12% (Census Bureau, 2001); Figure 9 gives a blow-up for the lower 30%. However, there is still first-order dominance, implying that poverty measures are unambiguously overestimated under the standard assumption in practice of ignorable non-response.

A striking feature of our findings is that so much of the impact is at the upper end of the distribution, notably the top quintile or so (Table 3).
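The summary statistics discussed here (mean, Gini index, poverty incidence) are weighted statistics once each household carries a correction factor. The following sketch shows one way to compute them. The five incomes, the poverty line of 10,000 and the compliance function are hypothetical values chosen for illustration, not estimates from the paper, and the Gini index is computed from the trapezoid area under the discrete Lorenz curve.

```python
import math


def weighted_mean(values, weights):
    """Mean income with household weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_gini(values, weights):
    """Gini index as one minus twice the trapezoid area under the Lorenz curve."""
    pairs = sorted(zip(values, weights))
    total_w = sum(w for _, w in pairs)
    total_y = sum(v * w for v, w in pairs)
    cum_y = 0.0
    area = 0.0
    for v, w in pairs:
        prev = cum_y
        cum_y += v * w
        area += (w / total_w) * (prev + cum_y) / (2.0 * total_y)
    return 1.0 - 2.0 * area


def headcount(values, weights, line):
    """Weighted share of the population with income below the poverty line."""
    poor = sum(w for v, w in zip(values, weights) if v < line)
    return poor / sum(weights)


def compliance_probability(y, b0=12.0, b1=-1.0):
    """Hypothetical logistic compliance function in log income (assumed parameters)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(y))))


incomes = [5000.0, 12000.0, 20000.0, 35000.0, 90000.0]     # hypothetical respondents
raw = [1.0] * len(incomes)
corrected = [1.0 / compliance_probability(y) for y in incomes]

for label, wts in (("empirical", raw), ("corrected", corrected)):
    print(label,
          round(weighted_mean(incomes, wts), 1),
          round(weighted_gini(incomes, wts), 4),
          round(headcount(incomes, wts, 10000.0), 4))
```

When compliance falls with income, the corrected mean is higher and the corrected headcount lower than their unweighted counterparts. The direction of the change in the Gini index depends on the data, in line with the ambiguity discussed in section 3.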
So our results may be sensitive to aggregation at this end of the distribution. To test this, we split the highest-income quintile into two and re-run the estimation method. The method converged at a lower estimate (in absolute value) for $\beta_1$ of -1.553, with a standard error of 0.243. Table 3 gives the conditional means for this case; the pattern is similar, but the upward adjustment is lower. The upward adjustment needed to be consistent with selective compliance rises from only 3% at the bottom to 30% at the top. Instead of a revised mean of $26,454 we obtained $24,291. Figures 6 and 8 also give the Lorenz curves and distribution functions for this case (labeled n=6). Instead of an upward revision of the Gini index to 50.75% (from 45.05%) we now obtain 48.29%. There is negligible impact on the cumulative distribution function at the lower end.

While quantitative magnitudes are somewhat sensitive to this change to the estimation method, the qualitative results are not. The problem of selective compliance is clearly not ignorable in estimating standard summary statistics from income surveys. And even if one is willing to assume that the national accounts provide a better basis for setting the mean, the bias is clearly far from distribution-neutral.

6. Conclusions

We have argued that there is likely to be an income effect on survey compliance, though the direction of bias in poverty or inequality measures could go either way in theory. So it is an empirical question. Past empirical work has either ignored the problem of selective compliance in surveys or made essentially ad hoc corrections. We have shown how the latent income effect
We can also reject the assumptions made in past ad hoc correction methods. We find a highly significant negative income effect on survey compliance. While we do not find strict Lorenz dominance, inequality tends to be appreciably higher after correcting for selective compliance. Thus unit non-response has the opposite effect on measured inequality to classical measurement error in reported incomes, which has been the focus of past work in the literature. A sizeable upward revision to the overall mean is also called for to correct for selective compliance.

In terms of the impact on the incidence of poverty, the downward bias in the mean tends to offset the downward bias in measured inequality. The tendency for low-income groups to be over-represented (because of their higher compliance probabilities) still means that the poverty rate tends to be over-estimated, though the impact on poverty incidence is small up to poverty lines normally used in the U.S. We find some sensitivity of the quantitative results to changing the number of income groups one identifies in the estimation method, though our qualitative conclusions are robust.

There can be no presumption that even our qualitative results will hold elsewhere. Possibly in poorer settings one will find greater under-representation of the poor than in the US. Or one might find a less (more) steep income gradient of compliance in countries with lower (higher) inequality than the US. These are conjectures. However, the data and computational demands of the method we have proposed are not great, so other applications are possible.

References

Alho, J.M. (1990) "Adjusting for Nonresponse Bias using Logistic Regression," Biometrika, 77(3): 617-24.

Atkinson, A.B. (1987) "On the Measurement of Poverty," Econometrica, 55: 749-764.

Atkinson, A.B. and J. Micklewright (1983) "On the Reliability of Income Data in the Family Expenditure Survey 1970-1977," Journal of the Royal Statistical Society Series A, 146(1): 33-61.

Bhalla, Surjit (2002) Imagine There's No Country: Poverty, Inequality and Growth in the Era of Globalization, Washington, D.C.: Institute for International Economics.

Bourguignon, Francois and Christian Morrisson (2002) "Inequality Among World Citizens: 1820-1992," American Economic Review, 92(4): 727-744.

Census Bureau (2000) "Current Population Survey Design and Methodology," Technical Paper 63. Washington, D.C.: U.S. Department of Commerce.

Census Bureau (2001) "Poverty in the United States: 2001," Current Population Report P60-219. Washington, D.C.: U.S. Department of Commerce.

Chakravarty, S.R., and W. Eichhorn (1994) "Measurement of Income Inequality: Observed versus True Data," in W. Eichhorn (ed.) Models and Measurement of Welfare and Inequality, Heidelberg: Springer-Verlag.

Chesher, A., and C. Schluter (2002) "Welfare Measurement and Measurement Error," Review of Economic Studies, 69: 357-378.

Cowell, F.A., and M. Victoria-Feser (1996) "Robustness of Inequality Measures," Econometrica, 64: 77-101.

Deming, W.E. (1953) "On a Probability Mechanism to Attain an Economic Balance between the Resultant Error of Response and the Bias of Nonresponse," Journal of the American Statistical Association, 48: 743-72.

Holt, D., and D. Elliot (1991) "Methods of Weighting for Unit Non-Response," The Statistician, 40: 333-342.

Lillard, L., J.P. Smith and F. Welch (1986) "What Do We Really Know about Wages? The Importance of Nonreporting and Census Imputation," Journal of Political Economy, 94(3): 489-506.

Little, R.J.A. and D.B. Rubin (1987) Statistical Analysis with Missing Data. New York: Wiley.

Nijman, T. and M. Verbeek (1992) "Nonresponse in Panel Data: The Impact on Estimates of a Life Cycle Consumption Function," Journal of Applied Econometrics, 7: 243-57.

Philipson, Tomas (1997) "Data Markets and the Production of Surveys," Review of Economic Studies, 64: 47-72.

Ravallion, M. (1994) "Poverty Rankings Using Noisy Data on Living Standards," Economics Letters, 45: 481-485.

Ravallion, M. (2000) "Should Poverty Measures be Anchored to the National Accounts?" Economic and Political Weekly, 34 (August 26): 3245-3252.

Ravallion, M. (2002) "Measuring Aggregate Welfare in Developing Countries: How Well do National Accounts and Surveys Agree?," Review of Economics and Statistics, in press.

Sala-i-Martin, Xavier (2002) "The World Distribution of Income (Estimated from Individual Country Distributions)," mimeo, Columbia University.

Scott, Kinnon and Diane Steele (2002) "Measuring Welfare in Developing Countries: Living Standards Measurement Study Surveys," in UN Statistical Division, Surveys in Developing and Transition Countries, forthcoming.

Van Praag, B., A. Hagenaars and W. Van Eck (1983) "The Influence of Classification and Observation Errors on the Measurement of Income Inequality," Econometrica, 51: 1093-1108.

Table 1.
Sample characteristics

                   Mean        Sample size              Mean log per capita income per quintile
                   compliance
State              rate        Households  Individuals  i=1     i=2     i=3     i=4     i=5
Alabama            0.968       250         620          8.32    9.12    9.53    10.01   10.85
Idaho              0.960       250         612          8.55    9.24    9.69    10.08   10.83
West Virginia      0.955       245         558          8.48    9.13    9.55    9.95    10.54
Utah               0.955       198         613          8.48    9.29    9.73    10.13   10.90
North Dakota       0.950       219         495          8.72    9.33    9.76    10.06   10.69
Mississippi        0.950       199         466          8.51    9.16    9.56    10.01   10.65
Louisiana          0.949       198         466          8.45    9.10    9.56    9.96    10.66
Nebraska           0.949       254         586          8.72    9.47    9.84    10.29   10.83
Montana            0.942       225         498          8.65    9.30    9.71    10.05   10.86
South Dakota       0.940       235         523          8.68    9.32    9.76    10.09   10.71
Wyoming            0.938       242         568          8.66    9.29    9.69    10.16   10.95
Iowa               0.936       219         514          8.87    9.36    9.71    10.12   10.84
Delaware           0.935       168         441          8.69    9.50    9.87    10.29   11.11
Florida            0.932       942         2,161        8.65    9.36    9.79    10.19   10.92
Minnesota          0.930       244         557          8.95    9.59    9.92    10.31   11.28
Tennessee          0.929       225         511          8.45    9.17    9.60    10.08   10.98
Virginia           0.928       263         633          8.82    9.52    9.90    10.33   11.18
Indiana            0.928       235         536          8.69    9.40    9.77    10.20   10.82
Wisconsin          0.925       268         636          8.85    9.52    9.91    10.24   11.02
Arkansas           0.925       253         576          8.38    9.08    9.47    9.91    10.85
South Carolina     0.924       171         363          8.74    9.34    9.73    10.15   10.80
Oklahoma           0.923       285         667          8.23    9.13    9.61    10.13   10.86
Vermont            0.922       192         415          8.62    9.42    9.90    10.19   11.13
Oregon             0.921       203         478          8.61    9.43    9.83    10.23   10.92
Massachusetts      0.921       403         944          8.75    9.48    9.92    10.35   11.22
Maine              0.920       188         408          8.76    9.41    9.79    10.20   10.88
Nevada             0.917       240         624          8.69    9.34    9.77    10.21   11.00
Kansas             0.915       235         514          8.78    9.40    9.85    10.28   10.84
Ohio               0.914       629         1,485        8.78    9.46    9.85    10.24   10.97
Washington         0.913       230         546          8.56    9.40    9.82    10.26   11.07
North Carolina     0.913       436         1,007        8.61    9.26    9.75    10.19   10.85
Missouri           0.912       239         539          8.90    9.56    9.95    10.32   11.03
Texas              0.911       961         2,439        8.29    9.17    9.63    10.13   11.10
Michigan           0.910       577         1,401        8.72    9.45    9.85    10.26   11.04
New Mexico         0.909       309         760          8.24    9.13    9.59    9.99    10.77
Georgia            0.909       253         579          8.56    9.30    9.75    10.24   11.09
Kentucky           0.909       219         503          8.67    9.22    9.69    10.23   11.07
Colorado           0.906       255         627          8.98    9.68    9.98    10.45   11.13
Arizona            0.902       287         688          8.56    9.26    9.71    10.17   11.11
Connecticut        0.901       182         412          8.73    9.61    9.99    10.36   11.05
Illinois           0.901       744         1,841        8.70    9.50    9.91    10.32   11.01
Pennsylvania       0.896       724         1,650        8.75    9.44    9.88    10.32   11.16
Alaska             0.896       193         492          8.60    9.43    9.94    10.31   11.01
California         0.888       1,583       4,177        8.41    9.26    9.75    10.28   11.19
New Jersey         0.885       582         1,340        8.76    9.54    9.96    10.35   11.09
Rhode Island       0.880       150         304          8.82    9.42    9.85    10.36   11.32
New York           0.874       1,183       2,702        8.51    9.30    9.77    10.22   11.07
Hawaii             0.866       179         426          8.72    9.54    9.98    10.46   11.10
New Hampshire      0.853       191         407          9.03    9.67    10.06   10.39   10.93
Maryland           0.842       209         432          8.85    9.57    9.96    10.41   11.19
Dist. of Columbia  0.804       224         384          8.46    9.30    10.00   10.62   11.42

Table 2. Parameter estimates and corrected population shares

                                                 Mean proportion (%) of the population by quintile
Iteration (t)  β0 (s.e.)        β1 (s.e.)        i=1 (richest)  i=2    i=3    i=4    i=5 (poorest)
0              24.682 (3.595)   -2.168 (0.337)   20.00          20.00  20.00  20.00  20.00
1              18.997           -1.613           25.87          19.43  18.53  18.18  17.99
2              21.210 (2.806)   -1.828 (0.263)   23.64          19.91  19.16  18.78  18.52
3              20.442 (2.656)   -1.753 (0.249)   24.36          19.76  18.95  18.58  18.35
4              20.715 (2.709)   -1.780 (0.254)   24.10          19.81  19.02  18.65  18.41
5              20.619 (2.690)   -1.770 (0.252)   24.19          19.79  19.00  18.63  18.39
6              20.653 (2.698)   -1.774 (0.253)   24.16          19.80  19.01  18.64  18.40
7              20.641 (2.694)   -1.773 (0.253)   24.17          19.80  19.00  18.63  18.40
8              20.645 (2.695)   -1.773 (0.253)   24.16          19.80  19.01  18.64  18.40
9              20.644 (2.695)   -1.773 (0.253)   24.17          19.80  19.00  18.63  18.40
10             20.644 (2.695)   -1.773 (0.253)   24.17          19.80  19.00  18.63  18.40

Table 3: Mean income with/without correction for income-dependent compliance

Fractile (ranked by    Mean income ($/person/year)
income per person)     Empirical      Corrected      Corrected
                       distribution   distribution   distribution
                                      (n=5)          (n=6)
0 - 5                  1,968          2,068          2,034
5 - 10                 3,999          4,199          4,129
10 - 15                5,543          5,845          5,745
15 - 20                6,863          7,198          7,087
20 - 25                8,110          8,570          8,406
25 - 30                9,389          9,941          9,746
30 - 35                10,637         11,308         11,073
35 - 40                11,995         12,829         12,540
40 - 45                13,438         14,391         14,062
45 - 50                14,877         15,876         15,513
50 - 55                16,340         17,604         17,139
55 - 60                18,046         19,579         19,015
60 - 65                19,967         21,783         21,066
65 - 70                22,172         24,433         23,578
70 - 75                24,801         27,627         26,470
75 - 80                28,071         31,811         30,252
80 - 85                32,433         37,476         35,379
85 - 90                38,636         46,740         43,119
90 - 95                49,971         64,246         57,499
95 - 100               94,234         145,466        121,895
Overall mean           21,576         26,454         24,287

Figure 1: Pattern of bias for an inverted-U relationship between compliance and income

Figure 2: Lorenz curve bias under a monotonic income effect on survey compliance

Figure 3: Non-compliance odds and state-wise per capita income averages (horizontal axis: log average per capita income)

Figure 4: Convergence pattern of the slope coefficient (b1) (horizontal axis: iteration (t))

Figure 5: Convergence pattern of the estimated population shares (horizontal axis: iteration (t))

Figure 6: Empirical and compliance re-weighted Lorenz curves (horizontal axis: cumulative % of the population; series: empirical, corrected (n=6), corrected (n=5))
Figure 7: Lower and upper tails of the Lorenz curves (series: empirical, corrected)

Figure 8: Empirical and compliance-corrected cumulative distributions of income (horizontal axis: income per capita in US$; series: empirical, corrected (n=6), corrected (n=5))

Figure 9: Lower segment of the cumulative distributions of income in Figure 8 (horizontal axis: income per capita in US$)